
This is the opposite of the issue that all my searches kept turning up answers to, where people wanted plain text but got compressed data.

I'm writing a bash script that uses curl to fetch the mailing list archive files from a Mailman mailing list (using the standard Mailman web interface on the server end).

The file (for this month) is http://lists.example.com/private.cgi/listname-domain.com/2013-September.txt.gz (sanitized URL).

When I save this with my browser I get, in fact, a gzipped text file, which when ungzipped contains what I expect.

When I fetch it with Curl (after previously sending the login password, getting a cookie set, and saving that cookie file to use in the request), though, what comes out on stdout (or is saved to a -o file) is the UNCOMPRESSED text.

How can I get Curl to just save the data into a file like my browser does? (Note that I am not using the --compressed flag in my Curl call; this isn't a question of the server compressing data for transmission, it's a question of downloading a file that's compressed on the server's disk, which I want to keep compressed.)

(Obviously I can hack around this by re-compressing it in my bash script. That wastes CPU resources, though, and is a problem waiting to happen in the future. Or I can leave it uncompressed, hack the name, and store it as just September.txt; that wastes disk space instead, and would likewise break if the behavior changed in the future. The problem seems to me to be that Curl is getting confused between compressed transmittal and actual compressed data.)
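For reference, a minimal sketch of the workflow being described, assuming a cookie file named cookies.txt and a Mailman-style login form; the form field names here are assumptions, not taken from the actual script:

# Log in and save the session cookie (field names are assumptions).
curl -c cookies.txt -d 'username=me@example.com' -d 'password=SECRET' \
     http://lists.example.com/private.cgi/listname-domain.com/

# Fetch the archive, reusing the saved cookie; the -o file ends up
# containing the uncompressed text, which is the problem.
curl -b cookies.txt -o 2013-September.txt.gz \
     http://lists.example.com/private.cgi/listname-domain.com/2013-September.txt.gz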

  • Show your HTTP response headers. (Commented Oct 1, 2013 at 5:04)
  • Are you specifying --tr-encoding? – devnull (Commented Oct 1, 2013 at 5:06)
  • How are you verifying that the file has been ungzipped? (I know that sounds like a weird question, but if the answer is "I looked at it", then with what tool?) (Or, to be less mysterious, if you looked at the file with less, try less -L) – rici (Commented Oct 1, 2013 at 5:12)
  • Also the exact command you are using. (Commented Oct 1, 2013 at 5:43)
  • possible duplicate of How to properly handle a gzipped page when using curl? – Jayan (Commented Oct 1, 2013 at 8:50)

2 Answers


Is it possible the server is decompressing the file based on headers sent (or not sent) by curl? Try the following header with curl:

--header 'Accept-Encoding: gzip,deflate'
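For example, combined with the cookie file from the question (a sketch; the cookie-file name cookies.txt is an assumption):

curl -b cookies.txt \
     --header 'Accept-Encoding: gzip,deflate' \
     -o 2013-September.txt.gz \
     http://lists.example.com/private.cgi/listname-domain.com/2013-September.txt.gz

With that header, the server should send the file's gzip bytes as-is rather than decompressing them before transmission.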

You can download the *.txt.gz file directly, without any decompression, by using wget instead of curl.

wget http://lists.example.com/private.cgi/listname-domain.com/2013-September.txt.gz
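Note that the private archive in the question requires a login, so you would still need to pass the session cookie. wget reads Netscape-format cookie files, which is the format curl writes with -c (a sketch, assuming the cookie file is named cookies.txt):

wget --load-cookies cookies.txt \
     http://lists.example.com/private.cgi/listname-domain.com/2013-September.txt.gz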

If curl is essential, then check out the details here.
