1

I'm using curl to fetch a webpage, and I need to detect whether the response is gzip-compressed.

This works fine when Content-Encoding is present in the response headers, but some servers return only "Transfer-Encoding: chunked" and no Content-Encoding header.

Is there any way to detect gzip or get the raw (encoded) server response?

I tried looking at curl_getinfo but the content_encoding isn't specified either.

Thanks.

3 Answers

2

You can check whether the response body starts with the gzip magic number, the two bytes 1f 8b.
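A minimal sketch of that check, assuming the raw body is already in a string (for example from curl_exec() with CURLOPT_RETURNTRANSFER set). The function name is_gzip is made up for illustration:

```php
<?php
// Returns true if $body begins with the gzip magic number (0x1f 0x8b).
// is_gzip() is a hypothetical helper name, not part of PHP or cURL.
function is_gzip(string $body): bool
{
    // strncmp() compares at most 2 bytes; a body shorter than 2 bytes never matches.
    return strncmp($body, "\x1f\x8b", 2) === 0;
}
```

If the check succeeds, gzdecode($body) from the zlib extension can usually decompress the data.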

1

Is there any way to detect gzip

Yes. You can use cURL's header functions. For example, you can define a callback function that handles the response headers and register it with curl_setopt() using the CURLOPT_HEADERFUNCTION option. Alternatively, write the headers to a file (one you have opened with fopen()) using the CURLOPT_WRITEHEADER option.

There may be more options you could use; see the curl_setopt() manual for the possibilities. The header you are looking for is named Content-Encoding.
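A rough sketch of the CURLOPT_HEADERFUNCTION approach. The helper names make_header_collector and fetch_with_headers are assumptions for illustration, and the array layout (lower-cased header names as keys) is just one possible choice:

```php
<?php
// Builds a callback for CURLOPT_HEADERFUNCTION that stores each header
// in $headers, keyed by lower-cased name. Hypothetical helper name.
function make_header_collector(array &$headers): callable
{
    return function ($ch, string $line) use (&$headers): int {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
        // cURL expects the callback to return the number of bytes it handled.
        return strlen($line);
    };
}

// Hypothetical wrapper: fetches $url and returns [headers, body].
function fetch_with_headers(string $url): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, make_header_collector($headers));
    $body = curl_exec($ch);
    return [$headers, $body];
}
```

After the call, isset($headers['content-encoding']) tells you whether the server declared an encoding at all.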

If you have the output in a file, you could also use PHP's finfo class with one of its predefined constants (such as FILEINFO_MIME_TYPE), or mime_content_type(), which older versions of the manual marked as deprecated.
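A small sketch of the finfo approach, assuming the fileinfo extension is loaded. It inspects a string with buffer(); for a file on disk, $finfo->file($path) works the same way. The helper name detect_mime is hypothetical, and the exact MIME string reported for gzip data depends on the bundled libmagic version:

```php
<?php
// Guess the MIME type of raw bytes using the fileinfo extension.
// detect_mime() is a hypothetical helper name.
function detect_mime(string $data): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->buffer($data);
    // buffer() returns false on failure; normalise that to an empty string.
    return $type === false ? '' : $type;
}
```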

[...] or get the raw (encoded) server response?

Yes. You can specify the Accept-Encoding request header. The value you are looking for is identity, so you can send:

Accept-Encoding: identity

Have a look at the HTTP/1.1 RFC for details. This gives you unencoded (uncompressed) output, for example to write directly into a file. Use CURLOPT_ENCODING for this purpose; it is also set with curl_setopt().
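One way to send that header from PHP, sketched under the assumption that setting it through CURLOPT_HTTPHEADER is acceptable for your case. The helper name identity_request_options is made up, and a server is free to ignore the request and compress anyway:

```php
<?php
// Build cURL options that ask the server not to apply any content coding.
// identity_request_options() is a hypothetical helper name.
function identity_request_options(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['Accept-Encoding: identity'],
    ];
}

// Usage sketch (performs a network request; the URL is a placeholder):
// $ch = curl_init('https://example.com/');
// curl_setopt_array($ch, identity_request_options());
// $body = curl_exec($ch);
```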

1

You can either issue a separate HEAD request:

CURLOPT_HEADER => true
CURLOPT_NOBODY => true

Or request that the headers be prefixed to the response of your original request:

CURLOPT_HEADER => true
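With CURLOPT_HEADER enabled, headers and body arrive in one string. A sketch of separating them, assuming the header length is taken from curl_getinfo($ch, CURLINFO_HEADER_SIZE); the helper name split_response is hypothetical:

```php
<?php
// Split a response fetched with CURLOPT_HEADER => true into headers and body.
// $headerSize should come from curl_getinfo($ch, CURLINFO_HEADER_SIZE).
// split_response() is a hypothetical helper name.
function split_response(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}
```

The header part can then be searched for a Content-Encoding line, for instance with stripos().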

But, if you just want to get the (decoded) HTML, you can use:

CURLOPT_ENCODING => ''

cURL will then automatically negotiate the encoding with the server and decode the response for you.
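Put together, the decoding variant might look like the sketch below. The helper names decoded_request_options and fetch_decoded are assumptions for illustration:

```php
<?php
// Options that let cURL advertise the encodings it supports and decode the body.
// decoded_request_options() is a hypothetical helper name.
function decoded_request_options(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING       => '', // empty string: offer every encoding cURL was built with
    ];
}

// Hypothetical wrapper: returns the decoded body, or false on failure.
function fetch_decoded(string $url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, decoded_request_options());
    return curl_exec($ch);
}
```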