
I would really appreciate some help understanding this Apache behaviour.

My iPhone app (Objective-C) talks to a PHP backend using application/json. Gzip compression is enabled on the server and requested by the client.

From my .htaccess:

        AddOutputFilterByType DEFLATE text/html text/plain text/xml application/x-httpd-php application/json

For small responses, Apache sets the Content-Length header. For example (these are the response headers as logged by the Objective-C app):

        Connection = "Keep-Alive";
        "Content-Encoding" = gzip;
        "Content-Length" = 185;     <-------------
        "Content-Type" = "application/json";
        Date = "Wed, 22 Sep 2010 12:20:27 GMT";
        "Keep-Alive" = "timeout=3, max=149";
        Server = Apache;
        Vary = "Accept-Encoding";
        "X-Powered-By" = "PHP/5.2.13";
        "X-Uncompressed-Content-Length" = 217;

X-Uncompressed-Content-Length is a header I add myself, set to the size of the uncompressed JSON string.
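The question doesn't show how this header is produced, so here is a hypothetical sketch of one way to do it: measure the JSON string with strlen() (which counts bytes) before any gzip step runs. The $data array is made up for illustration.

```php
<?php
// Hypothetical sketch: the question does not show how this header is set.
// $data stands in for whatever the script actually returns.
$data = ['status' => 'ok', 'items' => [1, 2, 3]];
$json = json_encode($data);

// strlen() gives the byte length of the JSON before any compression happens
header('Content-Type: application/json');
header('X-Uncompressed-Content-Length: ' . strlen($json));
echo $json;
```

Comparing this value with Content-Length on the client gives the compression ratio, but only when Content-Length is actually present.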

As you can see, this response is very small (217 bytes uncompressed).

Here are the headers from a larger response (282888 bytes uncompressed):

        Connection = "Keep-Alive";
        "Content-Encoding" = gzip;
        "Content-Type" = "application/json";
        Date = "Wed, 22 Sep 2010 12:20:29 GMT";
        "Keep-Alive" = "timeout=3, max=148";
        Server = Apache;
        "Transfer-Encoding" = Identity;
        Vary = "Accept-Encoding";
        "X-Powered-By" = "PHP/5.2.13";
        "X-Uncompressed-Content-Length" = 282888;

Notice that Content-Length is not given.

My questions:

  1. Why doesn't Apache send the Content-Length for the larger request?
  2. Does the fact that 'Content-Encoding: gzip' is set mean that gzip compression is still working on the larger response, even though I can't verify the size difference?
  3. Is there a way I can get Apache to include the actual Content-Length for these larger requests to more accurately report the data usage to the users?

This app can be used on data plans that are expensive, hence my desire to report the actual usage to the user, not 30-70% inflated usage (a few hundred extra KB may not sound like much – but these plans can cost between $1 and $10 per MB!).

Thanks in advance.

3 Answers


In addition to Martin Fjordvald's answer:

Apache uses chunked encoding only if the compressed response is larger than DeflateBufferSize. Increasing this buffer size therefore prevents the server from using chunked encoding for larger responses as well, so the Content-Length is sent even for gzipped data.

More information is available here: http://httpd.apache.org/docs/2.2/mod/mod_deflate.html#deflatebuffersize
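A sketch of the change, with one caveat for the asker's setup: per the mod_deflate documentation, DeflateBufferSize is only valid in the server config or a virtual host (not in .htaccess), and its default is 8096 bytes. The value below is an arbitrary example, not a recommendation; pick something larger than the compressed replies you expect.

```apache
# httpd.conf or inside a <VirtualHost>: DeflateBufferSize is not allowed in .htaccess
<IfModule mod_deflate.c>
    # Default is 8096 bytes. 262144 (256 KB) is an arbitrary example value,
    # chosen to exceed the size of the compressed JSON replies you expect.
    DeflateBufferSize 262144
</IfModule>
```

The trade-off is presumably more memory per compressed response, so on a busy server it is worth sizing this to the actual payloads rather than setting it very high.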

  • Nice one. This is probably the fastest way to solve this problem. If anyone needs a higher level of customisation (e.g. chunk some requests, not others), see my answer serverfault.com/a/183856/54957 for a manual solution. Commented Jul 18, 2013 at 4:01

It sounds like Apache is using chunked encoding, which means it can send the data as it is being gzipped rather than waiting for the whole response to be compressed. It's fairly standard practice, though I'm not familiar enough with Apache to say whether it can be disabled.
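To show why chunking removes the need for Content-Length, here is a small illustration of the HTTP/1.1 chunked framing (RFC 7230, section 4.1). It is a sketch of the wire format only, not how Apache implements it, and chunk_encode is a hypothetical helper name.

```php
<?php
// Illustration of HTTP/1.1 chunked framing, not Apache's actual implementation.
// Each chunk is prefixed with its length in hex, and a zero-length chunk ends the
// body, so the sender never has to know the total size up front.
function chunk_encode(string $body, int $chunkSize): string
{
    $wire = '';
    for ($offset = 0; $offset < strlen($body); $offset += $chunkSize) {
        $piece = substr($body, $offset, $chunkSize);
        $wire .= dechex(strlen($piece)) . "\r\n" . $piece . "\r\n";
    }
    // The last-chunk marker ("0") followed by an empty line terminates the body
    return $wire . "0\r\n\r\n";
}

echo chunk_encode('hello world', 5);
```

The client can only add up the chunk sizes as they arrive, so it learns the total compressed size at the end of the transfer, not at the start.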

  • Thanks for the info, you pointed me in the right direction, and I solved it. Commented Sep 23, 2010 at 4:16
  • Accepted. For anyone reading this question though – please read my answer for a detailed solution. Basically, you can avoid chunking (and thus the missing Content-Length) by buffering and compressing the reply manually. Commented Sep 23, 2010 at 6:23
  • It's a little confusing that the accepted answer isn't the answer to the original question, but rather something that helped you get it. Maybe you should accept the answer you posted below to make things a little more clear.
    – redbmk
    Commented Mar 19, 2013 at 3:12
  • @redbmk fair point, I just didn't want to seem ungrateful. Philippe actually has the perfect simple fix for this, so I've accepted his over mine. Commented Jul 18, 2013 at 4:04

OK, I managed to solve this. As Martin F correctly points out, Apache is chunking the reply, so the content size is not known in advance. For many people this is desirable (the client starts receiving the page sooner), but it comes at the cost of not being able to report download progress.

For those who, like me, really want to report download progress, there is little you can do while relying on Apache's or PHP's automatic gzip support. The solution is to compress the reply manually, which is easier than it sounds:

If you're sending whole files, then this is a great example in PHP to force a single chunk (with the Content-Length): http://www.php.net/manual/en/function.ob-start.php#94741

If you're sending generated data, use gzencode to compress it, as in the sample linked above. The prerequisite is that all your output is stored in a variable (if your script echoes its output, you can buffer it with ob_start and then fetch the buffer's contents, e.g. with ob_get_clean).

        // $replyBody is the entire contents of your reply

        header("Content-Type: application/json");  // or whatever yours is
        // tells caches that the body depends on the client's Accept-Encoding
        header("Vary: Accept-Encoding");

        // checks if gzip is supported by the client
        $pack = true;
        if (empty($_SERVER["HTTP_ACCEPT_ENCODING"]) || strpos($_SERVER["HTTP_ACCEPT_ENCODING"], 'gzip') === false)
        {
            $pack = false;
        }

        // if supported, gzips the data at level 9 (maximum compression)
        if ($pack) {
            header("Content-Encoding: gzip");
            $replyBody = gzencode($replyBody, 9, FORCE_GZIP);
        }

        // compressed or not, sets the Content-Length in bytes
        // (mb_strlen with 'latin1' counts exactly one character per byte)
        header("Content-Length: " . mb_strlen($replyBody, 'latin1'));

        // outputs reply & exits
        echo $replyBody;
        exit;

And voila!
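For a script that builds its reply with echo rather than in a string, here is a minimal sketch of the ob_start route mentioned above. The function name build_gzip_reply and its return shape are hypothetical; it returns the headers instead of calling header() so it can be checked outside a web server, and it uses strlen() for the byte count, as suggested in the comments.

```php
<?php
// Hypothetical helper: runs $render (which echoes the reply), captures its output,
// and returns the headers and body that the manual-gzip approach would send.
function build_gzip_reply(callable $render, string $acceptEncoding): array
{
    ob_start();               // start buffering everything $render echoes
    $render();
    $body = ob_get_clean();   // fetch the buffered output and stop buffering

    $headers = ['Content-Type: application/json', 'Vary: Accept-Encoding'];
    if (strpos($acceptEncoding, 'gzip') !== false) {
        $headers[] = 'Content-Encoding: gzip';
        $body = gzencode($body, 9, FORCE_GZIP);
    }
    // strlen() counts bytes, which is what Content-Length needs
    $headers[] = 'Content-Length: ' . strlen($body);
    return [$headers, $body];
}
```

In a real script you would pass $_SERVER['HTTP_ACCEPT_ENCODING'] (or '' if it is missing), call header() for each returned entry, and then echo the body.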

Another great benefit of doing it yourself is that you can set the compression level. This is great for my mobile application, as I can use the highest level (so my users pay less for data!), whereas the server probably uses a medium level for a better CPU/size trade-off. As far as I know, Apache's compression level (DeflateCompressionLevel) can only be changed in httpd.conf, which I can't edit on shared hosting.
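To see what the level actually buys for a given payload, here is a rough sketch that gzips the same made-up JSON at levels 1, 6 (zlib's default) and 9 and prints the sizes. The payload and the gzip_sizes name are illustrative only; real savings depend entirely on your data.

```php
<?php
// Compare the size of one JSON payload at several gzencode() compression levels.
function gzip_sizes(string $payload, array $levels): array
{
    $sizes = [];
    foreach ($levels as $level) {
        $sizes[$level] = strlen(gzencode($payload, $level));
    }
    return $sizes;
}

// Made-up payload; repeated keys are typical of JSON API replies
$rows = [];
for ($i = 0; $i < 500; $i++) {
    $rows[] = ['id' => $i, 'title' => "Item $i", 'available' => ($i % 2 === 0)];
}
$json = json_encode($rows);
$sizes = gzip_sizes($json, [1, 6, 9]);

printf("uncompressed: %d bytes\n", strlen($json));
foreach ($sizes as $level => $size) {
    printf("level %d: %d bytes (%.1f%% smaller)\n", $level, $size, 100 * (1 - $size / strlen($json)));
}
```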

So I've kept my DEFLATE .htaccess directive for everything except my application/json replies, which I now compress as shown above.
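Assuming the only change was dropping application/json from the directive in the question, the .htaccess line would presumably become something like this:

```apache
# Same directive as in the question, minus application/json,
# which the PHP script now gzips itself
AddOutputFilterByType DEFLATE text/html text/plain text/xml application/x-httpd-php
```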

Thanks again Martin F, you gave me the spark I needed to solve this :)

  • Incidentally, the savings with JSON data (which has heavily repeated keys) are huge: a 77% reduction in one case. That's a big deal at $1 per MB... Commented Sep 23, 2010 at 4:32
  • You should probably just use strlen($replyBody) instead of mb_strlen($replyBody, 'latin1'). The Content-Length is just the number of bytes (not characters), which is what strlen() gives you. Using mb_strlen() with 'latin1' sort of works since latin1 characters are always 8 bits, but it may have issues with encodings that produce bytes that aren't valid latin1 characters.
    – orrd
    Commented Sep 8, 2015 at 20:54
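A quick illustration of the bytes-versus-characters point, assuming the mbstring extension is available (the original code already relies on it) and PHP 7 or later for the \u{} escape:

```php
<?php
// "é" written as a UTF-8 escape: one character, two bytes (0xC3 0xA9)
$text = "caf\u{e9}";

var_dump(strlen($text));               // int(5): bytes, what Content-Length counts
var_dump(mb_strlen($text, 'UTF-8'));   // int(4): characters
var_dump(mb_strlen($text, 'latin1'));  // int(5): one "character" per byte
```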

