
Proxy servers are intended to make browsing the web faster and reduce traffic by storing copies of HTTP objects, correct? But when a browser sends an HTTP request to a proxy server and the proxy server has a copy, the proxy server will still send an HTTP request to the origin server with an If-Modified-Since header before sending the copy to the browser. If the object hasn't changed, the origin server simply sends an HTTP response with status 304, which reduces the transmission time a little, since no body is included in the response.

However, the proxy server still has to initiate a TCP connection and await the server's response, which adds up to two RTTs. On top of that, before actually initiating the TCP connection, the proxy server also has to check its memory for a cached copy of the HTTP object the client asked for.

I can't see how this possibly makes web browsing faster. In the best case all we save is a little transmission time, since the origin server does not need to resend the object if it has not been modified; in the worst case, the memory access in the proxy server takes longer than what we saved in transmission time. What am I misunderstanding? Clearly proxy servers must save a tremendous amount of time, since they are widely used, but I can't see how.

  • As the saying goes, “There are only two hard problems in computing: naming things, cache invalidation and off-by-one errors”. This is thus one of the hard problems.
    – Mike Scott
    Commented Jun 1, 2021 at 9:41

4 Answers

3

HTTP requests and responses can include headers which influence caching; some relevant to this question:

public - The response may be stored by any cache, even if the response is normally non-cacheable.

max-age=<seconds> - The maximum amount of time, in seconds, that a resource is considered fresh

immutable - Indicates that the response body will not change over time

Website developers can use these to declare that static files will never change and can be cached forever (e.g. include a version in the filename, so image-v1.jpg gets updated to image-v2.jpg and is never edited under the same name, meaning the cached copy of image-v1.jpg is always useful), or that pages can be cached for a week before being revalidated using the "check with the server" process that you describe.
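
For instance, a versioned static asset might be served with headers like these (values are illustrative):

```http
HTTP/1.1 200 OK
Content-Type: image/jpeg
Cache-Control: public, max-age=31536000, immutable
```

A proxy or browser that sees these may store the object and serve it for up to a year without ever revalidating it against the origin.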


However, the proxy server still has to initiate a TCP connection and await the server's response, which adds up to two RTTs.

If the response to this check is "no change" and is small enough to fit in one packet, it avoids the TCP slow-start lag that may be involved in transferring the full response content over many packets. It can also be a lot faster for a server to generate a "this has not changed" response than to fetch the current content (e.g. out of a database, or generate it using templates) and queue it for sending.
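
That revalidation flow can be sketched in Python (the function names are made up for illustration; a real proxy would also honor Cache-Control directives):

```python
from email.utils import formatdate

def conditional_headers(cached_mtime):
    # Ask the origin "has this changed since my copy?" instead of
    # unconditionally re-downloading the object.
    return {"If-Modified-Since": formatdate(cached_mtime, usegmt=True)}

def body_to_serve(status, cached_body, fresh_body):
    # A 304 Not Modified reply carries no body: the origin only confirms
    # the cached copy is still good, so nothing large crosses the wire.
    if status == 304:
        return cached_body
    return fresh_body
```
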

On top of that, before actually initiating the TCP connection the proxy server also has to check its memory if there is a cached copy of the HTTP object the client asked for.

This is enormously fast compared to a web request. This page from 2012 (sadly with the graph missing) says their Varnish Cache instance returns data from the cache in 100 microseconds, and sometimes in under 10 microseconds. An internet round trip could be 5 ms on a fast connection or 200 ms halfway around the world on a slow connection: 50-1000 times longer.
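
A rough, machine-dependent way to see the scale difference (the cache contents here are made up, and exact timings vary, but an in-memory lookup is routinely microseconds):

```python
import time

# Toy in-memory cache standing in for a proxy's object store.
cache = {f"/obj{i}": b"x" * 1000 for i in range(10_000)}

start = time.perf_counter()
hit = cache.get("/obj1234")
lookup_seconds = time.perf_counter() - start

rtt_seconds = 0.005  # the 5 ms "fast connection" round trip from above
assert hit is not None
assert lookup_seconds < rtt_seconds  # lookup is orders of magnitude cheaper
```
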

I can't see how this possibly makes web browsing faster, all we save in the best case is a little bit of transmission time

We save:

  • actual data transfer. Pages and JavaScript files can be in the megabyte range, and not all connections are fast even today.

  • connection setup times. Browsers can be limited to 2-6 connections per domain; if the page has more files than that to load, they have to queue. (More recent HTTP/2 and QUIC protocols address this.)

  • congestion points avoided. A work, college, or shared library internet connection might well have a slow device along the way, like an underpowered router or a busy network link. Getting results from a proxy before that point helps.

  • money. A business or college (etc.) saves the bandwidth costs. Commercial internet connections are often billed based on average bandwidth use; if all your business desktops or college students will be downloading the same Windows updates, the same Linux ISOs at the start of a course, and the same educational software installers and schedule pages, it's cheaper to serve cached content internally, even when it is not faster for the end users.
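
The money point is easy to quantify with back-of-the-envelope arithmetic (all numbers below are hypothetical):

```python
# Hypothetical numbers: 500 machines each pulling the same 500 MB update.
clients = 500
update_mb = 500

without_cache_gb = clients * update_mb / 1000  # every client crosses the paid link
with_cache_gb = update_mb / 1000               # one upstream fetch, LAN after that

print(without_cache_gb, with_cache_gb)  # 250.0 vs 0.5 GB of billed traffic
```
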

3

The proxy server can be configured not to send any request to the remote server if the last request was within a particular timeframe.

If it sent a request one second ago and got a copy of the page, then gets another request for that same page a couple of seconds later, it can be assumed that the local copy is perfectly valid.

If the last time the page was copied is a minute or two ago, you will probably still want to do the checks you mentioned. If the page is large or hosted over a slow link, then you may well still save a second or two of transmission time.
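
That policy is just an age check; a minimal sketch, assuming a fetched_at timestamp is stored alongside each cached object (the threshold is an arbitrary example):

```python
import time

REVALIDATE_AFTER_S = 60  # assumed threshold: trust copies younger than a minute

def needs_revalidation(fetched_at, now=None):
    # Fresh enough: serve straight from the cache, no request to the origin.
    # Older than the threshold: do the If-Modified-Since check from the question.
    now = time.time() if now is None else now
    return (now - fetched_at) > REVALIDATE_AFTER_S
```
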

Modern dynamic webpages probably need more frequent checks, but there are still savings to be had.

3

There are at least three savings to be had:

  1. Lower trip times. A proxy will likely be closer to the end user, resulting in lower latencies, which can make a significant difference.

  2. Bandwidth savings. At least in many parts of the world, long distance links were expensive and thus saturated and slow. Proxies reduced this congestion. (25 years ago I remember paying over US$1500 for 64k of data to serve a number of dialup accounts. You bet proxies helped)

  3. Depending on the design of the website in question proxies can significantly reduce CPU utilization on the server (this is more a consideration in reverse proxies though).

  • "A proxy will likely be closer to the end user, resulting in lower latencies" This doesn't address the issue in the OP, though. If the proxy has to make a request to the real server, then the same request is being made to the far-away server, but now you also have to make a request to the proxy first.
    – Qwertie
    Commented Jun 2, 2021 at 4:28
  • @Qwertie - it only needs to make the request if the object is not already cached [or the cache has expired]. For static content, caches can serve the request without referring to the server.
    – davidgo
    Commented Jun 2, 2021 at 5:12
  • Right, I guess the key part the OP doesn't understand is how you know the cache has expired. Some configurations will just use an expiry date, but then you risk being left with stale content for that time.
    – Qwertie
    Commented Jun 2, 2021 at 6:06
1

Proxy servers do not make web browsing faster at all any more, because almost all web traffic is now encrypted, which makes caching proxies unable to function.

Proxies have largely been replaced with other technology like:

  • Geo replication. Duplicated servers and databases at a closer location to you
  • CDNs
  • Caches inside ISPs. Popular services like Netflix and Steam will partner with ISPs to provide them with a copy of the data directly

These modern methods have the advantage of being tied in to the service itself, so they can be updated with fresh content before anyone requests it.

Clearly proxy servers must save a tremendous amount of time since they are widely used but I can't see how.

This assumption itself should be challenged. I'd say things like download mirrors are more common because they still work on the modern web. A university could host its own Debian mirror and have all its Debian machines update via that rather than over the wider internet.

There are still caches similar to HTTP proxies used for other things, like DNS. Your ISP's DNS server does not make a request to a parent DNS server every time someone asks for google.com. It simply holds the record for a set amount of time (the TTL, which is meant to be configurable by the domain owner but is often ignored), and after that time expires, the ISP's DNS server makes a new request for the record.

If you access a record that is not cached, it does take a little longer, but most records should be cached. This also means that when a domain record changes, it takes quite a while to propagate down to users.
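
That DNS behaviour is the same trade-off in miniature; a toy sketch (the resolver callback and the record returned are made up for illustration):

```python
import time

class TtlCache:
    def __init__(self, resolve):
        self.resolve = resolve  # callback doing the real upstream lookup
        self.entries = {}       # name -> (record, expiry_time)

    def lookup(self, name, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(name)
        if entry and entry[1] > now:
            return entry[0]     # still within the TTL: no upstream request
        record, ttl = self.resolve(name)
        self.entries[name] = (record, now + ttl)
        return record
```

Until the TTL expires, repeated lookups never touch the upstream server; after it expires, the next lookup refreshes the entry, which is also why record changes take a while to reach users.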
