IWMW 1997: WWW Caching
- 2. Overview of Presentation
• Why caching?
• Caching Infrastructures.
• National Caching.
• Caching hardware and software
• Implementation of caching
• Non-Technical Issues
- 3. Why Caching?
• 1000s of users ‘surfing’ the Internet each
with their own browser.
• Users and browsers are ‘independent’,
resulting in a large amount of replication
of information carried over the network.
• Popular Web sites may have many
simultaneous connections transmitting
identical copies of a single item over the
same network trunk routes. This state of
affairs is highly undesirable because...
- 4. Why Caching?
• Bandwidth - especially
international bandwidth - is
very expensive, and must be
used cost-effectively.
• Web ‘hot-spots’ are created.
• Web object retrieval times are
increased.
- 5. Why Caching?
• Web caches are an attempt to:
– Minimise bandwidth wastage.
– Decrease object retrieval times.
– Reduce the number of ‘hot-spots’.
- 6. Caching Infrastructures
• Caches may be implemented:
– Within departments
– Within Institutions
– Nationally
– Internationally
• Caches can co-operate, giving
meshes of caches, or
caching infrastructures.
- 7. Caching Infrastructures
• Caching infrastructures are
developing at every level.
– Quite a few departmental caches.
– Many Institutions now operate
caches.
– Within the UK a National caching
infrastructure is developing.
– International infrastructures in
place and developing.
- 8. Caching Infrastructures
• Cooperation between caches.
– Achieved by the Internet Cache
Protocol (ICP), used in one of
two modes:
• Unicast mode - individual
connections established to
interrogate caches.
• Multicast mode - a single ICP
query is multicast to a group of
cooperating caches.
– Intuitively, the multicast approach
should be more efficient: it reduces
bandwidth, etc.
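The difference between the two modes can be sketched in a few lines. This is a minimal sketch, not a full ICP implementation: the packet layout follows the ICP version 2 header (opcode, version, length, request number, options, option data, sender address) and the neighbour and group addresses are hypothetical. Nothing is sent on the network; the code only builds the query and lists where it would go.

```python
import struct

ICP_OP_QUERY = 1   # opcode of an ICP query
ICP_VERSION = 2
# opcode, version, message length, request number, options, option data, sender address
HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url, request_number):
    """Build one ICP query datagram asking a neighbour whether it holds `url`."""
    # Query payload: 4-byte requester address (zeroed in this sketch) + NUL-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, HEADER.size + len(payload),
                         request_number, 0, 0, b"\x00" * 4)
    return header + payload


def parse_icp_header(packet):
    """Decode the fixed 20-byte header of an ICP datagram."""
    opcode, version, length, reqnum, _opts, _optdata, _sender = HEADER.unpack_from(packet)
    return {"opcode": opcode, "version": version, "length": length, "reqnum": reqnum}


def destinations(mode, neighbours, group):
    """Unicast: one datagram per neighbour. Multicast: one datagram to the group."""
    return list(neighbours) if mode == "unicast" else [group]


# Hypothetical addresses, for illustration only.
NEIGHBOURS = [("192.0.2.1", 3130), ("192.0.2.2", 3130), ("192.0.2.3", 3130)]
GROUP = ("239.1.1.1", 3130)

packet = build_icp_query("http://www.hensa.ac.uk/", request_number=7)
unicast_targets = destinations("unicast", NEIGHBOURS, GROUP)
multicast_targets = destinations("multicast", NEIGHBOURS, GROUP)
```

With three neighbours the sender emits three datagrams in unicast mode and one in multicast mode, which is where the intuition about efficiency comes from.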
- 9. Caching Infrastructures
• For example at Manchester:
– A central campus cache and several
departmental caches use ICP in
unicast mode.
– Parent relationships with other
caches in the UK, Europe and
USA.
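An arrangement like this can be modelled as a small hierarchy. The sketch below is illustrative only: the class and cache names are hypothetical, and it assumes the usual convention that a sibling supplies an object only if it already holds it, while a parent will fetch the object on the child's behalf.

```python
class Cache:
    """Toy cache node with optional siblings and an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.siblings = []
        self.store = {}          # url -> body

    def fetch(self, url, origin):
        """Return (body, source), where source names who supplied the object."""
        if url in self.store:                      # local hit
            return self.store[url], self.name
        for sibling in self.siblings:              # siblings answer hits only
            if url in sibling.store:
                body, source = sibling.store[url], sibling.name
                break
        else:
            if self.parent is not None:            # a parent resolves misses for us
                body, source = self.parent.fetch(url, origin)
            else:                                  # top of the hierarchy: go direct
                body, source = origin[url], "origin"
        self.store[url] = body                     # keep a copy for next time
        return body, source


# `origin` stands in for the remote Web server.
origin = {"http://www.example.org/": b"page"}
campus = Cache("campus")
dept_a = Cache("dept-a", parent=campus)
dept_b = Cache("dept-b", parent=campus)
dept_c = Cache("dept-c", parent=campus)
dept_a.siblings = [dept_b]
dept_b.siblings = [dept_a]

first = dept_a.fetch("http://www.example.org/", origin)    # travels up to the origin
second = dept_b.fetch("http://www.example.org/", origin)   # served by sibling dept-a
third = dept_c.fetch("http://www.example.org/", origin)    # served by the campus parent
```

Only the first request crosses the external link; the later ones are answered from inside the campus.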
- 10. National Caching
• HENSA pioneered caching with
their Public Caching Proxy
Server. Initiated around 1992.
– Used Lagoon initially
– Then the CERN server
– Then Netscape Proxy
– And some Squid
• Details described at First
International WWW
Conference:
http://www.hensa.ac.uk/www94
- 11. National Caching
• The existing service is hosted
by University of Kent at
Canterbury and University of
Leeds.
• From 1st August 1997 it will be
hosted by the University of
Manchester and Loughborough
University.
• Selected through a recent
competitive tendering process.
- 12. National Caching
• The situation so far.
– Service still at HENSA and
Leeds. We are preparing for the
transition.
– Initially existing equipment will
be used.
– Projection of demand performed
and hardware upgrade path
budgeted for.
- 13. National Caching
• The ‘new’ service will have:
– a service ‘arm’
– a development ‘arm’
• The National service will be
directed by a steering
committee and will be, as far as
possible, user driven.
• National Caching Web site,
regular newsletter, mailing lists,
help desk system, fault
reporting mechanism, etc.
- 14. Benefits of National
Caching
• Trans-Atlantic bandwidth and
bandwidth to Europe are both
very expensive and in great
demand. Caching reduces
bandwidth consumption. The
resulting cost savings can be
used to fund other things.
• Faster document retrieval time -
in theory!
- 15. National Caching - Useful
addresses and URLs
• Email addresses:
– wwwcache-users@wwwcache.ja.net:
general mailing list for users.
– cybercache@wwwcache.ja.net: mailing
list for Special Interest Group.
– natcache@wwwcache.ja.net: National
Cache Joint Team mailing list.
• Some URLs:
– http://www.hensa.ac.uk
– http://www.net.lboro.ac.uk/caching/
– http://www.mcc.ac.uk/Cache/
- 18. Using Caches
• Users interact with caches
directly using their favourite
browser.
• Caches interact or co-operate
with other caches using ICP.
• Browser - cache interaction is a
‘client-server’ type interaction.
- 19. Implementation -
Browsers
• Netscape
– Manual configuration - Select
network preferences from Options
menu...
– Automatic configuration - proxy
configuration can be automated
with JavaScript...
• Others: Lynx, Mosaic, Microsoft
Internet Explorer.
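Whatever the browser, manual configuration comes down to the same two values: the cache's host name and port. As a sketch of the same idea outside a browser, the Python standard library can be pointed at a cache in a similar way. The host name below is hypothetical, and port 3128 is only assumed here because it is Squid's usual default.

```python
import urllib.request


def cache_proxy_map(host, port):
    """Map URL schemes to the cache that should handle them."""
    proxy = f"http://{host}:{port}"
    return {"http": proxy, "ftp": proxy}


def opener_via_cache(host, port):
    """Return an opener whose HTTP and FTP requests go through the cache."""
    handler = urllib.request.ProxyHandler(cache_proxy_map(host, port))
    return urllib.request.build_opener(handler)


# Hypothetical institutional cache; no request is made until opener.open() is called.
opener = opener_via_cache("wwwcache.example.ac.uk", 3128)
```

Calling `opener.open(url)` would then send the request to the cache rather than to the origin server, which is the client-server interaction described earlier.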
- 20. Implementation - caches
• With reference to Squid
– Installation
– Configuration
– Operations
• Some problems
– disk space
– discarding documents
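The two problems are linked: disk space is finite, so something has to be discarded. One common approach, sketched below, is to start discarding the least recently used objects once usage passes a high-water mark and to stop once it falls to a low-water mark. The class name, the LRU policy and the thresholds are assumptions made for illustration, not a description of Squid's internals.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy object store that evicts least recently used objects between two marks."""

    def __init__(self, capacity_bytes, low=0.90, high=0.95):
        self.capacity = capacity_bytes
        self.low = low                   # stop evicting at this fraction of capacity
        self.high = high                 # start evicting above this fraction
        self.used = 0
        self.objects = OrderedDict()     # url -> size, least recently used first

    def touch(self, url):
        """Record a hit; returns False if the object is not cached."""
        if url in self.objects:
            self.objects.move_to_end(url)
            return True
        return False

    def store(self, url, size):
        """Add an object and return the URLs discarded to make room."""
        if url in self.objects:
            self.used -= self.objects.pop(url)
        self.objects[url] = size
        self.used += size
        evicted = []
        if self.used > self.high * self.capacity:
            while self.objects and self.used > self.low * self.capacity:
                old_url, old_size = self.objects.popitem(last=False)
                self.used -= old_size
                evicted.append(old_url)
        return evicted


cache = WatermarkCache(100, low=0.5, high=0.8)
cache.store("a", 30)
cache.store("b", 30)
cache.touch("a")                  # "a" is now more recently used than "b"
discarded = cache.store("c", 30)  # usage 90 exceeds the high mark of 80
```

In this run "b" goes first because "a" was touched more recently, and eviction continues until usage is at or below the low mark.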
- 21. Implementation - Installation
• Retrieve from:
– http://squid.nlanr.net/Squid/
• Build steps:
– decompress and extract
– configure
– compile
– install
• Operating Systems
– Unix: AIX, FreeBSD, HP-UX,
IRIX, Linux, OSF/1, Solaris,
SunOS
- 22. Implementation - Configuration
• Configuration file
– http_port
– icp_port
– mcast_groups
– cache_host
– cache_host_domain
– cache_swap
– cache_swap_low
– cache_swap_high
– cache_dir
– cache_access_log
- 23. Implementation - Configuration
• Configuration file continued...
– pid_filename
– debug_options
– ftpget_program
– negative_ttl
• Access Control lists
– http_access allow
– http_access deny
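To show how such directives fit together, here is a minimal sketch that reads a Squid-style configuration (one directive per line, `#` for comments) and evaluates the `http_access` rules in order, first match wins. All values in the sample are illustrative assumptions, the `acl` lines are added only so that the access rules have something to refer to, and the final fallback to "deny" is a choice made for this sketch.

```python
def parse_config(text):
    """Map each directive name to the list of argument lists given for it."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments and blank lines
        if not line:
            continue
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


def http_access_decision(directives, matching_acls):
    """Return the action of the first http_access rule whose ACLs all match."""
    for action, *acls in directives.get("http_access", []):
        if all(acl in matching_acls for acl in acls):
            return action
    return "deny"                              # fallback chosen for this sketch


# Illustrative values only; not taken from any real site.
SAMPLE = """\
http_port 3128          # port browsers connect to
icp_port 3130           # port neighbouring caches query
cache_swap 100
cache_swap_low 90
cache_swap_high 95
acl campus src 192.0.2.0/24
acl all src 0.0.0.0/0.0.0.0
http_access allow campus
http_access deny all
"""

config = parse_config(SAMPLE)
on_campus = http_access_decision(config, {"campus", "all"})
off_campus = http_access_decision(config, {"all"})
```

The order of the `http_access` lines matters: swapping the two rules in the sample would deny campus users as well.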
- 25. Operation
• Parent or sibling?
• Log files
• Statistics
• Number of requests per day
• Machine loading
• Network loading
• Improvement in latency?
• Reduction in bandwidth usage?
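Several of these questions can be answered from the access log. The sketch below assumes a whitespace-separated log in which the fourth field is a result code such as TCP_HIT/200 and the fifth is the object size in bytes, which resembles Squid's native log; the sample lines are invented for illustration.

```python
def summarise(log_lines):
    """Count requests and bytes, and how many of each were served from the cache."""
    requests = hits = total_bytes = hit_bytes = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue                              # skip malformed lines
        result = fields[3].split("/")[0]          # e.g. TCP_HIT or TCP_MISS
        size = int(fields[4])
        requests += 1
        total_bytes += size
        if "HIT" in result:
            hits += 1
            hit_bytes += size
    return {
        "requests": requests,
        "hit_ratio": hits / requests if requests else 0.0,
        "byte_hit_ratio": hit_bytes / total_bytes if total_bytes else 0.0,
    }


# Invented sample entries in the assumed layout.
SAMPLE_LOG = [
    "868000000.100 250 192.0.2.10 TCP_MISS/200 4000 GET http://www.example.org/ - DIRECT/www.example.org text/html",
    "868000001.200 12 192.0.2.11 TCP_HIT/200 4000 GET http://www.example.org/ - NONE/- text/html",
    "868000002.300 300 192.0.2.12 TCP_MISS/200 2000 GET http://www.example.org/logo.gif - DIRECT/www.example.org image/gif",
    "868000003.400 9 192.0.2.10 TCP_HIT/200 2000 GET http://www.example.org/logo.gif - NONE/- image/gif",
]

stats = summarise(SAMPLE_LOG)
```

The request hit ratio relates to the latency question and the byte hit ratio to the bandwidth question; the two can differ considerably when object sizes vary.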
- 27. Should I run a cache?
• Should I run a:
– Departmental cache?
– Institutional cache?
• Should I link together
departmental caches?
• Should I link departmental
caches to my Institutional
cache?
• Should I link my institutional
cache to the National Cache?
- 28. Should I run a cache?
• There are no hard and fast rules.
Clearly caching saves bandwidth and
improves latency, but it is not
obvious how best to construct a
hierarchy to achieve this.
• We are at the learning stage. Part
of the remit of the National Web
Network Caching Service will be to
investigate this and produce
guidelines and recommendations for
individual sites.
- 29. Should I run a cache?
• The answer is yes!
• Consider
– Number of users
– Type of work
– Local Area Network
• Loading
• Bottlenecks
– Expected demand
• Analyse statistics
- 30. Futures
• The National WWW Network
Caching Service will be
involved in the development of
caching in the UK. Will
investigate hardware and
software. Findings will be
published on the National
Cache Web site:
URL: http://www.wwwcache.ac.uk