7.2.14

Browser Caching Explained

BY JEFF ROBERTSON

One of our clients had an issue recently. We made a change to the company’s website, and instead of seeing the change, the client saw a strange, poorly laid out version of the page. Quick tests from a few machines showed the problem was not on the web server - the client’s computer was essentially loading part of the new content and part of the old content.

Many users, including our client, know the solution to this type of problem - refresh or clear your browser cache.

But why, specifically, does this happen? Will anyone else see this problem? And how can we avoid it in the future?

Here's how caching works on a user's machine:

Situation 1: User has never visited the site before
In this case, loading a page is simple. The browser requests the page. When the page is returned, it references many assets (images, JavaScript, CSS, etc.), and the browser goes about loading those files and any assets they reference in turn. Everything is loaded fresh.

Each asset may be served with zero or more cache-related headers. These headers tell the browser how long it may cache the asset before checking with the server to see if it has changed. The browser stores the asset in its cache in most situations.

A few possible cache headers include:

  • Expires or Cache-Control: max-age
    • These headers tell the browser when the asset expires (Expires, a specific date/time) or how many seconds the browser may continue to use the asset before checking for a new version (max-age).
  • Last-Modified or Etag
    • These headers provide a timestamp (Last-Modified) or a code (Etag) that identifies the version of the asset currently on the server.

These approaches can be used alone or in tandem. The first two tell a browser how long it can continue to use an asset before it checks the server for a new version. The second two allow the browser to check whether the version it has is the most recent version. If it is, the browser may continue to use its local version instead of downloading it again.
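As a rough illustration of how the first pair of headers turns into a "how long may I reuse this" number, here is a minimal Python sketch. The function name and the plain-dict input are assumptions made for the example, and real browsers apply more rules than this (for instance, they may estimate a lifetime when neither header is present).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, response_time):
    """Return how many seconds a response may be reused without asking the server.

    `headers` is a plain dict of response headers (a hypothetical input shape).
    `response_time` is a timezone-aware datetime for when the response arrived.
    Returns 0 when no usable freshness information is present (a simplification).
    """
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)  # max-age takes priority over Expires

    expires = headers.get("Expires")
    if expires:
        try:
            expires_at = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return 0  # treat an unparseable date as already expired
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        return max(0, int((expires_at - response_time).total_seconds()))

    return 0
```

Calling it with `{"Cache-Control": "max-age=864000"}` gives 864000 seconds, which is 10 days.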

Situation 2: User has visited the site/page before (recently)
The browser already knows all the assets this page needs since the page is in cache. The browser checks each asset that is loaded from cache:

  • For the main page itself...
    • Browsers almost always request the main page regardless of whether it has expired, just to make sure it has the latest content. This is important since it causes strange issues sometimes.
  • Next, the browser checks every asset and asks: is this asset still fresh? (i.e. is it still within its expiration date or max age?)
    • If fresh, load the asset from cache. Do not ask the server about this asset.
    • If stale, check with the server -- send the asset name and its last-modified date or Etag
      • The server will respond with 304 Not Modified if the version requested is the same as the version on the server. If so, the browser can load its asset from cache and save time and bandwidth by not downloading a copy of a file it already has.
      • If the asset has changed on the server, the server returns the appropriate status code (usually “200 OK”) and the browser downloads a new version of the file, replacing the old one in cache.
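The fresh/stale decision above can be sketched in a few lines of Python. This is a toy model, not how any particular browser is implemented: `load_asset`, `make_fake_server`, and the cache layout are all hypothetical names invented for the example, and the "server" is an in-memory dict rather than a real network call.

```python
import time


def load_asset(url, cache, fetch, now=None):
    """Return (body, source) for `url`, following the fresh/stale flow.

    `cache` maps url -> dict with keys body, etag, stored_at, max_age.
    `fetch(url, etag)` stands in for the network and returns
    (status, body, etag, max_age), where status is 304 or 200.
    """
    now = time.time() if now is None else now
    entry = cache.get(url)

    # Fresh: use the cached copy and do not contact the server at all.
    if entry and now - entry["stored_at"] < entry["max_age"]:
        return entry["body"], "cache (fresh, server not contacted)"

    # Stale or missing: ask the server, sending the Etag we have (if any).
    known_etag = entry["etag"] if entry else None
    status, body, etag, max_age = fetch(url, known_etag)

    if status == 304 and entry:
        entry["stored_at"] = now  # the freshness window starts over
        return entry["body"], "cache (revalidated, 304)"

    cache[url] = {"body": body, "etag": etag, "stored_at": now, "max_age": max_age}
    return body, "server (200)"


def make_fake_server(files, max_age=864000):
    """Build a fetch() over `files`, a dict of url -> (body, etag)."""
    def fetch(url, etag):
        body, current_etag = files[url]
        if etag == current_etag:
            return 304, None, current_etag, max_age
        return 200, body, current_etag, max_age
    return fetch
```

The key point the sketch makes is in the first branch: while an asset is fresh, `fetch` is never called, so the browser has no way to learn that the file changed.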

This process depends on the browser still having an item in cache. An item could be removed from cache for many reasons, such as if the cache got full (the computer clears things automatically... otherwise your cache would eventually take up all disk space on the computer) or if the user cleared their cache manually.

Our client’s website...
Our client’s website was originally set to provide both a max-age and an Etag header to the browser with every request. The max age was set somewhat conservatively -- 10 days. This means your browser will keep using the same files for 10 days after the first use before checking to see if a new version exists. After 10 days, the browser checks in with the server to see if the Etag has changed. If not, it will keep using the cached version for another 10 days.
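For reference, max-age is expressed in seconds, so a 10-day setting looks like the output of the short Python sketch below. The function name and the Etag value are placeholders for illustration; the site's actual server configuration is not shown in this article.

```python
from datetime import timedelta

MAX_AGE = timedelta(days=10)


def cache_headers(etag):
    """Build response headers like those described above.

    `etag` is a placeholder value; real servers generate their own.
    """
    return {
        "Cache-Control": f"max-age={int(MAX_AGE.total_seconds())}",
        "ETag": f'"{etag}"',
    }
```

With these settings the Cache-Control value comes out as `max-age=864000` (10 days x 86,400 seconds per day).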

Example:

  • I loaded one of the site’s pages in a browser I rarely use. This browser had all the website’s files in cache, but they were pretty old (a month or two).
  • On first load, my browser asked whether a certain CSS file had changed since it knew it had a copy of the file from more than 10 days ago (the max age allowed). It sent the Etag of that file (the "If-None-Match" line).
  • The server responded that this CSS file has a different Etag, and my browser downloaded the new copy of the file. [Screenshot: cache-header-no-match]
  • On other assets, however, my old copy is still good. [Screenshot: cache-header-match]

Now comes the important part, and the cause of our client’s issue. If the local copy of a file is still fresh, your browser will not even ask the server whether it needs a new version or not.

To demonstrate, I recorded every request between my computer and the server for three loads of the same page.

[Image: browsercaching1 -- table of the requests recorded for each of the three page loads]

Notice how few requests there are when I click a link to a page I’d already visited. That's because the browser knew everything in the cache was good for another 10 days -- it didn't need to even check with the server for 85 files. (Why does it even bother to check for those 15? Some assets like analytics and other tracking codes have special headers that force the browser not to cache them.)

So here is what happened to our client:

  • Our client had visited the site the day before the changes were made and all his files were in cache.
  • When visiting a location page the next day, the browser checked the main page and found the page content had changed slightly. The browser downloaded a new copy of all the HTML.
  • Next, the browser looked at each asset (images/CSS/JS/etc.) to see if it is fresh or stale.
    • Since he visited the site recently, everything in the cache was still fresh, and the browser did not check with the server to see if the item had changed.
  • This is where the problem lies -- the browser got the new HTML for the page because it always checks the main page, but it still had an old copy of the CSS file that styles the HTML. New content + old CSS = strange looking layout.
    • However, it's important to note that this is exactly what was supposed to happen. This would only affect people who had visited the site in the last 10 days and who hadn't cleared their cache.
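The sequence above can be reproduced with a small Python simulation. It is a deliberately simplified model under stated assumptions: file names are made up, versions are just labels, any `.html` file is treated as the main page that is always re-requested, and every other file is reused blindly while it is younger than 10 days.

```python
DAY = 24 * 60 * 60
MAX_AGE = 10 * DAY


def visit(cache, server, now):
    """Return the version of each file the visitor ends up rendering.

    `server` maps file name -> current version label.
    `cache` maps file name -> (version, stored_at). Both are toy stand-ins.
    """
    rendered = {}
    for name, current in server.items():
        cached = cache.get(name)
        is_main_page = name.endswith(".html")
        if cached and not is_main_page and now - cached[1] < MAX_AGE:
            rendered[name] = cached[0]  # fresh: the server is never asked
        else:
            cache[name] = (current, now)  # main page, stale, or not cached
            rendered[name] = current
    return rendered
```

If a visitor loads the site on day 0, the site changes, and the visitor returns on day 1, `visit` reports the new HTML alongside the old CSS. By day 11 the CSS has gone stale and the new version is picked up.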

There are several ways to deal with this issue:

  1. Ignore it
    • This completely depends on how many repeat visitors you have to your site. New visitors won’t have anything in cache and wouldn’t see the problem.
    • Also, people who do see strange issues on a site typically hit the refresh button -- the cure-all for webpages. Unbeknownst to most users, this forces the browser to check for a new version of every file regardless of whether it is fresh or not (see table above).
  2. Set the max age to a low number
    • It's possible to set the cache's max age very low... one day or even a few hours. This means almost nobody will see a weird CSS issue after a change (only those who revisit within that short window after the change).
    • The downside here is it slows loading of the page ever so slightly. The browser has to check every file to see whether it's changed on the server (but at least it doesn't have to download a new copy of the file). For the 99% of requests that don't happen right after a CSS change, it's a waste of bandwidth.
    • While tools like Google Page Speed will complain if you do things this way, it may be the best balance of effort vs. performance.
  3. More involved options
    • There are more complicated ways to deal with this problem -- one is fingerprinting. This is where you actually change the URL of an asset every time it changes. Read more here: https://developers.google.com/speed/docs/best-practices/caching.
    • This is a great option for sites with lots of repeat visitors or those where critical CSS changes frequently. For many small sites, the benefits may not justify the effort and complexity.
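To make the fingerprinting idea concrete, here is a minimal Python sketch that derives an asset's URL from its content. The function name, the 8-character hash length, and the `name.hash.ext` pattern are choices made for this example; build tools that do fingerprinting each have their own conventions.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_url(path, content):
    """Insert a short content hash before the file extension.

    `path` is the asset's URL path and `content` is its bytes.
    The same content always yields the same URL; changed content yields a new one.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the HTML always references the current fingerprinted URL, a browser holding the old CSS simply has nothing cached under the new name and must download it. That allows a long max age on the assets themselves without the mixed old/new problem described earlier.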

But wait a minute -- I cleared my cache and it still didn't work right...

If you can believe it, caching gets even more confusing than the info above. In addition to caching taking place on your local machine, your ISP (among others) can also cache files.

As you can imagine with millions of customers loading millions of pages, it is in an ISP's best interest to cache all possible files on their servers and give that copy to their customers rather than fetching a new copy for each user who wants, say, Amazon's home page.

Typically ISPs don't cache things very long, but it is difficult to know what is being cached and for what duration. This would be another article for another day.

There are a few steps that can usually rectify any caching issue:

  1. Clear the browser cache
  2. Close all open instances of the browser and reopen it
  3. Open the page desired and do a hard refresh -- Ctrl + F5 (as this can clear out some upstream caches as well)

Jeff Robertson is a digital marketer and an online development expert with experience stretching back to dial-up. He is partner and Chief Technology Officer at Carbon8, where he helps bridge the gap between the technical and marketing worlds, as well as oversees technical infrastructure.
