Chapter 4. Caching: The sell-by date

This chapter covers

  • HTTP caching and why you should use it
  • How to set HTTP caching in IIS
  • Caching considerations
  • File versioning
  • Output caching
  • HTTP caching results

It’s important to know where the bulk of your website visits come from. If you have a lot of repeat visits, or if many people view more than one page before leaving your website, then HTTP caching can have a positive effect on your page load times. Modern browsers can interpret a variety of HTTP requests and responses and are capable of storing and caching data until it’s needed. With the introduction of HTML5 and CSS3, modern browsers have become capable of achieving much more than they could a few years ago. I like to think of the browser’s ability to cache information as the sell-by date on milk. In the same way that you might keep milk in your fridge until it reaches its expiration date before replacing it with a new carton, browsers can cache information about a website for a set duration of time. After the data has expired, the browser simply fetches the updated version on your next visit to the website.

We briefly covered the differences between a primed cache and an empty cache in chapter 2. When you visit a website for the first time, your browser stores the components you download in a temporary cache. It does this so it can easily retrieve the components the next time you visit the same website. This, in turn, speeds up your download time.

The chart at the top of figure 4.1 shows the components of a web page that have not yet been cached. The total page weight is quite high, and the browser needs to make 13 HTTP requests to load all the components when a user visits the website for the first time. The chart at the bottom of the figure shows a primed cache for a user who has already visited the website with HTTP caching enabled. The total page weight is down from 1146.6K to 11.4K, and most of the components are now served from the browser’s own cache instead of over the network.

Figure 4.1. The difference between an empty cache and a primed cache. Notice the difference in their total weights.

HTTP caching’s main purpose is to use the browser’s cache to its advantage. If a user revisits a website, their browser can simply retrieve the components it needs from the browser cache instead of hitting the server again.

4.1. What is HTTP caching?

Most websites today are made up of similar components. Often these components, such as CSS and JavaScript, are shared across multiple pages, so by caching them, you effectively speed up every other page that a first-time visitor sees as they browse your website.

Although the majority of visitors to your website might be new, a sizable percentage of them are likely to be returning. These returnees will experience much faster load times and benefit greatly from HTTP caching. Depending on the nature of your website, you may see high or low volumes of returning visitors. Either way, adding HTTP caching benefits users whether they spend a lot of time navigating through your website or simply glance at it and return at a later date.

A web server can take advantage of the browser’s ability to cache data to improve the load time of repeat requests. If a user visits the same page twice within one session, there is usually no need to serve a fresh version of the static files that the page requires. Instead, the web server can use the Expires header to tell the browser that it can use its current copy of a component until the specified expiration date. The browser then caches the component and doesn’t need to ask the server for a new version until the expiration date has passed or the user forces a refresh. Let’s look a little deeper at the exact HTTP request and response that take place.

Figure 4.2 shows a typical HTTP request that would be made for a CSS file the first time a user visits a website. As you can see in figure 4.3, a response is returned from the server with an Expires header.

Figure 4.2. An HTTP request for a CSS file

Figure 4.3. An HTTP response for a CSS file

The Expires header has been added by the server and will notify the browser that it can cache the component until this expiration date has passed. In the case of figure 4.3 it’s only one day, but depending on how often you change your files, you might want to set this date a lot farther in the future. Once a website is stable, you’ll be surprised at how seldom the components on a web page change. In chapter 5, we’ll take a closer look at the best practices around expiration dates and how far in the future you should set your components to expire. Chapter 5 also goes into detail about the out-of-the-box support that Visual Studio 2012 provides for expiration dates.

In figure 4.3, notice that another type of response header called Cache-Control was returned. The Cache-Control header is an alternative to the Expires header, and it works with time slightly differently. Cache-Control was introduced in HTTP/1.1 and offers more options than the Expires header. In particular, the Expires header uses an exact date and time to specify an expiration, whereas the Cache-Control header uses a max-age value, in seconds, that is measured from the time of the response. Both headers can be used together to tell the browser how long to cache the component, and both are perfectly acceptable methods of expiring data. Fortunately for you as a .NET developer, IIS takes care of emitting the appropriate header based on the expiration setting you choose. You can set the expiration either in the IIS web server or within the Web.config file of your application. Once the settings have been applied, the IIS web server handles the rest for you.
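To make the difference between the two headers concrete, here is a minimal C# sketch that builds both header lines for the same cache lifetime. The class and method names (CacheHeaders, Expires, CacheControl) are hypothetical and exist only for illustration; IIS produces these headers for you, so you wouldn't normally write this code yourself.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; IIS normally emits these headers.
public static class CacheHeaders
{
    // Expires carries an absolute date. The "R" format produces an
    // RFC 1123 date string and assumes the value passed in is already UTC.
    public static string Expires(DateTime utcNow, TimeSpan lifetime)
    {
        return "Expires: " + utcNow.Add(lifetime).ToString("R", CultureInfo.InvariantCulture);
    }

    // Cache-Control carries a relative lifetime expressed in whole seconds.
    public static string CacheControl(TimeSpan lifetime)
    {
        long seconds = (long)lifetime.TotalSeconds;
        return "Cache-Control: max-age=" + seconds.ToString(CultureInfo.InvariantCulture);
    }
}
```

For a one-day lifetime, the first method yields a fixed date one day after the supplied time and the second yields max-age=86400. When a response carries both headers, HTTP/1.1 caches give max-age precedence over Expires.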

We have run through the steps a browser takes when it first requests a component and is told to cache it, but what happens on a second request? While the cached copy is still fresh, the browser uses it without contacting the server at all. Once the copy has expired, or when the user refreshes the page, the browser sends a conditional request that asks the server whether the component has changed since it was cached (typically by way of an If-Modified-Since or If-None-Match request header). If nothing has changed, the server responds with a 304 HTTP status code, notifying the browser that the component has not been modified and that it can use the version stored in its cache. The 304 response is efficient because it contains only headers, not the full contents of the component. Figure 4.4 shows a repeat request for a CSS file in action.

Figure 4.4. A repeat request for a CSS file which returns a 304 HTTP status

Notice that this is a repeat request and nothing has changed on the server since the component was cached, so the server responds with a 304 HTTP status code. The browser makes this check to confirm that the file hasn’t changed on the server and that it’s still okay to use the version stored in its cache.
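The decision the server makes for a repeat request can be sketched in a few lines of C#. This is a simplified illustration based on the If-Modified-Since request header only, not IIS’s actual implementation; the ConditionalGet class and StatusFor method are hypothetical names.

```csharp
using System;

// Hypothetical sketch of a server's 200-versus-304 decision.
public static class ConditionalGet
{
    // ifModifiedSinceUtc is null when the browser sent no If-Modified-Since header.
    public static int StatusFor(DateTime lastModifiedUtc, DateTime? ifModifiedSinceUtc)
    {
        if (!ifModifiedSinceUtc.HasValue)
        {
            return 200; // unconditional request: send the full component
        }

        // HTTP dates have one-second resolution, so drop sub-second ticks first.
        DateTime lastModified = TruncateToSeconds(lastModifiedUtc);

        // Unchanged since the browser's copy was fetched: headers only, no body.
        return lastModified <= ifModifiedSinceUtc.Value ? 304 : 200;
    }

    private static DateTime TruncateToSeconds(DateTime value)
    {
        return new DateTime(value.Ticks - (value.Ticks % TimeSpan.TicksPerSecond), value.Kind);
    }
}
```

A first visit carries no conditional header and receives a 200 with the full file. A later conditional request for an unchanged file receives a 304 with no body.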

4.2. IIS and HTTP caching

When you’re configuring Expires Response headers for your web application, it’s important to consider the following information:

  • Content that is updated regularly, on a daily or weekly basis, should be configured to expire periodically.
  • Content that contains sensitive information that you do not want cached or that is updated frequently should be configured to expire immediately.
  • Content that is not expected to change should be configured to expire in approximately one year. You could set the expiration date to ten years in the future, but given the frequency with which users clear and fill their cache, setting an expiration date one year or ten years in the future might not make much difference.

If you’re developing a website and have direct access to IIS, it’s easy to add an HTTP Expires Response header. We’ll apply the changes to the Surf Store application now. First, open up IIS Manager on your computer and navigate to the website that you want to update. In this case, navigate to the Surf Store application, as shown in figure 4.5.

Figure 4.5. Add an Expires header to the Surf Store application by first choosing the website in IIS Manager.

Using IIS, you can choose to apply the Expires headers to individual folders or at the root level of your website. In most cases, you’ll want to cache individual folders that contain static files, such as your CSS files, JavaScript files, images, and so on. In the case of the Surf Store application, I am choosing to apply the Expires header to the Styles folder because it contains CSS files that won’t change regularly. Once you’ve selected the folder to which you’re applying the caching, double-click HTTP Response Headers in IIS Manager (figure 4.6).

Figure 4.6. Set the HTTP Response Headers in IIS Manager.

In the Actions pane on the HTTP Response Headers page, click Set Common Headers. A window similar to the one in figure 4.7 will appear in IIS.

Figure 4.7. IIS–Adding custom HTTP Response Headers

Select Expire Web content and choose how long you want the browser to cache the components. For this example, I chose 30 days. There is also the option to choose Immediately and another option to expire the components on a specific date and time.

Using the Yahoo! YSlow tool, you can immediately see the difference after adding the Expires header. If you load up the website and refresh the page, you’ll see that all the CSS files in the Styles folder have been set to expire in the future, so the browser will not need to fetch them again until then.

In figure 4.8 you can see that the two CSS files have an expiration date calculated from the date the file was requested. Whenever the browser loads that page again before the files expire, it won’t need to request those two files. If we apply the expiration to the Images and Scripts folders, there should be an even more marked improvement in the repeat page load time with a primed cache. The weight of the repeat view of the web page in figure 4.9 has been reduced significantly, from 1146.6K to 11.4K. We managed to drop over a megabyte from the repeat page load! Imagine saving that amount of data for every returning visitor and every additional page view on your site.

Figure 4.8. HTTP Response Expires for the CSS files in the Surf Store application

Figure 4.9. The Empty and Primed caches for the sample application after adding Expires headers

4.3. Web.config settings

Much like compression, the expiration details for static content can be set in the Web.config file. This is useful if you work in a shared hosting environment and you don’t have access to IIS. You might find that you’re working on a website that’s hosted with a lot of other websites and your vendor restricts your access to certain elements of the server. You can even achieve the same level of configuration detail you get with IIS when you use the Web.config file.

In order to add Expires headers to your application, you’ll need to add the lines of code shown in figure 4.10 to your Web.config file.

Figure 4.10. Adding HTTP cache settings in the Web.config
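As a rough sketch of the kind of settings involved (not necessarily identical to the figure), the following Web.config fragment uses the IIS clientCache element to cache static content for 30 days. The 30-day value is an assumption carried over from the IIS Manager example earlier in the chapter.

```xml
<configuration>
  <system.webServer>
    <staticContent>
      <!-- UseMaxAge sends a Cache-Control: max-age value.
           cacheControlMaxAge is a time span in the form days.hours:minutes:seconds.
           To send a fixed Expires date instead, set cacheControlMode="UseExpires"
           and supply the date in the httpExpires attribute. -->
      <clientCache cacheControlMode="UseMaxAge" cacheControlMaxAge="30.00:00:00" />
    </staticContent>
  </system.webServer>
</configuration>
```

Placed in the root Web.config, these settings apply to all static content served by the application. Placed in a Web.config inside a subfolder, they apply to that folder only.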

This code will add an Expires header to all static content that your application processes. This will be handy when you need to apply expiration details across your site, but what about when you need to apply different settings at the folder and file type level? Simply add the code in figure 4.10 to a Web.config at the folder level instead of applying the settings to the Web.config at the root level. Figure 4.11 shows the Web.config file inside the Scripts folder.

Figure 4.11. The Web.config file is inside the Scripts folder. This will cause the caching to occur at the folder level instead of throughout the entire application.

By adding a Web.config file to a specific folder, you’re ensuring that the caching will occur for that folder instead of the entire application. This can be pretty useful, especially when you need to only cache a certain set of files.

4.4. Caching considerations

It would be great if we could cache every component of a web page for a long period of time, but this isn’t always possible in modern web applications. Web applications are dynamic and need to serve up fresh content constantly. That’s why it’s important to think about the different components that make up a web page and determine each component’s caching needs.

Depending on your website’s purpose, you may find that you’re actually able to cache most of the components for a long period of time. However, what happens when you develop and redeploy changes to these components? Your users might have an old version of a file in their browser’s cache even after you’ve deployed a newer version. Depending on the HTTP response details that are sent back, their browsers might not check with the server for a newer version right away. This could lead to problems if your users are viewing outdated components and information. Keep in mind that as developers, we can easily press CTRL-F5 and force a full browser refresh, but the average website user won’t understand that they need to do this.

In chapter 1, you read about the performance cycle and the important role it plays when you’re optimizing your website. While reading this chapter, keep in mind the different stages of the performance cycle and where you currently stand. Once you’ve applied HTTP caching to your website, it’s important to monitor any performance changes and any effects this might have on your users. This includes any broken pages that may occur due to incorrect file versions stored in their browser. As a developer, instead of simply adding expiration dates to the web page components, think about your users. How often do you deploy changes? How often do you expect the CSS and JavaScript to change? These are all important questions to ask when analyzing how long to cache components.

To make sure users get a new version of a file as soon as you deploy it, the best option is to rename the file. The browser’s cache is keyed by URL, so changing the name of the component forces the browser to request the new version; nothing in the cache matches the new name. For example, if you have a CSS file called site.css that expires in a week, the browser won’t bother checking for any changes within that week. However, if you update the HTML to reference a renamed file, site_v1.css, the browser will be forced to request this new file instead of using the old one from its cache. One downside is that you’ll need to change the name of your files each and every time you redeploy a change to them. This technique is known as file versioning or file revving. In chapter 5, we’ll look at ways that Visual Studio 2012 automatically versions CSS and JavaScript for you when you combine this technique with file minifying.
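Renaming files by hand is easy to forget. One common variation, sketched here as an illustration and not as the approach this chapter uses, is to derive the version from the file’s contents so the name changes only when the file changes. The FileVersioning class and VersionedName method are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: builds a name such as site.<8 hex chars>.css
// from a hash of the file's contents.
public static class FileVersioning
{
    public static string VersionedName(string fileName, byte[] content)
    {
        string hex;
        using (SHA256 sha = SHA256.Create())
        {
            // Hex-encode the hash; a short prefix is enough to tell versions apart.
            hex = BitConverter.ToString(sha.ComputeHash(content))
                .Replace("-", "")
                .ToLowerInvariant();
        }

        string name = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName); // includes the leading dot

        return name + "." + hex.Substring(0, 8) + extension;
    }
}
```

Identical contents always produce the same name, so unchanged files keep benefiting from the cache across deployments. An edited file gets a new name, and the browser has to request it.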

4.5. Output caching

We’ve added HTTP caching to the static components on a web page, but what about dynamic components and especially the HTML web page itself? Fortunately, .NET has a great built-in feature called OutputCache. It allows you to cache the contents of a web page based on a number of different factors, such as the parameters passed in, where cache is stored, or how long you’ll store the data.

4.5.1. Output caching in an ASP.NET MVC application

In ASP.NET MVC, the output cache enables you to cache the content returned by a controller’s action so the same content doesn’t need to be generated each and every time the same controller action is invoked. If applied to your controller correctly, it can produce extremely fast repeat load times.

It will be a lot easier to explain how output caching works if we use the Surf Store application as an example. You can enable output caching by adding the OutputCache attribute to an individual action or to an entire controller class. The following listing shows the attribute being added to the sample Surf Store application.

Listing 4.1. Applying output caching to an Action

The code in the listing will cache the output of the Index() action for 100 seconds. Any additional requests made to the web page within that time frame will receive lightning-fast response times because the content of the web page has been cached on the server.
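A minimal sketch of what such an action might look like, assuming a controller named HomeController (the controller name and the VaryByParam value are assumptions, not taken from the listing):

```csharp
using System.Web.Mvc;

// Assumed controller name; only the Index() action and the
// 100-second duration come from the surrounding text.
public class HomeController : Controller
{
    // Cache the rendered output of Index() for 100 seconds.
    // "none" means a single cached copy is kept, regardless of parameters.
    [OutputCache(Duration = 100, VaryByParam = "none")]
    public ActionResult Index()
    {
        return View();
    }
}
```

Placing the same attribute on the controller class instead would apply the cache settings to every action in that controller.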

You can tell the OutputCache attribute to adapt its caching activities to meet various parameters. For example, you might want to display personalized content for individual users. Think about the sites, such as Amazon or Facebook, that you log onto daily. These sites need to cache frequently used data and vary the personalized content they display for each user. The VaryByParam property you saw in listing 4.1 helps you accomplish this task. The property can be set to a semicolon-separated list of parameter names by which to vary the output cache. So depending on the parameters passed in, users will see a different cached version of the page. VaryByParam can use the following values:

  • None: The cache will not vary by parameter. A single cached version is served regardless of the parameters passed in.
  • *: The cache will vary based on every parameter that is passed in.
  • Any valid query string or POST parameter name: The cache will vary based on a particular parameter or parameters.

If you used the output cache but didn’t vary it based on parameters, you’d find that different users would see the same content, rather than web pages that were personalized for their individual interests. This isn’t ideal in modern websites, where content must be dynamic and personalized for each user!

Let’s apply a different output cache to the Product page of the Surf Store application. In listing 4.2, the OutputCache attribute has been added to the action, and its output is cached for an hour (3600 seconds). We’re varying the cache based on the "category" parameter, so every time the parameter changes, the server stores a separate copy of the action’s output for that value. The VaryByParam property is particularly useful when dealing with dynamic web pages because it allows you to serve different content to your users depending on the parameters passed in.

Listing 4.2. Applying OutputCache to the Product page
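As a hedged sketch of the settings just described, assuming a controller named ProductController with an Index action (both names are assumptions), the attribute might be applied like this:

```csharp
using System.Web.Mvc;

// Assumed controller and action names; the duration and the
// "category" parameter come from the surrounding text.
public class ProductController : Controller
{
    // Cache for one hour, keeping a separate cached copy
    // for each distinct value of the "category" parameter.
    [OutputCache(Duration = 3600, VaryByParam = "category")]
    public ActionResult Index(string category)
    {
        return View();
    }
}
```

With these settings, a request for one category and a request for another are cached independently. A second request for the same category within the hour is served from the cache.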

As a default setting, the OutputCache attribute caches content in three locations: the web server, any proxy servers, and the web browser. In certain circumstances, you might want to specify exactly where the content is cached. You can do so by modifying the Location property of the OutputCache attribute.

The Location property can use the following values:

  • Any: This is the default setting. The output cache can be stored on the requesting client, on a proxy server, or on the server where the request was processed.
  • Client: The output cache is stored on the browser client where the request originated.
  • Downstream: The output cache can be stored on any device other than the web server, such as a proxy server or the requesting client.
  • Server: The output cache is located on the server where the request was processed.
  • None: The output cache is disabled for the requested page.
  • ServerAndClient: The output cache can be stored on the server and on the requesting client, but not on proxy servers.

You might use the Location property when you’re caching information that’s personalized for each user. In that case, it’s better not to cache the information on the server, but rather on the client.

The next listing shows how to apply the Location property to the Surf Store application’s OutputCache, telling ASP.NET where the cached output should be stored.

Listing 4.3. Applying the Location setting to the OutputCache
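A minimal sketch of setting the Location property follows. The controller and action names, the one-hour duration, and the choice of OutputCacheLocation.Client are all assumptions made for illustration; they are not taken from the listing.

```csharp
using System.Web.Mvc;
using System.Web.UI; // OutputCacheLocation is defined in this namespace

// Assumed controller and action names, for illustration only.
public class ProductController : Controller
{
    // Store the cached output in the requesting browser only,
    // which suits content that is personalized for each user.
    [OutputCache(Duration = 3600, VaryByParam = "category", Location = OutputCacheLocation.Client)]
    public ActionResult Index(string category)
    {
        return View();
    }
}
```

Swapping in OutputCacheLocation.Server or OutputCacheLocation.ServerAndClient changes where the output is stored without touching the rest of the action.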

We added an Expires header to the components of a web page at the beginning of the chapter, and now we’ve added one to the web page itself. Figure 4.12 shows the HTTP response that’s returned for the Product page after adding output caching.

Figure 4.12. Applying output cache to a web page

The Expires and Cache-Control headers in figure 4.12 have been set to expire 24 hours from the date of the request. The browser won’t bother checking with the server to see if the item has changed, which saves the user another request to the server for the HTML of the page.

4.5.2. Output caching in an ASP.NET Web Forms application

Adding output caching to an ASP.NET Web Forms page is similar to doing so in an MVC application, except that it’s applied at the page level. The pages in a Web Forms application will render extremely quickly once output caching has been applied; by adding a few attributes, you’ll notice the speed of your application improve instantly. You’ll need to add the OutputCache directive to the top of the web page, as shown in figure 4.13.

Figure 4.13. Applying OutputCache to an ASP.NET Web Forms page

The OutputCache directive applied to the web page will cache its content for one hour. At the moment, it will serve the same content for every user regardless of the parameters passed to the page. Earlier, we discussed how a website such as Facebook or Amazon might want to serve different content to different users based on the user parameters passed to the page. The code in figure 4.13 doesn’t do that, but you can add that ability with the VaryByParam setting. Let’s take a look at this in figure 4.14.

Figure 4.14. Applying output cache to the Product page with the VaryByParam and Location properties

The code in figure 4.14 will cache the details of the Product page for one hour, or 3600 seconds. The code also has more detail in the VaryByParam setting and will vary its cache according to the category parameter passed to the page. Finally, the Location setting stores the cache on the web server and on the requesting client (browser).
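As a sketch of the two directives described above (the exact contents of the figures may differ), the OutputCache directive sits at the top of the .aspx page. A page would carry only one of these two forms.

```aspx
<%-- Simple form: cache the page for one hour, one copy for all requests. --%>
<%@ OutputCache Duration="3600" VaryByParam="None" %>

<%-- Product page form: vary by the "category" parameter and store the
     output on both the web server and the requesting browser. --%>
<%@ OutputCache Duration="3600" VaryByParam="category" Location="ServerAndClient" %>
```

The attribute names match the properties of the MVC OutputCache attribute, so the same choices about duration, parameters, and location apply to both frameworks.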

The Product page in the Surf Store application will need to serve dynamic content based on each different category that gets passed through, so you’re setting the cache with the "category" parameter. For example, if you’re on the Product page and see the wetsuits section, a value of "wetsuit" was passed through in the category parameter. To make sure you cache the correct category, you need to make sure the cache is varied based on this parameter.

4.6. The results of HTTP caching

Now that you’ve applied caching across the Surf Store application, let’s run it through Google PageSpeed and Yahoo! YSlow to see if you’ve improved its performance score.

In chapter 3, you applied compression to the sample application and bumped the PageSpeed score up to 85. After applying the HTTP caching changes in this chapter, the PageSpeed score has increased to 89, as shown in figure 4.15.

Figure 4.15. The Google PageSpeed score after applying HTTP caching

If you run the sample application through the Yahoo! YSlow tool, the results are pretty impressive! See figure 4.16.

Figure 4.16. The Yahoo! YSlow score after applying HTTP caching

The site increased its score from 81 to 91, simply by adding HTTP caching. This is a significant increase, taking you a step closer to achieving a perfect score for the Surf Store application.

4.7. Summary

You’ve covered a lot of ground in this chapter. You learned how browsers cache the components that make up a web page and how you can harness this cache to reduce repeat HTTP requests. HTTP caching allows you to store static components in the browser’s cache, so the next time a user requests any one of these components, it will immediately be retrieved from the cache. This means fewer repeat HTTP requests and a hugely decreased repeat page load time.

IIS plays an important role when it comes to HTTP caching. Using IIS, you have full control of the caching settings you wish to apply to your web page components. Choosing how long to cache your components is an important part of web page performance: too long and your users might have content that is out of date; too short and you don’t gain the benefits of caching. Using the performance cycle will help you analyze your application as a whole and think about how each component affects the overall load of each page.

In the next chapter, we will look at the new bundling and minification features that have been built into ASP.NET 4.5 and how you can use them to drastically improve your page load times.

