Chapter 9. Image Processing

So far in this book, we’ve spent a lot of time discussing the performance impact of images in terms of requests and file size—characteristics that primarily impact the network side of things. However, there’s much more work being done under the hood by the browser to get an image to be displayed on a screen. These additional steps in the image loading process can have a significant impact on the processing time and memory footprint of your site.

Decoding

As we saw in Chapters 2 and 3, when your graphic editor of choice creates the image file, it goes through a series of steps collectively called the encoding process. Consider the general steps included in the JPEG encoding process that we learned about in Chapter 4:

  1. The graphic editor must convert RGB data to the YCbCr format.

  2. The graphic editor applies some level of chroma subsampling to reduce file size.

  3. The input is transformed from the color space to the frequency space by a Discrete Cosine Transformation (DCT) and further optimized using a quantization matrix.

  4. Finally, the data goes through one last lossless compression step called Huffman encoding.

By the end of this process, the original color data has been transformed into a highly compressed bitmap. While this outputted format is exactly what we need to save the file efficiently, it’s not what the browser needs. The browser needs that color data—it needs to know what to actually paint for each pixel on the screen. Specifically, the browser needs an RGBA (red, green, blue, alpha) value for each pixel of the image. To get to that data, the browser needs to walk backward through these steps and decode the image.

If we look at the JPEG format again, the decoding process looks something like this:

  1. The data goes through a Huffman decoding process.

  2. The result then goes through an Inverse Discrete Cosine Transformation (IDCT) and dequantization process to bring the image back from the frequency space to the color space.

  3. Chroma upsampling is applied.

  4. Finally, the image is converted from the YCbCr format to RGB.

Figure 9-1 illustrates the JPEG encoding and decoding process.

Figure 9-1. The JPEG encoding and decoding process

Whenever the browser must display an image, it has to grab this decoded data before it can draw it to the screen.
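
To make that final conversion step concrete, here is a minimal sketch of the YCbCr-to-RGB math for a single pixel, using the standard JFIF (BT.601) coefficients. The function name and the pixel-at-a-time approach are purely illustrative; browsers do this work in optimized native code (and, as we'll see later in this chapter, sometimes on the GPU), but the arithmetic is the same. Since JPEG has no transparency, the alpha value the browser needs is simply "fully opaque" (255).

    def ycbcr_to_rgb(y, cb, cr):
        """Convert one pixel from YCbCr (JFIF/BT.601, 0-255 range) back to RGB.

        This mirrors the last step of the JPEG decode process described above.
        """
        r = y + 1.402 * (cr - 128)
        g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
        b = y + 1.772 * (cb - 128)
        clamp = lambda v: max(0, min(255, round(v)))   # keep channels in 0-255
        return clamp(r), clamp(g), clamp(b)

    # A neutral mid-gray pixel decodes to equal red, green, and blue values
    print(ycbcr_to_rgb(128, 128, 128))   # (128, 128, 128)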

Measuring

This decode process is not cheap and can take quite a bit of time on the CPU. The amount of time the browser spends decoding images is revealed in several sets of developer tools.

Chrome

In Chrome, the image decode time is displayed inside of the Chrome Dev Tools, in the Timeline tab. If you record the loading of a new page, you can then filter using the search bar and display just the timings related to image decoding (Figure 9-2).

Figure 9-2. Image decode timings exposed in Chrome Dev Tools

For more detail, you can use Chrome’s tracing functionality. Opening chrome://tracing in your browser will allow you to record a trace of all the work the browser is doing. Traces can be intimidating even to those who have spent some time digging into them, but there is an incredible amount of information in there. For our purposes, the task that holds the decode times is the ImageFrameGenerator::decodeAndScale task. Thankfully, we can filter down to find those timings in the massive list of information.

To do that, you’ll want to select the area of the trace that you want to analyze, as shown in Figure 9-3.

Figure 9-3. Selecting a section of a trace in Chrome for deeper analysis

With that area selected, you’ll see a long list of all the “slices” (essentially, any action the browser took) revealed at the bottom (Figure 9-4).

Figure 9-4. The list of all the actions the browser took during the selected portion of the trace

From here, sorting by CPU Self Time will let you see which tasks took the longest on the CPU. In Figure 9-5, the top three tasks are all related to decoding images.

Figure 9-5. Sorting by CPU Self Time lets you see which tasks have the highest amount of CPU overhead

You can also zoom in on an individual event within the trace to see all the related tasks that have to be run, and the timing of each. Figure 9-6 shows all the tasks being run in order to decode a pair of images.

Figure 9-6. Zooming in on an individual event within the trace gives you a lot of insight into all the related tasks the browser must run

Chrome on mobile devices

Chrome’s tracing and developer tooling also allow you to record image decode times for mobile devices running Chrome.

Enabling Remote Debugging for Chrome

In order to profile a mobile device on your desktop, you’ll need to make sure USB debugging is enabled. The steps vary depending on the version of Android running, but you can find the latest information on the Chrome Developer site.

With your device connected to your machine using a USB cable, and USB debugging enabled (see “Enabling Remote Debugging for Chrome”), you can navigate to chrome://inspect/?tracing#devices. This will show you a list of all open tabs on the device you want to remotely debug (Figure 9-7).

Figure 9-7. With your device connected to your computer using a USB cable, you can use developer tools or Chrome’s tracing feature to analyze sites on a remote device

Selecting “trace” will bring up the same tracing window you would see for desktop analysis, only now the trace will be conducted on your connected device. From here, you have all the same filtering and zooming capabilities we discussed previously.

Edge

The developer tools for Microsoft Edge also display image decode timings, inside their Performance tab. Whereas the Chrome Dev Tools show each individual call to the decoding process, the Edge tools show the total time per image—arguably a more understandable and valuable view of the data.

Figure 9-9. Microsoft Edge’s developer tools reveal the total amount of time spent decoding each image in a given page

Firefox and Safari

At the time of writing, neither Firefox nor Safari offers the ability to analyze image decode timings.

How Slow Can You Go?

This decoding process is not cheap. It can occupy the CPU for quite a bit of time, particularly on lower-powered devices or with high-resolution images. Just how slow can the decode process be? The answer ultimately depends on the complexity and size of your images, but you can get a decent idea by creating a test page of 10 or so images at different sizes and seeing what happens.

The simple test I ran involved three pages, each of which displayed images at a width of 200 pixels. One page served images that were resized to the exact width they would be displayed at—200 px. A second page used 400-pixel-wide images, and the third page used 1,200-pixel-wide images. The test was run on a Nexus 5 device, and the differences were substantial, as you can see in Table 9-1.

Table 9-1. Time spent decoding different sized images
Image size    Decode time      Percentage increase
200 px        30.38 ms         -
400 px        102.77 ms        +238.3%
1,200 px      15,534.99 ms     +4,952.6%

While the results will undoubtedly vary depending on the different images you use—as well as the device tested on—the conclusion is the same: the browser must spend much more time decoding images as those images get larger in size. Just as serving appropriately sized images decreases overall page weight, resizing your images provides a substantial reduction in decode time as well—ensuring your content gets rendered to the screen as quickly as possible.

Memory Footprint

Resizing images in the browser can also impact battery life and the lifespan of the device. Ever notice your phone getting warm while you’re browsing an image-heavy site? Much of that heat comes from all the image decoding the browser is doing.

Decoding an image is a fairly involved process that the browser must go through for each and every image on the site, every time it needs to display it. Let’s say you have a large hero image at the top of your page. As you scroll down, the image is no longer visible. When you scroll back up, the browser needs that decoded data again to get the image back onto your screen.

To avoid the added overhead of having to possibly decode the same image multiple times, the browser maintains an image memory pool—a preallocated space in memory where decoded image data can be stored. Now, when the browser needs to put that image back on your screen, it doesn’t (necessarily) have to go through the decoding process again. Instead, it can look in the memory pool to see if the decoded data for a given image is already available. If it is, it uses that decoded data. If it isn’t, the browser will go through the process of decoding the image and, eventually, storing the newly decoded data in that memory pool for later.

This decoded data is much larger in size than the disk size of the original image downloaded. Remember: a huge part of the encoding process is reducing the final size of the generated image, and the browser has just redone all of that work.

Since we know that the image is represented by an RGBA value for each pixel, we can figure out exactly how much memory that image is going to take up by multiplying the height and width of the image by 4 (an RGBA value takes up 4 bytes—one byte each for red, green, blue, and alpha). The final formula is:

Width × Height × 4

Consider a hero image that is 1,024 pixels wide and 300 pixels high. We can plug those numbers into our formula to find out how much memory it’s taking up once decoded:

1,024 × 300 × 4 = 1,228,800 bytes
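
If you want to play with the numbers, here is a tiny sketch that applies the same formula. The 1,024 × 300 hero image from the text is included; the other dimensions are arbitrary examples rather than measurements from any real page.

    def decoded_rgba_bytes(width, height):
        """Memory needed for a fully decoded RGBA bitmap: 4 bytes per pixel."""
        return width * height * 4

    for width, height in [(1024, 300), (1920, 1080), (3000, 2000)]:
        size = decoded_rgba_bytes(width, height)
        print(f"{width}x{height}: {size:,} bytes ({size / 1e6:.2f} MB)")

    # 1024x300:  1,228,800 bytes (1.23 MB)
    # 1920x1080: 8,294,400 bytes (8.29 MB)
    # 3000x2000: 24,000,000 bytes (24.00 MB)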

While the disk size of the image may not be particularly heavy, the decoded size stored in memory is a whopping 1.23 MB. As of 2015, 25% of all new Android phones were shipping with only 512 MB of RAM.1 Factor in that the average page today uses around 30 images, and that memory gets eaten up pretty quickly. Generally speaking, the browser is going to need more memory than it has available.

That’s where the image memory pool mentioned earlier comes back into play. A browser can offer memory back to the operating system for it to reclaim, if needed.

As you scroll down a page, the browser may choose to offer some of the memory currently being used for images back to the operating system. A great example would be a large hero image at the top of the page. The farther you scroll down, the less likely the browser is to need that decoded image (and the more memory the browser is likely to be using as it decodes images scrolling into view).

At some point, the browser may decide that it’s safe to offer that memory back to the operating system. If the operating system does indeed reclaim the extra memory, the browser will discard the decoded data for the image. If you were to now scroll that image back into view, the browser would once more need to decode that image because it would no longer be included in the memory pool.

Image pooling is a necessary feature to ensure that the operating system is not crippled by image-heavy pages, particularly on lower-end devices. The tradeoff is that whenever decoded data is evicted from the pool, the already costly process of image decoding may be duplicated, wasting CPU cycles.
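
To picture how such a pool behaves, here is a toy sketch of one. The class name, the byte-based capacity, and the least-recently-used eviction policy are assumptions made for illustration; real browser pools live in native code and use their own heuristics for deciding when to hand memory back to the operating system.

    from collections import OrderedDict

    class DecodedImagePool:
        """A toy decoded-image pool: size-capped, least-recently-used eviction."""

        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.entries = OrderedDict()   # url -> decoded pixel data

        def get(self, url, decode):
            """Return decoded pixels for url, decoding (and caching) on a miss."""
            if url in self.entries:
                self.entries.move_to_end(url)              # mark as recently used
                return self.entries[url]
            data = decode(url)                             # the expensive CPU work
            while self.used + len(data) > self.capacity and self.entries:
                _, old = self.entries.popitem(last=False)  # evict least recently used
                self.used -= len(old)
            self.entries[url] = data
            self.used += len(data)
            return data

    # A 4 MB pool and a fake "decode" that returns a 1,024 x 300 RGBA bitmap
    pool = DecodedImagePool(4_000_000)
    fake_decode = lambda url: bytearray(1024 * 300 * 4)
    pool.get("hero.jpg", fake_decode)   # decoded and stored (1,228,800 bytes)
    pool.get("hero.jpg", fake_decode)   # served straight from the pool this time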

One of the interesting implications of this process is the impact on image spriting (discussed in Chapter 10). With spriting, you combine multiple smaller images into one large image. The idea is that you minimize the number of requests necessary to get your images down to the browser. The unfortunate side effect is that, because the sprited image is now quite large, it’s going to fill up that image pool much more quickly. If and when the browser needs that memory back, it’s going to evict the entire sprite. Now, if even one of those images contained in the sprite needs to be displayed again, the entire sprite will need to be decoded.

If, however, each of those images were served individually, the browser would only evict as many images as needed to free up the required memory—leaving more of the images in the memory pool and reducing the risk of heavy decodes recurring.

In addition to watching the size of your images, we can take advantage of a relatively recent improvement to how browsers handle decoding and enable GPU decoding where possible.

GPU Decoding

Given the many costs associated with displaying an image—potentially limited memory, the cost of decode, and the risk of having to decode the same image multiple times—it’s in the best interest of the user, the browser, and you as the developer to reduce the amount of memory used as much as possible.

With this in mind, browsers started to experiment with how they might be able to reduce the memory impact of images by changing how and where the decoding occurs. The most significant optimizations involve the JPEG format.

JPEGs are saved as YCbCr data, which provides an opportunity for reduced memory usage. Using the YCbCr color space means images are stored using three channels: one luma channel and two chroma channels. If the image is decoded and stored as YCbCr data instead of RGBA, we move from 4 bytes per pixel to 3 (one each for chroma blue, chroma red, and luma). We’re kind of cheating here because we’re ditching that alpha data entirely. But since JPEGs don’t support alpha transparency, we can get away with it.

Traditionally, the decoding process has occurred on the CPU. Only after the image has been fully decoded does the CPU pass that decoded data over to the graphics processing unit (GPU) to be rendered. However, if browsers move the final step in the JPEG decoding process (converting from YCbCr data to RGBA) to the GPU as well, they can now store the data in YCbCr format—saving precious memory space. The GPU can handle the work.

If we look back at our hero image from earlier, when it was stored as RGBA data, it took up 1.23 MB of space:

1,024 × 300 × 4 = 1,228,800 bytes

That same image stored in the YCbCr color space takes up much less room:

1,024 × 300 × 3 = 921,600 bytes

Simply saving the decoded image in a different color space results in a 25% reduction in memory usage. It requires the GPU to do a little more work (instead of merely rendering the image, it must also convert from YCbCr to RGBA), but it saves memory, battery, and precious CPU cycles—not a bad tradeoff!

The memory savings become even more significant depending on the level of chroma subsampling involved. Brace yourselves: it’s about to get mathy again.

Let’s revisit the savings in chroma data for the different levels of subsampling that we saw in Chapter 4 (see Table 9-2).

Table 9-2. Chroma data savings based on subsampling level
Subsampling level    Chroma data savings
4:4:4                0%
4:2:2                50%
4:1:1                75%
4:2:0                75%

Armed with these numbers, we can come up with a new formula for memory usage when the browser uses GPU decoding:

(Height × Width × 3) – (Height × Width × Subsample_Level × 2)

First, let me apologize for giving you flashbacks to ninth-grade algebra. It was sadly unavoidable.

Now, let’s break this down.

The first thing we need to figure out is how much the image would consume in YCbCr using no compression. As we saw a little earlier, that’s the first part of this formula:

Height × Width × 3

However, if there is subsampling involved, we aren’t actually using all of those bytes. If we’re using a 4:2:2 subsampling level, for example, our two chroma channels are each using only 50% of their original data. So we need to subtract the savings. That’s the second part of our formula:

Height × Width × 2 (number of chroma channels) × Subsample_Level

Let’s walk through a few examples using our hero image. If the hero image were saved using 4:2:2 subsampling, then our subsample level is 50%, or .5. Here’s how we’d use it in our formula:

(1,024 × 300 × 3) – (1,024 × 300 × 2 × .5) = 614,400 bytes

If we encoded the same image using 4:2:0 subsampling, our subsample level is 75% or .75:

(1,024 × 300 × 3) – (1,024 × 300 × 2 × .75) = 460,800 bytes
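
The sketch below simply transcribes the two memory formulas from this chapter, feeding in the chroma data savings from Table 9-2; the values it prints are the same ones summarized in Table 9-3.

    def rgba_bytes(width, height):
        """CPU decode path: 4 bytes per pixel (RGBA)."""
        return width * height * 4

    def gpu_ycbcr_bytes(width, height, chroma_savings):
        """GPU decode path: (W x H x 3) minus the chroma data the
        subsampling level discards across the two chroma channels."""
        return width * height * 3 - round(width * height * 2 * chroma_savings)

    w, h = 1024, 300
    print("CPU (RGBA): ", rgba_bytes(w, h))              # 1,228,800 bytes
    for level, savings in [("4:4:4", 0.0), ("4:2:2", 0.5),
                           ("4:1:1", 0.75), ("4:2:0", 0.75)]:
        print(f"GPU ({level}):", gpu_ycbcr_bytes(w, h, savings))
    # 4:4:4 -> 921,600    4:2:2 -> 614,400    4:1:1 and 4:2:0 -> 460,800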

You can see in Table 9-3 that the memory savings really start to add up the higher the level of subsampling used, peaking at a hefty 62.5% if images are saved using either the 4:1:1 or 4:2:0 subsampling level.

Table 9-3. Memory usage for a 1024×300-pixel image, based on decoding method used
Decode method    Memory use (in bytes)    Memory savings
CPU (RGBA)       1,228,800                0%
GPU (4:4:4)      921,600                  25%
GPU (4:2:2)      614,400                  50%
GPU (4:1:1)      460,800                  62.5%
GPU (4:2:0)      460,800                  62.5%

The memory savings for using a 4:2:0 (or the less common 4:1:1) subsampling level is huge, particularly when you consider that the average site today is loading 1.4 MB of images and 45% of those are JPEGs. There’s a lot of room for improvement here. According to a study of 1 million images that was conducted by Colin Bendell,2 only 40% of JPEGs online are currently using 4:2:0 subsampling.

Triggering GPU Decoding

At the time of this writing, Chromium-based browsers, Microsoft Edge, and Microsoft Internet Explorer 11+ all support GPU decoding. For Edge and Internet Explorer, GPU decoding is the default process.

Chrome has taken a slightly different approach (for now) and only enables GPU decoding when certain conditions are met:

  • The meta viewport element is defined and includes "width=device-width".

  • There are not multiple rasterization threads available.

  • The device is running Android 4.x (or later) or is a Nexus device.

This means that if you’re using responsive design (and using the approaches mentioned in Chapter 11), then Chrome on mobile is already taking advantage of GPU decoding whenever it thinks it’s the best approach available.

Summary

The browser has to do a lot of work to display an image on your screen. Sizing your images appropriately, taking advantage of chroma subsampling on your JPEG files, and enabling GPU decoding where possible can all help to reduce the impact on both processing time and memory—important considerations, particularly on mobile devices.

With a working knowledge of how to optimize each image format as much as possible, as well as how to enable the browser to do its job efficiently, it’s now time to put it all together. How do you apply all of this knowledge into an efficient workflow? In the next chapter, we’ll explore just that.
