Tuesday, October 5, 2010

Proposal to enhance HTML5 App Cache for Better Performance

Disclaimer: this is a very early idea; comments are most welcome.

Background

The HTML5 Application Cache is a mechanism that allows a web app to operate offline.  A web app may provide an App Cache manifest file that declares a set of static resources, such as images, stylesheets, and scripts.  These resources are then cached locally by the user agent (e.g., a browser) and are neither updated nor re-fetched from the web server until the App Cache manifest file itself changes.  Learn more about the basics of the App Cache.  The App Cache allows web apps to operate even when not connected to the internet, and lets them start more quickly because they can pull assets from the local cache instead of the original web server.


An example App Cache manifest:






CACHE MANIFEST
index.html
stylesheet.css
images/logo.png
scripts/main.js

Problem


While the App Cache mechanism is generally helpful for the web and its users, there is still one scenario that is less than ideal.  When the App Cache manifest file is updated, the browser re-fetches each resource in the manifest.  Smart browsers make these requests conditional, sending the cached ETag (If-None-Match) or the date of the cached resource (If-Modified-Since) in the hope that the server merely has to reply with a 304 Not Modified instead of sending the whole resource over.
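To make the conditional re-fetch concrete, here is a minimal sketch of the two request headers involved and of how the response status is interpreted. The function names `conditional_headers` and `needs_download` are made up for illustration; only the header names and the 304 status code come from HTTP itself.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build revalidation headers from the validators stored with a cached copy."""
    headers = {}
    if etag is not None:
        # The ETag is sent back exactly as the server issued it, quotes included.
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers


def needs_download(status_code):
    """A 304 Not Modified response means the cached copy can be reused."""
    return status_code != 304


print(conditional_headers(etag='W/"fi32323cwnewf8"',
                          last_modified="Tue, 15 Nov 1994 12:45:26 GMT"))
```

Note that even when every answer is a 304, each resource still costs one full round trip.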


This is one of the main issues with the App Cache.  When the App Cache manifest is updated, the client has to ask the web server, for each and every resource listed in the manifest, whether a newer version exists.  Even if the server responds with 304s, the client must initiate one Request/Response cycle for each asset.  This can be non-trivial in large applications.

Chrome Developer Tools tells me that, even over my pretty fast internet connection, my browser spends about 300ms per resource asking "is there a newer version of the resource?" and waiting for the server to say "nope, 304, not modified".  If a modern web app has 20 assets and they are checked one after another, that's still six seconds (20 × 300ms) for the browser to learn that nothing has changed on the server.
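The arithmetic can be sketched as a tiny estimate. The `parallel_connections` parameter is an assumption added here for illustration, since browsers usually issue several requests at once; the figures in this post correspond to checking assets one at a time.

```python
def revalidation_cost_ms(asset_count, round_trip_ms, parallel_connections=1):
    """Estimate time spent on conditional requests that all come back as 304."""
    # Requests go out in batches of `parallel_connections`; round up to whole batches.
    batches = -(-asset_count // parallel_connections)
    return batches * round_trip_ms


print(revalidation_cost_ms(20, 300))     # 6000 ms, the six seconds described above
print(revalidation_cost_ms(20, 300, 6))  # 1200 ms with six hypothetical parallel connections
```

Even in the parallel case, the cost is paid only to learn that nothing changed.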

While this is pretty much the standard situation with web browsers and servers today, I would like to propose an addition to the App Cache which can really speed the web up.

Proposed Solution


Imagine if we could tell the browser everything it needed to know, in a single Request/Response, in order to make smart re-fetching decisions.  We would like to help the browser make its own local decisions on which assets to re-fetch, without having to ask the web server, over and over, if each asset has changed.


I propose that we add additional meta-data to the App Cache manifest which allows the browser to decide locally if it needs to re-fetch an asset.


Specifically, I propose we optionally allow a web app to specify the ETag and/or Last-Modified date for the resources in the App Cache manifest.  For example:







CACHE MANIFEST
index.html , ETag: W/"fi32323cwnewf8" , Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
stylesheet.css , Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
images/logo.png , ETag: W/"99vvj39jf30fj9340f"
scripts/main.js

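As you can see, each asset can also declare the ETag, Last-Modified, or both.  The browser can compare these values with the ones it stored alongside its cached copy: if they match, the cached copy is current and no request is needed.  This gives the browser all the information it needs in order to determine if it needs to contact the server for a new version of the resource.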
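The local decision described above could look roughly like the following sketch. It assumes fields are separated by " , " (space, comma, space) exactly as in the example, which keeps the comma inside an HTTP date intact, and it ignores the manifest's other sections. `parse_manifest` and `assets_to_refetch` are hypothetical names, not part of any browser or spec.

```python
def parse_manifest(text):
    """Parse the proposed extended manifest into {url: {validator name: value}}."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "CACHE MANIFEST":
        raise ValueError("missing CACHE MANIFEST header")
    entries = {}
    for line in lines[1:]:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # HTTP dates contain ", " but never " , ", so this split is safe for them.
        url, *fields = line.split(" , ")
        validators = {}
        for field in fields:
            name, _, value = field.partition(": ")
            validators[name.strip()] = value.strip()
        entries[url.strip()] = validators
    return entries


def assets_to_refetch(manifest, cached):
    """Return URLs whose declared validators differ from the cached ones."""
    stale = []
    for url, declared in manifest.items():
        known = cached.get(url)
        if known is None or not declared:
            # Nothing to compare against: fall back to asking the server.
            stale.append(url)
        elif any(known.get(name) != value for name, value in declared.items()):
            # Validators are compared as opaque strings, not parsed as dates.
            stale.append(url)
    return stale


manifest = parse_manifest("""CACHE MANIFEST
index.html , ETag: W/"fi32323cwnewf8" , Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
stylesheet.css , Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
images/logo.png , ETag: W/"99vvj39jf30fj9340f"
scripts/main.js
""")
cached = {
    "index.html": {"ETag": 'W/"fi32323cwnewf8"',
                   "Last-Modified": "Tue, 15 Nov 1994 12:45:26 GMT"},
    "stylesheet.css": {"Last-Modified": "Mon, 14 Nov 1994 09:00:00 GMT"},
    "images/logo.png": {"ETag": 'W/"99vvj39jf30fj9340f"'},
}
print(assets_to_refetch(manifest, cached))  # ['stylesheet.css', 'scripts/main.js']
```

In this sketch an entry without any declared validators, such as scripts/main.js, is always treated as needing a request, which matches today's behaviour.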

The only requirement for the ETag is that its value must be a quoted string, so a server component is free to use a SHA1 hash or any other identifier for the resource.  It's also trivial to get the last edited date for a resource.

The assumption here is that, while this data can be entered by hand, a small server component (such as Rack::Offline) may be used to easily generate the App Cache manifest.
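As a rough illustration of such a server component, the sketch below builds the proposed manifest lines from files on disk, using a SHA1 of the file contents as the ETag and the file's modification time as Last-Modified. This is not how Rack::Offline works; `manifest_line` and `build_manifest` are hypothetical helpers, and the line format simply follows the example above.

```python
import hashlib
import os
from email.utils import formatdate


def manifest_line(root, path):
    """Return one manifest entry with an ETag and Last-Modified for `path`."""
    full = os.path.join(root, path)
    with open(full, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    # formatdate(..., usegmt=True) yields an HTTP-style date ending in "GMT".
    last_modified = formatdate(os.path.getmtime(full), usegmt=True)
    return f'{path} , ETag: "{digest}" , Last-Modified: {last_modified}'


def build_manifest(root, paths):
    """Assemble the full manifest text for the listed resource paths."""
    lines = ["CACHE MANIFEST"]
    lines += [manifest_line(root, p) for p in paths]
    return "\n".join(lines) + "\n"
```

Because the manifest text changes whenever any file's hash or timestamp changes, regenerating it on deploy would also trigger the browser's normal manifest update check.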

Providing the ETag and the Last-Modified timestamp should give the client browser enough information to make a smart decision about which assets really and truly need re-fetching.

Summary

The App Cache, part of the HTML5 spec, is a mechanism to cache a set of resources locally, enabling offline web apps.  If the App Cache manifest is updated, the browser will then conditionally re-fetch each resource in the manifest, creating potentially wasteful network access and delay.

I propose to allow an App Cache manifest to additionally, and optionally, include the ETag and Last-Modified timestamp for each resource.  By including this meta-data, the browser is now able to make local decisions and only re-fetch assets that have definitely been updated.  This reduces network traffic and latency, decreases web app startup time, and makes the web app feel faster.

Disclaimer

I'm probably required to say that the views expressed in this blog are my own, and do not necessarily reflect those of my employer. Also, except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the BSD License.