System: Apache mod_pagespeed settings
Google's recently released mod_pagespeed module for Apache 2 is causing a stir in the developer community. While there are claims that it can reduce download times by up to 50% for some websites, many developers are seeing little or no improvement on already-optimised websites. Our testing seems to bear this out.
The Pagespeed module has been designed to drastically clean up those websites with the worst HTML coding practices, making them faster and more likely to validate according to W3C standards. One can imagine that Google has been using something like this internally to make sense of the billions of pages of badly formatted HTML and CSS in its index.
The mod_pagespeed package is now available for installation using yum (RedHat, CentOS) and apt-get (Debian, Ubuntu). For Debian servers the update command is:
# apt-get upgrade mod-pagespeed-beta
# /etc/init.d/apache2 restart
For reference, our local configuration contains the following settings which enable/disable some filters:
Unless otherwise noted, the following comments relate to versions 0.9.8.1 and 0.9.10.1 (Nov 2010).
Filters enabled by default
The default settings for mod_pagespeed enable all the filters listed in this section (known collectively as CoreFilters).
Filters can be individually disabled using the ModPagespeedDisableFilters directive and others enabled using ModPagespeedEnableFilters as shown above. We have used colours in the list below to indicate which filters are currently active on this server.
You can also set ModPagespeedRewriteLevel to PassThrough, which disables all filters so that you can add back just the ones you want.
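As a minimal sketch of how these directives fit together in the main Apache configuration (the filter selection is only an illustration, not our local configuration), either start from nothing and add filters back, or keep CoreFilters and switch individual ones off:

```apache
# Turn the module on for this server or virtual host
ModPagespeed on

# Option 1: start from an empty filter set instead of CoreFilters...
ModPagespeedRewriteLevel PassThrough
# ...then add back only the filters you actually want (comma-separated)
ModPagespeedEnableFilters add_head,combine_css,extend_cache
ModPagespeedEnableFilters rewrite_images,insert_img_dimensions

# Option 2: keep the default CoreFilters and disable individual filters
# ModPagespeedRewriteLevel CoreFilters
# ModPagespeedDisableFilters inline_css,trim_urls
```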
• Add Head (add_head)
If a page has no <head> element, the module inserts an XHTML-style <head/> tag at the top of the page. See also Combine Heads.
• Combine CSS (combine_css)
Multiple CSS files included anywhere on the page using the <link> tag, and from the same domain, are combined into a single file, which is made cacheable and can be minified (see rewrite_css for limitations).
• Extend Cache (extend_cache)
Important: the internal mod_pagespeed spider (Serf) relies on your server settings to determine how often it re-examines content to see if it has changed. If you have a very long TTL (set in mod_expires, for example), it will take a long time for the rewritten image to update, and you may want to reduce the TTL to around 300 or 600 seconds.
This filter can have the effect of destroying If-Modified-Since header handling and can clash with other headers (more details below).
Cache extension is not done on resources that have Cache-Control: private or Cache-Control: no-cache unless you set ModPagespeedForceCaching to on (for testing purposes only).
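For example, assuming mod_expires is loaded, a shorter TTL along the lines suggested above might look like the following. The 600-second value and the list of MIME types are only illustrations; adjust them to the content you actually serve.

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # 600 seconds, so Serf re-examines changed images reasonably soon
    ExpiresByType image/png  "access plus 600 seconds"
    ExpiresByType image/jpeg "access plus 600 seconds"
    ExpiresByType image/gif  "access plus 600 seconds"
</IfModule>

# Testing only: extend caching even for private/no-cache resources
# ModPagespeedForceCaching on
```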
• Inline CSS (inline_css)
Includes the contents of small external CSS files (up to ModPagespeedCssInlineMaxBytes) in the page itself. The default cutoff is 2kb (2048 bytes).
This reduces the number of requests, but at the expense of cacheability. It may be useful for extremely lazy programmers. Not recommended.
• Inline JavaScript (inline_javascript)
Again, not recommended.
• Insert Image Dimensions (insert_img_dimensions)
To use this filter, you first need to have enabled rewrite_images (below).
Inserts any missing width and height attributes in <img> tags. Really something you should be taking care of yourself. Might be handy for some forums and blogs. Harmless.
More helpfully, if you ever have a good reason to display an image at a different width/height using HTML, mod_pagespeed will create a version of the image at the new size (see below).
• Optimize Images (rewrite_images)
This is the most powerful filter, and it also has the largest footprint. It will attempt to compress and strip metadata from images. For us, this has only a limited effect, as we already do something similar during the upload process.
This filter will generate new versions of images where the width and height attributes are smaller than the actual image dimensions. So if you embed a 1Mb image as a 100 pixel thumbnail it will create and serve the thumbnail and save the world from the original.
One issue with this filter is that images are assigned a much longer filename (for example images/BikeCrashIcn.png.pagespeed.ic.HASH.png instead of images/BikeCrashIcn.png) which lengthens your HTML and may affect search engine optimisation if you're used to getting traffic via image search engines.
When the image file is modified, the HASH portion of the URL is updated, but references to the previous URL will continue to work. In fact requests with any (or no) characters in place of the hash will resolve to the same image.
A nice feature: embedded images smaller than ModPagespeedImgInlineMaxBytes (2kb by default) are converted to inline data strings, and the module knows (we trust) to serve these only to browsers that support them. This may also affect image search if inline images are served to search engines.
Update: In 0.9.16.9, images referenced from external CSS files are also optimised, though never 'data-ified'. However, CSS backgrounds defined in inline CSS styles are not optimised at all.
Another new parameter ModPagespeedImgMaxRewritesAtOnce lets you specify how many images will be re-written at once. This applies system-wide so the server will not be busy trying to optimise more than the specified number of images at the same time. The default value is 8.
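A sketch of the image-related settings discussed above, using the default values quoted in this section. Enabling rewrite_images explicitly is only needed if you have switched to PassThrough, since it is already part of CoreFilters.

```apache
# insert_img_dimensions depends on rewrite_images being enabled
ModPagespeedEnableFilters rewrite_images,insert_img_dimensions

# Images smaller than this many bytes become inline data strings (default 2048)
ModPagespeedImgInlineMaxBytes 2048

# Server-wide cap on images being optimised at the same time (default 8)
ModPagespeedImgMaxRewritesAtOnce 8
```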
• Trim URLs (trim_urls)
This filter removes unnecessary components from href and src URLs in the HTML and, if rewrite_css has been enabled, in CSS files.
This includes converting absolute links into relative ones, and even removing the protocol (e.g. http:) for links where the protocol of the target URL matches the current page. This is harmless for modern browsers, but it confuses the heck out of some spiders.
Filters NOT enabled by default
The following filters are not part of CoreFilters so are not active by default and need to be explicitly enabled in the configuration file. The reason they are not activated by default is that there are known issues which could break some websites. Be warned.
• Combine Heads (combine_heads)
One <head> is better than two. Might be handy if you're ripping other websites and inserting them into your template without any processing, but really, you should never need this.
• Strip Scripts (strip_scripts)
Completely removes scripts from a page. Useful for testing and timing purposes.
• Outline CSS (outline_css)
Replaces CSS style blocks of a size greater than ModPagespeedCssOutlineMinBytes with an external, cacheable CSS file. By default only blocks of 3,000 bytes or more are affected.
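Because outline_css is not part of CoreFilters, it has to be switched on explicitly. A minimal sketch, with the threshold set to the 3,000-byte default mentioned above:

```apache
# Not in CoreFilters, so enable it explicitly
ModPagespeedEnableFilters outline_css

# Only <style> blocks of at least this many bytes are moved to external files
ModPagespeedCssOutlineMinBytes 3000
```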
• Move CSS to HEAD (move_css_to_head)
CSS style blocks (not inline styles) are moved into the <head>. This is handy if you're working inside a fixed template and can't directly edit the <head> section for individual pages.
These blocks can also be re-written (minified), but are not currently combined into a single CSS style block. Instead the <style> tags back up against one another.
• Rewrite CSS (rewrite_css)
The biggest problem here is that the parser does not yet recognise a range of CSS3 selectors and styles and even a single unrecognised line causes the parser to 'bail' and not minify any of the CSS in the same CSS file or code block. When the parser does recognise all your CSS syntax it works perfectly.
Update: In 0.9.16.9 many more CSS3 styles are being recognised, including vendor-prefixes. Most, but not all, of our external style sheets are now being minified.
• Make Google Analytics Asynchronous (make_google_analytics_async)
Sorry, but what were they thinking having this as a filter? If anything it should just spit out a warning that you're using old code and link to the instructions for migrating to Async.
• Rewrite JavaScript (rewrite_javascript)
Works extremely well on locally hosted scripts. One potential problem is that comments are removed, which may be an issue if you're using scripts that require some form of attribution in the code.
• Remove Comments (remove_comments)
Removes all HTML comments except for IE conditional comments. You will want to check first that none of your comments are required. For example, when using ht://Dig, HTML comments can be used to exclude sections of the page from the site search.
Update: A new feature may be coming that allows certain comments (specified using wildcard syntax) to be left in the page. While this would solve some problems, being able to turn this filter on/off based on user agent would be better.
• Collapse Whitespace (collapse_whitespace)
Removes unnecessary line breaks, spaces and indenting from the page. This could have a big impact on some generated or WYSIWYG-edited HTML pages with huge indents.
Unfortunately, it breaks the CSS property white-space: pre, which is used on this website, for example, to style the <code> blocks.
• Elide Attributes (elide_attributes)
Removes attributes from tags "when the specified value is equal to the default value". This can have unexpected consequences such as when removing type="text" which then breaks the CSS selector input[type="text"]. There is a workaround for this, however, so whether you enable this filter really depends on how you want your code to look.
• Remove Quotes (remove_quotes)
Whether to use this filter, like Elide Attributes above, is a matter of preference as to how you want your code to appear. Any savings will be marginal as we're only talking about a few quote characters.
• Add Instrumentation (add_instrumentation)
To enable the collecting of statistics your configuration file should contain the following (uncommented) commands:
Allow from localhost
After you reload Apache each page will include a request for a 'beacon' image passing the page load time in milliseconds. Other statistics are collected in the background.
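The Allow from localhost line above belongs to a larger block. The following is a sketch modelled on the stock pagespeed.conf shipped with the early packages; the handler names and the Apache 2.2-style Order/Allow syntax are assumptions to check against your own copy of the file.

```apache
# Inject the timing beacon into each page
ModPagespeedEnableFilters add_instrumentation

# Receives the beacon requests carrying the page load time
<Location /mod_pagespeed_beacon>
    SetHandler mod_pagespeed_beacon
</Location>

# Statistics page, restricted to requests from the server itself
<Location /mod_pagespeed_statistics>
    Order allow,deny
    Allow from localhost
    Allow from 127.0.0.1
    SetHandler mod_pagespeed_statistics
</Location>
```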
To view the accumulated statistics, go to the address /mod_pagespeed_statistics and you will see something like the following:
If you get a 403 Forbidden error, try replacing localhost in the configuration with either the domain name or the IP address you are using to access the internet (e.g. Allow from 22.214.171.124). Only one set of statistics is collected, which includes data from all websites hosted on the server.
What all the different statistics mean is not yet clear.
There are a few global variables that are not very well explained in the documentation, but there are some clues in the code:
- Set the target size (in kilobytes) for file cache. (default: 100Mb)
- Set the interval (in ms) for cleaning the file cache. (default: 1hr)
- Set the total size, in KB, of the per-process in-memory LRU cache. (default: 1Mb)
- Set the maximum byte size entry to store in the per-process in-memory LRU cache. (default: 16kb)
- The timeout period for requests by the internal spider (Serf). Defaults to 5ms.
Basically, a cache of files builds up at the specified location (ModPagespeedFileCachePath). Then every ModPagespeedFileCacheCleanIntervalMs milliseconds, if the cache has grown larger than ModPagespeedFileCacheSizeKb, the LRU (Least Recently Used) files are removed.
The other 'LRU Cache' variables let you control how much memory can be used for managing the cache. Note that these values apply per Apache process.
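A sketch of these cache settings written out with the defaults listed above, converted into the units each directive expects. The cache path is only an example, and the two LRU directive names are not given in this article; they are assumptions taken from the stock pagespeed.conf of the period and should be checked against your version.

```apache
# Where the file cache builds up (example path)
ModPagespeedFileCachePath "/var/mod_pagespeed/cache/"
# Target size in KB: 100Mb = 100 x 1024 = 102400
ModPagespeedFileCacheSizeKb 102400
# Cleaning interval in ms: 1hr = 60 x 60 x 1000 = 3600000
ModPagespeedFileCacheCleanIntervalMs 3600000

# Assumed names: per-process in-memory LRU cache, in KB (1Mb = 1024)
ModPagespeedLRUCacheKbPerProcess 1024
# Assumed names: largest single entry kept in that cache, in bytes (16kb = 16384)
ModPagespeedLRUCacheByteLimit 16384
```

Because the LRU figures apply per Apache process, the memory cost multiplies: with the 1Mb default and, say, 50 Apache processes, up to 50Mb could be used for in-memory caching.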
New features as of 0.9.11.3
.htaccess files and Directory scopes
The simplest (though not the most efficient) way to disable mod_pagespeed for a specific website or directory, or to apply other site-specific settings, is to use the .htaccess file:
Just place this at the top of an .htaccess file in the directory for which you want to disable ModPagespeed. Commands can also be targeted using the <Directory> grouping option.
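A minimal sketch of both approaches. The directory path and the choice of filter are hypothetical, for illustration only:

```apache
# In an .htaccess file: switch the module off for this directory and below
ModPagespeed off

# Or, instead of switching it off, adjust settings for this directory only
# ModPagespeedDisableFilters collapse_whitespace

# The same effect from the main configuration, for a hypothetical path
<Directory "/var/www/example.com/legacy">
    ModPagespeed off
</Directory>
```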
See also Bug Reports for some background and commentary.
Restricting Resource Rewriting Via Wildcards
We can now tell mod_pagespeed to avoid processing certain requests. For example, to keep the Serf spider from ever requesting URLs ending in captcha.png - in any website - we add the following to the main configuration file:
There is also a ModPagespeedAllow directive. The wildcard patterns match the 'fully expanded URL', so they should start with 'http://' or the wildcard *. Each Allow/Disallow directive takes priority over those preceding it.
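A sketch of the captcha example plus an Allow/Disallow pair showing the ordering rule. The /admin/ paths are hypothetical:

```apache
# Never let Serf request or rewrite CAPTCHA images, on any website
ModPagespeedDisallow "*captcha.png"

# Skip everything under a hypothetical /admin/ area...
ModPagespeedDisallow "*/admin/*"
# ...except this sub-directory; the later directive takes priority
ModPagespeedAllow "*/admin/public/*"
```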
New features as of 0.9.16.9
Domain sharding is the practice of splitting your website content, even when it comes from the same location, over a number of different domains or subdomains.
Your browser limits the number of files that can be downloaded from a single domain at the same time, and additional items have to wait for one to finish. By sourcing page elements from multiple domains, you allow for more downloads to occur at once thus reducing wait times.
For details, read the official documentation on Sharding Domains.
You can also find information there on "Authorizing Domains", "Mapping Origin Domains" and "Mapping Rewrite Domains". These are all advanced features requiring Apache server configuration.
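As a rough sketch of sharding, assuming the ModPagespeedShardDomain syntax from the official documentation (domain to shard, then a comma-separated list of shards). All hostnames are hypothetical, and each shard must resolve in DNS to a server that can serve the same content, shown here via ServerAlias on a single virtual host:

```apache
<VirtualHost *:80>
    ServerName www.example.com
    # The shard hostnames must reach this same virtual host
    ServerAlias static1.example.com static2.example.com

    # Rewritten resource URLs are spread across the two shard domains
    ModPagespeedShardDomain www.example.com static1.example.com,static2.example.com
</VirtualHost>
```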