One of the services that CloudFlare provides to paying customers is called Polish. Polish automatically recompresses images cached by CloudFlare to ensure that they are as small as possible and can be delivered to web browsers as quickly as possible.
We've recently rolled out a new version of Polish that uses updated techniques (and was completely rewritten from a collection of programs into a single executable written in Go). As part of that rewrite we looked at the performance of the recently released mozjpeg 2.0 project for JPEG compression.
To get a sense of its performance (both in terms of compression and in terms of CPU usage) when compared to libjpeg-turbo I randomly selected 10,000 JPEG images (totaling 2,564,135,285 bytes for an average image size of about 256KB) cached by CloudFlare and recompressed them using the jpegtran program provided by libjpeg-turbo 1.3.1 and mozjpeg 2.0. The exact command used in both cases was:
jpegtran -outfile out.jpg -optimise -copy none in.jpg
Of the 10,000 images in cache, mozjpeg 2.0 failed to make 691 of them any smaller compared with 3,471 for libjpeg-turbo. So mozjpeg 2.0 was significantly better at recompressing images.
On average images were compressed by 3.0% using mozjpeg 2.0 (ignoring images that weren't compressed at all) and by 2.5% using libjpeg-turbo (again ignoring images that weren't compressed at all). This seems similar to Mozilla's reported 5% improvement compared to libjpeg-turbo.
So, mozjpeg 2.0 achieved better compression on this set of files and compressed many more of them (93.1% vs. 65.3%).
As example, here's an image, not from the sample set. Its original size was 1,984,669 bytes. When compressed with libjpeg-turbo it is 1,956,200 bytes (2.4% removed); when compressed with mozjpeg 2.0 it is 1,874,491 (5.6% removed). (The mozjpeg 2.0 version is 4.2% smaller than the libjpeg-turbo version).
The distribution of compression ratios seen using mozjpeg 2.0 is shown below.
This improved compression comes at a price. The run time for the complete compression (including where compression failed to create an improvement) was 273 seconds for libjpeg-turbo and 474 seconds for mozjpeg 2.0. So mozjpeg 2.0 took about 1.7x longer, but, of course, achieved better compression on more of the files.
Because we'd like to get the highest compression possible we've assigned an engineer internally to look at optimization of mozjpeg 2.0 (specifically for the Intel processors we use) and will contribute back improvements to the project.
We're investing quite heavily in optimization projects (such as improvements to gzip (code here) and LuaJIT, and things like a very fast Aho-Corasick implementation). If you're interested in low-level optimization for Intel processors, think about joining us.
PS After this blog post was published some folks pointed out that the best comparison would be when the -progessive flag is used. I went back and checked and I had in fact done that in the 10,000 file test and so the data there is correct. However, the command shown above is not. The actual command used was:
jpegtran -outfile out.jpg -optimise -progressive -copy none in.jpg
Also, the image shown above was generated using the incorrect command because I did it outside the 10,000 file test. That paragraph above should say:
As example, here's an image, not from the sample set. Its original size was 1,984,669 bytes. When compressed with libjpeg-turbo it is 1,885,090 bytes (4% removed); when compressed with mozjpeg 2.0 it is 1,874,491 (5.6% removed). (The mozjpeg 2.0 version is 0.6% smaller than the libjpeg-turbo version).