Gzip vs Bzip2 vs XZ Performance Comparison

Gzip, Bzip2 and XZ are all popular compression tools used in UNIX-based operating systems, but which should you use? Here we are going to benchmark and compare them against each other to get an idea of the trade-off between the level of compression and the time taken to achieve it.

The Test Server

The test server was running CentOS 7.1.1503 with kernel 3.10.0-229.11.1, with all updates applied at the time of testing. The server had 4 CPU cores and 16GB of available memory, but only one CPU core was used during the tests, as all of these tools run single threaded by default; that core was fully utilized while testing. With xz it is possible to specify the number of threads to run, which can greatly increase performance.
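For example, assuming a version of XZ Utils with stable multi-threading support (5.2 or later; the 5.1.2alpha build tested here predates this), compression can be spread over all available cores with:

time xz -6 --threads=0 linux-3.18.19.tar

Note that threaded xz splits the input into independent blocks, so the compression ratio can end up slightly lower than an equivalent single-threaded run.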

All tests were performed on linux-3.18.19.tar, a tarball of the Linux kernel source from kernel.org. This file was 580,761,600 bytes in size prior to compression.
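If you would like to reproduce the test, the same tarball can be fetched and unpacked with something along these lines (the kernel.org URL is an assumption based on their usual v3.x directory layout):

wget https://cdn.kernel.org/pub/linux/kernel/v3.x/linux-3.18.19.tar.xz
xz -d linux-3.18.19.tar.xz
stat -c %s linux-3.18.19.tar    # should report 580761600 bytes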

The Benchmarking Process

The linux-3.18.19.tar file was compressed and decompressed nine times each by gzip, bzip2 and xz: once at each available compression level from 1 to 9. A compression level of 1 indicates that the compression will be fastest but weakest, so the resulting file will be larger. Compression level 9, on the other hand, compresses best but takes the longest amount of time to complete.

There is an important trade-off here between CPU processing time and how well the file compresses. To compress a file further and save more disk space, more CPU processing time is required. To save CPU processing time, a lower compression level can be used, which produces a larger file and uses more disk space.

Each time a compression or decompression command was run, it was prefixed with the ‘time’ command so that the execution time could be measured accurately.

Below are the commands that were run for compression level 1:

time bzip2 -1v linux-3.18.19.tar
time gzip -1v linux-3.18.19.tar
time xz -1v linux-3.18.19.tar

All commands were run with the time command, verbose output (-v), and a compression level flag of -1, which was stepped incrementally up to -9. To decompress, the same command was used with the -d flag added.
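The whole run can be scripted with a simple loop. Below is a minimal sketch, shown for gzip and equally applicable to bzip2 and xz; the working copy file name and the use of a copy to preserve the original are assumptions, not the exact script used:

for level in 1 2 3 4 5 6 7 8 9; do
    cp linux-3.18.19.tar test.tar      # work on a copy so the original is preserved
    time gzip -${level}v test.tar      # compress: produces test.tar.gz
    time gzip -dv test.tar.gz          # decompress: restores test.tar
done
rm -f test.tar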

The versions of these tools were gzip 1.5, bzip2 1.0.6, and xz (XZ Utils) 5.1.2alpha.
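The version of each tool in use can be confirmed as follows:

gzip --version
bzip2 --version
xz --version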

Results

The raw data from which the graphs below were created is provided in the tables throughout this section.

Compressed Size

The table below shows the size in bytes of the linux-3.18.19.tar file after compression; the first column shows the compression level (1 to 9) passed to each tool.

Level    gzip         bzip2        xz
1        153617925    115280806    105008672
2        146373307    107406491    100003484
3        141282888    103787547    97535320
4        130951761    101483135    92377556
5        125581626    100026953    85332024
6        123434238    98815384     83592736
7        122808861    97966560     82445064
8        122412099    97146072     81462692
9        122349984    96552670     80708748

Compression Time

First up is the compression time; the table and graph below show how long the compression took to complete at each compression level, 1 through 9.

Level    gzip (s)     bzip2 (s)    xz (s)
1        13.213       78.831       48.473
2        14.003       77.557       65.203
3        16.341       78.279       97.223
4        17.801       79.202       196.146
5        22.722       80.394       310.761
6        30.884       81.516       383.128
7        37.549       82.199       416.965
8        48.584       81.576       451.527
9        54.307       82.812       500.859

[Graph: Gzip vs Bzip2 vs XZ Compression Time]

So far we can see that gzip takes steadily longer to complete as the compression level increases, bzip2 barely changes at all, while xz climbs quite significantly after compression level 3.

Compression Ratio

Now that we have an idea of how long the compression took, we can compare this with how well the files compressed. The compression ratio as used here represents the percentage of the original size that the file has been reduced to (note this differs from the conventional definition, which would quote the space saved rather than the size remaining). For example, if a 100 MB file has been compressed with a compression ratio of 25%, the compressed version of the file is 25 MB.
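In other words, the ratio used here is simply compressed size * 100 / original size. As a quick check using the level 9 xz result from the Compressed Size table above:

echo "scale=2; 80708748 * 100 / 580761600" | bc

This prints 13.89, matching the table below.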

Level    gzip (%)     bzip2 (%)    xz (%)
1        26.45        19.8         18.08
2        25.2         18.49        17.21
3        24.32        17.87        16.79
4        22.54        17.47        15.9
5        21.62        17.22        14.69
6        21.25        17.01        14.39
7        21.14        16.87        14.19
8        21.07        16.73        14.02
9        21.06        16.63        13.89

[Graph: Gzip vs Bzip2 vs XZ Compression Ratio]

The overall trend here is that the higher the compression level applied, the lower the compression ratio, indicating a smaller overall file size. In this case xz always provides the best compression, closely followed by bzip2, with gzip coming in last; however, as shown in the compression time graph, xz takes a lot longer to get these results beyond compression level 3.

Compression Speed

The compression speed in MB per second can also be observed.
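These figures are consistent with dividing the original file size by the compression time, using decimal megabytes. For example, for xz at level 1 (48.473 seconds in the table above):

echo "scale=2; 580761600 / 48.473 / 1000000" | bc

This gives 11.98 MB/s, matching the first xz entry below.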

Level    gzip (MB/s)    bzip2 (MB/s)    xz (MB/s)
1        43.95          7.37            11.98
2        41.47          7.49            8.9
3        35.54          7.42            5.97
4        32.63          7.33            2.96
5        25.56          7.22            1.87
6        18.8           7.12            1.52
7        15.47          7.07            1.39
8        11.95          7.12            1.29
9        10.69          7.01            1.16

[Graph: Gzip vs Bzip2 vs XZ Compression Speed]

Decompression Time

Next up is how long each file compressed at a particular compression level took to decompress.

Level    gzip (s)     bzip2 (s)    xz (s)
1        6.771        24.23        13.251
2        6.581        24.101       12.407
3        6.39         23.955       11.975
4        6.313        24.204       11.801
5        6.153        24.513       11.08
6        6.078        24.768       10.911
7        6.057        23.199       10.781
8        6.033        25.426       10.676
9        6.026        23.486       10.623

[Graph: Gzip vs Bzip2 vs XZ Decompression Time]

For gzip and xz, files compressed at a higher compression level decompressed faster, while bzip2's decompression times fluctuated with no clear trend. Therefore, if you are going to be serving a compressed file over the Internet many times, it may be worth compressing it with xz at compression level 9, as this will both reduce bandwidth over time when transferring the file and be faster for everyone to decompress.

Decompression Speed

The decompression speed in MB per second can also be observed.

Level    gzip (MB/s)    bzip2 (MB/s)    xz (MB/s)
1        85.77          23.97           43.83
2        88.25          24.1            46.81
3        90.9           24.24           48.5
4        91.99          24              49.21
5        94.39          23.7            52.42
6        95.55          23.45           53.23
7        95.88          25.03           53.87
8        96.26          22.84           54.4
9        96.38          24.72           54.67

[Graph: Gzip vs Bzip2 vs XZ Decompression Speed]

Performance Differences and Comparison

By default, when the compression level is not specified, gzip uses -6, bzip2 uses -9 and xz uses -6. The reason for this is fairly clear from the results. For gzip and xz, -6 as a default provides a good level of compression without taking too long to complete; it is a fair trade-off point, as higher compression levels take disproportionately longer to process. Bzip2, on the other hand, is best used with its default compression level of 9, as the manual page also recommends. The results here confirm this: the compression improves with each level, while the time taken differs by only a few seconds between levels 1 and 9.

In general xz achieves the best compression, followed by bzip2 and then gzip. To achieve that better compression, however, xz usually takes the longest to complete, followed by bzip2 and then gzip.

xz takes a lot more time at its default compression level of 6, while bzip2 at compression level 9 takes only somewhat longer than gzip yet compresses a fair amount better. The gap in compression ratio between bzip2 and xz is smaller than the gap between gzip and bzip2, making bzip2 a good trade-off for compression.

Interestingly, the lowest xz compression level of 1 results in a better compression ratio than gzip at compression level 9, and it even completes faster. Using xz -1 instead of gzip -9 therefore gives a better compression ratio in less time.

Based on these results, bzip2 is a good middle ground for compression: gzip is only a little faster, while xz may not really be worth it at its default compression level of 6, as it takes much longer to complete for little extra gain.

However, decompressing with bzip2 takes much longer than with xz or gzip; xz is a good middle ground here, while gzip is again the fastest.

Conclusion

So which should you use? It’s going to come down to using the right tool for the job and the particular data set that you are working with.

If you are interactively compressing files on the fly then you may want to do this quickly with gzip -6 (the default compression level) or xz -1. However, if you're configuring log rotation that runs automatically overnight during a low-usage period, it may be acceptable to spend more CPU resources with xz -9 to save the greatest amount of space possible. For instance, kernel.org compresses the Linux kernel with xz; spending extra time to compress the file well once makes sense when it will be downloaded and decompressed thousands of times, resulting in bandwidth savings while still giving decent decompression speeds.
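As a rough sketch of the log rotation case, logrotate can be pointed at xz via its compresscmd, compressoptions and compressext directives; the path and schedule below are illustrative assumptions:

/var/log/myapp/*.log {
    weekly
    rotate 8
    delaycompress
    compress
    compresscmd /usr/bin/xz
    compressoptions -9
    compressext .xz
}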

Based on the results here, if you simply want to compress and decompress files as fast as possible with little regard for the compression ratio, then gzip is the tool for you. If you want a better compression ratio to save more disk space and are willing to spend extra processing time to get it, then xz will be best. Although xz takes the longest to compress at higher compression levels, it has fairly good decompression speed and compresses quite quickly at lower levels. Bzip2 provides a good trade-off between compression ratio and processing speed, but it takes the longest to decompress, so it may be a good option when the compressed content will rarely be decompressed.

In the end the best option will come down to your priorities between processing time and compression ratio. With disk space continually becoming cheaper and available in larger sizes, you may be fine with saving some CPU resources and processing time and storing slightly larger files. Regardless of the tool you use, compression is a great way to save storage space.

Comments

  1. Hi Jarrod –
    Al Wegener here, a serial entrepreneur living near Santa Cruz (Silicon Valley). I’m a compression researcher & inventor, and I wanted to compliment you on your well-written, well-researched comparison between gzip, bzip2, and XZ. If you’re ever in the SF Bay Area, please ping me and let’s have a beer!
    Cheers,
    Al

  2. You did a great job clearly laying out what you found about the various compression programs.

  3. Jarrod –
    Excellent article. We are doing a research project aimed at medical image (x-ray, CT, etc.) compression, focusing on lossless methods. We pre-process the medical images to optimize compression in commercial or open-source compressors, and also have algorithms that select certain image sub-areas for lossy or lossless compression. From your data, it looks like xz is worth testing (have already tried Bzip and gzip).

  4. Thank you for this very interesting comparison!

  5. I did some extremely thorough tests of this and found that xz -1 is nearly always a better drop-in for gzip -9, with quite impressive gains for little to no increase in cost.

    If you run tests on a lot of data then you might notice that xz is a bit special. Sometimes higher compression levels will slightly worsen the compression ratio. Differing options are turned on at various levels, which usually results in a net improvement but doesn’t guarantee it.

    It is also worth noting that the memory usage for xz can be extreme.

    As for how bzip2 relates to this I am not sure. xz makes bzip2 irrelevant so I have never bothered with it.

  6. This was a rather interesting read and for sure it will be referenced by many for years to come.

    However, I wonder if by adding in the parallel compressor/decompressor lbzip2, the conclusions might sway in the favor of it? As far as I know, xz only has parallel compression but not decompression.

    There was also a post by Antonio Diaz Diaz concerning the longevity of data compressed by xz vs say bzip2. This would be quite interesting for archivists, but I imagine backups of the original uncompressed data obviates the concern.

    http://www.nongnu.org/lzip/xz_inadequate.html

    Does anyone have any contrasting opinion with regard to these two points?

  7. You can use xz --threads=0 to solve the speed problem.
    For me, with 8 cores, it made a huge difference in time.

  8. Excellent work here, but I think redefining compression ratio to fit your explanation is a little confusing, i.e. a representation that compresses a 10MB file to 2MB would yield a space savings of 1 - 2/10 = 0.8, often notated as a percentage, 80%, not 20%.

    https://en.wikipedia.org/wiki/Data_compression_ratio

  9. Somehow all the benchmarks of xz miss that it also has a “-0” compression level, which is significantly faster than “-1”.

  10. Hiya,

    In your conclusion it may be a good idea to note under which conditions it is wise to change compressors.
    It seems like if you contemplate using gzip at level 8+ it may be better to switch to xz, as it compresses harder at similar speeds. bzip2 seems to be faster at higher compression levels, so my conclusion:
    use gzip up to compression level 8, then switch to xz for levels 1-3, then for anything better switch to bzip2.

  11. As someone already said, this would be quite interesting for archivists.

  12. Nice comparison! The share buttons are not working for me :-(

    Thanks!

  13. This is an excellent article!

  14. The Compression Ratio was very confusing. That’s not what compression ratio means.

  15. Great article! This answered a lot of questions for me. Will probably be using xz in the future :) Thanks!

  16. Looks like an interesting read with regard to why NOT to use ‘xz’. Note this may be biased; I did not read it.

    https://www.nongnu.org/lzip/xz_inadequate.html

  17. How can I get the compression speed for each level of compression in gzip?

  18. Can you update your test with pigz as a gzip replacement?

  19. Good article.
    Suggesting one improvement: add a chart that shows the overall performance of the 3 methods.
    The x scale should be time, the y scale should be compression ratio.
    This will allow understanding the trade-off between options.

  20. If you are managing a large consumer-facing website, I would not follow the recommendation at the end of the article. You will want to compress all of your log files on all your servers “as fast as possible.” If space is an issue, then keep fewer local log files! You should only be keeping enough local log files for quick diagnostics during short network outages. Log data should be pushed to remote systems with services like syslog, datadog, splunk, etc., for long-term retention and analysis.

    Standalone servers are rarely “idle”. I personally had an issue with batch jobs not completing in time because the log file compression jobs were hogging the CPU.
