10 Simple Bzip2 Examples

Bzip2 compresses files to reduce the disk space they use, which has made it popular on Linux and UNIX operating systems. It has been around since the late 1990s and is still widely used today. It can be preferable to gzip because it usually produces smaller compressed files, at the cost of additional memory and processing time.

We are going to cover 10 examples of bzip2 here, showing you common tasks that can be completed and just how easy it is to use.

Requirements

Before starting you will need the bzip2 package installed. It may already be installed by default; if not, you can install it as follows.

RHEL:

yum install bzip2

Debian:

apt-get install bzip2

Example Bzip2 Commands

  • 1. Compress a single file
    This will compress file.txt and create file.txt.bz2. Note that this removes the original file.txt.

    bzip2 file.txt
    
  • 2. Compress multiple files at once
    This will compress all files specified in the command. Note again that this removes the originals: file1.txt, file2.txt and file3.txt become file1.txt.bz2, file2.txt.bz2 and file3.txt.bz2.

    bzip2 file1.txt file2.txt file3.txt
    

    To instead compress all files within a directory, see example 7 below.

  • 3. Compress a single file and keep the original
    You can instead keep the original file and create a compressed copy.

    bzip2 -c file.txt > file.txt.bz2
    

    The -c flag writes the compressed copy of file.txt to stdout, which is then redirected to file.txt.bz2, leaving the original file.txt in place.

    The version of bzip2 I am testing with, 1.0.6 (the latest available at the time of writing), also has the -k option, which keeps the original file. Alternatively, you could run the command below to get the same result.

    bzip2 -k file.txt
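
As a quick check of the behaviour described above, here is a minimal sketch that compresses a throwaway file with -k and confirms both files exist afterwards. The scratch directory and file contents are assumptions made up for the illustration.

```shell
#!/bin/sh
# Sketch: confirm that "bzip2 -k" leaves the original file in place.
workdir=$(mktemp -d)                       # scratch directory (illustrative)
printf 'test\nexample\ntext\n' > "$workdir/file.txt"

bzip2 -k "$workdir/file.txt"               # compress, keeping the original

# Both the original and the compressed copy should now exist.
if [ -f "$workdir/file.txt" ] && [ -f "$workdir/file.txt.bz2" ]; then
    kept="yes"
else
    kept="no"
fi
echo "original kept: $kept"

rm -rf "$workdir"                          # clean up the scratch directory
```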
    
  • 4. Decompress a bzip2 compressed file
    To reverse the compression process and get the original file back that you have compressed, you can use the bzip2 command itself or bunzip2 which is also part of the bzip2 package.

    bzip2 -d file.txt.bz2
    

    OR

    bunzip2 file.txt.bz2
    

    Both of these commands will produce the same result, decompressing file.txt.bz2 to file.txt, removing the compressed file.txt.bz2 file.

    Similar to example 3, it is possible to decompress a file and keep the original .bz2 file as below.

    bunzip2 -c file.txt.bz2 > file.txt
    

    OR

    bunzip2 -k file.txt.bz2
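
If you want reassurance that decompression gives back exactly what went in, a round trip can be compared byte for byte. This is only a sketch; the file names and scratch directory are hypothetical.

```shell
#!/bin/sh
# Sketch: compress a file, decompress it to a new name, and compare the two.
workdir=$(mktemp -d)                       # scratch directory (illustrative)
printf 'test\nexample\ntext\n' > "$workdir/file.txt"

bzip2 -k "$workdir/file.txt"                                   # keep the original
bunzip2 -c "$workdir/file.txt.bz2" > "$workdir/restored.txt"   # keep the .bz2 too

# cmp -s is silent and only sets the exit status.
if cmp -s "$workdir/file.txt" "$workdir/restored.txt"; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "round trip: $roundtrip"

rm -rf "$workdir"
```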
    
  • 5. List compression information
    With the -v or --verbose flag we can see useful information regarding the compression ratio of a file, which shows us how much disk space our compression is saving. Additional ‘v’ flags can be added for more in depth information.

    [root@centos ~]# bzip2 -v linux-3.18.19.tar
      linux-3.18.19.tar:  6.015:1,  1.330 bits/byte, 83.37% saved, 580761600 in, 96552670 out.
    
    [root@centos ~]# ls -lah
    -rw-r--r--. 1 root root 554M Jul 22 10:38 linux-3.18.19.tar
    -rw-r--r--. 1 root root  93M Jul 22 10:38 linux-3.18.19.tar.bz2
    

    In this example, a tar archive of the Linux kernel source has been reduced in size by 83.37%, taking up 93MB of space rather than 554MB.
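
The figures on the verbose line can be re-derived from the two byte counts it reports. The sketch below uses awk for the arithmetic; the variable names are made up for the illustration, and the byte counts are the ones from the output above.

```shell
#!/bin/sh
# Sketch: recompute the ratio, bits/byte and percentage saved
# from the "in" and "out" byte counts in the verbose output above.
bytes_in=580761600
bytes_out=96552670

stats=$(awk -v i="$bytes_in" -v o="$bytes_out" 'BEGIN {
    printf "%.3f:1, %.3f bits/byte, %.2f%% saved", i / o, 8 * o / i, (1 - o / i) * 100
}')
echo "$stats"
# prints: 6.015:1, 1.330 bits/byte, 83.37% saved
```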

  • 6. Adjust compression level
    The level of compression applied to a file using bzip2 can be specified as a value between 1 (less compression) and 9 (best compression). Option 1 sets the block size to 100k, option 2 sets it to 200k, all the way up to option 9 which uses 900k. This is because bzip2 compresses files in blocks, and the block size affects both the compression ratio and the amount of memory needed for compression and decompression.

    The example below compares -1 and -9. As shown, the -9 option takes only a few extra seconds and saves over 3 percentage points more space (83.37% versus 80.15%) when tested with the Linux kernel source.

    [root@centos ~]# time bzip2 -v -1 linux-3.18.19.tar
      linux-3.18.19.tar:  5.038:1,  1.588 bits/byte, 80.15% saved, 580761600 in, 115280806 out.
    
    real    1m18.487s
    user    1m18.081s
    sys     0m0.405s
    
    [root@centos ~]# time bzip2 -v -9 linux-3.18.19.tar
      linux-3.18.19.tar:  6.015:1,  1.330 bits/byte, 83.37% saved, 580761600 in, 96552670 out.
    
    real    1m21.730s
    user    1m21.505s
    sys     0m0.219s
    
    

    -1 can also be specified with the flag --fast, while -9 can also be specified with the flag --best. By default bzip2 uses a compression level of -9. These long options are primarily in place for gzip compatibility: --fast doesn’t make things significantly faster, as shown above, while --best simply invokes the default behaviour.
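
To see how the level affects output size on your own data, you can compress the same input at several levels and compare byte counts. The sketch below uses a generated sample file of roughly 1.3 MB as a stand-in; results depend heavily on the input, so treat the numbers as indicative only.

```shell
#!/bin/sh
# Sketch: compare compressed sizes at a few levels without touching the input.
workdir=$(mktemp -d)                       # scratch directory (illustrative)
seq 1 200000 > "$workdir/sample.txt"       # generated sample data, about 1.3 MB

for level in 1 5 9; do
    # -c writes to stdout, so the input file is left untouched.
    size=$(( $(bzip2 -c "-$level" "$workdir/sample.txt" | wc -c) ))
    echo "level -$level: $size bytes"
done

rm -rf "$workdir"
```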

  • 7. Compress a directory
    With the help of the tar command, we can create a tar file containing a whole directory and compress the result with bzip2. We can perform the whole lot in one step, as the tar command allows us to specify a compression method to use.

    tar cjvf etc.tar.bz2 /etc/
    

    This example creates a compressed etc.tar.bz2 file of the entire /etc/ directory. The tar flags are as follows: ‘c’ creates a new tar archive, ‘j’ specifies that we want to compress with bzip2, ‘v’ provides verbose information, and ‘f’ specifies the file to create. The resulting etc.tar.bz2 file contains all files within /etc/ compressed using bzip2.
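
The same ‘j’ flag works when listing or extracting an archive. Below is a minimal sketch on a small made-up directory rather than /etc/; the directory and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: create, list and extract a bzip2-compressed tar archive.
workdir=$(mktemp -d)                       # scratch directory (illustrative)
mkdir "$workdir/conf"
printf 'a\n' > "$workdir/conf/one.txt"
printf 'b\n' > "$workdir/conf/two.txt"

# -C changes directory first, so the archive stores relative paths.
tar -cjf "$workdir/conf.tar.bz2" -C "$workdir" conf

# 't' lists the archive contents instead of creating an archive.
listing=$(tar -tjf "$workdir/conf.tar.bz2")
echo "$listing"

# 'x' extracts; here into a separate directory.
mkdir "$workdir/restore"
tar -xjf "$workdir/conf.tar.bz2" -C "$workdir/restore"
restored=$(cat "$workdir/restore/conf/one.txt")

rm -rf "$workdir"
```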

  • 8. Integrity test
    The -t or --test flag can be used to check the integrity of a compressed file.

    On a normal file, the result will be listed as OK, shown below.

    [root@centos ~]# bzip2 -tv linux-3.18.19.tar.bz2
      linux-3.18.19.tar.bz2: ok
    

    I have now manually modified this file with a text editor and added a random value, introducing corruption so that the file is no longer valid.

    [root@centos ~]# bzip2 -tv linux-3.18.19.tar.bz2
      linux-3.18.19.tar.bz2: data integrity (CRC) error in data
    
    You can use the `bzip2recover' program to attempt to recover
    data from undamaged sections of corrupted files.
    

    The compressed .bz2 file makes use of a cyclic redundancy check (CRC) in order to detect errors. The CRC value can be viewed by running the bzip2 command with the -vv flag, as shown below.

    [root@centos test]# bzip2 -vv file.txt
      file.txt:
        block 1: crc = 0x3f1075ca, combined CRC = 0x3f1075ca, size = 8
        final combined CRC = 0x3f1075ca
        0.174:1, 46.000 bits/byte, -475.00% saved, 8 in, 46 out.
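
In a script, the exit status of bzip2 -t can drive the decision instead of reading the message. The sketch below simulates damage by overwriting a few bytes near the start of a copy; the file names, the helper function name and the chosen offset are assumptions for the illustration.

```shell
#!/bin/sh
# Sketch: use the exit status of "bzip2 -t" to tell good files from damaged ones.
workdir=$(mktemp -d)                       # scratch directory (illustrative)
printf 'test\nexample\ntext\n' > "$workdir/file.txt"
bzip2 -k "$workdir/file.txt"

# Make a copy and overwrite 4 bytes in place to simulate corruption.
cp "$workdir/file.txt.bz2" "$workdir/damaged.bz2"
printf 'XXXX' | dd of="$workdir/damaged.bz2" bs=1 seek=10 conv=notrunc 2>/dev/null

# check_bz2 is a hypothetical helper: prints "ok" or "corrupt".
check_bz2() {
    if bzip2 -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupt"
    fi
}

good_status=$(check_bz2 "$workdir/file.txt.bz2")
bad_status=$(check_bz2 "$workdir/damaged.bz2")
echo "file.txt.bz2: $good_status"
echo "damaged.bz2: $bad_status"

rm -rf "$workdir"
```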
    
  • 9. Concatenate multiple files
    Multiple files can be concatenated into a single .bz2 file.

    bzip2 -c file1.txt > files.bz2
    bzip2 -c file2.txt >> files.bz2
    

    The files.bz2 file now contains the contents of both file1.txt and file2.txt. If you decompress files.bz2 you will get a file named ‘files’ which contains the content of both .txt files, similar to the output of running ‘cat file1.txt file2.txt’. If instead you want to create a single archive that holds multiple separate files, you can use the tar command, which supports bzip2 compression, as covered above in example 7.
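
The claim that the result matches ‘cat file1.txt file2.txt’ is easy to check. A minimal sketch, with made-up file contents:

```shell
#!/bin/sh
# Sketch: show that concatenated .bz2 streams decompress to the concatenated input.
workdir=$(mktemp -d)                       # scratch directory (illustrative)
printf 'first\n' > "$workdir/file1.txt"
printf 'second\n' > "$workdir/file2.txt"

bzip2 -c "$workdir/file1.txt" > "$workdir/files.bz2"
bzip2 -c "$workdir/file2.txt" >> "$workdir/files.bz2"   # append a second stream

combined=$(bzip2 -dc "$workdir/files.bz2")              # decompress to stdout
expected=$(cat "$workdir/file1.txt" "$workdir/file2.txt")

if [ "$combined" = "$expected" ]; then
    echo "decompressed output matches cat of the originals"
fi

rm -rf "$workdir"
```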

  • 10. Additional commands included with bzip2
    The bzip2 package provides some very useful commands for working with compressed files, such as bzcat, bzgrep and bzless/bzmore.

    As you can probably tell by the names of the commands, these are essentially the cat, grep, and less/more commands, however they work directly on compressed data. This means that you can easily view or search the contents of a compressed file without having to decompress it and then view or search it in a second step.

    [root@centos test]# bzcat test.txt.bz2
    test
    example
    text
    [root@centos test]# bzgrep exa test.txt.bz2
    example
    

    This is especially useful when searching through or reviewing log files which have been compressed during log rotation.
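
For the log rotation case, a small loop can report which compressed logs contain a given string. The sketch below pipes bzcat into plain grep, which should behave much like bzgrep for this purpose; the log file names and contents are invented for the illustration.

```shell
#!/bin/sh
# Sketch: find which rotated, compressed logs mention ERROR.
workdir=$(mktemp -d)                       # scratch directory (illustrative)
printf 'INFO start\nERROR disk full\n' > "$workdir/app.log.1"
printf 'INFO start\nINFO done\n' > "$workdir/app.log.2"
bzip2 "$workdir/app.log.1" "$workdir/app.log.2"   # simulate rotated logs

hits=""
for f in "$workdir"/app.log.*.bz2; do
    # grep -q prints nothing and only sets the exit status.
    if bzcat "$f" | grep -q 'ERROR'; then
        hits="$hits $(basename "$f")"
    fi
done
echo "files containing ERROR:$hits"

rm -rf "$workdir"
```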

Summary

As shown, the bzip2 package can be used in a number of helpful ways to compress data and save disk space. For further information on bzip2 you can refer to the bzip2 manual page, or leave a comment below!

  1. Thank you! I am compressing log files and bzip2 provided the best compression, with times comparable to xz, lrzip and others.

  2. When I use the bzip2 command in my script to compress 50GB of data, it stops at about 80% without throwing any specific errors. When I run the script manually it runs fine.

    May I know the reason?

    The script used is: # bzip2 file.txt

    • That’s a strange one. What is actually running the script? Is it scheduled by cron or something else? My first guess would be that there’s some sort of timeout in whatever is automatically starting the script.

  3. How can I use bzip2 to compress all files in a directory and exclude one file?

    • I’d use the tar command with the -j option for bzip2 together with the --exclude option. Check out the man page for tar, as there are quite a few ways you can exclude files.
