Bzip2 compresses files to reduce the disk space they use, which has made it popular on Linux and UNIX operating systems. It has been around since the late 1990s and is still widely used today. It can be preferable to gzip because it usually produces smaller compressed files, at the cost of additional memory and processing time.
We are going to cover 10 examples of bzip2 here, showing common tasks that can be completed and just how easy it is to use.
Before starting you will need to have the bzip2 package installed. It may already be installed by default; if not, you can install it now with yum on Red Hat based distributions or with apt-get on Debian based distributions.
yum install bzip2
apt-get install bzip2
Example Bzip2 Commands
1. Compress a single file
bzip2 file.txt
This will compress file.txt and create file.txt.bz2. Note that this will remove the original file.txt file.
2. Compress multiple files at once
This will compress all files specified in the command. Note again that this will remove the original files, turning file1.txt, file2.txt and file3.txt into file1.txt.bz2, file2.txt.bz2 and file3.txt.bz2.
bzip2 file1.txt file2.txt file3.txt
To instead compress all files within a directory, see example 7 below.
3. Compress a single file and keep the original
You can instead keep the original file and create a compressed copy.
bzip2 -c file.txt > file.txt.bz2
The -c flag writes the compressed copy of file.txt to stdout, which is then redirected to file.txt.bz2, keeping the original file.txt file in place.
The version of bzip2 that I am testing with, 1.0.6, which is the latest available as of this writing, also has the -k option which keeps the original file, so alternatively you could run the command below to get the same result.
bzip2 -k file.txt
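To see both approaches side by side, here is a minimal sketch. The temporary directory, the sample text and the name copy-from-stdout.bz2 are made up for illustration and are not part of the examples above.

```shell
# Work in a throwaway directory so nothing real is touched (hypothetical sample data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello bzip2\n' > file.txt

# Approach 1: write the compressed stream to stdout and redirect it to a file.
bzip2 -c file.txt > copy-from-stdout.bz2

# Approach 2: let bzip2 name the output itself (file.txt.bz2) and keep the input.
bzip2 -k file.txt

# The original is still present alongside both compressed copies.
ls file.txt file.txt.bz2 copy-from-stdout.bz2

# Both copies decompress back to the original content.
bzip2 -dc copy-from-stdout.bz2 | cmp - file.txt && echo "stdout copy matches"
bzip2 -dc file.txt.bz2 | cmp - file.txt && echo "-k copy matches"
```

The -c form is handy when you want to choose the output name or location yourself, while -k is shorter when the default name is fine.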
4. Decompress a bzip2 compressed file
To reverse the compression process and get the original file back that you have compressed, you can use the bzip2 command itself or bunzip2 which is also part of the bzip2 package.
bzip2 -d file.txt.bz2
bunzip2 file.txt.bz2
Both of these commands will produce the same result, decompressing file.txt.bz2 to file.txt, removing the compressed file.txt.bz2 file.
Similar to example 3, it is possible to decompress a file and keep the original .bz2 file as below.
bunzip2 -c file.txt.bz2 > file.txt
bunzip2 -k file.txt.bz2
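If you want to convince yourself that a round trip really gives back the same data, a quick check along these lines should do. The file names and contents here are invented for the sketch.

```shell
# Throwaway directory with a hypothetical sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line one\nline two\n' > file.txt
cp file.txt original.txt      # reference copy to compare against later

bzip2 file.txt                # replaces file.txt with file.txt.bz2
bunzip2 -k file.txt.bz2       # restores file.txt and keeps file.txt.bz2

# cmp is silent and returns 0 when the two files are byte-for-byte equal.
cmp file.txt original.txt && echo "round trip OK"
ls file.txt file.txt.bz2
```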
5. List compression information
With the -v or --verbose flag we can see useful information regarding the compression ratio of a file, which shows us how much disk space our compression is saving. Additional ‘v’ flags can be added for more in depth information.
[[email protected] ~]# bzip2 -v linux-3.18.19.tar
linux-3.18.19.tar: 6.015:1, 1.330 bits/byte, 83.37% saved, 580761600 in, 96552670 out.
[[email protected] ~]# ls -lah
-rw-r--r--. 1 root root 554M Jul 22 10:38 linux-3.18.19.tar
-rw-r--r--. 1 root root 93M Jul 22 10:38 linux-3.18.19.tar.bz2
In this example, a tar archive of the Linux kernel source has been reduced in size by 83.37%, taking up 93MB of space rather than 554MB.
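The figures that -v prints all follow from the two byte counts at the end of the line. As a rough sketch of the arithmetic, using awk purely as a calculator (the variable names are my own):

```shell
# Byte counts copied from the bzip2 -v output above.
in_bytes=580761600
out_bytes=96552670

# "in" is a reserved word in awk, hence the in_b / out_b names.
awk -v in_b="$in_bytes" -v out_b="$out_bytes" 'BEGIN {
    printf "ratio:          %.3f:1\n", in_b / out_b
    printf "bits per byte:  %.3f\n",   8 * out_b / in_b
    printf "space saved:    %.2f%%\n", 100 * (1 - out_b / in_b)
    printf "remaining size: %.2f%%\n", 100 * out_b / in_b
}'
```

This reproduces the 6.015:1 ratio, 1.330 bits/byte and 83.37% saved reported by bzip2, and shows that the compressed file is roughly 16.63% of the original size.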
6. Adjust compression level
The level of compression applied to a file using bzip2 can be specified as a value between 1 (less compression) and 9 (best compression). Option 1 sets the block size to 100k, option 2 sets it to 200k, all the way up to option 9 which uses 900k. This is because bzip2 compresses files in blocks, and the block size affects both the compression ratio and the amount of memory needed for compression and decompression.
The example below compares -1 and -9. As shown, the -9 option only takes a few extra seconds and saves over 3 percentage points more space (83.37% versus 80.15%) when tested with the Linux kernel source.
[[email protected] ~]# time bzip2 -v -1 linux-3.18.19.tar
linux-3.18.19.tar: 5.038:1, 1.588 bits/byte, 80.15% saved, 580761600 in, 115280806 out.
real 1m18.487s
user 1m18.081s
sys 0m0.405s
[[email protected] ~]# time bzip2 -v -9 linux-3.18.19.tar
linux-3.18.19.tar: 6.015:1, 1.330 bits/byte, 83.37% saved, 580761600 in, 96552670 out.
real 1m21.730s
user 1m21.505s
sys 0m0.219s
-1 can also be specified with the flag --fast, while -9 can also be specified with the flag --best. By default bzip2 uses a compression level of -9. These aliases are primarily in place for gzip compatibility: --fast doesn’t make things significantly faster, as shown above, while --best simply invokes the default behaviour.
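To compare levels on your own data without touching the original file, something like the following sketch may help. The sample file generated with seq is only a stand-in, and real ratios depend heavily on the input.

```shell
workdir=$(mktemp -d)
# Hypothetical sample input: the numbers 1 to 200000, one per line (a bit over 1MB).
seq 1 200000 > "$workdir/sample.txt"

for level in 1 5 9; do
    # -c writes to stdout and leaves sample.txt alone; wc -c counts the compressed bytes.
    size=$(bzip2 -c "-$level" "$workdir/sample.txt" | wc -c | tr -d ' ')
    echo "level -$level: $size bytes"
done

# With no level given, bzip2 uses -9, so this should match the last line above.
default_size=$(bzip2 -c "$workdir/sample.txt" | wc -c | tr -d ' ')
echo "default:  $default_size bytes"
```

Because the input is larger than a single 100k block, the lower levels split it into more blocks, which is where the size difference comes from.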
7. Compress a directory
With the help of the tar command, we can create a tar file containing a whole directory and compress the result with bzip2. We can perform the whole lot in one step, as the tar command allows us to specify a compression method to use.
tar cjvf etc.tar.bz2 /etc/
This example creates a compressed etc.tar.bz2 file of the entire /etc/ directory. The tar flags are as follows: ‘c’ creates a new tar archive, ‘j’ specifies that we want to compress with bzip2, ‘v’ provides verbose information, and ‘f’ specifies the file to create. The resulting etc.tar.bz2 file contains all files within /etc/ compressed using bzip2.
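The same ‘j’ flag works when listing or extracting, which rounds out the workflow. Here is a small sketch using a made-up project directory rather than /etc/, so it can be run without root access.

```shell
workdir=$(mktemp -d)
# Hypothetical directory tree to archive.
mkdir -p "$workdir/project/docs"
echo "readme"  > "$workdir/project/README"
echo "chapter" > "$workdir/project/docs/chapter1.txt"

# Create: -C changes into $workdir first, so the archive stores relative paths.
tar cjf "$workdir/project.tar.bz2" -C "$workdir" project

# List the contents without extracting anything.
tar tjf "$workdir/project.tar.bz2"

# Extract into a separate directory.
mkdir "$workdir/restore"
tar xjf "$workdir/project.tar.bz2" -C "$workdir/restore"
```

Swapping ‘c’ for ‘t’ lists the archive and ‘x’ extracts it; the ‘v’ flag is left out here only to keep the output short.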
8. Integrity test
The -t or --test flag can be used to check the integrity of a compressed file.
On an intact file, the result will be listed as ok, shown below.
[[email protected] ~]# bzip2 -tv linux-3.18.19.tar.bz2
linux-3.18.19.tar.bz2: ok
I have now manually modified this file with a text editor and added a random value, introducing corruption so that it is no longer valid.
[[email protected] ~]# bzip2 -tv linux-3.18.19.tar.bz2
linux-3.18.19.tar.bz2: data integrity (CRC) error in data
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
The compressed .bz2 file makes use of a cyclic redundancy check (CRC) in order to detect errors. The CRC value can be viewed by running the bzip2 command with the -vv flag, as shown below.
[[email protected] test]# bzip2 -vv file.txt
file.txt:
block 1: crc = 0x3f1075ca, combined CRC = 0x3f1075ca, size = 8
final combined CRC = 0x3f1075ca
0.174:1, 46.000 bits/byte, -475.00% saved, 8 in, 46 out.
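When the test is run from a script, the exit status is usually more convenient than the printed message, since bzip2 -t exits with a non-zero status when the file fails the check. A minimal sketch follows; the file names are invented, and the corruption is simulated by truncating a copy rather than editing it by hand.

```shell
workdir=$(mktemp -d)
printf 'some sample data\n' > "$workdir/good.txt"
bzip2 "$workdir/good.txt"                       # produces good.txt.bz2

# Simulate damage by keeping only the first 20 bytes of a copy (hypothetical corruption).
head -c 20 "$workdir/good.txt.bz2" > "$workdir/bad.txt.bz2"

for archive in "$workdir/good.txt.bz2" "$workdir/bad.txt.bz2"; do
    # Without -v, -t prints nothing for a valid file; errors go to stderr.
    if bzip2 -t "$archive" 2>/dev/null; then
        echo "$(basename "$archive"): intact"
    else
        echo "$(basename "$archive"): failed integrity test"
    fi
done
```

A check like this could be placed before a backup is uploaded or an old copy is deleted.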
9. Concatenate multiple files
Multiple files can be concatenated into a single .bz2 file.
bzip2 -c file1.txt > files.bz2
bzip2 -c file2.txt >> files.bz2
The files.bz2 file now contains the contents of both file1.txt and file2.txt. If you decompress files.bz2 you will get a file named ‘files’ which contains the content of both .txt files. The output is similar to running ‘cat file1.txt file2.txt’. If instead you want to create a single file that contains multiple files, you can use the tar command which supports bzip2 compression, as covered above in example 7.
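The comparison with cat can be checked directly. This sketch uses two tiny made-up files and a helper file named expected.txt, which is my own addition.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'first\n'  > file1.txt
printf 'second\n' > file2.txt

# Build the concatenated archive exactly as in the example above.
bzip2 -c file1.txt >  files.bz2
bzip2 -c file2.txt >> files.bz2

# Decompress to stdout and compare with a plain concatenation of the inputs.
cat file1.txt file2.txt > expected.txt
bzip2 -dc files.bz2 | cmp - expected.txt && echo "same as cat"
```

Note that the file boundaries are lost: after decompression there is no way to tell where file1.txt ended and file2.txt began.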
10. Additional commands included with bzip2
The bzip2 package provides some very useful commands for working with compressed files, such as bzcat, bzgrep and bzless/bzmore.
As you can probably tell by the names of the commands, these are essentially the cat, grep, and less/more commands, however they work directly on compressed data. This means that you can easily view or search the contents of a compressed file without having to decompress it and then view or search it in a second step.
[[email protected] test]# bzcat test.txt.bz2
test
example
text
[[email protected] test]# bzgrep exa test.txt.bz2
example
This is especially useful when searching through or reviewing log files which have been compressed during log rotation.
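As a rough illustration of the log rotation case, the sketch below creates two made-up rotated logs and searches them while they stay compressed. The log names and contents are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical rotated logs.
printf 'INFO start\nERROR disk full\n' > app.log.1
printf 'INFO start\nINFO stop\n'       > app.log.2
bzip2 app.log.1 app.log.2

# Search a single compressed log directly.
bzgrep ERROR app.log.1.bz2

# Count matches per log by piping bzcat into ordinary grep.
for log in app.log.*.bz2; do
    count=$(bzcat "$log" | grep -c ERROR)
    echo "$log: $count matching line(s)"
done
```

Piping bzcat into other tools is a useful fallback on minimal systems where bzgrep is not installed.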
As shown, the bzip2 package can be used in a number of helpful ways to compress data and save disk space. For further information on bzip2 you can refer to the bzip2 manual page, or leave a comment below!