Compression Formats in Linux Shell Script

by Sarath Lakshman | January 2011

Taking snapshots and backups of data are regular tasks. For servers and large data storage systems, regular backups are important, and they can be automated with shell scripts. Archiving and compression are everyday operations for system administrators and regular users alike. There are various compression formats that can be used in various ways, so choosing the right one for the job yields the best results.

In this article by Sarath Lakshman, author of Linux Shell Scripting Cookbook, we will cover the following recipes:

  • Compressing with gunzip (gzip)
  • Compressing with bunzip (bzip)
  • Compressing with lzma
  • Archiving and compressing with zip
  • Heavy compression with the squashfs filesystem

 


Compressing with gunzip (gzip)

gzip is a commonly used compression format on GNU/Linux platforms. Utilities such as gzip, gunzip, and zcat are available to handle gzip-compressed files. gzip can only be applied to single files; it cannot archive directories or multiple files. Hence we create a tar archive and compress it with gzip. When multiple files are given as input, it produces several individually compressed (.gz) files. Let's see how to operate with gzip.
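A quick illustration of that multi-file behavior (the file names f1 and f2 are made up for this demo):

```shell
# Given multiple inputs, gzip compresses each file separately,
# replacing every input with its own .gz file.
echo "first"  > f1
echo "second" > f2
gzip f1 f2
ls f1.gz f2.gz    # both compressed files now exist; f1 and f2 are gone
rm -f f1.gz f2.gz # clean up the demo files
```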

How to do it...

In order to compress a file with gzip use the following command:

$ gzip filename
$ ls
filename.gz

This will remove the original file and produce a compressed file called filename.gz.

Extract a gzip compressed file as follows:

$ gunzip filename.gz

It will remove filename.gz and produce an uncompressed file called filename.

In order to list out the properties of a compressed file use:

$ gzip -l test.txt.gz
         compressed        uncompressed  ratio uncompressed_name
                 35                   6 -33.3% test.txt

The gzip command can read a file from stdin and also write a compressed file into stdout.

Read from stdin and write to stdout as follows:

$ cat file | gzip -c > file.gz

The -c option is used to specify output to stdout.

We can specify the compression level for gzip. Use --fast or the --best option to provide low and high compression ratios, respectively.
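As a rough sketch of the trade-off, the following loop compresses the same input at three levels and prints the resulting sizes (the sample file is generated here purely for illustration):

```shell
# Compare gzip compression levels on the same input.
seq 1 100000 > sample            # throwaway test data
for level in 1 6 9; do
  gzip -c -"$level" sample > "sample.$level.gz"
  printf 'level %s: %s bytes\n' "$level" "$(wc -c < "sample.$level.gz")"
done
rm -f sample sample.*.gz         # clean up
```

Higher levels generally produce equal or smaller files, at the cost of more CPU time.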

There's more...

The gzip command is often used with other commands. It also has advanced options to specify the compression ratio. Let's see how to work with these features.

Gzip with tarball

We usually use gzip with tarballs. A tarball can be compressed by using the -z option passed to the tar command while archiving and extracting.

You can create gzipped tarballs using the following methods:

  • Method - 1
    $ tar -czvvf archive.tar.gz [FILES]

    Or:

    $ tar -cavvf archive.tar.gz [FILES]

    The -a option specifies that the compression format should automatically be detected from the extension.

  • Method - 2

    First, create a tarball:

    $ tar -cvvf archive.tar [FILES]

    Compress it after tarballing as follows:

    $ gzip archive.tar

If many files (a few hundred or more) are to be archived in a tarball and compressed, we use Method - 2 with a few changes. The issue with passing many files as command arguments to tar is that the system accepts only a limited amount of argument data on the command line. To work around this, we can create the tar file by adding files one by one using a loop with the append option (-r) as follows:

FILE_LIST="file1 file2 file3 file4 file5"
for f in $FILE_LIST; do
    tar -rvf archive.tar "$f"
done
gzip archive.tar
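With GNU tar there is an alternative that avoids the per-file append loop entirely: the -T (--files-from) option reads the member list from a file, sidestepping the command-line length limit. A minimal sketch (file and directory names are invented for the demo):

```shell
# Create demo files and a list of their names.
mkdir -p demo && cd demo
echo a > file1; echo b > file2; echo c > file3
printf '%s\n' file1 file2 file3 > filelist.txt

# tar reads the member names from filelist.txt instead of the command line.
tar -czvf archive.tar.gz -T filelist.txt

cd .. && rm -rf demo   # clean up the demo directory
```

In practice the list is often produced by find, for example `find . -name '*.log' > filelist.txt`.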

In order to extract a gzipped tarball, use the following:

$ tar -xzvvf archive.tar.gz -C extract_directory

In this command:

  • -x is used for extraction
  • -z specifies gzip decompression
  • -C specifies the directory to which the files are to be extracted

Or:

$ tar -xavvf archive.tar.gz -C extract_directory

In the above command, the -a option is used to detect the compression format automatically.

zcat – reading gzipped files without extracting

zcat is a command that dumps the contents of a .gz file to stdout without extracting it on disk. The .gz file itself remains unchanged:

$ ls
test.gz

$ zcat test.gz
A test file
# file test contains a line "A test file"

$ ls
test.gz
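This makes zcat handy in pipelines, for example searching compressed logs without ever writing an uncompressed copy to disk (app.log is a made-up example file):

```shell
# Build a small compressed log for the demo.
printf 'ok\nERROR: disk full\nok\n' > app.log
gzip app.log                       # leaves only app.log.gz

# Count ERROR lines straight from the compressed file.
zcat app.log.gz | grep -c ERROR    # prints 1

rm -f app.log.gz                   # clean up
```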

Compression ratio

We can specify the compression ratio, which is available in the range 1 to 9, where:

  • 1 is the lowest, but fastest
  • 9 is the best, but slowest

For example:

$ gzip -9 test.img

This will compress the file to the maximum.

Compressing with bunzip (bzip)

bzip2 is another compression utility, very similar to gzip. bzip2 typically produces smaller (better compressed) files than gzip. It comes with all Linux distributions. Let's see how to use bzip2.

How to do it...

In order to compress with bzip2 use:

$ bzip2 filename
$ ls
filename.bz2

This will remove the file and produce a compressed file called filename.bz2.

Extract a bzipped file as follows:

$ bunzip2 filename.bz2

It will remove filename.bz2 and produce an uncompressed version of filename.

bzip2 can read a file from stdin and also write a compressed file into stdout.

In order to read from stdin and write to stdout use:

$ cat file | bzip2 -c > file.bz2

-c is used to specify output to stdout.

We usually use bzip2 with tarballs. A tarball can be compressed by using the -j option passed to the tar command while archiving and extracting.

Creating a bzipped tarball can be done by using the following methods:

  • Method - 1
    $ tar -cjvvf archive.tar.bz2 [FILES]

    Or:

    $ tar -cavvf archive.tar.bz2 [FILES]

    The -a option specifies to automatically detect compression format from the extension.

  • Method - 2

    First create the tarball:

    $ tar -cvvf archive.tar [FILES]

    Compress it after tarballing:

    $ bzip2 archive.tar

If we need to add hundreds of files to the archive, the above commands may fail. To fix that issue, use a loop to append files to the archive one by one using the -r option.

Extract a bzipped tarball as follows:

$ tar -xjvvf archive.tar.bz2 -C extract_directory

In this command:

  • -x is used for extraction
  • -j is for bzip2 specification
  • -C is for specifying the directory to which the files are to be extracted

Or, you can use the following command:

$ tar -xavvf archive.tar.bz2 -C extract_directory

-a will automatically detect the compression format.

There's more...

bzip2 has several additional options to carry out different functions. Let's go through a few of them.

Keeping input files without removing them

While using bzip2 or bunzip2, it will remove the input file and produce a compressed output file. But we can prevent it from removing input files by using the -k option.

For example:

$ bunzip2 -k test.bz2
$ ls
test test.bz2

Compression ratio

We can specify the compression ratio, which is available in the range of 1 to 9 (where 1 is the least compression, but fast, and 9 is the highest possible compression but much slower).

For example:

$ bzip2 -9 test.img

This command provides maximum compression.

 


Compressing with lzma

lzma is comparatively new when compared to gzip or bzip2. lzma offers better compression rates than gzip or bzip2. As lzma is not preinstalled on most Linux distros, you may need to install it using the package manager.

How to do it...

In order to compress with lzma use the following command:

$ lzma filename
$ ls
filename.lzma

This will remove the file and produce a compressed file called filename.lzma.

To extract an lzma file use:

$ unlzma filename.lzma

This will remove filename.lzma and produce an uncompressed version of the file.

The lzma command can also read a file from stdin and write the compressed file to stdout.

In order to read from stdin and write to stdout use:

$ cat file | lzma -c > file.lzma

-c is used to specify output to stdout.

We usually use lzma with tarballs. A tarball can be compressed by using the --lzma option passed to the tar command while archiving and extracting.

There are two methods to create an lzma tarball:

  • Method - 1
    $ tar --lzma -cvvf archive.tar.lzma [FILES]

    Or:

    $ tar -cavvf archive.tar.lzma [FILES]

    The -a option specifies to automatically detect the compression format from the extension.

  • Method - 2

    First, create the tarball:

    $ tar -cvvf archive.tar [FILES]

    Compress it after tarballing:

    $ lzma archive.tar

If we need to add hundreds of files to the archive, the above commands may fail. To fix that issue, use a loop to append files to the archive one by one using the -r option.

There's more...

Let's go through the additional options associated with the lzma utilities.

Extracting an lzma tarball

In order to extract a tarball compressed with lzma compression to a specified directory, use:

$ tar --lzma -xvvf archive.tar.lzma -C extract_directory

In this command, -x is used for extraction, and --lzma specifies that lzma should be used to decompress the archive.

Or, we could also use:

$ tar -xavvf archive.tar.lzma -C extract_directory

The -a option specifies to automatically detect the compression format from the extension.

Keeping input files without removing them

While using lzma or unlzma, it will remove the input file and produce an output file. But we can keep the input files by using the -k option. For example:

$ lzma -k test.bz2
$ ls
test.bz2 test.bz2.lzma

Compression ratio

We can specify the compression ratio, which is available in the range of 1 to 9 (where 1 is the least compression, but fast, and 9 is the highest possible compression but much slower).

For example:

$ lzma -9 test.img

This command compresses the file to the maximum.

Archiving and compressing with zip

ZIP is a popular compression format used on many platforms. It isn't as commonly used as gzip or bzip2 on Linux platforms, but files from the Internet are often saved in this format.

How to do it...

In order to archive with ZIP, the following syntax is used:

$ zip archive_name.zip [SOURCE FILES/DIRS]

For example:

$ zip file.zip file

Here, the file.zip file will be produced.

Archive directories and files recursively as follows:

$ zip -r archive.zip folder1 file2

In this command, -r is used for specifying recursive.

Unlike lzma, gzip, or bzip2, zip does not remove the source files after archiving. In that respect zip resembles tar, but unlike tar, zip also compresses the archived files.

In order to extract files and folders in a ZIP file, use:

$ unzip file.zip

It will extract the files without removing file.zip (unlike unlzma or gunzip).

In order to update files in the archive with newer files in the filesystem, use the -u flag:

$ zip file.zip -u newfile

Delete a file from a zipped archive by using -d as follows:

$ zip -d arc.zip file.txt

In order to list the files in an archive use:

$ unzip -l archive.zip
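Putting these flags together, a minimal incremental-backup sketch might look like this (the docs directory and file names are invented for the example; assumes the zip and unzip utilities are installed):

```shell
# Initial archive of a directory.
mkdir -p docs
echo "v1" > docs/notes.txt
zip -q -r backup.zip docs

# Later, after a file changes, -u re-adds only files newer
# than the copies already in the archive.
echo "v2" > docs/notes.txt
zip -q -r -u backup.zip docs

unzip -l backup.zip        # verify the archive contents
rm -rf docs backup.zip     # clean up
```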

squashfs – the heavy compression filesystem

squashfs is a heavily compressed read-only filesystem that can squeeze 2 to 3 GB of data into a roughly 700 MB file. Have you ever wondered how Linux Live CDs work? When a Live CD boots, it loads a complete Linux environment from a read-only compressed filesystem called squashfs, which keeps the root filesystem in a single compressed file. That file can be loopback mounted so its contents become accessible; when a process needs a file, only the relevant portion is decompressed and loaded into RAM. Knowledge of squashfs is useful when building a custom live OS, or whenever files must be kept heavily compressed yet remain accessible without extracting everything. Extracting a large compressed archive takes a long time, whereas a loopback-mounted squashfs file is fast to use because only the requested portions of the compressed data are decompressed on demand; with regular decompression, all the data must be decompressed first. Let's see how we can use squashfs.

Getting ready

If you have an Ubuntu CD, you can find a .squashfs file at CDROM ROOT/casper/filesystem.squashfs. squashfs internally uses compression algorithms such as gzip and lzma. squashfs support is available in all recent Linux distributions; however, in order to create squashfs files, the additional package squashfs-tools needs to be installed from the package manager.

How to do it...

In order to create a squashfs file by adding source directories and files, use:

$ mksquashfs SOURCES compressedfs.squashfs

SOURCES can be wildcards, file paths, or directory paths.

For example:

$ sudo mksquashfs /etc test.squashfs
Parallel mksquashfs: Using 2 processors
Creating 4.0 filesystem on test.squashfs, block size 131072.
[=======================================] 1867/1867 100%
(More details are printed to the terminal; they are omitted here to save space.)

In order to mount the squashfs file to a mount point, use loopback mounting as follows:

# mkdir /mnt/squash
# mount -o loop compressedfs.squashfs /mnt/squash

You can copy contents by accessing /mnt/squash.
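The whole round trip can be sketched as follows (directory and file names are invented; requires squashfs-tools, and since mounting needs root, the listing below uses unsquashfs instead):

```shell
# Build a squashfs image from a scratch directory.
mkdir -p tree
echo "hello" > tree/greeting.txt
mksquashfs tree image.squashfs -noappend

# Inspect the image contents without mounting it.
unsquashfs -l image.squashfs

# Mounting it (root required) would look like:
# mkdir -p /mnt/squash
# mount -o loop image.squashfs /mnt/squash

rm -rf tree image.squashfs   # clean up
```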

There's more...

The squashfs file system can be created by specifying additional parameters. Let's go through the additional options.

Excluding files while creating a squashfs file

While creating a squashfs file, we can exclude a list of files or a file pattern specified using wildcards.

Exclude a list of files specified as command-line arguments by using the -e option. For example:

$ sudo mksquashfs /etc test.squashfs -e /etc/passwd /etc/shadow

The -e option is used to exclude the passwd and shadow files.

It is also possible to specify a list of files to exclude, given in a file, with -ef as follows:

$ cat excludelist
/etc/passwd
/etc/shadow
$ sudo mksquashfs /etc test.squashfs -ef excludelist

If we want to use wildcards in exclude lists, pass -wildcards as an additional argument.

Summary

This article covered several commands for data backup, archiving, and compression, along with practical examples of their usage. It introduced commands such as tar, gzip, bzip2, lzma, zip, and mksquashfs.



About the Author :


Sarath Lakshman

Sarath Lakshman is a 23 year old who was bitten by the Linux bug during his teenage years. He is a software engineer working in the ZCloud engineering group at Zynga, India. He is a life hacker who loves to explore innovations. He is a GNU/Linux enthusiast and an advocate of free and open source software. He spends most of his time hacking with computers and having fun with his great friends. Sarath is well known as the developer of SLYNUX (2005), a user friendly GNU/Linux distribution for Linux newbies. The free and open source software projects he has contributed to are PiTiVi Video editor, SLYNUX GNU/Linux distro, Swathantra Malayalam Computing, School-Admin, Istanbul, and the Pardus Project. He has authored many articles for the Linux For You magazine on various domains of FOSS technologies. He has contributed to several open source projects through multiple Google Summer of Code programs. Currently, he explores his passion for scalable distributed systems in his spare time. Sarath can be reached via his website http://www.sarathlakshman.com.
