Python gzip Module

Overview

This article covers the Python gzip module in detail and shows how to use it to compress and decompress files and in-memory data. The module provides the GzipFile class along with the convenience functions open(), compress(), and decompress(). GzipFile reads and writes gzip-format files, compressing or decompressing the data automatically so that it looks like an ordinary file object.

Python gzip Module

The Python gzip module provides a simple interface for compressing and decompressing files, just as the gzip and gunzip programs do. gzip itself is an application for file compression and decompression that is part of the GNU project. The data compression behind the gzip module is provided by the zlib module.

It is one of the easiest and most effective ways to compress and decompress files in Python. The GzipFile class reads and writes gzip-format files, and the convenience functions open(), compress(), and decompress() build on it, so data is compressed or decompressed automatically and the result looks like an ordinary file object.

QuickNote: Additional file formats that the gzip and gunzip programs can decompress, such as those produced by compress and pack, are not supported by the Python gzip module.
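To make the "ordinary file object" idea concrete, here is a minimal sketch that wraps an in-memory buffer in GzipFile. The io.BytesIO buffer is an assumption chosen so the example needs no file on disk; the same write and read calls apply to a real file.

```python
import gzip
import io

# An in-memory buffer stands in for a file on disk (illustrative choice).
buffer = io.BytesIO()

# GzipFile wraps any binary file object; writes are compressed on the fly.
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(b"first line\n")
    gz.write(b"second line\n")

# Rewind the buffer and read it back; reads are decompressed on the fly.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
    lines = gz.readlines()

print(lines)
```

Closing the GzipFile does not close a file object that was passed in through fileobj, which is why the buffer can be rewound and reused here.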

Need for Python gzip Module

Before looking at the different ways to use the Python gzip module, it is worth asking where the need for it arises.

The answer is data compression. Vast amounts of data are generated each minute, so it pays to store and send that data in fewer bits. Data compression is the process of re-encoding and re-arranging data so that it takes fewer bits than the original. A compression algorithm looks for an efficient way to do this, for example by representing an original string as a shorter string of bits with the help of a reference dictionary.

Depending on the content, data compression can cut the size of a text file by 50% or more. Large files are commonly transferred over the internet in compressed formats such as ZIP, RAR, 7z, or MP3. Compression shortens the time it takes to transfer a file by lowering its physical size, and the file takes up less storage space. Its advantages include reduced storage hardware, data transfer time, and communication bandwidth, all of which save money. The main downside is that compressing data requires extra processing resources, which is why compression vendors place a premium on speed and resource efficiency to limit the impact of intensive compression jobs.

Code Output and Explanation

Below is the code, its output, and an elaborative explanation to understand the Python gzip module.

Code:

Output:

Explanation: As seen above, a gzipClass class can hold various methods built on the Python gzip module. Each method suits a different scenario: compressing data and writing it to a file, reading data from a gzip file, getting the size of a gzip file, compressing an input stream of data, and decompressing an input stream of compressed data.
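A minimal sketch of a helper class covering the five scenarios listed above is given here. The class name GzipHelper, its method names, and the file name sample.txt.gz are hypothetical and chosen only for illustration; only gzip.open() and gzip.GzipFile come from the module itself.

```python
import gzip
import io
import os
import tempfile


class GzipHelper:
    """Hypothetical helper class; names are illustrative only."""

    def __init__(self, path):
        self.path = path

    def write(self, data):
        # Compress the bytes and write them to the gzip file.
        with gzip.open(self.path, "wb") as f:
            f.write(data)

    def read(self):
        # Read the gzip file and return the decompressed bytes.
        with gzip.open(self.path, "rb") as f:
            return f.read()

    def size(self):
        # Size of the compressed file on disk, in bytes.
        return os.path.getsize(self.path)

    @staticmethod
    def compress_stream(source, chunk_size=4096):
        # Compress a binary input stream chunk by chunk.
        out = io.BytesIO()
        with gzip.GzipFile(fileobj=out, mode="wb") as gz:
            for chunk in iter(lambda: source.read(chunk_size), b""):
                gz.write(chunk)
        return out.getvalue()

    @staticmethod
    def decompress_stream(source):
        # Decompress a binary input stream of gzip data.
        with gzip.GzipFile(fileobj=source, mode="rb") as gz:
            return gz.read()


payload = b"hello gzip\n" * 100

with tempfile.TemporaryDirectory() as tmp:
    helper = GzipHelper(os.path.join(tmp, "sample.txt.gz"))
    helper.write(payload)
    restored = helper.read()
    compressed_size = helper.size()

stream_bytes = GzipHelper.compress_stream(io.BytesIO(payload))
stream_restored = GzipHelper.decompress_stream(io.BytesIO(stream_bytes))

print(len(payload), compressed_size, restored == payload)
```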

gzip.compress() Function

Let us discuss the gzip.compress() function of the Python gzip module with code, its output, and a detailed explanation.

Intro: Compresses data to reduce it from its original size. The return value is a bytes object. The default compression level is 9.

Syntax: gzip.compress(data, compresslevel=9, *, mtime=None)

Parameters:

The parameters of the gzip.compress() function are:

  • data – The data to be compressed, given as bytes
  • compresslevel – An integer from 0 to 9 that controls the level of compression. The default is 9. Level 1 is the fastest and produces the least compression, and 9 is the slowest and produces the most compression. Level 0 means no compression.
  • mtime – An optional numeric timestamp that is written to the last modification time field in the stream while compressing. It should only be provided in compression mode. If it is None, the current time is used.
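The effect of these parameters can be sketched as follows. The sample text and the levels compared (0, 1, and 9) are illustrative choices, and mtime is fixed to 0 here only so that repeated calls produce identical bytes.

```python
import gzip

# Repetitive sample data (illustrative) so that compression has something to gain.
data = b"The quick brown fox jumps over the lazy dog. " * 200

# Compressed size at a few levels; mtime is pinned so only the level varies.
sizes = {
    level: len(gzip.compress(data, compresslevel=level, mtime=0))
    for level in (0, 1, 9)
}

# With a fixed mtime the output is reproducible from call to call.
first = gzip.compress(data, mtime=0)
second = gzip.compress(data, mtime=0)

print("original size:", len(data))
for level, size in sizes.items():
    print("level", level, "->", size, "bytes")
print("reproducible:", first == second)
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input because the gzip header and trailer are still added.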

Code:

Output:

Explanation: As seen above, we took an original string and calculated its length, then compressed it with the Python gzip.compress() function. The compressed result turns out longer than the original. This is not because the data is encrypted, since compression is not encryption. The gzip format adds a fixed header and trailer around the compressed data, and for a very short input this overhead outweighs the savings. The same effect, with a smaller overhead, can be observed with the zlib.compress() function.
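The size behaviour described above can be reproduced with a short sketch. The two sample inputs are assumptions made for illustration: one very short string, and the same string repeated many times.

```python
import gzip
import zlib

short = b"Hello, gzip!"
long_text = b"Hello, gzip! " * 500

short_gz = gzip.compress(short)
long_gz = gzip.compress(long_text)

# zlib wraps the same kind of compressed data in a smaller header and trailer.
short_zlib = zlib.compress(short, 9)

print("short:", len(short), "-> gzip", len(short_gz), "/ zlib", len(short_zlib))
print("long: ", len(long_text), "-> gzip", len(long_gz))
```

For the short input the gzip output is larger than the input, while the long, repetitive input shrinks considerably.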

gzip.decompress() Function

Let us discuss the gzip.decompress() function of the Python gzip module with code, its output, and a detailed explanation.

Intro: Decompresses the data and returns a bytes object containing the decompressed data. The Python gzip.decompress() function can also decompress multi-member gzip data (that is, multiple gzip blocks concatenated together).

Syntax: gzip.decompress(data)

Parameters: The parameters of the gzip.decompress() function are:

  • data – The gzip-compressed data to be decompressed, given as bytes

Code:

Output:

Explanation: As seen above, we took an original string and calculated its length, then compressed it with the Python gzip.compress() function. We then decompressed the compressed string with the Python gzip.decompress() function and calculated its length, which came out the same as the original. The compressed form is longer than this short input because of the gzip header and trailer, not because the data is encrypted.
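A minimal sketch of the round trip, together with the multi-member case mentioned in the intro, might look like this. The sample byte strings are illustrative.

```python
import gzip

original = b"Python gzip round trip"

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

print(len(original), len(compressed), len(restored))

# Multi-member data: two independent gzip blocks concatenated together.
part_a = gzip.compress(b"first member, ")
part_b = gzip.compress(b"second member")
combined = gzip.decompress(part_a + part_b)

print(combined)
```

gzip.decompress() returns the contents of both members joined in order.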

Command Line Interface

The Python gzip module also offers a simple command line interface for compressing or decompressing files. The input files are kept as they are after the gzip module has run.

Below are the command line options you can use with the Python gzip module.

Command           Function
-d, --decompress  Decompress the given file
-h, --help        Show the help message
file              The file to process; if no file is given, input is read from sys.stdin
--best            Use the slowest compression method (best compression)
--fast            Use the fastest compression method (less compression)

These options are passed on the command line when the module is run with python -m gzip.
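As a sketch of the command line interface in action, the snippet below drives it from Python through subprocess so that it stays runnable on any platform. The file name notes.txt and the temporary directory are assumptions for illustration; the equivalent shell commands are given in the comments.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "notes.txt")
    with open(source, "wb") as f:
        f.write(b"command line demo\n" * 50)

    # Equivalent to running: python -m gzip --best notes.txt
    subprocess.run([sys.executable, "-m", "gzip", "--best", source], check=True)
    created_gz = os.path.exists(source + ".gz")
    source_kept = os.path.exists(source)  # the input file is left in place

    # Remove the original, then restore it with: python -m gzip -d notes.txt.gz
    os.remove(source)
    subprocess.run([sys.executable, "-m", "gzip", "-d", source + ".gz"], check=True)
    with open(source, "rb") as f:
        roundtrip = f.read()

print(created_gz, source_kept, len(roundtrip))
```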

Examples of Usage

A few examples covering various scenarios are given below:

Use Case: How to GZIP compress a binary string:
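One possible way to handle this use case is sketched here. The sample bytes and the sample text are illustrative; the check on the first two bytes relies on the gzip magic number 0x1f 0x8b.

```python
import gzip

# A binary string (bytes) can be passed to gzip.compress() directly.
binary_string = b"\x00\x01\x02 raw bytes to compress \xff\xfe" * 20
compressed = gzip.compress(binary_string)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
is_gzip = compressed[:2] == b"\x1f\x8b"

# A text string has to be encoded to bytes before it can be compressed.
text_compressed = gzip.compress("plain text".encode("utf-8"))

print(len(binary_string), len(compressed), is_gzip)
```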

Use Case: How to read a compressed file.
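Reading can be sketched with gzip.open() in text mode. The file example.txt.gz is hypothetical and is created inside a temporary directory first, so that the example is self-contained.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt.gz")

    # Setup only: create a small gzip file to read from.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("line one\nline two\n")

    # "rt" opens the file in text mode and decompresses while reading.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)
```

Use mode "rb" instead of "rt" to get the decompressed content as bytes.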

Use Case: How to GZIP compress an existing file:
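A common pattern for this use case copies the existing file into a gzip file with shutil.copyfileobj(). In the sketch below, report.txt is a hypothetical file created on the spot to play the role of the existing file.

```python
import gzip
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "report.txt")
    target_path = source_path + ".gz"

    # Setup only: this stands in for a file that already exists.
    with open(source_path, "wb") as f:
        f.write(b"existing file contents\n" * 100)

    # Stream the existing file into a gzip file without loading it all at once.
    with open(source_path, "rb") as f_in, gzip.open(target_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    original_size = os.path.getsize(source_path)
    compressed_size = os.path.getsize(target_path)
    with gzip.open(target_path, "rb") as f:
        restored = f.read()

print(original_size, compressed_size)
```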

Use Case: How to create a compressed GZIP file:
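Creating a new gzip file directly from data in memory can be sketched as follows. The file name file.txt.gz and the content are illustrative assumptions.

```python
import gzip
import os
import tempfile

content = b"Lots of content here\n" * 10

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt.gz")

    # "wb" creates the file and compresses everything written to it.
    with gzip.open(path, "wb") as f:
        f.write(content)

    exists = os.path.exists(path)

    # Read the raw file (without decompressing) to inspect the gzip magic bytes.
    with open(path, "rb") as raw:
        magic = raw.read(2)

print(exists, magic)
```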

QuickNote: The zlib module is the underlying data compression module that the gzip module needs in order to support the gzip file format.

FAQs

Below are a few frequently asked questions related to the Python gzip module.

Q: What is the compresslevel parameter in the gzip.compress() function?

A: The compresslevel argument is an integer from 0 to 9 that controls the level of compression. The default is 9. Level 1 is the fastest and produces the least compression, and 9 is the slowest and produces the most compression. Level 0 means no compression.

Q: What is data compression?

A: Data compression is the process of re-encoding and re-arranging data so that it takes fewer bits than the original, which matters given the vast amounts of data generated each minute. A compression algorithm looks for an efficient way to do this, for example by representing an original string as a shorter string of bits with the help of a reference dictionary.

Q: Are all file formats supported by the gzip module for compression?

A: No. Additional file formats that the gzip and gunzip programs can decompress, such as those produced by compress and pack, are not supported by the Python gzip module.

Conclusion

  • Data compression is the process of re-encoding and re-arranging data so that it takes fewer bits than the original. A compression algorithm looks for an efficient way to minimize the size of the data.
  • The Python gzip.decompress() function decompresses data and returns a bytes object containing the decompressed data. It can also decompress multi-member gzip data (that is, multiple gzip blocks concatenated together).
  • The Python gzip.compress() function compresses data to reduce it from its original size. The return value is a bytes object. The default compression level is 9.
