glob() in Python
Overview
The term “glob” refers to matching file paths against patterns that follow UNIX shell expansion rules.
In Python, the glob module is used in the same way to find the files whose paths match a given pattern. The pattern can be anything from a file extension to a file-name prefix to any other fragment that two or more file names share.
To follow this tutorial, you should be familiar with the foundations of Python programming, the Python os module, regular expression syntax, and a few UNIX or Bash commands.
Introduction to glob Module in Python
The glob module in Python does not require separate installations and comes with every default Python installation.
Although glob is a module that comes with the default Python installation, it still requires a separate import statement.
After the import, we have to refer to every function and method in this module with the prefix glob.
Let's see how to use glob in Python.
This basic example makes use of a function inside the glob module that is itself called glob().
The snippet prints all the files within the sample_files folder inside the project directory.
In this example, the sample_files folder contains two text files named random_text_1.txt and random_text_2.txt. The variable files holds the return value of glob.glob(), which is a list of the matching paths.
We will see more about the glob() function in Python later.
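Since the original snippet is not shown here, the following is a minimal sketch of what such a call might look like. It assumes a throwaway sample_files folder created in a temporary directory, so the folder location and the helper variable names are illustrative only.

```python
import glob
import os
import tempfile

# Build a throwaway sample_files folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, "sample_files")
    os.mkdir(folder)
    for name in ("random_text_1.txt", "random_text_2.txt"):
        open(os.path.join(folder, name), "w").close()

    # glob.glob() returns a list of the paths that match the pattern.
    files = glob.glob(os.path.join(folder, "*"))
    names = sorted(os.path.basename(path) for path in files)

print(names)
```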
The output would look like this.
Output:
Pattern Matching Functions
Python provides several functions for listing the files that match a particular pattern. They return the files in the specified directory that match the given pattern, in arbitrary order.
Let's look at some of these functions.
- fnmatch()
- scandir()
The glob module performs its pattern matching by using the two functions in the above list, fnmatch.fnmatch() and os.scandir(), together, and not by calling a subshell. The matching file names come back in arbitrary order. Note that the glob module treats files whose names start with a dot (.) as a special case: they are matched only by patterns that also start with a dot, which is not the case for the fnmatch.fnmatch() function.
Let's look at each of these functions.
First is the fnmatch module, which matches Unix shell-style wildcards. The fnmatch() function compares a single file name to a pattern and returns True if they match and False otherwise. When the operating system makes use of a case-sensitive file system, the comparison is case-sensitive.
In shell-style wildcards, the following special characters are used:
- ‘*’: matches everything
- ‘?’: matches any single character
- ‘[seq]’: matches any character in the sequence
- ‘[!seq]’: matches any character not in the sequence
The function fnmatch.fnmatch(filename, pattern) checks whether the filename string matches the pattern string and returns a boolean result. If the operating system is case-insensitive, both parameters are normalized to a single case before the comparison is made. Example: a script that finds all files whose names end with ".py".
Let's see how to use it.
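A small sketch of fnmatch.fnmatch() in use. The list of names is hypothetical; in a real script it might come from os.listdir() on the current working directory.

```python
import fnmatch

# Hypothetical file names; in practice they might come from os.listdir(".").
names = ["main.py", "notes.txt", "utils.py", "README.md", "a1.py"]

# Keep only the names that match the shell-style pattern.
python_files = [name for name in names if fnmatch.fnmatch(name, "*.py")]
print(python_files)

# '?' matches exactly one character; '[seq]' matches one character from the set.
single_char = fnmatch.fnmatch("a1.py", "a?.py")
in_range = fnmatch.fnmatch("a1.py", "a[0-9].py")
not_in_range = fnmatch.fnmatch("a1.py", "a[!0-9].py")
print(single_char, in_range, not_in_range)
```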
Output:
The output lists all the files with the .py extension in the current working directory.
Now let's look at the scandir() function in the os module.
To acquire an iterator of os.DirEntry objects matching the entries in the directory indicated by the specified path, we use Python's os.scandir() function.
Special entries "." and ".." are excluded, and the entries are yielded in an arbitrary order.
Let's see how to use it.
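The sketch below shows one way to walk a directory with os.scandir(). It assumes a temporary directory with one file and one subfolder (both names are made up), and it uses a with statement, which closes the iterator automatically instead of calling close() by hand.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical contents: one subdirectory and one file.
    os.mkdir(os.path.join(base, "docs"))
    open(os.path.join(base, "notes.txt"), "w").close()

    files, dirs = [], []
    # The with statement closes the iterator even if an error occurs.
    with os.scandir(base) as entries:
        for entry in entries:
            if entry.is_file():
                files.append(entry.name)
            elif entry.is_dir():
                dirs.append(entry.name)

print("files:", files)
print("dirs:", dirs)
```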
Output:
Explanation: First, we call os.scandir() on the given path to obtain an iterator of os.DirEntry objects, one for each entry in the directory. We then loop over the iterator and use the entry.is_file() and entry.is_dir() methods to determine whether each entry is a file or a directory. Finally, we call the iterator's close() method to close it and release the resources it holds. The close() method is also invoked automatically when the iterator is exhausted or garbage collected, or when an error occurs while iterating, but it is advisable to call it explicitly or to use a with statement. Garbage collection is the mechanism Python uses to release and recover the memory of objects (built-in types or class instances) that are no longer needed.
- path.expandvars()
- path.expanduser()
The two functions in the above list, os.path.expandvars() and os.path.expanduser(), can be used to expand shell variables and tildes before a path is used for filename pattern matching. Tilde expansion is described in the GNU Bash reference: if a word starts with an unquoted tilde character ("~"), all characters up to the first unquoted slash (or all characters, if there isn't an unquoted slash) are regarded as a tilde-prefix. If none of the characters in the tilde-prefix are quoted, the characters after the tilde are treated as a possible user name.
Let's take a look at the os.path.expandvars() function.
Python uses the os.path.expandvars() method to expand the environment variables in the specified path. Substrings of the form $name or ${name} are replaced by the value of the environment variable name. Malformed variable names and references to variables that do not exist are left unchanged.
Let's see how we can use it.
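A hedged sketch of os.path.expandvars(). To keep it independent of the machine it runs on, it temporarily sets placeholder values for HOMEPATH, USERNAME, and TEMP with unittest.mock.patch.dict; those values and the example paths are assumptions made purely for illustration.

```python
import os
from unittest import mock

# Placeholder values so the sketch does not depend on the real environment.
fake_env = {"HOMEPATH": "/users/demo", "USERNAME": "demo", "TEMP": "/tmp/demo"}

with mock.patch.dict(os.environ, fake_env):
    expanded = [
        os.path.expandvars("$HOMEPATH/Documents"),
        os.path.expandvars("/users/${USERNAME}/file.txt"),
        os.path.expandvars("$TEMP/cache"),
        # A reference to a variable that does not exist is left unchanged.
        os.path.expandvars("$NO_SUCH_VARIABLE_12345/x"),
    ]

for path in expanded:
    print(path)
```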
Output:
Explanation: This Python program demonstrates how to use the os.path.expandvars() method. First, we import the os.path module. On Windows, %name% expansions are supported in addition to $name and ${name}.
The environment variables are then expanded with the corresponding values in the indicated paths. And then we print the paths with extended environment variables.
The environment variables "HOMEPATH," "USERNAME," and "TEMP" were replaced in the aforementioned example by the os.path.expandvars() method with their associated values.
Now let's look at the os.path.expanduser() function.
The Python function os.path.expanduser() is used to expand an initial ~ or ~user component in the specified path to that user's home directory.
Let's see how.
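A minimal sketch of os.path.expanduser(). The home directory below is a made-up value that is patched into the environment for the duration of the example (HOME on Unix-like systems, USERPROFILE on Windows), so the result does not depend on the real user account.

```python
import os
from unittest import mock

fake_home = "/home/demo_user"  # hypothetical home directory

# HOME is consulted on Unix-like systems, USERPROFILE on Windows.
with mock.patch.dict(os.environ, {"HOME": fake_home, "USERPROFILE": fake_home}):
    expanded = os.path.expanduser("~/Documents/file.txt")
    # A path that does not start with '~' is returned unchanged.
    untouched = os.path.expanduser("/var/log/app.log")

print(expanded)
print(untouched)
```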
Output:
Explanation: This Python program demonstrates how to use the os.path.expanduser() method. Importing the os.path module comes first. We then use os.path.expanduser() to expand the initial ~ component in the provided path and print the expanded path. Next, the value of the HOME environment variable is changed and the same path is expanded again, which shows that the result follows HOME. Finally, we expand a path that begins with a ~user component. In that case the user's home directory is looked up directly in the password directory, and the expanded path is printed.
Rules of Pattern
We cannot use just any pattern for filename pattern matching. When defining a pattern for the filename matching functions in the glob module, we must adhere to a specific set of rules.
In this section, we'll go over the rules to follow when creating a pattern. Since they are not the main topic of this tutorial, we will only touch on them briefly.
In the pattern matching functions of the glob module, we define the following set of rules for the pattern:
- When pattern matching, we must adhere to the entire set of accepted UNIX path expansion rules.
- We cannot define any ambiguous path inside the pattern; the path we define inside the pattern must be either absolute or relative.
- Only two wildcards, or special characters, are permitted inside the pattern: "*" and "?". Character sets and ranges are expressed with [].
- The rules of the pattern apply to individual filename segments: a wildcard matches within one segment and stops at the path separator, i.e., '/'.
Applications of glob() in Python
We have already seen that pattern matching is particularly useful for finding related files on a system.
The Python glob module is helpful in the following situations:
- Sometimes we want to find files with a specific prefix, a common string in the middle of the file name, or the same extension. To do this by hand, we would need to write code that searches the entire directory and filters the results. The glob module does this work for us quickly and with very little code.
- The glob module is also helpful when one of our programs needs a list of all the files in a file system whose names match a specific pattern. This is simple to do with the glob module, and it is done without spawning a subshell.
These applications show where the glob module can simplify code and save time for Python programmers.
Important Functions in the glob Module
We will now look at the functions of the glob module and how they work within a Python program, along with how each one helps with filename pattern matching. The functions are as follows:
- iglob()
Syntax: glob.iglob(pathname, *, recursive=False)
The iglob() function returns an iterator that yields the matching paths in arbitrary order. It behaves like a Python generator: when we loop over it, it produces the file names one at a time without storing all of them at once, which is helpful when a directory contains many files.
The basic signature of iglob() contains the following elements:
pathname: (required) The pattern to match, given as a string containing a path specification. It can be absolute or relative. A relative pattern is resolved against the current working directory, so no directory part is needed when the files are in that directory.
'*' : Not an argument that we pass. In the signature it marks the arguments after it as keyword-only, so recursive must be passed by name. The wildcard "*" used inside a pattern, such as "*.txt", is a separate thing: it matches any run of characters in a file name.
recursive: (optional) Only accepts boolean values (True or False) and defaults to False. When it is True, the pattern "**" matches any files and zero or more directories and subdirectories.
Let's see the iglob() function in action.
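A sketch of iglob() with a recursive pattern. The directory tree is created in a temporary folder and its file names are hypothetical; the point is that the iterator yields paths on demand.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical tree: base/a.txt, base/sub/b.txt, base/sub/deeper/c.txt, base/skip.csv
    os.makedirs(os.path.join(base, "sub", "deeper"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"),
                os.path.join("sub", "deeper", "c.txt"), "skip.csv"):
        open(os.path.join(base, rel), "w").close()

    # '**' combined with recursive=True descends into subdirectories.
    pattern = os.path.join(base, "**", "*.txt")
    iterator = glob.iglob(pattern, recursive=True)

    # Paths are produced one at a time as the loop asks for them.
    found = sorted(os.path.relpath(path, base) for path in iterator)

print(found)
```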
Output:
Here we used the glob.iglob() function to obtain paths recursively from files, directories, and subdirectories.
- glob()
Syntax: glob.glob(pathname, *, recursive=False)
The glob() function returns a list of the files that match a particular pattern, which we define inside the function call. The list may be empty, and the pattern must be a string containing a path specification. The iglob() function yields the same values as glob() but without storing all of them at once.
Let's see how to use it.
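The following sketch illustrates that glob() returns an ordinary list, that the list holds the same paths iglob() would yield, and that a pattern with no matches gives an empty list. The temporary folder and file names are assumptions for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(base, rel), "w").close()

    pattern = os.path.join(base, "**", "*.txt")
    as_list = glob.glob(pattern, recursive=True)
    from_iterator = list(glob.iglob(pattern, recursive=True))
    # No file matches this pattern, so the result is an empty list.
    no_match = glob.glob(os.path.join(base, "*.csv"))

print(type(as_list).__name__, len(as_list))
print(sorted(as_list) == sorted(from_iterator))
print(no_match)
```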
Output:
Using the glob() function from the glob module, we obtained the paths recursively from files, directories, and subdirectories.
- escape()
Syntax: glob.escape(pathname)
The escape() function escapes the special characters ("*", "?", and "[") in the string we pass to it, so that they are treated as ordinary characters. This comes in handy when we want to match an arbitrary literal string that may contain special characters, for example when searching for files that have such characters in their names.
Let's see how to use the escape() function.
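A small sketch of why escaping matters, assuming two hypothetical files whose names differ only by square brackets. Without escaping, the brackets are read as a character set; with glob.escape(), they are matched literally.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for name in ("report[1].txt", "report1.txt"):
        open(os.path.join(base, name), "w").close()

    literal = "report[1].txt"

    # Unescaped, '[1]' is a character set that matches the single character '1'.
    unescaped = [os.path.basename(p)
                 for p in glob.glob(os.path.join(base, literal))]

    # Escaped, the opening bracket is matched literally.
    escaped_pattern = glob.escape(literal)
    escaped = [os.path.basename(p)
               for p in glob.glob(os.path.join(base, escaped_pattern))]

print(escaped_pattern)
print(unescaped)
print(escaped)
```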
Output:
Here we used the glob.escape() function to find PNG images whose names contain the special characters _, $, and #.
How to Use glob() in Python?
Now that we have seen the functions inside the Python glob module, let's look at the glob() function in depth and go through various cases where we can use it.
- Python glob() method to Search Files
Output:
The above program demonstrates the use of the glob.glob() function for searching files and folders.
- Python glob() to Search Files Recursively
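As a sketch of what the recursive flag changes, the example below runs the same "**" pattern twice on a hypothetical nested tree. Without recursive=True, "**" behaves like a plain "*" and matches exactly one directory level.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "sub", "deeper"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"),
                os.path.join("sub", "deeper", "c.txt")):
        open(os.path.join(base, rel), "w").close()

    pattern = os.path.join(base, "**", "*.txt")

    # Default (recursive=False): '**' acts like '*', one directory level only.
    shallow = sorted(os.path.relpath(p, base) for p in glob.glob(pattern))

    # recursive=True: '**' matches zero or more directory levels.
    deep = sorted(os.path.relpath(p, base)
                  for p in glob.glob(pattern, recursive=True))

print(shallow)
print(deep)
```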
Output:
Using the glob() function from the glob module, we obtained the paths recursively from files, directories, and subdirectories.
- Python glob() to Search Files Using Wildcard Characters
Before we dive in, let's understand the meaning of the term wildcard. A wildcard is a symbol (*, ?, etc.) that stands in for one or more characters. Wildcards mostly serve to streamline search criteria, and they are used in computer programs, operating systems, languages, and search engines.
- Match Any Character in File Name Using asterisk (*):
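A minimal sketch of the asterisk wildcard, using made-up file names in a temporary folder. The "*" stands for any run of characters, including none.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for name in ("report_jan.csv", "report_feb.csv", "summary.csv", "report.txt"):
        open(os.path.join(base, name), "w").close()

    # Any name that starts with 'report' and ends with '.csv'.
    reports = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(base, "report*.csv")))

    # Any name that ends with '.csv'.
    all_csv = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(base, "*.csv")))

print(reports)
print(all_csv)
```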
Output:
- Search all files and folders in given directory
Output:
- Match Single character in File Name Using Question Mark(?):
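A short sketch of the question mark wildcard with hypothetical file names. Each "?" matches exactly one character, so file10.txt is not matched by "file?.txt".

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for name in ("file1.txt", "file2.txt", "file10.txt"):
        open(os.path.join(base, name), "w").close()

    # One '?' means exactly one character between 'file' and '.txt'.
    one_char = sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(base, "file?.txt")))

    # Two '?' means exactly two characters.
    two_chars = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(base, "file??.txt")))

print(one_char)
print(two_chars)
```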
Output:
- Match File Name using a Range of Characters
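A sketch of character ranges in a pattern, again with invented file names. A range such as [0-9] matches one character from the range, and a leading "!" negates it.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for name in ("log1.txt", "log2.txt", "logx.txt", "logy.txt"):
        open(os.path.join(base, name), "w").close()

    # One digit after 'log'.
    digits = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(base, "log[0-9].txt")))

    # One character after 'log' that is NOT a digit.
    non_digits = sorted(os.path.basename(p)
                        for p in glob.glob(os.path.join(base, "log[!0-9].txt")))

print(digits)
print(non_digits)
```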
Output:
- Using glob() with regex
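Glob patterns are not regular expressions, so one possible approach, sketched here under that assumption, is to let glob() do a coarse selection and then filter the results with the re module. The file names and the four-digit-year rule are hypothetical.

```python
import glob
import os
import re
import tempfile

with tempfile.TemporaryDirectory() as base:
    for name in ("data_2021.csv", "data_2022.csv", "data_final.csv", "notes.csv"):
        open(os.path.join(base, name), "w").close()

    # Step 1: coarse selection with a glob pattern.
    candidates = glob.glob(os.path.join(base, "data_*.csv"))

    # Step 2: precise filtering with a regular expression (four-digit year).
    year_file = re.compile(r"data_\d{4}\.csv")
    matched = sorted(
        os.path.basename(p)
        for p in candidates
        if year_file.fullmatch(os.path.basename(p))
    )

print(matched)
```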
Output:
- glob() for Finding Text in Files
The glob module is typically used to locate files with matching names, but it is also useful as the first step in locating text inside files. Sometimes we want to change a single word in a file, or we want the files that contain a precise piece of text, such as a user id. To obtain the files that contain the desired text, follow these steps. First, use the glob() function to list all files that match a file search pattern in a directory and all of its subdirectories. Next, read each file and look for the desired text. (To search for a specific pattern in the file, you could use regex.)
Example: Lookup files for the term data
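The two steps described above can be sketched as follows. The files and their contents are made up for the example, and the search is a plain substring test for the term data.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "sub"))
    contents = {
        "a.txt": "user data goes here",
        "b.txt": "nothing of interest",
        os.path.join("sub", "c.txt"): "more data in a subfolder",
    }
    for rel, text in contents.items():
        with open(os.path.join(base, rel), "w", encoding="utf-8") as handle:
            handle.write(text)

    # Step 1: list every .txt file in the directory and its subdirectories.
    paths = glob.glob(os.path.join(base, "**", "*.txt"), recursive=True)

    # Step 2: read each file and keep the ones that contain the term.
    hits = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            if "data" in handle.read():
                hits.append(os.path.relpath(path, base))
    hits.sort()

print(hits)
```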
Output:
- Sorting the glob() Output
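Because glob() returns paths in arbitrary order, a common approach is to pass the result through sorted(). The sketch below sorts hypothetical files by name and by size; the sizes are chosen only to make the two orders differ.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical files with different sizes in bytes.
    sizes = {"a.txt": 10, "b.txt": 3, "c.txt": 1}
    for name, size in sizes.items():
        with open(os.path.join(base, name), "wb") as handle:
            handle.write(b"x" * size)

    paths = glob.glob(os.path.join(base, "*.txt"))

    # Alphabetical order.
    by_name = [os.path.basename(p) for p in sorted(paths)]

    # Smallest file first, using the file size as the sort key.
    by_size = [os.path.basename(p) for p in sorted(paths, key=os.path.getsize)]

print(by_name)
print(by_size)
```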
Output:
- Deleting Files Using glob()
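One way to delete matching files is to loop over the glob() result and call os.remove() on each path. This sketch works only inside a temporary folder with made-up files, since os.remove() deletes permanently; in real use it is worth printing the matched paths first.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for name in ("tmp1.log", "tmp2.log", "keep.txt"):
        open(os.path.join(base, name), "w").close()

    # Collect the files to delete, then remove them one by one.
    to_delete = glob.glob(os.path.join(base, "*.log"))
    for path in to_delete:
        os.remove(path)

    deleted = sorted(os.path.basename(p) for p in to_delete)
    remaining = sorted(os.listdir(base))

print("deleted:", deleted)
print("remaining:", remaining)
```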
Output:
All Functions in the glob Module
We have discussed the main functions in the Python glob module. Here is a reminder of what those functions are and what they do.
glob() | iglob() | escape() |
---|---|---|
Returns a list of the paths that match the pattern supplied in the function argument. | Returns an iterator that yields the matching paths one at a time. | Escapes the special characters in a string so that file names containing them can be matched literally. |
scandir() vs glob()
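Both the glob() and scandir() functions read the entries of a directory. The difference is that glob() keeps only the entries that match a predefined pattern, while os.scandir() returns every entry and leaves any filtering to the caller.
In addition, os.scandir() returns an iterator that produces entries lazily, whereas glob() builds and returns a complete list, which uses more memory for large directories.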
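To make the comparison concrete, the sketch below collects the same .txt files in two ways. The file names are hypothetical, and pairing os.scandir() with fnmatch.fnmatch() is just one way to add the filtering that glob() does on its own.

```python
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for name in ("a.txt", "b.txt", "c.csv"):
        open(os.path.join(base, name), "w").close()

    # os.scandir(): iterate lazily over every entry and filter by hand.
    with os.scandir(base) as entries:
        via_scandir = sorted(
            entry.name
            for entry in entries
            if entry.is_file() and fnmatch.fnmatch(entry.name, "*.txt")
        )

    # glob.glob(): the pattern does the filtering and a list comes back.
    via_glob = sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(base, "*.txt")))

print(via_scandir)
print(via_glob)
```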
Conclusion
Let's go through what we have learned here.
- glob is a built-in Python module primarily used in file handling. It is typically used when a programmer must work with many files with the same or different extensions, such as txt, json, and csv.
- We looked at functions related to pattern matching on files, such as fnmatch(), scandir(), expandvars() and expanduser().
- We covered the rules of pattern matching for files, which follow UNIX shell expansion rules.
- Some applications of the glob module in Python like to find a file with a specific prefix or to find a list of all the files in a specific file system whose names match a specific pattern.
- Important functions in the glob module such as glob(), iglob() and escape().
- Uses of the glob.glob() function in searching for files, pattern matching with wildcards, along with regex, sorting the output and also deleting items from the output.
- The main difference between glob() and scandir() methods is that scandir() method returns an iterator object while the glob() method returns a list as the output.