Python GREP


Overview

Linux and Unix users may already know GREP, a command-line tool that was originally developed for Unix and later made available on all Unix-like systems. It is not built into Windows. However, the same functionality can be obtained on any platform, including Windows, by replicating the command in Python. This article shows how to do that and how to extend the approach to more complex searches.

What is GREP?

GREP stands for Globally search for a Regular Expression and Print matching lines. It is a command-line utility on Unix-like systems that searches plain-text data for lines matching a regular expression and then displays them.

Regular expressions are specially encoded text strings that serve as patterns for matching other strings. Python supports them through its re module, which can be used to determine whether or not a text matches a pattern.

The term GREP also refers to using the grep command to check whether the data it receives matches a defined pattern. Although the program appears rather basic at first glance, its ability to filter input according to intricate criteria makes it highly powerful.

These file-searching programs are collectively known as the grep utilities and include grep, egrep, and fgrep. egrep uses extended regular expressions, while fgrep searches only for fixed strings rather than patterns, which makes it fast and suitable for many everyday use cases.

How to Run GREP in Python?

As discussed, the grep command is built around regular expressions (RE for short). Python offers the re module for working with regular expressions, so we will now see how to run grep in Python.

To imitate the grep functionality in Python, we iterate over the text of the file line by line and use the search() function of the re module to look for the pattern in each line. search() returns a Match object if the search is successful and None otherwise, so its result can be used directly as a condition. This accomplishes the "g/re" part of grep, and we complete it with the "p" by printing the line on a successful search. If a single line can contain several occurrences that all need to be tracked, the iteration can be done word by word rather than line by line; in that case, the matching string itself is displayed instead of the whole line.
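The line-by-line loop described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the sample lines and the pattern are made up, and a list of strings stands in for the lines of a file.

```python
import re

# Hypothetical sample data standing in for the lines of a file.
lines = [
    "grep searches text for a pattern",
    "this line has nothing of interest",
    "Python can imitate grep with the re module",
]

pattern = r"grep"  # assumed pattern, chosen only for illustration

matching_lines = []
for line in lines:
    # re.search() returns a Match object on success and None otherwise,
    # so the result can be used directly as a condition.
    if re.search(pattern, line):
        matching_lines.append(line)
        print(line)
```

Here the first and third lines are printed, since they are the only ones that contain the pattern.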

When this logic is implemented as a complete program, the file is first opened in write mode with the built-in open() function, the content is added with the file.write() method, and the file is closed with file.close(). The \n character produces a new line, just as it does when we print to the console. The file is then reopened in read mode and searched line by line as described above, so that only the lines containing the pattern are printed.
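A complete program along these lines might look like the sketch below. The file name, its contents, and the pattern are assumptions made for illustration, and the file is placed in the system's temporary directory only to avoid cluttering the working directory.

```python
import os
import re
import tempfile

# Hypothetical file name and contents, chosen only for illustration.
path = os.path.join(tempfile.gettempdir(), "grep_sample.txt")

# Create the file and write the text to be searched.
file = open(path, "w")
file.write("first line of text\nsecond line mentions grep\nthird line\n")
file.close()

pattern = "grep"
matches = []

# Reopen the file in read mode and search it line by line.
file = open(path, "r")
for line in file:
    if re.search(pattern, line):
        # Lines read from a file keep their trailing "\n"; strip it before printing.
        matches.append(line.rstrip("\n"))
        print(line.rstrip("\n"))
file.close()
```

With this sample text, only the line "second line mentions grep" is printed.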

The above example demonstrated the use case where the pattern was an exact word. In cases where the characters can vary as per some constraints, regular expressions can be created for them as well.

For example, suppose we need to search a text for a phone number in XXX-XXX-XXXX format, where X can be any digit from 0 to 9 (inclusive). The regular expression for this is \d\d\d-\d\d\d-\d\d\d\d (here \d matches a single digit), or in shorter form \d{3}-\d{3}-\d{4}, where the number n in the braces following a character type means that the character type must be matched exactly n times.
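As a quick check that the long and short forms behave the same way, the sketch below tests both against the same string with re.fullmatch(). The phone numbers are made up for illustration.

```python
import re

long_form = r"\d\d\d-\d\d\d-\d\d\d\d"
short_form = r"\d{3}-\d{3}-\d{4}"

sample = "415-555-0132"  # made-up number in XXX-XXX-XXXX format

# fullmatch() succeeds only if the whole string fits the pattern.
long_ok = re.fullmatch(long_form, sample) is not None
short_ok = re.fullmatch(short_form, sample) is not None

# A string with only two digits in the middle group does not fit.
too_short = re.fullmatch(short_form, "415-55-0132") is not None

print(long_ok, short_ok, too_short)  # True True False
```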

We summarize the steps involved in imitating the grep command in Python as follows:

  • Import the regular expressions module re, with ‘import re’.
  • With the help of the re.compile() function, create a regex object, using a raw string for the pattern. Compiling is optional: the pattern string can also be passed directly to re.search().
  • Use re.search(pattern, data) to match the regular expression against the text data. search() can also be called on the regex object: regex.search(data) returns a Match object, or None if nothing matches.
  • The Match object contains the matched string (if there are several, the first occurrence). To retrieve all occurrences, iterate over the text word by word or line by line as done in the previous example.

As an example, take the phone number regular expression just described and search two sentences with it. A sentence containing a number grouped as 3-3-4 digits produces a successful match, while a sentence whose number is grouped as 3-4-4 digits fails to match, because the pattern prescribes 3-3-4 characters.
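The steps and the phone number example can be sketched together as follows. The two sentences and the numbers in them are hypothetical, written only to show one matching and one non-matching case.

```python
import re

# Compile the phone number pattern once, using a raw string.
phone_regex = re.compile(r"\d{3}-\d{3}-\d{4}")

# Hypothetical sentences: the first uses 3-3-4 grouping, the second 3-4-4.
sentences = [
    "Call me at 415-555-0132 tomorrow.",
    "Call me at 415-5550-0132 tomorrow.",
]

results = []
for sentence in sentences:
    match = phone_regex.search(sentence)
    if match:
        # group() returns the text of the first occurrence.
        results.append(match.group())
        print("Match found:", match.group())
    else:
        results.append(None)
        print("No match")
```

The first sentence yields the match 415-555-0132, and the second yields no match.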

Examples of Python GREP

Here we look at some other ways to obtain the functionality of grep in Python. Besides the approach discussed above, there is a neat command-line implementation: the script is run from the terminal, with the regular expression and the file to be searched given as command-line arguments. This lets us closely duplicate GREP in Python.

Here we import the sys module along with re. The sys module provides sys.argv, a list of all command-line arguments whose first element is the name of the script. The file may be saved as grepScript.py and launched from the terminal with the required arguments.
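A minimal sketch of such a script is shown below. The helper name grep_file and the argument order (pattern first, then file name) are assumptions made for illustration.

```python
import re
import sys


def grep_file(pattern, path):
    """Return the lines of the file at `path` that match `pattern`."""
    regex = re.compile(pattern)
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in handle if regex.search(line)]


if __name__ == "__main__":
    # sys.argv[0] is the script name; the order of the two remaining
    # arguments (pattern, then file) is an assumption of this sketch.
    if len(sys.argv) == 3:
        for line in grep_file(sys.argv[1], sys.argv[2]):
            print(line)
    else:
        print("usage: python grepScript.py PATTERN FILE")
```

Saved as grepScript.py, it could then be run as, for instance, python grepScript.py grep sample.txt, where sample.txt is a placeholder file name.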

If we need to search multiple files, we may use the glob module, which locates the paths of files that match a pattern. To emulate GREP with it, a path pattern for the directory is supplied to the iglob() function, which returns an iterator over the matching files in that directory.
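One way this could look is sketched below. The function name grep_files is hypothetical, and the sketch assumes that every path matched by the pattern is a readable text file.

```python
import glob
import re


def grep_files(pattern, path_pattern):
    """Yield (path, line) pairs for matching lines in all files that fit `path_pattern`."""
    regex = re.compile(pattern)
    # iglob() returns an iterator, so paths are produced one at a time.
    for path in glob.iglob(path_pattern):
        with open(path, "r") as handle:
            for line in handle:
                if regex.search(line):
                    yield path, line.rstrip("\n")
```

For instance, looping over grep_files("grep", "*.txt") would visit every matching line in the .txt files of the current directory.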

GREP can also be built in only a few lines, short enough to be typed directly into an interactive Python session in the terminal. Iterating over the file object reads one line at a time rather than loading the whole file, which keeps the approach memory-efficient.
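A few-line version could be sketched as a generator, as below. The function name grep_lazy is hypothetical, and the sketch is one possible way of keeping memory use low, not the only one.

```python
import re


def grep_lazy(pattern, path):
    """Lazily yield matching lines; the file is never loaded as a whole."""
    with open(path) as handle:
        for line in handle:
            if re.search(pattern, line):
                yield line
```

Because the function is a generator, each matching line is produced only when the caller asks for it, for example in a for loop that prints the lines.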

Also, the original grep command provides a wide variety of options in the search. These options are provided using different flags. We can imitate the behaviour provided by these options as well, and quite easily.

For example,

-c flag: Prints only a count of the lines that match a pattern. This can be obtained by keeping a counter and incrementing it on every matching line.

-n flag: Displays the matched lines together with their line numbers. Keep track of the line number while iterating, and print it along with the line upon a match.

-o flag: Prints only the matched parts of a matching line, each one on a new line. Iterate word by word rather than line by line, or collect every occurrence in a line with re.findall().

-i flag: Enables case-insensitive matching. The re.compile() function accepts the flag re.IGNORECASE, which makes the search ignore case.
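The behaviour of these four flags can be approximated as in the sketch below. The sample lines are made up, and this is an illustration of the idea rather than a full reimplementation of grep's options.

```python
import re

# Hypothetical sample lines standing in for the contents of a file.
lines = [
    "Grep is a Unix tool",
    "nothing to see here",
    "python grep and more grep",
]

# -i: re.IGNORECASE makes the pattern match regardless of case.
regex = re.compile(r"grep", re.IGNORECASE)

# -c: count the lines that contain at least one match.
count = sum(1 for line in lines if regex.search(line))

# -n: pair each matching line with its line number, counting from 1.
numbered = [
    (number, line)
    for number, line in enumerate(lines, start=1)
    if regex.search(line)
]

# -o: collect only the matched parts, including repeats within a line.
matched_parts = [part for line in lines for part in regex.findall(line)]

print(count)          # 2
print(numbered)       # [(1, 'Grep is a Unix tool'), (3, 'python grep and more grep')]
print(matched_parts)  # ['Grep', 'grep', 'grep']
```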

How to Search a File Using Python GREP?

  • Open a file in write mode using the open() function.
  • Write the content in which the search is to be performed using the file.write() method, and close the file.
  • Define the pattern you want to find inside the file. A pattern string can be passed to re.search() directly, or compiled first with the re.compile() function.
  • Now, open the file in read mode.
  • Use a for loop with the re.search() function inside it to look for the pattern in each line. If it finds a match, print the line.

Conclusion

In this article, we discussed:

  • GREP is a pattern-matching tool that searches for the specified text and displays the lines containing it. On operating systems that are not Unix-based, the same functionality can be obtained with Python scripts.
  • Python offers the re module and its functions, such as compile() and search(), to replicate the functionality of grep.
  • We can search for complex patterns by creating suitable regular expressions.
  • We iterate over the whole text word by word or line by line, and the search() function returns a Match object containing the result and more information about the search, or None if there is no match.
  • We can also run this replicated command from the command line itself, using the sys module and its sys.argv list.