Regular Expressions with Golang
Overview
Regular expressions are supported in most programming languages and are used in many kinds of applications. They are mainly used for validation, where an input value is checked against a pattern. Golang provides the regexp package in its standard library for this purpose. In this article, we will look at different aspects of the golang regexp package in detail.
Introduction
A regular expression is a string of characters that defines a search pattern for matching, finding, and replacing text. It gives an application a way to implement search and find/replace functionality over strings. Regular expressions can be difficult to work with because libraries and frameworks differ in syntax and behavior, which can cause confusion. In Golang, they are handled by the regexp package, which takes a slightly different approach that we will see later in this article.
Go has a built-in regexp package that covers the common regex use cases and features. It is robust and easy to use, and it follows the RE2 syntax (RE2 implements regular expressions using finite-state machines from automata theory, which guarantees matching time linear in the size of the input).
To use golang regexp, we first need to import the package.
How to import the library?
import "regexp"
Basic Matches with the Regex Function
The regexp package includes several functions and methods for pattern matching. In this tutorial, we'll explore the most fundamental and helpful ones.
MatchString
We use MatchString to verify whether a string contains a match for a pattern. The package-level regexp.MatchString function compiles the pattern on every call; to create a reusable regular expression object, we use Compile (or MustCompile).
For Example:
Output for the above code :
Explanation of the code:
- In the main function of the above code, a variable inputText is declared and assigned the string value I love scaler.
- The MatchString function of the regexp package is called with two arguments: the regular expression pattern [A-z]ler and the input string inputText.
- The function returns a boolean value indicating whether the pattern was found in the input string, and an error that is non-nil if the pattern is invalid.
- If there is an error, the error message is printed.
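The original listing is not reproduced here, so the following is only a minimal sketch consistent with the explanation above. The helper name containsLer and the exact output format are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// containsLer (hypothetical helper) reports whether text contains a
// character from [A-z] followed by "ler". Note that [A-z] also admits the
// six ASCII characters between 'Z' and 'a', so [A-Za-z] is usually the
// safer way to write "any letter".
func containsLer(text string) (bool, error) {
	return regexp.MatchString(`[A-z]ler`, text)
}

func main() {
	inputText := "I love scaler"
	match, err := containsLer(inputText)
	if err != nil {
		// Only reached when the pattern itself is invalid.
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("Match:", match) // Match: true
}
```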
Another Example:
Output for the above code:
Explanation of the code:
- In the above code, the regexp.MatchString function is used to match the regular expression .+t against the value of the variable value.
- The function returns a boolean value indicating whether the match is successful. This value is assigned to the variable matched.
- An if-else block tests the result of the match: if it is true, it prints true; otherwise, it prints false.
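A sketch of what this second example might look like is given below. The input strings and the helper name hasCharBeforeT are assumptions, since the original value is not shown; only the pattern .+t comes from the explanation.

```go
package main

import (
	"fmt"
	"regexp"
)

// hasCharBeforeT (hypothetical helper) reports whether value contains
// at least one character followed by a "t", which is what .+t requires.
func hasCharBeforeT(value string) (bool, error) {
	return regexp.MatchString(`.+t`, value)
}

func main() {
	value := "test" // assumed input
	matched, err := hasCharBeforeT(value)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	if matched {
		fmt.Println("true") // printed for "test"
	} else {
		fmt.Println("false")
	}
}
```

With this pattern, a value such as tab would print false, because its only t has no character before it.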
FindStringIndex
The FindStringIndex method returns a two-element slice of integers holding the starting and ending indexes of the regular expression's leftmost match, so that the match itself is s[loc[0]:loc[1]]. A nil value is returned if no match is found.
For Example:
Output for the above code:
Explanation of the code:
In the above code, FindStringIndex returns the indexes of the leftmost text that matches the compiled pattern held in the re variable. In the cartel case, however, it returns nil because nothing matches.
MustCompile: This function compiles a regular expression and returns the resulting *Regexp instance. If the pattern is invalid, it panics instead of returning an error.
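As an illustration of both points, here is a small sketch. The pattern ler, the inputs, and the helper name locate are assumptions chosen for this example, not the original code.

```go
package main

import (
	"fmt"
	"regexp"
)

// re is compiled once with MustCompile; an invalid pattern would panic here.
var re = regexp.MustCompile(`ler`)

// locate (hypothetical helper) returns the [start, end) indexes of the
// leftmost match, or nil when nothing matches.
func locate(s string) []int {
	return re.FindStringIndex(s)
}

func main() {
	fmt.Println(locate("scaler"))        // [3 6]
	fmt.Println(locate("golang") == nil) // true
}
```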
FindString
FindString returns the text of the leftmost match. If no match is found, the result is an empty string.
For Example:
Output for the above code:
Explanation of the code:
- In the main function of the above code, the FindString() method is called on the regular expression object with a string as its argument.
- This method searches for the first occurrence of the pattern er$ in the given string. If a match is found, it returns the matched text; otherwise, it returns an empty string.
- In this case, it prints er for scaler and golanger, which end with er, and returns an empty string for regexp, which does not end with er.
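The behavior described above can be sketched as follows. The helper name suffix and the loop are assumptions; the pattern er$ and the three inputs come from the explanation.

```go
package main

import (
	"fmt"
	"regexp"
)

// endsWithEr matches the literal "er" only at the end of the input.
var endsWithEr = regexp.MustCompile(`er$`)

// suffix (hypothetical helper) returns the matched text, or "" if none.
func suffix(s string) string {
	return endsWithEr.FindString(s)
}

func main() {
	for _, word := range []string{"scaler", "golanger", "regexp"} {
		fmt.Printf("%q -> %q\n", word, suffix(word))
	}
	// "scaler" -> "er"
	// "golanger" -> "er"
	// "regexp" -> ""
}
```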
FindAllString(s string, n int)
The FindAllString method accepts a string and an int as parameters and returns a slice of strings containing all successive matches of the compiled pattern in the input string. The int argument n specifies the maximum number of matches, with -1 (or any negative value) indicating no limit. If no matches are found, nil is returned.
For Example:
Output for the above code:
Explanation of the code:
- In the above code, the FindAllString function is used to search the "welcomeMessage" variable for all substrings that match the defined pattern. The function returns a slice of strings containing all matches.
- The returned slice of matches is printed to the console using the fmt.Println function.
- The output will be [Hello alert], the two matches for the pattern passed to the regexp.MustCompile function.
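The original pattern and message are not shown, so the sketch below uses a hypothetical pattern (\w*l\w*, any word containing an l) and a hypothetical welcomeMessage that happen to produce the same two matches. It also shows the effect of the n argument.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordsWithL is an assumed pattern: a run of word characters that
// contains at least one "l".
var wordsWithL = regexp.MustCompile(`\w*l\w*`)

// findWords (hypothetical helper) returns up to n matches in s,
// or all of them when n is negative.
func findWords(s string, n int) []string {
	return wordsWithL.FindAllString(s, n)
}

func main() {
	welcomeMessage := "Hello guys, stay alert" // assumed input
	fmt.Println(findWords(welcomeMessage, -1)) // [Hello alert]
	fmt.Println(findWords(welcomeMessage, 1))  // [Hello]
	fmt.Println(findWords("go", -1) == nil)    // true
}
```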
FindAllStringIndex(s string, n int)
FindAllStringIndex is the "All" variant of FindStringIndex. It accepts a string and an int as parameters and returns a slice of int slices, one for each successive match of the pattern. If no matches are found, nil is returned.
More concisely, this method returns the starting and ending indexes of every successive match of the regular expression.
Calling a FindAll method with the int argument 0 yields no results (nil), while calling it with 1 yields a slice holding a single int slice for the leftmost match. An int argument of -1 yields a slice of int slices for all successive matches.
For Example:
Output for the above code:
Explanation of the code:
- In the code example, the FindAllStringIndex function is used to search the content variable for all instances of the regular expression pattern. The -1 argument means to return all matches.
- The for loop iterates over the matches returned by the FindAllStringIndex function.
- Within the for loop, the match variable is defined as the substring of content that corresponds to the current match.
- The fmt.Printf function is used to print out the matched text, as well as its starting and ending index within the content string.
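A minimal sketch of the loop described above follows. The pattern go\w*, the content string, and the helper name spans are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// goWord is an assumed pattern: "go" followed by any word characters.
var goWord = regexp.MustCompile(`go\w*`)

// spans (hypothetical helper) returns the [start, end) index pairs of
// up to n matches in content (all matches when n is negative).
func spans(content string, n int) [][]int {
	return goWord.FindAllStringIndex(content, n)
}

func main() {
	content := "go gopher golang" // assumed input
	for _, loc := range spans(content, -1) {
		match := content[loc[0]:loc[1]]
		fmt.Printf("%q at [%d, %d)\n", match, loc[0], loc[1])
	}
	// "go" at [0, 2)
	// "gopher" at [3, 9)
	// "golang" at [10, 16)

	fmt.Println(len(spans(content, 1)))   // 1
	fmt.Println(spans(content, 0) == nil) // true
}
```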
Compiling and Reusing Regular Expression Patterns
The Compile and MustCompile functions compile a regular expression pattern into a *Regexp object, which can then be reused in more complicated queries without recompiling the pattern each time.
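One common arrangement, sketched here under the assumption of a hypothetical digits pattern and a hypothetical compileUserPattern helper, is to use MustCompile for patterns fixed in the source code and Compile for patterns that arrive at run time and may be invalid.

```go
package main

import (
	"fmt"
	"regexp"
)

// digits is compiled once at program start and reused by every call.
var digits = regexp.MustCompile(`\d+`)

// compileUserPattern (hypothetical helper) compiles a pattern that is
// not known in advance, returning an error instead of panicking.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile(pattern)
}

func main() {
	for _, s := range []string{"order 42", "no numbers"} {
		fmt.Printf("%q: match=%v first=%q\n", s, digits.MatchString(s), digits.FindString(s))
	}

	if _, err := compileUserPattern(`a(b`); err != nil {
		fmt.Println("invalid pattern:", err)
	}
}
```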
Using a regular expression to split strings
The Split() method slices a string into the substrings that lie between matches of the regular expression and returns them as a slice of strings. If the expression does not match anywhere, the returned slice contains the original string as its only element. A nil result is returned when the count argument is 0.
For Example:
Output for the above code:
Explanation of the code:
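- In the above code, the Split method accepts a string and an integer and returns a slice of substrings.
- The second parameter, n, limits the number of substrings: a positive n returns at most n substrings (the last one being the unsplit remainder), 0 returns nil, and a negative n returns all substrings.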
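The effect of the second parameter can be sketched as follows. The whitespace pattern, the sample sentence, and the helper name splitWords are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// spaces matches one or more whitespace characters.
var spaces = regexp.MustCompile(`\s+`)

// splitWords (hypothetical helper) splits s on whitespace into at most
// n pieces; a negative n means no limit.
func splitWords(s string, n int) []string {
	return spaces.Split(s, n)
}

func main() {
	s := "a quick brown fox" // assumed input
	fmt.Printf("%q\n", splitWords(s, -1))         // ["a" "quick" "brown" "fox"]
	fmt.Printf("%q\n", splitWords(s, 2))          // ["a" "quick brown fox"]
	fmt.Println(splitWords(s, 0) == nil)          // true
	fmt.Printf("%q\n", splitWords("nospace", -1)) // ["nospace"]
}
```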
Another Example:
Output for the above code:
Explanation of the code:
Here, we have a comma-separated list of values in the code example. We cut the values from the string and calculate their sum.
The regular expression matches a comma character together with any number of adjacent spaces.
vals := re.Split(data, -1)
We get a slice of values.
We go through the slice and calculate the sum of the values. The slice contains strings; therefore, we convert each string into an integer with the strconv.Atoi function.
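The steps above can be sketched as follows. The input data, the exact pattern \s*,\s*, and the helper name sumCSV are assumptions; only the use of Split and strconv.Atoi comes from the explanation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sep matches a comma with any number of spaces on either side.
var sep = regexp.MustCompile(`\s*,\s*`)

// sumCSV (hypothetical helper) splits data on commas and adds up the
// integer values, returning an error if any field is not a number.
func sumCSV(data string) (int, error) {
	sum := 0
	for _, val := range sep.Split(data, -1) {
		n, err := strconv.Atoi(val)
		if err != nil {
			return 0, err
		}
		sum += n
	}
	return sum, nil
}

func main() {
	data := "1, 2,3 , 4,  5" // assumed input
	sum, err := sumCSV(data)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(sum) // 15
}
```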
Subexpressions
- A subexpression (capturing group) is used to extract specific parts of a match from a string.
- It is denoted with ().
- It is mostly useful for picking out the regions of content that are important within the given pattern.
For example :
FindStringSubmatch
The FindStringSubmatch method returns a slice of strings holding the text of the leftmost match of the pattern, followed by the text matched by each of its subexpressions. It is a subexpression method. A nil value is returned if no match is found. As an example:
Output for the above code:
Explanation of the code:
- We've defined two subexpressions in this case, one for each variable component in the pattern.
- It tries to match the pattern against scaler - Limited and, if a match is found, returns the whole matched text along with the text of each subexpression.
Subexpressions can also be named, using the (?P<name>...) syntax, to make it easier to process the resulting output.
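A sketch of both ideas is given below. The pattern, the group names company and kind, and the helper namedGroups are assumptions; the input scaler - Limited comes from the explanation above.

```go
package main

import (
	"fmt"
	"regexp"
)

// pair has two named subexpressions separated by " - ".
// The names "company" and "kind" are hypothetical.
var pair = regexp.MustCompile(`(?P<company>\w+) - (?P<kind>\w+)`)

// namedGroups (hypothetical helper) maps each subexpression name to the
// text it matched, or returns nil when the pattern does not match.
func namedGroups(s string) map[string]string {
	m := pair.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	groups := make(map[string]string)
	for i, name := range pair.SubexpNames() {
		if name != "" { // index 0 is the whole match and has no name
			groups[name] = m[i]
		}
	}
	return groups
}

func main() {
	fmt.Printf("%q\n", pair.FindStringSubmatch("scaler - Limited"))
	// ["scaler - Limited" "scaler" "Limited"]

	g := namedGroups("scaler - Limited")
	fmt.Println(g["company"], g["kind"]) // scaler Limited

	fmt.Println(pair.FindStringSubmatch("no separator") == nil) // true
}
```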
Using a Regular Expression to Replace Substrings
To replace substrings, we use the ReplaceAllString method, which returns a copy of the input with the text of every match replaced by the replacement string.
For Example:
Output:
Explanation of the above code:
- In the above code, the ReplaceAllString method is called on the regular expression object, with the arguments ab-b-nmb-q and scaler. This replaces all occurrences of the pattern ab* in the string ab-b-nmb-q with the string scaler.
- The resulting string is then printed to the console using the fmt.Printf() function. The output will be scaler-b-nmb-q, because only the leading ab contains the a that the pattern requires.
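The replacement described above can be reproduced with a short sketch. The helper name replaceAB and the second input are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// abStar matches an "a" followed by zero or more "b" characters.
var abStar = regexp.MustCompile(`ab*`)

// replaceAB (hypothetical helper) replaces every match with "scaler".
func replaceAB(s string) string {
	return abStar.ReplaceAllString(s, "scaler")
}

func main() {
	fmt.Printf("%s\n", replaceAB("ab-b-nmb-q")) // scaler-b-nmb-q
	fmt.Printf("%s\n", replaceAB("a-abb-b"))    // scaler-scaler-b
}
```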
Using a Function to Replace Matched Content
In Golang, the ReplaceAllStringFunc() method returns a copy of the given string in which every match of the regular expression is replaced by the return value of a function that is called with the matched substring.
For Example:
Output for the above code:
Explanation of the Code
- In the above code, the pattern variable is declared which searches for the string welcome followed by a space and a sequence of alphabetical characters, followed by a space and the string new followed by a space and another sequence of alphabetical characters, and then the string city.
- Then it defines a variable text containing the string Hello guys, welcome to new york city.
- It then calls the ReplaceAllStringFunc() method on the pattern variable, passing in the text variable and a function whose return value replaces each matched substring, here the new string welcome to Jamshedpur city!
- The final output of the code will be Hello guys, welcome to Jamshedpur city!
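A minimal sketch matching this description might look as follows. The exact pattern spelling and the helper name rewriteGreeting are assumptions, and the second call with strings.ToUpper is an extra illustration that is not part of the original example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pattern is an assumed spelling of the pattern described in the text.
var pattern = regexp.MustCompile(`welcome [a-z]+ new [a-z]+ city`)

// rewriteGreeting (hypothetical helper) replaces each match with a
// fixed string produced by the callback.
func rewriteGreeting(text string) string {
	return pattern.ReplaceAllStringFunc(text, func(match string) string {
		return "welcome to Jamshedpur city!"
	})
}

func main() {
	text := "Hello guys, welcome to new york city"
	fmt.Println(rewriteGreeting(text)) // Hello guys, welcome to Jamshedpur city!

	// The callback can also transform the matched text itself.
	shout := regexp.MustCompile(`new york`).ReplaceAllStringFunc(text, strings.ToUpper)
	fmt.Println(shout) // Hello guys, welcome to NEW YORK city
}
```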
For Example:
Output :
Explanation of the above code:
In the preceding code, we look for the strings that exactly match the re variable and print them if matches are found; otherwise, it returns -1, indicating that no matching strings were found.
Conclusion
- In this post, we looked at the golang regexp package and its built-in methods for handling simple to advanced regex matches, as well as building and reusing regular expression patterns.
- A regular expression is a string of characters that is used for pattern matching, finding, and replacing specific text.
- MatchString verifies whether a pattern matches a string.
- The Split() method slices a string into substrings.
- The ReplaceAllString and ReplaceAllStringFunc methods return a copy of the given string with the matches replaced.
- We also saw how to compile regular expression patterns once and reuse them in Golang, which avoids recompiling the pattern on every use.