Pattern Matching in SQL
Overview
SQL, as the name suggests, is a structured query language used to manage relational databases and perform various operations on them. One of the primary data manipulation queries supported by SQL is the SELECT query which is used to select elements from a database. SQL supports standard pattern matching in the form of the LIKE operator to be used with SELECT to select specific entries. It also provides the REGEXP and RLIKE operators which use extended regular expressions for pattern matching. Pattern matching has a vast array of real-world applications, ranging from search engines to data encryption, etc. The widely popular daily internet puzzle "Wordle" is also an application of pattern matching.
Pattern Matching in SQL
Highlights:
- SQL provides the LIKE operator for standard pattern matching
- SQL provides the REGEXP_LIKE() function as well as REGEXP and RLIKE operators for pattern matching using extended regular expressions.
Remember the last time you tried typing something into the Google search bar and its autocomplete feature already completed your sentence? Or perhaps you remember the time your phone's autocorrect changed your text to something along the lines of "What the duck"? Or maybe you do recall that afternoon game of hangman with your friends in which you had to try and think of a word that starts with S and ends in D?.
In each of those instances, the primary function that Google, your phone's autocomplete, or you yourself were trying to accomplish was pattern-matching. Pattern matching is checking whether a specific sequence of characters or token or data exist within a dataset. It is a primary feature of any data management language and is also implemented in SQL.
The LIKE operator is used with the WHERE clause. It enables users to use _ to match a single character and % to match an arbitrary number of characters. Pattern matching in MySQL is case-insensitive by default. To make the pattern matching case sensitive in MySQL, the BINARY operator is used.
Syntax:
Extended regular expressions are used in several programming languages to accomplish pattern matching. SQL provides the REGEXP_LIKE() function as well as REGEXP and RLIKE operators to implement pattern matching using extended regular expressions.
Syntax:
In the following sub-sections, we shall discuss the various use-cases of pattern matching in SQL. Let us consider a table named pet consisting of the columns: name, owner, and species. The table is described below:
We shall be using this table as a reference in all the examples in this article.
The LIKE Clause
Highlights:
- SQL pattern matching can be done using the LIKE operator.
- _ is used to match a single element and % is used to match an arbitrary number of elements.
- The NOT operator is used to select elements that do not match the given pattern.
The LIKE operator in SQL pattern matching is used for standard pattern matching. It implements pattern matching using _ to match single elements and % to match an arbitrary number of elements to identify patterns. The NOT operator is used with the LIKE operator to select elements that do not match the pattern. LIKE operator is by default case-insensitive in MySQL. The BINARY operator is added to the syntax to make the pattern case-sensitive.
The general syntax for the LIKE clause is as follows:
Using LIKE Clause With %
In SQL pattern matching, the % (percentage) sign is used to match an arbitrary number of characters. It can be used in several ways:
- a% selects all entries that start with the letter a
- %b selects all entries that end with the letter b
- a%b selects all entries that start with a and end with b.
- abc%selects all entries that start with abc.
Let us look at a few examples:
1. Select all entries where the pet's name starts with B
2. Select all entries where the owner's name starts with Dia
3. Select all entries where the pet's name has uff in it
Using LIKE clause with _
In SQL pattern matching, the _ (underscore) is used to match exactly one arbitrary character. It can also be used in a number of ways. Let us look at a few examples of its usage:
1. Select all entries where the pet's name has exactly five letters
2. Select all entries where the owner's name starts with DIAN and has exactly five letters in them
Using LIKE Clause with % And _ Operator in A Single Query
We can also combine and use both % and _ operators in a single query to pattern match. Let us take a look at a few examples:
1. Select all entries where the owner's name has n as its second last letter
Using LIKE Clause with NOT Operator
Using SQL pattern matching, we may also select those entries that do not follow a given pattern. In other words, we can exclude all those entries that match the specified pattern. This is done by adding the NOT operator. Let us look at a few examples:
1. Select all entries where the pet name does not have l as its second letter
2. Select all entries where the pet name does not end in er
Pattern Matching Using Extended Regular Expression
SQL pattern matching is also implemented through the use of extended regular expressions. The following special characters or constructs are used within extended regular expressions for pattern matching:
Characters | Function |
---|---|
^ | Match the beginning of a string |
$ | Match the end of a string |
. | Match any character(including carriage return and newline) |
a* | Match any sequence of zero or more a characters |
a+ | Match any sequence of one or more a characters |
a? | Match either zero or one a characters |
(abc)* | Match zero or more instances of the sequence abc |
a{n} or a{m,n} | Match exactly n a or match m through n instances of a |
[a-dX] or | Match any character a through d or X |
[^a-dX] | Match characters that are not among a through d or X |
[.charcter.] | Within bracket expression([]) Matches the sequence of characters of that collating element |
[=character_class=] | Within bracket expression([]) Matches all characters with the same collation value, including itself |
[:character_class:] | Within bracket expression([]) Matches all characters belonging to a character class, like alpha for alphaneumeric characters, etc |
[[:<:]], [[:>:]] | These stand for word boundaries and match the beginning and end of words, respectively |
SQL provides the REGEXP_LIKE() function as well as REGEXP and RLIKE operators to implement pattern matching using extended regular expressions.
Syntax:
Using REGEXP_LIKE(Column_Name, REGEX) Function
The REGEXP_LIKE() function can be used (with or without the NOT clause) to select entries according to the regular expression defined within the function.
Let us look at a few examples:
1. Select all entries where the pet names start with b
2. Select all entries where the pet name has w in it
3. Select all entries where pet name owner name has n as the second last letter
Using RGEXP Clause
MySQL also provides RGEXP clause which can also be used with or without a NOT clause to select elements using extended regular expressions. Some example use-cases are :
1. Select all entries where the pet is neither a dog nor a cat
2. Select all entries where the owner name has an
Using RLIKE Clause
SQL pattern matching also provides the RLIKE operator which is a synonym of the RGEXP operator. Some example use-cases are:
1. Select all entries where the owner's name is four-lettered
2. Select all entries where the pet name starts with f and ends with y
Conclusion
- Pattern Matching in SQL is accomplished with the help of the LIKE operator which implements standard SQL pattern matching.
- SQL also provides the REGEXP_LIKE() function as well as the RGEXP and the RLIKE functions to implement pattern matching using extended regular expressions.
- Pattern Matching operators can be modified using the NOT clause to select those elements that do not match the pattern.