Filtering in SQL
Overview
Structured Query Language (SQL) lets you filter data as part of the query itself. With the filtering techniques described here, you can refine the output of a query without writing separate application logic.
Introduction
The GROUP BY clause groups together rows that share the same values in the specified columns, or that produce the same result for a specified expression. For example, in a database of school students, you may need to count the students in each age group. This is where the GROUP BY clause helps.
GROUP BY is usually used with aggregate functions such as COUNT(), MIN(), MAX(), AVG(), and SUM(), which then return one value per group instead of one value for the whole result. Rows can be grouped by one or more columns.
Syntax
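The expression after GROUP BY is typically a column name, but it can also be a value computed from a row's columns, such as a comparison. It cannot itself be an aggregate function; conditions on aggregate values belong in the HAVING clause, described later.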
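To make grouping by an expression concrete, here is a minimal sketch. It assumes a hypothetical students table (the table, its columns, and the sample rows are illustrative and are not part of the TEST example used later) and runs the SQL through Python's built-in sqlite3 module so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Asha", 11), ("Ben", 12), ("Chen", 15), ("Dina", 16), ("Eli", 16)],
)

# General shape: SELECT <columns or aggregates> FROM <table> GROUP BY <expression>
# The grouping expression here is a comparison, so rows fall into two groups:
# 0 (age below 13) and 1 (age 13 or above).
age_groups = conn.execute(
    """
    SELECT age >= 13 AS is_teen, COUNT(*) AS students
    FROM students
    GROUP BY age >= 13
    ORDER BY is_teen
    """
).fetchall()
print(age_groups)  # [(0, 2), (1, 3)]
```

SQLite represents the result of a comparison as 0 or 1; other engines may show a boolean value instead.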
Example
First, let us create a database and a table to operate on, as follows:
After inserting the above entries, the table looks as shown below:
Now assume that we need to print every employment status in our relation (table TEST) along with the number of times it occurs. This is how we achieve it in SQL:
Input table:
Output:
As is evident from the data we inserted earlier, there are two rows with employment status true and two with employment status false, and the output of our query reflects exactly that. Note that the COUNT() function used in the query is an aggregate function. There are many others, such as MAX(), MIN(), and AVG().
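The counting query described above can be sketched as follows. The column names id, name, and employed and the four sample rows are assumptions chosen to match the description (two employed and two unemployed rows); the original table may differ. The SQL is run through Python's built-in sqlite3 module so that it can be executed as is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema for TEST. SQLite has no boolean type, so employed is stored as 0/1.
conn.execute("CREATE TABLE TEST (id INTEGER PRIMARY KEY, name TEXT, employed INTEGER)")
conn.executemany(
    "INSERT INTO TEST VALUES (?, ?, ?)",
    [(1, "Arun", 1), (2, "Bela", 0), (3, "Chitra", 1), (4, "Dev", 0)],  # illustrative rows
)

# One output row per distinct employment status, with the number of rows in each group.
status_counts = conn.execute(
    "SELECT employed, COUNT(*) FROM TEST GROUP BY employed ORDER BY employed"
).fetchall()
print(status_counts)  # [(0, 2), (1, 2)]
```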
HAVING CLAUSE
To filter the rows returned by a query, we use a WHERE clause. WHERE, however, is applied to individual rows before any grouping takes place, so it cannot filter the groups produced by GROUP BY or refer to aggregate values. For that, SQL provides another clause, HAVING, which does for groups what WHERE does for rows: it filters the output of a query that uses GROUP BY on the basis of certain criteria.
Syntax
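The general shape is SELECT ... FROM ... GROUP BY ... HAVING condition. Below is a minimal sketch of that shape which also shows WHERE and HAVING working together in one query. The TEST schema (id, name, employed) and the rows are assumptions made for illustration, and the SQL is run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and illustrative rows.
conn.execute("CREATE TABLE TEST (id INTEGER PRIMARY KEY, name TEXT, employed INTEGER)")
conn.executemany(
    "INSERT INTO TEST VALUES (?, ?, ?)",
    [(1, "Arun", 1), (2, "Bela", 0), (3, "Arun", 1), (4, "Dev", 0), (5, "Bela", 1)],
)

# WHERE filters rows before grouping; HAVING filters groups after aggregation.
repeated_employed_names = conn.execute(
    """
    SELECT name, COUNT(*) AS n
    FROM TEST
    WHERE employed = 1      -- row-level filter, applied first
    GROUP BY name
    HAVING COUNT(*) > 1     -- group-level filter on an aggregate value
    """
).fetchall()
print(repeated_employed_names)  # [('Arun', 2)]
```

Moving the COUNT(*) > 1 condition into the WHERE clause would be an error, because aggregate values do not exist yet when WHERE is evaluated.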
Let's add some more rows to our existing table TEST to make it a better dataset:
After inserting the above entries, the table looks as shown below:
Now let us look at an example for clearer understanding:
Input table:
Output:
This query aims to obtain the IDs of all the employed candidates, without any duplicates. Two rows are considered duplicates if they have exactly the same name.
Our query groups rows by name and then by employment status. The HAVING clause then discards every group whose employment status is false, so only groups of employed candidates remain.
Finally, the query returns one ID for each remaining group.
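A query along these lines can be sketched as follows. The schema and rows are assumptions for illustration, and the sketch uses MIN(id) to pick one ID per group, which is a choice made here so that the query is valid in engines that reject non-aggregated columns outside the GROUP BY list; the original query may have selected the ID differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and illustrative rows; "Arun" appears twice on purpose.
conn.execute("CREATE TABLE TEST (id INTEGER PRIMARY KEY, name TEXT, employed INTEGER)")
conn.executemany(
    "INSERT INTO TEST VALUES (?, ?, ?)",
    [(1, "Arun", 1), (2, "Bela", 0), (3, "Arun", 1),
     (4, "Dev", 0), (5, "Esha", 1), (6, "Bela", 0)],
)

# Group by name and employment status, keep only the employed groups,
# and report the smallest ID in each remaining group.
employed_ids = conn.execute(
    """
    SELECT MIN(id) AS first_id
    FROM TEST
    GROUP BY name, employed
    HAVING employed = 1
    ORDER BY first_id
    """
).fetchall()
print(employed_ids)  # [(1,), (5,)]
```

The two "Arun" rows collapse into a single group, so only one of their IDs appears in the result.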
FILTER CLAUSE
The FILTER clause is another tool for filtering in SQL. As the name suggests, it filters the rows that are fed into an aggregate function. It is more flexible than the WHERE clause: a query has a single WHERE clause, which applies to every row and therefore to every aggregate, whereas each aggregate function in a query can carry its own FILTER clause.
Syntax
Example
Input table:
Output:
The query above counts the rows whose ID is not 2, which gives five. It also returns the average length of the names that are longer than four characters.
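Here is a minimal sketch of such a query. The six rows and the column names are assumptions chosen so that the count comes out as five, as described. FILTER is part of standard SQL but is not available in every database engine; the sketch runs on SQLite through Python's built-in sqlite3 module and assumes SQLite 3.30 or newer. For engines without FILTER, the third column shows an equivalent written with CASE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and illustrative rows with IDs 1 to 6.
conn.execute("CREATE TABLE TEST (id INTEGER PRIMARY KEY, name TEXT, employed INTEGER)")
conn.executemany(
    "INSERT INTO TEST VALUES (?, ?, ?)",
    [(1, "Arun", 1), (2, "Bela", 0), (3, "Chitra", 1),
     (4, "Dev", 0), (5, "Esha", 1), (6, "Farid", 0)],
)

# Each aggregate gets its own FILTER, so the two conditions are independent.
summary = conn.execute(
    """
    SELECT
        COUNT(*) FILTER (WHERE id <> 2)                    AS not_two,
        AVG(LENGTH(name)) FILTER (WHERE LENGTH(name) > 4)  AS avg_long_name,
        COUNT(CASE WHEN id <> 2 THEN 1 END)                AS not_two_with_case
    FROM TEST
    """
).fetchone()
print(summary)  # (5, 5.5, 5)
```

Only "Chitra" (6 characters) and "Farid" (5 characters) are longer than four characters, which gives the average of 5.5.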
Example of Applying Filter Conditions on Groups
Input table:
Output:
The query above obtains the following two things:
- The IDs of the rows whose ID is less than 6. The GROUP BY and HAVING clauses in our query take care of this.
- For each of those IDs, the number of rows whose ID is greater than 1 and less than 6. The FILTER clause is responsible for this. Note that the count column is zero for ID 1, because that row does not satisfy the FILTER condition, whereas it is one for all other IDs.
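The combination described in the two points above can be sketched as follows, again with an assumed schema and illustrative rows with IDs 1 to 6. The sketch runs on SQLite through Python's built-in sqlite3 module and assumes SQLite 3.30 or newer for FILTER support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and illustrative rows with IDs 1 to 6.
conn.execute("CREATE TABLE TEST (id INTEGER PRIMARY KEY, name TEXT, employed INTEGER)")
conn.executemany(
    "INSERT INTO TEST VALUES (?, ?, ?)",
    [(1, "Arun", 1), (2, "Bela", 0), (3, "Chitra", 1),
     (4, "Dev", 0), (5, "Esha", 1), (6, "Farid", 0)],
)

# HAVING decides which groups appear in the output (IDs below 6).
# FILTER decides which rows inside each group are counted (IDs between 2 and 5).
per_id_counts = conn.execute(
    """
    SELECT id, COUNT(*) FILTER (WHERE id > 1 AND id < 6) AS matches
    FROM TEST
    GROUP BY id
    HAVING id < 6
    ORDER BY id
    """
).fetchall()
print(per_id_counts)  # [(1, 0), (2, 1), (3, 1), (4, 1), (5, 1)]
```

ID 1 still appears in the output because HAVING keeps its group, but its count is zero because FILTER excludes its only row from the aggregate.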
Conclusion
- The GROUP BY clause groups together rows that have the same values or produce the same result for a specified expression.
- The HAVING clause filters the groups produced by GROUP BY, much as the WHERE clause filters individual rows.
- The FILTER clause restricts the rows passed to an individual aggregate function. Because each aggregate can have its own FILTER, it is more flexible than a single WHERE clause.