Difference between Group by and Having clause in SQL
Overview
Combining your data points to get a single value is known as "aggregation"; calculating the mean or the minimum are typical examples. Sometimes, aggregating all your data at once produces a value that isn't useful. This tutorial covers the SQL GROUP BY clause, which splits rows into groups so that each group is aggregated separately, and the HAVING clause, which lets you choose which of those groups are included in the result.
Group By Clause
Description
The GROUP BY clause is used with the SELECT statement to group rows that have identical values in one or more columns. It is most often combined with aggregate functions such as SUM, AVG, COUNT, MIN, or MAX, so that one summary value is calculated per group. In a query, GROUP BY is written after the WHERE clause.
Syntax
Now that we understand what the SQL GROUP BY clause is, let's look at the syntax of a basic GROUP BY query: SELECT statements... GROUP BY column_name1[,column_name2,...] [HAVING condition];
Here,
- "SELECT statements..." is the standard SQL SELECT query.
- "GROUP BY column_name1" is the clause that performs the grouping based on column_name1.
- "[,column_name2,...]" is optional and stands for the additional column names when grouping on multiple columns.
- "[HAVING condition]" is optional and restricts which groups produced by the GROUP BY clause appear in the result. It is similar to the WHERE clause, but it applies to groups rather than to individual rows.
Basic Example
The example query counts the employees in each salary group: it selects SALARIES and COUNT(SALARIES) AS COUNT_SALARIES from the EMPLOYEES table and groups the rows by SALARIES. Here,
GROUP BY SALARIES:
This is the grouping clause. It indicates that the rows in the EMPLOYEES table should be grouped based on the values in the SALARIES column, so all rows with the same salary are treated as a single group.
Putting it all together, the query returns one row per distinct salary, showing the salary and the number of employees who earn it.
For example, let's say the EMPLOYEES table has the following data:
EMPLOYEES | SALARIES |
---|---|
John | 50000 |
Jane | 60000 |
Alex | 50000 |
Sarah | 60000 |
Michael | 70000 |
Running the query on this data would yield a result like:
SALARIES | COUNT_SALARIES |
---|---|
50000 | 2 |
60000 | 2 |
70000 | 1 |
In the result, each row represents a salary group, and the COUNT_SALARIES column shows the number of employees with that salary. Individual employee names cannot be listed alongside the count, because the EMPLOYEES column is neither part of the GROUP BY clause nor wrapped in an aggregate function.
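The grouping described in this example can be tried out with a small runnable sketch. It assumes SQLite, driven through Python's built-in sqlite3 module, purely as a convenient way to execute the SQL; the table and column names follow the sample data, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory database holding the sample EMPLOYEES data from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEES TEXT, SALARIES INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    [
        ("John", 50000),
        ("Jane", 60000),
        ("Alex", 50000),
        ("Sarah", 60000),
        ("Michael", 70000),
    ],
)

# One output row per distinct salary, with the number of employees in that group.
rows = conn.execute(
    """
    SELECT SALARIES, COUNT(SALARIES) AS COUNT_SALARIES
    FROM EMPLOYEES
    GROUP BY SALARIES
    ORDER BY SALARIES
    """
).fetchall()

for salary, count in rows:
    print(salary, count)
```

Five input rows collapse into three output rows, one for each distinct salary.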
Advantages
- It collects rows with identical values in one or more columns into groups.
- It summarizes data and reduces the number of rows in the output table.
- It can be combined with aggregate functions like SUM, COUNT, AVG, MIN, MAX, etc. to calculate summary statistics for each group.
- It supports grouping on several columns at once.
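As a rough illustration of the last two points, the sketch below groups on two columns and computes several aggregates per group. The staff table, its columns, and its rows are hypothetical and exist only for this example; SQLite via Python's sqlite3 module is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, invented for this illustration.
conn.execute("CREATE TABLE staff (department TEXT, role TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        ("Sales", "Rep", 50000),
        ("Sales", "Rep", 60000),
        ("Sales", "Manager", 90000),
        ("IT", "Engineer", 80000),
        ("IT", "Engineer", 70000),
    ],
)

# Each distinct (department, role) pair forms one group.
rows = conn.execute(
    """
    SELECT department, role, COUNT(*), SUM(salary), AVG(salary)
    FROM staff
    GROUP BY department, role
    ORDER BY department, role
    """
).fetchall()

for row in rows:
    print(row)
```

Rows only end up in the same group when they match on both columns, so the two Sales roles are summarized separately.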
Disadvantages
- Complex GROUP BY query writing and optimization can take some time.
- To perform some aggregations, subqueries or temporary tables may be needed.
- A complex GROUP BY query's output can take time to understand.
Limitations of WHERE Clause
Suppose we want to restrict the results to product lines whose total sales exceed 40,000. A natural first attempt is to put that condition in the WHERE clause.
The query retrieves information from a database table named "sales". It calculates the average unit price, the total quantity of products sold, and the total monetary gain for each product line, groups the results by product line, and orders them in descending order based on the total gain.
SUM(total) > 40000:
This is the condition placed in the WHERE clause. The intent is to keep only the product lines where the sum of the "total" column (the monetary value of each sale) is greater than 40,000.
This query does not run; the database returns an error instead of a result.
The WHERE clause filters individual rows before they are grouped, so the aggregated value SUM(total) does not exist yet when WHERE is evaluated. Aggregate functions therefore cannot be used in a WHERE condition, and a different clause is needed to resolve this problem.
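The failure can be reproduced with a short sketch. The column names product_line, unit_price, and quantity and all of the rows are assumptions made for illustration (the text above names only the "sales" table and its "total" column), and SQLite via Python's sqlite3 module is assumed as the engine. The wording of the error differs between database systems, so the sketch only checks that the query is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only "sales" and "total" come from the text.
conn.execute(
    "CREATE TABLE sales ("
    "product_line TEXT, unit_price REAL, quantity INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Electronics", 500.0, 60, 30000.0),
        ("Electronics", 400.0, 50, 20000.0),
        ("Fashion", 100.0, 200, 20000.0),
        ("Food", 20.0, 2500, 50000.0),
    ],
)

# An aggregate function inside WHERE: rejected before any rows are returned.
bad_query = """
    SELECT product_line, AVG(unit_price), SUM(quantity), SUM(total) AS total_gain
    FROM sales
    WHERE SUM(total) > 40000
    GROUP BY product_line
    ORDER BY total_gain DESC
"""

try:
    conn.execute(bad_query)
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("Query rejected:", exc)
```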
Having Clause
Short Description
The HAVING clause filters the groups produced by the GROUP BY clause. WHERE cannot contain aggregate functions, so HAVING is used instead whenever a condition involves an aggregate such as COUNT or SUM. The two can appear in the same query: WHERE selects the rows that go into the groups, GROUP BY creates summary rows out of rows with the same values, and HAVING keeps only the groups that meet the specified criteria. The HAVING clause is always written after the GROUP BY clause.
Syntax
The HAVING clause is written directly after the GROUP BY clause: SELECT statements... GROUP BY column_name1[,column_name2,...] HAVING condition;
Basic Example
The example query on the EMPLOYEES table selects COUNT(SALARIES) AS COUNT_SALARIES and EMPLOYEES, groups the rows by SALARIES, and adds the following condition.
HAVING COUNT(SALARIES) > 1:
This is a filter applied after the grouping. The HAVING clause is similar to the WHERE clause, but it operates on the grouped results and removes the groups that do not meet the specified condition. In this case, it keeps only the groups where the count of employees with the same salary is greater than 1. In other words, it's looking for duplicate salaries.
However, there's an issue in the query. The EMPLOYEES column in the SELECT list is not valid in standard SQL, because it is neither listed in the GROUP BY clause nor wrapped in an aggregate function. To find the duplicate salaries, select SALARIES together with COUNT(SALARIES), group by SALARIES, and keep the HAVING COUNT(SALARIES) > 1 condition.
The adjusted query correctly shows the duplicate salary values along with the count of employees having each salary. With the sample data, it returns 50000 and 60000, each with a count of 2.
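A minimal sketch of such an adjusted query is given here, assuming the same sample EMPLOYEES data and SQLite via Python's sqlite3 module. It is one possible formulation rather than the only correct one, and the ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEES TEXT, SALARIES INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    [
        ("John", 50000),
        ("Jane", 60000),
        ("Alex", 50000),
        ("Sarah", 60000),
        ("Michael", 70000),
    ],
)

# Group by salary, then keep only the groups with more than one employee.
rows = conn.execute(
    """
    SELECT SALARIES, COUNT(SALARIES) AS COUNT_SALARIES
    FROM EMPLOYEES
    GROUP BY SALARIES
    HAVING COUNT(SALARIES) > 1
    ORDER BY SALARIES
    """
).fetchall()

for salary, count in rows:
    print(salary, count)
```

The 70000 group is formed like the others but is dropped by HAVING because it contains a single employee.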
Advantages
- It enables the filtering of groups according to a requirement involving an aggregate function.
- It can filter on calculations over aggregated data, including percentages or ratios.
- It can be applied to complex queries to get more targeted results.
Disadvantages
- Query performance may be slowed if there are intricate calculations involved.
- Understanding the results of a complicated HAVING query might be challenging.
- To do some sorts of filtering, subqueries or temporary tables may be needed.
SQL Order of Execution for GROUP BY and HAVING
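The clauses of a query are written in the following order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.
The database, however, logically processes them in this order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
The query processor first determines which tables to include (FROM), and SELECT is executed only after HAVING. This explains why, in standard SQL, ORDER BY can use a column alias defined in SELECT while HAVING cannot, although some database systems accept aliases in HAVING as an extension. The order of execution also explains why HAVING can apply conditions to aggregated data and WHERE cannot: WHERE runs before GROUP BY has formed the groups, and HAVING runs after.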
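The effect of this ordering can be seen in a query that uses WHERE, GROUP BY, HAVING, and ORDER BY together. The sketch below is illustrative only: the column names other than "total", the rows, and the quantity threshold are assumptions, and SQLite via Python's sqlite3 module is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only "sales" and "total" come from the text.
conn.execute(
    "CREATE TABLE sales ("
    "product_line TEXT, unit_price REAL, quantity INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Electronics", 500.0, 60, 30000.0),
        ("Electronics", 400.0, 50, 20000.0),
        ("Fashion", 100.0, 200, 20000.0),
        ("Fashion", 150.0, 100, 15000.0),
        ("Food", 20.0, 2500, 50000.0),
        ("Home", 50.0, 1000, 50000.0),
        ("Home", 30.0, 500, 15000.0),
    ],
)

rows = conn.execute(
    """
    SELECT product_line, SUM(total) AS total_gain
    FROM sales
    WHERE quantity >= 100        -- row filter, applied before grouping
    GROUP BY product_line
    HAVING SUM(total) > 40000    -- group filter, applied after grouping
    ORDER BY total_gain DESC     -- runs after SELECT, so the alias is usable
    """
).fetchall()

for row in rows:
    print(row)
```

Electronics sums to 50,000 across all of its rows, yet it never reaches the HAVING step, because both of its rows are removed by WHERE before the groups are formed. Fashion survives WHERE but its group total of 35,000 fails the HAVING condition.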
Difference Between Group by and Having Clause in SQL
No. | HAVING clause | GROUP BY clause |
---|---|---|
1 | It filters groups according to a condition. | It groups rows according to the values in one or more columns. |
2 | It is normally used together with GROUP BY; without GROUP BY, the whole result is treated as a single group. | It can be used with a SELECT statement without a HAVING clause. |
3 | It can contain aggregate functions. | It cannot contain aggregate functions. |
4 | It restricts which groups appear in the query output. | It determines how rows are combined into groups; it does not filter the output. |
Conclusion
The GROUP BY clause groups data into summary rows based on shared values, while the HAVING clause filters those groups based on a condition that holds true for the aggregated data. Understanding the differences between these two clauses can help you write more effective SQL queries that draw out valuable information from your data.