Window Functions in Mysql
Overview
Window functions in MySQL are used to perform calculations like aggregation, analytics, and ranking over a set of rows. It works in the same way as the aggregate functions in terms of calculations but the aggregate function returns a single row after computing a whole column whereas a Windows function is applied OVER a set of rows which is known as a Window and it returns the value associated with every row individually. There will be a unique identity maintained for each row.
Introduction
The Window functions in MySQL are special kinds of pre-defined methods that are used to perform computations on a set of rows rather than a particular column. The set of rows on the methods that are applied is referred to as a Window. The row on which the window function is applied is known as the current row. The current row is related to the set of rows on which the computations are applied. As discussed earlier, the window functions in MySQL work in a similar way as the aggregate functions do. But the catch is, the aggregate function is applied on a particular column or a whole table producing output in a single row whereas window functions in MySQL operate on every row from a selected set of rows (Window) and produce an output corresponding to each row.
It is mandatory to use an OVER clause with window functions in MySQL as it is used to define the set of rows (Window) on which the functions should be applied. Apart from that, the OVER clause is used to order the selected rows in groups.
Earlier, window functions were not supported in MySQL. It was introduced in version 8.0.
Syntax:
In the above syntax, the OVER clause is used after the window function to define the partitions, order, and range of the rows.
Here, the PARTITION BY clause is used to make divisions or partitions of rows so we can specify the rows to be used in the window function computation. The ORDER BY clause is used to sort the order of rows that exists in a partition. The ROW or RANGE clause is used to form a subset of rows in the current partition
Both, ORDER BY and ROW or RANGEclauses are optional for the window function.
Now, Let's discuss a bit the types of Window functions in MySQL.
The Window functions are divided into three categories:-
- Aggregate Window Functions
- Analytical Window Functions
- Ranking Window Functions
Aggregate Window Function
Aggregate window functions are the kind of functions that are used to perform mathematical calculations on the data stored in MySQL. They are used to aggregate the data. Some of the most commonly used aggregate functions are as follows:
- SUM()
- AVG()
- MIN()
- MAX()
- COUNT()
Analytical Window Function
Analytical window functions are the kind of functions that are used to carry out analysis of the data stored in a database. Some of the most commonly used analytical functions are as follows:
- NTILE()
- LAG()
- LAST_VALUE()
- FIRST_VALUE()
- LEAD()
Ranking Window Function
Ranking window functions are the kind of functions that are used to give rankings to each row of a partition. Some of the most commonly used ranking functions are as follows:
- RANK()
- DENSE_RANK()
- ROW_NUMBER()
- PERCENT_RANK()
- CUME_DIST()
ROW_NUMBER() Function
ROW_NUMBER is one of the most common ranking window functions in MySQL which provides a normal serial number to each row present within a partition. It is just a normal serial-wise numbering of rows in ascending order. Even the rows which have the same value when using the ORDER BY clause has been given a different row number.
Syntax
Let's understand the use of row_number with an example.
Example
In this example, let's take a table Car which will contain details of cars like the model and release year. We will apply the row_number() function on the table to assign incremental sequences to each row from the table.
Let's consider the following table:
car_name | released_year |
---|---|
Hyundai Verna | 2016 |
Honda City | 1955 |
Audi Q3 | 2016 |
BMW X4 | 2010 |
Mercedes Benz | 2019 |
Code
Output:
car_name | row_num |
---|---|
Honda City | 1 |
Hyundai Verna | 2 |
Audi Q3 | 3 |
BMW X4 | 4 |
Mercedes Benz | 5 |
Explanation:
In the above example, we have used to table car which contains car_name and released_year columns. The ROW_NUMBER() window function is used here to assign numerical sequences to each row of the table. The rows are ordered by or sorted by the released_year using the ORDER BY clause as seen in the output table.
RANK() and DENSE_RANK() Functions
RANK()
RANK() is a kind of ranking window function in MySQL which is used to assign ranks to the rows within an ordered partition. The rows which contain the same values will be assigned the same rank, with the next ranking or rankings skipped. For example, if we have 4 items with the same value starting with ranking 1 then all 4 items will have rank 1, and the ranking from the 5th item would be 5 with 2,3,4 skipped.
Syntax:
Example
In this example, we will consider a table Car which will contain details of cars like the model and released year and price. We will apply the RANK() function on the table to assign rankings to each row from the table.
Let's consider the table:
car_name | released_year | price |
---|---|---|
Hyundai Verna | 2016 | 800000 |
Honda City | 1955 | 800000 |
Audi Q3 | 2016 | 7500000 |
BMW X4 | 2010 | 3000000 |
Mercedes Benz | 2019 | 7500000 |
Code
Output
car_name | ranking |
---|---|
Audi Q3 | 1 |
Mercedes Benz | 1 |
BMW X4 | 3 |
Hyundai Verna | 4 |
Honda City | 4 |
Explanation
In the above example, we have used to table car which contains the car_name, released_year and price columns. The RANK() window function is used here to assign a ranking to each row of the table based on their prices. The rows are ordered by or sorted in descending order by the prices using the ORDER BY clause as seen in the output table.
DENSE_RANK()
The DENSE_RANK() function is a type of ranking window function in MySQL which is used to assign rankings to the rows within an ordered partition. DENSE_RANK() is almost as same as the RANK() window function but in DENSE_RANK() no rankings are skipped even if the rows have the same value.
Syntax
Example
Let's take the same example we have used in the RANK() function section. Here, we will apply the DENSE_RANK() method to assign rankings without skipping the ranks.
Code
Output
car_name | ranking |
---|---|
Audi Q3 | 1 |
Mercedes Benz | 1 |
BMW X4 | 2 |
Hyundai Verna | 3 |
Honda City | 3 |
Explanation:
As seen in the previous example, the ranking is assigned based on price. Here, with the use of DENSE_RANK() the rankings are assigned and displayed with the car_name attribute. As seen in the output, the rankings are assigned without skipping the previous ranks even if the rows contain the same value.
LAG() and LEAD() Functions
LAG() and LEAD() both are types of analytical window functions in MySQL. As the name suggests, the LAG() function starts the value from the previous row (returns NULL if starting from the first row as no preceding row exists). It returns the value of the row before the current row of the partition. Whereas the LEAD() function returns the value of the row after the current row in a partition. It starts from the value of the next row (returns NULL if no row if no more rows are available).
Syntax (LAG)
Syntax (LEAD)
Example
Here, we will consider the same table car which contains different car details. We will apply both LAG() and LEAD window functions to demonstrate their practical usage.
Consider a table car:
car_name | released_year | price |
---|---|---|
Hyundai Verna | 2016 | 800000 |
Honda City | 1955 | 800000 |
Audi Q3 | 2016 | 7500000 |
BMW X4 | 2010 | 3000000 |
Mercedes Benz | 2019 | 7500000 |
Code (LAG)
Output
car_name | price | lag_column |
---|---|---|
Honda City | 800000 | NULL |
BMW X4 | 3000000 | 800000 |
Hyundai Verna | 800000 | 3000000 |
Audi Q3 | 7500000 | 800000 |
Mercedes Benz | 7500000 | 7500000 |
Code (LEAD)
Output
car_name | price | lead_column |
---|---|---|
Honda City | 800000 | 3000000 |
BMW X4 | 3000000 | 800000 |
Hyundai Verna | 800000 | 7500000 |
Audi Q3 | 7500000 | 7500000 |
Mercedes Benz | 7500000 | NULL |
Explanation
As seen in the above example, we have used both the LAG() and LEAD()functions on the same table and set of rows. The LAG() function returns values from one preceding row i.e., here it returns NULL in the first row as there was no preceding row available. The LEAD() function returns the value from one subsequent row i.e., here it starts returning the value from the second row and the last row returns the NULL value as there are no subsequent rows available.
FIRST_VALUE() and LAST_VALUE() Functions
FIRST_VALUE() and LAST_VALUE() are one of the most commonly used analytics window functions in MySQL. The FIRST_VALUE() function shows the value from the first row of a window frame or within a partition. LAST_VALUE() will return the value of the last row of a window frame or within a partition.
Syntax(FIRST_VALUE)
Syntax(LAST_VALUE)
Example
In this example, we will consider a table Employee which contains details about the employee working in different departments with their salary. Here, we will use the FIRST_VALUE() and LAST_VALUE()methods to return the name of the employee having the least salary and highest salary from each department respectively.
Consider a table Employee:
employee_name | department | salary |
---|---|---|
Abdul | Finance | 68000 |
Aman | IT | 60000 |
Jai | IT | 45000 |
Mohit | IT | 70000 |
Nilesh | Finance | 55000 |
Ravi | Accounting | 60000 |
Sushant | Finance | 65000 |
Vishal | Accounting | 40000 |
Code(FIRST_VALUE)
Output
employee_name | department | salary | least_salary |
---|---|---|---|
Vishal | Accounting | 40000 | Vishal |
Ravi | Accounting | 60000 | Vishal |
Nilesh | Finance | 55000 | Nilesh |
Sushant | Finance | 65000 | Nilesh |
Abdul | Finance | 68000 | Nilesh |
Jai | IT | 45000 | Jai |
Aman | IT | 60000 | Jai |
Mohit | IT | 70000 | Jai |
Code(LAST_VALUE)
Output
employee_name | department | salary | max_salary |
---|---|---|---|
Vishal | Accounting | 40000 | Ravi |
Ravi | Accounting | 60000 | Ravi |
Nilesh | Finance | 55000 | Abdul |
Sushant | Finance | 65000 | Abdul |
Abdul | Finance | 68000 | Abdul |
Jai | IT | 45000 | Mohit |
Aman | IT | 60000 | Mohit |
Mohit | IT | 70000 | Mohit |
Explanation
In the above example, we have used both the FIRST_VALUE() and LAST_VALUE() methods on the same table. Here, using the FIRST_VALUE() method we've returned the name of the employee having least_salary for every partition (department). And using the LAST_VALUE() we've returned the name of the employee having max_salary for every partition (department).
NOTE: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is a clause that defines the frame in the partition from the first row to the last row.
NTILE() Function
NTILE() function is a kind of analytical window function in MySQL which is used to divide rows into groups within an ordered partition. Each of the divided groups will be assigned a number which will be the same for that group of rows. Every group is represented by a number.
Syntax
Example
Here the same table Employee is used which will be divided into groups using the NTILE() window function.
Code
Output
employee_name | department | salary | grouping |
---|---|---|---|
Vishal | Accounting | 40000 | 1 |
Ravi | Accounting | 60000 | 1 |
Nilesh | Finance | 55000 | 1 |
Sushant | Finance | 65000 | 2 |
Abdul | Finance | 68000 | 2 |
Jai | IT | 45000 | 2 |
Aman | IT | 60000 | 3 |
Mohit | IT | 70000 | 3 |
Explanation
Here, we have considered the same Employee table used in the previous examples. We've used the NTILE() to form a total of 3 groups from the rows of the Employee table. The first group contains 3 rows, the second group contains 3 rows and the last group contains 2 rows.
Using Window Functions in Queries
Let's see an example to display the window functions in MySQL queries.
Example
In this example, we will use three Aggregate window functions MIN(), MAX(), and COUNT() to find the minimum price, maximum price, and count of cars in respective years.
Let's consider a table car:
car_name | released_year | price |
---|---|---|
Hyundai Verna | 2016 | 800000 |
Honda City | 1955 | 800000 |
Audi Q3 | 2016 | 7500000 |
BMW X4 | 2010 | 3000000 |
BMW X5 | 2010 | 2000000 |
Mercedes Benz Q5 | 2019 | 6700000 |
Mercedes Benz | 2019 | 7500000 |
Code
Output
car_name | released_year | price | min_price | max_price | count_of_cars |
---|---|---|---|---|---|
Honda City | 1955 | 800000 | 800000 | 800000 | 1 |
BMW X4 | 2010 | 3000000 | 2000000 | 3000000 | 2 |
BMW X5 | 2010 | 2000000 | 2000000 | 3000000 | 2 |
Hyundai Verna | 2016 | 800000 | 800000 | 7500000 | 2 |
Audi Q3 | 2016 | 7500000 | 800000 | 7500000 | 2 |
Mercedes Benz | 2019 | 6700000 | 6700000 | 7500000 | 2 |
Mercedes Benz Q5 | 2019 | 7500000 | 6700000 | 7500000 | 2 |
Explanation
The table car includes the released year, price, and model name information. Here, we have used three aggregate window functions MIN(), MAX(), and COUNT(). The MIN() function returns the minimum price for every year a car is released. The MAX() function returns the maximum price for every year a car is released. The COUNT() function returns the number of cars released each year.
Performance Considerations
Let's now discuss some of the advantages when performance is taken into consideration for window functions in MySQL:
- Window functions in MySQL operate on a set of rows(Window) rather than a single column.
- Window function keeps individual rows and adds a summary column.
- Window functions in MySQL can be applied on fixed as well as sliding window frames.
- Window functions in MySQL contain several methods which are used to efficiently extract meaningful information from the database.
Conclusion
Let's summarize whatever we've learned till now:
- The Window functions in MySQL are special kinds of pre-defined methods that are used to perform computations on a set of rows.
- The set of rows on the methods that are applied is referred to as a Window.
- Using an OVER clause with window functions in MySQL is mandatory.
- The Window functions are divided into three categories:- - Aggregate Window Functions - Analytical Window Functions -Ranking Window Functions`