LAG() Function in SQL
Overview
In SQL Server (Transact-SQL), LAG() is a window function that allows us to compare the current record value with the value of previous records. This previous record value can be returned on the same record without the use of self-join, thus making it easier to compare.
What is Lag Function in SQL?
SQL Server LAG() is a window function that provides access to a row at a specified offset value that comes before the current row. This implies that with the help of the LAG() function, we can access the data from any of the previous rows on the current row.
The LAG() function is used for comparing the value of the current row with the value of the previous rows. LAG() is an analytic function available after the release of SQL Server 2012. This works similar to the LEAD() function in SQL. In the lead function, we access successive rows, but in the lag function, we access previous rows.
Syntax of Lag Function in SQL
The syntax for LAG() in Transact SQL is:
Parameters of Lag Function in SQL
Expression - this argument takes the column name over which the lag function performs calculations. This is a mandatory argument.
Offset - Optional. It is the number of previous rows from the current row from which the function has to fetch the value. If offset is not specified, the default value is 1.
Default - Optional. This value will be returned if the offset value exceeds the partition's range. If this parameter is not specified, NULL is returned.
PARTITION BY clause - Optional. It divides the resultant set by the FROM clause into groups to which the specified function is applied. If we omit the PARTITION BY clause, the whole resultant set is treated as a single partition by the function.
ORDER BY clause - Optional. It is used to order the data values in each partition. By default, it uses ascending order to sort data.
Return Type of Lag Function in SQL
The return type is the same as the data type of the specified expression. NULL is returned if the expression passed as a parameter is nullable or the default is set to NULL. The return value must be a single value and cannot be another window function.
Examples of Lag Function in SQL
Let us consider a table named Employee that stores employees' salary history. SQL query to create the Employee table is:
Employee Table:
employee_id | year | salary |
---|---|---|
101 | 2017 | 17000.00 |
101 | 2018 | 18190.00 |
101 | 2020 | 19463.30 |
102 | 2017 | 17000.00 |
102 | 2018 | 18360.00 |
102 | 2020 | 19278.00 |
103 | 2017 | 9000.00 |
103 | 2018 | 9720.00 |
103 | 2020 | 10206.00 |
104 | 2017 | 6000.00 |
104 | 2018 | 6060.00 |
104 | 2020 | 6605.40 |
105 | 2017 | 4800.00 |
105 | 2018 | 4992.00 |
105 | 2020 | 5391.36 |
-
SQL Lag function without a default value
In the following SQL query, we will use the lag function on the Salary column with offset one. We did not specify any default value in this query.
Output
employee_id year salary previous_salary 105 2017 4800.00 NULL 105 2018 4992.00 4800.00 105 2020 5391.36 4992.00 104 2017 6000.00 5391.36 104 2018 6060.00 6000.00 104 2020 6605.40 6060.00 103 2017 9000.00 6605.40 103 2018 9720.00 9000.00 103 2020 10206.00 9720.00 101 2017 17000.00 10206.00 102 2017 17000.00 17000.00 101 2018 18190.00 17000.00 102 2018 18360.00 18190.00 102 2020 19278.00 18360.00 101 2020 19463.30 19278.00 In the output, we can note the following:
- The first row shows the NULL value for the previous_salary column because it does not have any previous rows.
- The second row contains the previous_salary row value in the previous_salary column. It takes value from the previous row due to offset value 1.
-
SQL Lag Function with a default value
In the following SQL query, we will use the Lag function on the Salary column with offset one. We did not specify any default value in this query.
Output
employee_id year salary previous_salary 105 2017 4800.00 NO DATA 105 2018 4992.00 4800.00 105 2020 5391.36 4992.00 104 2017 6000.00 5391.36 104 2018 6060.00 6000.00 104 2020 6605.40 6060.00 103 2017 9000.00 6605.40 103 2018 9720.00 9000.00 103 2020 10206.00 9720.00 101 2017 17000.00 10206.00 102 2017 17000.00 17000.00 101 2018 18190.00 17000.00 102 2018 18360.00 18190.00 102 2020 19278.00 18360.00 101 2020 19463.30 19278.00 In the output, the first row shows the NO DATA value instead of NULL for the previous_salary column because it does not have any previous rows.
-
SQL Lag function with OFFSET Value 2
In the above examples, we used default offset value 1 in the lag function, which takes the value from the previous row. In the example, we use offset value 2.
Output
employee_id year salary previous_salary 105 2017 4800.00 NO DATA 105 2018 4992.00 NO DATA 105 2020 5391.36 4800.00 104 2017 6000.00 4992.00 104 2018 6060.00 5391.36 104 2020 6605.40 6000.00 103 2017 9000.00 6060.00 103 2018 9720.00 6605.40 103 2020 10206.00 9000.00 101 2017 17000.00 9720.00 102 2017 17000.00 10206.00 101 2018 18190.00 17000.00 102 2018 18360.00 17000.00 102 2020 19278.00 18190.00 101 2020 19463.30 18360.00 In this example, since the offset value is 2, we have given a default value NO DATA for rows 1 and 2. While row 3 takes value from previous row 1.
-
SQL Lag function with PARTITION BY Clause
Output
employee_id year salary previous_salary 101 2017 17000.00 NO DATA 101 2018 18190.00 17000.00 101 2020 19463.30 18190.00 102 2017 17000.00 NO DATA 102 2018 18360.00 17000.00 102 2020 19278.00 18360.00 103 2017 9000.00 NO DATA 103 2018 9720.00 9000.00 103 2020 10206.00 9720.00 104 2017 6000.00 NO DATA 104 2018 6060.00 6000.00 104 2020 6605.40 6060.00 105 2017 4800.00 NO DATA 105 2018 4992.00 4800.00 105 2020 5391.36 4992.00 In the above example:
- The PARTITION BY clause divides the result set into groups by employee_id.
- Now, for each group, the ORDER BY clause sorts the rows by year column in ascending order.
- The LAG() function is applied to every row of each group independently. The first row in each group is NO DATA because there is no previous year record for salary. The second and third rows get the salary from the first and second rows and populate them into the previous_salary column.
Conclusion
- The Transact-SQL LAG() function is implemented to access values from previous rows.
- The lag function accesses the value from the previous rows on the basis of the defined offset in the function.
- The PARTITION BY clause in the lag function is used to define a logical boundary of data based on the given requirement.
- The Lag function uses NULL as the default value for data beyond the partition's scope.
- LAG() function can be used in combination with Stored Procedures, Common Table Expression (CTEs), and other analytic functions for computation purposes.