SQL LAG is a window function that allows you to access data from a preceding row within your dataset. It's particularly useful for comparing the value in the current row to a value in a row that comes before it, based on a defined ordering.
Understanding the SQL LAG() Function
The LAG() function is one of SQL's window functions. It lets you create a new column whose value in each row is taken from a previous row of another column. This capability enables analysis of sequential data, such as time series or ordered lists.
The name comes from the fact that each row in the new column lags behind the source column: it "looks back" a specified number of rows to retrieve a value.
Why Use LAG()?
Analyzing sequential data often requires comparing a value at a certain point to the value at a previous point. Without LAG(), this type of analysis would typically involve complex self-joins or subqueries, which can be less efficient and harder to read. LAG() simplifies this process significantly by providing direct access to previous rows within the window defined by your query.
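To make the contrast concrete, here is a minimal sketch that fetches the previous value two ways: once with a self-join and once with LAG(). It uses Python's standard-library sqlite3 module only as a convenient harness, and it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The `readings` table and its columns are hypothetical, chosen for illustration.

```python
import sqlite3

# Hypothetical table for illustration: one reading per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2023-01-01", 100), ("2023-01-02", 102), ("2023-01-03", 101)],
)

# Without LAG(): join each row to the latest row that comes before it.
self_join_sql = """
SELECT cur.day, cur.value, prev.value AS previous_value
FROM readings AS cur
LEFT JOIN readings AS prev
  ON prev.day = (SELECT MAX(day) FROM readings WHERE day < cur.day)
ORDER BY cur.day
"""

# With LAG(): the same lookup as a single window expression.
lag_sql = """
SELECT day, value, LAG(value) OVER (ORDER BY day) AS previous_value
FROM readings
ORDER BY day
"""

self_join_rows = conn.execute(self_join_sql).fetchall()
lag_rows = conn.execute(lag_sql).fetchall()

for row in lag_rows:
    print(row)
```

Both queries return the same rows here; the LAG() version states the intent ("the value from the previous row") directly instead of encoding it in a join condition.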
Syntax of LAG()
The basic syntax for the LAG() function is:
LAG(column_name [, offset [, default_value]]) OVER (
    [PARTITION BY partition_column]
    ORDER BY order_column
)
Let's break down the parameters:
- column_name: The column from which you want to retrieve the value from a previous row.
- offset: (Optional) The number of rows back from the current row to retrieve the value. If omitted, the default is 1 (meaning it looks at the immediately preceding row).
- default_value: (Optional) The value to return if the offset goes beyond the scope of the partition (e.g., for the very first row in a partition when offset is 1). If omitted, the default is NULL.
- OVER (...): This clause is what makes LAG() a window function.
  - PARTITION BY partition_column: (Optional) Divides the rows into groups or partitions. The LAG() function is applied independently within each partition.
  - ORDER BY order_column: (Required) Specifies the order of the rows within each partition (or the entire result set if no PARTITION BY is used). This order determines which row is considered "previous" to the current row.
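The effect of the two optional arguments is easiest to see side by side. The sketch below is an illustration rather than a reference implementation: it runs through Python's standard-library sqlite3 module (assuming SQLite 3.25 or newer), and the `scores` table with its `step` and `points` columns is hypothetical.

```python
import sqlite3

# Hypothetical table for illustration: points recorded at successive steps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (step INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

sql = """
SELECT
    step,
    points,
    LAG(points)       OVER (ORDER BY step) AS prev_1,          -- offset defaults to 1
    LAG(points, 2)    OVER (ORDER BY step) AS prev_2,          -- two rows back, NULL if none
    LAG(points, 2, 0) OVER (ORDER BY step) AS prev_2_or_zero   -- two rows back, 0 if none
FROM scores
ORDER BY step
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

With an offset of 2, the first two rows have no row far enough back, so `prev_2` is NULL (None in Python) while `prev_2_or_zero` falls back to the supplied default of 0.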
Practical Example
Imagine a table tracking daily stock prices:
Date | Stock | Price |
---|---|---|
2023-01-01 | ABC | 100 |
2023-01-02 | ABC | 102 |
2023-01-03 | ABC | 101 |
2023-01-04 | ABC | 105 |
2023-01-01 | XYZ | 50 |
2023-01-02 | XYZ | 52 |
You want to see the previous day's price next to the current day's price for each stock. You would use LAG() like this:
SELECT
    Date,
    Stock,
    Price,
    LAG(Price, 1, NULL) OVER (PARTITION BY Stock ORDER BY Date) AS Previous_Price
FROM
    StockPrices;
Here's the result:
Date | Stock | Price | Previous_Price |
---|---|---|---|
2023-01-01 | ABC | 100 | NULL |
2023-01-02 | ABC | 102 | 100 |
2023-01-03 | ABC | 101 | 102 |
2023-01-04 | ABC | 105 | 101 |
2023-01-01 | XYZ | 50 | NULL |
2023-01-02 | XYZ | 52 | 50 |
- The PARTITION BY Stock clause ensures that the lag is computed independently for stocks 'ABC' and 'XYZ'.
- The ORDER BY Date clause ensures that "previous" means the row with the chronologically earlier date within each stock partition.
- LAG(Price, 1, NULL) gets the Price from the row one position before the current row. NULL is returned for the first row of each partition, where there is no previous row.
Common Use Cases
- Comparing Consecutive Values: Calculating the difference or percentage change between the current row's value and the previous row's value (e.g., day-over-day sales change, stock price movement).
- Identifying Trends: Looking back multiple rows (using a higher offset) to see values further in the past.
- Data Validation: Checking if values in sequential rows follow a specific pattern or constraint.
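As a sketch of the first use case, the example below computes the day-over-day price change and percentage change for the stock data shown earlier. It again uses Python's standard-library sqlite3 module as a harness (assuming SQLite 3.25 or newer); the `with_prev` CTE and the `Price_Change` and `Pct_Change` column names are assumptions introduced for illustration.

```python
import sqlite3

# Recreate the StockPrices table from the example above in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StockPrices (Date TEXT, Stock TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO StockPrices VALUES (?, ?, ?)",
    [
        ("2023-01-01", "ABC", 100),
        ("2023-01-02", "ABC", 102),
        ("2023-01-03", "ABC", 101),
        ("2023-01-04", "ABC", 105),
        ("2023-01-01", "XYZ", 50),
        ("2023-01-02", "XYZ", 52),
    ],
)

sql = """
WITH with_prev AS (
    SELECT
        Date,
        Stock,
        Price,
        LAG(Price) OVER (PARTITION BY Stock ORDER BY Date) AS Previous_Price
    FROM StockPrices
)
SELECT
    Date,
    Stock,
    Price,
    Price - Previous_Price AS Price_Change,
    -- 100.0 forces floating-point division; the result is rounded to 2 decimals.
    ROUND(100.0 * (Price - Previous_Price) / Previous_Price, 2) AS Pct_Change
FROM with_prev
ORDER BY Stock, Date
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Computing the lagged value once in a CTE keeps the arithmetic readable. The first row of each stock has no previous price, so both derived columns are NULL there, because arithmetic on NULL yields NULL.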
In essence, the LAG() function provides a clear and efficient way to reference prior rows within ordered partitions, making sequential data analysis in SQL much more manageable.