SQL PARTITION BY Clause
The PARTITION BY in SQL is a subclause to OVER(). It divides the resultant rows into different partitions based on the specified columns' values. Then the window function is applied to each partition made and gives the results in the form of a separate column.
- The partitions made by the clause are referred to as 'Window'.
- Window functions that can be used with this clause are RANK(), LAG(), LEAD(), COUNT(), MAX() etc.
How to Implement PARTITION BY Clause in SQL?
PARTITION BY clause is used with the OVER() clause to form clusters based on a specified column.
Syntax of PARTITION BY in SQL:
Standard SQL allows only column names to be put as expression1, expression2. OrderClause and frameClause are optional, and can only be added when required.
Example:
Here, we've created a table StudentMarks with columns Id, Name, Subject, Marks.
Output:
As a result, PARTITION BY has calculated the average marks for the students where each window is made on the Subject.
Unlike GROUP BY, it returns each row of the table with calculated result.
Demo Database
To illustrate the usage of the PARTITION BY in SQL, let's create a sample database named "Sales" with a table called "Orders". This table contains columns such as order_id, customer_id, order_date, and order_amount.
Now that we have our sample database set up, let's explore how the PARTITION BY clause can be utilized in SQL queries.
OVER Clause with PARTITION BY in SQL
The OVER clause combined with the PARTITION BY clause allows you to define the window over which a function is applied.
Example:
Output:
[IMAGE 1 START SAMPLE] [IMAGE 1 FINISH SAMPLE]
In this query, we're calculating the total order amount per customer by partitioning the data based on the customer_id column. The SUM() function is then applied to the order_amount column within each partition, resulting in the total order amount for each customer.
Row Number with PARTITION BY in SQL
The ROW_NUMBER() function combined with PARTITION BY assigns a unique sequential integer to each row within a partition. This can be useful for ranking purposes.
Example:
Output:
[IMAGE 2 START SAMPLE] [IMAGE 2 FINISH SAMPLE]
In this query, we're assigning a row number to each order within each customer's partition based on the order_date. This allows us to identify the chronological order of orders for each customer.
COUNT() with PARTITION BY in SQL
The COUNT() function with PARTITION BY allows you to count rows within each partition separately.
Example:
Output:
[IMAGE 3 START SAMPLE] [IMAGE 3 FINISH SAMPLE]
In this query, we're counting the number of orders per customer by partitioning the data based on the customer_id column. The COUNT() function then counts the number of rows within each partition, giving us the total number of orders for each customer.
Conclusion
- The PARTITION BY in SQL allows for the partitioning of result sets based on specified columns.
- It facilitates the application of aggregate or window functions within each partition independently.
- The OVER clause, combined with PARTITION BY, is used to define the window over which functions are applied.
- Functions like SUM(), COUNT(), and ROW_NUMBER() can be effectively utilized with PARTITION BY to perform various analyses on partitioned data.