Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 16

Introduction:

A Window Function performs a Data Analysis calculation across a set of table rows that are
somehow related gives a single row or grouped by condition (refer to Figure 1).

Window function does not cause rows to become grouped into a single output row. Rows retain
their separate identities also able to access more than just the current row of the query result.
(refer to Figure 1).

Window Function Syntax:


Window_Function([All] expression)
OVER( [PARTITION BY expression_list] [ORDER BY order_list Row_or_ Range
clause] )

Figure 1 — Difference between — Aggregated and Windows function

For more information please visit my last blog, I Introduced the SQL Window Function-Part I
and gave an overview of different types of SQL Window Functions.

In this blog, I am explaining all three major types of SQL Window Function along with
examples.

The database used to explain the below concepts: Postgres database.

Dataset: Available at Github: Retails.csv

Retails Table Screenshot.

Types of SQL Window Functions:

1. Window Aggregated Function


2. Window Ranking Aggregated Function
3. Window Analytical Function

1. Window Aggregated Function


Consist one of the supporting aggregated function i.e. AVG(), COUNT(), MIN(), MAX(),
SUM().

a. AVG():

 Returns average value for the input expression values.


 The function works with Numeric Values and ignores NULL values.

Example:

Write a query to calculate the average order price.

Query:
SELECT order_id, product_id, price, AVG(price) OVER (PARTITION BY order_id) AS
Average_Order_Price
FROM retails

Output — Query — AVG()


Explanation:

 To calculate average order price, window function used AVG() on price columns and
partition by on order_id
 Consider order id: 1112 — consist 3 products (i.e. 1,2,5). The average value of those 3
products are (866 + 163 +173) / 3 = 400.667

b. COUNT():

 Calculates the number of rows with NULL values too if available in column or
expression.
 This window function is helpful while creating a new feature in the dataset. Like count
number of entries belong to each customer.

Example:

Query 1: Calculate the number of products purchased by a customer in the order.

SELECT order_id, name, product_id, COUNT(*) OVER (Partition BY order_id) AS


Number_of_Products
FROM retails

Query 2: Calculate the number of product sales purchased (running total) by the customer.

SELECT order_id, name, product_id, COUNT(*) OVER (Order BY order_id) AS


Number_of_Products
FROM retails

Output — Query 1 & 2 — COUNT()

Explanation:

Query 1:
 Partition by order_id counts number of records belongs to particular order_id.
 In the output, we can see several products for each order displayed.

Query 2:

 Order by order_id counts a number of records and particular order_id and then add
number records of consecutive order.
 Output: We can see the count is increased by the number of records related to particular
order_id.

c. Min() or Max():

 Min() or Max() return Minimum or Maximum value of the expression across the input
values respectively.
 Both window function works with Numeric values and ignores NULL values.

Example:

Below query add a new feature into the result set, Minumum and Maximum price of the product
purchased in respective order.

Query:
SELECT order_id, name, product_id, price,
MIN(price) OVER (Partition BY order_id) AS Minimum_Price_Product,
MAX(price) OVER (Partition BY order_id) AS Maximum_Price_Product
FROM retails

Output- Query- MIN() or MAX()

Explanation:
 For each order_id record respective minimum and maximum price of the product has
been added.
 We can use each function separately too.

d. Sum():

 Returns sum/total expression across all input value.


 The function works with Numeric Values and ignores NULL values.

Example:

Below query return Total Price of each order_id.

Query:
SELECT order_id, name, product_id, price,
SUM(price) OVER (PARTITION BY order_id) AS Average_Order_Price
FROM retails

Output — Query — SUM()

Explanation:

 New column added with total_order_price for each order_id.


 Helpful to analyse data, where we have a number of records, belongs to each order_id.
2. Window Ranking Aggregated Function:
Consist one of the supporting ranking function i.e. RANK(), DENSE_RANK(),
ROW_NUMBER().

a. RANK():

 The Rank of a value in a group of values based on the ORDER BY expression in the
OVER clause (refer Query 1).
 Each value is ranked within its PARTITION BY expression (refer Query 2).
 Rows with equal values for the ranking criteria receive the same rank.
 Tie or same rank skip the consecutive rank eg. Rank (): 1,1,3,4,5.

Example:

Query 1: Rank the product based on their prices.

SELECT order_id, name, product_id, price,


RANK() OVER (ORDER BY price) AS Rank_Product_Price
FROM retails

Query 2: Rank the product based on their prices in each order (i.e. partition by order_id).

SELECT order_id, name, product_id, price,


RANK() OVER (PARTITION BY order_id ORDER BY price) AS Rank_Product_Price
FROM retails

Output — Query 1 & 2 — RANK()

Explanation:

As we can see in both query, ORDER BY states the expression used to rank the values.

Query 1:
 The ranking is done based on product_price.
 Also note, 9 rows with the same value has tie rank 1.
 So next Rank value starts with 10.

Query 2:

 Rank has been done by ORDER BY expression i.e. price column.


 Check order_id 114, we can see the rank for first 2 product prices are same. Hence rank
assign to it is 1.
 Next product price rank with 3 within that order_id.

b. DENSE_RANK():

 Similarly to Rank() function, Rank of a value in a group of values based on the ORDER
BY expression and the OVER clause and each value is ranked within its PARTITION
BY expression.
 The difference is, Rows with equal values receive the same rank and Tie or the same rank
not skip the consecutive rank.
 Example: Dense_Rank(): 1,1,2,3,4

EXAMPLE:

Query: Dense_Rank the product based on their prices in each order (i.e. partition by order_id).

SELECT order_id, name, product_id, price,


DENSE_RANK() OVER (PARTITION BY order_id ORDER BY price) AS
Dense_Rank_Product_Price
FROM retails
Output — Query — Dense_Rank()

Explanation:

 As we can see ranking to each row done based on ORDER BY expression i.e. price
values also within each order_id i.e. (PARTITION BY order_id).
 order_id 1114, have 5 products out of which 2 products having the same price hence rank
tie i.e. 1.
 The next rank starts with 2 (this is the major difference between Rank() and
Desne_Rank() function).
 Dense_Rank() not skip the consecutive rank number.

c. CUME_DIST():

 Calculates Relative Rank of the current row within a window partition based on below
Formula:

EXAMPLE:
Query: CUME_DIST i.e. Relative rank the product based on their prices in each order (i.e.
partition by order_id).

SELECT order_id, name, product_id, price,


CUME_DIST() OVER (PARTITION BY order_id ORDER BY price) AS
Dense_Rank_Product_Price
FROM retails

Output — Query — CUME_DIST() i.e. Relative Rank

Explanation:

Let's consider order_id 1112 having 3 products, relative rank calculated as discussed below
formula used:

 Row no. 3— First product: 1/3 = 0.3333


 Row no. 4— Second product: 2/3 = 0.666
 Row no. 5— Third product: 3/3 = 1

Similarly, if product having same value or price then relative rank also same check order_id
1114 in output screenshot.
d. ROW_NUMBER():

 An ordinal number of the current row within its partition based on ORDER BY
expression in the OVER clause.
 Each value is ordered within its PARTITION BY expression.
 Rows with equal values for the ORDER BY expressions receive different row numbers
non deterministically.

EXAMPLE:

Query: Assign Row_Number to the product based on their prices in each order (i.e. partition by
order_id).

SELECT order_id, name, product_id, price,


ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY price) AS
Row_Number_Product_Price
FROM retails

Output — Query — Row_Number()

Explanation:

 As we can see in output screenshot, row number assign based on price (ORDER BY
expression) within each order (PARTITION BY order_id).
 Not consider value same or not, just assign row_number to each row in the expression.

e. NTILE():
 Divides the rows for each window partition, as equally as possible, into a specified
number of ranked groups.
 Requires ORDER BY clause in the OVER clause.
 The column or expression specified in ORDER BY clause, first all values has been sorted
in ascending order and then equally assign group number.

Example:

Query: Assign Group/cluster/bucket number to all row into 10 different groups based on the
product price.

SELECT order_id, name, product_id, price,


NTILE(10) OVER (ORDER BY price) AS NTile_Product_Price
FROM retails

Output — Query — NTILE()

Explanation:

 In this dataset, we have a total of 50 records.


 Hence, each clustered consist of 5 rows as shown in the output screenshot.
 First, all rows have been sorted with the price and then assign a group number to each
row.
f. PERCENT_RANK()

 Percentage rank of the current row using the following formula:

Example:

Query: Calculate or assign percentage rank to all row based on the product price.

SELECT order_id, name, product_id, price,


PERCENT_RANK() OVER (PARTITION BY order_id ORDER BY price) AS
Row_Number_Product_Price
FROM retails

Output — Query — Percent_Rank()

Explanation:

Let’s consider order_id 1114 having 5 products, relative rank calculated as discussed below
formula used:
 Row no. 9— First product: (1–1)/(5–1) = 0
 Row no. 10— Second product: (1–1)/(5–1) = 0
 Row no. 11— Third product: (3–1)/(5–1) = 0.5
 Row no. 12 — Forth product: (4–1)/(5–1) = 0.75
 Row no. 13 — Fifth product: (5–1)/(5–1) = 1

3. Window Analytic Functions:


Consist one of the supporting ranking function i.e. LAG(), LEAD(), FIRST_VALUE(),
LAST_VALUE().

a. LAG() or LEAD():

Syntax:

LAG | LEAD (expression)


OVER ([ PARTITION BY expression_list] [ORDER BY order_list] )

 LAG or LEAD returns, value for the row value before or after the current row in a
partition respectively.
 If no row exists, null is returned.

Example:

Query: Add new feature 1 step LAG or LEAD product price within each order (i.e.
PARTITION BY order_id)

SELECT order_id, name, product_id, price,


LAG(price,1) OVER (PARTITION BY order_id ORDER BY price) AS LAG_Product_Price,
LEAD(price,1) OVER (PARTITION BY order_id ORDER BY price) AS
LEAD_Product_Price
FROM retails
Output — Query — LAG() or LEAD()

Explanation:

 As we can see both lag and lead column give 1 step value respectively within the
partition of order_id.
 When its first row in LAG() gets imputed with NULL.
 Similarly, in LEAD() the last row gets imputed with NULL.

b. FIRST_VALUE() or LAST_VALUE():

Syntax:

FIRST_VALUE | LAST_VALUE ( expression )


OVER ( [PARTITION BY expression_list ] [ ORDER BY order_list ]
[ row_or_range_clause ] )

 FIRST_VALUE or LAST_VALUE return First-row or Last-row value of the specified


expression with respect to the window frame (partition) respectively.

Example:

Query: Add new feature FIRST_VALUE or LAST_VALUE product price within each order
(i.e. PARTITION BY order_id).

SELECT order_id, name, product_id, price,


FIRST_VALUE(price) OVER (PARTITION BY order_id) AS FIRST_VALUE_Product_Price,
LAST_VALUE(price) OVER (PARTITION BY order_id) AS LAST_VALUE_Product_Price
FROM retails
Output — Query — First_Value() or Last_Value()

Explanation:

 As we can see both First_Value() and Last_Value() column give first-row value and last-
row value respectively.

c. NTH_VALUE():

Syntax:

NTH_VALUE (expression, nth_value )


OVER ( [PARTITION BY expression_list ] [ ORDER BY order_list ]
[ row_or_range_clause ] )

 NTH_VALUE return nth value assigned with expression with respect to the window
frame (partition) respectively.
 If nth_value is not available then its get imputed with NULL.

Example:

Query: Add new feature NTH_VALUE product price within each order (i.e. PARTITION BY
order_id).

SELECT order_id, name, product_id, price,


NTH_VALUE(price,3) OVER (PARTITION BY order_id) AS NTH_VALUE_Product_Price
FROM retails
Output — Query — NTH_VALUE()

Explanation:

 As in query we assigned nth_value as 3, it returns the 3rd-row value is partitioned by


order_id.
 For order_id 1111 there are only 2 products, hence column gets imputed with the NULL
value.

Conclusion:

In this blog, I tried to explain the major types of Window Function in SQL.

Window Functions are very useful while doing data analysis using SQL and easy to use.

Window Functions revolves around OVER, PARTITION BY and ROW or RANGE clauses.

You might also like