Skip to content

Exercise 5: Window Functions 🪟

🎯 Objective

Master window functions for advanced analytics without grouping rows.

📋 Problem Statement

Window functions perform calculations across rows related to the current row, without collapsing them like GROUP BY.

Key Concept: Window functions add calculated columns while keeping all original rows.

📥 Tasks

Task 1: Rank Products by Price

Rank all products by price (highest to lowest). Show product_name, price, and rank.

Task 2: Rank Within Category

Rank products by price within each category. Show category, product_name, price, and rank.

Task 3: Running Total of Orders

Calculate running total of order quantities over time (ordered by order_date).

Task 4: Price Comparison

For each product, show its price and the price of the next most expensive product in the same category.

Task 5: Top 2 Products per Category

Find the top 2 most expensive products in each category.

Task 6: Moving Average

Calculate 3-order moving average of order quantities.

💡 Hints

Hint 1: Window Function Syntax ```sql function_name() OVER ( [PARTITION BY column] [ORDER BY column] [ROWS/RANGE frame_specification] ) ```
Hint 2: Ranking Functions - `ROW_NUMBER()`: Unique sequential number (1, 2, 3, 4...) - `RANK()`: Same rank for ties, gaps after (1, 2, 2, 4...) - `DENSE_RANK()`: Same rank for ties, no gaps (1, 2, 2, 3...)
Hint 3: LEAD and LAG - `LEAD(column, offset)`: Access next row's value - `LAG(column, offset)`: Access previous row's value

Solutions

Click to reveal solutions

Solution 1: Rank Products by Price

sql
SELECT 
  product_name,
  price,
  RANK() OVER (ORDER BY price DESC) AS price_rank
FROM products
ORDER BY price_rank;

Result:

product_name          | price   | price_rank
----------------------|---------|------------
Laptop Pro 15         | 1299.99 | 1
Standing Desk         | 499.99  | 2
Monitor 27"           | 349.99  | 3
Office Chair          | 199.99  | 4
Keyboard Mechanical   | 89.99   | 5
Desk Lamp             | 45.99   | 6
Wireless Mouse        | 29.99   | 7
USB-C Cable           | 12.99   | 8

Explanation: RANK() assigns ranks based on ORDER BY. No PARTITION BY means one ranking across all products.

ROW_NUMBER vs RANK vs DENSE_RANK:

sql
SELECT 
  product_name,
  price,
  ROW_NUMBER() OVER (ORDER BY price DESC) AS row_num,
  RANK() OVER (ORDER BY price DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY price DESC) AS dense_rank
FROM products;

If two products had same price (e.g., both $100):

  • ROW_NUMBER: 1, 2, 3, 4 (always unique)
  • RANK: 1, 2, 2, 4 (tie at 2, skip 3)
  • DENSE_RANK: 1, 2, 2, 3 (tie at 2, no skip)

Solution 2: Rank Within Category

sql
SELECT 
  category,
  product_name,
  price,
  RANK() OVER (PARTITION BY category ORDER BY price DESC) AS category_rank
FROM products
ORDER BY category, category_rank;

Result:

category     | product_name          | price   | category_rank
-------------|-----------------------|---------|---------------
Electronics  | Laptop Pro 15         | 1299.99 | 1
Electronics  | Monitor 27"           | 349.99  | 2
Electronics  | Keyboard Mechanical   | 89.99   | 3
Electronics  | Wireless Mouse        | 29.99   | 4
Electronics  | USB-C Cable           | 12.99   | 5
Furniture    | Standing Desk         | 499.99  | 1
Furniture    | Office Chair          | 199.99  | 2
Furniture    | Desk Lamp             | 45.99   | 3

Explanation: PARTITION BY category creates separate ranking windows for each category.

Key Concept: PARTITION BY is like GROUP BY, but doesn't collapse rows.


Solution 3: Running Total of Orders

sql
SELECT 
  order_id,
  order_date,
  quantity,
  SUM(quantity) OVER (ORDER BY order_date, order_id) AS running_total
FROM orders
ORDER BY order_date, order_id;

Result:

order_id | order_date | quantity | running_total
---------|------------|----------|---------------
1        | 2024-01-15 | 2        | 2
2        | 2024-01-16 | 5        | 7
3        | 2024-01-17 | 1        | 8
4        | 2024-01-18 | 3        | 11
5        | 2024-01-19 | 10       | 21
6        | 2024-01-20 | 3        | 24
7        | 2024-01-21 | 1        | 25
8        | 2024-01-22 | 2        | 27
9        | 2024-01-23 | 4        | 31
10       | 2024-01-24 | 1        | 32

Explanation: SUM() OVER with ORDER BY creates cumulative sum.

Frame Specification (implicit):

sql
-- Explicit version (same result)
SUM(quantity) OVER (
  ORDER BY order_date, order_id
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

Solution 4: Price Comparison

sql
SELECT 
  category,
  product_name,
  price,
  LEAD(price) OVER (PARTITION BY category ORDER BY price DESC) AS next_lower_price,
  price - LEAD(price) OVER (PARTITION BY category ORDER BY price DESC) AS price_gap
FROM products
ORDER BY category, price DESC;

Result:

category     | product_name          | price   | next_lower_price | price_gap
-------------|-----------------------|---------|------------------|----------
Electronics  | Laptop Pro 15         | 1299.99 | 349.99           | 950.00
Electronics  | Monitor 27"           | 349.99  | 89.99            | 260.00
Electronics  | Keyboard Mechanical   | 89.99   | 29.99            | 60.00
Electronics  | Wireless Mouse        | 29.99   | 12.99            | 17.00
Electronics  | USB-C Cable           | 12.99   | NULL             | NULL
Furniture    | Standing Desk         | 499.99  | 199.99           | 300.00
Furniture    | Office Chair          | 199.99  | 45.99            | 154.00
Furniture    | Desk Lamp             | 45.99   | NULL             | NULL

Explanation:

  • LEAD() accesses next row's value (within partition)
  • Last row in each partition has NULL (no next row)
  • LAG() would access previous row

Use Cases:

  • Compare with previous/next period
  • Calculate differences between consecutive rows
  • Find gaps in sequences

Solution 5: Top 2 Products per Category

sql
WITH ranked_products AS (
  SELECT 
    category,
    product_name,
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
  FROM products
)
SELECT category, product_name, price
FROM ranked_products
WHERE price_rank <= 2
ORDER BY category, price_rank;

Result:

category     | product_name          | price
-------------|-----------------------|--------
Electronics  | Laptop Pro 15         | 1299.99
Electronics  | Monitor 27"           | 349.99
Furniture    | Standing Desk         | 499.99
Furniture    | Office Chair          | 199.99

Explanation:

  1. CTE ranks products within category
  2. Filter to top 2 ranks
  3. This is the "Top N per group" pattern

Why CTE? Can't use window functions directly in WHERE clause.

Alternative (subquery):

sql
SELECT * FROM (
  SELECT 
    category, product_name, price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
  FROM products
) ranked
WHERE price_rank <= 2;

Solution 6: Moving Average

sql
SELECT 
  order_id,
  order_date,
  quantity,
  AVG(quantity) OVER (
    ORDER BY order_date, order_id
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_3
FROM orders
ORDER BY order_date, order_id;

Result:

order_id | order_date | quantity | moving_avg_3
---------|------------|----------|-------------
1        | 2024-01-15 | 2        | 2.00
2        | 2024-01-16 | 5        | 3.50
3        | 2024-01-17 | 1        | 2.67
4        | 2024-01-18 | 3        | 3.00
5        | 2024-01-19 | 10       | 4.67
6        | 2024-01-20 | 3        | 5.33
7        | 2024-01-21 | 1        | 4.67
8        | 2024-01-22 | 2        | 2.00
9        | 2024-01-23 | 4        | 2.33
10       | 2024-01-24 | 1        | 2.33

Explanation:

  • ROWS BETWEEN 2 PRECEDING AND CURRENT ROW defines 3-row window
  • First row: only 1 value (itself)
  • Second row: 2 values (previous + current)
  • Third row onwards: 3 values (2 previous + current)

Frame Specifications:

sql
-- Last 3 rows including current
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

-- Next 3 rows including current
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING

-- All rows from start to current (running total)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- All rows in partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

🚀 Follow-up Challenges

  1. Percentile Rank: Calculate percentile rank of each product's price
  2. First/Last Value: Show first and last order date for each customer
  3. Lag Multiple Rows: Compare current month sales with same month last year
  4. Conditional Window: Calculate running total only for specific category

🔑 Key Takeaways

Window Function Components

sql
function_name(column) OVER (
  PARTITION BY partition_column  -- Optional: Create separate windows
  ORDER BY order_column          -- Optional: Define row order
  ROWS/RANGE frame_spec          -- Optional: Define frame within partition
)

Function Categories

CategoryFunctionsUse Case
RankingROW_NUMBER, RANK, DENSE_RANK, NTILEAssign ranks/numbers
OffsetLEAD, LAG, FIRST_VALUE, LAST_VALUEAccess other rows
AggregateSUM, AVG, COUNT, MIN, MAXRunning totals, moving averages
DistributionPERCENT_RANK, CUME_DISTPercentiles, distributions

PARTITION BY vs GROUP BY

AspectPARTITION BYGROUP BY
RowsKeeps all rowsCollapses to one row per group
Use withWindow functionsAggregate functions
OutputOriginal rows + calculated columnGrouped summary

Best Practices

  • Use CTEs for complex window function queries (better readability)
  • Order matters in LEAD/LAG and frame specifications
  • Partition wisely - too many partitions can hurt performance
  • Index ORDER BY columns for better performance
  • Avoid in WHERE - use CTE or subquery to filter window function results

📚 Advanced Patterns

Pattern 1: Cumulative Percentage

sql
SELECT 
  product_name,
  price,
  SUM(price) OVER (ORDER BY price DESC) AS cumulative_price,
  100.0 * SUM(price) OVER (ORDER BY price DESC) / SUM(price) OVER () AS cumulative_pct
FROM products;

Pattern 2: Gap Detection

sql
SELECT 
  order_id,
  order_date,
  LAG(order_date) OVER (ORDER BY order_date) AS prev_order_date,
  order_date - LAG(order_date) OVER (ORDER BY order_date) AS days_since_last_order
FROM orders;

Pattern 3: Quartiles

sql
SELECT 
  product_name,
  price,
  NTILE(4) OVER (ORDER BY price) AS price_quartile
FROM products;

Pattern 4: Conditional Aggregation

sql
SELECT 
  order_id,
  quantity,
  SUM(CASE WHEN quantity > 2 THEN quantity ELSE 0 END) 
    OVER (ORDER BY order_id) AS running_total_large_orders
FROM orders;

⚠️ Common Pitfalls

⚠️ Cạm bẫy

Pitfall 1: Forgetting ORDER BY

sql
-- WRONG: Undefined order, unpredictable results
SELECT product_name, ROW_NUMBER() OVER () FROM products;

-- CORRECT: Specify order
SELECT product_name, ROW_NUMBER() OVER (ORDER BY product_id) FROM products;

⚠️ Cạm bẫy

Pitfall 2: Using Window Functions in WHERE

sql
-- WRONG: Can't use window functions in WHERE
SELECT * FROM products
WHERE RANK() OVER (ORDER BY price DESC) <= 3;

-- CORRECT: Use CTE or subquery
WITH ranked AS (
  SELECT *, RANK() OVER (ORDER BY price DESC) AS rnk
  FROM products
)
SELECT * FROM ranked WHERE rnk <= 3;

⚠️ Cạm bẫy

Pitfall 3: Confusing ROWS vs RANGE

sql
-- ROWS: Physical rows (2 preceding rows)
AVG(price) OVER (ORDER BY price ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)

-- RANGE: Logical range (all rows with price within 2 of current)
AVG(price) OVER (ORDER BY price RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)