Giao diện
Exercise 5: Window Functions 🪟
🎯 Objective
Master window functions for advanced analytics without grouping rows.
📋 Problem Statement
Window functions perform calculations across rows related to the current row, without collapsing them like GROUP BY.
Key Concept: Window functions add calculated columns while keeping all original rows.
📥 Tasks
Task 1: Rank Products by Price
Rank all products by price (highest to lowest). Show product_name, price, and rank.
Task 2: Rank Within Category
Rank products by price within each category. Show category, product_name, price, and rank.
Task 3: Running Total of Orders
Calculate running total of order quantities over time (ordered by order_date).
Task 4: Price Comparison
For each product, show its price and the price of the next most expensive product in the same category.
Task 5: Top 2 Products per Category
Find the top 2 most expensive products in each category.
Task 6: Moving Average
Calculate 3-order moving average of order quantities.
💡 Hints
Hint 1: Window Function Syntax
```sql function_name() OVER ( [PARTITION BY column] [ORDER BY column] [ROWS/RANGE frame_specification] ) ```Hint 2: Ranking Functions
- `ROW_NUMBER()`: Unique sequential number (1, 2, 3, 4...) - `RANK()`: Same rank for ties, gaps after (1, 2, 2, 4...) - `DENSE_RANK()`: Same rank for ties, no gaps (1, 2, 2, 3...)Hint 3: LEAD and LAG
- `LEAD(column, offset)`: Access next row's value - `LAG(column, offset)`: Access previous row's value✅ Solutions
Click to reveal solutions
Solution 1: Rank Products by Price
sql
SELECT
product_name,
price,
RANK() OVER (ORDER BY price DESC) AS price_rank
FROM products
ORDER BY price_rank;Result:
product_name | price | price_rank
----------------------|---------|------------
Laptop Pro 15 | 1299.99 | 1
Standing Desk | 499.99 | 2
Monitor 27" | 349.99 | 3
Office Chair | 199.99 | 4
Keyboard Mechanical | 89.99 | 5
Desk Lamp | 45.99 | 6
Wireless Mouse | 29.99 | 7
USB-C Cable | 12.99 | 8Explanation: RANK() assigns ranks based on ORDER BY. No PARTITION BY means one ranking across all products.
ROW_NUMBER vs RANK vs DENSE_RANK:
sql
SELECT
product_name,
price,
ROW_NUMBER() OVER (ORDER BY price DESC) AS row_num,
RANK() OVER (ORDER BY price DESC) AS rank,
DENSE_RANK() OVER (ORDER BY price DESC) AS dense_rank
FROM products;If two products had same price (e.g., both $100):
- ROW_NUMBER: 1, 2, 3, 4 (always unique)
- RANK: 1, 2, 2, 4 (tie at 2, skip 3)
- DENSE_RANK: 1, 2, 2, 3 (tie at 2, no skip)
Solution 2: Rank Within Category
sql
SELECT
category,
product_name,
price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS category_rank
FROM products
ORDER BY category, category_rank;Result:
category | product_name | price | category_rank
-------------|-----------------------|---------|---------------
Electronics | Laptop Pro 15 | 1299.99 | 1
Electronics | Monitor 27" | 349.99 | 2
Electronics | Keyboard Mechanical | 89.99 | 3
Electronics | Wireless Mouse | 29.99 | 4
Electronics | USB-C Cable | 12.99 | 5
Furniture | Standing Desk | 499.99 | 1
Furniture | Office Chair | 199.99 | 2
Furniture | Desk Lamp | 45.99 | 3Explanation: PARTITION BY category creates separate ranking windows for each category.
Key Concept: PARTITION BY is like GROUP BY, but doesn't collapse rows.
Solution 3: Running Total of Orders
sql
SELECT
order_id,
order_date,
quantity,
SUM(quantity) OVER (ORDER BY order_date, order_id) AS running_total
FROM orders
ORDER BY order_date, order_id;Result:
order_id | order_date | quantity | running_total
---------|------------|----------|---------------
1 | 2024-01-15 | 2 | 2
2 | 2024-01-16 | 5 | 7
3 | 2024-01-17 | 1 | 8
4 | 2024-01-18 | 3 | 11
5 | 2024-01-19 | 10 | 21
6 | 2024-01-20 | 3 | 24
7 | 2024-01-21 | 1 | 25
8 | 2024-01-22 | 2 | 27
9 | 2024-01-23 | 4 | 31
10 | 2024-01-24 | 1 | 32Explanation: SUM() OVER with ORDER BY creates cumulative sum.
Frame Specification (implicit):
sql
-- Explicit version (same result)
SUM(quantity) OVER (
ORDER BY order_date, order_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)Solution 4: Price Comparison
sql
SELECT
category,
product_name,
price,
LEAD(price) OVER (PARTITION BY category ORDER BY price DESC) AS next_lower_price,
price - LEAD(price) OVER (PARTITION BY category ORDER BY price DESC) AS price_gap
FROM products
ORDER BY category, price DESC;Result:
category | product_name | price | next_lower_price | price_gap
-------------|-----------------------|---------|------------------|----------
Electronics | Laptop Pro 15 | 1299.99 | 349.99 | 950.00
Electronics | Monitor 27" | 349.99 | 89.99 | 260.00
Electronics | Keyboard Mechanical | 89.99 | 29.99 | 60.00
Electronics | Wireless Mouse | 29.99 | 12.99 | 17.00
Electronics | USB-C Cable | 12.99 | NULL | NULL
Furniture | Standing Desk | 499.99 | 199.99 | 300.00
Furniture | Office Chair | 199.99 | 45.99 | 154.00
Furniture | Desk Lamp | 45.99 | NULL | NULLExplanation:
- LEAD() accesses next row's value (within partition)
- Last row in each partition has NULL (no next row)
- LAG() would access previous row
Use Cases:
- Compare with previous/next period
- Calculate differences between consecutive rows
- Find gaps in sequences
Solution 5: Top 2 Products per Category
sql
WITH ranked_products AS (
SELECT
category,
product_name,
price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM products
)
SELECT category, product_name, price
FROM ranked_products
WHERE price_rank <= 2
ORDER BY category, price_rank;Result:
category | product_name | price
-------------|-----------------------|--------
Electronics | Laptop Pro 15 | 1299.99
Electronics | Monitor 27" | 349.99
Furniture | Standing Desk | 499.99
Furniture | Office Chair | 199.99Explanation:
- CTE ranks products within category
- Filter to top 2 ranks
- This is the "Top N per group" pattern
Why CTE? Can't use window functions directly in WHERE clause.
Alternative (subquery):
sql
SELECT * FROM (
SELECT
category, product_name, price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM products
) ranked
WHERE price_rank <= 2;Solution 6: Moving Average
sql
SELECT
order_id,
order_date,
quantity,
AVG(quantity) OVER (
ORDER BY order_date, order_id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM orders
ORDER BY order_date, order_id;Result:
order_id | order_date | quantity | moving_avg_3
---------|------------|----------|-------------
1 | 2024-01-15 | 2 | 2.00
2 | 2024-01-16 | 5 | 3.50
3 | 2024-01-17 | 1 | 2.67
4 | 2024-01-18 | 3 | 3.00
5 | 2024-01-19 | 10 | 4.67
6 | 2024-01-20 | 3 | 5.33
7 | 2024-01-21 | 1 | 4.67
8 | 2024-01-22 | 2 | 2.00
9 | 2024-01-23 | 4 | 2.33
10 | 2024-01-24 | 1 | 2.33Explanation:
ROWS BETWEEN 2 PRECEDING AND CURRENT ROWdefines 3-row window- First row: only 1 value (itself)
- Second row: 2 values (previous + current)
- Third row onwards: 3 values (2 previous + current)
Frame Specifications:
sql
-- Last 3 rows including current
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
-- Next 3 rows including current
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
-- All rows from start to current (running total)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- All rows in partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING🚀 Follow-up Challenges
- Percentile Rank: Calculate percentile rank of each product's price
- First/Last Value: Show first and last order date for each customer
- Lag Multiple Rows: Compare current month sales with same month last year
- Conditional Window: Calculate running total only for specific category
🔑 Key Takeaways
Window Function Components
sql
function_name(column) OVER (
PARTITION BY partition_column -- Optional: Create separate windows
ORDER BY order_column -- Optional: Define row order
ROWS/RANGE frame_spec -- Optional: Define frame within partition
)Function Categories
| Category | Functions | Use Case |
|---|---|---|
| Ranking | ROW_NUMBER, RANK, DENSE_RANK, NTILE | Assign ranks/numbers |
| Offset | LEAD, LAG, FIRST_VALUE, LAST_VALUE | Access other rows |
| Aggregate | SUM, AVG, COUNT, MIN, MAX | Running totals, moving averages |
| Distribution | PERCENT_RANK, CUME_DIST | Percentiles, distributions |
PARTITION BY vs GROUP BY
| Aspect | PARTITION BY | GROUP BY |
|---|---|---|
| Rows | Keeps all rows | Collapses to one row per group |
| Use with | Window functions | Aggregate functions |
| Output | Original rows + calculated column | Grouped summary |
Best Practices
- Use CTEs for complex window function queries (better readability)
- Order matters in LEAD/LAG and frame specifications
- Partition wisely - too many partitions can hurt performance
- Index ORDER BY columns for better performance
- Avoid in WHERE - use CTE or subquery to filter window function results
📚 Advanced Patterns
Pattern 1: Cumulative Percentage
sql
SELECT
product_name,
price,
SUM(price) OVER (ORDER BY price DESC) AS cumulative_price,
100.0 * SUM(price) OVER (ORDER BY price DESC) / SUM(price) OVER () AS cumulative_pct
FROM products;Pattern 2: Gap Detection
sql
SELECT
order_id,
order_date,
LAG(order_date) OVER (ORDER BY order_date) AS prev_order_date,
order_date - LAG(order_date) OVER (ORDER BY order_date) AS days_since_last_order
FROM orders;Pattern 3: Quartiles
sql
SELECT
product_name,
price,
NTILE(4) OVER (ORDER BY price) AS price_quartile
FROM products;Pattern 4: Conditional Aggregation
sql
SELECT
order_id,
quantity,
SUM(CASE WHEN quantity > 2 THEN quantity ELSE 0 END)
OVER (ORDER BY order_id) AS running_total_large_orders
FROM orders;⚠️ Common Pitfalls
⚠️ Cạm bẫy
Pitfall 1: Forgetting ORDER BY
sql
-- WRONG: Undefined order, unpredictable results
SELECT product_name, ROW_NUMBER() OVER () FROM products;
-- CORRECT: Specify order
SELECT product_name, ROW_NUMBER() OVER (ORDER BY product_id) FROM products;⚠️ Cạm bẫy
Pitfall 2: Using Window Functions in WHERE
sql
-- WRONG: Can't use window functions in WHERE
SELECT * FROM products
WHERE RANK() OVER (ORDER BY price DESC) <= 3;
-- CORRECT: Use CTE or subquery
WITH ranked AS (
SELECT *, RANK() OVER (ORDER BY price DESC) AS rnk
FROM products
)
SELECT * FROM ranked WHERE rnk <= 3;⚠️ Cạm bẫy
Pitfall 3: Confusing ROWS vs RANGE
sql
-- ROWS: Physical rows (2 preceding rows)
AVG(price) OVER (ORDER BY price ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
-- RANGE: Logical range (all rows with price within 2 of current)
AVG(price) OVER (ORDER BY price RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)