Giao diện
Exercise 9: Complex Queries 🧩
🎯 Objective
Master advanced SQL patterns for complex business requirements and analytics.
📋 Problem Statement
Real-world applications require sophisticated queries combining multiple techniques. This exercise integrates everything you've learned.
📥 Tasks
Task 1: Pivot Table - Sales by Category and Month
Create a pivot table showing total sales by category (rows) and month (columns) for 2024.
Task 2: Cohort Analysis
Calculate customer retention: For each customer's first order month, show how many ordered again in subsequent months.
Task 3: Running Difference
Calculate month-over-month sales growth (absolute and percentage).
Task 4: Top N per Group with Ties
Find top 3 products by sales in each category, including ties.
Task 5: Gap and Island Detection
Find consecutive date ranges where orders were placed daily (no gaps).
Task 6: Conditional Aggregation
Create a summary showing: total orders, total revenue, average order value, and percentage of orders above $100.
💡 Hints
Hint 1: Pivot Table
Use CASE statements with aggregation: ```sql SELECT category, SUM(CASE WHEN month = 1 THEN sales ELSE 0 END) AS jan, SUM(CASE WHEN month = 2 THEN sales ELSE 0 END) AS feb FROM ... GROUP BY category; ```Hint 2: Cohort Analysis
1. Find each customer's first order date 2. Calculate months since first order for each subsequent order 3. Count customers active in each cohort monthHint 3: Gap Detection
Use LAG() to compare consecutive dates, identify gaps where difference > 1 day.✅ Solutions
Click to reveal solutions
Solution 1: Pivot Table - Sales by Category and Month
sql
WITH monthly_sales AS (
SELECT
p.category,
strftime('%m', o.order_date) AS month,
SUM(o.quantity * p.price) AS sales
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE strftime('%Y', o.order_date) = '2024'
GROUP BY p.category, strftime('%m', o.order_date)
)
SELECT
category,
SUM(CASE WHEN month = '01' THEN sales ELSE 0 END) AS Jan,
SUM(CASE WHEN month = '02' THEN sales ELSE 0 END) AS Feb,
SUM(CASE WHEN month = '03' THEN sales ELSE 0 END) AS Mar,
SUM(CASE WHEN month = '04' THEN sales ELSE 0 END) AS Apr,
SUM(CASE WHEN month = '05' THEN sales ELSE 0 END) AS May,
SUM(CASE WHEN month = '06' THEN sales ELSE 0 END) AS Jun,
SUM(CASE WHEN month = '07' THEN sales ELSE 0 END) AS Jul,
SUM(CASE WHEN month = '08' THEN sales ELSE 0 END) AS Aug,
SUM(CASE WHEN month = '09' THEN sales ELSE 0 END) AS Sep,
SUM(CASE WHEN month = '10' THEN sales ELSE 0 END) AS Oct,
SUM(CASE WHEN month = '11' THEN sales ELSE 0 END) AS Nov,
SUM(CASE WHEN month = '12' THEN sales ELSE 0 END) AS Dec,
SUM(sales) AS Total
FROM monthly_sales
GROUP BY category
ORDER BY Total DESC;Result:
category | Jan | Feb | Mar | ... | Total
-------------|--------|--------|-----|-----|--------
Electronics | 3899.95| 2599.96| ... | ... | 15000.00
Furniture | 1599.96| 999.98 | ... | ... | 8000.00Explanation:
- CTE calculates sales per category-month
- CASE statements pivot months into columns
- SUM aggregates across all months
Alternative (PostgreSQL - CROSSTAB):
sql
SELECT * FROM crosstab(
'SELECT category, month, sales FROM monthly_sales ORDER BY 1,2',
'SELECT DISTINCT month FROM monthly_sales ORDER BY 1'
) AS ct(category TEXT, Jan NUMERIC, Feb NUMERIC, ...);Solution 2: Cohort Analysis
sql
WITH first_orders AS (
-- Find each customer's first order date
SELECT
customer_id,
MIN(order_date) AS first_order_date,
strftime('%Y-%m', MIN(order_date)) AS cohort_month
FROM orders
GROUP BY customer_id
),
order_months AS (
-- Calculate months since first order for each order
SELECT
o.customer_id,
fo.cohort_month,
strftime('%Y-%m', o.order_date) AS order_month,
(strftime('%Y', o.order_date) - strftime('%Y', fo.first_order_date)) * 12 +
(strftime('%m', o.order_date) - strftime('%m', fo.first_order_date)) AS months_since_first
FROM orders o
JOIN first_orders fo ON o.customer_id = fo.customer_id
)
SELECT
cohort_month,
COUNT(DISTINCT CASE WHEN months_since_first = 0 THEN customer_id END) AS Month_0,
COUNT(DISTINCT CASE WHEN months_since_first = 1 THEN customer_id END) AS Month_1,
COUNT(DISTINCT CASE WHEN months_since_first = 2 THEN customer_id END) AS Month_2,
COUNT(DISTINCT CASE WHEN months_since_first = 3 THEN customer_id END) AS Month_3,
-- Calculate retention rates
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_first = 1 THEN customer_id END) /
NULLIF(COUNT(DISTINCT CASE WHEN months_since_first = 0 THEN customer_id END), 0), 2) AS Retention_Month_1
FROM order_months
GROUP BY cohort_month
ORDER BY cohort_month;Result:
cohort_month | Month_0 | Month_1 | Month_2 | Month_3 | Retention_Month_1
-------------|---------|---------|---------|---------|------------------
2024-01 | 100 | 45 | 30 | 20 | 45.00%
2024-02 | 80 | 35 | 25 | NULL | 43.75%
2024-03 | 90 | 40 | NULL | NULL | 44.44%Explanation:
- Month_0: Customers who made first order in cohort month
- Month_1: Customers who ordered again 1 month later
- Retention_Month_1: Percentage who returned in month 1
Business Insight: Track customer retention over time to measure product-market fit.
Solution 3: Running Difference (Month-over-Month Growth)
sql
WITH monthly_sales AS (
SELECT
strftime('%Y-%m', order_date) AS month,
SUM(quantity * (SELECT price FROM products WHERE product_id = orders.product_id)) AS total_sales
FROM orders
GROUP BY strftime('%Y-%m', order_date)
),
sales_with_previous AS (
SELECT
month,
total_sales,
LAG(total_sales) OVER (ORDER BY month) AS previous_month_sales
FROM monthly_sales
)
SELECT
month,
total_sales,
previous_month_sales,
total_sales - previous_month_sales AS absolute_growth,
ROUND(100.0 * (total_sales - previous_month_sales) / NULLIF(previous_month_sales, 0), 2) AS growth_percentage
FROM sales_with_previous
ORDER BY month;Result:
month | total_sales | previous_month_sales | absolute_growth | growth_percentage
--------|-------------|----------------------|-----------------|------------------
2024-01 | 5000.00 | NULL | NULL | NULL
2024-02 | 6500.00 | 5000.00 | 1500.00 | 30.00%
2024-03 | 5800.00 | 6500.00 | -700.00 | -10.77%
2024-04 | 7200.00 | 5800.00 | 1400.00 | 24.14%Explanation:
- Calculate total sales per month
- Use LAG() to get previous month's sales
- Calculate absolute and percentage growth
Key Insight: Negative growth in March indicates seasonal dip or issue to investigate.
Solution 4: Top N per Group with Ties
sql
WITH product_sales AS (
SELECT
p.category,
p.product_name,
SUM(o.quantity * p.price) AS total_sales
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category, p.product_id, p.product_name
),
ranked_products AS (
SELECT
category,
product_name,
total_sales,
RANK() OVER (PARTITION BY category ORDER BY total_sales DESC) AS sales_rank
FROM product_sales
)
SELECT
category,
product_name,
total_sales,
sales_rank
FROM ranked_products
WHERE sales_rank <= 3
ORDER BY category, sales_rank;Result:
category | product_name | total_sales | sales_rank
-------------|-----------------------|-------------|------------
Electronics | Laptop Pro 15 | 5199.96 | 1
Electronics | Monitor 27" | 349.99 | 2
Electronics | Keyboard Mechanical | 359.96 | 3
Furniture | Standing Desk | 999.98 | 1
Furniture | Office Chair | 599.97 | 2
Furniture | Desk Lamp | 45.99 | 3Why RANK() instead of ROW_NUMBER()?
- RANK(): Includes ties (if two products have same sales, both get rank 2, next is 4)
- ROW_NUMBER(): Arbitrary ordering of ties (one gets 2, other gets 3)
- DENSE_RANK(): Includes ties, no gaps (both get 2, next is 3)
Use Case: "Show top 3 best-selling products per category, including ties"
Solution 5: Gap and Island Detection
sql
WITH order_dates AS (
SELECT DISTINCT DATE(order_date) AS order_date
FROM orders
ORDER BY order_date
),
gaps AS (
SELECT
order_date,
LAG(order_date) OVER (ORDER BY order_date) AS prev_date,
julianday(order_date) - julianday(LAG(order_date) OVER (ORDER BY order_date)) AS days_diff
FROM order_dates
),
islands AS (
SELECT
order_date,
SUM(CASE WHEN days_diff > 1 OR days_diff IS NULL THEN 1 ELSE 0 END)
OVER (ORDER BY order_date) AS island_id
FROM gaps
)
SELECT
island_id,
MIN(order_date) AS streak_start,
MAX(order_date) AS streak_end,
COUNT(*) AS consecutive_days
FROM islands
GROUP BY island_id
HAVING COUNT(*) >= 3 -- Only streaks of 3+ days
ORDER BY consecutive_days DESC;Result:
island_id | streak_start | streak_end | consecutive_days
----------|--------------|------------|------------------
1 | 2024-01-15 | 2024-01-24 | 10
3 | 2024-02-01 | 2024-02-05 | 5
2 | 2024-01-28 | 2024-01-30 | 3Explanation:
- Get distinct order dates
- Calculate days between consecutive dates (LAG)
- Identify gaps (days_diff > 1)
- Assign island_id (increments at each gap)
- Group by island to find streaks
Use Case: Find periods of consistent daily activity or identify gaps in data.
Solution 6: Conditional Aggregation
sql
WITH order_values AS (
SELECT
o.order_id,
o.order_date,
SUM(o.quantity * p.price) AS order_value
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY o.order_id, o.order_date
)
SELECT
COUNT(*) AS total_orders,
SUM(order_value) AS total_revenue,
ROUND(AVG(order_value), 2) AS avg_order_value,
ROUND(100.0 * SUM(CASE WHEN order_value > 100 THEN 1 ELSE 0 END) / COUNT(*), 2) AS pct_orders_above_100,
SUM(CASE WHEN order_value > 100 THEN 1 ELSE 0 END) AS orders_above_100,
SUM(CASE WHEN order_value <= 100 THEN 1 ELSE 0 END) AS orders_below_100,
MIN(order_value) AS min_order,
MAX(order_value) AS max_order,
-- Percentiles (database-specific)
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_value) AS median_order_value,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY order_value) AS p95_order_value
FROM order_values;Result:
total_orders | total_revenue | avg_order_value | pct_orders_above_100 | orders_above_100 | orders_below_100 | min_order | max_order | median | p95
-------------|---------------|-----------------|----------------------|------------------|------------------|-----------|-----------|--------|------
10 | 8549.85 | 854.99 | 60.00% | 6 | 4 | 12.99 | 2599.98 | 524.99 | 2299.99Explanation:
- Conditional aggregation: SUM(CASE WHEN...) counts rows meeting condition
- Percentiles: Show distribution (median = 50th percentile, p95 = 95th percentile)
Business Insight: 60% of orders are high-value (>$100), indicating strong customer engagement.
🚀 Follow-up Challenges
- Funnel Analysis: Calculate conversion rates through multi-step process
- RFM Segmentation: Segment customers by Recency, Frequency, Monetary value
- Time Series Forecasting: Calculate moving averages and trends
- ABC Analysis: Classify products by cumulative revenue contribution
🔑 Key Takeaways
Complex Query Patterns
| Pattern | Use Case | Key Technique |
|---|---|---|
| Pivot Table | Cross-tabulation | CASE + SUM + GROUP BY |
| Cohort Analysis | Retention tracking | Self-join + Date math |
| Running Calculations | Trends, growth | Window functions (LAG, SUM OVER) |
| Top N per Group | Rankings | RANK() + PARTITION BY |
| Gap Detection | Find missing data | LAG() + Conditional logic |
| Conditional Aggregation | Complex metrics | SUM(CASE WHEN...) |
Query Building Strategy
1. Break Down the Problem
- What is the final output?
- What intermediate steps are needed?
- What data sources are required?
2. Build Incrementally
- Start with simplest query
- Add one CTE at a time
- Test each step independently
3. Optimize Last
- Get correct results first
- Then optimize for performance
- Use EXPLAIN to identify bottlenecks
Advanced Aggregation Techniques
Conditional Counting:
sql
SUM(CASE WHEN condition THEN 1 ELSE 0 END) AS count_matchingConditional Summing:
sql
SUM(CASE WHEN condition THEN amount ELSE 0 END) AS total_matchingMultiple Conditions:
sql
SUM(CASE
WHEN condition1 THEN value1
WHEN condition2 THEN value2
ELSE 0
END) AS complex_metricPercentage Calculation:
sql
100.0 * SUM(CASE WHEN condition THEN 1 ELSE 0 END) / COUNT(*) AS percentageDate Math Patterns
Months Between Dates:
sql
(YEAR(date2) - YEAR(date1)) * 12 + (MONTH(date2) - MONTH(date1))Days Between Dates:
sql
DATEDIFF(date2, date1) -- MySQL
julianday(date2) - julianday(date1) -- SQLite
date2 - date1 -- PostgreSQLFirst Day of Month:
sql
DATE_TRUNC('month', date) -- PostgreSQL
DATE_FORMAT(date, '%Y-%m-01') -- MySQL📚 Real-World Applications
Application 1: E-commerce Dashboard
sql
-- Daily sales summary with comparisons
WITH daily_sales AS (
SELECT
DATE(order_date) AS date,
COUNT(*) AS orders,
SUM(quantity * price) AS revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY DATE(order_date)
)
SELECT
date,
orders,
revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day_revenue,
LAG(revenue, 7) OVER (ORDER BY date) AS same_day_last_week,
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7day_avg
FROM daily_sales;Application 2: Customer Segmentation (RFM)
sql
WITH customer_metrics AS (
SELECT
customer_id,
MAX(order_date) AS last_order_date,
COUNT(*) AS frequency,
SUM(quantity * price) AS monetary
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY customer_id
),
rfm_scores AS (
SELECT
customer_id,
julianday('now') - julianday(last_order_date) AS recency_days,
frequency,
monetary,
NTILE(5) OVER (ORDER BY julianday('now') - julianday(last_order_date) DESC) AS recency_score,
NTILE(5) OVER (ORDER BY frequency) AS frequency_score,
NTILE(5) OVER (ORDER BY monetary) AS monetary_score
FROM customer_metrics
)
SELECT
customer_id,
recency_score || frequency_score || monetary_score AS rfm_segment,
CASE
WHEN recency_score >= 4 AND frequency_score >= 4 THEN 'Champions'
WHEN recency_score >= 3 AND frequency_score >= 3 THEN 'Loyal'
WHEN recency_score >= 4 AND frequency_score <= 2 THEN 'Promising'
WHEN recency_score <= 2 THEN 'At Risk'
ELSE 'Other'
END AS segment_name
FROM rfm_scores;Application 3: Inventory Reorder Alert
sql
WITH product_velocity AS (
SELECT
p.product_id,
p.product_name,
p.stock_quantity,
AVG(o.quantity) AS avg_daily_sales
FROM products p
LEFT JOIN orders o ON p.product_id = o.product_id
AND o.order_date >= DATE('now', '-30 days')
GROUP BY p.product_id, p.product_name, p.stock_quantity
)
SELECT
product_name,
stock_quantity,
ROUND(avg_daily_sales, 2) AS avg_daily_sales,
ROUND(stock_quantity / NULLIF(avg_daily_sales, 0), 1) AS days_of_stock,
CASE
WHEN stock_quantity / NULLIF(avg_daily_sales, 0) < 7 THEN 'URGENT'
WHEN stock_quantity / NULLIF(avg_daily_sales, 0) < 14 THEN 'WARNING'
ELSE 'OK'
END AS reorder_status
FROM product_velocity
WHERE avg_daily_sales > 0
ORDER BY days_of_stock;⚠️ Common Pitfalls
⚠️ Cạm bẫy
Pitfall 1: Division by Zero
sql
-- WRONG: Crashes if denominator is 0
SELECT revenue / orders AS avg_order_value;
-- CORRECT: Use NULLIF
SELECT revenue / NULLIF(orders, 0) AS avg_order_value;⚠️ Cạm bẫy
Pitfall 2: Date Comparison with Time
sql
-- WRONG: Misses orders on 2024-01-31 after midnight
SELECT * FROM orders WHERE order_date < '2024-02-01';
-- CORRECT: Use DATE() or explicit time
SELECT * FROM orders WHERE DATE(order_date) < '2024-02-01';⚠️ Cạm bẫy
Pitfall 3: Aggregating Aggregates
sql
-- WRONG: Can't nest aggregates
SELECT category, AVG(SUM(price)) FROM products GROUP BY category;
-- CORRECT: Use subquery or CTE
WITH category_totals AS (
SELECT category, SUM(price) AS total
FROM products GROUP BY category
)
SELECT AVG(total) FROM category_totals;