Exercise 8: Query Optimization 🚀

🎯 Objective

Master query optimization techniques, understand execution plans, and use indexes effectively.

📋 Problem Statement

Slow queries can cripple application performance. Learn to identify bottlenecks and optimize queries systematically.

📥 Tasks

Task 1: Analyze Query Performance

Use EXPLAIN to analyze this query and identify performance issues:

sql

SELECT p.product_name, s.supplier_name, COUNT(o.order_id) AS order_count
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
LEFT JOIN orders o ON p.product_id = o.product_id
WHERE p.price > 100
GROUP BY p.product_id, p.product_name, s.supplier_name
ORDER BY order_count DESC;

Task 2: Create Appropriate Indexes

Based on the query above, create indexes to improve performance.

Task 3: Optimize Subquery

Rewrite this slow subquery as a JOIN:

sql

SELECT product_name, price,
  (SELECT supplier_name FROM suppliers WHERE supplier_id = products.supplier_id) AS supplier
FROM products
WHERE price > (SELECT AVG(price) FROM products);

Task 4: Avoid SELECT *

Rewrite to select only needed columns and measure performance difference.

Task 5: Optimize OR Conditions

Rewrite this query to avoid slow OR conditions:

sql

SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Furniture';

Task 6: Pagination Optimization

Implement efficient pagination for large result sets (avoid OFFSET on large offsets).

💡 Hints

Hint 1: EXPLAIN Output

Key metrics to look for: - **Seq Scan**: Full table scan (slow for large tables) - **Index Scan**: Using index (fast) - **Rows**: Estimated rows processed - **Cost**: Query planner's cost estimate

Hint 2: Index Types

- **B-Tree**: Default, good for equality and range queries - **Hash**: Good for equality only - **GiST/GIN**: For full-text search, arrays - **Partial**: Index subset of rows

Hint 3: Query Optimization Checklist

1. Add indexes on WHERE, JOIN, ORDER BY columns 2. Avoid SELECT * 3. Use LIMIT when possible 4. Replace subqueries with JOINs 5. Use UNION ALL instead of UNION (if duplicates OK) 6. Avoid functions on indexed columns in WHERE

✅ Solutions

Click to reveal solutions

Solution 1: Analyze Query Performance

sql

EXPLAIN QUERY PLAN
SELECT p.product_name, s.supplier_name, COUNT(o.order_id) AS order_count
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
LEFT JOIN orders o ON p.product_id = o.product_id
WHERE p.price > 100
GROUP BY p.product_id, p.product_name, s.supplier_name
ORDER BY order_count DESC;

Sample Output (SQLite):

QUERY PLAN
|--SCAN TABLE products AS p
|--SEARCH TABLE suppliers AS s USING INTEGER PRIMARY KEY (rowid=?)
`--SCAN TABLE orders AS o
   `--USE TEMP B-TREE FOR GROUP BY
   `--USE TEMP B-TREE FOR ORDER BY

Analysis:

❌ SCAN TABLE products: Full table scan (no index on price)
✅ SEARCH suppliers: Using primary key (good)
❌ SCAN TABLE orders: Full table scan (no index on product_id)
❌ TEMP B-TREE: Temporary structures for GROUP BY and ORDER BY (expensive)

Performance Issues:

No index on products.price → full table scan
No index on orders.product_id → full table scan for JOIN
GROUP BY and ORDER BY require temporary structures

Solution 2: Create Appropriate Indexes

sql

-- Index for WHERE clause
CREATE INDEX idx_products_price ON products(price);

-- Index for JOIN (foreign key)
CREATE INDEX idx_products_supplier_id ON products(supplier_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);

-- Composite index for GROUP BY (optional, advanced)
CREATE INDEX idx_products_id_name ON products(product_id, product_name);

-- Verify indexes
.indexes products  -- SQLite
-- or
SHOW INDEXES FROM products;  -- MySQL

After Indexes - EXPLAIN Output:

QUERY PLAN
|--SEARCH TABLE products AS p USING INDEX idx_products_price (price>?)
|--SEARCH TABLE suppliers AS s USING INTEGER PRIMARY KEY (rowid=?)
`--SEARCH TABLE orders AS o USING INDEX idx_orders_product_id (product_id=?)

Performance Improvement:

✅ Index scan instead of full table scan
✅ Faster JOINs using indexes
✅ Reduced temporary structure usage

Index Guidelines:

Index columns in WHERE, JOIN, ORDER BY, GROUP BY
Index foreign keys (always!)
Don't over-index (slows down INSERT/UPDATE)
Monitor index usage (remove unused indexes)

Solution 3: Optimize Subquery

sql

-- SLOW: Correlated subquery (executes once per row)
SELECT product_name, price,
  (SELECT supplier_name FROM suppliers WHERE supplier_id = products.supplier_id) AS supplier
FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- FAST: JOIN + CTE
WITH avg_price AS (
  SELECT AVG(price) AS avg_price FROM products
)
SELECT p.product_name, p.price, s.supplier_name
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
CROSS JOIN avg_price
WHERE p.price > avg_price.avg_price;

Performance Comparison:

Subquery version: O(n²) - subquery executes n times
JOIN version: O(n) - single pass through data

Benchmark (1000 products):

Subquery: ~500ms
JOIN: ~50ms
10x faster!

When to use subquery vs JOIN:

✅ JOIN: When you need data from related table
✅ Subquery: When checking existence (EXISTS) or single value
❌ Subquery: Avoid correlated subqueries in SELECT

Solution 4: Avoid SELECT *

sql

-- SLOW: Fetches all columns (including large TEXT/BLOB)
SELECT * FROM products WHERE category = 'Electronics';

-- FAST: Select only needed columns
SELECT product_id, product_name, price FROM products WHERE category = 'Electronics';

Why SELECT * is bad:

Network overhead: Transfers unnecessary data
Memory usage: Loads unused columns into memory
Index coverage: Can't use covering indexes
Maintenance: Breaks when table structure changes
Security: May expose sensitive columns

Covering Index (Advanced):

sql

-- Index includes all columns needed by query
CREATE INDEX idx_products_category_covering 
ON products(category, product_id, product_name, price);

-- Query can be satisfied entirely from index (no table access)
SELECT product_id, product_name, price 
FROM products 
WHERE category = 'Electronics';

Performance Impact:

Small tables: Minimal difference
Large tables with many columns: 2-5x faster
Tables with TEXT/BLOB: 10x+ faster

Solution 5: Optimize OR Conditions

sql

-- SLOW: OR prevents index usage (in some databases)
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Furniture';

-- FAST: Use IN (can use index)
SELECT * FROM products
WHERE category IN ('Electronics', 'Furniture');

-- ALTERNATIVE: UNION ALL (if indexes exist)
SELECT * FROM products WHERE category = 'Electronics'
UNION ALL
SELECT * FROM products WHERE category = 'Furniture';

Performance Comparison:

OR: May not use index efficiently
IN: Can use index scan
UNION ALL: Uses index twice (good if selective)

When to use each:

IN: Default choice (simple, efficient)
UNION ALL: When conditions are very selective (< 10% of rows each)
OR: Avoid if possible (unless database optimizes it well)

Index for this query:

sql

CREATE INDEX idx_products_category ON products(category);

Solution 6: Pagination Optimization

sql

-- SLOW: OFFSET on large offsets (skips rows but still processes them)
SELECT * FROM products
ORDER BY product_id
LIMIT 10 OFFSET 10000;  -- Processes 10,010 rows, returns 10

-- FAST: Keyset pagination (uses WHERE instead of OFFSET)
SELECT * FROM products
WHERE product_id > 10000  -- Last ID from previous page
ORDER BY product_id
LIMIT 10;  -- Processes only 10 rows

-- Implementation:
-- Page 1:
SELECT * FROM products ORDER BY product_id LIMIT 10;
-- Returns IDs 1-10, last_id = 10

-- Page 2:
SELECT * FROM products WHERE product_id > 10 ORDER BY product_id LIMIT 10;
-- Returns IDs 11-20, last_id = 20

-- Page 3:
SELECT * FROM products WHERE product_id > 20 ORDER BY product_id LIMIT 10;
-- And so on...

Performance Comparison (1M rows):

OFFSET 10000: ~500ms (processes 10,010 rows)
Keyset WHERE: ~5ms (processes 10 rows)
100x faster!

Keyset Pagination Requirements:

✅ Unique, sequential column (ID, timestamp)
✅ Index on pagination column
❌ Can't jump to arbitrary page number
❌ Can't sort by non-indexed column

When to use:

✅ Infinite scroll (mobile apps)
✅ API pagination with large datasets
❌ Traditional page numbers (1, 2, 3...)
❌ Small datasets (< 10,000 rows)

🚀 Follow-up Challenges

Composite Indexes: Create multi-column index for complex WHERE clause
Partial Indexes: Index only active products (WHERE deleted_at IS NULL)
Query Rewrite: Convert DISTINCT to GROUP BY for better performance
Materialized Views: Cache expensive aggregations

🔑 Key Takeaways

Query Optimization Checklist

1. Analyze First

[ ] Run EXPLAIN to see execution plan
[ ] Identify full table scans
[ ] Check for missing indexes
[ ] Look for expensive operations (sorts, temp tables)

2. Index Strategy

[ ] Index foreign keys (always!)
[ ] Index WHERE clause columns
[ ] Index JOIN columns
[ ] Index ORDER BY columns
[ ] Consider composite indexes for multi-column queries
[ ] Don't over-index (slows writes)

3. Query Rewriting

[ ] Replace correlated subqueries with JOINs
[ ] Use IN instead of OR
[ ] Avoid SELECT *
[ ] Use LIMIT when possible
[ ] Use EXISTS instead of COUNT(*) > 0
[ ] Use UNION ALL instead of UNION (if duplicates OK)

4. Schema Design

[ ] Normalize to reduce redundancy
[ ] Denormalize for read-heavy workloads
[ ] Partition large tables
[ ] Archive old data

Index Types and Use Cases

Index Type	Best For	Example
B-Tree (default)	Equality, range queries	`WHERE price > 100`
Hash	Equality only	`WHERE id = 123`
Partial	Subset of rows	`WHERE deleted_at IS NULL`
Composite	Multiple columns	`WHERE category = 'X' AND price > 100`
Covering	Include all SELECT columns	Avoid table access
Full-text	Text search	`WHERE description LIKE '%keyword%'`

Performance Anti-Patterns

❌ Avoid These:

**SELECT *** - Fetches unnecessary data
No indexes on foreign keys - Slow JOINs
Functions on indexed columns - WHERE YEAR(date) = 2024 (can't use index)
OR conditions - May prevent index usage
Correlated subqueries - Execute once per row
OFFSET on large offsets - Processes skipped rows
NOT IN with NULLs - Unexpected behavior
Implicit type conversion - WHERE varchar_col = 123 (can't use index)

EXPLAIN Output Interpretation

Good Signs:

✅ Index Scan / Index Seek
✅ Low row count estimates
✅ Low cost estimates
✅ Using covering index

Bad Signs:

❌ Seq Scan / Table Scan (on large tables)
❌ High row count estimates
❌ High cost estimates
❌ Temp tables for sorting
❌ Nested loops on large tables

📚 Advanced Techniques

Technique 1: Covering Index

sql

-- Query needs: category, product_name, price
CREATE INDEX idx_covering ON products(category, product_name, price);

-- Query satisfied entirely from index (no table access)
SELECT product_name, price FROM products WHERE category = 'Electronics';

Technique 2: Partial Index

sql

-- Index only active products (saves space)
CREATE INDEX idx_active_products ON products(category, price)
WHERE deleted_at IS NULL;

-- Query must match index condition
SELECT * FROM products 
WHERE category = 'Electronics' AND deleted_at IS NULL;

Technique 3: Index on Expression

sql

-- Index on computed column
CREATE INDEX idx_lower_name ON products(LOWER(product_name));

-- Query can use index
SELECT * FROM products WHERE LOWER(product_name) = 'laptop';

Technique 4: Query Hints (Database-Specific)

sql

-- Force index usage (MySQL)
SELECT * FROM products USE INDEX (idx_category) WHERE category = 'Electronics';

-- Force index usage (SQL Server)
SELECT * FROM products WITH (INDEX(idx_category)) WHERE category = 'Electronics';

⚠️ Common Pitfalls

⚠️ Cạm bẫy

Pitfall 1: Function on Indexed Column

sql

-- WRONG: Can't use index on date
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- CORRECT: Use range
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

⚠️ Cạm bẫy

Pitfall 2: Implicit Type Conversion

sql

-- WRONG: product_id is VARCHAR, comparing with INT (can't use index)
SELECT * FROM products WHERE product_id = 123;

-- CORRECT: Use correct type
SELECT * FROM products WHERE product_id = '123';

⚠️ Cạm bẫy

Pitfall 3: Too Many Indexes

sql

-- WRONG: 10 indexes on one table (slows INSERT/UPDATE)
CREATE INDEX idx1 ON products(col1);
CREATE INDEX idx2 ON products(col2);
-- ... 8 more indexes

-- CORRECT: Use composite indexes
CREATE INDEX idx_composite ON products(col1, col2, col3);

⚠️ Cạm bẫy

Pitfall 4: Wrong Index Column Order

sql

-- Query: WHERE category = 'X' AND price > 100
-- WRONG: Price first (less selective)
CREATE INDEX idx_wrong ON products(price, category);

-- CORRECT: Category first (more selective)
CREATE INDEX idx_correct ON products(category, price);

📊 Benchmarking

How to Benchmark Queries

sql

-- 1. Clear cache (database-specific)
-- PostgreSQL: SELECT pg_stat_reset();
-- MySQL: RESET QUERY CACHE;

-- 2. Run query multiple times
SELECT /* Query 1 */ * FROM products WHERE price > 100;
-- Note execution time

-- 3. Compare with optimized version
SELECT /* Query 2 */ product_id, product_name, price 
FROM products WHERE price > 100;
-- Note execution time

-- 4. Use EXPLAIN ANALYZE (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM products WHERE price > 100;
-- Shows actual execution time and row counts

Performance Metrics

Metric	Good	Acceptable	Bad
Query time	< 100ms	100-500ms	> 500ms
Rows scanned	< 1000	1000-10000	> 10000
Index usage	Yes	Partial	No
Temp tables	0	1	> 1

Exercise 8: Query Optimization 🚀 ​

🎯 Objective ​

📋 Problem Statement ​

📥 Tasks ​

Task 1: Analyze Query Performance ​

Task 2: Create Appropriate Indexes ​

Task 3: Optimize Subquery ​

Task 4: Avoid SELECT * ​

Task 5: Optimize OR Conditions ​

Task 6: Pagination Optimization ​

💡 Hints ​

✅ Solutions ​

Solution 1: Analyze Query Performance ​

Solution 2: Create Appropriate Indexes ​

Solution 3: Optimize Subquery ​

Solution 4: Avoid SELECT * ​

Solution 5: Optimize OR Conditions ​

Solution 6: Pagination Optimization ​

🚀 Follow-up Challenges ​

🔑 Key Takeaways ​

Query Optimization Checklist ​

Index Types and Use Cases ​

Performance Anti-Patterns ​

EXPLAIN Output Interpretation ​

📚 Advanced Techniques ​

Technique 1: Covering Index ​

Technique 2: Partial Index ​

Technique 3: Index on Expression ​

Technique 4: Query Hints (Database-Specific) ​

⚠️ Common Pitfalls ​

📊 Benchmarking ​

How to Benchmark Queries ​

Performance Metrics ​

Exercise 8: Query Optimization 🚀

🎯 Objective

📋 Problem Statement

📥 Tasks

Task 1: Analyze Query Performance

Task 2: Create Appropriate Indexes

Task 3: Optimize Subquery

Task 4: Avoid SELECT *

Task 5: Optimize OR Conditions

Task 6: Pagination Optimization

💡 Hints

✅ Solutions

Solution 1: Analyze Query Performance

Solution 2: Create Appropriate Indexes

Solution 3: Optimize Subquery

Solution 4: Avoid SELECT *

Solution 5: Optimize OR Conditions

Solution 6: Pagination Optimization

🚀 Follow-up Challenges

🔑 Key Takeaways

Query Optimization Checklist

Index Types and Use Cases

Performance Anti-Patterns

EXPLAIN Output Interpretation

📚 Advanced Techniques

Technique 1: Covering Index

Technique 2: Partial Index

Technique 3: Index on Expression

Technique 4: Query Hints (Database-Specific)

⚠️ Common Pitfalls

📊 Benchmarking

How to Benchmark Queries

Performance Metrics