Giao diện
Exercise 8: Query Optimization 🚀
🎯 Objective
Master query optimization techniques, understand execution plans, and use indexes effectively.
📋 Problem Statement
Slow queries can cripple application performance. Learn to identify bottlenecks and optimize queries systematically.
📥 Tasks
Task 1: Analyze Query Performance
Use EXPLAIN to analyze this query and identify performance issues:
sql
SELECT p.product_name, s.supplier_name, COUNT(o.order_id) AS order_count
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
LEFT JOIN orders o ON p.product_id = o.product_id
WHERE p.price > 100
GROUP BY p.product_id, p.product_name, s.supplier_name
ORDER BY order_count DESC;Task 2: Create Appropriate Indexes
Based on the query above, create indexes to improve performance.
Task 3: Optimize Subquery
Rewrite this slow subquery as a JOIN:
sql
SELECT product_name, price,
(SELECT supplier_name FROM suppliers WHERE supplier_id = products.supplier_id) AS supplier
FROM products
WHERE price > (SELECT AVG(price) FROM products);Task 4: Avoid SELECT *
Rewrite to select only needed columns and measure performance difference.
Task 5: Optimize OR Conditions
Rewrite this query to avoid slow OR conditions:
sql
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Furniture';Task 6: Pagination Optimization
Implement efficient pagination for large result sets (avoid OFFSET on large offsets).
💡 Hints
Hint 1: EXPLAIN Output
Key metrics to look for: - **Seq Scan**: Full table scan (slow for large tables) - **Index Scan**: Using index (fast) - **Rows**: Estimated rows processed - **Cost**: Query planner's cost estimateHint 2: Index Types
- **B-Tree**: Default, good for equality and range queries - **Hash**: Good for equality only - **GiST/GIN**: For full-text search, arrays - **Partial**: Index subset of rowsHint 3: Query Optimization Checklist
1. Add indexes on WHERE, JOIN, ORDER BY columns 2. Avoid SELECT * 3. Use LIMIT when possible 4. Replace subqueries with JOINs 5. Use UNION ALL instead of UNION (if duplicates OK) 6. Avoid functions on indexed columns in WHERE✅ Solutions
Click to reveal solutions
Solution 1: Analyze Query Performance
sql
EXPLAIN QUERY PLAN
SELECT p.product_name, s.supplier_name, COUNT(o.order_id) AS order_count
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
LEFT JOIN orders o ON p.product_id = o.product_id
WHERE p.price > 100
GROUP BY p.product_id, p.product_name, s.supplier_name
ORDER BY order_count DESC;Sample Output (SQLite):
QUERY PLAN
|--SCAN TABLE products AS p
|--SEARCH TABLE suppliers AS s USING INTEGER PRIMARY KEY (rowid=?)
`--SCAN TABLE orders AS o
`--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BYAnalysis:
- ❌ SCAN TABLE products: Full table scan (no index on price)
- ✅ SEARCH suppliers: Using primary key (good)
- ❌ SCAN TABLE orders: Full table scan (no index on product_id)
- ❌ TEMP B-TREE: Temporary structures for GROUP BY and ORDER BY (expensive)
Performance Issues:
- No index on
products.price→ full table scan - No index on
orders.product_id→ full table scan for JOIN - GROUP BY and ORDER BY require temporary structures
Solution 2: Create Appropriate Indexes
sql
-- Index for WHERE clause
CREATE INDEX idx_products_price ON products(price);
-- Index for JOIN (foreign key)
CREATE INDEX idx_products_supplier_id ON products(supplier_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);
-- Composite index for GROUP BY (optional, advanced)
CREATE INDEX idx_products_id_name ON products(product_id, product_name);
-- Verify indexes
.indexes products -- SQLite
-- or
SHOW INDEXES FROM products; -- MySQLAfter Indexes - EXPLAIN Output:
QUERY PLAN
|--SEARCH TABLE products AS p USING INDEX idx_products_price (price>?)
|--SEARCH TABLE suppliers AS s USING INTEGER PRIMARY KEY (rowid=?)
`--SEARCH TABLE orders AS o USING INDEX idx_orders_product_id (product_id=?)Performance Improvement:
- ✅ Index scan instead of full table scan
- ✅ Faster JOINs using indexes
- ✅ Reduced temporary structure usage
Index Guidelines:
- Index columns in WHERE, JOIN, ORDER BY, GROUP BY
- Index foreign keys (always!)
- Don't over-index (slows down INSERT/UPDATE)
- Monitor index usage (remove unused indexes)
Solution 3: Optimize Subquery
sql
-- SLOW: Correlated subquery (executes once per row)
SELECT product_name, price,
(SELECT supplier_name FROM suppliers WHERE supplier_id = products.supplier_id) AS supplier
FROM products
WHERE price > (SELECT AVG(price) FROM products);
-- FAST: JOIN + CTE
WITH avg_price AS (
SELECT AVG(price) AS avg_price FROM products
)
SELECT p.product_name, p.price, s.supplier_name
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
CROSS JOIN avg_price
WHERE p.price > avg_price.avg_price;Performance Comparison:
- Subquery version: O(n²) - subquery executes n times
- JOIN version: O(n) - single pass through data
Benchmark (1000 products):
- Subquery: ~500ms
- JOIN: ~50ms
- 10x faster!
When to use subquery vs JOIN:
- ✅ JOIN: When you need data from related table
- ✅ Subquery: When checking existence (EXISTS) or single value
- ❌ Subquery: Avoid correlated subqueries in SELECT
Solution 4: Avoid SELECT *
sql
-- SLOW: Fetches all columns (including large TEXT/BLOB)
SELECT * FROM products WHERE category = 'Electronics';
-- FAST: Select only needed columns
SELECT product_id, product_name, price FROM products WHERE category = 'Electronics';Why SELECT * is bad:
- Network overhead: Transfers unnecessary data
- Memory usage: Loads unused columns into memory
- Index coverage: Can't use covering indexes
- Maintenance: Breaks when table structure changes
- Security: May expose sensitive columns
Covering Index (Advanced):
sql
-- Index includes all columns needed by query
CREATE INDEX idx_products_category_covering
ON products(category, product_id, product_name, price);
-- Query can be satisfied entirely from index (no table access)
SELECT product_id, product_name, price
FROM products
WHERE category = 'Electronics';Performance Impact:
- Small tables: Minimal difference
- Large tables with many columns: 2-5x faster
- Tables with TEXT/BLOB: 10x+ faster
Solution 5: Optimize OR Conditions
sql
-- SLOW: OR prevents index usage (in some databases)
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Furniture';
-- FAST: Use IN (can use index)
SELECT * FROM products
WHERE category IN ('Electronics', 'Furniture');
-- ALTERNATIVE: UNION ALL (if indexes exist)
SELECT * FROM products WHERE category = 'Electronics'
UNION ALL
SELECT * FROM products WHERE category = 'Furniture';Performance Comparison:
- OR: May not use index efficiently
- IN: Can use index scan
- UNION ALL: Uses index twice (good if selective)
When to use each:
- IN: Default choice (simple, efficient)
- UNION ALL: When conditions are very selective (< 10% of rows each)
- OR: Avoid if possible (unless database optimizes it well)
Index for this query:
sql
CREATE INDEX idx_products_category ON products(category);Solution 6: Pagination Optimization
sql
-- SLOW: OFFSET on large offsets (skips rows but still processes them)
SELECT * FROM products
ORDER BY product_id
LIMIT 10 OFFSET 10000; -- Processes 10,010 rows, returns 10
-- FAST: Keyset pagination (uses WHERE instead of OFFSET)
SELECT * FROM products
WHERE product_id > 10000 -- Last ID from previous page
ORDER BY product_id
LIMIT 10; -- Processes only 10 rows
-- Implementation:
-- Page 1:
SELECT * FROM products ORDER BY product_id LIMIT 10;
-- Returns IDs 1-10, last_id = 10
-- Page 2:
SELECT * FROM products WHERE product_id > 10 ORDER BY product_id LIMIT 10;
-- Returns IDs 11-20, last_id = 20
-- Page 3:
SELECT * FROM products WHERE product_id > 20 ORDER BY product_id LIMIT 10;
-- And so on...Performance Comparison (1M rows):
- OFFSET 10000: ~500ms (processes 10,010 rows)
- Keyset WHERE: ~5ms (processes 10 rows)
- 100x faster!
Keyset Pagination Requirements:
- ✅ Unique, sequential column (ID, timestamp)
- ✅ Index on pagination column
- ❌ Can't jump to arbitrary page number
- ❌ Can't sort by non-indexed column
When to use:
- ✅ Infinite scroll (mobile apps)
- ✅ API pagination with large datasets
- ❌ Traditional page numbers (1, 2, 3...)
- ❌ Small datasets (< 10,000 rows)
🚀 Follow-up Challenges
- Composite Indexes: Create multi-column index for complex WHERE clause
- Partial Indexes: Index only active products (WHERE deleted_at IS NULL)
- Query Rewrite: Convert DISTINCT to GROUP BY for better performance
- Materialized Views: Cache expensive aggregations
🔑 Key Takeaways
Query Optimization Checklist
1. Analyze First
- [ ] Run EXPLAIN to see execution plan
- [ ] Identify full table scans
- [ ] Check for missing indexes
- [ ] Look for expensive operations (sorts, temp tables)
2. Index Strategy
- [ ] Index foreign keys (always!)
- [ ] Index WHERE clause columns
- [ ] Index JOIN columns
- [ ] Index ORDER BY columns
- [ ] Consider composite indexes for multi-column queries
- [ ] Don't over-index (slows writes)
3. Query Rewriting
- [ ] Replace correlated subqueries with JOINs
- [ ] Use IN instead of OR
- [ ] Avoid SELECT *
- [ ] Use LIMIT when possible
- [ ] Use EXISTS instead of COUNT(*) > 0
- [ ] Use UNION ALL instead of UNION (if duplicates OK)
4. Schema Design
- [ ] Normalize to reduce redundancy
- [ ] Denormalize for read-heavy workloads
- [ ] Partition large tables
- [ ] Archive old data
Index Types and Use Cases
| Index Type | Best For | Example |
|---|---|---|
| B-Tree (default) | Equality, range queries | WHERE price > 100 |
| Hash | Equality only | WHERE id = 123 |
| Partial | Subset of rows | WHERE deleted_at IS NULL |
| Composite | Multiple columns | WHERE category = 'X' AND price > 100 |
| Covering | Include all SELECT columns | Avoid table access |
| Full-text | Text search | WHERE description LIKE '%keyword%' |
Performance Anti-Patterns
❌ Avoid These:
- **SELECT *** - Fetches unnecessary data
- No indexes on foreign keys - Slow JOINs
- Functions on indexed columns -
WHERE YEAR(date) = 2024(can't use index) - OR conditions - May prevent index usage
- Correlated subqueries - Execute once per row
- OFFSET on large offsets - Processes skipped rows
- NOT IN with NULLs - Unexpected behavior
- Implicit type conversion -
WHERE varchar_col = 123(can't use index)
EXPLAIN Output Interpretation
Good Signs:
- ✅ Index Scan / Index Seek
- ✅ Low row count estimates
- ✅ Low cost estimates
- ✅ Using covering index
Bad Signs:
- ❌ Seq Scan / Table Scan (on large tables)
- ❌ High row count estimates
- ❌ High cost estimates
- ❌ Temp tables for sorting
- ❌ Nested loops on large tables
📚 Advanced Techniques
Technique 1: Covering Index
sql
-- Query needs: category, product_name, price
CREATE INDEX idx_covering ON products(category, product_name, price);
-- Query satisfied entirely from index (no table access)
SELECT product_name, price FROM products WHERE category = 'Electronics';Technique 2: Partial Index
sql
-- Index only active products (saves space)
CREATE INDEX idx_active_products ON products(category, price)
WHERE deleted_at IS NULL;
-- Query must match index condition
SELECT * FROM products
WHERE category = 'Electronics' AND deleted_at IS NULL;Technique 3: Index on Expression
sql
-- Index on computed column
CREATE INDEX idx_lower_name ON products(LOWER(product_name));
-- Query can use index
SELECT * FROM products WHERE LOWER(product_name) = 'laptop';Technique 4: Query Hints (Database-Specific)
sql
-- Force index usage (MySQL)
SELECT * FROM products USE INDEX (idx_category) WHERE category = 'Electronics';
-- Force index usage (SQL Server)
SELECT * FROM products WITH (INDEX(idx_category)) WHERE category = 'Electronics';⚠️ Common Pitfalls
⚠️ Cạm bẫy
Pitfall 1: Function on Indexed Column
sql
-- WRONG: Can't use index on date
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
-- CORRECT: Use range
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';⚠️ Cạm bẫy
Pitfall 2: Implicit Type Conversion
sql
-- WRONG: product_id is VARCHAR, comparing with INT (can't use index)
SELECT * FROM products WHERE product_id = 123;
-- CORRECT: Use correct type
SELECT * FROM products WHERE product_id = '123';⚠️ Cạm bẫy
Pitfall 3: Too Many Indexes
sql
-- WRONG: 10 indexes on one table (slows INSERT/UPDATE)
CREATE INDEX idx1 ON products(col1);
CREATE INDEX idx2 ON products(col2);
-- ... 8 more indexes
-- CORRECT: Use composite indexes
CREATE INDEX idx_composite ON products(col1, col2, col3);⚠️ Cạm bẫy
Pitfall 4: Wrong Index Column Order
sql
-- Query: WHERE category = 'X' AND price > 100
-- WRONG: Price first (less selective)
CREATE INDEX idx_wrong ON products(price, category);
-- CORRECT: Category first (more selective)
CREATE INDEX idx_correct ON products(category, price);📊 Benchmarking
How to Benchmark Queries
sql
-- 1. Clear cache (database-specific)
-- PostgreSQL: SELECT pg_stat_reset();
-- MySQL: RESET QUERY CACHE;
-- 2. Run query multiple times
SELECT /* Query 1 */ * FROM products WHERE price > 100;
-- Note execution time
-- 3. Compare with optimized version
SELECT /* Query 2 */ product_id, product_name, price
FROM products WHERE price > 100;
-- Note execution time
-- 4. Use EXPLAIN ANALYZE (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM products WHERE price > 100;
-- Shows actual execution time and row countsPerformance Metrics
| Metric | Good | Acceptable | Bad |
|---|---|---|---|
| Query time | < 100ms | 100-500ms | > 500ms |
| Rows scanned | < 1000 | 1000-10000 | > 10000 |
| Index usage | Yes | Partial | No |
| Temp tables | 0 | 1 | > 1 |