Skip to content

Exercise 8: Query Optimization 🚀

🎯 Objective

Master query optimization techniques, understand execution plans, and use indexes effectively.

📋 Problem Statement

Slow queries can cripple application performance. Learn to identify bottlenecks and optimize queries systematically.

📥 Tasks

Task 1: Analyze Query Performance

Use EXPLAIN to analyze this query and identify performance issues:

sql
SELECT p.product_name, s.supplier_name, COUNT(o.order_id) AS order_count
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
LEFT JOIN orders o ON p.product_id = o.product_id
WHERE p.price > 100
GROUP BY p.product_id, p.product_name, s.supplier_name
ORDER BY order_count DESC;

Task 2: Create Appropriate Indexes

Based on the query above, create indexes to improve performance.

Task 3: Optimize Subquery

Rewrite this slow subquery as a JOIN:

sql
SELECT product_name, price,
  (SELECT supplier_name FROM suppliers WHERE supplier_id = products.supplier_id) AS supplier
FROM products
WHERE price > (SELECT AVG(price) FROM products);

Task 4: Avoid SELECT *

Rewrite to select only needed columns and measure performance difference.

Task 5: Optimize OR Conditions

Rewrite this query to avoid slow OR conditions:

sql
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Furniture';

Task 6: Pagination Optimization

Implement efficient pagination for large result sets (avoid OFFSET on large offsets).

💡 Hints

Hint 1: EXPLAIN Output Key metrics to look for: - **Seq Scan**: Full table scan (slow for large tables) - **Index Scan**: Using index (fast) - **Rows**: Estimated rows processed - **Cost**: Query planner's cost estimate
Hint 2: Index Types - **B-Tree**: Default, good for equality and range queries - **Hash**: Good for equality only - **GiST/GIN**: For full-text search, arrays - **Partial**: Index subset of rows
Hint 3: Query Optimization Checklist 1. Add indexes on WHERE, JOIN, ORDER BY columns 2. Avoid SELECT * 3. Use LIMIT when possible 4. Replace subqueries with JOINs 5. Use UNION ALL instead of UNION (if duplicates OK) 6. Avoid functions on indexed columns in WHERE

Solutions

Click to reveal solutions

Solution 1: Analyze Query Performance

sql
EXPLAIN QUERY PLAN
SELECT p.product_name, s.supplier_name, COUNT(o.order_id) AS order_count
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
LEFT JOIN orders o ON p.product_id = o.product_id
WHERE p.price > 100
GROUP BY p.product_id, p.product_name, s.supplier_name
ORDER BY order_count DESC;

Sample Output (SQLite):

QUERY PLAN
|--SCAN TABLE products AS p
|--SEARCH TABLE suppliers AS s USING INTEGER PRIMARY KEY (rowid=?)
`--SCAN TABLE orders AS o
   `--USE TEMP B-TREE FOR GROUP BY
   `--USE TEMP B-TREE FOR ORDER BY

Analysis:

  • SCAN TABLE products: Full table scan (no index on price)
  • SEARCH suppliers: Using primary key (good)
  • SCAN TABLE orders: Full table scan (no index on product_id)
  • TEMP B-TREE: Temporary structures for GROUP BY and ORDER BY (expensive)

Performance Issues:

  1. No index on products.price → full table scan
  2. No index on orders.product_id → full table scan for JOIN
  3. GROUP BY and ORDER BY require temporary structures

Solution 2: Create Appropriate Indexes

sql
-- Index for WHERE clause
CREATE INDEX idx_products_price ON products(price);

-- Index for JOIN (foreign key)
CREATE INDEX idx_products_supplier_id ON products(supplier_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);

-- Composite index for GROUP BY (optional, advanced)
CREATE INDEX idx_products_id_name ON products(product_id, product_name);

-- Verify indexes
.indexes products  -- SQLite
-- or
SHOW INDEXES FROM products;  -- MySQL

After Indexes - EXPLAIN Output:

QUERY PLAN
|--SEARCH TABLE products AS p USING INDEX idx_products_price (price>?)
|--SEARCH TABLE suppliers AS s USING INTEGER PRIMARY KEY (rowid=?)
`--SEARCH TABLE orders AS o USING INDEX idx_orders_product_id (product_id=?)

Performance Improvement:

  • ✅ Index scan instead of full table scan
  • ✅ Faster JOINs using indexes
  • ✅ Reduced temporary structure usage

Index Guidelines:

  • Index columns in WHERE, JOIN, ORDER BY, GROUP BY
  • Index foreign keys (always!)
  • Don't over-index (slows down INSERT/UPDATE)
  • Monitor index usage (remove unused indexes)

Solution 3: Optimize Subquery

sql
-- SLOW: Correlated subquery (executes once per row)
SELECT product_name, price,
  (SELECT supplier_name FROM suppliers WHERE supplier_id = products.supplier_id) AS supplier
FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- FAST: JOIN + CTE
WITH avg_price AS (
  SELECT AVG(price) AS avg_price FROM products
)
SELECT p.product_name, p.price, s.supplier_name
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
CROSS JOIN avg_price
WHERE p.price > avg_price.avg_price;

Performance Comparison:

  • Subquery version: O(n²) - subquery executes n times
  • JOIN version: O(n) - single pass through data

Benchmark (1000 products):

  • Subquery: ~500ms
  • JOIN: ~50ms
  • 10x faster!

When to use subquery vs JOIN:

  • ✅ JOIN: When you need data from related table
  • ✅ Subquery: When checking existence (EXISTS) or single value
  • ❌ Subquery: Avoid correlated subqueries in SELECT

Solution 4: Avoid SELECT *

sql
-- SLOW: Fetches all columns (including large TEXT/BLOB)
SELECT * FROM products WHERE category = 'Electronics';

-- FAST: Select only needed columns
SELECT product_id, product_name, price FROM products WHERE category = 'Electronics';

Why SELECT * is bad:

  1. Network overhead: Transfers unnecessary data
  2. Memory usage: Loads unused columns into memory
  3. Index coverage: Can't use covering indexes
  4. Maintenance: Breaks when table structure changes
  5. Security: May expose sensitive columns

Covering Index (Advanced):

sql
-- Index includes all columns needed by query
CREATE INDEX idx_products_category_covering 
ON products(category, product_id, product_name, price);

-- Query can be satisfied entirely from index (no table access)
SELECT product_id, product_name, price 
FROM products 
WHERE category = 'Electronics';

Performance Impact:

  • Small tables: Minimal difference
  • Large tables with many columns: 2-5x faster
  • Tables with TEXT/BLOB: 10x+ faster

Solution 5: Optimize OR Conditions

sql
-- SLOW: OR prevents index usage (in some databases)
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Furniture';

-- FAST: Use IN (can use index)
SELECT * FROM products
WHERE category IN ('Electronics', 'Furniture');

-- ALTERNATIVE: UNION ALL (if indexes exist)
SELECT * FROM products WHERE category = 'Electronics'
UNION ALL
SELECT * FROM products WHERE category = 'Furniture';

Performance Comparison:

  • OR: May not use index efficiently
  • IN: Can use index scan
  • UNION ALL: Uses index twice (good if selective)

When to use each:

  • IN: Default choice (simple, efficient)
  • UNION ALL: When conditions are very selective (< 10% of rows each)
  • OR: Avoid if possible (unless database optimizes it well)

Index for this query:

sql
CREATE INDEX idx_products_category ON products(category);

Solution 6: Pagination Optimization

sql
-- SLOW: OFFSET on large offsets (skips rows but still processes them)
SELECT * FROM products
ORDER BY product_id
LIMIT 10 OFFSET 10000;  -- Processes 10,010 rows, returns 10

-- FAST: Keyset pagination (uses WHERE instead of OFFSET)
SELECT * FROM products
WHERE product_id > 10000  -- Last ID from previous page
ORDER BY product_id
LIMIT 10;  -- Processes only 10 rows

-- Implementation:
-- Page 1:
SELECT * FROM products ORDER BY product_id LIMIT 10;
-- Returns IDs 1-10, last_id = 10

-- Page 2:
SELECT * FROM products WHERE product_id > 10 ORDER BY product_id LIMIT 10;
-- Returns IDs 11-20, last_id = 20

-- Page 3:
SELECT * FROM products WHERE product_id > 20 ORDER BY product_id LIMIT 10;
-- And so on...

Performance Comparison (1M rows):

  • OFFSET 10000: ~500ms (processes 10,010 rows)
  • Keyset WHERE: ~5ms (processes 10 rows)
  • 100x faster!

Keyset Pagination Requirements:

  • ✅ Unique, sequential column (ID, timestamp)
  • ✅ Index on pagination column
  • ❌ Can't jump to arbitrary page number
  • ❌ Can't sort by non-indexed column

When to use:

  • ✅ Infinite scroll (mobile apps)
  • ✅ API pagination with large datasets
  • ❌ Traditional page numbers (1, 2, 3...)
  • ❌ Small datasets (< 10,000 rows)

🚀 Follow-up Challenges

  1. Composite Indexes: Create multi-column index for complex WHERE clause
  2. Partial Indexes: Index only active products (WHERE deleted_at IS NULL)
  3. Query Rewrite: Convert DISTINCT to GROUP BY for better performance
  4. Materialized Views: Cache expensive aggregations

🔑 Key Takeaways

Query Optimization Checklist

1. Analyze First

  • [ ] Run EXPLAIN to see execution plan
  • [ ] Identify full table scans
  • [ ] Check for missing indexes
  • [ ] Look for expensive operations (sorts, temp tables)

2. Index Strategy

  • [ ] Index foreign keys (always!)
  • [ ] Index WHERE clause columns
  • [ ] Index JOIN columns
  • [ ] Index ORDER BY columns
  • [ ] Consider composite indexes for multi-column queries
  • [ ] Don't over-index (slows writes)

3. Query Rewriting

  • [ ] Replace correlated subqueries with JOINs
  • [ ] Use IN instead of OR
  • [ ] Avoid SELECT *
  • [ ] Use LIMIT when possible
  • [ ] Use EXISTS instead of COUNT(*) > 0
  • [ ] Use UNION ALL instead of UNION (if duplicates OK)

4. Schema Design

  • [ ] Normalize to reduce redundancy
  • [ ] Denormalize for read-heavy workloads
  • [ ] Partition large tables
  • [ ] Archive old data

Index Types and Use Cases

Index TypeBest ForExample
B-Tree (default)Equality, range queriesWHERE price > 100
HashEquality onlyWHERE id = 123
PartialSubset of rowsWHERE deleted_at IS NULL
CompositeMultiple columnsWHERE category = 'X' AND price > 100
CoveringInclude all SELECT columnsAvoid table access
Full-textText searchWHERE description LIKE '%keyword%'

Performance Anti-Patterns

Avoid These:

  1. **SELECT *** - Fetches unnecessary data
  2. No indexes on foreign keys - Slow JOINs
  3. Functions on indexed columns - WHERE YEAR(date) = 2024 (can't use index)
  4. OR conditions - May prevent index usage
  5. Correlated subqueries - Execute once per row
  6. OFFSET on large offsets - Processes skipped rows
  7. NOT IN with NULLs - Unexpected behavior
  8. Implicit type conversion - WHERE varchar_col = 123 (can't use index)

EXPLAIN Output Interpretation

Good Signs:

  • ✅ Index Scan / Index Seek
  • ✅ Low row count estimates
  • ✅ Low cost estimates
  • ✅ Using covering index

Bad Signs:

  • ❌ Seq Scan / Table Scan (on large tables)
  • ❌ High row count estimates
  • ❌ High cost estimates
  • ❌ Temp tables for sorting
  • ❌ Nested loops on large tables

📚 Advanced Techniques

Technique 1: Covering Index

sql
-- Query needs: category, product_name, price
CREATE INDEX idx_covering ON products(category, product_name, price);

-- Query satisfied entirely from index (no table access)
SELECT product_name, price FROM products WHERE category = 'Electronics';

Technique 2: Partial Index

sql
-- Index only active products (saves space)
CREATE INDEX idx_active_products ON products(category, price)
WHERE deleted_at IS NULL;

-- Query must match index condition
SELECT * FROM products 
WHERE category = 'Electronics' AND deleted_at IS NULL;

Technique 3: Index on Expression

sql
-- Index on computed column
CREATE INDEX idx_lower_name ON products(LOWER(product_name));

-- Query can use index
SELECT * FROM products WHERE LOWER(product_name) = 'laptop';

Technique 4: Query Hints (Database-Specific)

sql
-- Force index usage (MySQL)
SELECT * FROM products USE INDEX (idx_category) WHERE category = 'Electronics';

-- Force index usage (SQL Server)
SELECT * FROM products WITH (INDEX(idx_category)) WHERE category = 'Electronics';

⚠️ Common Pitfalls

⚠️ Cạm bẫy

Pitfall 1: Function on Indexed Column

sql
-- WRONG: Can't use index on date
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- CORRECT: Use range
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

⚠️ Cạm bẫy

Pitfall 2: Implicit Type Conversion

sql
-- WRONG: product_id is VARCHAR, comparing with INT (can't use index)
SELECT * FROM products WHERE product_id = 123;

-- CORRECT: Use correct type
SELECT * FROM products WHERE product_id = '123';

⚠️ Cạm bẫy

Pitfall 3: Too Many Indexes

sql
-- WRONG: 10 indexes on one table (slows INSERT/UPDATE)
CREATE INDEX idx1 ON products(col1);
CREATE INDEX idx2 ON products(col2);
-- ... 8 more indexes

-- CORRECT: Use composite indexes
CREATE INDEX idx_composite ON products(col1, col2, col3);

⚠️ Cạm bẫy

Pitfall 4: Wrong Index Column Order

sql
-- Query: WHERE category = 'X' AND price > 100
-- WRONG: Price first (less selective)
CREATE INDEX idx_wrong ON products(price, category);

-- CORRECT: Category first (more selective)
CREATE INDEX idx_correct ON products(category, price);

📊 Benchmarking

How to Benchmark Queries

sql
-- 1. Clear cache (database-specific)
-- PostgreSQL: SELECT pg_stat_reset();
-- MySQL: RESET QUERY CACHE;

-- 2. Run query multiple times
SELECT /* Query 1 */ * FROM products WHERE price > 100;
-- Note execution time

-- 3. Compare with optimized version
SELECT /* Query 2 */ product_id, product_name, price 
FROM products WHERE price > 100;
-- Note execution time

-- 4. Use EXPLAIN ANALYZE (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM products WHERE price > 100;
-- Shows actual execution time and row counts

Performance Metrics

MetricGoodAcceptableBad
Query time< 100ms100-500ms> 500ms
Rows scanned< 10001000-10000> 10000
Index usageYesPartialNo
Temp tables01> 1