Skip to content

Exercise 4: Subqueries 🎭

🎯 Objective

Master subqueries (nested queries) for complex data retrieval and filtering.

📋 Problem Statement

Using the same tables (products, orders, suppliers), learn to write queries within queries.

📥 Tasks

Task 1: Products Above Average Price

Find all products with price higher than the average price of all products.

Task 2: Suppliers with High-Value Products

Find suppliers who have at least one product priced above $200. Use a subquery with IN.

Task 3: Products Ordered by Specific Customer

Find all products that customer 501 has ordered. Use EXISTS.

Task 4: Most Expensive Product per Category

For each category, find the product(s) with the highest price. Use a correlated subquery.

Task 5: Customers Who Ordered Electronics

Find customer_ids who have ordered at least one Electronics product.

Task 6: Suppliers Without Recent Orders

Find suppliers whose products haven't been ordered in 2024. Use NOT EXISTS.

💡 Hints

Hint 1: Subquery Types - **Scalar subquery**: Returns single value (used with =, <, >) - **Row subquery**: Returns single row - **Column subquery**: Returns single column (used with IN, ANY, ALL) - **Table subquery**: Returns multiple rows and columns (used in FROM)
Hint 2: IN vs EXISTS - `IN`: Good for small result sets, checks if value is in list - `EXISTS`: Better for large result sets, checks if subquery returns any rows - `EXISTS` stops at first match (more efficient)
Hint 3: Correlated Subquery References column from outer query. Executes once per outer row. ```sql SELECT * FROM table1 t1 WHERE column > (SELECT AVG(column) FROM table1 t2 WHERE t2.category = t1.category); ```

Solutions

Click to reveal solutions

Solution 1: Products Above Average Price

sql
SELECT product_name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products)
ORDER BY price DESC;

Result:

product_name          | price
----------------------|--------
Laptop Pro 15         | 1299.99
Standing Desk         | 499.99
Monitor 27"           | 349.99
Office Chair          | 199.99

Explanation:

  1. Inner query calculates average price: ~$316
  2. Outer query filters products above this average
  3. Subquery executes once (scalar subquery)

Performance: Efficient because subquery returns single value.


Solution 2: Suppliers with High-Value Products

sql
SELECT supplier_name, country
FROM suppliers
WHERE supplier_id IN (
  SELECT DISTINCT supplier_id
  FROM products
  WHERE price > 200
)
ORDER BY supplier_name;

Result:

supplier_name    | country
-----------------|--------
FurniturePro     | Germany
TechCorp         | USA

Explanation: Subquery returns list of supplier_ids with expensive products, outer query filters suppliers.

Alternative (using EXISTS - often faster):

sql
SELECT s.supplier_name, s.country
FROM suppliers s
WHERE EXISTS (
  SELECT 1
  FROM products p
  WHERE p.supplier_id = s.supplier_id AND p.price > 200
);

Performance Comparison:

  • IN: Subquery executes once, returns list
  • EXISTS: Stops at first match per supplier (better for large datasets)

Solution 3: Products Ordered by Specific Customer

sql
SELECT DISTINCT p.product_name, p.price
FROM products p
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.product_id = p.product_id AND o.customer_id = 501
)
ORDER BY p.product_name;

Result:

product_name          | price
----------------------|--------
Laptop Pro 15         | 1299.99
Office Chair          | 199.99
Standing Desk         | 499.99

Explanation: EXISTS checks if there's at least one order by customer 501 for each product.

Alternative (using IN):

sql
SELECT DISTINCT product_name, price
FROM products
WHERE product_id IN (
  SELECT product_id FROM orders WHERE customer_id = 501
);

Why EXISTS here? More semantic - we're checking existence, not matching values.


Solution 4: Most Expensive Product per Category

sql
SELECT p1.category, p1.product_name, p1.price
FROM products p1
WHERE p1.price = (
  SELECT MAX(p2.price)
  FROM products p2
  WHERE p2.category = p1.category
)
ORDER BY p1.category, p1.product_name;

Result:

category     | product_name          | price
-------------|-----------------------|--------
Electronics  | Laptop Pro 15         | 1299.99
Furniture    | Standing Desk         | 499.99

Explanation:

  • Correlated subquery: Inner query references outer query's category
  • For each product, finds max price in its category
  • Keeps products matching that max price

Performance Warning: Correlated subqueries can be slow on large datasets (executes once per row).

Alternative (using window functions - more efficient):

sql
WITH ranked AS (
  SELECT 
    category, 
    product_name, 
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) as rank
  FROM products
)
SELECT category, product_name, price
FROM ranked
WHERE rank = 1;

Solution 5: Customers Who Ordered Electronics

sql
SELECT DISTINCT o.customer_id
FROM orders o
WHERE o.product_id IN (
  SELECT product_id
  FROM products
  WHERE category = 'Electronics'
)
ORDER BY o.customer_id;

Result:

customer_id
-----------
501
502
503
504
506

Explanation: Subquery finds all Electronics product_ids, outer query finds customers who ordered them.

Alternative (using JOIN - often clearer):

sql
SELECT DISTINCT o.customer_id
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Electronics'
ORDER BY o.customer_id;

When to use subquery vs JOIN?

  • Subquery: When you only need to filter, not retrieve data from inner table
  • JOIN: When you need columns from both tables

Solution 6: Suppliers Without Recent Orders

sql
SELECT s.supplier_name, s.country
FROM suppliers s
WHERE NOT EXISTS (
  SELECT 1
  FROM products p
  JOIN orders o ON p.product_id = o.product_id
  WHERE p.supplier_id = s.supplier_id
    AND o.order_date >= '2024-01-01'
)
ORDER BY s.supplier_name;

Result:

supplier_name    | country
-----------------|--------
GlobalParts      | Japan

Explanation:

  • NOT EXISTS checks for absence of orders in 2024
  • Subquery joins products and orders for each supplier
  • GlobalParts has no products, so no orders

Alternative (using LEFT JOIN + NULL check):

sql
SELECT DISTINCT s.supplier_name, s.country
FROM suppliers s
LEFT JOIN products p ON s.supplier_id = p.supplier_id
LEFT JOIN orders o ON p.product_id = o.product_id AND o.order_date >= '2024-01-01'
WHERE o.order_id IS NULL;

🚀 Follow-up Challenges

  1. Nested Subqueries: Find products more expensive than the average price of products from the same supplier
  2. Multiple Subqueries: Find customers who ordered both Electronics AND Furniture
  3. Subquery in SELECT: Show each product with its price and the category average price
  4. ANY/ALL Operators: Find products more expensive than ALL products in Furniture category

🔑 Key Takeaways

Subquery Types

TypeReturnsExample Use
ScalarSingle valueWHERE price > (SELECT AVG(price)...)
ColumnSingle columnWHERE id IN (SELECT id...)
RowSingle rowWHERE (col1, col2) = (SELECT col1, col2...)
TableMultiple rows/colsFROM (SELECT...) AS subquery

IN vs EXISTS

sql
-- IN: Subquery returns list of values
WHERE column IN (SELECT column FROM table WHERE condition)

-- EXISTS: Checks if subquery returns any rows
WHERE EXISTS (SELECT 1 FROM table WHERE condition)

Performance:

  • IN: Better for small result sets
  • EXISTS: Better for large result sets (stops at first match)
  • NOT EXISTS: Often faster than NOT IN (handles NULLs better)

Correlated vs Non-Correlated

Non-Correlated (Independent):

sql
-- Executes once
SELECT * FROM products
WHERE price > (SELECT AVG(price) FROM products);

Correlated (Dependent):

sql
-- Executes once per outer row
SELECT * FROM products p1
WHERE price > (SELECT AVG(price) FROM products p2 WHERE p2.category = p1.category);

Best Practices

  • Prefer JOINs when you need data from multiple tables
  • Use EXISTS for existence checks (more semantic)
  • Avoid correlated subqueries on large datasets (use window functions instead)
  • Use CTEs for complex subqueries (better readability)
  • Test performance - subqueries vs JOINs can vary by database

📚 Advanced Patterns

Pattern 1: Subquery in SELECT (Scalar Subquery)

sql
SELECT 
  product_name,
  price,
  (SELECT AVG(price) FROM products) AS avg_price,
  price - (SELECT AVG(price) FROM products) AS price_diff
FROM products;

Pattern 2: Subquery in FROM (Derived Table)

sql
SELECT category, avg_price
FROM (
  SELECT category, AVG(price) AS avg_price
  FROM products
  GROUP BY category
) AS category_stats
WHERE avg_price > 100;

Pattern 3: Multiple Subqueries

sql
SELECT product_name
FROM products
WHERE price > (SELECT AVG(price) FROM products)
  AND supplier_id IN (SELECT supplier_id FROM suppliers WHERE rating > 4.5);

Pattern 4: ANY/ALL Operators

sql
-- More expensive than ANY Furniture product
SELECT product_name, price
FROM products
WHERE price > ANY (SELECT price FROM products WHERE category = 'Furniture');

-- More expensive than ALL Furniture products
SELECT product_name, price
FROM products
WHERE price > ALL (SELECT price FROM products WHERE category = 'Furniture');

⚠️ Common Pitfalls

⚠️ Cạm bẫy

Pitfall 1: NULL in NOT IN

sql
-- WRONG: Returns no rows if subquery contains NULL
SELECT * FROM products
WHERE supplier_id NOT IN (SELECT supplier_id FROM suppliers WHERE rating < 4.0);

-- CORRECT: Use NOT EXISTS or filter NULLs
SELECT * FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM suppliers s 
  WHERE s.supplier_id = p.supplier_id AND s.rating < 4.0
);

⚠️ Cạm bẫy

Pitfall 2: Correlated Subquery Performance

sql
-- SLOW: Executes subquery for each row
SELECT product_name,
  (SELECT supplier_name FROM suppliers s WHERE s.supplier_id = p.supplier_id)
FROM products p;

-- FAST: Use JOIN instead
SELECT p.product_name, s.supplier_name
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id;

⚠️ Cạm bẫy

Pitfall 3: Forgetting DISTINCT

sql
-- May return duplicates
SELECT customer_id FROM orders WHERE product_id IN (...);

-- Better
SELECT DISTINCT customer_id FROM orders WHERE product_id IN (...);