Giao diện
📊 OLAP/Dimensional Modeling
Level: Core Solves: Thiết kế data models tối ưu cho Online Analytical Processing và data warehousing
Dimensional Modeling Fundamentals
💡 Giáo sư Tom
Dimensional modeling là nghệ thuật của việc denormalize có chủ đích. Bạn trade storage và write complexity để lấy query performance và business user friendliness. Ralph Kimball đã đúng - business users nghĩ theo dimensions và measures.
Star Schema Architecture
┌─────────────────────────────────────────────────────────────────┐
│ STAR SCHEMA │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ │ dim_product │ │
│ │─────────────│ │
│ │ product_key │ │
│ │ product_id │ │
│ │ name │ │
│ │ category │ │
│ │ brand │ │
│ └──────┬──────┘ │
│ │ │
│ ┌─────────────┐ ┌─────┴─────┐ ┌─────────────┐ │
│ │ dim_customer│ │fact_sales │ │ dim_date │ │
│ │─────────────│ │───────────│ │─────────────│ │
│ │customer_key │◄───│date_key │───►│ date_key │ │
│ │ customer_id │ │customer_key│ │ full_date │ │
│ │ name │ │product_key │ │ year │ │
│ │ segment │ │store_key │ │ quarter │ │
│ │ region │ │───────────│ │ month │ │
│ └─────────────┘ │quantity │ │ week │ │
│ │unit_price │ │ day_of_week │ │
│ ┌─────────────┐ │total_amount│ └─────────────┘ │
│ │ dim_store │ │discount │ │
│ │─────────────│ └───────────┘ │
│ │ store_key │◄────────┘ │
│ │ store_id │ │
│ │ store_name │ │
│ │ city │ │
│ │ country │ │
│ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘Fact Tables
Fact Table Types
| Type | Description | Example |
|---|---|---|
| Transaction | One row per event | fact_sales, fact_orders |
| Periodic Snapshot | One row per period | fact_daily_inventory |
| Accumulating Snapshot | One row per lifecycle | fact_order_fulfillment |
| Factless | Events without measures | fact_student_attendance |
Grain Definition
⚠️ Grain is Everything
Grain là quyết định quan trọng nhất trong dimensional modeling. Sai grain = sai mọi thứ. Luôn declare grain trước khi design.
┌─────────────────────────────────────────────────────────────────┐
│ GRAIN EXAMPLES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ FACT TABLE GRAIN STATEMENT │
│ ────────── ─────────────── │
│ │
│ fact_sales One row per line item per transaction │
│ │
│ fact_daily_inventory One row per product per store per day │
│ │
│ fact_order_fulfillment One row per order (tracks lifecycle) │
│ │
│ fact_page_views One row per page view event │
│ │
│ │
│ GRAIN DETERMINES: │
│ • Which dimensions can be attached │
│ • What measures make sense │
│ • How to aggregate correctly │
│ │
└─────────────────────────────────────────────────────────────────┘Fact Table Design
sql
-- Transaction Fact Table
CREATE TABLE fact_sales (
-- Surrogate keys to dimensions
date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
store_key INTEGER NOT NULL REFERENCES dim_store(store_key),
-- Degenerate dimension (no separate dim table)
transaction_id VARCHAR(50) NOT NULL,
-- Measures (facts)
quantity INTEGER NOT NULL,
unit_price DECIMAL(10, 2) NOT NULL,
discount_amount DECIMAL(10, 2) DEFAULT 0,
total_amount DECIMAL(12, 2) NOT NULL,
-- Composite primary key
PRIMARY KEY (date_key, customer_key, product_key, store_key, transaction_id)
);
-- Periodic Snapshot Fact Table
CREATE TABLE fact_daily_inventory (
date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
store_key INTEGER NOT NULL REFERENCES dim_store(store_key),
-- Snapshot measures
quantity_on_hand INTEGER NOT NULL,
quantity_on_order INTEGER NOT NULL,
reorder_point INTEGER NOT NULL,
PRIMARY KEY (date_key, product_key, store_key)
);Dimension Tables
Dimension Design Principles
sql
-- Well-designed dimension table
CREATE TABLE dim_customer (
-- Surrogate key (warehouse-generated)
customer_key SERIAL PRIMARY KEY,
-- Natural key (from source system)
customer_id VARCHAR(50) NOT NULL,
-- Descriptive attributes
first_name VARCHAR(100),
last_name VARCHAR(100),
email VARCHAR(255),
phone VARCHAR(50),
-- Hierarchies
city VARCHAR(100),
state VARCHAR(100),
country VARCHAR(100),
region VARCHAR(50),
-- Derived attributes
customer_segment VARCHAR(50), -- Gold, Silver, Bronze
lifetime_value_tier VARCHAR(20),
-- SCD metadata
effective_date DATE NOT NULL,
expiration_date DATE,
is_current BOOLEAN DEFAULT TRUE,
-- Audit
source_system VARCHAR(50),
load_timestamp TIMESTAMP DEFAULT NOW()
);Surrogate Keys vs Natural Keys
┌─────────────────────────────────────────────────────────────────┐
│ SURROGATE vs NATURAL KEYS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SURROGATE KEY (Recommended) NATURAL KEY │
│ ──────────────────────── ─────────── │
│ │
│ ✅ Integer, compact ❌ Often VARCHAR, larger │
│ ✅ Stable (never changes) ❌ May change in source │
│ ✅ Enables SCD tracking ❌ Hard to track history │
│ ✅ Source-system independent ❌ Tied to source system │
│ ✅ Better JOIN performance ❌ Slower JOINs │
│ │
│ PATTERN: │
│ • Surrogate key: customer_key (INTEGER, auto-generated) │
│ • Natural key: customer_id (VARCHAR, from source) │
│ • Both stored in dimension table │
│ │
└─────────────────────────────────────────────────────────────────┘Slowly Changing Dimensions (SCD)
SCD Types Overview
| Type | Strategy | History | Use Case |
|---|---|---|---|
| Type 0 | Retain original | No changes | Fixed attributes |
| Type 1 | Overwrite | No history | Corrections only |
| Type 2 | Add new row | Full history | Track changes |
| Type 3 | Add column | Limited history | Previous value only |
| Type 4 | Mini-dimension | Separate table | Rapidly changing |
| Type 6 | Hybrid (1+2+3) | Full + current | Best of all worlds |
SCD Type 2 Implementation
sql
-- SCD Type 2: Full history tracking
CREATE TABLE dim_customer_scd2 (
customer_key SERIAL PRIMARY KEY,
customer_id VARCHAR(50) NOT NULL, -- Natural key
-- Attributes that may change
name VARCHAR(200),
email VARCHAR(255),
address VARCHAR(500),
segment VARCHAR(50),
-- SCD Type 2 metadata
effective_date DATE NOT NULL,
expiration_date DATE DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE,
version INTEGER DEFAULT 1
);
-- Example data showing history
-- customer_key | customer_id | name | segment | effective_date | expiration_date | is_current
-- 1 | C001 | Alice | Bronze | 2023-01-01 | 2023-06-30 | FALSE
-- 2 | C001 | Alice | Silver | 2023-07-01 | 2024-01-31 | FALSE
-- 3 | C001 | Alice | Gold | 2024-02-01 | 9999-12-31 | TRUESCD Type 2 ETL Pattern
sql
-- Step 1: Expire existing current records that have changes
UPDATE dim_customer_scd2 d
SET
expiration_date = CURRENT_DATE - 1,
is_current = FALSE
FROM staging_customers s
WHERE d.customer_id = s.customer_id
AND d.is_current = TRUE
AND (d.name != s.name OR d.segment != s.segment);
-- Step 2: Insert new versions for changed records
INSERT INTO dim_customer_scd2 (customer_id, name, email, segment, effective_date)
SELECT
s.customer_id,
s.name,
s.email,
s.segment,
CURRENT_DATE
FROM staging_customers s
JOIN dim_customer_scd2 d ON s.customer_id = d.customer_id
WHERE d.expiration_date = CURRENT_DATE - 1;
-- Step 3: Insert completely new customers
INSERT INTO dim_customer_scd2 (customer_id, name, email, segment, effective_date)
SELECT
s.customer_id,
s.name,
s.email,
s.segment,
CURRENT_DATE
FROM staging_customers s
WHERE NOT EXISTS (
SELECT 1 FROM dim_customer_scd2 d WHERE d.customer_id = s.customer_id
);Advanced Patterns
Snowflake Schema
┌─────────────────────────────────────────────────────────────────┐
│ SNOWFLAKE SCHEMA │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌─────────────┐ ┌───────────┐ │
│ │dim_brand │◄───│ dim_product │◄───│fact_sales │ │
│ └──────────┘ └──────┬──────┘ └───────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │dim_category │ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │dim_subcategory│ │
│ └─────────────┘ │
│ │
│ TRADE-OFFS: │
│ ✅ Less storage (normalized dimensions) │
│ ✅ Easier dimension maintenance │
│ ❌ More JOINs = slower queries │
│ ❌ More complex for business users │
│ │
└─────────────────────────────────────────────────────────────────┘Conformed Dimensions
┌─────────────────────────────────────────────────────────────────┐
│ CONFORMED DIMENSIONS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SHARED DIMENSION ACROSS FACT TABLES: │
│ │
│ ┌─────────────┐ │
│ │ dim_date │◄──────┬──────────┬──────────┐ │
│ └─────────────┘ │ │ │ │
│ │ │ │ │
│ ┌─────┴────┐ ┌───┴────┐ ┌───┴────┐ │
│ │fact_sales│ │fact_inv│ │fact_hr │ │
│ └──────────┘ └────────┘ └────────┘ │
│ │
│ BENEFITS: │
│ • Consistent reporting across subject areas │
│ • Drill-across queries possible │
│ • Single source of truth for dimension │
│ │
│ REQUIREMENTS: │
│ • Same surrogate keys │
│ • Same attributes and hierarchies │
│ • Same grain │
│ │
└─────────────────────────────────────────────────────────────────┘Date Dimension
sql
-- Essential date dimension
CREATE TABLE dim_date (
date_key INTEGER PRIMARY KEY, -- YYYYMMDD format
full_date DATE NOT NULL UNIQUE,
-- Calendar attributes
year INTEGER NOT NULL,
quarter INTEGER NOT NULL,
month INTEGER NOT NULL,
month_name VARCHAR(20) NOT NULL,
week_of_year INTEGER NOT NULL,
day_of_month INTEGER NOT NULL,
day_of_week INTEGER NOT NULL,
day_name VARCHAR(20) NOT NULL,
-- Fiscal calendar (adjust for your fiscal year)
fiscal_year INTEGER NOT NULL,
fiscal_quarter INTEGER NOT NULL,
fiscal_month INTEGER NOT NULL,
-- Flags
is_weekend BOOLEAN NOT NULL,
is_holiday BOOLEAN DEFAULT FALSE,
holiday_name VARCHAR(100),
-- Relative flags (useful for filtering)
is_current_day BOOLEAN DEFAULT FALSE,
is_current_week BOOLEAN DEFAULT FALSE,
is_current_month BOOLEAN DEFAULT FALSE
);
-- Generate date dimension (PostgreSQL)
INSERT INTO dim_date
SELECT
TO_CHAR(d, 'YYYYMMDD')::INTEGER as date_key,
d as full_date,
EXTRACT(YEAR FROM d) as year,
EXTRACT(QUARTER FROM d) as quarter,
EXTRACT(MONTH FROM d) as month,
TO_CHAR(d, 'Month') as month_name,
EXTRACT(WEEK FROM d) as week_of_year,
EXTRACT(DAY FROM d) as day_of_month,
EXTRACT(DOW FROM d) as day_of_week,
TO_CHAR(d, 'Day') as day_name,
-- Fiscal year starting April
CASE WHEN EXTRACT(MONTH FROM d) >= 4
THEN EXTRACT(YEAR FROM d)
ELSE EXTRACT(YEAR FROM d) - 1 END as fiscal_year,
-- ... more calculations
EXTRACT(DOW FROM d) IN (0, 6) as is_weekend
FROM generate_series('2020-01-01'::date, '2030-12-31'::date, '1 day') as d;🔍 Common Review Comments
⚠️ OLAP/DW Design Review Feedback
Data Warehouse designs often reveal issues only after data is loaded. These comments catch problems before they become expensive to fix.
| Review Comment | Ý nghĩa | Cách khắc phục |
|---|---|---|
| "Grain không declared — làm sao biết aggregation correct?" | Missing grain statement → double-counting risk | Document grain FIRST, before any other design |
| "Dimension này nên conformed across fact tables" | Same dimension với different keys → drill-across fails | Centralize dimension definition, share surrogate keys |
| "SCD Type 2 không có effective_date index" | Historical queries scan entire dimension | Index (natural_key, effective_date) cho range lookups |
| "Fact table thiếu degenerate dimension" | Transaction ID chỉ lưu mà không model → join khó | Include transaction_id in fact, mark as degenerate dimension |
| "Dimension quá rộng — nên snowflake hoặc mini-dimension" | 100+ columns dimension → slow ETL, excessive memory | Split rapidly changing attributes to mini-dimension |
| "Late-arriving fact strategy chưa defined" | Facts arrive after dimension cutoff → orphan facts | Define late-arriving dimension member pattern |
| "Aggregate table refresh schedule conflicts với ETL window" | Materialized views refresh during load → stale data | Coordinate refresh với ETL completion |
📊 Grain & Cardinality Deep-Dive
❌ Grain Is The #1 DW Mistake
Hầu hết data warehouse failures bắt đầu từ grain definition sai. Một khi data được loaded với wrong grain, recovery tốn millions.
Grain Declaration Standards
┌─────────────────────────────────────────────────────────────────┐
│ GRAIN DECLARATION TEMPLATE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ FACT TABLE: fact_sales │
│ │
│ GRAIN STATEMENT (must answer all questions): │
│ ───────────────────────────────────────────── │
│ "One row represents: One line item on one transaction │
│ at one store, by one customer, │
│ on one date, for one product" │
│ │
│ GRAIN COMPONENTS: │
│ • WHAT is being measured? → Line item (not order, not product) │
│ • WHEN? → Transaction timestamp (not day, not week) │
│ • WHERE? → Store location │
│ • WHO? → Customer (if known) │
│ • HOW OFTEN? → Per transaction │
│ │
│ ADDITIVE MEASURES (can SUM across all dimensions): │
│ • quantity, revenue, discount │
│ │
│ SEMI-ADDITIVE (can SUM across some dimensions): │
│ • inventory_on_hand (cannot SUM across time) │
│ │
│ NON-ADDITIVE (cannot SUM): │
│ • unit_price, ratio, percentage │
│ │
└─────────────────────────────────────────────────────────────────┘Cardinality Explosion Warning
┌─────────────────────────────────────────────────────────────────┐
│ CARDINALITY EXPLOSION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SCENARIO: E-commerce data warehouse │
│ │
│ Dimensions: │
│ • dim_date: 3,650 rows (10 years) │
│ • dim_customer: 10,000,000 rows │
│ • dim_product: 500,000 rows │
│ • dim_store: 1,000 rows │
│ │
│ POTENTIAL COMBINATIONS: │
│ 3,650 × 10M × 500K × 1K = 18,250,000,000,000,000,000 rows │
│ (18 quintillion!) │
│ │
│ REALITY: Most combinations don't occur │
│ → Actual fact table: ~500M rows (sparse matrix) │
│ │
│ PROBLEMS WHEN GRAIN IS WRONG: │
│ • Too coarse: Loss of detail, cannot drill down │
│ • Too fine: Storage explosion, slow queries │
│ • Misaligned: JOIN creates Cartesian product │
│ │
│ REAL INCIDENT: │
│ Warehouse defined grain as "order" but stored "line items" │
│ → Report showed $500M revenue instead of $100M │
│ → CEO made investment decision on wrong data │
│ → 3-month restatement project │
│ │
└─────────────────────────────────────────────────────────────────┘⚠️ Anti-patterns
❌ Data Warehouse Design Mistakes
Những anti-patterns này không visible ngay — chúng emerge từ từ qua years của accumulated bad data.
Anti-pattern 1: Wrong Grain Definition
┌─────────────────────────────────────────────────────────────────┐
│ WRONG GRAIN ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ -- Claimed grain: "One row per order" │
│ -- Actual data: One row per order LINE ITEM │
│ │
│ fact_sales: │
│ order_id │ product │ quantity │ revenue │
│ 1 │ iPhone │ 1 │ 999 │
│ 1 │ Case │ 2 │ 50 │
│ 1 │ Charger │ 1 │ 29 │
│ │
│ ❌ WRONG QUERY (assumes order grain): │
│ SELECT COUNT(order_id) FROM fact_sales → Returns 3, not 1! │
│ SELECT SUM(revenue) → Correct by accident │
│ │
│ CONSEQUENCES: │
│ • COUNT inflated: Every report shows wrong counts │
│ • AVG wrong: Average order value divided by line items │
│ • JOIN issues: Each order JOINs multiple times │
│ • Trust destroyed: Users stop believing warehouse │
│ │
│ FIX: │
│ 1. ACKNOWLEDGE actual grain (line items) │
│ 2. Create proper order-level aggregate (if needed) │
│ 3. Update all reports to use DISTINCT counts │
│ │
└─────────────────────────────────────────────────────────────────┘Anti-pattern 2: Missing Conformed Dimensions
┌─────────────────────────────────────────────────────────────────┐
│ NON-CONFORMED DIMENSION ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ -- Sales team: dim_customer with customer_key = 1,2,3... │
│ -- Support team: dim_customer with customer_key = 101,102... │
│ -- Marketing: dim_customer with customer_id (natural key) │
│ │
│ CONSEQUENCES: │
│ • Cannot combine: "How many customers bought AND opened ticket?"│
│ • Data silos: Each team has different truth │
│ • Drill-across fails: Cannot navigate between fact tables │
│ • Reconciliation nightmare: Who has the right customer count? │
│ │
│ REAL INCIDENT: │
│ Sales reported 50K active customers │
│ Marketing reported 75K active customers │
│ CEO asked "Which is right?" — Neither could answer │
│ │
│ FIX: │
│ 1. Create ONE conformed customer dimension │
│ 2. All fact tables use same customer_key │
│ 3. Single source of truth for customer attributes │
│ │
└─────────────────────────────────────────────────────────────────┘Anti-pattern 3: SCD Type 2 Without Partition Pruning
┌─────────────────────────────────────────────────────────────────┐
│ SLOW SCD TYPE 2 ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ dim_customer_scd2: 100M rows (10M customers × 10 versions avg) │
│ │
│ -- Query current customers: │
│ SELECT * FROM dim_customer_scd2 WHERE is_current = TRUE │
│ -- Scans ALL 100M rows to find 10M current! │
│ │
│ CONSEQUENCES: │
│ • Full table scans: Every query touches all history │
│ • Slow JOINs: Fact-to-dimension JOIN is expensive │
│ • ETL bottleneck: SCD processing takes hours │
│ • Index bloat: B-tree on is_current has poor selectivity │
│ │
│ BETTER APPROACHES: │
│ │
│ 1. PARTIAL INDEX: │
│ CREATE INDEX idx_current ON dim_customer │
│ WHERE is_current = TRUE; │
│ │
│ 2. PARTITIONING: │
│ PARTITION BY LIST (is_current) │
│ -- Current partition is small, history partition is large │
│ │
│ 3. SEPARATE TABLES: │
│ dim_customer_current: 10M rows (active) │
│ dim_customer_history: 90M rows (archived) │
│ │
└─────────────────────────────────────────────────────────────────┘Anti-pattern 4: Ignoring Late-Arriving Facts/Dimensions
┌─────────────────────────────────────────────────────────────────┐
│ LATE-ARRIVING DATA ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ -- ETL runs at midnight │
│ -- Fact arrives: order_date = 2024-01-15, customer_id = 999 │
│ -- But customer 999 was created at 2024-01-15 11:00 PM │
│ -- Dimension ETL already ran → customer 999 not in dim yet! │
│ │
│ CONSEQUENCES: │
│ • Orphan facts: Facts reference non-existent dimension keys │
│ • Failed loads: FK constraint violations │
│ • Missing data: Facts dropped silently │
│ • Wrong attribution: Facts assigned to "Unknown" member │
│ │
│ PATTERNS FOR HANDLING: │
│ │
│ 1. INFERRED DIMENSION MEMBER: │
│ Insert placeholder: customer_key=999, name="Unknown" │
│ Update when real data arrives │
│ │
│ 2. LATE-ARRIVING FACT TABLE: │
│ Stage late facts separately │
│ Reprocess when dimensions catch up │
│ │
│ 3. MICRO-BATCH ETL: │
│ Run dimension loads more frequently │
│ Reduce window for late-arriving scenarios │
│ │
└─────────────────────────────────────────────────────────────────┘🔄 Migration Notes
SCD Type 2 Backfill
sql
-- Scenario: Converting Type 1 to Type 2 (adding history)
-- Step 1: Add SCD columns
ALTER TABLE dim_customer ADD COLUMN effective_date DATE;
ALTER TABLE dim_customer ADD COLUMN expiration_date DATE DEFAULT '9999-12-31';
ALTER TABLE dim_customer ADD COLUMN is_current BOOLEAN DEFAULT TRUE;
ALTER TABLE dim_customer ADD COLUMN version INTEGER DEFAULT 1;
-- Step 2: Set effective_date for existing records
UPDATE dim_customer
SET effective_date = COALESCE(created_at::date, '2020-01-01');
-- Step 3: Create partial index for current records
CREATE INDEX idx_dim_customer_current ON dim_customer(customer_id)
WHERE is_current = TRUE;
-- Step 4: Update ETL to handle SCD Type 2 logic going forwardLate-Arriving Dimension Handling
sql
-- Create placeholder for unknown dimension members
INSERT INTO dim_customer (customer_key, customer_id, name, is_inferred)
VALUES (-1, 'UNKNOWN', 'Unknown Customer', FALSE);
-- Insert inferred member when fact references unknown customer
INSERT INTO dim_customer (customer_key, customer_id, name, is_inferred)
SELECT
nextval('dim_customer_seq'),
f.customer_id,
'Inferred - Pending Update',
TRUE
FROM staging_facts f
LEFT JOIN dim_customer d ON f.customer_id = d.customer_id
WHERE d.customer_key IS NULL
ON CONFLICT (customer_id) DO NOTHING;
-- Later: Update inferred members when real data arrives
UPDATE dim_customer d
SET
name = s.name,
email = s.email,
is_inferred = FALSE
FROM staging_customers s
WHERE d.customer_id = s.customer_id
AND d.is_inferred = TRUE;Aggregate Table Refresh Strategy
| Strategy | Use Case | Trade-off |
|---|---|---|
| Full rebuild | Small aggregates (<1M rows) | Simple, but expensive |
| Incremental | Large aggregates, additive measures | Complex, but fast |
| Partition swap | Date-partitioned aggregates | Atomic, no downtime |
| Materialized view | Complex JOINs, need freshness | DB-managed, but refresh overhead |
📎 Cross-References
- 📎 OLTP Modeling - So sánh với transactional modeling
- 📎 Normalization - Khi nào denormalize
- 📎 Schema Evolution - SCD migration strategies
- 📎 SQL Window Functions - Analytics queries
- 📎 SQL Aggregation - GROUP BY patterns cho OLAP