💾 OLTP Modeling

Level: Core Solves: Thiết kế data models tối ưu cho Online Transaction Processing systems

OLTP Characteristics

💡 Giáo sư Tom

OLTP là backbone của mọi business application. Mỗi millisecond latency, mỗi deadlock, mỗi constraint violation đều ảnh hưởng trực tiếp đến user experience và revenue. Design for correctness first, then optimize.

OLTP vs OLAP

┌─────────────────────────────────────────────────────────────────┐
│                    OLTP vs OLAP                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Characteristic      OLTP                    OLAP               │
│  ──────────────      ────                    ────               │
│                                                                 │
│  Purpose             Day-to-day operations   Analysis/Reporting │
│  Operations          INSERT, UPDATE, DELETE  SELECT (complex)   │
│  Query Pattern       Simple, by PK           Aggregations, JOINs│
│  Data Volume/Query   Few rows                Millions of rows   │
│  Response Time       Milliseconds            Seconds to minutes │
│  Normalization       Highly normalized       Denormalized       │
│  Users               Many concurrent         Few analysts       │
│  Data Freshness      Real-time               Periodic refresh   │
│                                                                 │
│  EXAMPLES:                                                      │
│  OLTP: E-commerce checkout, Banking transactions                │
│  OLAP: Sales reports, Customer analytics                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Constraint Design

Types of Constraints

Constraint	Purpose	Performance Impact
PRIMARY KEY	Unique identifier	Creates clustered index
FOREIGN KEY	Referential integrity	Lookup on referenced table
UNIQUE	Prevent duplicates	Creates unique index
NOT NULL	Mandatory fields	Minimal
CHECK	Business rules	Evaluated on INSERT/UPDATE
DEFAULT	Auto-fill values	Minimal

Constraint Best Practices

sql

-- ✅ GOOD: Comprehensive constraints
CREATE TABLE orders (
    order_id        BIGSERIAL PRIMARY KEY,
    customer_id     BIGINT NOT NULL 
                    REFERENCES customers(customer_id)
                    ON DELETE RESTRICT
                    ON UPDATE CASCADE,
    status          VARCHAR(20) NOT NULL DEFAULT 'pending'
                    CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    total_amount    DECIMAL(12, 2) NOT NULL 
                    CHECK (total_amount >= 0),
    created_at      TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    
    -- Composite constraints
    CONSTRAINT valid_order_dates 
        CHECK (shipped_at IS NULL OR shipped_at >= created_at)
);

-- ❌ BAD: Missing constraints (application-level only)
CREATE TABLE orders (
    order_id        BIGSERIAL PRIMARY KEY,
    customer_id     BIGINT,  -- No FK, no NOT NULL
    status          VARCHAR(50),  -- No CHECK, no DEFAULT
    total_amount    DECIMAL(12, 2)  -- Can be negative!
);

Foreign Key Actions

┌─────────────────────────────────────────────────────────────────┐
│                 FOREIGN KEY ACTIONS                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ON DELETE:                                                     │
│  ┌─────────────┬────────────────────────────────────────────┐   │
│  │ CASCADE     │ Delete child rows when parent deleted      │   │
│  │ RESTRICT    │ Prevent deletion if children exist         │   │
│  │ SET NULL    │ Set FK to NULL when parent deleted         │   │
│  │ SET DEFAULT │ Set FK to default when parent deleted      │   │
│  │ NO ACTION   │ Like RESTRICT but checked at end of txn    │   │
│  └─────────────┴────────────────────────────────────────────┘   │
│                                                                 │
│  RECOMMENDATIONS:                                               │
│  • orders → customers: RESTRICT (don't lose order history)      │
│  • order_items → orders: CASCADE (items belong to order)        │
│  • audit_logs → users: SET NULL (keep logs, user can be gone)   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Indexing Strategy

Index Types

Type	Use Case	Example
B-Tree	Default, range queries	Most columns
Hash	Equality only	Exact lookups
GIN	Arrays, JSONB, full-text	Tags, search
GiST	Geometric, range types	Geo queries
BRIN	Large tables, sorted data	Time-series

Index Design Principles

sql

-- 1. Index columns used in WHERE clauses
CREATE INDEX idx_orders_status ON orders(status);

-- 2. Index columns used in JOINs
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- 3. Composite indexes for multi-column queries
-- Column order matters! Most selective first
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- 4. Covering indexes to avoid table lookups
CREATE INDEX idx_orders_covering ON orders(customer_id) 
    INCLUDE (status, total_amount);

-- 5. Partial indexes for filtered queries
CREATE INDEX idx_orders_pending ON orders(created_at) 
    WHERE status = 'pending';

Index Anti-Patterns

⚠️ Index Overhead

Mỗi index là một write penalty. Đừng index mọi thứ - chỉ index những gì bạn query.

┌─────────────────────────────────────────────────────────────────┐
│                 INDEX ANTI-PATTERNS                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ Over-indexing                                               │
│     • Index mọi column → slow writes                            │
│     • Unused indexes → wasted storage                           │
│                                                                 │
│  ❌ Wrong column order in composite index                       │
│     • INDEX(a, b) không help query WHERE b = ?                  │
│     • Leftmost prefix rule                                      │
│                                                                 │
│  ❌ Indexing low-cardinality columns alone                      │
│     • INDEX(gender) với 2 values → full scan faster             │
│     • Combine với high-cardinality column                       │
│                                                                 │
│  ❌ Missing indexes on FKs                                      │
│     • FK không tự động tạo index                                │
│     • JOIN performance suffers                                  │
│                                                                 │
│  ❌ Duplicate indexes                                           │
│     • INDEX(a) và INDEX(a, b) → INDEX(a) redundant              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Transaction Patterns

ACID Properties

┌─────────────────────────────────────────────────────────────────┐
│                    ACID PROPERTIES                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  A - ATOMICITY                                                  │
│      All or nothing. Transaction either completes fully         │
│      or rolls back completely.                                  │
│                                                                 │
│  C - CONSISTENCY                                                │
│      Database moves from one valid state to another.            │
│      Constraints are always satisfied.                          │
│                                                                 │
│  I - ISOLATION                                                  │
│      Concurrent transactions don't interfere.                   │
│      Each sees consistent snapshot.                             │
│                                                                 │
│  D - DURABILITY                                                 │
│      Committed transactions survive crashes.                    │
│      Data is persisted to disk.                                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Isolation Levels

Level	Dirty Read	Non-Repeatable Read	Phantom Read	Use Case
READ UNCOMMITTED	✅	✅	✅	Never use
READ COMMITTED	❌	✅	✅	Default (PostgreSQL)
REPEATABLE READ	❌	❌	✅	Financial reports
SERIALIZABLE	❌	❌	❌	Critical operations

Transaction Best Practices

sql

-- ✅ GOOD: Short transactions, explicit boundaries
BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    INSERT INTO transfers (from_id, to_id, amount) VALUES (1, 2, 100);
COMMIT;

-- ❌ BAD: Long-running transaction
BEGIN;
    SELECT * FROM large_table;  -- Holds locks
    -- ... user thinks for 5 minutes ...
    UPDATE accounts SET balance = 0;  -- Other transactions blocked
COMMIT;

-- ✅ GOOD: Optimistic locking for concurrent updates
UPDATE products 
SET stock = stock - 1, version = version + 1
WHERE id = 123 AND version = 5;
-- Check affected rows, retry if 0

-- ✅ GOOD: SELECT FOR UPDATE when needed
BEGIN;
    SELECT * FROM inventory WHERE product_id = 123 FOR UPDATE;
    -- Now we have exclusive lock
    UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123;
COMMIT;

Common OLTP Patterns

Soft Deletes

sql

-- Pattern: Soft delete with is_deleted flag
ALTER TABLE customers ADD COLUMN is_deleted BOOLEAN DEFAULT FALSE;
ALTER TABLE customers ADD COLUMN deleted_at TIMESTAMP;

-- Query active records
SELECT * FROM customers WHERE is_deleted = FALSE;

-- Partial index for performance
CREATE INDEX idx_customers_active ON customers(email) 
    WHERE is_deleted = FALSE;

Audit Trail

sql

-- Pattern: Audit table for change tracking
CREATE TABLE customer_audit (
    audit_id        BIGSERIAL PRIMARY KEY,
    customer_id     BIGINT NOT NULL,
    operation       VARCHAR(10) NOT NULL,  -- INSERT, UPDATE, DELETE
    old_data        JSONB,
    new_data        JSONB,
    changed_by      BIGINT,
    changed_at      TIMESTAMP DEFAULT NOW()
);

-- Trigger for automatic auditing
CREATE OR REPLACE FUNCTION audit_customer_changes()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO customer_audit (customer_id, operation, old_data, new_data)
    VALUES (
        COALESCE(NEW.customer_id, OLD.customer_id),
        TG_OP,
        CASE WHEN TG_OP != 'INSERT' THEN to_jsonb(OLD) END,
        CASE WHEN TG_OP != 'DELETE' THEN to_jsonb(NEW) END
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Optimistic Concurrency

sql

-- Pattern: Version column for optimistic locking
ALTER TABLE products ADD COLUMN version INTEGER DEFAULT 1;

-- Application code pattern:
-- 1. Read: SELECT id, name, price, version FROM products WHERE id = 1
-- 2. Modify in application
-- 3. Update with version check:
UPDATE products 
SET name = 'New Name', price = 99.99, version = version + 1
WHERE id = 1 AND version = 5;
-- 4. If affected_rows = 0, someone else modified → retry

🔍 Common Review Comments

⚠️ OLTP Design Review Feedback

OLTP systems là backbone của business operations. Những comments dưới đây thường xuất hiện trong production-readiness reviews.

Review Comment	Ý nghĩa	Cách khắc phục
"Index này sẽ không được sử dụng với column order như vậy"	Composite index không match query WHERE clause order	Reorder columns theo leftmost prefix rule
"Missing FK index sẽ gây cascade delete deadlocks"	FK columns không indexed → slow lookups, lock escalation	Luôn index FK columns
"Isolation level quá cao cho use case này"	SERIALIZABLE khi READ COMMITTED đủ → unnecessary locking	Match isolation level với actual requirements
"Long-running transaction sẽ block replication"	Transactions hold locks quá lâu → replication lag	Batch operations, shorter transaction windows
"Constraint CHECK này sẽ bị violated bởi existing data"	Adding NOT VALID constraint mà không verify	Backfill/clean data trước, hoặc use NOT VALID + validate later
"Table partition strategy chưa documented"	Large tables không có partitioning plan	Define partition key và retention policy
"Optimistic locking version column ở đâu?"	Concurrent updates không có conflict detection	Add version column cho concurrent-write tables

⚠️ Anti-patterns

❌ OLTP Performance Killers

Những anti-patterns này thường invisible trong development nhưng explode trong production với real traffic.

Anti-pattern 1: Over-Indexing

┌─────────────────────────────────────────────────────────────────┐
│               OVER-INDEXING ANTI-PATTERN                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  CREATE INDEX idx_orders_1 ON orders(customer_id);              │
│  CREATE INDEX idx_orders_2 ON orders(status);                   │
│  CREATE INDEX idx_orders_3 ON orders(created_at);               │
│  CREATE INDEX idx_orders_4 ON orders(customer_id, status);      │
│  CREATE INDEX idx_orders_5 ON orders(customer_id, created_at);  │
│  CREATE INDEX idx_orders_6 ON orders(status, created_at);       │
│  CREATE INDEX idx_orders_7 ON orders(customer_id, status, created_at);│
│  -- ... 15 more indexes "just in case"                          │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Write amplification: Every INSERT updates all indexes        │
│  • Storage bloat: Indexes can exceed table size                 │
│  • Vacuum overhead: More indexes = longer maintenance           │
│  • Optimizer confusion: Suboptimal index choice                 │
│  • INSERT latency: ms → seconds during peak                     │
│                                                                 │
│  REAL INCIDENT: E-commerce với 47 indexes trên orders table     │
│  → Black Friday checkout latency 10x normal                     │
│  → Removed 35 unused indexes, solved problem                    │
│                                                                 │
│  CORRECT APPROACH:                                              │
│  1. Profile actual queries (pg_stat_statements)                 │
│  2. Create indexes for top 5-10 query patterns                  │
│  3. Monitor unused indexes (pg_stat_user_indexes)               │
│  4. Review and prune quarterly                                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Anti-pattern 2: Long-Running Transactions

┌─────────────────────────────────────────────────────────────────┐
│            LONG TRANSACTION ANTI-PATTERN                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  BEGIN;                                                         │
│    SELECT * FROM products FOR UPDATE;  -- Lock all products     │
│    -- ... API call to external service (5 seconds)              │
│    -- ... Complex calculation (3 seconds)                       │
│    -- ... File processing (10 seconds)                          │
│    UPDATE products SET price = ...;                             │
│  COMMIT;  -- 18 seconds later                                   │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Lock contention: Other transactions wait/timeout             │
│  • Replication lag: WAL accumulates, replicas fall behind       │
│  • Deadlock risk: Long transactions increase deadlock window    │
│  • Resource exhaustion: Connection pool starved                 │
│  • Vacuum blocked: Dead tuples accumulate                       │
│                                                                 │
│  CORRECT APPROACH:                                              │
│  1. Do external calls OUTSIDE transaction                       │
│  2. Batch large operations: 1000 rows at a time                 │
│  3. Use advisory locks for application-level coordination       │
│  4. Set statement_timeout as safety net                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Anti-pattern 3: Missing FK Indexes

┌─────────────────────────────────────────────────────────────────┐
│            MISSING FK INDEX ANTI-PATTERN                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  CREATE TABLE orders (                                          │
│      order_id BIGINT PRIMARY KEY,                               │
│      customer_id BIGINT REFERENCES customers(customer_id)       │
│      -- Note: PostgreSQL does NOT auto-create FK index!         │
│  );                                                             │
│  -- No index on customer_id                                     │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Slow JOINs: Full table scan for every customer lookup        │
│  • Slow deletes: DELETE FROM customers triggers seq scan        │
│  • Lock escalation: Parent table locked during child check      │
│  • Performance degradation: Gets worse with data growth         │
│                                                                 │
│  REAL INCIDENT: Customer delete took 45 minutes                 │
│  → Cascade check scanned 50M orders without index               │
│  → Adding index reduced to 200ms                                │
│                                                                 │
│  RULE: ALWAYS create index on FK columns                        │
│  (MySQL/InnoDB does this automatically, PostgreSQL does not)    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Anti-pattern 4: Unbounded Soft Deletes

┌─────────────────────────────────────────────────────────────────┐
│            UNLIMITED SOFT DELETE ANTI-PATTERN                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMP;            │
│  -- Every query: WHERE deleted_at IS NULL                       │
│  -- No cleanup, no archival, forever accumulating               │
│                                                                 │
│  After 5 years:                                                 │
│  - 100M total rows, 95M deleted                                 │
│  - Every query scans 100M rows to find 5M active                │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Query degradation: Indexes include deleted rows              │
│  • Storage waste: Deleted data stays forever                    │
│  • Compliance risk: Cannot truly delete for GDPR                │
│  • Backup bloat: Backing up mostly deleted data                 │
│  • Index bloat: B-tree includes all soft-deleted rows           │
│                                                                 │
│  BETTER APPROACHES:                                             │
│  1. Partial index: CREATE INDEX idx_active ON orders(...)       │
│                    WHERE deleted_at IS NULL;                    │
│  2. Partition by status: Active vs archived partitions          │
│  3. Archive job: Move to archive table after N days             │
│  4. Hard delete + audit log: True deletion with event record    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

📊 Grain & Cardinality Risks

❌ Wrong Grain = Wrong Reports

Grain mismatch giữa OLTP và reporting là nguồn gốc của những con số sai dẫn đến quyết định business sai.

┌─────────────────────────────────────────────────────────────────┐
│               GRAIN MISMATCH RISKS                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  OLTP GRAIN: One row per order line item                        │
│                                                                 │
│  order_id │ product_id │ quantity │ price                       │
│  1        │ A          │ 2        │ 10.00                       │
│  1        │ B          │ 1        │ 20.00                       │
│  1        │ C          │ 3        │ 5.00                        │
│                                                                 │
│  ❌ WRONG REPORT: COUNT(DISTINCT order_id) = 1 order            │
│     But SUM(quantity) = 6 items, SUM(price) = 35.00             │
│                                                                 │
│  COMMON MISTAKES:                                               │
│  • Counting orders when grain is line items → inflated count    │
│  • Joining without understanding cardinality → duplicated rows  │
│  • Aggregating at wrong level → meaningless averages            │
│                                                                 │
│  CARDINALITY EXPLOSION:                                         │
│  customers (1M) × orders (10M) × line_items (50M)               │
│  = 500 trillion rows if JOINed without proper GROUP BY          │
│                                                                 │
│  MITIGATION:                                                    │
│  1. Document grain explicitly in table comments                 │
│  2. Create views at common reporting grains                     │
│  3. Review reports for accidental fan-out                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

🔄 Migration Notes

Online Schema Changes

┌─────────────────────────────────────────────────────────────────┐
│               ONLINE SCHEMA CHANGE PATTERNS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  SAFE OPERATIONS (Minimal/No Locking):                          │
│  • ADD COLUMN with DEFAULT NULL                                 │
│  • CREATE INDEX CONCURRENTLY (PostgreSQL)                       │
│  • ADD CONSTRAINT ... NOT VALID                                 │
│  • DROP COLUMN (marks as dropped, doesn't reclaim)              │
│                                                                 │
│  DANGEROUS OPERATIONS (Locks Table):                            │
│  • ADD COLUMN with DEFAULT value (pre-PG11)                     │
│  • ALTER COLUMN type                                            │
│  • ADD CONSTRAINT (validates existing data)                     │
│  • CREATE INDEX (without CONCURRENTLY)                          │
│                                                                 │
│  PATTERNS FOR DANGEROUS OPERATIONS:                             │
│                                                                 │
│  1. ADD NOT NULL COLUMN:                                        │
│     a. ADD COLUMN nullable                                      │
│     b. Backfill in batches                                      │
│     c. ADD CONSTRAINT CHECK NOT NULL NOT VALID                  │
│     d. VALIDATE CONSTRAINT (separate transaction)               │
│     e. ALTER COLUMN SET NOT NULL                                │
│                                                                 │
│  2. CHANGE COLUMN TYPE:                                         │
│     a. ADD new column with new type                             │
│     b. Dual-write (trigger or application)                      │
│     c. Backfill old data                                        │
│     d. Switch reads to new column                               │
│     e. DROP old column                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Index Management

sql

-- Safe index creation (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
-- Note: Takes longer, but doesn't lock table

-- Monitor index creation progress
SELECT phase, blocks_done, blocks_total, 
       ROUND(100.0 * blocks_done / NULLIF(blocks_total, 0), 2) as pct
FROM pg_stat_progress_create_index;

-- Find unused indexes (candidates for removal)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%_pkey'
  AND indexname NOT LIKE '%_unique%';

📎 Cross-References

📎 Normalization - Normalization rules cho OLTP
📎 OLAP Modeling - So sánh với analytical modeling
📎 Schema Evolution - Safe migration strategies
📎 SQL Transactions - Chi tiết về transaction management
📎 SQL Concurrency - Locking và isolation levels

💾 OLTP Modeling ​

OLTP Characteristics ​

OLTP vs OLAP ​

Constraint Design ​

Types of Constraints ​

Constraint Best Practices ​

Foreign Key Actions ​

Indexing Strategy ​

Index Types ​

Index Design Principles ​

Index Anti-Patterns ​

Transaction Patterns ​

ACID Properties ​

Isolation Levels ​

Transaction Best Practices ​

Common OLTP Patterns ​

Soft Deletes ​

Audit Trail ​

Optimistic Concurrency ​

🔍 Common Review Comments ​

⚠️ Anti-patterns ​

Anti-pattern 1: Over-Indexing ​

Anti-pattern 2: Long-Running Transactions ​

Anti-pattern 3: Missing FK Indexes ​

Anti-pattern 4: Unbounded Soft Deletes ​

📊 Grain & Cardinality Risks ​

🔄 Migration Notes ​

Online Schema Changes ​

Index Management ​

📎 Cross-References ​