Giao diện
💾 OLTP Modeling
Level: Core Solves: Thiết kế data models tối ưu cho Online Transaction Processing systems
OLTP Characteristics
💡 Giáo sư Tom
OLTP là backbone của mọi business application. Mỗi millisecond latency, mỗi deadlock, mỗi constraint violation đều ảnh hưởng trực tiếp đến user experience và revenue. Design for correctness first, then optimize.
OLTP vs OLAP
┌─────────────────────────────────────────────────────────────────┐
│ OLTP vs OLAP │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Characteristic OLTP OLAP │
│ ────────────── ──── ──── │
│ │
│ Purpose Day-to-day operations Analysis/Reporting │
│ Operations INSERT, UPDATE, DELETE SELECT (complex) │
│ Query Pattern Simple, by PK Aggregations, JOINs│
│ Data Volume/Query Few rows Millions of rows │
│ Response Time Milliseconds Seconds to minutes │
│ Normalization Highly normalized Denormalized │
│ Users Many concurrent Few analysts │
│ Data Freshness Real-time Periodic refresh │
│ │
│ EXAMPLES: │
│ OLTP: E-commerce checkout, Banking transactions │
│ OLAP: Sales reports, Customer analytics │
│ │
└─────────────────────────────────────────────────────────────────┘Constraint Design
Types of Constraints
| Constraint | Purpose | Performance Impact |
|---|---|---|
| PRIMARY KEY | Unique identifier | Creates clustered index |
| FOREIGN KEY | Referential integrity | Lookup on referenced table |
| UNIQUE | Prevent duplicates | Creates unique index |
| NOT NULL | Mandatory fields | Minimal |
| CHECK | Business rules | Evaluated on INSERT/UPDATE |
| DEFAULT | Auto-fill values | Minimal |
Constraint Best Practices
sql
-- ✅ GOOD: Comprehensive constraints
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL
REFERENCES customers(customer_id)
ON DELETE RESTRICT
ON UPDATE CASCADE,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
total_amount DECIMAL(12, 2) NOT NULL
CHECK (total_amount >= 0),
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
-- Composite constraints
CONSTRAINT valid_order_dates
CHECK (shipped_at IS NULL OR shipped_at >= created_at)
);
-- ❌ BAD: Missing constraints (application-level only)
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id BIGINT, -- No FK, no NOT NULL
status VARCHAR(50), -- No CHECK, no DEFAULT
total_amount DECIMAL(12, 2) -- Can be negative!
);Foreign Key Actions
┌─────────────────────────────────────────────────────────────────┐
│ FOREIGN KEY ACTIONS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ON DELETE: │
│ ┌─────────────┬────────────────────────────────────────────┐ │
│ │ CASCADE │ Delete child rows when parent deleted │ │
│ │ RESTRICT │ Prevent deletion if children exist │ │
│ │ SET NULL │ Set FK to NULL when parent deleted │ │
│ │ SET DEFAULT │ Set FK to default when parent deleted │ │
│ │ NO ACTION │ Like RESTRICT but checked at end of txn │ │
│ └─────────────┴────────────────────────────────────────────┘ │
│ │
│ RECOMMENDATIONS: │
│ • orders → customers: RESTRICT (don't lose order history) │
│ • order_items → orders: CASCADE (items belong to order) │
│ • audit_logs → users: SET NULL (keep logs, user can be gone) │
│ │
└─────────────────────────────────────────────────────────────────┘Indexing Strategy
Index Types
| Type | Use Case | Example |
|---|---|---|
| B-Tree | Default, range queries | Most columns |
| Hash | Equality only | Exact lookups |
| GIN | Arrays, JSONB, full-text | Tags, search |
| GiST | Geometric, range types | Geo queries |
| BRIN | Large tables, sorted data | Time-series |
Index Design Principles
sql
-- 1. Index columns used in WHERE clauses
CREATE INDEX idx_orders_status ON orders(status);
-- 2. Index columns used in JOINs
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- 3. Composite indexes for multi-column queries
-- Column order matters! Most selective first
CREATE INDEX idx_orders_status_date ON orders(status, created_at);
-- 4. Covering indexes to avoid table lookups
CREATE INDEX idx_orders_covering ON orders(customer_id)
INCLUDE (status, total_amount);
-- 5. Partial indexes for filtered queries
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';Index Anti-Patterns
⚠️ Index Overhead
Mỗi index là một write penalty. Đừng index mọi thứ - chỉ index những gì bạn query.
┌─────────────────────────────────────────────────────────────────┐
│ INDEX ANTI-PATTERNS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ Over-indexing │
│ • Index mọi column → slow writes │
│ • Unused indexes → wasted storage │
│ │
│ ❌ Wrong column order in composite index │
│ • INDEX(a, b) không help query WHERE b = ? │
│ • Leftmost prefix rule │
│ │
│ ❌ Indexing low-cardinality columns alone │
│ • INDEX(gender) với 2 values → full scan faster │
│ • Combine với high-cardinality column │
│ │
│ ❌ Missing indexes on FKs │
│ • FK không tự động tạo index │
│ • JOIN performance suffers │
│ │
│ ❌ Duplicate indexes │
│ • INDEX(a) và INDEX(a, b) → INDEX(a) redundant │
│ │
└─────────────────────────────────────────────────────────────────┘Transaction Patterns
ACID Properties
┌─────────────────────────────────────────────────────────────────┐
│ ACID PROPERTIES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ A - ATOMICITY │
│ All or nothing. Transaction either completes fully │
│ or rolls back completely. │
│ │
│ C - CONSISTENCY │
│ Database moves from one valid state to another. │
│ Constraints are always satisfied. │
│ │
│ I - ISOLATION │
│ Concurrent transactions don't interfere. │
│ Each sees consistent snapshot. │
│ │
│ D - DURABILITY │
│ Committed transactions survive crashes. │
│ Data is persisted to disk. │
│ │
└─────────────────────────────────────────────────────────────────┘Isolation Levels
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use Case |
|---|---|---|---|---|
| READ UNCOMMITTED | ✅ | ✅ | ✅ | Never use |
| READ COMMITTED | ❌ | ✅ | ✅ | Default (PostgreSQL) |
| REPEATABLE READ | ❌ | ❌ | ✅ | Financial reports |
| SERIALIZABLE | ❌ | ❌ | ❌ | Critical operations |
Transaction Best Practices
sql
-- ✅ GOOD: Short transactions, explicit boundaries
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
INSERT INTO transfers (from_id, to_id, amount) VALUES (1, 2, 100);
COMMIT;
-- ❌ BAD: Long-running transaction
BEGIN;
SELECT * FROM large_table; -- Holds locks
-- ... user thinks for 5 minutes ...
UPDATE accounts SET balance = 0; -- Other transactions blocked
COMMIT;
-- ✅ GOOD: Optimistic locking for concurrent updates
UPDATE products
SET stock = stock - 1, version = version + 1
WHERE id = 123 AND version = 5;
-- Check affected rows, retry if 0
-- ✅ GOOD: SELECT FOR UPDATE when needed
BEGIN;
SELECT * FROM inventory WHERE product_id = 123 FOR UPDATE;
-- Now we have exclusive lock
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123;
COMMIT;Common OLTP Patterns
Soft Deletes
sql
-- Pattern: Soft delete with is_deleted flag
ALTER TABLE customers ADD COLUMN is_deleted BOOLEAN DEFAULT FALSE;
ALTER TABLE customers ADD COLUMN deleted_at TIMESTAMP;
-- Query active records
SELECT * FROM customers WHERE is_deleted = FALSE;
-- Partial index for performance
CREATE INDEX idx_customers_active ON customers(email)
WHERE is_deleted = FALSE;Audit Trail
sql
-- Pattern: Audit table for change tracking
CREATE TABLE customer_audit (
audit_id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL,
operation VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
old_data JSONB,
new_data JSONB,
changed_by BIGINT,
changed_at TIMESTAMP DEFAULT NOW()
);
-- Trigger for automatic auditing
CREATE OR REPLACE FUNCTION audit_customer_changes()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO customer_audit (customer_id, operation, old_data, new_data)
VALUES (
COALESCE(NEW.customer_id, OLD.customer_id),
TG_OP,
CASE WHEN TG_OP != 'INSERT' THEN to_jsonb(OLD) END,
CASE WHEN TG_OP != 'DELETE' THEN to_jsonb(NEW) END
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;Optimistic Concurrency
sql
-- Pattern: Version column for optimistic locking
ALTER TABLE products ADD COLUMN version INTEGER DEFAULT 1;
-- Application code pattern:
-- 1. Read: SELECT id, name, price, version FROM products WHERE id = 1
-- 2. Modify in application
-- 3. Update with version check:
UPDATE products
SET name = 'New Name', price = 99.99, version = version + 1
WHERE id = 1 AND version = 5;
-- 4. If affected_rows = 0, someone else modified → retry🔍 Common Review Comments
⚠️ OLTP Design Review Feedback
OLTP systems là backbone của business operations. Những comments dưới đây thường xuất hiện trong production-readiness reviews.
| Review Comment | Ý nghĩa | Cách khắc phục |
|---|---|---|
| "Index này sẽ không được sử dụng với column order như vậy" | Composite index không match query WHERE clause order | Reorder columns theo leftmost prefix rule |
| "Missing FK index sẽ gây cascade delete deadlocks" | FK columns không indexed → slow lookups, lock escalation | Luôn index FK columns |
| "Isolation level quá cao cho use case này" | SERIALIZABLE khi READ COMMITTED đủ → unnecessary locking | Match isolation level với actual requirements |
| "Long-running transaction sẽ block replication" | Transactions hold locks quá lâu → replication lag | Batch operations, shorter transaction windows |
| "Constraint CHECK này sẽ bị violated bởi existing data" | Adding NOT VALID constraint mà không verify | Backfill/clean data trước, hoặc use NOT VALID + validate later |
| "Table partition strategy chưa documented" | Large tables không có partitioning plan | Define partition key và retention policy |
| "Optimistic locking version column ở đâu?" | Concurrent updates không có conflict detection | Add version column cho concurrent-write tables |
⚠️ Anti-patterns
❌ OLTP Performance Killers
Những anti-patterns này thường invisible trong development nhưng explode trong production với real traffic.
Anti-pattern 1: Over-Indexing
┌─────────────────────────────────────────────────────────────────┐
│ OVER-INDEXING ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ CREATE INDEX idx_orders_1 ON orders(customer_id); │
│ CREATE INDEX idx_orders_2 ON orders(status); │
│ CREATE INDEX idx_orders_3 ON orders(created_at); │
│ CREATE INDEX idx_orders_4 ON orders(customer_id, status); │
│ CREATE INDEX idx_orders_5 ON orders(customer_id, created_at); │
│ CREATE INDEX idx_orders_6 ON orders(status, created_at); │
│ CREATE INDEX idx_orders_7 ON orders(customer_id, status, created_at);│
│ -- ... 15 more indexes "just in case" │
│ │
│ CONSEQUENCES: │
│ • Write amplification: Every INSERT updates all indexes │
│ • Storage bloat: Indexes can exceed table size │
│ • Vacuum overhead: More indexes = longer maintenance │
│ • Optimizer confusion: Suboptimal index choice │
│ • INSERT latency: ms → seconds during peak │
│ │
│ REAL INCIDENT: E-commerce với 47 indexes trên orders table │
│ → Black Friday checkout latency 10x normal │
│ → Removed 35 unused indexes, solved problem │
│ │
│ CORRECT APPROACH: │
│ 1. Profile actual queries (pg_stat_statements) │
│ 2. Create indexes for top 5-10 query patterns │
│ 3. Monitor unused indexes (pg_stat_user_indexes) │
│ 4. Review and prune quarterly │
│ │
└─────────────────────────────────────────────────────────────────┘Anti-pattern 2: Long-Running Transactions
┌─────────────────────────────────────────────────────────────────┐
│ LONG TRANSACTION ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ BEGIN; │
│ SELECT * FROM products FOR UPDATE; -- Lock all products │
│ -- ... API call to external service (5 seconds) │
│ -- ... Complex calculation (3 seconds) │
│ -- ... File processing (10 seconds) │
│ UPDATE products SET price = ...; │
│ COMMIT; -- 18 seconds later │
│ │
│ CONSEQUENCES: │
│ • Lock contention: Other transactions wait/timeout │
│ • Replication lag: WAL accumulates, replicas fall behind │
│ • Deadlock risk: Long transactions increase deadlock window │
│ • Resource exhaustion: Connection pool starved │
│ • Vacuum blocked: Dead tuples accumulate │
│ │
│ CORRECT APPROACH: │
│ 1. Do external calls OUTSIDE transaction │
│ 2. Batch large operations: 1000 rows at a time │
│ 3. Use advisory locks for application-level coordination │
│ 4. Set statement_timeout as safety net │
│ │
└─────────────────────────────────────────────────────────────────┘Anti-pattern 3: Missing FK Indexes
┌─────────────────────────────────────────────────────────────────┐
│ MISSING FK INDEX ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ CREATE TABLE orders ( │
│ order_id BIGINT PRIMARY KEY, │
│ customer_id BIGINT REFERENCES customers(customer_id) │
│ -- Note: PostgreSQL does NOT auto-create FK index! │
│ ); │
│ -- No index on customer_id │
│ │
│ CONSEQUENCES: │
│ • Slow JOINs: Full table scan for every customer lookup │
│ • Slow deletes: DELETE FROM customers triggers seq scan │
│ • Lock escalation: Parent table locked during child check │
│ • Performance degradation: Gets worse with data growth │
│ │
│ REAL INCIDENT: Customer delete took 45 minutes │
│ → Cascade check scanned 50M orders without index │
│ → Adding index reduced to 200ms │
│ │
│ RULE: ALWAYS create index on FK columns │
│ (MySQL/InnoDB does this automatically, PostgreSQL does not) │
│ │
└─────────────────────────────────────────────────────────────────┘Anti-pattern 4: Unbounded Soft Deletes
┌─────────────────────────────────────────────────────────────────┐
│ UNLIMITED SOFT DELETE ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMP; │
│ -- Every query: WHERE deleted_at IS NULL │
│ -- No cleanup, no archival, forever accumulating │
│ │
│ After 5 years: │
│ - 100M total rows, 95M deleted │
│ - Every query scans 100M rows to find 5M active │
│ │
│ CONSEQUENCES: │
│ • Query degradation: Indexes include deleted rows │
│ • Storage waste: Deleted data stays forever │
│ • Compliance risk: Cannot truly delete for GDPR │
│ • Backup bloat: Backing up mostly deleted data │
│ • Index bloat: B-tree includes all soft-deleted rows │
│ │
│ BETTER APPROACHES: │
│ 1. Partial index: CREATE INDEX idx_active ON orders(...) │
│ WHERE deleted_at IS NULL; │
│ 2. Partition by status: Active vs archived partitions │
│ 3. Archive job: Move to archive table after N days │
│ 4. Hard delete + audit log: True deletion with event record │
│ │
└─────────────────────────────────────────────────────────────────┘📊 Grain & Cardinality Risks
❌ Wrong Grain = Wrong Reports
Grain mismatch giữa OLTP và reporting là nguồn gốc của những con số sai dẫn đến quyết định business sai.
┌─────────────────────────────────────────────────────────────────┐
│ GRAIN MISMATCH RISKS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ OLTP GRAIN: One row per order line item │
│ │
│ order_id │ product_id │ quantity │ price │
│ 1 │ A │ 2 │ 10.00 │
│ 1 │ B │ 1 │ 20.00 │
│ 1 │ C │ 3 │ 5.00 │
│ │
│ ❌ WRONG REPORT: COUNT(DISTINCT order_id) = 1 order │
│ But SUM(quantity) = 6 items, SUM(price) = 35.00 │
│ │
│ COMMON MISTAKES: │
│ • Counting orders when grain is line items → inflated count │
│ • Joining without understanding cardinality → duplicated rows │
│ • Aggregating at wrong level → meaningless averages │
│ │
│ CARDINALITY EXPLOSION: │
│ customers (1M) × orders (10M) × line_items (50M) │
│ = 500 trillion rows if JOINed without proper GROUP BY │
│ │
│ MITIGATION: │
│ 1. Document grain explicitly in table comments │
│ 2. Create views at common reporting grains │
│ 3. Review reports for accidental fan-out │
│ │
└─────────────────────────────────────────────────────────────────┘🔄 Migration Notes
Online Schema Changes
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE SCHEMA CHANGE PATTERNS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SAFE OPERATIONS (Minimal/No Locking): │
│ • ADD COLUMN with DEFAULT NULL │
│ • CREATE INDEX CONCURRENTLY (PostgreSQL) │
│ • ADD CONSTRAINT ... NOT VALID │
│ • DROP COLUMN (marks as dropped, doesn't reclaim) │
│ │
│ DANGEROUS OPERATIONS (Locks Table): │
│ • ADD COLUMN with DEFAULT value (pre-PG11) │
│ • ALTER COLUMN type │
│ • ADD CONSTRAINT (validates existing data) │
│ • CREATE INDEX (without CONCURRENTLY) │
│ │
│ PATTERNS FOR DANGEROUS OPERATIONS: │
│ │
│ 1. ADD NOT NULL COLUMN: │
│ a. ADD COLUMN nullable │
│ b. Backfill in batches │
│ c. ADD CONSTRAINT CHECK NOT NULL NOT VALID │
│ d. VALIDATE CONSTRAINT (separate transaction) │
│ e. ALTER COLUMN SET NOT NULL │
│ │
│ 2. CHANGE COLUMN TYPE: │
│ a. ADD new column with new type │
│ b. Dual-write (trigger or application) │
│ c. Backfill old data │
│ d. Switch reads to new column │
│ e. DROP old column │
│ │
└─────────────────────────────────────────────────────────────────┘Index Management
sql
-- Safe index creation (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
-- Note: Takes longer, but doesn't lock table
-- Monitor index creation progress
SELECT phase, blocks_done, blocks_total,
ROUND(100.0 * blocks_done / NULLIF(blocks_total, 0), 2) as pct
FROM pg_stat_progress_create_index;
-- Find unused indexes (candidates for removal)
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexname NOT LIKE '%_pkey'
AND indexname NOT LIKE '%_unique%';📎 Cross-References
- 📎 Normalization - Normalization rules cho OLTP
- 📎 OLAP Modeling - So sánh với analytical modeling
- 📎 Schema Evolution - Safe migration strategies
- 📎 SQL Transactions - Chi tiết về transaction management
- 📎 SQL Concurrency - Locking và isolation levels