🌐 Distributed Systems Modeling
Level: Advanced · Solves: Designing data models for distributed databases with proper sharding and consistency
Distributed Data Fundamentals
💡 Professor Tom
Distributed systems are not "a relational database, but bigger." They are a fundamentally different paradigm with their own trade-offs. The CAP theorem is not abstract theory; it is a reality check for every design decision.
CAP Theorem Reality
┌─────────────────────────────────────────────────────────────────┐
│ CAP THEOREM │
├─────────────────────────────────────────────────────────────────┤
│ │
│ CONSISTENCY │
│ ▲ │
│ /│\ │
│ / │ \ │
│ / │ \ │
│ / │ \ │
│ / CP │ CA \ │
│ / │ \ │
│ / │ \ │
│ / │ \ │
│ / │ \ │
│ ▼─────────┴─────────▼ │
│ PARTITION AVAILABILITY │
│ TOLERANCE │
│ \ / │
│ \ AP / │
│ \ / │
│ \ / │
│ \ / │
│ ▼ │
│ │
│ CP: MongoDB, HBase, etcd, ZooKeeper │
│ → Consistent but may be unavailable during partition │
│ │
│ AP: Cassandra, DynamoDB, CouchDB │
│ → Available but may return stale data │
│ │
│ CA: Traditional RDBMS (single node) │
│ → Not partition tolerant (not truly distributed) │
│ │
│ REALITY: Network partitions WILL happen. │
│ Choose between C and A during partition. │
│ │
└─────────────────────────────────────────────────────────────────┘
Distributed Identifiers
UUID vs Sequential IDs
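UUID v7 (option 2 below) is built from a fixed bit layout: a 48-bit unix-millisecond timestamp, then the version and variant bits, with the rest random. A minimal sketch of that layout (the field packing follows RFC 9562; this is illustrative, not a production generator):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Sketch of a time-ordered UUIDv7 per the RFC 9562 layout:
    48-bit unix-ms timestamp | 4-bit version | 12 random bits |
    2-bit variant | 62 random bits."""
    ms = int(time.time() * 1000)
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF   # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & 0x3FFFFFFFFFFFFFFF  # 62 bits
    value = (ms << 80) | (0x7 << 76) | (rand_a << 64) | (0x2 << 62) | rand_b
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, IDs generated in different milliseconds sort by creation time, which gives the better index locality noted below.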
┌─────────────────────────────────────────────────────────────────┐
│ IDENTIFIER STRATEGIES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SEQUENTIAL (Auto-increment) UUID/ULID │
│ ────────────────────────── ──────── │
│ │
│ ✅ Compact (8 bytes) ❌ Large (16 bytes) │
│ ✅ Sortable by creation ✅ Globally unique │
│ ✅ Human readable ✅ No coordination needed │
│ ❌ Requires coordination ✅ Generate anywhere │
│ ❌ Reveals business info ✅ No info leakage │
│ ❌ Hot spot on insert ❌ Random distribution │
│ │
│ DISTRIBUTED-FRIENDLY OPTIONS: │
│ │
│ 1. UUID v4 (Random) │
│ 550e8400-e29b-41d4-a716-446655440000 │
│ → Truly random, no ordering │
│ │
│ 2. UUID v7 (Time-ordered) │
│ 018e5e5c-7b3a-7000-8000-000000000000 │
│ → Time-sortable, better index locality │
│ │
│ 3. ULID (Universally Unique Lexicographically Sortable ID) │
│ 01ARZ3NDEKTSV4RRFFQ69G5FAV │
│ → Sortable, URL-safe, 26 chars │
│ │
│ 4. Snowflake ID (Twitter) │
│ 1234567890123456789 │
│ → 64-bit, time + machine + sequence │
│ │
└─────────────────────────────────────────────────────────────────┘
Snowflake ID Structure
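The 64-bit layout diagrammed below can be generated with a small sketch. The custom epoch and the lock-based sequencing are illustrative assumptions; a production generator would also handle clock rollback:

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch; any fixed past millisecond works

class Snowflake:
    """Sketch of a Snowflake-style generator: 41-bit timestamp,
    10-bit machine ID, 12-bit per-millisecond sequence."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # wrap at 4096
                if self.sequence == 0:
                    # Exhausted 4096 IDs this millisecond: spin to the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            # 41-bit timestamp | 10-bit machine | 12-bit sequence
            return (now << 22) | (self.machine_id << 12) | self.sequence
```

The machine ID and timestamp are recoverable with bit shifts, which is the "extractable timestamp" benefit listed below.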
┌─────────────────────────────────────────────────────────────────┐
│ SNOWFLAKE ID (64 bits) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─┬────────────────────────┬──────────┬────────────────┐ │
│ │0│ Timestamp │ Machine │ Sequence │ │
│ │ │ (41 bits) │ (10 bits)│ (12 bits) │ │
│ └─┴────────────────────────┴──────────┴────────────────┘ │
│ 1 41 10 12 │
│ │
│ BREAKDOWN: │
│ • Sign bit: Always 0 (positive number) │
│ • Timestamp: Milliseconds since epoch (69 years) │
│ • Machine ID: 1024 unique machines │
│ • Sequence: 4096 IDs per millisecond per machine │
│ │
│ CAPACITY: 4,096,000 IDs/second/machine │
│ │
│ BENEFITS: │
│ ✅ Time-sortable (roughly) │
│ ✅ No coordination between machines │
│ ✅ Compact (64-bit integer) │
│ ✅ Extractable timestamp │
│ │
└─────────────────────────────────────────────────────────────────┘
Sharding Strategies
Sharding Key Selection
⚠️ Sharding Key is Forever
Changing sharding key requires full data migration. Choose wisely upfront.
┌─────────────────────────────────────────────────────────────────┐
│ SHARDING KEY CRITERIA │
├─────────────────────────────────────────────────────────────────┤
│ │
│ GOOD SHARDING KEY: │
│ ✅ High cardinality (many unique values) │
│ ✅ Even distribution (no hot spots) │
│ ✅ Immutable (never changes) │
│ ✅ Frequently used in queries │
│ ✅ Supports common access patterns │
│ │
│ BAD SHARDING KEY: │
│ ❌ Low cardinality (status, country) │
│ ❌ Monotonically increasing (timestamp, auto-increment) │
│ ❌ Skewed distribution (celebrity user_id) │
│ ❌ Mutable (email, username) │
│ │
│ EXAMPLES: │
│ │
│ E-commerce Orders: │
│ ✅ customer_id (queries by customer) │
│ ❌ order_date (hot partition for today) │
│ ❌ status (only 5 values) │
│ │
│ Social Media Posts: │
│ ✅ user_id (queries by user) │
│ ❌ created_at (hot partition for recent) │
│ ⚠️ post_id (good distribution, but queries need user_id) │
│ │
└─────────────────────────────────────────────────────────────────┘
Sharding Patterns
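Pattern 4 below (consistent hashing) can be sketched in a few lines. The virtual-node count and the MD5-based hash are illustrative choices; the important detail is using a stable hash, since Python's built-in `hash()` is salted per process:

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    # Deterministic 64-bit hash (built-in hash() changes between processes)
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each node owns `vnodes` points on a 64-bit ring; a key belongs to
    the first node point clockwise from its hash."""

    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self.points, stable_hash(key)) % len(self.points)
        return self.ring[idx][1]
```

Adding a node only claims the ring segments directly before its points, so roughly 1/N of keys move, versus nearly all keys under `hash(key) % num_shards` (pattern 1) when `num_shards` changes.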
┌─────────────────────────────────────────────────────────────────┐
│ SHARDING PATTERNS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. HASH-BASED SHARDING │
│ shard = hash(key) % num_shards │
│ │
│ ┌─────┐ hash ┌─────────┐ │
│ │ key │ ──────► │ shard_2 │ │
│ └─────┘ └─────────┘ │
│ │
│ ✅ Even distribution │
│ ❌ Range queries require scatter-gather │
│ ❌ Resharding is expensive │
│ │
│ 2. RANGE-BASED SHARDING │
│ shard = range(key) │
│ │
│ A-M → Shard 1 │
│ N-Z → Shard 2 │
│ │
│ ✅ Range queries efficient │
│ ❌ Potential hot spots │
│ ❌ Uneven distribution │
│ │
│ 3. DIRECTORY-BASED SHARDING │
│ Lookup table maps key → shard │
│ │
│ ┌─────────────────┐ │
│ │ key │ shard │ │
│ │ user1 │ shard_1 │ │
│ │ user2 │ shard_3 │ │
│ └─────────────────┘ │
│ │
│ ✅ Flexible placement │
│ ❌ Lookup overhead │
│ ❌ Directory is SPOF │
│ │
│ 4. CONSISTENT HASHING │
│ Virtual ring with nodes │
│ │
│ ✅ Minimal reshuffling on node add/remove │
│ ✅ Good for caches, distributed storage │
│ ❌ More complex implementation │
│ │
└─────────────────────────────────────────────────────────────────┘
Consistency Models
Consistency Spectrum
┌─────────────────────────────────────────────────────────────────┐
│ CONSISTENCY SPECTRUM │
├─────────────────────────────────────────────────────────────────┤
│ │
│ STRONG ◄─────────────────────────────────────────► EVENTUAL │
│ │
│ Linearizable │
│ │ • Single global order │
│ │ • Reads see latest write │
│ │ • Highest latency │
│ │ │
│ Sequential Consistency │
│ │ • Operations in program order │
│ │ • All processes see same order │
│ │ │
│ Causal Consistency │
│ │ • Causally related ops in order │
│ │ • Concurrent ops may differ │
│ │ │
│ Read-Your-Writes │
│ │ • See your own writes │
│ │ • Others may see stale │
│ │ │
│ Eventual Consistency │
│ • Eventually converges │
│ • May read stale data │
│ • Lowest latency │
│ │
└─────────────────────────────────────────────────────────────────┘
Consistency Patterns
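Pattern 3 below (quorum reads/writes) can be simulated to see why W + R > N matters: any read quorum must overlap the last write quorum, so picking the highest version among the replicas read always surfaces the latest write. The in-memory replica model is an illustrative assumption:

```python
import random

class QuorumStore:
    """Toy quorum store: writes land on only W of N replicas,
    reads consult R replicas and keep the highest version."""

    def __init__(self, n: int, w: int, r: int):
        assert w + r > n, "W + R must exceed N for strong consistency"
        self.replicas = [dict() for _ in range(n)]  # key -> (version, value)
        self.w, self.r = w, r

    def write(self, key, value, version):
        # Only W replicas receive the write; the others stay stale
        for rep in random.sample(self.replicas, self.w):
            rep[key] = (version, value)

    def read(self, key):
        hits = [rep[key] for rep in random.sample(self.replicas, self.r) if key in rep]
        return max(hits)[1] if hits else None
```

With N=3, W=2, R=2 the overlap is guaranteed by the pigeonhole principle, so the read below never returns a stale value even though one replica always lags.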
┌─────────────────────────────────────────────────────────────────┐
│ CONSISTENCY PATTERNS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. READ-AFTER-WRITE CONSISTENCY │
│ User sees their own writes immediately │
│ │
│ Implementation: │
│ • Read from leader after write │
│ • Include version in response, require in read │
│ • Sticky sessions to same replica │
│ │
│ 2. MONOTONIC READS │
│ Never see older data after seeing newer │
│ │
│ Implementation: │
│ • Track last-read timestamp per user │
│ • Route to replica with >= timestamp │
│ │
│ 3. QUORUM READS/WRITES │
│ W + R > N ensures overlap │
│ │
│ N=3 replicas: │
│ • W=2, R=2: Strong consistency │
│ • W=1, R=1: Eventual consistency, fast │
│ • W=3, R=1: Fast reads, slow writes │
│ │
│ 4. CONFLICT RESOLUTION │
│ When concurrent writes conflict: │
│ • Last-Write-Wins (LWW): Timestamp decides │
│ • Vector Clocks: Track causality │
│ • CRDTs: Mathematically mergeable │
│ • Application-level: Custom merge logic │
│ │
└─────────────────────────────────────────────────────────────────┘
Cross-Shard Operations
Distributed Transactions
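The saga (pattern 2 below) can be sketched as a loop that runs local transactions in order and, on failure, runs the compensations of the completed steps in reverse. The step and compensation callables are illustrative stand-ins for real operations:

```python
def run_saga(steps):
    """steps: list of (action, compensation) callable pairs.
    Returns True on success; on failure, compensates completed
    steps in reverse order and returns False."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for comp in reversed(completed):  # undo in reverse order
                comp()
            return False
        completed.append(compensation)
    return True
```

Note that compensations must themselves be idempotent and retryable; this sketch omits the persistence a real saga orchestrator needs to survive a crash mid-rollback.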
┌─────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED TRANSACTION PATTERNS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. TWO-PHASE COMMIT (2PC) │
│ │
│ Coordinator │
│ │ │
│ ├──► Prepare? ──► Shard 1 ──► Yes │
│ ├──► Prepare? ──► Shard 2 ──► Yes │
│ │ │
│ ├──► Commit ────► Shard 1 │
│ └──► Commit ────► Shard 2 │
│ │
│ ❌ Blocking if coordinator fails │
│ ❌ High latency │
│ ✅ Strong consistency │
│ │
│ 2. SAGA PATTERN │
│ │
│ T1 ──► T2 ──► T3 ──► Success │
│ │ │ │ │
│ │ │ └──► C3 (compensate) │
│ │ └──► C2 │
│ └──► C1 │
│ │
│ ✅ No blocking │
│ ✅ Better availability │
│ ❌ Eventual consistency │
│ ❌ Complex compensation logic │
│ │
│ 3. OUTBOX PATTERN │
│ │
│ ┌─────────────────────────────────┐ │
│ │ Transaction: │ │
│ │ UPDATE orders SET status=... │ │
│ │ INSERT INTO outbox (event) │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Background process reads outbox → publishes events │
│ │
│ ✅ Atomic with local transaction │
│ ✅ At-least-once delivery │
│ ❌ Requires polling or CDC │
│ │
└─────────────────────────────────────────────────────────────────┘
Cross-Shard Query Patterns
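Pattern 1 below (scatter-gather) is usually fanned out to all shards in parallel, then merged and truncated at the coordinator. A sketch with shards stubbed as in-memory lists of rows (the row shape is an illustrative assumption):

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, predicate, limit=100):
    """Fan the same query out to every shard in parallel,
    merge at the coordinator, sort, and truncate."""
    def query(shard):
        return [row for row in shard if predicate(row)]

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(query, shards))

    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row["created_at"])[:limit]
```

This makes the cost model explicit: latency tracks the slowest shard, and the coordinator buffers every partial result before sorting.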
```sql
-- Pattern 1: Scatter-Gather
-- Query all shards, aggregate results in application code:
--   results = []
--   for shard in shards:
--       results.extend(shard.query("SELECT * FROM orders WHERE status = 'pending'"))
--   return sorted(results, key=lambda x: x.created_at)[:100]

-- Pattern 2: Global Secondary Index
-- Maintain a separate index table for cross-shard queries
CREATE TABLE order_status_index (
    status   VARCHAR(20),
    order_id BIGINT,
    shard_id INT,
    PRIMARY KEY (status, order_id)
);
-- Query the index first, then fetch from the specific shards

-- Pattern 3: Denormalization
-- Store frequently-queried data in multiple shards:
--   orders table (sharded by customer_id)
--   order_by_status table (sharded by status)
-- Write to both, read from the appropriate one
```
Data Modeling Best Practices
┌─────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED MODELING CHECKLIST │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ✅ Choose sharding key based on access patterns │
│ ✅ Design for single-shard queries when possible │
│ ✅ Use distributed-friendly IDs (ULID, Snowflake) │
│ ✅ Denormalize to avoid cross-shard joins │
│ ✅ Plan for eventual consistency │
│ ✅ Implement idempotent operations │
│ ✅ Design compensation logic for failures │
│ ✅ Monitor shard distribution and hot spots │
│ ✅ Plan resharding strategy upfront │
│ ✅ Test failure scenarios (network partition, node failure) │
│ │
└─────────────────────────────────────────────────────────────────┘
🔍 Common Review Comments
⚠️ Distributed Systems Review Feedback
Distributed system designs have no undo — resharding and consistency changes are multi-quarter projects.
| Review Comment | Meaning | How to fix |
|---|---|---|
| "Sharding key doesn't align with the primary access pattern" | Most queries need cross-shard scatter-gather | Redesign the shard key around the query pattern |
| "Cross-shard transactions will kill throughput" | 2PC across shards adds latency | Use the saga pattern or a single-shard design |
| "No resharding plan" | Data growth will require resharding | Design for resharding from day 1 |
| "A mutable sharding key is used" | Moving an entity's shard requires a cross-shard migration | Use an immutable shard key only |
| "No eventual consistency handling" | Read-your-writes fails silently | Sticky sessions or explicit consistency checks |
| "Clock skew not considered" | Timestamp ordering is invalid across nodes | Use logical clocks (Lamport, vector) |
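The clock-skew row above recommends logical clocks. A minimal Lamport clock sketch (a single counter that captures causal order; a full vector clock would track one counter per node):

```python
class LamportClock:
    """Logical clock: increments on local events and sends,
    jumps forward on receive so causally later events get
    larger timestamps regardless of wall-clock skew."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        # Local event or message send
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        # Merge the sender's timestamp, then advance
        self.time = max(self.time, remote_time) + 1
        return self.time
```

Lamport timestamps order causally related events but cannot detect concurrency; when "concurrent writes conflict" detection is needed (as in the conflict-resolution pattern above), vector clocks are the usual upgrade.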
⚠️ Anti-patterns
Anti-pattern: Cross-Shard Joins
┌─────────────────────────────────────────────────────────────────┐
│ CROSS-SHARD JOIN ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ Shard A: Users 1-1000 │
│ Shard B: Users 1001-2000 │
│ Query: SELECT * FROM users u JOIN orders o ON u.id = o.user_id │
│ WHERE o.created_at > '2024-01-01' │
│ → Orders for user 500 on Shard A, user 1500 on Shard B │
│ → Scatter-gather to ALL shards, merge at coordinator │
│ │
│ CONSEQUENCES: │
│ • Latency: Proportional to slowest shard │
│ • Memory: Coordinator buffers all results │
│ • Availability: One shard down = query fails │
│ │
│ ✅ SOLUTION: │
│ • Collocate related data: Shard users + orders by user_id │
│ • Denormalize: Embed order summary in user document │
│ • Pre-aggregate: Materialized views per shard │
│ │
└─────────────────────────────────────────────────────────────────┘
Anti-pattern: Mutable Sharding Key
┌─────────────────────────────────────────────────────────────────┐
│ MUTABLE SHARD KEY ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ Shard key: customer.region (US, EU, APAC) │
│ → Customer moves from US to EU │
│ → All customer data must migrate from Shard-US to Shard-EU │
│ → During migration: partial data, consistency issues │
│ │
│ CONSEQUENCES: │
│ • Complex migration logic │
│ • Inconsistent state during transition │
│ • Cross-shard transactions required │
│ │
│ ✅ SOLUTION: Immutable shard key │
│ Shard by: customer_id (never changes) │
│ Store region as attribute (can change without resharding) │
│ │
└─────────────────────────────────────────────────────────────────┘
Anti-pattern: Ignoring Eventual Consistency UX
┌─────────────────────────────────────────────────────────────────┐
│ CONSISTENCY UX ANTI-PATTERN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ❌ PROBLEM: │
│ User: POST /orders → 201 Created │
│ User: GET /orders → Empty list (read from replica) │
│ User: "Where's my order?!" → Support ticket │
│ │
│ REAL INCIDENT: │
│ E-commerce app with an eventually consistent order list │
│ → Users place duplicate orders ("didn't see first one") │
│ → $2M in duplicate shipments before fix │
│ │
│ ✅ SOLUTIONS: │
│ 1. Read-your-writes: Sticky sessions to primary │
│ 2. Optimistic UI: Show pending item immediately │
│ 3. Explicit sync: "Order processing..." spinner │
│ 4. Version checking: Poll until version matches │
│ │
└─────────────────────────────────────────────────────────────────┘
🔄 Migration Notes
Resharding Strategies
| Strategy | Description | Trade-off |
|---|---|---|
| Double-write | Write to old + new shard | Consistency complexity |
| Ghost tables | Shadow writes, flip at cutover | Storage cost |
| Logical dump | Export/import with downtime | Simplest, requires window |
| Online migration | Streaming replication + cutover | Complex, zero downtime |
Dual-Write Pattern
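The phases diagrammed below can be sketched as a small routing layer. The dict-backed shard stores and the phase numbering are illustrative assumptions; a real migration would add checksum verification and shadow-read comparison:

```python
class MigrationRouter:
    """Routes reads/writes by migration phase:
    phase 1: dual-write, read old
    phase 4: cutover, read new (still dual-write for safety)
    phase 5: new shard only"""

    def __init__(self, old_shard, new_shard):
        self.old, self.new = old_shard, new_shard
        self.phase = 1

    def write(self, key, value):
        if self.phase <= 4:           # old shard kept as safety net
            self.old[key] = value
        self.new[key] = value

    def read(self, key):
        return self.old.get(key) if self.phase < 4 else self.new.get(key)

    def backfill(self):
        # Phase 2: copy history old -> new without clobbering dual-writes
        for key, value in self.old.items():
            self.new.setdefault(key, value)
        assert all(self.new.get(k) == v for k, v in self.old.items())
```

The key property is that every phase transition is reversible until phase 5, which is what makes the cutover safe to abort.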
┌─────────────────────────────────────────────────────────────────┐
│ DUAL-WRITE MIGRATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ PHASE 1: Start dual-write │
│ Write → Old Shard + New Shard │
│ Read → Old Shard │
│ │
│ PHASE 2: Backfill │
│ Copy historical data: Old → New (batched) │
│ Verify checksums │
│ │
│ PHASE 3: Shadow read │
│ Read both, compare results, log discrepancies │
│ │
│ PHASE 4: Cutover │
│ Read → New Shard │
│ Write → New Shard + Old Shard (safety) │
│ │
│ PHASE 5: Cleanup │
│ Stop writes to Old Shard │
│ Deprecate and archive │
│ │
└─────────────────────────────────────────────────────────────────┘
📎 Cross-References
- 📎 NoSQL Modeling - Query-driven design patterns
- 📎 Schema Evolution - Migrating distributed schemas
- 📎 System Design - Distributed system architecture
- 📎 SQL Transactions - ACID vs BASE comparison