🗃️ DATABASE DESIGN
Scaling Data: From Single Server to Distributed System
1. Vertical vs Horizontal Scaling
┌─────────────────────────────────────────────────────────┐
│ SCALING APPROACHES │
├─────────────────────────────────────────────────────────┤
│ │
│ VERTICAL SCALING (Scale Up): │
│ ───────────────────────────── │
│ [Small Server] → [BIGGER Server] → [BIGGEST Server] │
│ 4 vCPU 16 vCPU 64 vCPU │
│ 16GB RAM 128GB RAM 1TB RAM │
│ │
│ ✅ Simple: No code changes │
│ ✅ Strong consistency guaranteed │
│ ❌ Hardware limits (~256 cores, ~12TB RAM today) │
│ ❌ Single point of failure │
│ ❌ Cost grows exponentially │
│ │
│ ───────────────────────────────────────────────────── │
│ │
│ HORIZONTAL SCALING (Scale Out): │
│ ────────────────────────────── │
│ [Server] → [Server][Server][Server][Server]... │
│ │
│ ✅ Theoretically unlimited scale │
│ ✅ Better fault tolerance (redundancy) │
│ ✅ Linear cost growth │
│ ❌ Complexity: distributed transactions, consistency │
│ ❌ Network latency between nodes │
│ │
└─────────────────────────────────────────────────────────┘
TIP
🔧 Engineer Raizo: Rule of thumb: Scale VERTICALLY until you hit limits or cost becomes unreasonable, THEN scale horizontally. Premature horizontal scaling = unnecessary complexity.
2. Replication
2.1 Master-Slave (Primary-Replica) Architecture
┌─────────────────────────────────────────────────────────┐
│ MASTER-SLAVE REPLICATION │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ │ MASTER │ ◄── All WRITES go here │
│ │ (Primary) │ │
│ └──────┬──────┘ │
│ │ │
│ ┌─────────┼─────────┐ │
│ │ Replication Stream │ (async or sync) │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ SLAVE 1 │ │ SLAVE 2 │ │ SLAVE 3 │ │
│ │ (Replica)│ │ (Replica)│ │ (Replica)│ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ └────────────┴────────────┘ │
│ │ │
│ ▼ │
│ READS distributed here │
│ │
└─────────────────────────────────────────────────────────┘
2.2 Replication Lag Problem
┌─────────────────────────────────────────────────────────┐
│ REPLICATION LAG │
├─────────────────────────────────────────────────────────┤
│ │
│ T=0ms: Client writes "balance=100" to MASTER │
│ T=1ms: Master confirms write to client │
│ T=2ms: Client reads from SLAVE (load balanced) │
│ → SLAVE still has "balance=0" (lag!) │
│ T=50ms: Replication catches up │
│ │
│ TIMELINE: │
│ ───────── │
│ Master: [0ms: balance=100] ─────────────────────────► │
│ │ │
│ replication │
│ (async, 50ms lag) │
│ │ │
│ Slave: [0ms: balance=0] ───► [50ms: balance=100] │
│ ▲ │
│ │ │
│ Client reads here (STALE!) │
│ │
└─────────────────────────────────────────────────────────┘
Solutions:
```python
# Solution 1: Read Your Own Writes
# After a write, force that user's reads to the Master for X seconds
def update_user(user_id: str, data: dict):
    master_db.execute("UPDATE users SET ...", data)
    # Set marker: "this user modified their data"
    cache.set(f"dirty:{user_id}", "1", ttl=30)

def get_user(user_id: str):
    if cache.exists(f"dirty:{user_id}"):
        return master_db.query(user_id)   # Read from master
    return replica_db.query(user_id)      # Normal: read from replica

# Solution 2: Synchronous Replication (Consistency over Performance)
# PostgreSQL: synchronous_commit = on
# MySQL: rpl_semi_sync_master_enabled = 1
# Trade-off: Write latency increases significantly!

# Solution 3: Monotonic Reads
# Track last read position, only read from replicas that are caught up
```
IMPORTANT
The HPN philosophy on Eventual Consistency: Replication lag is UNAVOIDABLE with async replication. You MUST design application logic to accept "eventual consistency". Example: after a user posts a comment, redirect back to the page with a 2-second delay.
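Solution 3 above is only sketched in comments. Here is a minimal, self-contained illustration of monotonic reads; the `Replica` and `Session` classes and the LSN values are hypothetical stand-ins, not a real driver API:

```python
# Monotonic Reads sketch: each session remembers the highest log position
# (LSN) it has observed, and only reads from replicas that have replayed
# at least that far, so reads never appear to go "backwards in time".

class Replica:
    def __init__(self, name: str, applied_lsn: int, data: dict):
        self.name = name
        self.applied_lsn = applied_lsn  # how far replication has caught up
        self.data = data

class Session:
    """Tracks the last LSN this client has seen."""
    def __init__(self):
        self.last_seen_lsn = 0

    def read(self, key: str, replicas: list) -> str:
        # Only replicas at or beyond our last seen LSN are eligible
        eligible = [r for r in replicas if r.applied_lsn >= self.last_seen_lsn]
        replica = max(eligible, key=lambda r: r.applied_lsn)
        self.last_seen_lsn = replica.applied_lsn
        return replica.data[key]

replicas = [
    Replica("replica-1", applied_lsn=100, data={"balance": "100"}),
    Replica("replica-2", applied_lsn=40,  data={"balance": "0"}),   # lagging
]

s = Session()
s.last_seen_lsn = 90                # client already saw a write at LSN 90
print(s.read("balance", replicas))  # only replica-1 qualifies → "100"
```

In production the "LSN" would be the database's own replication position (e.g. `pg_last_wal_replay_lsn()` in PostgreSQL), carried in the session or a cookie.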
3. Sharding (Data Partitioning)
3.1 Sharding Concept
┌─────────────────────────────────────────────────────────┐
│ SHARDING │
├─────────────────────────────────────────────────────────┤
│ │
│ Split data across multiple databases based on KEY │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ ALL USERS DATA │ │
│ │ (100 million rows - too big for one server!) │ │
│ └───────────────────────────────────────────────────┘ │
│ │ │
│ SHARD KEY │
│ (user_id) │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ SHARD 1 │ │ SHARD 2 │ │ SHARD 3 │ │
│ │ users 1-33M│ │users 34-66M│ │users 67-100M│ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
3.2 Range-based vs Hash-based Sharding
┌─────────────────────────────────────────────────────────┐
│ SHARDING STRATEGIES │
├─────────────────────────────────────────────────────────┤
│ │
│ RANGE-BASED: │
│ ──────────── │
│ Shard 1: user_id 1-1,000,000 │
│ Shard 2: user_id 1,000,001-2,000,000 │
│ Shard 3: user_id 2,000,001-3,000,000 │
│ │
│ ✅ Easy range queries (get users 500-600) │
│ ✅ Simple to understand │
│ ❌ Uneven distribution (new users → last shard) │
│ ❌ Hot shard problem │
│ │
│ ───────────────────────────────────────────────────── │
│ │
│ HASH-BASED: │
│ ─────────── │
│ shard_id = hash(user_id) % num_shards │
│ │
│ ✅ Even distribution (statistically) │
│ ✅ No hot shard by default │
│ ❌ Range queries impossible (must query all shards) │
│ ❌ Resharding nightmare (mod N problem!) │
│ → Use CONSISTENT HASHING instead! │
│ │
└─────────────────────────────────────────────────────────┘
3.3 Hot Partition Problem (The Justin Bieber Problem)
┌─────────────────────────────────────────────────────────┐
│ HOT PARTITION DISASTER │
├─────────────────────────────────────────────────────────┤
│ │
│ SCENARIO: Twitter shards by user_id │
│ │
│ Shard 1: [user:12345 - normal user, 10 followers] │
│ [user:12346 - normal user, 50 followers] │
│ ... │
│ CPU: 5%, Memory: 20% │
│ │
│ Shard 7: [user:justinbieber - 150 MILLION followers] │
│ │ │
│ └─► Every tweet = 150M timeline writes! │
│ CPU: 100% 💥, Memory: 100% 💥 │
│ Disk I/O: saturated │
│ Other users on Shard 7: SLOW! │
│ │
└─────────────────────────────────────────────────────────┘
Solutions:
```python
# Solution 1: Virtual Users / Sub-sharding
# Split a hot user's data across multiple "virtual" partitions
def get_shard_for_tweet(user_id: str, tweet_id: str) -> str:
    if user_id in HOT_USERS:  # Maintain a list of celebrities
        # Distribute across 10 sub-shards
        sub_shard = hash(tweet_id) % 10
        return f"shard_{hash(user_id) % N}_{sub_shard}"
    return f"shard_{hash(user_id) % N}"

# Solution 2: Separate Hot Path
# Celebrity tweets go to a specialized high-capacity cluster
HOT_USER_THRESHOLD = 1_000_000  # followers

def route_tweet(user_id: str, tweet: Tweet):
    follower_count = get_follower_count(user_id)
    if follower_count > HOT_USER_THRESHOLD:
        return celebrity_cluster.write(tweet)  # Dedicated infra
    return normal_shard.write(user_id, tweet)

# Solution 3: Fan-out at Read vs Write
# Don't pre-compute timelines for celebrities;
# instead, merge their tweets at read time
def get_timeline(user_id: str) -> List[Tweet]:
    # Get tweets from followed normal users (pre-computed)
    timeline = cache.get(f"timeline:{user_id}")
    # Merge with celebrity tweets on-the-fly
    for celeb_id in get_followed_celebrities(user_id):
        celeb_tweets = celeb_cache.get_recent(celeb_id)
        timeline = merge_sorted(timeline, celeb_tweets)
    return timeline[:100]  # Top 100
```
4. CAP Theorem Applied
4.1 The CAP Triangle
┌─────────────────────────────────────────────────────────┐
│ CAP THEOREM │
├─────────────────────────────────────────────────────────┤
│ │
│ In a distributed system, you can only have 2 of 3: │
│ │
│ CONSISTENCY (C) │
│ /\ │
│ / \ │
│ / \ │
│ / \ │
│ / ?? \ │
│ / \ │
│ /____________\ │
│ AVAILABILITY (A) PARTITION │
│ TOLERANCE (P) │
│ │
│ ⚠️ In production, Partition WILL happen │
│ (network failure, cable cut, datacenter split) │
│ │
│ So the REAL choice is: C+P or A+P │
│ │
└─────────────────────────────────────────────────────────┘
4.2 The Scenario: Submarine Cable Cut
┌─────────────────────────────────────────────────────────┐
│ PARTITION SCENARIO: SUBMARINE CABLE CUT │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ US EAST │ │
│ │ Datacenter │ │
│ │ (Primary) │ │
│ └──────────────┘ │
│ │ │
│ │ ← CABLE CUT! ✂️ │
│ │ (Network Partition) │
│ ╳ │
│ │ │
│ ┌──────────────┐ │
│ │ EUROPE │ │
│ │ Datacenter │ │
│ │ (Replica) │ │
│ └──────────────┘ │
│ │
│ European users are sending requests... │
│ What do you do? │
│ │
└─────────────────────────────────────────────────────────┘
4.3 Choice 1: Consistency + Partition Tolerance (CP)
┌─────────────────────────────────────────────────────────┐
│ CP SYSTEM (Consistency First) │
├─────────────────────────────────────────────────────────┤
│ │
│ BEHAVIOR: Reject writes, serve reads from local cache │
│ │
│ European User Request: │
│ "Transfer $1000 from Account A to B" │
│ │
│ System Response: │
│ HTTP 503: "Service temporarily unavailable. │
│ We cannot guarantee consistency. Please retry." │
│ │
│ WHY? Without syncing to Primary: │
│ - User A might overdraft (both DCs accept transfer) │
│ - Account balance would be inconsistent │
│ │
│ EXAMPLES: Banking, Financial systems, Inventory │
│ │
│ Trade-off: Users in EU CANNOT use system during │
│ partition. But data is ALWAYS correct. │
│ │
└─────────────────────────────────────────────────────────┘
4.4 Choice 2: Availability + Partition Tolerance (AP)
┌─────────────────────────────────────────────────────────┐
│ AP SYSTEM (Availability First) │
├─────────────────────────────────────────────────────────┤
│ │
│ BEHAVIOR: Accept all operations, resolve later │
│ │
│ European User Request: │
│ "Post new tweet: Hello World!" │
│ │
│ System Response: │
│ HTTP 200: "Tweet posted successfully!" │
│ (Stored locally, will sync when cable restored) │
│ │
│ WHY? For social media: │
│ - User experience > perfect consistency │
│ - Duplicate tweets can be resolved later │
│ - "Eventually consistent" is acceptable │
│ │
│ EXAMPLES: Social media, DNS, Shopping cart │
│ │
│ Trade-off: Some users might see stale data, │
│ conflicts need resolution strategy │
│ │
└─────────────────────────────────────────────────────────┘
IMPORTANT
🎓 Professor Tom: CAP is not a binary choice. In practice, most systems are "tunable": you can choose a consistency level PER OPERATION. Cassandra allows read ONE (fast, maybe stale) vs read QUORUM (slower, consistent).
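The tunable consistency Professor Tom describes follows a simple rule: with N replicas, requiring R read acks and W write acks gives strong consistency whenever R + W > N, because the read and write quorums must overlap. A minimal sketch (the replica dicts and helper names are illustrative, not a real driver API):

```python
N = 3  # replicas per key

def is_strongly_consistent(r: int, w: int, n: int = N) -> bool:
    """Read and write quorums overlap in at least one replica iff r + w > n."""
    return r + w > n

# read ONE + write ONE: fast, but a read may miss the latest write
assert not is_strongly_consistent(r=1, w=1)

# read QUORUM + write QUORUM (2 of 3): slower, but quorums always overlap
assert is_strongly_consistent(r=2, w=2)

def read(replicas: list, r: int):
    """Wait for r replies and keep the value with the newest timestamp."""
    replies = replicas[:r]  # stand-in for "first r replicas to answer"
    return max(replies, key=lambda rep: rep["ts"])["value"]

replicas = [
    {"ts": 2, "value": "balance=100"},  # a W=2 write landed here...
    {"ts": 2, "value": "balance=100"},  # ...and here
    {"ts": 1, "value": "balance=0"},    # hasn't replayed it yet
]
print(read(replicas, r=2))  # any 2-replica quorum sees the newest value
```

Because the write quorum covered 2 of 3 replicas, every possible 2-replica read quorum intersects it, so the stale third replica can never outvote the fresh ones.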
5. Decision Matrix
| Requirement | Recommended Approach |
|---|---|
| Read-heavy, write-light | Master + Multiple Replicas |
| Write-heavy | Sharding (hash-based with consistent hashing) |
| Strong consistency | Synchronous replication / CP system |
| Geographic distribution | Multi-master with conflict resolution |
| Analytics on full data | Read replicas dedicated for OLAP |
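The matrix above recommends consistent hashing for write-heavy sharding (also flagged in 3.2 as the fix for the mod-N resharding problem). A minimal hash-ring sketch, assuming MD5 as the ring hash and hypothetical shard names; adding a shard only remaps the keys between it and its predecessor on the ring, instead of ~all keys as with `hash(key) % num_shards`:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    # Stable hash (Python's built-in hash() is randomized per process)
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, shards, vnodes: int = 100):
        # Virtual nodes smooth the key distribution across shards
        self._ring = sorted(
            (_h(f"{s}#{i}"), s) for s in shards for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First point on the ring clockwise from hash(key)
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["shard-1", "shard-2", "shard-3"])
before = {f"user:{i}": ring.shard_for(f"user:{i}") for i in range(1000)}

# Add a fourth shard and count how many keys move
ring2 = HashRing(["shard-1", "shard-2", "shard-3", "shard-4"])
moved = sum(1 for k, v in before.items() if ring2.shard_for(k) != v)
print(f"{moved}/1000 keys moved")  # roughly 1/4, not ~3/4 as with mod-N
```

Production systems typically also keep a routing table from ring position to physical node so shards can be rebalanced without touching application code.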
6. Module Summary
🎓 Fundamentals Complete!
You have completed Fundamentals. You now have a foundation in:
- Scalability: TCP/UDP, HTTP evolution, QUIC
- Load Balancing: L4/L7, Consistent Hashing
- Caching: Patterns, Thundering Herd, Penetration
- Database Design: Replication, Sharding, CAP