
💾 CACHING

Caching: The Ultimate Weapon for Reducing Latency


1. Locality of Reference (The Foundational Principle)

NOTE

🎓 Professor Tom: Every caching strategy rests on a single principle: recently accessed data is very likely to be accessed again.

1.1 Temporal Locality

If you just accessed data X,
you are very likely to access X again in the near future.

Example: after login, a user's profile → accessed repeatedly throughout the session

1.2 Spatial Locality

If you access data at address X,
you are very likely to access addresses near X.

Example: reading a file → adjacent blocks are read sequentially, not in random jumps
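The same idea shows up at the application level: iterating a 2D structure in its memory order touches adjacent addresses, while column-wise iteration strides across rows. A purely illustrative sketch (CPython lists add their own indirection, so this is about the access pattern, not a benchmark):

```python
# Spatial locality sketch: a 4x4 grid holding the values 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

def sum_row_wise(g):
    # Visits elements in layout order -> good spatial locality
    return sum(x for row in g for x in row)

def sum_col_wise(g):
    # Strides across rows -> poor spatial locality on large arrays
    return sum(g[r][c] for c in range(len(g[0])) for r in range(len(g)))

# Both compute the same result; only the access pattern differs.
assert sum_row_wise(grid) == sum_col_wise(grid) == 120
```

On large contiguous arrays (NumPy, C), the row-wise version is dramatically faster because each cache line fetched is fully used before the next one is needed.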

Memory Hierarchy (Access Time):

| Level | Technology | Latency | Capacity |
| --- | --- | --- | --- |
| L1 Cache | SRAM | ~1 ns | 64 KB |
| L2 Cache | SRAM | ~4 ns | 256 KB |
| L3 Cache | SRAM | ~10 ns | 8-32 MB |
| RAM | DRAM | ~100 ns | 16-128 GB |
| SSD | NAND Flash | ~100 μs | 1-4 TB |
| HDD | Magnetic | ~10 ms | 1-16 TB |
| Network | TCP/IP | ~1-100 ms | N/A |
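The hierarchy above is why hit rate dominates performance: expected access time is `hit_rate × hit_latency + (1 − hit_rate) × miss_latency`. A quick sketch with assumed round-number latencies (~1 ms for a networked cache hit, ~10 ms for a database miss):

```python
def effective_latency_ns(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Expected access time for a single cache tier in front of a slower store."""
    return hit_rate * hit_ns + (1 - hit_rate) * miss_ns

# Assumed numbers: cache hit ~1 ms, database miss ~10 ms.
# Even at a 90% hit rate, the 10% of misses dominate the average.
print(round(effective_latency_ns(0.90, 1_000_000, 10_000_000)))  # 1900000 ns ≈ 1.9 ms
```

Pushing the hit rate from 90% to 99% cuts the expected latency by nearly a factor of two, which is why hot-key coverage matters more than raw cache speed.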

2. Caching Patterns

2.1 Cache-Aside (Lazy Loading) - Most Common

┌─────────────────────────────────────────────────────────┐
│                  CACHE-ASIDE PATTERN                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  READ PATH:                                             │
│  ──────────                                             │
│  1. App checks Cache                                    │
│  2. Cache HIT → Return data                            │
│  3. Cache MISS → Query Database                         │
│  4. Store result in Cache                               │
│  5. Return to client                                    │
│                                                         │
│  [App] ──1──► [Cache] ──HIT──► Return                   │
│    │              │                                     │
│    │            MISS                                    │
│    │              ▼                                     │
│    └────3────► [Database]                               │
│                   │                                     │
│               4. SET                                    │
│                   ▼                                     │
│              [Cache]                                    │
│                                                         │
│  WRITE PATH:                                            │
│  ───────────                                            │
│  1. Write to Database                                   │
│  2. Invalidate/Delete Cache entry                       │
│                                                         │
└─────────────────────────────────────────────────────────┘
python
# Cache-Aside Implementation
def get_user(user_id: str) -> User:
    # 1. Check cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)
    
    # 2. Cache miss - query database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    
    # 3. Populate cache with TTL
    redis.setex(f"user:{user_id}", 3600, serialize(user))  # (name, ttl_seconds, value)
    
    return user

def update_user(user_id: str, data: dict) -> None:
    # 1. Update database (source of truth)
    db.execute("UPDATE users SET ... WHERE id = ?", data, user_id)
    
    # 2. Invalidate cache (NOT update!)
    redis.delete(f"user:{user_id}")

Pros & Cons:

| ✅ Pros | ❌ Cons |
| --- | --- |
| Simple to implement | Cache miss = slow (DB round-trip) |
| Only cache what's needed | Application handles cache logic |
| Resilient to cache failure | Risk of stale data |

2.2 Write-Through (Data Safety)

┌─────────────────────────────────────────────────────────┐
│                 WRITE-THROUGH PATTERN                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  WRITE PATH (Synchronous):                              │
│  ─────────────────────────                              │
│  1. Write to Cache                                      │
│  2. Cache writes to Database (synchronously)            │
│  3. Return success only after BOTH complete             │
│                                                         │
│  [App] ──write──► [Cache] ──sync──► [Database]          │
│                      │                  │               │
│                      └───── ACK ────────┘               │
│                                                         │
│  READ: Always from cache (guaranteed fresh!)            │
│                                                         │
└─────────────────────────────────────────────────────────┘
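The write path above can be sketched with in-memory stand-ins for the cache and database (hypothetical objects, not a real client API). Note one deliberate variation: in a true write-through setup the cache layer itself propagates the write, while this sketch persists to the store first so a DB failure can never leave an unpersisted entry in the cache:

```python
class WriteThroughStore:
    """Every write goes to BOTH the cache and, synchronously, the backing store."""

    def __init__(self):
        self.cache: dict = {}
        self.db: dict = {}       # stand-in for the real database

    def write(self, key, value):
        self.db[key] = value     # 1. persist first (fail here -> cache stays clean)
        self.cache[key] = value  # 2. then cache; ACK only after both succeed

    def read(self, key):
        # Reads are always served from the cache and are guaranteed fresh
        return self.cache.get(key)
```

The write latency is the sum of both hops, which is exactly the trade-off listed in the cons below.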

Pros & Cons:

| ✅ Pros | ❌ Cons |
| --- | --- |
| Data always consistent | Write latency = Cache + DB |
| Simple read logic | Writes are slow |
| No stale data | Cache may store rarely-read data |

2.3 Write-Back/Write-Behind (Fast but Risky)

┌─────────────────────────────────────────────────────────┐
│                 WRITE-BACK PATTERN                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  WRITE PATH (Asynchronous):                             │
│  ──────────────────────────                             │
│  1. Write to Cache only → Return immediately            │
│  2. Cache asynchronously flushes to DB (batched)        │
│                                                         │
│  [App] ──write──► [Cache] ──ACK──►                      │
│                      │                                  │
│                      │ (async, batched)                 │
│                      ▼                                  │
│                 [Database]                              │
│                                                         │
│  ⚠️ RISK: Cache crash = DATA LOSS!                     │
│                                                         │
└─────────────────────────────────────────────────────────┘
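A write-back sketch using a dirty-key buffer flushed in batches. Everything here is a hypothetical stand-in, and the flush is synchronous for clarity; a real implementation would run it on a background thread or timer:

```python
class WriteBackStore:
    """Writes hit only the cache; dirty keys are flushed to the DB in batches."""

    def __init__(self, flush_threshold: int = 3):
        self.cache: dict = {}
        self.db: dict = {}        # stand-in for the real database
        self.dirty: set = set()   # keys written but not yet persisted
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.cache[key] = value   # 1. cache only -> immediate ACK to the caller
        self.dirty.add(key)
        if len(self.dirty) >= self.flush_threshold:
            self.flush()          # 2. batched flush once enough writes accumulate

    def flush(self):
        # If the process dies before this runs, the dirty writes are LOST
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

The window between `write` and `flush` is precisely the data-loss risk the warning below is about.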

Pros & Cons:

| ✅ Pros | ❌ Cons |
| --- | --- |
| Ultra-fast writes | Data loss risk on cache failure |
| Batching reduces DB load | Complex consistency management |
| Great for write-heavy workloads | Hard to debug |

CAUTION

🔧 Engineer Raizo: Use write-back only when you can ACCEPT losing data. Examples: page-view counters, analytics. NEVER use it for financial transactions!


3. The Nightmares (Cache Disasters)

3.1 Thundering Herd Problem

┌─────────────────────────────────────────────────────────┐
│              THUNDERING HERD DISASTER                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  SCENARIO: Hot key expires at T=0                       │
│                                                         │
│  T=0ms:   Cache entry "homepage:v1" EXPIRES             │
│                                                         │
│  T=1ms:   1000 requests hit simultaneously              │
│           All get CACHE MISS                            │
│           All query database...                         │
│                                                         │
│  [Req 1] ──MISS──┐                                      │
│  [Req 2] ──MISS──┤                                      │
│  [Req 3] ──MISS──┼────────► [DATABASE] 💥 OVERLOAD!     │
│  ...             │         (1000 identical queries)     │
│  [Req 1000]──MISS┘                                      │
│                                                         │
│  RESULT: Database CPU 100%, response time 10x,          │
│          possible cascade failure                       │
│                                                         │
└─────────────────────────────────────────────────────────┘

Solutions:

python
# Solution 1: Distributed Lock (Mutex)
def get_with_lock(key: str) -> Data:
    data = cache.get(key)
    if data:
        return data
    
    # Only ONE request fetches from DB
    lock_key = f"lock:{key}"
    # With real Redis, use the atomic `SET lock_key 1 NX EX 5`;
    # bare SETNX has no TTL, so a crashed holder would wedge the lock
    if cache.setnx(lock_key, "1", ttl=5):  # acquired the lock
        try:
            data = database.query(key)
            cache.set(key, data, ttl=3600)
            return data
        finally:
            cache.delete(lock_key)
    else:
        # Wait for the winner to populate cache
        time.sleep(0.1)
        return get_with_lock(key)  # Retry

# Solution 2: Early Expiration / Probabilistic Refresh
def get_with_probabilistic_refresh(key: str, ttl: int = 3600) -> Data:
    data, remaining_ttl = cache.get_with_ttl(key)
    
    if data:
        # Refresh early with probability based on remaining TTL
        # When TTL is low, refresh probability increases
        if remaining_ttl < ttl * 0.1:  # Last 10% of TTL
            if random.random() < 0.2:  # 20% chance
                refresh_async(key)  # Background refresh
        return data
    
    # Normal cache miss handling
    return fetch_and_cache(key)

# Solution 3: Request Coalescing (singleflight pattern)
# Modeled on Go's golang.org/x/sync/singleflight; a Python port is assumed here
from singleflight import Group

group = Group()

def get_with_coalescing(key: str) -> Data:
    data = cache.get(key)
    if data:
        return data
    
    # All concurrent requests for same key share ONE db call
    return group.do(key, lambda: fetch_and_cache(key))

3.2 Cache Penetration

┌─────────────────────────────────────────────────────────┐
│                 CACHE PENETRATION                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ATTACK: Query keys that DON'T EXIST                    │
│                                                         │
│  [Attacker] ──"user:999999999"──► [Cache] ──MISS        │
│             ──"user:888888888"──►         ──MISS        │
│             ──"user:777777777"──►         ──MISS        │
│                                              │          │
│                                              ▼          │
│                                         [Database]      │
│                                         (finds nothing) │
│                                         (but still      │
│                                          queries!)      │
│                                                         │
│  Cache never stores misses → Every request hits DB!     │
│                                                         │
└─────────────────────────────────────────────────────────┘

Solutions:

python
# Solution 1: Cache Negative Results
def get_user(user_id: str) -> Optional[User]:
    cached = cache.get(f"user:{user_id}")
    
    if cached == "NULL_MARKER":  # Cached "not found"
        return None
    if cached:
        return deserialize(cached)
    
    user = db.query(user_id)
    
    if user:
        cache.set(f"user:{user_id}", serialize(user), ttl=3600)
    else:
        # Cache the "not found" with shorter TTL
        cache.set(f"user:{user_id}", "NULL_MARKER", ttl=300)
    
    return user

# Solution 2: Bloom Filter (Check existence before query)
from pybloom_live import BloomFilter

# Populate filter with all valid keys (at startup or incrementally)
valid_users = BloomFilter(capacity=10_000_000, error_rate=0.01)
for user_id in db.get_all_user_ids():
    valid_users.add(user_id)

def get_user(user_id: str) -> Optional[User]:
    # Quick check: if not in Bloom Filter, definitely doesn't exist
    if user_id not in valid_users:
        return None  # No DB query!
    
    # Bloom Filter says "maybe exists" → normal flow
    return normal_cache_aside_get(user_id)

3.3 Cache Avalanche (Snowball Effect)

┌─────────────────────────────────────────────────────────┐
│                  CACHE AVALANCHE                        │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  SCENARIO: All cache entries expire at SAME TIME        │
│  (e.g., cache restart, or all keys with same TTL)       │
│                                                         │
│  T=0:  [K1][K2][K3][K4]...[K1000] ALL EXPIRE!           │
│           │                                             │
│           ▼                                             │
│  T=1:  100,000 requests → ALL CACHE MISS                │
│           │                                             │
│           ▼                                             │
│       [DATABASE] 💥💥💥 COMPLETE MELTDOWN                │
│                                                         │
└─────────────────────────────────────────────────────────┘

Solutions:

python
# Solution 1: Randomize TTL
import random

def cache_with_jitter(key: str, data: Any, base_ttl: int = 3600):
    # Add random jitter: TTL between 3000-4200 seconds
    jittered_ttl = base_ttl + random.randint(-600, 600)
    cache.set(key, data, ttl=jittered_ttl)

# Solution 2: Multi-tier Fallback
def get_with_fallback(key: str) -> Data:
    # L1: Local in-memory cache (fastest)
    if local_cache.exists(key):
        return local_cache.get(key)
    
    # L2: Distributed cache (Redis)
    if redis.exists(key):
        data = redis.get(key)
        local_cache.set(key, data, ttl=60)  # Short TTL
        return data
    
    # L3: Database
    data = database.query(key)
    redis.set(key, data, ttl=3600)
    local_cache.set(key, data, ttl=60)
    return data

# Solution 3: Rate limiting on cache misses
@rate_limit(max_calls=100, period=1)  # 100 DB calls/second max
def fetch_from_database(key: str) -> Data:
    return database.query(key)

4. Eviction Policies

| Policy | Description | Use Case |
| --- | --- | --- |
| LRU | Evict Least Recently Used | General purpose, most common |
| LFU | Evict Least Frequently Used | When access frequency matters |
| FIFO | First In, First Out | Simple streaming data |
| TTL | Time-based expiration | Freshness requirements |
| Random | Random eviction | When LRU overhead is too high |
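LRU, the most common policy above, fits in a few lines with `collections.OrderedDict` (for pure functions, the standard library's `functools.lru_cache` applies the same policy automatically):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)        # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touching "a" makes it most recently used
cache.set("c", 3)      # capacity exceeded -> evicts "b", not "a"
print(cache.get("b"))  # None
```

Both `get` and `set` are O(1); this constant-time bookkeeping is what "Random" eviction trades away recency information to avoid.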

5. Decision Matrix

| Scenario | Recommended Pattern |
| --- | --- |
| Read-heavy, consistency OK | Cache-Aside |
| Strong consistency required | Write-Through |
| Write-heavy, loss OK | Write-Back |
| Hot keys problem | Distributed Lock + Coalescing |
| Invalid key attacks | Bloom Filter |

6. What's Next

Caching takes load off the database, but what happens when the data grows too large for a single server? That's when you need Database Scaling.

👉 Database Design →