
💬 DESIGN WHATSAPP

Real-time Messaging & End-to-End Encryption at Scale

🎓 Professor Tom

WhatsApp is the classic real-time communication problem: how do you deliver a message to its recipient within a few hundred milliseconds, provide delivery guarantees (sent/delivered/read), and implement End-to-End Encryption so that even the server cannot read the content?

📊 Back-of-Envelope Calculations

Scale Assumptions

| Metric | Value | Rationale |
|--------|-------|-----------|
| Monthly Active Users (MAU) | 2B | Global messaging platform |
| Daily Active Users (DAU) | 1.5B | ~75% of MAU |
| Messages per user per day | 50 | Text, images, voice notes |
| Average message size | 100 bytes | Text messages (encrypted) |
| Media messages ratio | 20% | Images, videos, voice notes |
| Average media size | 300 KB | Compressed images/voice |
| Group size (average) | 10 members | Family/friend groups |
| Concurrent connections | 500M | Peak online users |

Message Throughput Calculations

Daily Messages:
Total messages = 1.5B DAU × 50 messages/user = 75B messages/day

Average Message QPS = 75B / 86,400 seconds = ~870,000 QPS
Peak Message QPS = 870,000 × 3 = ~2.6M QPS

Group Message Fan-out:
Assume 30% messages are group messages
Group messages = 75B × 0.3 = 22.5B group messages/day
Average group size = 10 members
Total deliveries = 22.5B × 10 = 225B deliveries/day

Combined QPS (with group fan-out):
Total deliveries = 75B × 0.7 + 225B = 277.5B/day
Delivery QPS = 277.5B / 86,400 = ~3.2M QPS
Calculation Breakdown
  • 86,400 = seconds in a day (24 × 60 × 60)
  • Peak multiplier = 3x = industry standard for messaging apps
  • Group messages amplify delivery count significantly
  • Real-time delivery requires sub-second latency
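The arithmetic above can be checked in a few lines of Python (all inputs come from the assumptions table; ~870K, ~2.6M, and ~3.2M match after rounding):

```python
# Back-of-envelope message throughput, from the assumptions table above.
DAU = 1.5e9
MSGS_PER_USER = 50
SECONDS_PER_DAY = 86_400
PEAK_MULTIPLIER = 3

daily_messages = DAU * MSGS_PER_USER            # 75B messages/day
avg_qps = daily_messages / SECONDS_PER_DAY      # ~870K QPS
peak_qps = avg_qps * PEAK_MULTIPLIER            # ~2.6M QPS

# Group fan-out: 30% of messages go to groups averaging 10 members.
deliveries = daily_messages * 0.7 + daily_messages * 0.3 * 10
delivery_qps = deliveries / SECONDS_PER_DAY     # ~3.2M QPS

print(f"avg={avg_qps:,.0f} peak={peak_qps:,.0f} delivery={delivery_qps:,.0f}")
```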

Connection Handling Capacity

Concurrent WebSocket Connections:
Peak concurrent users = 500M connections

Connection per server (optimized):
Each server handles ~500K connections (with epoll/kqueue)
Required connection servers = 500M / 500K = 1,000 servers

Connection memory overhead:
Per connection = ~10 KB (buffers, state, encryption context)
Total memory = 500M × 10 KB = 5 TB RAM across cluster

Heartbeat traffic:
Heartbeat interval = 30 seconds
Heartbeat QPS = 500M / 30 = ~17M heartbeats/second

Storage Calculations

Message Storage:
Daily text messages = 75B × 0.8 × 100 bytes = 6 TB/day
Daily media metadata = 75B × 0.2 × 1 KB = 15 TB/day
Monthly message storage (text only) = 6 TB × 30 = 180 TB/month

Media Storage:
Daily media uploads = 75B × 0.2 = 15B media files
Daily media storage = 15B × 300 KB = 4.5 PB/day
Monthly media storage = 4.5 PB × 30 = 135 PB/month

With 3x replication:
Message storage = 180 TB × 3 = 540 TB/month
Media storage = 135 PB × 3 = 405 PB/month

Key Storage (E2EE):
Pre-keys per user = 100 one-time keys × 32 bytes = 3.2 KB
Total key storage = 2B users × 3.2 KB = 6.4 TB

Bandwidth Calculations

Inbound Bandwidth (Message sends):
Peak inbound = 2.6M QPS × 100 bytes = 260 MB/s = 2.1 Gbps

Outbound Bandwidth (Message delivery):
Peak outbound = 3.2M QPS × 100 bytes = 320 MB/s = 2.6 Gbps

Media Bandwidth:
Media upload = 15B/day × 300 KB / 86,400 = 52 GB/s = 416 Gbps
Media download (assume 2x views) = 104 GB/s = 832 Gbps

Total Peak Bandwidth = ~1.3 Tbps
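A quick sanity check of the storage and bandwidth figures (decimal units: 1 TB = 1e12 bytes, 1 PB = 1e15 bytes; all inputs from the assumptions above):

```python
# Storage and bandwidth sanity check for the figures above.
daily_msgs = 75e9
media_ratio = 0.2
text_size = 100        # bytes per encrypted text message
media_size = 300e3     # ~300 KB per media file

text_storage = daily_msgs * (1 - media_ratio) * text_size  # ~6 TB/day
media_storage = daily_msgs * media_ratio * media_size      # ~4.5 PB/day

peak_inbound_gbps = 2.6e6 * text_size * 8 / 1e9            # ~2.1 Gbps
media_upload_gbps = media_storage * 8 / 86_400 / 1e9       # ~416 Gbps

print(f"text {text_storage / 1e12:.0f} TB/day, "
      f"media {media_storage / 1e15:.1f} PB/day, "
      f"media upload {media_upload_gbps:.0f} Gbps")
```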

🔧 Raizo's Note

Connection count, not throughput, is the main bottleneck of messaging systems. Each user holds one persistent connection, and with 500M concurrent users you need infrastructure that can handle millions of long-lived connections. This is why WhatsApp uses Erlang, a language designed for telecom workloads with millions of concurrent processes.

🏗️ High-Level Architecture

Component Responsibilities

| Component | Responsibility | Technology |
|-----------|----------------|------------|
| Connection Servers | Maintain persistent WebSocket connections, handle heartbeats | Erlang/Elixir, Go |
| Message Router | Route messages to the correct connection server | Redis (user→server mapping) |
| Presence Service | Track online/offline status, last seen | Redis with pub/sub |
| Message Service | Validate, store, and route messages | Go/Java microservice |
| Group Service | Handle group membership, fan out group messages | Go microservice |
| Media Service | Handle media upload/download, generate thumbnails | Go + FFmpeg |
| Key Server | Store and distribute public keys for E2EE | Dedicated secure service |
| Message DB | Persistent message storage | Cassandra (write-optimized) |
| Session Store | User sessions, connection mapping | Redis Cluster |
| Offline Queue | Store messages for offline users | Kafka + Cassandra |

🔧 Raizo's Note

Why does WhatsApp use Erlang?

Erlang was designed for telecom systems that require:

  • Millions of concurrent processes (lightweight, ~2 KB per process)
  • Soft real-time guarantees
  • Hot code reloading (updates without downtime)
  • Fault tolerance ("let it crash" philosophy)

WhatsApp once handled 2M connections per server with Erlang. This is how they scaled with a very small team (~50 engineers for 900M users in 2015).

🔄 Core Flows

Flow 1: Message Sending (User A to User B)

Flow 2: Message to Offline User

Flow 3: Message Delivery Status (Read Receipts)

🎓 Professor Tom

Message Ordering Guarantees:

WhatsApp guarantees causal ordering within each conversation:

  • Messages from A→B always arrive in the order they were sent
  • Vector clocks or Lamport timestamps track the ordering
  • Each message carries a sequence_number per conversation

msg_id = {sender_id}_{conversation_id}_{sequence_number}

If message N+1 arrives before message N, the client buffers it and waits for N.
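A minimal sketch of that hold-back buffer (class and method names are illustrative, not WhatsApp's actual client code):

```python
class ConversationBuffer:
    """Delivers messages in sequence order, holding back gaps."""

    def __init__(self):
        self.next_seq = 1
        self.held = {}       # seq -> message, waiting for the gap to fill
        self.delivered = []  # in-order output (the UI, in practice)

    def on_receive(self, seq: int, message: str):
        self.held[seq] = message
        # Flush every consecutive message starting at next_seq.
        while self.next_seq in self.held:
            self.delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1

buf = ConversationBuffer()
buf.on_receive(2, "world")   # out of order: held back, not delivered
buf.on_receive(1, "hello")   # fills the gap; both flush in order
print(buf.delivered)          # ['hello', 'world']
```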

🔧 Raizo's Note

Why three states (Sent/Delivered/Read)?

  • SENT (✓): The server has received and stored the message. The sender knows it is not lost.
  • DELIVERED (✓✓): The recipient's device has received the message. This does not mean it has been read.
  • READ (✓✓ blue): The recipient has opened the conversation and viewed the message.

Each state requires an explicit ACK from the next hop. No ACK means retry with exponential backoff.

💡 Deep Dive: Protocol Choice

The Core Problem

Messaging apps need bidirectional, real-time communication. The HTTP request-response model is a poor fit because:

  • The server cannot push messages to the client
  • Every request creates a new connection (overhead)
  • Latency is too high for real-time use cases

Protocol Comparison

| Protocol | Mechanism | Latency | Battery | Complexity | Use Case |
|----------|-----------|---------|---------|------------|----------|
| HTTP Polling | Client polls every N seconds | High (N seconds) | Poor | Low | Legacy systems |
| HTTP Long Polling | Server holds request until data | Medium | Medium | Medium | Fallback option |
| WebSocket | Full-duplex persistent connection | Low (~50ms) | Good | Medium | Real-time apps |
| MQTT | Pub/sub over TCP, QoS levels | Low | Excellent | Medium | IoT, mobile |
| XMPP | XML-based messaging protocol | Medium | Medium | High | Enterprise chat |

Option 1: HTTP Long Polling

┌─────────────────────────────────────────────────────────────┐
│                  HTTP LONG POLLING                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Client                              Server                 │
│    │                                    │                   │
│    │──── GET /messages ────────────────►│                   │
│    │                                    │ (hold request)    │
│    │                                    │                   │
│    │                    ... wait ...    │                   │
│    │                                    │                   │
│    │◄─── Response (new messages) ───────│                   │
│    │                                    │                   │
│    │──── GET /messages ────────────────►│ (immediately)     │
│    │                                    │                   │
│                                                             │
│  Pros:                                                      │
│  • Works through firewalls/proxies                         │
│  • Simple to implement                                     │
│  • HTTP infrastructure (caching, load balancing)           │
│                                                             │
│  Cons:                                                      │
│  • Connection overhead on each response                    │
│  • Server holds many open connections                      │
│  • Not truly real-time                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Option 2: WebSocket

┌─────────────────────────────────────────────────────────────┐
│                     WEBSOCKET                                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Client                              Server                 │
│    │                                    │                   │
│    │──── HTTP Upgrade Request ─────────►│                   │
│    │◄─── 101 Switching Protocols ───────│                   │
│    │                                    │                   │
│    │◄════ Persistent TCP Connection ════►│                   │
│    │                                    │                   │
│    │──── Send message ─────────────────►│                   │
│    │◄─── Push notification ─────────────│                   │
│    │◄─── Push message ──────────────────│                   │
│    │──── Send ACK ─────────────────────►│                   │
│    │                                    │                   │
│                                                             │
│  Pros:                                                      │
│  • True bidirectional communication                        │
│  • Low latency (~50ms)                                     │
│  • Efficient (no HTTP headers per message)                 │
│  • Native browser support                                  │
│                                                             │
│  Cons:                                                      │
│  • Stateful connections (harder to load balance)           │
│  • Need sticky sessions or connection routing              │
│  • Some proxies may not support                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Option 3: MQTT (Mobile Optimized)

┌─────────────────────────────────────────────────────────────┐
│                       MQTT                                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Features:                                                  │
│  • Designed for constrained devices (IoT)                  │
│  • Minimal packet overhead (2 bytes header)                │
│  • Built-in QoS levels:                                    │
│    - QoS 0: At most once (fire and forget)                │
│    - QoS 1: At least once (with ACK)                      │
│    - QoS 2: Exactly once (4-way handshake)                │
│  • Last Will and Testament (offline detection)             │
│  • Retained messages (get last state on connect)           │
│                                                             │
│  Pros:                                                      │
│  • Extremely battery efficient                             │
│  • Works well on unreliable networks                       │
│  • Built-in delivery guarantees                            │
│                                                             │
│  Cons:                                                      │
│  • Pub/sub model (not request/response)                    │
│  • Less browser support (need library)                     │
│  • Topic-based routing adds complexity                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

WhatsApp's Choice: Custom Protocol over TCP

🎓 Professor Tom

WhatsApp uses a custom binary protocol derived from XMPP (Extensible Messaging and Presence Protocol), optimized for mobile:

  1. Binary encoding instead of XML (~80% bandwidth reduction)
  2. Noise Protocol for the encryption handshake
  3. Custom framing for message boundaries
  4. Compression with zlib/gzip

Protocol stack:

┌─────────────────────────┐
│   Application Layer     │  ← WhatsApp messages
├─────────────────────────┤
│   Noise Protocol        │  ← Encryption
├─────────────────────────┤
│   Custom Framing        │  ← Message boundaries
├─────────────────────────┤
│   TCP                   │  ← Reliable transport
└─────────────────────────┘
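"Custom framing" typically means length-prefixing each message so boundaries survive TCP's byte-stream semantics. A minimal sketch (illustrative, not WhatsApp's actual wire format):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix each message with a 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def deframe(stream: bytes) -> list:
    """Split a byte stream back into messages (assumes complete frames)."""
    msgs, offset = [], 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        msgs.append(stream[offset:offset + length])
        offset += length
    return msgs

# Two messages concatenated on the wire still split cleanly.
wire = frame(b"hello") + frame(b"world")
assert deframe(wire) == [b"hello", b"world"]
```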

Connection Management at Scale

python
# Connection Server Architecture (Simplified)
import asyncio
import os
import time

class ConnectionServer:
    def __init__(self):
        self.connections = {}  # user_id -> WebSocket
        self.redis = RedisCluster()

    async def handle_connect(self, user_id: str, websocket: WebSocket):
        # 1. Authenticate user
        if not await self.authenticate(websocket):
            await websocket.close()
            return

        # 2. Register connection in Redis (user -> server mapping)
        server_id = os.environ['SERVER_ID']
        await self.redis.hset(
            f"user_connections:{user_id}",
            mapping={
                "server": server_id,
                "status": "connected",
                "connected_at": time.time(),
            },
        )

        # 3. Store local connection
        self.connections[user_id] = websocket

        # 4. Fetch and deliver offline messages
        offline_msgs = await self.fetch_offline_messages(user_id)
        for msg in offline_msgs:
            await websocket.send(msg)

        # 5. Update presence
        await self.redis.publish("presence", f"{user_id}:online")

    async def handle_disconnect(self, user_id: str):
        # 1. Remove from local connections (no-op if already gone)
        self.connections.pop(user_id, None)

        # 2. Update Redis (with TTL for reconnection grace period)
        await self.redis.hset(
            f"user_connections:{user_id}",
            mapping={
                "status": "disconnected",
                "disconnected_at": time.time(),
            },
        )
        await self.redis.expire(f"user_connections:{user_id}", 30)

        # 3. Broadcast offline only if the user did not reconnect in time
        await asyncio.sleep(30)
        if user_id not in self.connections:
            await self.redis.publish("presence", f"{user_id}:offline")

    async def route_message(self, to_user: str, message: bytes):
        # 1. Fast path: user is connected to this server
        if to_user in self.connections:
            await self.connections[to_user].send(message)
            return True

        # 2. User is connected to another server: relay via pub/sub
        conn_info = await self.redis.hgetall(f"user_connections:{to_user}")
        if conn_info and conn_info['status'] == 'connected':
            target_server = conn_info['server']
            await self.redis.publish(f"server:{target_server}", message)
            return True

        # 3. User is offline - queue message
        await self.queue_offline_message(to_user, message)
        return False

🔧 Raizo's Note

Connection Server Challenges:

  1. Sticky Sessions: A user must reconnect to the same server (or their state must be migrated)
  2. Graceful Shutdown: During deploys, connections must be drained slowly
  3. Health Checks: The L4 load balancer needs custom health checks for WebSocket
  4. Memory Pressure: 500K connections × 10 KB = 5 GB RAM per server

Solution: Use consistent hashing to route users to servers. When a server fails, only 1/N of users are affected.
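A sketch of that consistent-hashing routing (illustrative; server names and vnode count are assumptions). Removing a server remaps only the users that were on it:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Map users to connection servers on a hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, user_id: str) -> str:
        h = self._hash(user_id)
        idx = bisect_right(self.ring, (h,)) % len(self.ring)  # next clockwise
        return self.ring[idx][1]

ring_a = ConsistentHashRing(["conn-1", "conn-2", "conn-3"])
ring_b = ConsistentHashRing(["conn-1", "conn-2"])  # conn-3 failed
users = [f"user-{i}" for i in range(1000)]
moved = sum(ring_a.server_for(u) != ring_b.server_for(u) for u in users)
# Only users that were on conn-3 are remapped; everyone else stays put.
```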

📬 Message Delivery Status

State Machine

┌─────────────────────────────────────────────────────────────┐
│              MESSAGE DELIVERY STATE MACHINE                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                    ┌─────────────┐                          │
│                    │   PENDING   │                          │
│                    │  (Client)   │                          │
│                    └──────┬──────┘                          │
│                           │ Send to server                  │
│                           ▼                                 │
│                    ┌─────────────┐                          │
│                    │    SENT     │  ← Server ACK            │
│                    │     ✓       │                          │
│                    └──────┬──────┘                          │
│                           │ Delivered to recipient device   │
│                           ▼                                 │
│                    ┌─────────────┐                          │
│                    │  DELIVERED  │  ← Recipient device ACK  │
│                    │     ✓✓      │                          │
│                    └──────┬──────┘                          │
│                           │ Recipient opens chat            │
│                           ▼                                 │
│                    ┌─────────────┐                          │
│                    │    READ     │  ← Recipient app ACK     │
│                    │   ✓✓ 🔵     │                          │
│                    └─────────────┘                          │
│                                                             │
│  Failure Paths:                                             │
│  • PENDING → FAILED (network error, retry exhausted)       │
│  • SENT → EXPIRED (recipient never comes online)           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

ACK Protocol Implementation

python
# Message ACK Types
import asyncio
import uuid
from dataclasses import dataclass
from enum import Enum

MAX_RETRIES = 4

class AckType(Enum):
    SENT = 1        # Server received and stored
    DELIVERED = 2   # Recipient device received
    READ = 3        # Recipient viewed message
    FAILED = 4      # Delivery failed

# ACK Message Format
@dataclass
class MessageAck:
    msg_id: str
    ack_type: AckType
    timestamp: int

# Sender-side handling
class MessageSender:
    def __init__(self):
        self.pending_acks = {}  # msg_id -> (message, retry_count, timer)

    async def send_message(self, message: Message) -> str:
        msg_id = str(uuid.uuid4())
        message.id = msg_id
        message.status = MessageStatus.PENDING

        # Store locally
        await self.local_db.save(message)

        # Send to server
        await self.connection.send(message.serialize())

        # Start ACK timer
        self.pending_acks[msg_id] = (message, 0, self.start_ack_timer(msg_id))

        return msg_id

    def start_ack_timer(self, msg_id: str, timeout: int = 5000):
        async def wait_then_retry():
            await asyncio.sleep(timeout / 1000)
            if msg_id not in self.pending_acks:
                return  # ACK arrived while we were waiting
            message, retry_count, _ = self.pending_acks[msg_id]
            if retry_count < MAX_RETRIES:
                # Exponential backoff retry
                await asyncio.sleep(2 ** retry_count)
                await self.connection.send(message.serialize())
                self.pending_acks[msg_id] = (
                    message,
                    retry_count + 1,
                    self.start_ack_timer(msg_id, timeout * 2),
                )
            else:
                # Mark as failed
                message.status = MessageStatus.FAILED
                await self.local_db.update(message)
                del self.pending_acks[msg_id]

        return asyncio.create_task(wait_then_retry())

    async def handle_ack(self, ack: MessageAck):
        if ack.msg_id in self.pending_acks:
            message, _, timer = self.pending_acks[ack.msg_id]
            timer.cancel()
        else:
            # DELIVERED removes the pending entry, so a later READ ACK
            # must be resolved against local storage instead.
            message = await self.local_db.get(ack.msg_id)
            if message is None:
                return  # Already processed or unknown

        # Update message status (state machine: forward transitions only)
        if ack.ack_type == AckType.SENT:
            message.status = MessageStatus.SENT
            message.sent_at = ack.timestamp
        elif ack.ack_type == AckType.DELIVERED:
            message.status = MessageStatus.DELIVERED
            message.delivered_at = ack.timestamp
            self.pending_acks.pop(ack.msg_id, None)  # No more retries needed
        elif ack.ack_type == AckType.READ:
            message.status = MessageStatus.READ
            message.read_at = ack.timestamp

        await self.local_db.update(message)
        await self.ui.update_message_status(message)

Retry Logic with Exponential Backoff

┌─────────────────────────────────────────────────────────────┐
│              RETRY WITH EXPONENTIAL BACKOFF                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Attempt 1: Send message                                    │
│      │                                                      │
│      ├── ACK received within 5s → Success ✓                │
│      │                                                      │
│      └── No ACK → Wait 1s, Retry                           │
│                                                             │
│  Attempt 2: Retry                                           │
│      │                                                      │
│      ├── ACK received within 10s → Success ✓               │
│      │                                                      │
│      └── No ACK → Wait 2s, Retry                           │
│                                                             │
│  Attempt 3: Retry                                           │
│      │                                                      │
│      ├── ACK received within 20s → Success ✓               │
│      │                                                      │
│      └── No ACK → Wait 4s, Retry                           │
│                                                             │
│  Attempt 4: Retry                                           │
│      │                                                      │
│      ├── ACK received within 40s → Success ✓               │
│      │                                                      │
│      └── No ACK → Mark as FAILED, Queue for later          │
│                                                             │
│  Backoff formula: wait_time = min(2^attempt, MAX_WAIT)     │
│  With jitter: wait_time = wait_time * (0.5 + random(0.5))  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
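The two formulas at the bottom of the diagram in code (illustrative helper name):

```python
import random

MAX_WAIT = 32.0  # seconds, cap on the exponential term

def backoff_with_jitter(attempt: int) -> float:
    """wait_time = min(2^attempt, MAX_WAIT), scaled by 50-100% jitter
    so that many clients retrying at once don't synchronize."""
    base = min(2.0 ** attempt, MAX_WAIT)
    return base * (0.5 + random.random() * 0.5)
```

Jitter matters at this scale: after a connection-server failure, millions of clients retry at once, and without jitter they all retry in lockstep (a thundering herd).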

Handling Edge Cases

| Scenario | Problem | Solution |
|----------|---------|----------|
| Duplicate ACKs | Network retry sends ACK twice | Idempotent ACK handling (check if already processed) |
| Out-of-order ACKs | DELIVERED arrives before SENT | State machine only allows forward transitions |
| Recipient offline | Can't deliver immediately | Queue in offline storage, deliver on reconnect |
| Sender offline | Can't receive ACK | Store ACK server-side, deliver on sender reconnect |
| Both offline | Message in limbo | Server stores message, delivers when either comes online |
| Group messages | Multiple recipients | Track delivery status per recipient |

🔧 Raizo's Note

Read Receipts Privacy:

WhatsApp lets users turn off read receipts. When disabled:

  • The user does not send READ ACKs to others
  • The user also does not receive READ ACKs from others (a fair trade-off)
  • DELIVERED ACKs still work as usual

Implementation: check the user's privacy settings before sending or accepting READ ACKs.
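A sketch of that check (hypothetical helper; whether the client or the server enforces it is an implementation choice):

```python
def should_emit_read_ack(reader_has_receipts: bool,
                         sender_has_receipts: bool,
                         ack_type: str) -> bool:
    """READ acks flow only when BOTH sides have read receipts enabled
    (the fair trade-off above); SENT and DELIVERED acks always flow."""
    if ack_type != "READ":
        return True
    return reader_has_receipts and sender_has_receipts
```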

🔐 Security Deep Dive: End-to-End Encryption

Why E2EE Matters

┌─────────────────────────────────────────────────────────────┐
│           WITHOUT E2EE (Server-side encryption)              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Alice ──encrypt──► Server ──decrypt/re-encrypt──► Bob     │
│                        │                                    │
│                        ▼                                    │
│                   Server can read                           │
│                   all messages!                             │
│                                                             │
│  Risks:                                                     │
│  • Server breach exposes all messages                      │
│  • Government subpoena can access content                  │
│  • Malicious insider can read messages                     │
│  • Man-in-the-middle at server level                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│              WITH E2EE (End-to-End encryption)               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Alice ──────────── encrypted blob ────────────► Bob       │
│           │              │              │                   │
│           ▼              ▼              ▼                   │
│        Encrypt        Server         Decrypt                │
│        (Alice's      (can only      (Bob's                 │
│         device)      see blob)       device)               │
│                                                             │
│  Benefits:                                                  │
│  • Only sender and recipient can read                      │
│  • Server breach reveals nothing                           │
│  • Government can't compel decryption                      │
│  • True privacy guarantee                                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Signal Protocol Overview

WhatsApp uses the Signal Protocol (formerly known as Axolotl), developed by Open Whisper Systems. The protocol combines:

  1. X3DH (Extended Triple Diffie-Hellman) - Key exchange
  2. Double Ratchet - Per-message key derivation
  3. AES-256 - Symmetric encryption
  4. HMAC-SHA256 - Message authentication

X3DH Key Exchange (Simplified)

┌─────────────────────────────────────────────────────────────┐
│           X3DH - INITIAL KEY EXCHANGE                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Bob registers (uploads to Key Server):                     │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Identity Key (IK_B)     - Long-term, identifies Bob │   │
│  │  Signed Pre-Key (SPK_B)  - Medium-term, signed by IK │   │
│  │  One-Time Pre-Keys (OPK) - Single-use keys (100+)    │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Alice wants to message Bob (first time):                   │
│                                                             │
│  1. Alice fetches Bob's keys from server:                   │
│     IK_B, SPK_B, OPK_B (one of the one-time keys)          │
│                                                             │
│  2. Alice generates ephemeral key pair: EK_A               │
│                                                             │
│  3. Alice computes 4 Diffie-Hellman shared secrets:        │
│     DH1 = DH(IK_A, SPK_B)   - Alice's identity, Bob's SPK  │
│     DH2 = DH(EK_A, IK_B)    - Alice's ephemeral, Bob's ID  │
│     DH3 = DH(EK_A, SPK_B)   - Alice's ephemeral, Bob's SPK │
│     DH4 = DH(EK_A, OPK_B)   - Alice's ephemeral, Bob's OPK │
│                                                             │
│  4. Master Secret = KDF(DH1 || DH2 || DH3 || DH4)          │
│                                                             │
│  5. Alice sends to Bob:                                     │
│     - Her Identity Key (IK_A)                              │
│     - Her Ephemeral Key (EK_A)                             │
│     - Which OPK she used                                   │
│     - Encrypted message                                    │
│                                                             │
│  6. Bob computes same DH values and derives same secret    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

🎓 Professor Tom

Why four DH operations?

  • DH1 (IK_A, SPK_B): Proves Alice's identity to Bob
  • DH2 (EK_A, IK_B): Proves Bob's identity to Alice
  • DH3 (EK_A, SPK_B): Forward secrecy (ephemeral key)
  • DH4 (EK_A, OPK_B): One-time key prevents replay attacks

If any of these DH operations were missing, the protocol would be vulnerable to specific attacks.
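The core X3DH property - both sides independently deriving the same master secret - can be demonstrated with a toy Diffie-Hellman group (NOT real crypto: Signal uses X25519, and this tiny group with a toy KDF is insecure; it only shows the four-DH structure):

```python
# Toy X3DH over classic finite-field DH. For illustration only.
import hashlib

P = 2**127 - 1  # a Mersenne prime, standing in for a real DH group
G = 3

def keypair(seed: bytes):
    """Derive a deterministic (private, public) pair from a seed."""
    priv = int.from_bytes(hashlib.sha256(seed).digest(), "big") % (P - 1)
    return priv, pow(G, priv, P)

def dh(priv: int, pub: int) -> bytes:
    return pow(pub, priv, P).to_bytes(16, "big")

def kdf(*secrets: bytes) -> bytes:
    return hashlib.sha256(b"".join(secrets)).digest()

# Bob's published bundle: identity key, signed pre-key, one-time pre-key.
ik_b, IK_B = keypair(b"bob-identity")
spk_b, SPK_B = keypair(b"bob-signed-prekey")
opk_b, OPK_B = keypair(b"bob-otp-1")

# Alice: her identity key plus a fresh ephemeral key for this session.
ik_a, IK_A = keypair(b"alice-identity")
ek_a, EK_A = keypair(b"alice-ephemeral")

# Alice computes DH1..DH4 from her private keys and Bob's public bundle.
alice_master = kdf(dh(ik_a, SPK_B), dh(ek_a, IK_B),
                   dh(ek_a, SPK_B), dh(ek_a, OPK_B))

# Bob computes the same four secrets from his private keys and the
# public keys (IK_A, EK_A) Alice includes in her first message.
bob_master = kdf(dh(spk_b, IK_A), dh(ik_b, EK_A),
                 dh(spk_b, EK_A), dh(opk_b, EK_A))

assert alice_master == bob_master  # both sides share the master secret
```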

Double Ratchet Algorithm

┌─────────────────────────────────────────────────────────────┐
│              DOUBLE RATCHET ALGORITHM                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Two "ratchets" working together:                           │
│                                                             │
│  1. SYMMETRIC RATCHET (KDF Chain)                          │
│  ─────────────────────────────────                          │
│     Chain Key → KDF → Message Key + New Chain Key          │
│                                                             │
│     CK_0 ──► CK_1 ──► CK_2 ──► CK_3 ──► ...               │
│       │        │        │        │                         │
│       ▼        ▼        ▼        ▼                         │
│     MK_0     MK_1     MK_2     MK_3                        │
│                                                             │
│     Each message uses unique key (MK_n)                    │
│     Old keys are deleted after use                         │
│                                                             │
│  2. DH RATCHET (Asymmetric)                                │
│  ──────────────────────────────                             │
│     Periodically exchange new DH keys                      │
│     Creates new root key → new chain keys                  │
│                                                             │
│     Alice: DH_A1 ──────────────────────► Bob receives      │
│     Bob:   DH_B1 ◄────────────────────── Bob sends         │
│     Alice: DH_A2 ──────────────────────► New DH exchange   │
│                                                             │
│  Combined Flow:                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Root Key ──DH──► Chain Key ──KDF──► Message Key    │   │
│  │      │                                    │          │   │
│  │      └──────── DH Ratchet ────────────────┘          │   │
│  │                     │                                │   │
│  │                     └── Symmetric Ratchet            │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
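The symmetric ratchet is just a KDF chain. A sketch using HMAC-SHA256 with 0x01/0x02 domain-separation constants as in the Signal specification (simplified; the root-key/DH-ratchet step, which real implementations derive with HKDF, is omitted):

```python
import hashlib
import hmac

def kdf_chain_step(chain_key: bytes):
    """One symmetric-ratchet step: the current chain key yields a
    message key and the next chain key (the old one is then deleted)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

ck = b"\x00" * 32  # initial chain key (in practice, output of the DH ratchet)
message_keys = []
for _ in range(3):
    mk, ck = kdf_chain_step(ck)  # CK_0 -> MK_0, CK_1 -> MK_1, ...
    message_keys.append(mk)

# Each message gets a unique key; the chain only runs forward, so
# compromising CK_2 reveals nothing about MK_0 or MK_1.
assert len(set(message_keys)) == 3
```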

Security Properties

PropertyDescriptionHow Achieved
ConfidentialityOnly sender/recipient can readAES-256 encryption with shared secret
IntegrityMessage can't be modifiedHMAC-SHA256 authentication
AuthenticationVerify sender identityIdentity keys + signatures
Forward SecrecyPast messages safe if key leakedEphemeral keys, key deletion
Break-in RecoveryFuture messages safe after compromiseDH ratchet creates new keys
DeniabilityCan't prove who sent messageNo digital signatures on messages

Key Server Design

python
# Key Server Schema
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class UserKeys:
    user_id: str
    identity_key: bytes             # Long-term public key
    signed_pre_key: bytes           # Medium-term, rotated weekly
    signed_pre_key_signature: bytes
    one_time_pre_keys: List[bytes]  # Pool of single-use keys

@dataclass
class PreKeyBundle:
    """What a sender fetches to start a new conversation."""
    identity_key: bytes
    signed_pre_key: bytes
    signed_pre_key_signature: bytes
    one_time_pre_key: Optional[bytes]  # None if the pool is exhausted

# Key Server API
class KeyServer:
    async def register_keys(self, user_id: str, keys: UserKeys):
        """Called on app install and key rotation"""
        # Verify signature on signed_pre_key
        if not verify_signature(keys.identity_key,
                                keys.signed_pre_key,
                                keys.signed_pre_key_signature):
            raise InvalidSignature()

        await self.db.store_keys(user_id, keys)

    async def get_keys(self, user_id: str) -> PreKeyBundle:
        """Called when starting new conversation"""
        keys = await self.db.get_keys(user_id)

        # Pop one OPK (single use); X3DH can proceed without one,
        # at the cost of weaker replay protection
        opk = keys.one_time_pre_keys.pop(0) if keys.one_time_pre_keys else None
        await self.db.update_keys(user_id, keys)

        # Ask the client to upload more keys if the pool is running low
        if len(keys.one_time_pre_keys) < 10:
            await self.notify_replenish_keys(user_id)

        return PreKeyBundle(
            identity_key=keys.identity_key,
            signed_pre_key=keys.signed_pre_key,
            signed_pre_key_signature=keys.signed_pre_key_signature,
            one_time_pre_key=opk,
        )

    async def replenish_opks(self, user_id: str, new_opks: List[bytes]):
        """Client uploads more one-time pre-keys"""
        keys = await self.db.get_keys(user_id)
        keys.one_time_pre_keys.extend(new_opks)
        await self.db.update_keys(user_id, keys)

🔧 Raizo's Note

Key Server Security Considerations:

  1. The Key Server never knows private keys - it only stores public keys
  2. OPK exhaustion attack: an attacker can drain all OPKs. Mitigation: rate limiting, require authentication
  3. Key transparency: how does a user know the server didn't swap keys? WhatsApp provides "Security Code" verification
  4. Multi-device: each device has its own key pair; messages are encrypted separately for each device

Production tip: the Key Server needs the highest security level - separate network, HSM for signing, audit logs.

Group Encryption

┌─────────────────────────────────────────────────────────────┐
│              GROUP MESSAGE ENCRYPTION                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Option 1: Pairwise Encryption (Simple but expensive)       │
│  ─────────────────────────────────────────────────────      │
│  Sender encrypts message N times (once per member)          │
│  • 100 members = 100 encryptions                           │
│  • Bandwidth: O(N) per message                             │
│  • CPU: O(N) per message                                   │
│                                                             │
│  Option 2: Sender Keys (WhatsApp's approach)               │
│  ─────────────────────────────────────────────────────      │
│  1. Sender generates "Sender Key" for group                │
│  2. Sender Key distributed to all members (pairwise E2EE)  │
│  3. Messages encrypted once with Sender Key                │
│                                                             │
│  Sender ──► Encrypt with Sender Key ──► All members        │
│                                                             │
│  • Bandwidth: O(1) per message                             │
│  • CPU: O(1) per message                                   │
│  • Trade-off: Member removal requires new Sender Key       │
│                                                             │
│  Member Removal Flow:                                       │
│  1. Generate new Sender Key                                │
│  2. Distribute to remaining members (pairwise)             │
│  3. Removed member keeps messages already received          │
│     (new Sender Key protects only future messages)          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
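
The O(N) vs O(1) trade-off above can be made concrete with a toy count of encryption operations; the `encrypt` function here is a placeholder for one symmetric encryption, not real crypto:

```python
def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Placeholder: stands in for one symmetric encryption operation
    return bytes(b ^ key[0] for b in plaintext)

members = [f"member_{i}".encode() for i in range(100)]
msg = b"hello group"

# Option 1: pairwise - one encryption per member, O(N) CPU and bandwidth
pairwise_ops = [encrypt(m, msg) for m in members]   # 100 encryptions

# Option 2: sender key - encrypt once, server fans out the same bytes
sender_key = b"\x42" * 32
ciphertext = encrypt(sender_key, msg)               # 1 encryption
fanout = [ciphertext for _ in members]              # cheap byte copies

assert len(pairwise_ops) == 100
assert len(set(fanout)) == 1  # identical ciphertext delivered to everyone
```

The one-time cost that remains is distributing the Sender Key itself pairwise, which is why member removal (a new key for everyone) is the expensive path.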

💾 Chat History Storage

Why Cassandra/HBase for Messaging?

| Requirement | SQL (PostgreSQL) | Wide-Column (Cassandra) | Winner |
|---|---|---|---|
| Write throughput | ~10K writes/sec per node | ~100K writes/sec per node | Cassandra |
| Write latency | ~5-10ms (with indexes) | ~1-2ms | Cassandra |
| Horizontal scaling | Complex (sharding) | Native (add nodes) | Cassandra |
| Time-series queries | Requires indexes | Native (clustering key) | Cassandra |
| Schema flexibility | Rigid | Flexible | Cassandra |
| Transactions | ACID | Eventually consistent | PostgreSQL |
| Complex queries | Full SQL | Limited (CQL) | PostgreSQL |

🎓 Giáo sư Tom

Messaging workload characteristics:

  1. Write-heavy: 75B messages/day = 870K writes/sec
  2. Time-series access: "Get messages in conversation X after timestamp Y"
  3. No complex joins: Messages are self-contained
  4. Eventual consistency OK: Slight delay in sync across devices acceptable

Cassandra was designed for exactly this workload. Facebook Messenger likewise uses HBase (a similar wide-column store).

Cassandra Schema Design

sql
-- Messages table
-- Partition Key: conversation_id (all messages in a conversation on same partition)
-- Clustering Key: message_id (TimeUUID for time-ordering)
CREATE TABLE messages (
    conversation_id UUID,
    message_id TIMEUUID,
    sender_id BIGINT,
    encrypted_content BLOB,      -- E2EE encrypted payload
    content_type TEXT,           -- 'text', 'image', 'video', 'voice', 'document'
    media_url TEXT,              -- S3/CDN URL for media
    thumbnail BLOB,              -- Small preview for images/videos
    reply_to_id TIMEUUID,        -- For reply threads
    forwarded_from UUID,         -- Original message if forwarded
    status MAP<BIGINT, TEXT>,    -- {recipient_id: 'delivered'/'read'}
    created_at TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy', 
                    'compaction_window_size': 1, 
                    'compaction_window_unit': 'DAYS'};

-- Conversations table (per user)
-- Partition Key: user_id
-- Clustering Key: updated_at DESC (most recent first)
CREATE TABLE conversations (
    user_id BIGINT,
    conversation_id UUID,
    conversation_type TEXT,      -- 'direct', 'group'
    participant_ids SET<BIGINT>,
    group_name TEXT,
    group_avatar_url TEXT,
    last_message_preview TEXT,   -- Encrypted preview
    last_message_at TIMESTAMP,
    unread_count INT,
    is_muted BOOLEAN,
    is_archived BOOLEAN,
    updated_at TIMESTAMP,
    PRIMARY KEY (user_id, updated_at, conversation_id)
) WITH CLUSTERING ORDER BY (updated_at DESC);

-- Group membership (for efficient member lookup)
CREATE TABLE group_members (
    conversation_id UUID,
    user_id BIGINT,
    role TEXT,                   -- 'admin', 'member'
    joined_at TIMESTAMP,
    added_by BIGINT,
    PRIMARY KEY (conversation_id, user_id)
);

-- User's groups (reverse lookup)
CREATE TABLE user_groups (
    user_id BIGINT,
    conversation_id UUID,
    group_name TEXT,
    joined_at TIMESTAMP,
    PRIMARY KEY (user_id, conversation_id)
);

Query Patterns

python
# Common query patterns optimized by schema design
from datetime import datetime
from typing import List, Optional
from uuid import UUID

class MessageRepository:

    async def get_messages(
        self,
        conversation_id: UUID,
        before: Optional[datetime] = None,
        limit: int = 50
    ) -> List[Message]:
        """Get messages in a conversation (paginated)"""
        # Efficient: single-partition scan in clustering order
        query = """
            SELECT * FROM messages
            WHERE conversation_id = ?
            AND message_id < ?
            ORDER BY message_id DESC
            LIMIT ?
        """
        before_timeuuid = datetime_to_timeuuid(before) if before else max_timeuuid()
        return await self.session.execute(query, [conversation_id, before_timeuuid, limit])
    
    async def get_conversations(
        self, 
        user_id: int, 
        limit: int = 20
    ) -> List[Conversation]:
        """Get user's recent conversations"""
        # Efficient: Single partition, ordered by updated_at
        query = """
            SELECT * FROM conversations
            WHERE user_id = ?
            ORDER BY updated_at DESC
            LIMIT ?
        """
        return await self.session.execute(query, [user_id, limit])
    
    async def save_message(self, message: Message):
        """Save new message"""
        # Write to messages table
        await self.session.execute("""
            INSERT INTO messages (conversation_id, message_id, sender_id, 
                                  encrypted_content, content_type, created_at)
            VALUES (?, ?, ?, ?, ?, ?)
        """, [message.conversation_id, message.id, message.sender_id,
              message.encrypted_content, message.content_type, message.created_at])
        
        # Update conversation for all participants (denormalized).
        # Note: updated_at is a clustering key, so each insert creates a
        # new row; the previous row for this conversation must be deleted
        # (or cleaned up lazily) to avoid duplicates in the inbox view.
        for participant_id in message.participant_ids:
            await self.session.execute("""
                INSERT INTO conversations (user_id, conversation_id, 
                                          last_message_preview, last_message_at, updated_at)
                VALUES (?, ?, ?, ?, ?)
            """, [participant_id, message.conversation_id, 
                  message.preview, message.created_at, message.created_at])

Data Locality and Partitioning

┌─────────────────────────────────────────────────────────────┐
│              CASSANDRA PARTITION STRATEGY                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Partition Key: conversation_id                             │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Partition: conv_123                                 │   │
│  │  ┌─────────────────────────────────────────────┐    │   │
│  │  │ msg_001 │ msg_002 │ msg_003 │ ... │ msg_N  │    │   │
│  │  └─────────────────────────────────────────────┘    │   │
│  │  All messages in conversation stored together        │   │
│  │  Ordered by message_id (time-based)                 │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Benefits:                                                  │
│  • Single partition read for conversation history          │
│  • Sequential disk reads (fast)                            │
│  • Natural time-ordering                                   │
│                                                             │
│  Risks:                                                     │
│  • Hot partition for very active groups                    │
│  • Partition size limit (~100MB recommended)               │
│                                                             │
│  Mitigation for large groups:                              │
│  • Bucket by time: conv_123_2024_01, conv_123_2024_02     │
│  • Or bucket by message count: conv_123_bucket_1          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
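
The time-bucketing mitigation above can be sketched as a small helper that derives the partition key; the function name and the monthly granularity are illustrative choices matching the `conv_123_2024_01` naming in the diagram:

```python
from datetime import datetime

def partition_key(conversation_id: str, created_at: datetime) -> str:
    """Bucket a conversation's messages by month so that no single
    partition grows past the ~100MB guideline."""
    return f"{conversation_id}_{created_at.year}_{created_at.month:02d}"

# Messages in the same month land on the same partition...
assert partition_key("conv_123", datetime(2024, 1, 5)) == "conv_123_2024_01"
# ...and a very active group rolls over to a fresh partition next month
assert partition_key("conv_123", datetime(2024, 2, 1)) == "conv_123_2024_02"
```

Reads for "last 50 messages" then scan the newest bucket first and fall back to older buckets only when a page spans a boundary.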

Media Storage Strategy

┌─────────────────────────────────────────────────────────────┐
│              MEDIA STORAGE ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Upload Flow:                                               │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐             │
│  │  Client  │───►│  Media   │───►│   S3     │             │
│  │          │    │  Service │    │  Bucket  │             │
│  └──────────┘    └────┬─────┘    └──────────┘             │
│                       │                                     │
│                       ▼                                     │
│                 ┌──────────┐                               │
│                 │ Generate │                               │
│                 │Thumbnail │                               │
│                 │ + Hash   │                               │
│                 └────┬─────┘                               │
│                      │                                      │
│                      ▼                                      │
│                ┌───────────┐                               │
│                │ Cassandra │  ← Store metadata only        │
│                │ (msg_id,  │    (URL, hash, size)          │
│                │  media_   │                               │
│                │  url)     │                               │
│                └───────────┘                               │
│                                                             │
│  Download Flow:                                             │
│  Client ──► CDN ──► S3 (if cache miss)                     │
│                                                             │
│  Storage Tiers:                                             │
│  • Hot (< 7 days): S3 Standard                             │
│  • Warm (7-30 days): S3 Infrequent Access                  │
│  • Cold (> 30 days): S3 Glacier                            │
│                                                             │
│  Encryption:                                                │
│  • Media encrypted client-side before upload               │
│  • Encryption key sent in message (E2EE)                   │
│  • Server stores encrypted blob, can't decrypt             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

🔧 Raizo's Note

Message Retention and Deletion:

WhatsApp does not store messages permanently on its servers:

  1. Messages delivered → deleted from server (kept on devices)
  2. Offline messages → stored max 30 days, then deleted
  3. Media → stored until downloaded, then deleted

Compliance considerations:

  • GDPR: Right to deletion
  • Data minimization: Don't store what you don't need
  • Reduces storage costs significantly

Implementation: TTL (Time-To-Live) on Cassandra rows, plus S3 lifecycle policies for media.

⚖️ Trade-offs Analysis

Architecture Decision Matrix

| Decision | Option A | Option B | Chosen | Rationale |
|---|---|---|---|---|
| Protocol | HTTP Long Polling | WebSocket | WebSocket | Lower latency, true bidirectional, battery efficient |
| Message Storage | PostgreSQL | Cassandra | Cassandra | Write-heavy workload, time-series queries, horizontal scaling |
| Encryption | Server-side | End-to-End (E2EE) | E2EE | Privacy guarantee, regulatory compliance, user trust |
| Media Storage | Inline in DB | Object Storage (S3) | S3 + CDN | Cost effective, CDN caching, separate scaling |
| Group Messages | Pairwise encryption | Sender Keys | Sender Keys | O(1) encryption vs O(N), bandwidth efficient |
| Presence | Poll-based | Pub/Sub | Redis Pub/Sub | Real-time updates, efficient fan-out |
| Offline Queue | Database | Message Queue | Kafka + Cassandra | Durability + high throughput |

Protocol Trade-offs Deep Dive

| Aspect | HTTP Long Polling | WebSocket | MQTT |
|---|---|---|---|
| Latency | 100-500ms | 10-50ms | 10-50ms |
| Battery | Poor (reconnects) | Good | Excellent |
| Firewall | Always works | Usually works | May be blocked |
| Complexity | Low | Medium | Medium |
| Browser Support | Universal | Modern browsers | Needs library |
| Scalability | Hard (many connections) | Medium | Good |

Encryption Trade-offs

| Aspect | Server-side Encryption | End-to-End Encryption |
|---|---|---|
| Privacy | Server can read | Only endpoints can read |
| Search | Server can index | No server-side search |
| Backup | Server can backup | User must backup keys |
| Multi-device | Easy sync | Complex key management |
| Compliance | Can comply with subpoenas | Cannot provide content |
| Spam Detection | Can analyze content | Must use metadata only |

🚨 Failure Scenarios & Mitigations

Scenario 1: Connection Server Crash

┌─────────────────────────────────────────────────────────────┐
│           CONNECTION SERVER CRASH                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Impact:                                                    │
│  • 500K users disconnected instantly                       │
│  • Messages to those users delayed                         │
│  • Presence status stale                                   │
│                                                             │
│  Detection:                                                 │
│  • Health check failures (< 5 seconds)                     │
│  • Connection count drop alert                             │
│  • Client reconnection spike                               │
│                                                             │
│  Mitigation:                                                │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ 1. Stateless connection servers                      │   │
│  │    - Session state in Redis, not local memory       │   │
│  │    - Any server can handle reconnection             │   │
│  │                                                      │   │
│  │ 2. Client auto-reconnect with exponential backoff   │   │
│  │    - Immediate retry, then 1s, 2s, 4s, 8s...       │   │
│  │    - Jitter to prevent thundering herd             │   │
│  │                                                      │   │
│  │ 3. Message queue for offline delivery               │   │
│  │    - Messages queued during disconnect             │   │
│  │    - Delivered on reconnect                        │   │
│  │                                                      │   │
│  │ 4. Graceful shutdown for deployments               │   │
│  │    - Drain connections over 30 seconds             │   │
│  │    - Send "reconnect to another server" signal     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Recovery Time: < 30 seconds (client reconnect)            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

🔧 Raizo's Note

Thundering Herd Problem:

When one server crashes, 500K clients reconnect at the same time. Without jitter, they all retry at exactly 1s, 2s, 4s..., creating synchronized traffic spikes.

Solution: Add random jitter

python
wait_time = base_delay * (2 ** attempt) * (0.5 + random.random() * 0.5)
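
Expanded into a full helper (a sketch; the 60-second cap and default base delay are illustrative choices, not from the source):

```python
import random

def backoff_delay(attempt: int, base_delay: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with 50-100% jitter, capped.

    Jitter spreads 500K simultaneous reconnects across a window
    instead of letting them all fire at the same instant.
    """
    raw = base_delay * (2 ** attempt)
    return min(raw * (0.5 + random.random() * 0.5), cap)

# Delays grow roughly 1s, 2s, 4s... but never align across clients
for attempt in range(6):
    d = backoff_delay(attempt)
    assert 0.5 * 2 ** attempt <= d <= 2 ** attempt
```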

Scenario 2: Message Queue Backlog

┌─────────────────────────────────────────────────────────────┐
│           MESSAGE QUEUE BACKLOG                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Impact:                                                    │
│  • Message delivery delays (seconds to minutes)            │
│  • Out-of-order messages possible                          │
│  • User experience degradation                             │
│                                                             │
│  Causes:                                                    │
│  • Traffic spike (viral event, New Year)                   │
│  • Consumer failures                                       │
│  • Downstream service slowdown                             │
│                                                             │
│  Detection:                                                 │
│  • Kafka consumer lag monitoring                           │
│  • Message age in queue > threshold                        │
│  • Delivery latency p99 spike                              │
│                                                             │
│  Mitigation:                                                │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ 1. Per-conversation partitioning                     │   │
│  │    - Partition key = conversation_id                │   │
│  │    - Guarantees ordering within conversation        │   │
│  │                                                      │   │
│  │ 2. Priority queues                                   │   │
│  │    - Active conversations get priority              │   │
│  │    - Separate queue for real-time vs batch         │   │
│  │                                                      │   │
│  │ 3. Backpressure to senders                          │   │
│  │    - Slow down message acceptance                   │   │
│  │    - Return "server busy" to clients               │   │
│  │                                                      │   │
│  │ 4. Auto-scaling consumers                           │   │
│  │    - Scale based on lag metrics                    │   │
│  │    - Pre-scale for known events (New Year)         │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
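
The lag-based auto-scaling step above can be sketched as a sizing function; the drain target, per-consumer rate, and clamp bounds are illustrative assumptions, not real WhatsApp parameters:

```python
import math

def target_consumers(total_lag: int, per_consumer_rate: int,
                     drain_seconds: int = 60,
                     min_consumers: int = 10, max_consumers: int = 500) -> int:
    """Pick a consumer count that drains the backlog within drain_seconds,
    clamped to a sane operating range."""
    needed = math.ceil(total_lag / (per_consumer_rate * drain_seconds))
    return max(min_consumers, min(needed, max_consumers))

# 120M lagging messages at 50K msg/s per consumer, drained in 60s
assert target_consumers(120_000_000, 50_000) == 40
```

An autoscaler would feed this from Kafka consumer-lag metrics and pre-scale ahead of known events like New Year.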

Scenario 3: Key Server Unavailable

┌─────────────────────────────────────────────────────────────┐
│           KEY SERVER UNAVAILABLE                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Impact:                                                    │
│  • New conversations cannot start (no key exchange)        │
│  • Existing conversations continue working                 │
│  • New device registration blocked                         │
│                                                             │
│  Detection:                                                 │
│  • Key fetch failures                                      │
│  • New conversation creation errors                        │
│  • Health check failures                                   │
│                                                             │
│  Mitigation:                                                │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ 1. Pre-fetch and cache recipient public keys        │   │
│  │    - Cache keys of frequent contacts               │   │
│  │    - Refresh in background                         │   │
│  │                                                      │   │
│  │ 2. Multiple key server replicas                     │   │
│  │    - Active-active across regions                  │   │
│  │    - Consistent replication of key data            │   │
│  │                                                      │   │
│  │ 3. Graceful degradation                             │   │
│  │    - Queue key requests for retry                  │   │
│  │    - Show "connecting..." instead of error         │   │
│  │                                                      │   │
│  │ 4. Offline key bundles                              │   │
│  │    - Pre-generate extra one-time keys             │   │
│  │    - Survive longer key server outages            │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Note: Key server is CRITICAL infrastructure               │
│  - Highest availability requirements                       │
│  - Separate security perimeter                             │
│  - HSM for key signing                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Scenario 4: Cassandra Node Failure

| Aspect | Details |
|---|---|
| Impact | Temporary unavailability for affected partitions |
| Detection | Cassandra gossip protocol, health checks |
| Mitigation | Replication factor = 3, consistency level = QUORUM |
| Recovery | Automatic with hinted handoff, repair |
| Data Loss | Zero (with RF=3 and proper consistency) |

🔧 Raizo's Note

Cassandra Consistency Levels:

  • ONE: Fast but risky (single replica)
  • QUORUM: Balanced (majority of replicas)
  • ALL: Slow but safest (all replicas)

WhatsApp likely uses:

  • Writes: QUORUM (durability)
  • Reads: ONE or LOCAL_ONE (speed, eventual consistency OK)

Formula: R + W > N ensures strong consistency

  • R=1, W=2, N=3 → 1+2=3 > 3? No, eventual consistency
  • R=2, W=2, N=3 → 2+2=4 > 3? Yes, strong consistency
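
The R + W > N rule from the note as a one-line check (function name is ours):

```python
def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """Reads see the latest write when read and write quorums overlap
    in at least one replica, i.e. r + w > n."""
    return r + w > n

assert not is_strongly_consistent(1, 2, 3)  # eventual consistency
assert is_strongly_consistent(2, 2, 3)      # strong consistency
```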

💰 Cost Estimation

Monthly Infrastructure Costs (2B MAU Scale)

| Service | Specification | Unit Cost | Monthly Cost |
|---|---|---|---|
| Connection Servers | 1,000 × c5.xlarge (4 vCPU, 8GB) | $0.17/hr | $124,000 |
| Message Routing | 200 × c5.2xlarge (8 vCPU, 16GB) | $0.34/hr | $49,000 |
| Message Service | 100 × c5.2xlarge (8 vCPU, 16GB) | $0.34/hr | $24,500 |
| Group Service | 50 × c5.xlarge (4 vCPU, 8GB) | $0.17/hr | $6,100 |
| Media Service | 100 × c5.xlarge (4 vCPU, 8GB) | $0.17/hr | $12,200 |
| Session Store | Redis Cluster 1TB (6 shards, 3 replicas) | $0.068/GB/hr | $50,000 |
| Message DB | Cassandra 500TB (50 nodes, RF=3) | $0.10/GB/mo | $50,000 |
| User DB | PostgreSQL 10TB (HA, read replicas) | $0.115/GB/mo | $5,000 |
| Key Server | 20 × c5.xlarge + HSM | $0.17/hr + HSM | $10,000 |
| Kafka Cluster | 30 brokers (m5.2xlarge) | $0.384/hr | $8,300 |
| Media Storage | S3 50PB (with lifecycle) | $0.023/GB/mo | $1,150,000 |
| CDN | 100PB egress/month | $0.02/GB (volume) | $2,000,000 |
| Push Notifications | 100B/month (FCM/APNs) | $0.0001/notification | $10,000 |
| Load Balancers | 10 × NLB (L4) | $0.025/hr + LCU | $5,000 |

Cost Summary

| Category | Monthly Cost | % of Total |
|---|---|---|
| Compute (Servers) | $226,000 | 6% |
| Caching (Redis) | $50,000 | 1% |
| Database (Cassandra + PostgreSQL) | $55,000 | 2% |
| Message Queue (Kafka) | $8,300 | <1% |
| Media Storage (S3) | $1,150,000 | 33% |
| CDN & Bandwidth | $2,000,000 | 57% |
| Other (Push, LB, Key Server) | $25,000 | 1% |
| Total | ~$3,500,000 | 100% |

🎓 Giáo sư Tom

Cost Breakdown Insights:

  1. CDN + Storage = 90% of costs - Messaging apps are media-heavy
  2. Compute is cheap - Erlang efficiency means fewer servers
  3. Database is manageable - Messages deleted after delivery

WhatsApp's actual efficiency:

  • 2015: ~50 engineers serving 900M users
  • Cost per user: ~$0.002/month
  • Revenue (subscription): $1/year = $0.083/month
  • Gross margin: ~97%!

Cost Optimization Strategies

| Strategy | Savings | Implementation |
|---|---|---|
| Reserved Instances | 30-40% compute | 1-3 year commitments |
| S3 Intelligent Tiering | 40% storage | Auto-move to cheaper tiers |
| CDN Caching | 50% bandwidth | Higher cache hit ratio |
| Message Deletion | 80% storage | Delete after delivery |
| Media Compression | 30% storage/bandwidth | Client-side compression |
| Regional Pricing | 20% overall | Deploy in cheaper regions |

Cost per User Metrics

Monthly cost: $3,500,000
MAU: 2,000,000,000

Cost per MAU per month = $3,500,000 / 2B = $0.00175
Cost per MAU per year = $0.00175 × 12 = $0.021

WhatsApp revenue model:
- Previously: $1/year subscription
- Now: Free (Meta subsidizes)
- Business API: $0.005-0.09 per message

At scale, messaging is incredibly cost-efficient!
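
The per-user arithmetic above as a quick sanity check:

```python
monthly_cost = 3_500_000        # total infrastructure, $/month
mau = 2_000_000_000             # monthly active users

cost_per_mau_month = monthly_cost / mau
cost_per_mau_year = cost_per_mau_month * 12

assert round(cost_per_mau_month, 5) == 0.00175
assert round(cost_per_mau_year, 3) == 0.021
```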

🔧 Raizo's Note

Hidden Costs to Watch:

  1. Data transfer between AZs: $0.01/GB × 100PB = $1M/month
  2. Kafka retention: 7 days × 75B messages × 100 bytes = 52TB
  3. Monitoring: DataDog/CloudWatch at this scale = $100K+/month
  4. Security: HSMs, audits, compliance = $50K+/month
  5. On-call/Operations: 24/7 team = $500K+/month in salaries

Real total: Likely $5-6M/month including operations

🎯 Interview Checklist

Must-Mention Items

| Topic | Key Points |
|---|---|
| Scale Estimation | 2B MAU, 75B messages/day, 870K QPS, 500M concurrent connections |
| Protocol Choice | WebSocket for bidirectional, persistent connections |
| Message Delivery | Sent → Delivered → Read state machine with ACKs |
| E2EE | Signal Protocol, X3DH key exchange, Double Ratchet |
| Storage | Cassandra for messages (write-heavy), S3 for media |
| Offline Handling | Queue messages, deliver on reconnect, push notifications |

Bonus Points 🌟

  • Erlang/BEAM: Mention WhatsApp's use of Erlang for millions of connections per server
  • Sender Keys: Explain group encryption optimization
  • Key Transparency: How users verify they're talking to the right person
  • Multi-device: Challenges of E2EE across multiple devices
  • Message Ordering: Vector clocks or sequence numbers for causal ordering
  • Presence Optimization: Batching presence updates, subscription limits

Common Mistakes

| Mistake | Why It's Wrong | Better Approach |
|---|---|---|
| HTTP polling | High latency, battery drain | WebSocket or MQTT |
| Server-side encryption | Not true privacy | End-to-end encryption |
| SQL for messages | Write bottleneck | Cassandra/HBase |
| Single delivery status | Users expect feedback | Sent/Delivered/Read states |
| Ignoring offline users | Messages lost | Queue + push notifications |
| Centralized architecture | Single point of failure | Distributed, stateless services |

⚠️ Interview Red Flags

  • Not mentioning E2EE, or only saying "encrypt messages"
  • Being unable to explain message delivery guarantees
  • Using HTTP polling instead of WebSocket
  • Having no strategy for offline users
  • Not knowing why Cassandra is used instead of SQL
  • A design that cannot scale (single server, single DB)

🎓 Key Takeaways

  1. Connection management is the core challenge - 500M concurrent connections require specialized infrastructure
  2. E2EE is not just a feature but an architectural decision that shapes the entire system
  3. Message delivery guarantees (Sent/Delivered/Read) require an explicit ACK protocol
  4. Cassandra is a perfect fit for the write-heavy, time-series messaging workload
  5. Offline handling matters just as much as real-time delivery
  6. Cost structure: media storage and CDN account for 90% of spend

🔗 Navigation

| Case Study | Key Learning | Link |
|---|---|---|
| Twitter | Fan-out patterns, timeline caching | Design Twitter → |
| YouTube | Media storage, CDN, video processing | Design YouTube → |
| Uber | Real-time location, geospatial indexing | Design Uber → |
