
⚖️ LOAD BALANCING

Orchestrating Data Flow in Distributed Systems


1. Forward Proxy vs Reverse Proxy

1.1 Conceptual Difference

FORWARD PROXY (Client-side representative):
[Client A] ─┐
[Client B] ─┼──► [FORWARD PROXY] ──► [Internet/Servers]
[Client C] ─┘    (Squid, Charles)

Use cases: Corporate network control, anonymity, caching external resources
Key: Servers see PROXY's IP, not Client's IP

────────────────────────────────────────────────────────

REVERSE PROXY (Server-side representative):
                                 ┌──► [Backend Server 1]
[Internet Clients] ──► [REVERSE PROXY] ──┼──► [Backend Server 2]
                       (Nginx, HAProxy)  └──► [Backend Server 3]

Use cases: Load balancing, SSL termination, caching, security
Key: Clients see PROXY's IP, not Server's IP

NOTE

🎓 Professor Tom: A Forward Proxy represents the CLIENT; a Reverse Proxy represents the SERVER. This is the "Trust Boundary" in network security.


2. Load Balancing Strategies

2.1 Round Robin (Simple)

Request Stream:  [R1] [R2] [R3] [R4] [R5] [R6]
Server A:        [R1]      [R4]
Server B:              [R2]      [R5]
Server C:                   [R3]      [R6]

PROS: Simple, almost no state (only a rotating index), even request counts per server
CONS: Ignores server capacity and current load; no session affinity
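
The rotation above can be sketched in a few lines of Python. This is a minimal illustration rather than a production balancer; the class name `RoundRobinBalancer` and the server labels are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order (hypothetical sketch)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever; it is the only state we keep.
        self._rotation = cycle(list(servers))

    def pick(self):
        # Every call advances the rotation by one, regardless of server load.
        return next(self._rotation)


balancer = RoundRobinBalancer(["A", "B", "C"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)  # ['A', 'B', 'C', 'A', 'B', 'C']
```

Requests R1 through R6 land on A, B, C, A, B, C, matching the diagram. Nothing in `pick()` looks at how busy a server is, which is exactly the weakness listed under CONS.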

2.2 Least Connections (Intelligent)

LB tracks active connections:
  Server A: ████████████  (12 connections)
  Server B: ████████      (8 connections)  ◄── SEND HERE!
  Server C: ████████████████ (16 connections)

PROS: Adapts to load, handles slow requests
CONS: Requires connection tracking
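
A minimal sketch of the selection rule, assuming the balancer keeps an in-memory counter per server. The class and method names (`LeastConnectionsBalancer`, `acquire`, `release`) are hypothetical, and the starting counts simply mirror the bars above.

```python
class LeastConnectionsBalancer:
    """Picks the server with the fewest active connections (hypothetical sketch)."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}

    def acquire(self):
        # min() returns the first minimum, so ties go to the earliest-listed server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes; this bookkeeping is the tracking cost.
        if self.active[server] > 0:
            self.active[server] -= 1


lb = LeastConnectionsBalancer(["A", "B", "C"])
lb.active.update({"A": 12, "B": 8, "C": 16})  # counts taken from the diagram
chosen = lb.acquire()
print(chosen, lb.active["B"])  # B 9
```

The `release` call is the part that is easy to get wrong in practice: a missed release makes a server look permanently busy.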

2.3 IP Hash (Sticky Sessions)

Client IP: 192.168.1.100
  → hash("192.168.1.100") = 0x7F3A...
  → 0x7F3A... mod 3 = 1 → Server B

The same client IP ALWAYS maps to the same server, as long as the number of servers (3 here) does not change.
Use cases: Stateful apps, WebSocket connections
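
The mapping can be sketched as follows. The choice of SHA-256 and the helper name `pick_server` are assumptions made for illustration; the text does not say which hash function is used. Python's built-in `hash()` is avoided because string hashes are randomized per process by default, which would break stickiness across restarts.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to a server index via a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod N.
    value = int.from_bytes(digest[:8], "big")
    return servers[value % len(servers)]


servers = ["A", "B", "C"]
first = pick_server("192.168.1.100", servers)
second = pick_server("192.168.1.100", servers)
print(first == second)  # True
```

Because the result depends on `len(servers)`, adding or removing a server reshuffles most clients.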

3. Layer 4 LB vs Layer 7 LB

Aspect            L4 LB                   L7 LB
────────────────────────────────────────────────────────
Works at          TCP/UDP (IP:Port)       HTTP (URL, Headers)
Speed             10M+ conn/s             100K-1M req/s
Content routing   ❌ No                   ✅ Yes
SSL Termination   ❌ Pass-through         ✅ Can decrypt
Use case          Edge, high throughput   Intelligent routing

IMPORTANT

Production Pattern: L4 LB at the edge → L7 LB cluster. At high traffic volumes, avoid exposing the L7 tier directly to the internet; let the L4 tier absorb the raw connection load first.


4. Consistent Hashing (The Billion-Dollar Algorithm)

4.1 The Problem with Mod Hashing

server_index = hash(key) % N

With N=3: hash("user:1")=5 → 5%3=2 → Server C
Add server D (N=4): 5%4=1 → Server B ❌ MOVED!

Result: ~75% of keys MUST MOVE when adding 1 server!
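
The 75% figure can be checked with a short count. As a simplifying assumption, the integers 0..11999 stand in for uniformly distributed hash values; `moved_fraction` is a hypothetical helper name.

```python
def moved_fraction(old_n, new_n, key_count=12_000):
    """Fraction of hash values whose server index changes when N changes."""
    # A key stays put only if h % old_n == h % new_n.
    moved = sum(1 for h in range(key_count) if h % old_n != h % new_n)
    return moved / key_count


print(5 % 3, 5 % 4)          # 2 1  (the "user:1" example: Server C, then Server B)
print(moved_fraction(3, 4))  # 0.75
```

Out of every 12 consecutive hash values, only 0, 1, and 2 give the same result for `% 3` and `% 4`, so 9 of 12 keys move.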

4.2 The Hash Ring Solution

Imagine a circular hash space [0, 2^32-1], drawn here in degrees for readability:

         0°
         │
    ┌────┴────┐
   /           \
  │  [A: 45°]   │
  │  [B: 120°]  │
   \ [C: 250°] /
    └─────────┘
        180°

Rule: A key routes to the FIRST server CLOCKWISE from its position.
Adding server D at 80°: only keys in (45°, 80°] remap, moving from B to D.
All other keys: UNCHANGED!
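
The clockwise rule maps naturally onto a binary search over sorted positions. The sketch below places servers at explicit degree positions to mirror the example; a real ring would derive positions from a hash. The `HashRing` class and the sample key positions are hypothetical.

```python
from bisect import bisect_left, insort


class HashRing:
    """Ring with explicit positions in degrees (hypothetical sketch)."""

    def __init__(self):
        self._positions = []  # sorted server positions
        self._owners = {}     # position -> server name

    def add(self, server, position):
        insort(self._positions, position)
        self._owners[position] = server

    def lookup(self, key_position):
        # First server at or after the key; wrap to the start past the last one.
        i = bisect_left(self._positions, key_position)
        if i == len(self._positions):
            i = 0
        return self._owners[self._positions[i]]


ring = HashRing()
for name, pos in [("A", 45), ("B", 120), ("C", 250)]:
    ring.add(name, pos)

sample_keys = (10, 60, 100, 200, 300)
before = {k: ring.lookup(k) for k in sample_keys}
ring.add("D", 80)
after = {k: ring.lookup(k) for k in sample_keys}

changed = [k for k in sample_keys if before[k] != after[k]]
print(changed)  # [60]
```

Only the key at 60° changes owner (B to D). The key at 300° wraps past 0° to A both before and after.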

4.3 Virtual Nodes

Problem: With only 3 points on the ring, the arcs can be very uneven (one server may own 70% of the keys!)

Solution: Each physical server gets 100-200 virtual positions (only 4 per server shown here for brevity):
  Server A → A1@30°, A2@120°, A3@240°, A4@300°
  Server B → B1@60°, B2@150°, B3@210°, B4@330°

Result: Much better balance across the ring.
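
To see the effect, the sketch below builds a ring with a configurable number of virtual nodes per server and measures each server's share of keys. Everything here is an assumption made for illustration: SHA-256 as the hash, the `server#i` labeling scheme for virtual nodes, the `user:k` key names, and the helper names.

```python
import hashlib
from bisect import bisect_left
from collections import Counter


def position(label):
    """Place a label on the ring [0, 2^32 - 1] using a stable hash."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(servers, vnodes):
    # Each server appears `vnodes` times, at positions derived from "name#i".
    pairs = sorted((position(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
    points = [p for p, _ in pairs]
    owners = [s for _, s in pairs]
    return points, owners


def lookup(points, owners, key):
    # First virtual node clockwise from the key, wrapping around at the end.
    i = bisect_left(points, position(key)) % len(points)
    return owners[i]


def load_share(servers, vnodes, key_count=20_000):
    points, owners = build_ring(servers, vnodes)
    counts = Counter(lookup(points, owners, f"user:{k}") for k in range(key_count))
    return {s: counts[s] / key_count for s in servers}


for vnodes in (1, 150):
    print(vnodes, load_share(["A", "B", "C"], vnodes))
```

With one position per server the shares depend entirely on where three hashes happen to land; with 150 virtual nodes each, the shares should sit much closer to one third. The exact numbers depend on the hash and key names, so none are quoted here.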

5. Decision Matrix

Scenario            Recommended Strategy
────────────────────────────────────────────────────────
Stateless API       Least Connections
WebSocket           IP Hash + Consistent Hashing
Distributed Cache   Consistent Hashing + vnodes
Database replicas   Least Connections

6. Next

👉 Caching →