Rate Limiting — Kiểm soát tốc độ truy cập

Một API endpoint đang nhận 50,000 requests/giây thay vì 5,000 dự kiến. Không có rate limiting, hệ thống sẽ sụp trong vài phút — database connection pool cạn kiệt, memory tràn, downstream services cascade failure. Đây không phải kịch bản giả định: đây là 3 giờ sáng thứ Hai tuần trước của bạn.

Rate limiting là lớp phòng thủ đầu tiên của mọi hệ thống production. Nó bảo vệ backend khỏi quá tải, đảm bảo fair usage giữa các client, ngăn chặn abuse (brute-force, scraping, DDoS layer 7), và kiểm soát chi phí infrastructure. Một hệ thống không có rate limiting giống như một con đường cao tốc không có giới hạn tốc độ — sớm muộn cũng xảy ra tai nạn.

Bài này đi sâu vào 4 thuật toán chính, cách triển khai trên distributed system, và những sai lầm mà ngay cả senior engineer cũng mắc phải.

🎯 Mục tiêu

Sau bài học này, bạn sẽ:

Hiểu và so sánh 4 thuật toán rate limiting phổ biến
Triển khai distributed rate limiting với Redis
Tránh các lỗi cấu hình thường gặp trong production
Chọn đúng thuật toán cho đúng use case

Bức tranh tư duy

Hãy nghĩ về rate limiting như cổng thu phí trên cao tốc. Dù có bao nhiêu xe muốn đi qua, cổng chỉ cho phép một số lượng xe nhất định mỗi phút. Xe đến quá nhanh sẽ phải xếp hàng (queuing) hoặc bị từ chối (rejection). Cổng thu phí không quan tâm xe đến từ đâu — nó chỉ kiểm soát tốc độ dòng chảy.

Trong thực tế, rate limiter có thể đặt ở nhiều tầng: edge (CDN/WAF), API Gateway, application layer, hoặc per-service. Mỗi tầng phục vụ mục đích khác nhau — edge chặn DDoS, gateway kiểm soát quota, application bảo vệ business logic.

Cốt lõi kỹ thuật

Token Bucket

Token Bucket là thuật toán phổ biến nhất trong production. Hình dung một cái xô chứa token — mỗi request cần 1 token để đi qua. Token được thêm vào xô với tốc độ cố định (refill rate). Nếu xô hết token, request bị từ chối.

Đặc điểm nổi bật: cho phép burst traffic — nếu xô đầy token, client có thể gửi nhiều request cùng lúc cho đến khi hết token.

python

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

Ưu điểm: Đơn giản, memory O(1), cho phép burst hợp lý. Nhược điểm: Khó chính xác tuyệt đối trên distributed system nếu không có coordination.

Sliding Window Log

Thuật toán này lưu timestamp của mỗi request vào một sorted set. Khi có request mới, xóa các entry cũ hơn cửa sổ thời gian, đếm số entry còn lại.

python

from collections import deque
import time

class SlidingWindowLog:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps: deque[float] = deque()

    def allow_request(self) -> bool:
        now = time.monotonic()
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False

Ưu điểm: Chính xác tuyệt đối — không có boundary burst problem. Nhược điểm: Memory O(n) — với 10,000 req/s và window 60s, bạn lưu 600,000 timestamps per user.

Sliding Window Counter

Đây là hybrid giữa Fixed Window và Sliding Window Log — cân bằng giữa độ chính xác và hiệu quả bộ nhớ. Ý tưởng: dùng counter của window hiện tại và window trước, tính trọng số theo thời gian đã trôi.

python

import math, time

class SlidingWindowCounter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = window_seconds
        self.curr_count = 0
        self.prev_count = 0
        self.curr_start = math.floor(time.time())

    def allow_request(self) -> bool:
        now = time.time()
        win_start = math.floor(now / self.window) * self.window
        if win_start != self.curr_start:
            self.prev_count, self.curr_count = self.curr_count, 0
            self.curr_start = win_start
        weight = (now - win_start) / self.window
        estimated = self.prev_count * (1 - weight) + self.curr_count
        if estimated < self.max_requests:
            self.curr_count += 1
            return True
        return False

Ưu điểm: Memory O(1), độ chính xác chấp nhận được (~0.003% sai lệch theo Cloudflare). Nhược điểm: Chỉ là xấp xỉ, không chính xác tuyệt đối.

Fixed Window Counter

Thuật toán đơn giản nhất: chia thời gian thành các window cố định (ví dụ mỗi 60 giây), đếm request trong mỗi window. Vấn đề nghiêm trọng: Boundary burst — client gửi N request cuối window hiện tại + N request đầu window tiếp theo → 2N request trong thời gian rất ngắn.

So sánh thuật toán

Thuật toán	Độ chính xác	Memory	Complexity	Burst handling
Token Bucket	Cao	O(1)	Thấp	Burst có kiểm soát
Sliding Window Log	Tuyệt đối	O(n)	Trung bình	Chính xác
Sliding Window Counter	Xấp xỉ	O(1)	Thấp	Xấp xỉ
Fixed Window Counter	Thấp	O(1)	Rất thấp	Boundary burst problem

Thực chiến

API Gateway Rate Limiting

Trong production, rate limiting thường đặt ở tầng API Gateway với nhiều chiều kiểm soát:

nginx

http {
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=100r/s;
    limit_req_zone $http_x_api_key    zone=per_user:10m rate=50r/s;

    server {
        location /api/ {
            limit_req zone=per_ip burst=20 nodelay;
            limit_req zone=per_user burst=10 nodelay;
            limit_req_status 429;  # Trả 429, không phải 503
            proxy_pass http://backend;
        }
    }
}

Distributed Rate Limiting với Redis

Khi chạy nhiều instance, bạn cần centralized counter. Redis là lựa chọn phổ biến nhất. Lua script đảm bảo atomicity — không bị race condition:

lua

-- KEYS[1] = "rl:user:123", ARGV = {window_ms, max_requests, now_ms}
local key = KEYS[1]
local window = tonumber(ARGV[1])
local max_req = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
local current = redis.call('ZCARD', key)

if current < max_req then
    redis.call('ZADD', key, now, now .. ':' .. math.random(1000000))
    redis.call('PEXPIRE', key, window)
    return 1  -- ALLOWED
end
return 0  -- DENIED

Gọi từ application (Python + redis-py):

python

import redis, time

r = redis.Redis(host='localhost', port=6379, decode_responses=True)
SCRIPT = r.register_script(open('sliding_window_rate_limit.lua').read())

def is_allowed(user_id: str, max_req: int = 100, window_ms: int = 60_000) -> bool:
    now_ms = int(time.time() * 1000)
    return SCRIPT(keys=[f"rl:user:{user_id}"],
                  args=[window_ms, max_req, now_ms]) == 1

Sai lầm điển hình

⚠️ Cạm bẫy

1. Rate limit chỉ ở application layer — Attacker bypass API Gateway truy cập trực tiếp internal service → rate limiting vô nghĩa. Giải pháp: Defense in depth — đặt ở nhiều tầng (edge → gateway → application → per-service).

⚠️ Cạm bẫy

2. Không có distributed coordination — Mỗi instance giữ counter riêng. 10 instances × 100 req/s = thực tế 1,000 req/s. Giải pháp: Centralized store (Redis) hoặc gossip protocol.

⚠️ Cạm bẫy

3. Hard-coded limits không adjust theo load — 1,000 req/s hợp lý cho đến khi database quá tải. Giải pháp: Adaptive rate limiting — giảm limit khi CPU > 80% hoặc latency > P99 threshold.

⚠️ Cạm bẫy

4. Thiếu graceful degradation — Rate limiter crash → trả 500 thay vì 429. Giải pháp: Fail-open với fallback local counter, luôn trả HTTP 429 kèm Retry-After header.

Under the Hood

Time Complexity

Operation	Token Bucket	Sliding Window Log	Sliding Window Counter	Fixed Window
Check request	O(1)	O(n) cleanup	O(1)	O(1)
Memory per key	~16 bytes	~8 bytes × n entries	~24 bytes	~12 bytes
Redis commands	1-2	3-4 (ZREMRANGEBYSCORE + ZCARD + ZADD)	2-3	1-2 (INCR + EXPIRE)

Khi nào dùng thuật toán nào?

Use case	Thuật toán khuyến nghị	Lý do
API Gateway, per-user quota	Token Bucket	Cho phép burst, memory O(1), lựa chọn mặc định
Financial APIs, authentication	Sliding Window Log	Chính xác tuyệt đối, request volume thấp
High-throughput public APIs	Sliding Window Counter	O(1) memory, xấp xỉ chấp nhận được
Internal metrics, non-critical	Fixed Window Counter	Đơn giản nhất, boundary burst OK

Trade-offs quan trọng

Precision vs Memory: Sliding Window Log chính xác nhất nhưng O(n) memory — 100K users × 1K req/min = 100 triệu timestamps.
Centralized vs Local: Redis thêm ~1-2ms latency nhưng chính xác. Local counter nhanh nhưng sai khi scale horizontally.
Fail-open vs Fail-closed: Production thường chọn fail-open với fallback local counter.

Checklist ghi nhớ

✅ Checklist triển khai

Thiết kế

[ ] Xác định rate limit ở từng tầng: edge, gateway, application, per-service
[ ] Chọn thuật toán phù hợp dựa trên yêu cầu precision và memory budget
[ ] Thiết kế rate limit key: per-IP, per-user, per-API-key, per-endpoint, hoặc combination
[ ] Quy hoạch response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After

Triển khai

[ ] Dùng Redis Lua script để đảm bảo atomicity trên distributed system
[ ] Implement fail-open với fallback local counter khi Redis không khả dụng
[ ] Trả HTTP 429 (không phải 500 hay 503) kèm Retry-After header
[ ] Viết unit test cho edge cases: burst, window boundary, clock skew

Vận hành

[ ] Monitor rate limit hit ratio — nếu > 5% request bị reject, review lại threshold
[ ] Thiết lập alerting khi rate limiter Redis latency > 5ms
[ ] Cấu hình adaptive limits dựa trên system health metrics
[ ] Duy trì whitelist cho internal services và health check endpoints

Bài tập luyện tập

🧠 Quiz

Câu 1: Thuật toán nào phù hợp nhất cho rate limiting trên distributed system với yêu cầu chính xác cao?

A. Fixed Window Counter
B. Token Bucket ✅
C. Leaky Bucket
D. Random sampling

Giải thích: Token Bucket cho phép burst traffic trong khi vẫn duy trì average rate, phù hợp với distributed systems khi kết hợp Redis atomic operations. Fixed Window có boundary burst problem, Leaky Bucket không cho phép burst hợp lý.

🧠 Quiz

Câu 2: Khi Redis (dùng làm centralized rate limiter) bị down, chiến lược nào phù hợp nhất cho production?

A. Block tất cả requests (fail-closed)
B. Cho tất cả requests đi qua không giới hạn (fail-open không fallback)
C. Fail-open với fallback sang local in-memory counter ✅
D. Trả HTTP 500 cho mọi request

Giải thích: Fail-open với local fallback đảm bảo availability trong khi vẫn có mức bảo vệ cơ bản. Fail-closed gây outage toàn bộ, fail-open không fallback có thể overload backend, và 500 là sai HTTP semantics.

🧠 Quiz

Câu 3: Hệ thống có 8 application instances, mỗi instance cấu hình rate limit 100 req/s per user bằng local counter. User thực tế có thể gửi tối đa bao nhiêu req/s?

A. 100 req/s
B. 400 req/s
C. 800 req/s ✅
D. Không xác định được

Giải thích: Với round-robin load balancing, request phân bố đều qua 8 instances. Mỗi instance cho phép 100 req/s → user có thể gửi tối đa 8 × 100 = 800 req/s. Đây là lý do cần centralized rate limiting trên distributed system.

Rate Limiting — Kiểm soát tốc độ truy cập ​

Bức tranh tư duy ​

Cốt lõi kỹ thuật ​

Token Bucket ​

Sliding Window Log ​

Sliding Window Counter ​

Fixed Window Counter ​

So sánh thuật toán ​

Thực chiến ​

API Gateway Rate Limiting ​

Distributed Rate Limiting với Redis ​

Sai lầm điển hình ​

Under the Hood ​

Time Complexity ​

Khi nào dùng thuật toán nào? ​

Trade-offs quan trọng ​

Checklist ghi nhớ ​

Bài tập luyện tập ​

Liên kết học tiếp ​

Rate Limiting — Kiểm soát tốc độ truy cập

Bức tranh tư duy

Cốt lõi kỹ thuật

Token Bucket

Sliding Window Log

Sliding Window Counter

Fixed Window Counter

So sánh thuật toán

Thực chiến

API Gateway Rate Limiting

Distributed Rate Limiting với Redis

Sai lầm điển hình

Under the Hood

Time Complexity

Khi nào dùng thuật toán nào?

Trade-offs quan trọng

Checklist ghi nhớ

Bài tập luyện tập

Liên kết học tiếp