
🛡️ Battle-Tested Production Patterns

Survival guide: techniques used in HPN Tunnel, trading engines, and game servers to survive under heavy load.

HPN Engineering Insight

💡 HPN TUNNEL PRODUCTION SECRETS

In HPN Tunnel, we apply the following principles:

  1. Zero malloc in hot path — pre-allocate all buffers
  2. Zero-copy wherever possible — data is never copied between kernel and userspace
  3. Lock-free data structures — avoid mutex contention
  4. Batch processing — handle many packets per syscall
  5. CPU pinning — threads are pinned to specific CPU cores

These techniques help HPN Tunnel achieve < 1ms latency at millions of packets/second.
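
Principle 4 above (batch processing) can be sketched with Linux's `recvmmsg(2)`, which receives several UDP datagrams in a single syscall. A minimal sketch; the 64-slot batch and 2048-byte buffers are illustrative choices, not values from HPN Tunnel:

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Drain up to `max_batch` queued UDP datagrams with a single syscall.
// Returns the number of datagrams received, or -1 if none were pending.
int receive_batch(int fd, char bufs[][2048], int max_batch) {
    if (max_batch > 64) max_batch = 64;   // cap at our descriptor arrays
    mmsghdr msgs[64] = {};
    iovec iovs[64] = {};
    for (int i = 0; i < max_batch; ++i) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = sizeof(bufs[i]);
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    // MSG_DONTWAIT: return immediately with whatever is already queued
    return recvmmsg(fd, msgs, max_batch, MSG_DONTWAIT, nullptr);
}
```

At 1M packets/second, turning 1M `recvfrom` calls into ~16K `recvmmsg` calls removes most of the syscall overhead.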


Zero-Copy Networking (@[/perf-profile])

Vấn đề: Copy Overhead

┌─────────────────────────────────────────────────────────────────────────┐
│                    TRADITIONAL DATA PATH (MANY COPIES)                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Network Card (NIC)                                                    │
│        │                                                                │
│        ▼ COPY 1: NIC DMA → Kernel buffer                                │
│   ┌─────────────────┐                                                   │
│   │  Kernel Buffer  │                                                   │
│   └────────┬────────┘                                                   │
│            │                                                            │
│            ▼ COPY 2: Kernel → Userspace (recv syscall)                  │
│   ┌─────────────────┐                                                   │
│   │ Userspace Buffer│                                                   │
│   └────────┬────────┘                                                   │
│            │                                                            │
│            ▼ COPY 3: Parse → Application struct                         │
│   ┌─────────────────┐                                                   │
│   │ Application Data│                                                   │
│   └─────────────────┘                                                   │
│                                                                         │
│   Each copy: ~100ns + cache pollution + memory bandwidth                │
│   At 10 Gbps: 1M packets/s × 3 copies = BOTTLENECK                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Solution: Zero-Copy Techniques

┌─────────────────────────────────────────────────────────────────────────┐
│                    ZERO-COPY DATA PATH                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Technique 1: mmap() + sendfile()                                      │
│   ─────────────────────────────────                                     │
│   File → Kernel buffer → NIC (no userspace copy!)                       │
│   Use case: Static file serving (Nginx uses this)                       │
│                                                                         │
│   Technique 2: MSG_ZEROCOPY (Linux 4.14+)                               │
│   ────────────────────────────────────────                              │
│   Userspace buffer registered with kernel                               │
│   send() uses buffer directly, no copy                                  │
│   Use case: Large message sends                                         │
│                                                                         │
│   Technique 3: io_uring (Linux 5.1+)                                    │
│   ─────────────────────────────────                                     │
│   Ring buffers shared between kernel and userspace                      │
│   No syscall overhead for I/O submission                                │
│   Use case: Extreme performance (millions ops/s)                        │
│                                                                         │
│   Technique 4: DPDK/XDP (Kernel Bypass)                                 │
│   ────────────────────────────────────                                  │
│   NIC DMA → Userspace directly (bypasses kernel!)                       │
│   Use case: HFT, Network Functions Virtualization                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
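
Technique 1 can be sketched with `sendfile(2)`: the kernel streams page-cache pages of a file straight into the socket, with no userspace buffer involved. A minimal sketch (the loop and helper name are ours):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

// Stream `count` bytes of a file into a socket without ever copying
// them through userspace: the kernel moves page-cache pages directly.
// Returns the number of bytes sent, or -1 on error.
ssize_t send_file_zero_copy(int sock_fd, int file_fd, size_t count) {
    off_t offset = 0;   // sendfile advances this for us
    size_t total = 0;
    while (total < count) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, count - total);
        if (n < 0) return -1;          // error (check errno)
        if (n == 0) break;             // EOF reached early
        total += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(total);
}
```

This is the same path Nginx's `sendfile on;` directive uses for static files.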

MSG_ZEROCOPY Example

cpp
#include <sys/socket.h>
#include <linux/errqueue.h>

// Enable zero-copy mode (Linux 4.14+). Check the return value:
// older kernels reject SO_ZEROCOPY.
int one = 1;
setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

// Send with zero-copy
char buffer[4096];
send(fd, buffer, sizeof(buffer), MSG_ZEROCOPY);

// Important: buffer must remain valid until the kernel signals completion!
// Check completion via the socket error queue (recvmsg with MSG_ERRQUEUE)

Memory Pool Pattern

Avoid malloc() in the hot path with an object pool:

cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <stack>

template<typename T, size_t PoolSize = 1024>
class ObjectPool {
public:
    ObjectPool() {
        for (size_t i = 0; i < PoolSize; ++i) {
            free_list_.push(&objects_[i]);
        }
    }
    
    T* acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_list_.empty()) {
            return nullptr;  // Pool exhausted
        }
        T* obj = free_list_.top();
        free_list_.pop();
        return obj;
    }
    
    void release(T* obj) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_list_.push(obj);
    }

private:
    std::array<T, PoolSize> objects_;
    std::stack<T*> free_list_;
    std::mutex mutex_;
};

// Usage
ObjectPool<Session> session_pool;

void handle_connection(tcp::socket socket) {
    Session* session = session_pool.acquire();
    if (!session) {
        // Pool exhausted - reject connection
        return;
    }
    
    // ... use session ...
    
    session_pool.release(session);
}
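
Calling release() manually leaks pool slots on early returns and exceptions. A common remedy is an RAII handle built on std::unique_ptr with a custom deleter; a sketch (the pool is repeated in compact form, with an extra available() helper added for demonstration, so the block compiles on its own):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <stack>

// Compact copy of the ObjectPool above (repeated for self-containment),
// plus an available() helper used only to demonstrate the RAII behavior
template <typename T, size_t PoolSize = 1024>
class ObjectPool {
public:
    ObjectPool() {
        for (size_t i = 0; i < PoolSize; ++i) free_list_.push(&objects_[i]);
    }
    T* acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_list_.empty()) return nullptr;
        T* obj = free_list_.top();
        free_list_.pop();
        return obj;
    }
    void release(T* obj) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_list_.push(obj);
    }
    size_t available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_list_.size();
    }
private:
    std::array<T, PoolSize> objects_;
    std::stack<T*> free_list_;
    std::mutex mutex_;
};

// RAII handle: the object goes back to the pool automatically on scope
// exit, even on early return or exception
template <typename T, size_t N>
auto make_pooled(ObjectPool<T, N>& pool) {
    auto deleter = [&pool](T* obj) { if (obj) pool.release(obj); };
    return std::unique_ptr<T, decltype(deleter)>(pool.acquire(), deleter);
}
```

With this, `handle_connection` cannot forget to release: the handle's destructor does it.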

Lock-Free Version

cpp
#include <atomic>
#include <array>
#include <cstddef>

template<typename T, size_t PoolSize = 1024>
class LockFreePool {
public:
    LockFreePool() {
        for (size_t i = 0; i < PoolSize - 1; ++i) {
            nodes_[i].next = i + 1;
        }
        nodes_[PoolSize - 1].next = -1;  // End of list
        head_.store(0);
    }
    
    T* acquire() {
        int old_head = head_.load(std::memory_order_acquire);
        int new_head;
        do {
            if (old_head == -1) return nullptr;  // Pool exhausted
            new_head = nodes_[old_head].next;
            // NOTE: this simple freelist has the classic ABA hazard --
            // another thread may pop and re-push this node between the
            // load and the CAS. Production code adds a generation tag.
        } while (!head_.compare_exchange_weak(old_head, new_head,
                                               std::memory_order_acquire,
                                               std::memory_order_acquire));
        return &nodes_[old_head].data;
    }
    
    void release(T* ptr) {
        // Valid because data is the first member of Node (standard layout)
        int index = static_cast<int>(reinterpret_cast<Node*>(ptr) - nodes_.data());
        int old_head = head_.load(std::memory_order_relaxed);
        do {
            nodes_[index].next = old_head;
            // compare_exchange_weak reloads old_head on failure
        } while (!head_.compare_exchange_weak(old_head, index,
                                               std::memory_order_release,
                                               std::memory_order_relaxed));
    }

private:
    struct Node {
        T data;
        int next;
    };
    
    std::array<Node, PoolSize> nodes_;
    std::atomic<int> head_;
};
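
One caveat worth spelling out: between the load of `head_` and the CAS, another thread can pop and re-push the same node, so the CAS succeeds against a stale `next` (the ABA problem). The standard mitigation packs a generation counter next to the index in a single 64-bit word, so a recycled node never compares equal to its old value. A sketch of the packing only (names are ours):

```cpp
#include <cassert>
#include <cstdint>

// ABA-resistant head word: pack {generation, index} into 64 bits.
// The generation is bumped on every pop, so a CAS cannot succeed
// against a head that was popped and re-pushed in the meantime.
struct TaggedHead {
    static constexpr uint64_t kIndexBits = 32;

    static uint64_t pack(uint32_t gen, uint32_t index) {
        return (static_cast<uint64_t>(gen) << kIndexBits) | index;
    }
    static uint32_t index(uint64_t word) {
        return static_cast<uint32_t>(word);
    }
    static uint32_t gen(uint64_t word) {
        return static_cast<uint32_t>(word >> kIndexBits);
    }
};
```

The pool would then store `std::atomic<uint64_t> head_` and CAS whole packed words instead of bare indices.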

Load Testing (@[/load-test-sim])

gRPC Load Testing with ghz

bash
# Install ghz
go install github.com/bojand/ghz/cmd/ghz@latest

# Basic load test:
#   -c 100           -> 100 concurrent in-flight requests
#   -n 10000         -> 10,000 total requests
#   --connections 10 -> spread over 10 HTTP/2 connections
# (comments cannot follow a trailing backslash, so they live up here)
ghz --insecure \
    --proto auth.proto \
    --call hpn.auth.AuthService.Login \
    -d '{"username":"test","password":"test"}' \
    -c 100 \
    -n 10000 \
    --connections 10 \
    localhost:50051

Output Analysis

┌─────────────────────────────────────────────────────────────────────────┐
│                    GHZ OUTPUT EXAMPLE                                    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Summary:                                                              │
│     Count:        10000                                                 │
│     Total:        1.23 s                                                │
│     Slowest:      15.21 ms                                              │
│     Fastest:      0.28 ms                                               │
│     Average:      1.12 ms                                               │
│     Requests/sec: 8130.08                                               │
│                                                                         │
│   Response time histogram:                                              │
│     0.280  [1]     |                                                    │
│     1.000  [7823]  |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎           │
│     2.000  [1892]  |∎∎∎∎∎∎∎∎∎∎                                          │
│     5.000  [272]   |∎                                                   │
│     15.21  [12]    |                                                    │
│                                                                         │
│   Latency distribution:                                                 │
│     10% in 0.52 ms                                                      │
│     25% in 0.71 ms                                                      │
│     50% in 0.98 ms      ← p50 (median)                                  │
│     75% in 1.31 ms                                                      │
│     90% in 1.89 ms      ← p90                                           │
│     95% in 2.45 ms      ← p95 (SLA target)                              │
│     99% in 5.12 ms      ← p99 (tail latency)                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

HTTP Load Testing with wrk

bash
# Install wrk
sudo apt install wrk

# Basic test
wrk -t12 -c400 -d30s http://localhost:8080/api/health

# With Lua script for POST
wrk -t12 -c400 -d30s -s post.lua http://localhost:8080/api/login

lua
-- post.lua
wrk.method = "POST"
wrk.body   = '{"username":"test","password":"test"}'
wrk.headers["Content-Type"] = "application/json"

Benchmark Targets

┌─────────────────────────────────────────────────────────────────────────┐
│                    LATENCY TARGETS BY SYSTEM TYPE                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   System Type              p50        p95        p99        Target      │
│   ────────────────────     ───────    ───────    ───────    ──────────  │
│   HFT Trading Engine       < 10µs     < 50µs     < 100µs    Microsecs   │
│   Game Server              < 1ms      < 5ms      < 10ms     < 16ms      │
│   HPN Tunnel               < 1ms      < 2ms      < 5ms      Low jitter  │
│   REST API                 < 50ms     < 200ms    < 500ms    Sub-second  │
│   Batch Processing         < 1s       < 5s       < 30s      Minutes OK  │
│                                                                         │
│   ⚠️ WARNING: p99 latency often 10x worse than p50!                    │
│   Always measure AND optimize tail latencies.                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
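
Those percentiles have to be computed from the full sample distribution; an average hides the tail entirely. A minimal nearest-rank sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile of a latency sample (e.g. milliseconds).
// p is in (0, 100]; the vector is taken by value so the caller's
// sample order is untouched.
double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    size_t rank = static_cast<size_t>(std::ceil(p / 100.0 * samples.size()));
    if (rank == 0) rank = 1;   // guard very small p
    return samples[rank - 1];
}
```

Recomputing p50/p95/p99 from raw samples like this lets you track tail latency over time, not just per load-test run.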

Security Patterns (@[/security-scan])

Payload Size Limits

cpp
// gRPC server configuration
grpc::ServerBuilder builder;

// Limit message sizes (DDoS protection)
builder.SetMaxReceiveMessageSize(4 * 1024 * 1024);  // 4MB max
builder.SetMaxSendMessageSize(4 * 1024 * 1024);     // 4MB max

// Limit connection lifetime (forces periodic reconnects, which helps
// load rebalancing; note this is connection aging, not rate limiting)
builder.SetOption(
    grpc::MakeChannelArgumentOption(
        "grpc.max_connection_age_ms", 300000));  // 5 min max

Protobuf Validation

cpp
// Custom validation before processing
grpc::Status ValidateLoginRequest(const LoginRequest& request) {
    // Size checks (prevent memory attacks)
    if (request.username().size() > 128) {
        return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT,
                            "Username too long");
    }
    
    if (request.password().size() > 256) {
        return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT,
                            "Password too long");
    }
    
    // Character validation (prevent injection).
    // Cast to unsigned char: std::isalnum is undefined for negative chars.
    for (char c : request.username()) {
        if (!std::isalnum(static_cast<unsigned char>(c)) &&
            c != '_' && c != '-') {
            return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT,
                                "Invalid character in username");
        }
    }
    
    return grpc::Status::OK;
}

// In handler
grpc::Status Login(ServerContext* context,
                   const LoginRequest* request,
                   LoginResponse* response) override {
    
    auto validation = ValidateLoginRequest(*request);
    if (!validation.ok()) {
        return validation;
    }
    
    // ... proceed with login ...
}

TLS/SSL Configuration

cpp
// Server with TLS.
// LoadFile is a small helper (not shown) that reads a file into a std::string.
grpc::SslServerCredentialsOptions ssl_opts;
ssl_opts.pem_root_certs = "";  // Empty: client certs are not verified.
                               // Set to the client CA bundle for mutual TLS.

grpc::SslServerCredentialsOptions::PemKeyCertPair key_cert;
key_cert.private_key = LoadFile("server.key");
key_cert.cert_chain = LoadFile("server.crt");
ssl_opts.pem_key_cert_pairs.push_back(key_cert);

auto creds = grpc::SslServerCredentials(ssl_opts);

grpc::ServerBuilder builder;
builder.AddListeningPort("0.0.0.0:50051", creds);

// Client with TLS
grpc::SslCredentialsOptions client_ssl;
client_ssl.pem_root_certs = LoadFile("ca.crt");

auto channel = grpc::CreateChannel(
    "server.example.com:50051",
    grpc::SslCredentials(client_ssl));

┌─────────────────────────────────────────────────────────────────────────┐
│                    TLS CONFIGURATION CHECKLIST                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ✅ REQUIRED (Non-negotiable)                                           │
│   ─────────────────────────────                                         │
│   • TLS 1.2 minimum (TLS 1.3 preferred)                                 │
│   • Strong cipher suites (AES-256-GCM, ChaCha20-Poly1305)              │
│   • Certificate validation enabled                                      │
│   • Private key protected (file permissions 600)                        │
│                                                                         │
│   🔒 RECOMMENDED                                                         │
│   ──────────────                                                        │
│   • Mutual TLS (mTLS) for internal services                             │
│   • OCSP stapling for certificate revocation                            │
│   • Certificate pinning for mobile clients                              │
│   • Short-lived certificates (rotate every 90 days)                     │
│                                                                         │
│   ❌ NEVER                                                               │
│   ────────                                                              │
│   • Never use InsecureServerCredentials() in production                 │
│   • Never disable certificate validation                                │
│   • Never hardcode certificates in source code                          │
│   • Never use self-signed certs in production                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Rate Limiting

cpp
#include <chrono>
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>

class RateLimiter {
public:
    RateLimiter(int max_requests, std::chrono::seconds window)
        : max_requests_(max_requests), window_(window) {}
    
    bool allow(const std::string& client_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        
        auto now = std::chrono::steady_clock::now();
        auto& bucket = buckets_[client_id];
        
        // Clean old entries
        while (!bucket.empty() && 
               now - bucket.front() > window_) {
            bucket.pop_front();
        }
        
        if (bucket.size() >= static_cast<size_t>(max_requests_)) {
            return false;  // Rate limited
        }
        
        bucket.push_back(now);
        return true;
    }

private:
    int max_requests_;
    std::chrono::seconds window_;
    std::unordered_map<std::string, 
        std::deque<std::chrono::steady_clock::time_point>> buckets_;
    std::mutex mutex_;
};

// Usage in gRPC interceptor
class RateLimitInterceptor : public grpc::experimental::Interceptor {
public:
    RateLimitInterceptor(RateLimiter& limiter) : limiter_(limiter) {}
    
    void Intercept(grpc::experimental::InterceptorBatchMethods* methods) {
        // Check at the earliest server-side hook point, before the
        // handler runs
        if (methods->QueryInterceptionHookPoint(
                grpc::experimental::InterceptionHookPoints::
                    POST_RECV_INITIAL_METADATA)) {
            
            // GetClientIP: helper (not shown) that extracts the peer address
            std::string peer = GetClientIP(methods);
            
            if (!limiter_.allow(peer)) {
                // Rejecting from an interceptor is awkward in the current
                // experimental API; a simpler alternative is to check the
                // limiter at the top of each handler and return
                // RESOURCE_EXHAUSTED
                methods->FailHijackedRecvMessage();
                methods->FailHijackedSendMessage();
                return;
            }
        }
        methods->Proceed();
    }

private:
    RateLimiter& limiter_;
};
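
The sliding-window limiter above stores one timestamp per request, so its memory grows with traffic. A token bucket gives O(1) state per client at the cost of slightly coarser accounting; a sketch (the parameters are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Token bucket: refills `rate` tokens per second up to `burst`;
// each allowed request consumes one token. O(1) state per client.
class TokenBucket {
public:
    TokenBucket(double rate, double burst)
        : rate_(rate), burst_(burst), tokens_(burst),
          last_(std::chrono::steady_clock::now()) {}

    bool allow() {
        auto now = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double>(now - last_).count();
        last_ = now;
        // Refill proportionally to elapsed time, capped at the burst size
        tokens_ = std::min(burst_, tokens_ + elapsed * rate_);
        if (tokens_ < 1.0) return false;  // rate limited
        tokens_ -= 1.0;
        return true;
    }

private:
    double rate_;
    double burst_;
    double tokens_;
    std::chrono::steady_clock::time_point last_;
};
```

One bucket per client (e.g. in the same unordered_map keyed by peer address) replaces the per-request deque.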

Graceful Shutdown

cpp
#include <asio.hpp>   // standalone Asio; or <boost/asio.hpp> with an alias
#include <atomic>
#include <chrono>
#include <csignal>
#include <functional>
#include <iostream>
#include <memory>

std::atomic<bool> shutdown_requested{false};

void signal_handler(int signal) {
    if (signal == SIGINT || signal == SIGTERM) {
        shutdown_requested.store(true);
    }
}

int main() {
    std::signal(SIGINT, signal_handler);
    std::signal(SIGTERM, signal_handler);
    
    asio::io_context io;
    Server server(io, 8080);
    
    // Shutdown checker
    asio::steady_timer shutdown_timer(io);
    std::function<void()> check_shutdown = [&]() {
        if (shutdown_requested.load()) {
            std::cout << "Shutdown requested, draining..." << std::endl;
            
            // Stop accepting new connections
            server.stop_accepting();
            
            // Wait for existing connections (grace period). The timer
            // must outlive this lambda, so allocate it on the heap and
            // keep it alive by capturing the shared_ptr in the handler.
            auto drain_timer = std::make_shared<asio::steady_timer>(
                io, std::chrono::seconds(30));
            drain_timer->async_wait([&io, drain_timer](auto) {
                io.stop();
            });
        } else {
            shutdown_timer.expires_after(std::chrono::milliseconds(100));
            shutdown_timer.async_wait([&](auto) { check_shutdown(); });
        }
    };
    
    check_shutdown();
    io.run();
    
    std::cout << "Server shutdown complete" << std::endl;
}

Summary: Production Checklist

┌─────────────────────────────────────────────────────────────────────────┐
│              PRODUCTION NETWORKING CHECKLIST                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   🔒 SECURITY                                                           │
│   ────────────                                                          │
│   □ TLS/SSL enabled (no plain TCP)                                      │
│   □ Certificate rotation automated                                      │
│   □ Input validation on all messages                                    │
│   □ Payload size limits configured                                      │
│   □ Rate limiting per client                                            │
│                                                                         │
│   🏎️ PERFORMANCE                                                        │
│   ────────────                                                          │
│   □ Object pools for hot path allocations                               │
│   □ Zero-copy where possible                                            │
│   □ Async/non-blocking I/O                                              │
│   □ Connection pooling for clients                                      │
│   □ Batch processing for high throughput                                │
│                                                                         │
│   📊 OBSERVABILITY                                                       │
│   ───────────────                                                       │
│   □ Latency histograms (p50, p95, p99)                                  │
│   □ Error rate tracking                                                 │
│   □ Connection count monitoring                                         │
│   □ Health check endpoints                                              │
│                                                                         │
│   🛡️ RESILIENCE                                                         │
│   ─────────────                                                         │
│   □ Graceful shutdown (drain connections)                               │
│   □ Timeout on all operations                                           │
│   □ Circuit breaker for downstream calls                                │
│   □ Retry with exponential backoff                                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

🎉 Module Complete!

You have covered all of High-Performance Networking:

  1. 📦 Protobuf serialization (21x faster than JSON)
  2. 🔌 gRPC framework (modern RPC for microservices)
  3. ⚡ Async I/O (Boost.Asio, C++20 Coroutines)
  4. 🛡️ Production patterns (Zero-copy, TLS, Rate limiting)

Next step: return to the C++ Roadmap to continue with the other modules!