🛡️ Battle-Tested Production Patterns
Survival guide: techniques used in HPN Tunnel, trading engines, and game servers to survive under heavy load.
HPN Engineering Insight
💡 HPN TUNNEL PRODUCTION SECRETS
In HPN Tunnel, we apply the following principles:
- Zero malloc in the hot path: pre-allocate all buffers
- Zero-copy wherever possible: data is never copied between kernel and userspace
- Lock-free data structures: avoid mutex contention
- Batch processing: handle many packets per syscall
- CPU pinning: threads are pinned to specific CPU cores
These techniques let HPN Tunnel sustain < 1 ms latency at millions of packets per second.
Zero-Copy Networking (@[/perf-profile])
The Problem: Copy Overhead
┌─────────────────────────────────────────────────────────────────────────┐
│ TRADITIONAL DATA PATH (MANY COPIES) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Network Card (NIC) │
│ │ │
│ ▼ COPY 1: NIC DMA → Kernel buffer │
│ ┌─────────────────┐ │
│ │ Kernel Buffer │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ COPY 2: Kernel → Userspace (recv syscall) │
│ ┌─────────────────┐ │
│ │ Userspace Buffer│ │
│ └────────┬────────┘ │
│ │ │
│ ▼ COPY 3: Parse → Application struct │
│ ┌─────────────────┐ │
│ │ Application Data│ │
│ └─────────────────┘ │
│ │
│ Each copy: ~100ns + cache pollution + memory bandwidth │
│ At 10 Gbps: 1M packets/s × 3 copies = BOTTLENECK │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Solution: Zero-Copy Techniques
┌─────────────────────────────────────────────────────────────────────────┐
│ ZERO-COPY DATA PATH │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Technique 1: mmap() + sendfile() │
│ ───────────────────────────────── │
│ File → Kernel buffer → NIC (no userspace copy!) │
│ Use case: Static file serving (Nginx uses this) │
│ │
│ Technique 2: MSG_ZEROCOPY (Linux 4.14+) │
│ ──────────────────────────────────────── │
│ Userspace buffer registered with kernel │
│ send() uses buffer directly, no copy │
│ Use case: Large message sends │
│ │
│ Technique 3: io_uring (Linux 5.1+) │
│ ───────────────────────────────── │
│ Ring buffers shared between kernel and userspace │
│ No syscall overhead for I/O submission │
│ Use case: Extreme performance (millions ops/s) │
│ │
│ Technique 4: DPDK/XDP (Kernel Bypass) │
│ ──────────────────────────────────── │
│ NIC DMA → Userspace directly (bypasses kernel!) │
│ Use case: HFT, Network Functions Virtualization │
│ │
└─────────────────────────────────────────────────────────────────────────┘

MSG_ZEROCOPY Example
cpp
#include <sys/socket.h>
#include <linux/errqueue.h>
// Enable zero-copy mode (Linux 4.14+; check the result on older kernels)
int one = 1;
if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) != 0) {
// kernel without SO_ZEROCOPY support: fall back to a regular copying send()
}
// Send with zero-copy
char buffer[4096];
send(fd, buffer, sizeof(buffer), MSG_ZEROCOPY);
// Important: Buffer must remain valid until notification!
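// A minimal sketch of draining one completion notification from the socket's
// error queue (field meanings per linux/errqueue.h; error handling elided):
msghdr emsg = {};
char control[128];
emsg.msg_control = control;
emsg.msg_controllen = sizeof(control);
if (recvmsg(fd, &emsg, MSG_ERRQUEUE) >= 0) {
    if (cmsghdr* cm = CMSG_FIRSTHDR(&emsg)) {
        auto* serr = reinterpret_cast<sock_extended_err*>(CMSG_DATA(cm));
        if (serr->ee_errno == 0 && serr->ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
            // [serr->ee_info, serr->ee_data] is the range of completed
            // zero-copy sends; those buffers may now be reused
        }
    }
}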
// Check completion via the error queue (recvmsg with MSG_ERRQUEUE)

Memory Pool Pattern
Avoid malloc() in the hot path by using an object pool:
cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <stack>
template<typename T, size_t PoolSize = 1024>
class ObjectPool {
public:
ObjectPool() {
for (size_t i = 0; i < PoolSize; ++i) {
free_list_.push(&objects_[i]);
}
}
T* acquire() {
std::lock_guard<std::mutex> lock(mutex_);
if (free_list_.empty()) {
return nullptr; // Pool exhausted
}
T* obj = free_list_.top();
free_list_.pop();
return obj;
}
void release(T* obj) {
std::lock_guard<std::mutex> lock(mutex_);
free_list_.push(obj);
}
private:
std::array<T, PoolSize> objects_;
std::stack<T*> free_list_;
std::mutex mutex_;
};
// Usage
ObjectPool<Session> session_pool;
void handle_connection(tcp::socket socket) {
Session* session = session_pool.acquire();
if (!session) {
// Pool exhausted - reject connection
return;
}
// ... use session ...
session_pool.release(session);
}

Lock-Free Version
cpp
#include <array>
#include <atomic>
#include <cstddef>
// NOTE: this Treiber-stack free list is a teaching sketch; it is exposed to
// the ABA problem if an index can be popped and re-pushed while another
// thread's CAS is in flight. Production versions pack a generation tag
// into the head word.
template<typename T, size_t PoolSize = 1024>
class LockFreePool {
public:
LockFreePool() {
for (size_t i = 0; i < PoolSize - 1; ++i) {
nodes_[i].next = i + 1;
}
nodes_[PoolSize - 1].next = -1; // End of list
head_.store(0);
}
T* acquire() {
int old_head;
int new_head;
do {
old_head = head_.load(std::memory_order_acquire);
if (old_head == -1) return nullptr;
new_head = nodes_[old_head].next;
} while (!head_.compare_exchange_weak(old_head, new_head,
std::memory_order_acq_rel,
std::memory_order_acquire));
return &nodes_[old_head].data;
}
void release(T* ptr) {
// data is the first member of Node, so the T* maps back to its Node
int index = static_cast<int>(reinterpret_cast<Node*>(ptr) - nodes_.data());
int old_head;
do {
old_head = head_.load(std::memory_order_acquire);
nodes_[index].next = old_head;
} while (!head_.compare_exchange_weak(old_head, index,
std::memory_order_release,
std::memory_order_relaxed));
}
private:
struct Node {
T data;
int next;
};
std::array<Node, PoolSize> nodes_;
std::atomic<int> head_;
};

Load Testing (@[/load-test-sim])
gRPC Load Testing with ghz
bash
# Install ghz
go install github.com/bojand/ghz/cmd/ghz@latest
# Basic load test: -c 100 concurrent workers, -n 10000 total requests,
# multiplexed over 10 TCP connections
ghz --insecure \
--proto auth.proto \
--call hpn.auth.AuthService.Login \
-d '{"username":"test","password":"test"}' \
-c 100 \
-n 10000 \
--connections 10 \
localhost:50051

Output Analysis
┌─────────────────────────────────────────────────────────────────────────┐
│ GHZ OUTPUT EXAMPLE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Summary: │
│ Count: 10000 │
│ Total: 1.23 s │
│ Slowest: 15.21 ms │
│ Fastest: 0.28 ms │
│ Average: 1.12 ms │
│ Requests/sec: 8130.08 │
│ │
│ Response time histogram: │
│ 0.280 [1] | │
│ 1.000 [7823] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ │
│ 2.000 [1892] |∎∎∎∎∎∎∎∎∎∎ │
│ 5.000 [272] |∎ │
│ 15.21 [12] | │
│ │
│ Latency distribution: │
│ 10% in 0.52 ms │
│ 25% in 0.71 ms │
│ 50% in 0.98 ms ← p50 (median) │
│ 75% in 1.31 ms │
│ 90% in 1.89 ms ← p90 │
│ 95% in 2.45 ms ← p95 (SLA target) │
│ 99% in 5.12 ms ← p99 (tail latency) │
│ │
└─────────────────────────────────────────────────────────────────────────┘

HTTP Load Testing with wrk
bash
# Install wrk
sudo apt install wrk
# Basic test
wrk -t12 -c400 -d30s http://localhost:8080/api/health
# With Lua script for POST
wrk -t12 -c400 -d30s -s post.lua http://localhost:8080/api/login

lua
-- post.lua
wrk.method = "POST"
wrk.body = '{"username":"test","password":"test"}'
wrk.headers["Content-Type"] = "application/json"

Benchmark Targets
┌─────────────────────────────────────────────────────────────────────────┐
│ LATENCY TARGETS BY SYSTEM TYPE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ System Type p50 p95 p99 Target │
│ ──────────────────── ─────── ─────── ─────── ────────── │
│ HFT Trading Engine < 10µs < 50µs < 100µs Microsecs │
│ Game Server < 1ms < 5ms < 10ms < 16ms │
│ HPN Tunnel < 1ms < 2ms < 5ms Low jitter │
│ REST API < 50ms < 200ms < 500ms Sub-second │
│ Batch Processing < 1s < 5s < 30s Minutes OK │
│ │
│ ⚠️ WARNING: p99 latency often 10x worse than p50! │
│ Always measure AND optimize tail latencies. │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Security Patterns (@[/security-scan])
Payload Size Limits
cpp
// gRPC server configuration
grpc::ServerBuilder builder;
// Limit message sizes (DDoS protection)
builder.SetMaxReceiveMessageSize(4 * 1024 * 1024); // 4MB max
builder.SetMaxSendMessageSize(4 * 1024 * 1024); // 4MB max
// Limit connection lifetime so load can rebalance (not true rate limiting)
builder.SetOption(
grpc::MakeChannelArgumentOption(
"grpc.max_connection_age_ms", 300000)); // 5 min max

Protobuf Validation
cpp
#include <cctype>
// Custom validation before processing
grpc::Status ValidateLoginRequest(const LoginRequest& request) {
// Size checks (prevent memory attacks)
if (request.username().size() > 128) {
return grpc::Status(grpc::INVALID_ARGUMENT,
"Username too long");
}
if (request.password().size() > 256) {
return grpc::Status(grpc::INVALID_ARGUMENT,
"Password too long");
}
// Character validation (prevent injection)
for (unsigned char c : request.username()) { // unsigned char: isalnum is UB on negative values
if (!std::isalnum(c) && c != '_' && c != '-') {
return grpc::Status(grpc::INVALID_ARGUMENT,
"Invalid character in username");
}
}
return grpc::Status::OK;
}
// In handler
grpc::Status Login(ServerContext* context,
const LoginRequest* request,
LoginResponse* response) override {
auto validation = ValidateLoginRequest(*request);
if (!validation.ok()) {
return validation;
}
// ... proceed with login ...
}

TLS/SSL Configuration
cpp
// Server with TLS
grpc::SslServerCredentialsOptions ssl_opts;
ssl_opts.pem_root_certs = ""; // empty: no client-cert check; set to a client CA bundle for mutual TLS
grpc::SslServerCredentialsOptions::PemKeyCertPair key_cert;
key_cert.private_key = LoadFile("server.key");
key_cert.cert_chain = LoadFile("server.crt");
ssl_opts.pem_key_cert_pairs.push_back(key_cert);
auto creds = grpc::SslServerCredentials(ssl_opts);
grpc::ServerBuilder builder;
builder.AddListeningPort("0.0.0.0:50051", creds);
// Client with TLS
grpc::SslCredentialsOptions client_ssl;
client_ssl.pem_root_certs = LoadFile("ca.crt");
auto channel = grpc::CreateChannel(
"server.example.com:50051",
grpc::SslCredentials(client_ssl));

┌─────────────────────────────────────────────────────────────────────────┐
│ TLS CONFIGURATION CHECKLIST │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ✅ REQUIRED (Non-negotiable) │
│ ───────────────────────────── │
│ • TLS 1.2 minimum (TLS 1.3 preferred) │
│ • Strong cipher suites (AES-256-GCM, ChaCha20-Poly1305) │
│ • Certificate validation enabled │
│ • Private key protected (file permissions 600) │
│ │
│ 🔒 RECOMMENDED │
│ ────────────── │
│ • Mutual TLS (mTLS) for internal services │
│ • OCSP stapling for certificate revocation │
│ • Certificate pinning for mobile clients │
│ • Short-lived certificates (rotate every 90 days) │
│ │
│ ❌ NEVER │
│ ──────── │
│ • Never use InsecureServerCredentials() in production │
│ • Never disable certificate validation │
│ • Never hardcode certificates in source code │
│ • Never use self-signed certs in production │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Rate Limiting
cpp
#include <chrono>
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>
class RateLimiter {
public:
RateLimiter(size_t max_requests, std::chrono::seconds window)
: max_requests_(max_requests), window_(window) {}
bool allow(const std::string& client_id) {
std::lock_guard<std::mutex> lock(mutex_);
auto now = std::chrono::steady_clock::now();
auto& bucket = buckets_[client_id];
// Clean old entries
while (!bucket.empty() &&
now - bucket.front() > window_) {
bucket.pop_front();
}
if (bucket.size() >= max_requests_) {
return false; // Rate limited
}
bucket.push_back(now);
return true;
}
private:
size_t max_requests_;
std::chrono::seconds window_;
std::unordered_map<std::string,
std::deque<std::chrono::steady_clock::time_point>> buckets_;
std::mutex mutex_;
};
// Usage in gRPC interceptor
class RateLimitInterceptor : public grpc::experimental::Interceptor {
public:
RateLimitInterceptor(RateLimiter& limiter) : limiter_(limiter) {}
void Intercept(grpc::experimental::InterceptorBatchMethods* methods) {
if (methods->QueryInterceptionHookPoint(
grpc::experimental::InterceptionHookPoints::
PRE_SEND_INITIAL_METADATA)) {
std::string peer = GetClientIP(methods); // helper (not shown): extract peer address
if (!limiter_.allow(peer)) {
// Simplified sketch: a production interceptor would hijack the RPC
// and return RESOURCE_EXHAUSTED rather than fail messages directly
methods->FailHijackedRecvMessage();
methods->FailHijackedSendMessage();
return;
}
}
methods->Proceed();
}
private:
RateLimiter& limiter_;
};

Graceful Shutdown
cpp
#include <asio.hpp>
#include <atomic>
#include <chrono>
#include <csignal>
#include <functional>
#include <iostream>
std::atomic<bool> shutdown_requested{false};
void signal_handler(int signal) {
if (signal == SIGINT || signal == SIGTERM) {
shutdown_requested.store(true);
}
}
int main() {
std::signal(SIGINT, signal_handler);
std::signal(SIGTERM, signal_handler);
asio::io_context io;
Server server(io, 8080);
// Shutdown checker
asio::steady_timer shutdown_timer(io);
std::function<void()> check_shutdown = [&]() {
if (shutdown_requested.load()) {
std::cout << "Shutdown requested, draining..." << std::endl;
// Stop accepting new connections
server.stop_accepting();
// Wait for existing connections (grace period).
// Reuse shutdown_timer: a timer local to this lambda would be
// destroyed on return, cancelling its own async_wait.
shutdown_timer.expires_after(std::chrono::seconds(30));
shutdown_timer.async_wait([&](auto) {
io.stop();
});
} else {
shutdown_timer.expires_after(std::chrono::milliseconds(100));
shutdown_timer.async_wait([&](auto) { check_shutdown(); });
}
};
check_shutdown();
io.run();
std::cout << "Server shutdown complete" << std::endl;
}

Summary: Production Checklist
┌─────────────────────────────────────────────────────────────────────────┐
│ PRODUCTION NETWORKING CHECKLIST │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 🔒 SECURITY │
│ ──────────── │
│ □ TLS/SSL enabled (no plain TCP) │
│ □ Certificate rotation automated │
│ □ Input validation on all messages │
│ □ Payload size limits configured │
│ □ Rate limiting per client │
│ │
│ 🏎️ PERFORMANCE │
│ ──────────── │
│ □ Object pools for hot path allocations │
│ □ Zero-copy where possible │
│ □ Async/non-blocking I/O │
│ □ Connection pooling for clients │
│ □ Batch processing for high throughput │
│ │
│ 📊 OBSERVABILITY │
│ ─────────────── │
│ □ Latency histograms (p50, p95, p99) │
│ □ Error rate tracking │
│ □ Connection count monitoring │
│ □ Health check endpoints │
│ │
│ 🛡️ RESILIENCE │
│ ───────────── │
│ □ Graceful shutdown (drain connections) │
│ □ Timeout on all operations │
│ □ Circuit breaker for downstream calls │
│ □ Retry with exponential backoff │
│ │
└─────────────────────────────────────────────────────────────────────────┘