📦 Serialization Deep Dive (Performance Critical)
The Cost of Data: why sizeof(struct) doesn't work across a network, and why Protobuf serialized 21x faster than JSON in the benchmark below.
Why not send a struct directly?
Many newcomers think: "I have a 24-byte struct, so I'll just send those 24 bytes over the socket."
cpp
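// ❌ DANGEROUS - don't do this!
#include <cstdint>
#include <sys/socket.h> // send() (POSIX)
struct LoginRequest {
int32_t user_id; // 4 bytes
char username[16]; // 16 bytes
int32_t flags; // 4 bytes
}; // sizeof = 24 bytes?
// Send the raw in-memory bytes over a connected socket
void send_login(int sock_fd, const LoginRequest& request) {
send(sock_fd, &request, sizeof(request), 0); // byte order, padding and layout of THIS machine go on the wire
}
Problem #1: Endianness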
┌─────────────────────────────────────────────────────────────────────────┐
│ ENDIANNESS HELL │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ int32_t value = 0x12345678; │
│ │
│ Little-Endian (x86, most ARM systems): 0x78 0x56 0x34 0x12 │
│ Big-Endian (network byte order, e.g. s390x, SPARC): 0x12 0x34 0x56 0x78 │
│ │
│ Raw bytes from an x86 host, read on a big-endian host: │
│ 0x12345678 becomes 0x78563412 (WRONG!) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Problem #2: Struct Padding
cpp
struct Example {
char a; // 1 byte
// 3 bytes padding (alignment)
int32_t b; // 4 bytes
char c; // 1 byte
// 3 bytes padding
};
// sizeof = 12 bytes on typical ABIs (x86-64, AArch64), NOT 6 bytes!
┌─────────────────────────────────────────────────────────────────────────┐
│ STRUCT PADDING VISUALIZATION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Memory Layout: │
│ ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐ │
│ │ a │PAD│PAD│PAD│ b │ b │ b │ b │ c │PAD│PAD│PAD│ │
│ └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘ │
│ 0 1 2 3 4 5 6 7 8 9 10 11 │
│ │
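│ This struct is 12 bytes with both gcc and MSVC on x86-64, but layout │
│ is decided by the ABI and compiler options, not by the C++ source: │
│ • 32-bit x86: gcc/Linux aligns double/int64_t to 4 inside structs, │
│ MSVC aligns them to 8 │
│ • #pragma pack / __attribute__((packed)) change padding entirely │
│ → Struct layout is NOT PORTABLE │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Problem #3: Versioning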
cpp
// Version 1
struct LoginRequest_v1 {
int32_t user_id;
char username[16];
};
// Version 2 - adds a field
struct LoginRequest_v2 {
int32_t user_id;
char username[16];
char email[32]; // NEW!
};
// Server v2 expects 52 bytes, Client v1 sends 20 → the server blocks waiting for 32 more bytes, misreads the next message, or CRASHES!
Solution: Serialization Formats
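Before turning to libraries, here is what fixing the three problems above (endianness, padding, versioning) by hand might look like. This is a minimal sketch built on a frame layout invented purely for illustration: `[u8 version][u16 big-endian payload length][payload]`. Integers are written big-endian with shifts, so the output is the same on every host. Fields are written one at a time, so no padding reaches the wire. The version byte and length prefix tell a v2 reader how many bytes belong to the message and whether `email` is present. The names `LoginRequestV2`, `encode_login`, `decode_login` and the `put_*`/`get_*` helpers are hypothetical and not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical message type; its in-memory layout never touches the wire.
struct LoginRequestV2 {
    std::int32_t user_id = 0;
    std::string username;
    std::string email;  // not present in version-1 frames
};

// Big-endian writers: shifts act on values, so the output is host-independent.
void put_u16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(v >> shift));
    }
}

// Strings go out as [u16 length][bytes]; assumes s.size() <= 65535.
void put_str(std::vector<std::uint8_t>& out, const std::string& s) {
    put_u16(out, static_cast<std::uint16_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::uint16_t get_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

std::uint32_t get_u32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Frame: [u8 version][u16 payload length][payload]. Fields are written one by
// one, so no padding bytes are sent. Assumes the payload fits in 65535 bytes.
std::vector<std::uint8_t> encode_login(const LoginRequestV2& r, std::uint8_t version) {
    std::vector<std::uint8_t> payload;
    put_u32(payload, static_cast<std::uint32_t>(r.user_id));
    put_str(payload, r.username);
    if (version >= 2) put_str(payload, r.email);

    std::vector<std::uint8_t> frame{version};
    put_u16(frame, static_cast<std::uint16_t>(payload.size()));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Returns nullopt if the frame is incomplete or malformed; never reads past buf.
std::optional<LoginRequestV2> decode_login(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 3) return std::nullopt;
    const std::uint8_t version = buf[0];
    const std::size_t len = get_u16(buf.data() + 1);
    if (buf.size() < 3 + len) return std::nullopt;  // rest of the frame not here yet

    const std::uint8_t* p = buf.data() + 3;
    const std::uint8_t* const end = p + len;
    auto read_str = [&](std::string& out) {
        if (end - p < 2) return false;
        const std::size_t n = get_u16(p);
        p += 2;
        if (static_cast<std::size_t>(end - p) < n) return false;
        out.assign(p, p + n);
        p += n;
        return true;
    };

    LoginRequestV2 r;
    if (end - p < 4) return std::nullopt;
    r.user_id = static_cast<std::int32_t>(get_u32(p));
    p += 4;
    if (!read_str(r.username)) return std::nullopt;
    if (version >= 2 && !read_str(r.email)) return std::nullopt;  // v1: email stays empty
    return r;
}
```

With this scheme a v2 server decoding a v1 frame gets an empty `email` instead of garbage, and a partially received frame yields `nullopt` instead of an over-read. The formats compared below automate the same ideas. Protobuf goes further by tagging every field individually instead of versioning the whole message.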
Comparison Table
| Format | Size | Speed | Human Readable | Schema | Versioning |
|---|---|---|---|---|---|
| Raw struct | Smallest | Fastest | ❌ | ❌ | ❌ |
| JSON | Large | Slow | ✅ | ❌ | ⚠️ |
| XML | Very large | Very slow | ✅ | ✅ (XSD) | ✅ |
| Protobuf | Small | Fast | ❌ | ✅ | ✅ |
| FlatBuffers | Small to medium (alignment padding, vtables) | Fastest reads (zero-copy) | ❌ | ✅ | ✅ |
| MessagePack | Small | Fast | ❌ | ❌ | ⚠️ |

⚠️ = possible only by convention (e.g. readers that ignore unknown keys).
🎯 HPN RECOMMENDATION
- External API (public-facing): JSON (for compatibility)
- Internal microservices: Protobuf (for performance)
- Extreme low-latency (HFT): FlatBuffers or a custom binary format
Protocol Buffers (Protobuf)
What is Protobuf?
Protobuf is a binary serialization format developed by Google, used internally since 2001 and open-sourced in 2008.
┌─────────────────────────────────────────────────────────────────────────┐
│ PROTOBUF WORKFLOW │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Define Schema (.proto file) │
│ ↓ │
│ 2. protoc compiler generates C++/Python/Go/... code │
│ ↓ │
│ 3. Use generated classes in your application │
│ ↓ │
│ 4. Serialize to binary → Send over network → Deserialize │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Installation
bash
# Ubuntu/Debian
sudo apt install protobuf-compiler libprotobuf-dev
# macOS
brew install protobuf
# Via Conan 2.x (Conan 1.x syntax: conan install protobuf/3.21.12@)
conan install --requires=protobuf/3.21.12 --build=missing
Lab: LoginRequest (Protobuf vs JSON)
Step 1: Define .proto file
protobuf
// auth.proto
syntax = "proto3";
package hpn.auth;
message LoginRequest {
string username = 1; // Field number 1
string password = 2; // Field number 2
optional string mfa_token = 3; // Explicit presence: has_mfa_token() (proto3 `optional`, protoc >= 3.15)
}
message LoginResponse {
enum Status {
STATUS_UNSPECIFIED = 0; // Enum default in proto3: a missing status must not decode as SUCCESS
SUCCESS = 1;
INVALID_CREDENTIALS = 2;
MFA_REQUIRED = 3;
ACCOUNT_LOCKED = 4;
}
Status status = 1;
string access_token = 2;
int64 expires_at = 3; // Unix timestamp
string error_message = 4;
}
Step 2: Compile to C++
bash
# Generate C++ code
protoc --cpp_out=. auth.proto
# Output:
# auth.pb.h - Header file
# auth.pb.cc - Implementation
Step 3: CMake Integration
cmake
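# CMakeLists.txt
cmake_minimum_required(VERSION 3.16)
project(auth_server CXX)
find_package(Protobuf REQUIRED)
# Generate auth.pb.h / auth.pb.cc into the build directory
protobuf_generate_cpp(PROTO_SRCS PROTO_HDRS auth.proto)
add_executable(auth_server
main.cpp
${PROTO_SRCS}
${PROTO_HDRS}
)
# Generated headers live in the build tree, so main.cpp needs it on the include path
target_include_directories(auth_server PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
target_link_libraries(auth_server PRIVATE
protobuf::libprotobuf
)
Step 4: Use in C++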
cpp
// main.cpp
#include "auth.pb.h"
#include <iostream>
#include <string>
int main() {
// Create message
hpn::auth::LoginRequest request;
request.set_username("hpn_user");
request.set_password("secure_password_123");
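// Serialize to binary (returns false on failure)
std::string binary_data;
if (!request.SerializeToString(&binary_data)) {
std::cerr << "Serialize failed\n";
return 1;
}
std::cout << "Protobuf size: " << binary_data.size() << " bytes\n";
// Output: Protobuf size: 31 bytes (username: 2 + 8, password: 2 + 19)
// Deserialize (returns false on malformed input)
hpn::auth::LoginRequest parsed;
if (!parsed.ParseFromString(binary_data)) {
std::cerr << "Parse failed\n";
return 1;
}
std::cout << "Username: " << parsed.username() << "\n";
return 0;
}
Size Comparison: Protobuf vs JSON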
cpp
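#include <nlohmann/json.hpp>
#include "auth.pb.h"
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
void benchmark() {
// === JSON ===
nlohmann::json json_request = {
{"username", "hpn_user"},
{"password", "secure_password_123"}
};
std::string json_str = json_request.dump();
std::cout << "JSON size: " << json_str.size() << " bytes\n";
// Output: JSON size: 56 bytes
// ({"password":"secure_password_123","username":"hpn_user"}; keys sorted, no whitespace)
// === Protobuf ===
hpn::auth::LoginRequest proto_request;
proto_request.set_username("hpn_user");
proto_request.set_password("secure_password_123");
std::string proto_str;
proto_request.SerializeToString(&proto_str);
std::cout << "Protobuf size: " << proto_str.size() << " bytes\n";
// Output: Protobuf size: 31 bytes
// === Speed Test ===
constexpr int ITERATIONS = 100000;
using Clock = std::chrono::steady_clock; // monotonic, suited to measuring intervals
std::size_t sink = 0; // consume every result so the loops can't be optimized away
auto to_ms = [](Clock::duration d) {
return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
};
// JSON serialization / deserialization
auto start = Clock::now();
for (int i = 0; i < ITERATIONS; ++i) sink += json_request.dump().size();
auto json_ser = Clock::now() - start;
start = Clock::now();
for (int i = 0; i < ITERATIONS; ++i) sink += nlohmann::json::parse(json_str).size();
auto json_de = Clock::now() - start;
// Protobuf serialization / deserialization
start = Clock::now();
for (int i = 0; i < ITERATIONS; ++i) {
std::string s;
proto_request.SerializeToString(&s);
sink += s.size();
}
auto proto_ser = Clock::now() - start;
start = Clock::now();
for (int i = 0; i < ITERATIONS; ++i) {
hpn::auth::LoginRequest parsed;
parsed.ParseFromString(proto_str);
sink += parsed.username().size();
}
auto proto_de = Clock::now() - start;
std::cout << "JSON serialize / deserialize: " << to_ms(json_ser) << " / " << to_ms(json_de) << " ms\n";
std::cout << "Protobuf serialize / deserialize: " << to_ms(proto_ser) << " / " << to_ms(proto_de) << " ms\n";
std::cout << "(checksum " << sink << ")\n";
}
Benchmark Results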
┌─────────────────────────────────────────────────────────────────────────┐
│ BENCHMARK: 100K SERIALIZATIONS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Metric JSON Protobuf Improvement │
│ ────────────────── ────────────── ────────────── ─────────────── │
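│ Size 56 bytes 31 bytes 1.81x smaller │
│ Serialize time 127 ms 6 ms 21x faster │
│ Deserialize time 142 ms 5 ms 28x faster │
│ Total time 269 ms 11 ms 24x faster │
│ │
│ Per request: 2.69 µs (JSON) vs 0.11 µs (Protobuf). At 1M req/s that is │
│ ~2.7 CPU-seconds per second (≈2.7 cores) vs ~0.11 → big CPU savings! │
│ Timings depend on CPU, compiler and library versions; re-measure. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Protobuf Wire Format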
Field Encoding
┌─────────────────────────────────────────────────────────────────────────┐
│ PROTOBUF WIRE FORMAT │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Each field encoded as: [Tag][Length (if needed)][Value] │
│ │
│ Tag = (field_number << 3) | wire_type, itself stored as a varint │
│ │
│ Wire Types: │
│ ─────────── │
│ 0 = Varint (int32, int64, uint32, uint64, sint32, sint64, bool, enum) │
│ 1 = 64-bit (fixed64, sfixed64, double) │
│ 2 = Length-delimited (string, bytes, embedded messages, packed repeated) │
│ 5 = 32-bit (fixed32, sfixed32, float) │
│ │
│ Example: username = "hpn" (field 1, string) │
│ ───────────────────────────────── │
│ 0x0A = Tag: (1 << 3) | 2 = 10 (field 1, wire type 2) │
│ 0x03 = Length (3 bytes) │
│ 0x68 0x70 0x6E = "hpn" in UTF-8 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Varint Encoding (Clever!)
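The tag and the length in the field encoding above are both varints. Each byte carries 7 payload bits, least-significant group first, and the high bit (0x80) is set on every byte except the last. The following is a minimal hand-written sketch of that rule plus a length-delimited string field built on top of it. It is an illustration, not the Protobuf library's implementation, and the names `encode_varint`, `decode_varint` and `append_string_field` are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append v as a base-128 varint: low 7 bits first, MSB set = "more bytes follow".
void encode_varint(Bytes& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Decode a varint starting at p and advance p past it.
// Returns nullopt if the input is truncated or longer than 10 bytes.
std::optional<std::uint64_t> decode_varint(const std::uint8_t*& p, const std::uint8_t* end) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64 && p < end; shift += 7) {
        const std::uint8_t byte = *p++;
        result |= std::uint64_t{byte & 0x7Fu} << shift;
        if ((byte & 0x80) == 0) return result;
    }
    return std::nullopt;
}

// Length-delimited field (wire type 2): [tag varint][length varint][raw bytes].
void append_string_field(Bytes& out, std::uint32_t field_number, std::string_view value) {
    encode_varint(out, (std::uint64_t{field_number} << 3) | 2);
    encode_varint(out, value.size());
    out.insert(out.end(), value.begin(), value.end());
}
```

Encoding the Step 4 request this way (field 1 = "hpn_user", field 2 = "secure_password_123") gives (2 + 8) + (2 + 19) = 31 bytes. That is the size the Protobuf library reports for that message.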
┌─────────────────────────────────────────────────────────────────────────┐
│ VARINT ENCODING │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Small numbers use fewer bytes: │
│ │
│ Value Bytes Encoding │
│ ──────────── ───────── ──────────────────── │
│ 1 1 byte 0x01 │
│ 127 1 byte 0x7F │
│ 128 2 bytes 0x80 0x01 │
│ 16383 2 bytes 0xFF 0x7F │
│ 16384 3 bytes 0x80 0x80 0x01 │
│ │
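│ Most real-world IDs are small → very efficient! │
│ Caveat: a negative int32/int64 always takes 10 bytes; use sint32/sint64 │
│ (ZigZag encoding) for fields that are often negative. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Schema Evolution (Versioning)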
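Adding fields is safe because every field carries its own tag. A reader that meets a field number it doesn't know can skip the value using only the wire type, with no schema for that field. A minimal sketch of that skip logic follows. It handles wire types 0, 1, 2 and 5 from the table above and treats the deprecated group types 3 and 4 as errors. `read_username` is a hypothetical "v1" reader that only knows field 1, and all names here are illustrative.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

std::optional<std::uint64_t> read_varint(const std::uint8_t*& p, const std::uint8_t* end) {
    std::uint64_t v = 0;
    for (int shift = 0; shift < 64 && p < end; shift += 7) {
        const std::uint8_t b = *p++;
        v |= std::uint64_t{b & 0x7Fu} << shift;
        if ((b & 0x80) == 0) return v;
    }
    return std::nullopt;
}

// Skip one field's value based only on its wire type. Returns false on bad input.
bool skip_field(std::uint32_t wire_type, const std::uint8_t*& p, const std::uint8_t* end) {
    switch (wire_type) {
        case 0:  // varint
            return read_varint(p, end).has_value();
        case 1:  // 64-bit
            if (end - p < 8) return false;
            p += 8;
            return true;
        case 5:  // 32-bit
            if (end - p < 4) return false;
            p += 4;
            return true;
        case 2: {  // length-delimited
            auto len = read_varint(p, end);
            if (!len || static_cast<std::uint64_t>(end - p) < *len) return false;
            p += *len;
            return true;
        }
        default:  // groups (3, 4) or an invalid wire type
            return false;
    }
}

// Hypothetical v1 reader: understands only field 1 (username), skips everything else.
std::optional<std::string> read_username(const std::vector<std::uint8_t>& buf) {
    const std::uint8_t* p = buf.data();
    const std::uint8_t* const end = p + buf.size();
    std::string username;
    while (p < end) {
        auto tag = read_varint(p, end);
        if (!tag) return std::nullopt;
        const std::uint64_t field_number = *tag >> 3;
        const auto wire_type = static_cast<std::uint32_t>(*tag & 0x7);
        if (field_number == 1 && wire_type == 2) {
            auto len = read_varint(p, end);
            if (!len || static_cast<std::uint64_t>(end - p) < *len) return std::nullopt;
            username.assign(p, p + *len);
            p += *len;
        } else if (!skip_field(wire_type, p, end)) {
            return std::nullopt;  // truncated or unsupported field
        }
    }
    return username;
}
```

This is why old code keeps working when newer messages carry extra fields, such as `email` and `phone` in the Version 2 schema: their bytes are stepped over, not misinterpreted.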
Safe Changes
protobuf
// Version 1
message User {
string name = 1;
int32 age = 2;
}
// Version 2 - SAFE additions
message User {
string name = 1;
int32 age = 2;
string email = 3; // NEW - old readers skip this unknown field
string phone = 4; // NEW - old readers skip this unknown field
reserved 5, 6; // Numbers that must never be (re)used, e.g. from deleted fields
reserved "old_field"; // Name of a deleted field
}
Unsafe Changes (AVOID!)
protobuf
// ❌ DON'T: Change field numbers
message User {
string name = 2; // Was 1 → BREAKS compatibility!
}
// ❌ DON'T: Change field types
message User {
int64 name = 1; // Was string → BREAKS!
}
// ❌ DON'T: Remove fields without reserving
message User {
// string name = 1; // Removed without reserve → Future collision risk!
}
Best Practices
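The "field numbers 1-15" rule comes from tag arithmetic. The tag is the varint of (field_number << 3) | wire_type, so it fits in one byte only while that value stays below 128, i.e. for field numbers 1 to 15. A small sketch that computes the tag size follows. The helper names `varint_size` and `tag_size` are illustrative, not a library API.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to varint-encode v (7 payload bits per byte).
std::size_t varint_size(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Size of a field's tag. The wire type only fills the low 3 bits, which
// field_number << 3 leaves at zero, so it never changes the result.
std::size_t tag_size(std::uint32_t field_number) {
    return varint_size(std::uint64_t{field_number} << 3);
}
```

So fields 1-15 cost one tag byte, 16-2047 cost two, and 2048 and above cost three or more. For a field sent in every message, that byte is paid on every request.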
┌─────────────────────────────────────────────────────────────────────────┐
│ PROTOBUF BEST PRACTICES │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ✅ DO │
│ ───── │
│ • Use field numbers 1-15 for frequently used fields (1-byte tag) │
│ • Use `optional` for fields that may be absent (explicit presence) │
│ • Use `reserved` when removing fields │
│ • Version breaking changes explicitly (e.g. auth_v2.proto declaring │
│ package hpn.auth.v2) │
│ • Use packages to avoid name collisions │
│ • Make enum value 0 an *_UNSPECIFIED placeholder │
│ │
│ ❌ DON'T │
│ ─────── │
│ • Don't reuse field numbers │
│ • Don't change field types │
│ • Don't remove fields without reserving │
│ • Don't use required (proto2 only; removed in proto3) │
│ • Don't give business meaning to default values: in proto3, a field │
│ without `optional` cannot tell 0 / "" / false apart from "unset" │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Alternative: FlatBuffers (Extreme Performance)
When Protobuf still isn't fast enough (HFT, game engines):
┌─────────────────────────────────────────────────────────────────────────┐
│ FLATBUFFERS vs PROTOBUF │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Protobuf: Serialize → Send → Deserialize → Access │
│ FlatBuffers: Serialize → Send → Access (NO DESERIALIZE!) │
│ │
│ FlatBuffers reads data directly from buffer = Zero-copy │
│ │
│ Trade-off: │
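│ • Protobuf: simpler API, smaller messages, more features │
│ • FlatBuffers: faster reads, but larger buffers (alignment, vtables) │
│ and a more awkward builder API (children are built before parents) │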
│ │
└─────────────────────────────────────────────────────────────────────────┘
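To make "access without deserializing" concrete, here is a toy read-only view over a fixed little-endian layout. This is NOT the FlatBuffers format, which uses offsets and vtables and is read through code generated by `flatc`. The layout `[u32 user_id][u32 expires_at]` and the `LoginView` class are hypothetical. The only point carried over is that each getter decodes just the field it needs, straight from the received buffer, with no parse step and no copy of the message. FlatBuffers also stores scalars little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Read-only view over a received buffer; it never copies or parses the whole message.
class LoginView {
public:
    LoginView(const std::uint8_t* data, std::size_t size) : data_(data) {
        if (size < 8) throw std::invalid_argument("buffer too small");
    }
    // Each getter decodes its field on demand, directly from the buffer.
    std::uint32_t user_id() const { return load_le32(0); }
    std::uint32_t expires_at() const { return load_le32(4); }

private:
    // Explicit little-endian load: correct on any host, no alignment requirement.
    std::uint32_t load_le32(std::size_t offset) const {
        const std::uint8_t* p = data_ + offset;
        return std::uint32_t{p[0]} | (std::uint32_t{p[1]} << 8) |
               (std::uint32_t{p[2]} << 16) | (std::uint32_t{p[3]} << 24);
    }
    const std::uint8_t* data_;  // not owned: the buffer must outlive the view
};
```

The trade-off in the box follows from this design. Reads are nearly free, but the sender must lay data out exactly as the reader expects, which is why FlatBuffers needs its builder API and pays for alignment in message size.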