Giao diện
🗺️ Maps Internals & Optimization
Go Map không phải "magic" — nó là một Hash Table được implement cẩn thận. Hiểu cách nó hoạt động under the hood sẽ giúp bạn tránh bugs và optimize performance.
🔧 Internal Mechanism: Hash Table
Map là Reference Type
🎓 Professor Tom's Deep Dive: hmap struct
Khi bạn tạo một map, Go allocate một hmap struct trên heap. Variable map của bạn chỉ là pointer đến struct này:
go
// runtime/map.go (simplified)
type hmap struct {
count int // Số key-value pairs
B uint8 // log_2 của số buckets (2^B = buckets)
buckets unsafe.Pointer // Array of buckets
oldbuckets unsafe.Pointer // Used during growth
// ... more fields
}Consequence: Passing map vào function = passing pointer. Modifications affect original!
go
func addKey(m map[string]int) {
m["new"] = 42 // Modifies original map!
}
func main() {
data := map[string]int{"a": 1}
addKey(data)
fmt.Println(data["new"]) // 42 - changed!
}Buckets & Overflow Buckets
🎓 Hash Table Architecture
Go map sử dụng buckets để store key-value pairs. Mỗi bucket chứa tối đa 8 entries:
┌─────────────────────────────────────────────────────────────────┐
│ hmap STRUCT │
├─────────────────────────────────────────────────────────────────┤
│ count: 12 │ B: 2 (means 4 buckets) │ buckets: *bmap │
└─────────────────────────────────────┬───────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ BUCKET ARRAY (2^B = 4 buckets) │
├────────────────┬────────────────┬────────────────┬──────────────┤
│ Bucket 0 │ Bucket 1 │ Bucket 2 │ Bucket 3 │
│ │ │ │ │
│ ┌──────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌────────┐ │
│ │ tophash │ │ │ tophash │ │ │ tophash │ │ │tophash │ │
│ │ [8]uint8 │ │ │ [8]uint8 │ │ │ [8]uint8 │ │ │[8]uint8│ │
│ ├──────────┤ │ ├──────────┤ │ ├──────────┤ │ ├────────┤ │
│ │ keys[8] │ │ │ keys[8] │ │ │ keys[8] │ │ │keys[8] │ │
│ │ vals[8] │ │ │ vals[8] │ │ │ vals[8] │ │ │vals[8] │ │
│ ├──────────┤ │ ├──────────┤ │ ├──────────┤ │ ├────────┤ │
│ │ overflow │──┼─►│ overflow │ │ │ overflow │ │ │overflow│ │
│ │ *bmap │ │ │ (nil) │ │ │ (nil) │ │ │ (nil) │ │
│ └──────────┘ │ └──────────┘ │ └──────────┘ │ └────────┘ │
│ │ │ │ │ │
└───────┼────────┴────────────────┴────────────────┴──────────────┘
│
▼ (khi bucket đầy, chain overflow bucket)
┌──────────────────┐
│ Overflow Bucket │
│ ┌──────────┐ │
│ │ tophash │ │
│ │ keys[8] │ │ ← Thêm space khi bucket chính đầy
│ │ vals[8] │ │
│ │ overflow │────┼──► (more overflow buckets if needed)
│ └──────────┘ │
└──────────────────┘Lookup Process
go
value := myMap["key"]
// Internal steps:
// 1. Compute hash of "key"
// 2. Use low bits of hash to find bucket index
// 3. Use high bits (tophash) to quickly scan entries in bucket
// 4. Compare actual key if tophash matches
// 5. Return value or zero value if not foundTại sao tophash?
- 8-byte key comparison is expensive
- tophash (1 byte) comparison is fast
- Only do full key comparison if tophash matches → significant speedup!
⚠️ Concurrency Trap: Fatal Error
🔥 Raizo's Critical Warning: Map Concurrent Write
Go maps are NOT safe for concurrent use. Writing from multiple goroutines → CRASH!
go
// ❌ FATAL ERROR - concurrent map writes
func main() {
m := make(map[int]int)
go func() {
for i := 0; i < 10000; i++ {
m[i] = i // Goroutine 1 writes
}
}()
go func() {
for i := 0; i < 10000; i++ {
m[i] = i * 2 // Goroutine 2 writes
}
}()
time.Sleep(time.Second)
}
// OUTPUT:
// fatal error: concurrent map writes
// (Program crashes, không thể recover!)Lý do: Go runtime detect concurrent access và panic intentionally để prevent data corruption.
Solution: Mutex Protection
go
// ✅ Safe with sync.RWMutex
type SafeMap struct {
mu sync.RWMutex
data map[string]int
}
func (m *SafeMap) Set(key string, value int) {
m.mu.Lock()
defer m.mu.Unlock()
m.data[key] = value
}
func (m *SafeMap) Get(key string) (int, bool) {
m.mu.RLock() // Read lock - multiple readers OK
defer m.mu.RUnlock()
v, ok := m.data[key]
return v, ok
}Alternative: sync.Map
go
// ✅ For specific use cases
var cache sync.Map
cache.Store("key", "value")
if v, ok := cache.Load("key"); ok {
fmt.Println(v)
}
// Good for:
// - Write-once, read-many
// - Keys written by one goroutine, read by many
// - Disjoint sets of keys per goroutine
// NOT good for:
// - General purpose concurrent map
// - Frequent updates to same keys📌 HPN Standard: Chọn đúng tool
| Use Case | Solution |
|---|---|
| Most cases | sync.RWMutex wrapper |
| Cache (write once, read many) | sync.Map |
| High contention | Sharded Map (see Penrift case) |
⚡ Performance Tuning
Pre-allocate Capacity
🔥 Raizo's Pitfall: Evacuation Cost
Khi map grow quá capacity, Go phải evacuate (rehash) tất cả entries sang bucket array mới. Đây là O(n) operation!
go
// ❌ Bad: Multiple evacuations
m := make(map[string]int) // Default small capacity
for i := 0; i < 1000000; i++ {
m[fmt.Sprintf("key%d", i)] = i // Triggers multiple grows!
}
// ✅ Good: Pre-allocate
m := make(map[string]int, 1000000) // Allocate upfront
for i := 0; i < 1000000; i++ {
m[fmt.Sprintf("key%d", i)] = i // No evacuations needed
}
// Benchmark difference:
// Without capacity: ~450ms
// With capacity: ~280ms (38% faster!)Evacuation Process:
Before growth (B=1, 2 buckets, at capacity):
┌─────────┬─────────┐
│Bucket 0 │Bucket 1 │ ← Full, need to grow
│ 8 items │ 8 items │
└─────────┴─────────┘
During growth (B=2, 4 buckets):
┌─────────┬─────────┬─────────┬─────────┐
│Bucket 0'│Bucket 1'│Bucket 2'│Bucket 3'│ ← New array
│ │ │ │ │
└─────────┴─────────┴─────────┴─────────┘
↑ ↑
│ Rehash and redistribute all entries
│ (expensive!)
┌─────────┬─────────┐
│Bucket 0 │Bucket 1 │ ← Old array (GC'd after)
└─────────┴─────────┘🔑 Key Restrictions: Comparable Types
🎓 Why Slices Can't Be Map Keys
Map keys phải là comparable — có thể dùng == operator.
go
// ✅ Valid key types (comparable)
map[string]int
map[int]string
map[*User]bool // Pointer comparison
map[[3]int]bool // Arrays are comparable
map[MyStruct]int // Structs (if all fields comparable)
// ❌ Invalid key types
map[[]int]string // Slices - NOT comparable
map[map[string]int]bool // Maps - NOT comparable
map[func()]int // Functions - NOT comparableLý do kỹ thuật:
- Hash function cần consistent output cho equal keys
- Slices are mutable → hash could change after insertion
- Slice equality is undefined (
[]int{1,2} == []int{1,2}không compile)
Workaround cho slice keys:
go
// Convert slice to string key
func sliceKey(s []int) string {
b, _ := json.Marshal(s)
return string(b)
}
// Hoặc dùng array nếu size cố định
type ArrayKey [3]int
m := make(map[ArrayKey]string)
m[ArrayKey{1, 2, 3}] = "value"🎮 Scenario Analysis: Penrift Sharded Map
🧠 Production Case Study
Câu hỏi: Tại sao Penrift Cache không dùng map mặc định của Go để lưu trữ dữ liệu tần suất cao mà lại dùng Sharded Map?
💡 Phân tích vấn đề
Bottleneck với single map + mutex:
┌─────────────┐
Goroutine 1 ───►│ │
Goroutine 2 ───►│ Mutex │───► Single Map
Goroutine 3 ───►│ (Lock) │
Goroutine 4 ───►│ │
└─────────────┘
↑
All goroutines WAIT
for single lock!Với concurrent requests cao:
- 1 goroutine writes → tất cả phải wait
- Lock contention tăng exponentially với goroutine count
- Throughput giảm drastically
✅ Solution: Sharded Map
Chia map thành nhiều shards, mỗi shard có lock riêng:
┌──────────┐
Goroutine 1 ───►│ Shard 0 │ (key hash % N = 0)
│ Lock 0 │
└──────────┘
┌──────────┐
Goroutine 2 ───►│ Shard 1 │ (key hash % N = 1)
│ Lock 1 │
└──────────┘
┌──────────┐
Goroutine 3 ───►│ Shard 2 │ (key hash % N = 2)
│ Lock 2 │
└──────────┘Implementation concept:
go
const numShards = 32
type ShardedMap struct {
shards [numShards]*Shard
}
type Shard struct {
mu sync.RWMutex
data map[string][]byte
}
func (m *ShardedMap) getShard(key string) *Shard {
hash := fnv32(key)
return m.shards[hash % numShards]
}
func (m *ShardedMap) Set(key string, value []byte) {
shard := m.getShard(key)
shard.mu.Lock()
shard.data[key] = value
shard.mu.Unlock()
}Kết quả:
- Lock contention giảm 32x (với 32 shards)
- Goroutines accessing different shards → parallel execution
- Penrift handles 100K+ req/sec với minimal latency
📊 Benchmark Comparison
| Approach | 10K concurrent writes | Lock wait time |
|---|---|---|
| Single Map + Mutex | 850ms | ~78% |
| sync.Map | 620ms | ~45% |
| Sharded Map (32) | 180ms | ~8% |
Trade-off: Sharded map uses more memory (32 maps instead of 1), nhưng cho high-throughput scenarios, đây là trade-off acceptable.
📊 Summary
| Concept | Key Point |
|---|---|
| Map Type | Reference type → pointer to hmap struct |
| Buckets | 8 entries per bucket, overflow chaining |
| Concurrency | NOT SAFE → use mutex or sharded map |
| Capacity | Pre-allocate to avoid expensive evacuation |
| Key Types | Must be comparable (no slices, maps, funcs) |
➡️ Tiếp theo
Maps nắm vững rồi! Tiếp theo: Slices & Maps Deep Dive - Slice internals và memory management.