
📡 I/O Architecture & Data Serialization

"Everything is a file" — Unix philosophy.
In Go, every data source implements io.Reader. Master this interface to build efficient data pipelines.

🔌 The Universal Interfaces

io.Reader & io.Writer

go
// The two most important interfaces in Go
type Reader interface {
    Read(p []byte) (n int, err error)
}

type Writer interface {
    Write(p []byte) (n int, err error)
}

🎓 Professor Tom's Deep Dive: Universality

I/O types across the standard library implement these two interfaces:

| Source/Dest   | Type                      | Reader/Writer |
| ------------- | ------------------------- | ------------- |
| Files         | *os.File                  | Both          |
| Network       | net.Conn                  | Both          |
| HTTP body     | http.Response.Body        | Reader        |
| HTTP response | http.ResponseWriter       | Writer        |
| Memory        | *bytes.Buffer             | Both          |
| Compression   | *gzip.Reader / *gzip.Writer | Both        |
| Encryption    | cipher.StreamReader       | Reader        |
| Strings       | *strings.Reader           | Reader        |

The power: write a function that accepts io.Reader and it works with ALL of these sources!

The Pipe Metaphor: Chaining

go
// Unix pipe: cat file.txt | gzip | encrypt | nc server 8080
// Go equivalent:
func ProcessAndSend(inputPath string, conn net.Conn, key, iv []byte) error {
    // Open file
    file, err := os.Open(inputPath)
    if err != nil {
        return err
    }
    defer file.Close()
    
    block, err := aes.NewCipher(key)
    if err != nil {
        return err
    }
    
    // Chain: File → Gzip → Encrypt → Network
    // Gzip BEFORE encrypt: ciphertext looks random and won't compress.
    encryptWriter := cipher.StreamWriter{S: cipher.NewOFB(block, iv), W: conn}
    gzipWriter := gzip.NewWriter(encryptWriter)
    defer gzipWriter.Close()
    
    // Data flows through the chain
    _, err = io.Copy(gzipWriter, file)
    return err
}

Visual Pipeline:

┌─────────────────────────────────────────────────────────────────────┐
│                     DATA PIPELINE (Unix Pipes)                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│   │   File   │───►│   Gzip   │───►│ Encrypt  │───►│ Network  │      │
│   │ (Reader) │    │ (Writer) │    │ (Writer) │    │  (Conn)  │      │
│   └──────────┘    └──────────┘    └──────────┘    └──────────┘      │
│                                                                     │
│   io.Copy(gzipWriter, file)                                         │
│   → Data streams through without loading entire file into RAM!      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

📖 Efficient File Processing

The Problem: Loading Huge Files

🔥 Raizo's Pitfall: Memory Explosion

go
// ❌ BAD: Load entire file into memory
func ProcessFileBad(path string) error {
    data, err := os.ReadFile(path)  // 10GB file = 10GB RAM!
    if err != nil {
        return err
    }
    
    lines := strings.Split(string(data), "\n")  // Double memory!
    for _, line := range lines {
        process(line)
    }
    return nil
}

Memory usage with a 10GB file:

  • os.ReadFile: 10GB allocated
  • strings.Split: Additional ~10GB for string copies
  • Total: 20GB+ for a 10GB file!

The Solution: Buffered I/O with bufio

go
// ✅ GOOD: Stream with constant memory
func ProcessFileGood(path string) error {
    file, err := os.Open(path)
    if err != nil {
        return err
    }
    defer file.Close()
    
    scanner := bufio.NewScanner(file)
    
    // Optional: increase buffer for long lines
    buf := make([]byte, 64*1024)  // 64KB buffer
    scanner.Buffer(buf, 1024*1024) // Max 1MB per line
    
    for scanner.Scan() {
        line := scanner.Text()
        process(line)  // Process one line at a time
    }
    
    return scanner.Err()
}

Memory Comparison

┌─────────────────────────────────────────────────────────────────────┐
│              MEMORY USAGE: Processing 10GB Log File                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   os.ReadFile (BAD):                                               │
│   ┌────────────────────────────────────────────────────────────┐   │
│   │ ████████████████████████████████████████████████ ~20GB RAM  │   │
│   └────────────────────────────────────────────────────────────┘   │
│   💥 Crashes a server with 512MB RAM!                              │
│                                                                     │
│   bufio.Scanner (GOOD):                                            │
│   ┌────┐                                                            │
│   │ ██ │ ~64KB buffer (constant!)                                  │
│   └────┘                                                            │
│   ✅ Works perfectly with 512MB RAM                                │
│                                                                     │
│   Time to first output:                                             │
│   - ReadFile: Must wait for entire 10GB to load                    │
│   - Scanner: Immediate (reads first line instantly)                │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

bufio.Reader vs bufio.Scanner

go
// bufio.Scanner - Line-by-line processing (most common)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
    line := scanner.Text()
    // process line
}

// bufio.Reader - More control (custom delimiters, peeking)
reader := bufio.NewReader(file)
for {
    line, err := reader.ReadString('\n')
    if len(line) > 0 {
        // process line (the final line may lack a trailing '\n')
    }
    if err != nil {
        break // io.EOF when the stream is exhausted
    }
}

📦 Data Serialization

JSON: The Universal Format

go
import "encoding/json"

type User struct {
    ID        int64     `json:"id"`
    Name      string    `json:"name"`
    Email     string    `json:"email,omitempty"`  // Omit if empty
    Password  string    `json:"-"`                 // Never serialize
    CreatedAt time.Time `json:"created_at"`
}

// Marshal: Struct → JSON bytes
user := User{ID: 1, Name: "Raizo", Email: "raizo@hpn.dev"}
data, err := json.Marshal(user)
// {"id":1,"name":"Raizo","email":"raizo@hpn.dev","created_at":"..."}

// Unmarshal: JSON bytes → Struct
var parsed User
err = json.Unmarshal(data, &parsed)
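The tag rules can be verified directly. A minimal runnable check (TaggedUser is an illustrative type; CreatedAt is dropped for brevity):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type TaggedUser struct {
	ID       int64  `json:"id"`
	Name     string `json:"name"`
	Email    string `json:"email,omitempty"` // dropped when ""
	Password string `json:"-"`               // never serialized
}

// marshalTagged shows the tags in action: Password vanishes,
// and the empty Email is omitted entirely.
func marshalTagged(u TaggedUser) string {
	data, _ := json.Marshal(u)
	return string(data)
}

func main() {
	out := marshalTagged(TaggedUser{ID: 1, Name: "Raizo", Password: "secret"})
	fmt.Println(out) // {"id":1,"name":"Raizo"}
}
```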

Streaming JSON for APIs

go
// ❌ BAD: Marshal to bytes, then write
func HandleUserBad(w http.ResponseWriter, r *http.Request) {
    user := getUser()
    data, _ := json.Marshal(user)  // Allocate full byte slice
    w.Write(data)
}

// ✅ GOOD: Stream directly to writer
func HandleUserGood(w http.ResponseWriter, r *http.Request) {
    user := getUser()
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(user)  // Stream directly!
}

// Benefits:
// - No intermediate byte slice allocation
// - Works with chunked transfer encoding
// - Lower memory footprint for large responses

Gob: Go's Binary Format

🎓 Professor Tom's Deep Dive: When to Use Gob

Gob is a Go-specific binary format:

| Aspect   | JSON              | Gob                   |
| -------- | ----------------- | --------------------- |
| Readable | ✅ Human readable | ❌ Binary             |
| Size     | Larger            | ~50% smaller          |
| Speed    | Slower            | 2-5x faster           |
| Interop  | ✅ Any language   | ❌ Go only            |
| Use case | APIs, config      | Internal storage, RPC |
go
import "encoding/gob"

// Encode to file
func SaveSession(path string, session *Session) error {
    file, err := os.Create(path)
    if err != nil {
        return err
    }
    defer file.Close()
    
    encoder := gob.NewEncoder(file)
    return encoder.Encode(session)
}

// Decode from file
func LoadSession(path string) (*Session, error) {
    file, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer file.Close()
    
    var session Session
    decoder := gob.NewDecoder(file)
    if err := decoder.Decode(&session); err != nil {
        return nil, err
    }
    return &session, nil
}

📌 HPN Application: Penalgo Save Files

Penalgo uses Gob for its "Save Progress" feature:

go
// Compact and fast for internal tools
type PenalgoProgress struct {
    UserID        string
    CompletedLabs []string
    Scores        map[string]int
    LastAccess    time.Time
}

// Save: 50% smaller than JSON, 3x faster encode
// (SaveSession above is typed for *Session, so encode directly here)
func SaveProgress(userID string, progress *PenalgoProgress) error {
    path := filepath.Join(dataDir, userID+".gob")
    file, err := os.Create(path)
    if err != nil {
        return err
    }
    defer file.Close()
    return gob.NewEncoder(file).Encode(progress)
}

📁 Config File Patterns

Reading JSON Config at Startup

go
// config/config.go
type Config struct {
    Server   ServerConfig   `json:"server"`
    Database DatabaseConfig `json:"database"`
    Redis    RedisConfig    `json:"redis"`
}

type ServerConfig struct {
    Port         int           `json:"port"`
    // Caution: time.Duration decodes from JSON numbers (nanoseconds),
    // not from strings like "5s"
    ReadTimeout  time.Duration `json:"read_timeout"`
    WriteTimeout time.Duration `json:"write_timeout"`
}

func LoadConfig(path string) (*Config, error) {
    file, err := os.Open(path)
    if err != nil {
        return nil, fmt.Errorf("open config: %w", err)
    }
    defer file.Close()
    
    var cfg Config
    if err := json.NewDecoder(file).Decode(&cfg); err != nil {
        return nil, fmt.Errorf("decode config: %w", err)
    }
    
    return &cfg, nil
}

// main.go
func main() {
    cfg, err := config.LoadConfig("configs/config.json")
    if err != nil {
        log.Fatalf("failed to load config: %v", err)
    }
    
    server := NewServer(cfg)
    server.Run()
}

Config Structure

project/
├── cmd/
│   └── api/
│       └── main.go
├── configs/
│   ├── config.json        # Default config
│   ├── config.dev.json    # Development overrides
│   └── config.prod.json   # Production settings
├── internal/
│   └── config/
│       └── config.go      # Config types & loader

🎮 Scenario Analysis

🧠 Production Challenge

Scenario: You need to process a 10GB log file on a server with only 512MB of RAM.

Would you use os.ReadFile or bufio.Scanner?

💡 Analyzing the memory management mechanics

os.ReadFile Approach (Will Crash)

go
data, _ := os.ReadFile("10gb.log")  // 💥 Out of Memory!

The problem:

  1. os.ReadFile allocates a 10GB slice to hold the entire file
  2. A server with 512MB RAM → OOM-killed immediately
  3. There is no way to reduce the memory footprint

bufio.Scanner Approach (Works)

go
file, _ := os.Open("10gb.log")
scanner := bufio.NewScanner(file)
for scanner.Scan() {
    process(scanner.Text())  // ~64KB buffer
}

The mechanism:

  1. bufio.Scanner uses an internal buffer (64KB by default)
  2. Each Scan():
    • Reads data into the buffer
    • Finds the delimiter (newline)
    • Returns the line, reusing the buffer for the next one
  3. Memory stays constant (~64KB) regardless of file size!

Memory Timeline

os.ReadFile:
Time 0: ░░░░░░░░░░░░░░░░░░░░ 0MB
Time 1: ██████████████████████████████████████████ 10GB → 💥 OOM

bufio.Scanner:
Time 0: ██ 64KB
Time 1: ██ 64KB (same buffer reused)
Time 2: ██ 64KB
...
Time N: ██ 64KB → ✅ Completes successfully!

Production Tips

  1. Large files: Always use streaming (bufio, io.Copy)
  2. Memory limit: Set via GOMEMLIMIT (Go 1.19+)
  3. Monitor: Use runtime.ReadMemStats during development
  4. Long lines: Configure scanner.Buffer(buf, maxLineSize)

📊 Summary: Module 2 Complete!

ConceptKey Point
io.Reader/WriterUniversal interfaces for all I/O
ChainingPipe data through transformations
bufioConstant memory for large files
JSONHuman-readable, use for APIs
GobBinary, faster, Go-only internal use
ConfigLoad at startup, fail fast

🦴 Module 2 Complete: The Skeleton

🎉 Congratulations! Module 2 complete!

You have built Go's "Skeleton":

  • Structs & Interfaces — data structures
  • Memory Layout — alignment and optimization
  • I/O Architecture — the data circulatory system
  • Serialization — talking to the outside world
  • Reflection — runtime self-awareness

Next up: Module 3 - Concurrency

"Next, we breathe life into the monster using Goroutines."

  • 🔥 Goroutines — Lightweight concurrency (2KB stack)
  • 🔥 Channels — Safe communication between goroutines
  • 🔥 Context — Cancellation, timeouts, request-scoped data
  • 🔥 sync Primitives — Mutex, WaitGroup, atomic operations
  • 🔥 Concurrency Patterns — Worker pools, fan-out/fan-in

"Do not communicate by sharing memory; share memory by communicating."