
Production Deployment Backend

Deploy production-ready Python APIs: scalable, observable, resilient

Learning Outcomes

After completing this page, you will be able to:

  • 🎯 Configure Gunicorn/Uvicorn for production workloads
  • 🎯 Containerize applications with Docker best practices
  • 🎯 Implement health checks for orchestration
  • 🎯 Set up observability with logging, metrics, and tracing
  • 🎯 Avoid common production pitfalls

ASGI Servers

Uvicorn (Single Process)

bash
# Development
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Production (single worker)
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1
python
# main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

# Programmatic startup
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=8000,
        reload=False,  # Disable in production
        workers=1,
        log_level="info",
        access_log=True,
    )

Gunicorn + Uvicorn Workers (Production)

bash
# Install
pip install gunicorn "uvicorn[standard]"  # quotes needed in zsh

# Run with Uvicorn workers
gunicorn main:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000 \
    --timeout 120 \
    --keep-alive 5 \
    --access-logfile - \
    --error-logfile -

Gunicorn Configuration File

python
# gunicorn.conf.py
import multiprocessing

# Server socket
bind = "0.0.0.0:8000"
backlog = 2048

# Worker processes
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
timeout = 120
keepalive = 5

# Restart workers after this many requests (prevent memory leaks)
max_requests = 1000
max_requests_jitter = 50

# Logging
accesslog = "-"
errorlog = "-"
loglevel = "info"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Process naming
proc_name = "myapp"

# Server mechanics
daemon = False
pidfile = None
umask = 0
user = None
group = None
tmp_upload_dir = None

# SSL (if terminating SSL at app level)
# keyfile = "/path/to/key.pem"
# certfile = "/path/to/cert.pem"

# Hooks
def on_starting(server):
    print("Starting Gunicorn server...")

def on_exit(server):
    print("Shutting down Gunicorn server...")

def worker_exit(server, worker):
    print(f"Worker {worker.pid} exiting...")
bash
# Run with config file
gunicorn main:app -c gunicorn.conf.py

Worker Count Formula

python
# CPU-bound: workers = CPU cores + 1
# I/O-bound (typical web apps): workers = (2 * CPU cores) + 1

import multiprocessing

# For async apps (FastAPI)
workers = multiprocessing.cpu_count() * 2 + 1

# For sync apps (Flask, Django)
workers = multiprocessing.cpu_count() + 1

# Memory consideration: Each worker uses ~50-200MB
# 4GB RAM server: max ~20 workers

Docker Containerization

Basic Dockerfile

dockerfile
# Dockerfile
FROM python:3.12-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

WORKDIR /app

# Install dependencies first (better caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN adduser --disabled-password --gecos "" appuser && \
    chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 8000

# Health check (note: python:3.12-slim does not ship curl; either
# install it or use a Python one-liner)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run application
CMD ["gunicorn", "main:app", "-c", "gunicorn.conf.py"]

Multi-stage Build (Optimized)

dockerfile
# Dockerfile.production
# Stage 1: Build
FROM python:3.12-slim as builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime
FROM python:3.12-slim as runtime

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app

# Copy application code
COPY . .

# Create non-root user
RUN adduser --disabled-password --gecos "" appuser && \
    chown -R appuser:appuser /app
USER appuser

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["gunicorn", "main:app", "-c", "gunicorn.conf.py"]

Docker Compose

docker-compose.yml

```yaml
version: "3.9"

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.production
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - REDIS_URL=redis://redis:6379/0
      - LOG_LEVEL=info
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    healthcheck:
      # slim image has no curl, so probe with a Python one-liner
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 256M
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
```


### .dockerignore

```
.git
.gitignore
.env
.env.*
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
venv/
.venv/
*.egg-info/
dist/
build/
.pytest_cache/
.coverage
htmlcov/
.mypy_cache/
.ruff_cache/
*.log
Dockerfile
docker-compose
README.md
docs/
tests/
```


---

## Health Checks

### Basic Health Endpoint

```python
from fastapi import FastAPI, status
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

@app.get("/health/live")
async def liveness():
    """Kubernetes liveness probe - is the app running?"""
    return {"status": "alive"}

@app.get("/health/ready")
async def readiness():
    """Kubernetes readiness probe - is the app ready to serve traffic?"""
    return {"status": "ready"}

```

### Comprehensive Health Check

```python
import asyncio

import httpx
from fastapi import FastAPI, status
from fastapi.responses import JSONResponse
from redis import Redis
from sqlalchemy import text

app = FastAPI()

# async_session and REDIS_URL come from your application's own setup
async def check_database() -> dict:
    try:
        async with async_session() as session:
            await session.execute(text("SELECT 1"))
        return {"status": "healthy", "latency_ms": 0}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

async def check_redis() -> dict:
    try:
        redis = Redis.from_url(REDIS_URL)
        redis.ping()
        return {"status": "healthy"}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

async def check_external_api() -> dict:
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get("https://api.example.com/health", timeout=5)
        return {"status": "healthy" if response.status_code == 200 else "degraded"}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}

@app.get("/health")
async def health_check():
    checks = await asyncio.gather(
        check_database(),
        check_redis(),
        check_external_api(),
        return_exceptions=True,
    )

    results = {
        "database": checks[0],
        "redis": checks[1],
        "external_api": checks[2],
    }

    # Determine overall status
    all_healthy = all(
        isinstance(c, dict) and c.get("status") == "healthy"
        for c in checks
    )

    status_code = status.HTTP_200_OK if all_healthy else status.HTTP_503_SERVICE_UNAVAILABLE

    return JSONResponse(
        status_code=status_code,
        content={
            "status": "healthy" if all_healthy else "unhealthy",
            "checks": results,
            "version": "1.0.0",
        },
    )
```


### Kubernetes Probes

```yaml
# kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 8000
          
          # Liveness: Is the container running?
          # Restart if fails
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          
          # Readiness: Is the container ready to serve traffic?
          # Remove from load balancer if fails
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          
          # Startup: Is the container started?
          # Disable liveness/readiness until startup succeeds
          startupProbe:
            httpGet:
              path: /health/live
              port: 8000
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 5
            failureThreshold: 30  # 30 * 5s = 150s max startup time
```

Observability

Structured Logging

python
import logging
import json
import time
from datetime import datetime
from fastapi import FastAPI, Request
from uuid import uuid4

app = FastAPI()

# JSON formatter
class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_record = {
            "timestamp": datetime.utcnow().isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        
        # Add extra fields
        if hasattr(record, "request_id"):
            log_record["request_id"] = record.request_id
        if hasattr(record, "user_id"):
            log_record["user_id"] = record.user_id
        if record.exc_info:
            log_record["exception"] = self.formatException(record.exc_info)
        
        return json.dumps(log_record)

# Configure logging
def setup_logging():
    handler = logging.StreamHandler()
    handler.setFormatter(JSONFormatter())
    
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

setup_logging()
logger = logging.getLogger(__name__)

# Request logging middleware
@app.middleware("http")
async def log_requests(request: Request, call_next):
    request_id = str(uuid4())
    request.state.request_id = request_id
    
    # Add request_id to log context
    logger.info(
        f"Request started: {request.method} {request.url.path}",
        extra={"request_id": request_id}
    )
    
    start_time = time.time()
    response = await call_next(request)
    duration = time.time() - start_time
    
    logger.info(
        f"Request completed: {response.status_code} in {duration:.3f}s",
        extra={"request_id": request_id}
    )
    
    response.headers["X-Request-ID"] = request_id
    return response
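
Threading `extra={"request_id": ...}` through every call is easy to forget; a `contextvars`-based logging filter attaches it automatically to anything logged during the request. A stdlib-only sketch (names are illustrative):

```python
import logging
from contextvars import ContextVar

request_id_var: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Stamp the current request id onto every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# In the middleware: call request_id_var.set(request_id) once per request;
# every logger call in that task then carries the id without passing `extra`.
```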

Metrics with Prometheus

python
from prometheus_client import Counter, Gauge, Histogram, generate_latest, CONTENT_TYPE_LATEST
from fastapi import FastAPI, Request, Response
from starlette.middleware.base import BaseHTTPMiddleware
import time

app = FastAPI()

# Define metrics
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"]
)

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

ACTIVE_REQUESTS = Gauge(
    "http_requests_active",
    "Active HTTP requests"
)

# Metrics middleware
class MetricsMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        ACTIVE_REQUESTS.inc()
        
        start_time = time.time()
        response = await call_next(request)
        duration = time.time() - start_time
        
        # Record metrics
        endpoint = request.url.path
        REQUEST_COUNT.labels(
            method=request.method,
            endpoint=endpoint,
            status=response.status_code
        ).inc()
        
        REQUEST_LATENCY.labels(
            method=request.method,
            endpoint=endpoint
        ).observe(duration)
        
        ACTIVE_REQUESTS.dec()
        return response

app.add_middleware(MetricsMiddleware)

# Metrics endpoint
@app.get("/metrics")
async def metrics():
    return Response(
        content=generate_latest(),
        media_type=CONTENT_TYPE_LATEST
    )
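
One caveat with the setup above: under Gunicorn each worker process keeps its own counters, so `/metrics` only reports whichever worker served the scrape. `prometheus_client` ships a multiprocess mode for this; a sketch of the Gunicorn side (the `PROMETHEUS_MULTIPROC_DIR` path is an assumption):

```python
# gunicorn.conf.py — reap a dead worker's metric files
# (requires env var PROMETHEUS_MULTIPROC_DIR, e.g. /tmp/prometheus,
#  set before prometheus_client is imported anywhere)
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
```

The `/metrics` endpoint then aggregates across workers by building a fresh `CollectorRegistry()`, attaching `multiprocess.MultiProcessCollector(registry)`, and serving `generate_latest(registry)`.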

Distributed Tracing with OpenTelemetry

python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

# Setup tracing
def setup_tracing():
    # Create tracer provider
    provider = TracerProvider()
    
    # Configure exporter (Jaeger, Zipkin, etc.)
    exporter = OTLPSpanExporter(
        endpoint="http://jaeger:4317",
        insecure=True
    )
    
    # Add span processor
    provider.add_span_processor(BatchSpanProcessor(exporter))
    
    # Set global tracer provider
    trace.set_tracer_provider(provider)

setup_tracing()

# Instrument FastAPI
FastAPIInstrumentor.instrument_app(app)

# Instrument SQLAlchemy
SQLAlchemyInstrumentor().instrument(engine=engine)

# Instrument HTTP client
HTTPXClientInstrumentor().instrument()

# Manual span creation
tracer = trace.get_tracer(__name__)

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    with tracer.start_as_current_span("get_user") as span:
        span.set_attribute("user.id", user_id)
        
        with tracer.start_as_current_span("database_query"):
            user = await get_user_from_db(user_id)
        
        with tracer.start_as_current_span("serialize_response"):
            response = UserResponse.from_orm(user)
        
        return response

config.py

python
import os

class Settings:
    # Application
    APP_NAME: str = "myapp"
    APP_VERSION: str = "1.0.0"
    ENVIRONMENT: str = os.getenv("ENVIRONMENT", "development")

    # Logging
    LOG_LEVEL: str = os.getenv("LOG_LEVEL", "INFO")
    LOG_FORMAT: str = "json"  # json or text

    # Metrics
    METRICS_ENABLED: bool = os.getenv("METRICS_ENABLED", "true").lower() == "true"

    # Tracing
    TRACING_ENABLED: bool = os.getenv("TRACING_ENABLED", "true").lower() == "true"
    OTLP_ENDPOINT: str = os.getenv("OTLP_ENDPOINT", "http://localhost:4317")

    # Sampling rate (0.0 to 1.0)
    TRACE_SAMPLE_RATE: float = float(os.getenv("TRACE_SAMPLE_RATE", "1.0"))

settings = Settings()


---

## Graceful Shutdown

```python
from fastapi import FastAPI
from contextlib import asynccontextmanager
import signal
import asyncio

# Shared state
shutdown_event = asyncio.Event()

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    print("Starting up...")
    await init_database()
    await init_redis()
    
    yield
    
    # Shutdown
    print("Shutting down...")
    shutdown_event.set()
    
    # Wait for in-flight requests
    await asyncio.sleep(5)
    
    # Cleanup
    await close_database()
    await close_redis()
    print("Shutdown complete")

app = FastAPI(lifespan=lifespan)

# Signal handlers
def handle_sigterm(signum, frame):
    print("Received SIGTERM, initiating graceful shutdown...")
    shutdown_event.set()

signal.signal(signal.SIGTERM, handle_sigterm)

# Check shutdown in long-running tasks
@app.get("/long-task")
async def long_task():
    for i in range(100):
        if shutdown_event.is_set():
            return {"status": "cancelled", "progress": i}
        await asyncio.sleep(0.1)
    return {"status": "completed"}

Production Pitfalls

Pitfall 1: Running as Root

dockerfile
# ❌ BAD: Running as root
FROM python:3.12-slim
COPY . /app
CMD ["python", "main.py"]

# ✅ GOOD: Non-root user
FROM python:3.12-slim
RUN adduser --disabled-password --gecos "" appuser
USER appuser
COPY --chown=appuser:appuser . /app
CMD ["python", "main.py"]

Pitfall 2: No Resource Limits

yaml
# ❌ BAD: No limits
containers:
  - name: myapp
    image: myapp:latest

# ✅ GOOD: Resource limits
containers:
  - name: myapp
    image: myapp:latest
    resources:
      limits:
        cpu: "2"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "256Mi"

Pitfall 3: Secrets in Environment Variables

python
# ❌ BAD: secret embedded in a plain environment variable
# DATABASE_URL=postgresql://user:password@host/db
# (readable via `docker inspect` and /proc/<pid>/environ)

# ✅ GOOD: Use secrets management
# Kubernetes secrets, AWS Secrets Manager, HashiCorp Vault
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    database_url: str
    
    class Config:
        secrets_dir = "/run/secrets"  # Docker/K8s secrets

Pitfall 4: No Graceful Shutdown

python
# ❌ BAD: Abrupt shutdown
if __name__ == "__main__":
    uvicorn.run(app)

# ✅ GOOD: Graceful shutdown with timeout
if __name__ == "__main__":
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8000,
        timeout_graceful_shutdown=30,  # Wait 30s for in-flight requests
    )

Pitfall 5: Missing Health Checks

dockerfile
# ❌ BAD: No health check
FROM python:3.12-slim
CMD ["python", "main.py"]

# ✅ GOOD: Health check
FROM python:3.12-slim
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
CMD ["python", "main.py"]

Quick Reference

bash
# === UVICORN ===
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

# === GUNICORN ===
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

# === DOCKER ===
docker build -t myapp .
docker run -p 8000:8000 myapp

# === DOCKER COMPOSE ===
docker-compose up -d
docker-compose logs -f api

# === WORKER COUNT ===
# Async apps: (2 * CPU) + 1
# Sync apps: CPU + 1
python
# === HEALTH CHECK ===
@app.get("/health")
async def health():
    return {"status": "healthy"}

# === STRUCTURED LOGGING ===
logger.info("Request", extra={"request_id": "abc123"})

# === METRICS ===
REQUEST_COUNT.labels(method="GET", endpoint="/users").inc()

# === GRACEFUL SHUTDOWN ===
@asynccontextmanager
async def lifespan(app):
    yield
    await cleanup()