Thực hành: HPA & Scaling

🎯 Mục tiêu

🎯 Sau bài thực hành này, bạn sẽ:

Cài đặt và kiểm tra Metrics Server
Cấu hình HPA với CPU và Memory targets
Thực hiện load testing để trigger autoscaling

Mô tả bài tập

Ứng dụng web nhận traffic không đều — bình thường 100 req/s, peak time lên 5000 req/s. Bạn cần thiết lập HPA để tự động scale đáp ứng workload, đồng thời tránh over-provisioning lãng phí tài nguyên.

Yêu cầu

Bài 1: Chuẩn bị — Deployment có Resource Requests

Tạo Deployment với resource requests (BẮT BUỘC cho HPA):

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.24-alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80

Kiểm tra Metrics Server:

bash

# Bật Metrics Server (minikube)
minikube addons enable metrics-server

# Đợi 1-2 phút, rồi kiểm tra
kubectl top nodes
kubectl top pods

Bài 2: Cấu hình HPA

Tạo HPA cho web-app:

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: # TODO: Minimum replicas
  maxReplicas: # TODO: Maximum replicas
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: # TODO: Target CPU %
  behavior:
    scaleDown:
      stabilizationWindowSeconds: # TODO: Tránh flapping

Yêu cầu:

Min replicas: 2 (high availability)
Max replicas: 10
Target CPU: 50%
Scale down stabilization: 120 seconds

Bài 3: Load Testing — Trigger Autoscaling

Chạy load test để đẩy CPU lên và quan sát HPA scale:

bash

# Terminal 1: Theo dõi HPA
kubectl get hpa web-app-hpa --watch

# Terminal 2: Theo dõi Pods
kubectl get pods -l app=web-app --watch

# Terminal 3: Tạo tải
kubectl run load-generator --rm -it --image=busybox:1.36 -- /bin/sh -c \
  "while true; do wget -q -O- http://web-app.default.svc.cluster.local; done"

Quan sát:

CPU utilization tăng dần → HPA tạo thêm Pod
Dừng load generator (Ctrl+C)
Đợi stabilization window → HPA scale down

Câu hỏi tự kiểm tra:

HPA mất bao lâu để scale up lần đầu?
Sau khi dừng tải, mất bao lâu để scale down?
Nếu xóa resource requests, HPA có hoạt động không?

Gợi ý

💡 Xem gợi ý

Bài 1: Metrics Server cần 1-2 phút để thu thập metrics đầu tiên. Nếu kubectl top pods báo "error: metrics not available", hãy đợi thêm
Bài 2: minReplicas: 2 đảm bảo HA, maxReplicas: 10 tránh runaway scaling. Target CPU 50% cho headroom
Bài 3: Scale up thường nhanh (15-30s), scale down chậm hơn (stabilization window + cooldown)
HPA tính: desiredReplicas = ceil(currentReplicas × (currentMetricValue / desiredMetricValue))
Nếu Pod không có requests, HPA không thể tính % utilization

Lời giải

✅ Xem lời giải

Bài 2: HPA YAML hoàn chỉnh

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15

Bài 3: Tạo tải nặng hơn với multiple connections

bash

# Tạo tải nặng hơn
kubectl run load-gen --rm -it --image=busybox:1.36 -- /bin/sh -c \
  "while true; do for i in $(seq 1 50); do wget -q -O- http://web-app >/dev/null 2>&1 & done; wait; done"

# Kiểm tra HPA status
kubectl describe hpa web-app-hpa

# Xem events
kubectl get events --field-selector reason=SuccessfulRescale

Kiểm tra nhanh bằng CLI:

bash

# Tạo HPA nhanh (không cần YAML)
kubectl autoscale deployment web-app \
  --min=2 --max=10 --cpu-percent=50

Thực hành: HPA & Scaling ​

Mô tả bài tập ​

Yêu cầu ​

Bài 1: Chuẩn bị — Deployment có Resource Requests ​

Bài 2: Cấu hình HPA ​

Bài 3: Load Testing — Trigger Autoscaling ​

Gợi ý ​

Lời giải ​

Liên kết liên quan ​

Thực hành: HPA & Scaling

Mô tả bài tập

Yêu cầu

Bài 1: Chuẩn bị — Deployment có Resource Requests

Bài 2: Cấu hình HPA

Bài 3: Load Testing — Trigger Autoscaling

Gợi ý

Lời giải

Liên kết liên quan