🖥️ Compute Decisioning

Level: Core
Solves: Choosing the right compute platform for a workload based on its requirements for control, scalability, and operational overhead

🎯 Outcomes

After applying the material on this page, you will be able to:

Apply the Decision Framework to choose a suitable compute platform for each workload
Design an EC2 strategy covering instance selection, Graviton migration, and Spot instances
Deploy container workloads on ECS or EKS based on team expertise
Optimize Lambda for event-driven workloads, including cold start mitigation
Implement a Spot strategy to cut costs for fault-tolerant workloads
Configure a security baseline for each compute type (IMDSv2, container scanning)

✅ When to Use

Compute Option	Primary Use Case	Why
EC2	Full OS control, legacy apps, GPU workloads	Can do everything, but with high ops overhead
ECS Fargate	Containerized apps, small teams	Serverless containers, low ops
ECS EC2	High-volume containers, cost-sensitive	Self-managed capacity, Reserved Instances
EKS	K8s ecosystem, multi-cloud strategy	Portable, rich ecosystem (Istio, ArgoCD)
Lambda	Event-driven, < 15 min, variable traffic	Pay-per-use, zero ops

❌ When NOT to Use

Pattern	Problem	Alternative
EC2 for stateless APIs	Over-provisioning, slow scaling	ECS/EKS or Lambda
EKS for teams of < 5 people	Complexity overhead, ~$72/month control plane	ECS Fargate
Lambda for sustained high throughput	Cost explodes, cold starts	ECS/EKS
Fargate for GPU workloads	No GPU support	EC2 with GPU instances
Spot for databases	Data loss on interruption	On-Demand or Reserved

⚠️ A Warning from Raizo

"I have seen a 3-person startup choose EKS because it was the 'industry standard'. Six months later, 50% of their DevOps time went to debugging K8s instead of building features. ECS Fargate can be deployed in a day. Choose compute based on team capability, not on what looks good on a CV."

🎮 Scenario Choice Quiz

Answer the following questions to identify the right compute platform:

📝 Scenario 1: API Backend for a Mobile App

Description: You need to deploy a REST API for a mobile app with:

Variable traffic: 100 req/s during the day, 1000 req/s at peak hours
A team of 4 developers with no dedicated DevOps
Latency p99 < 200ms
Limited budget

Options:

A) EC2 with an Auto Scaling Group
B) ECS Fargate
C) EKS
D) Lambda + API Gateway

💡 Show answer

Answer: B) ECS Fargate or D) Lambda + API Gateway

Analysis:

ECS Fargate:
- ✅ Auto-scaling built-in
- ✅ Low ops overhead for a small team
- ✅ Consistent latency
- ✅ Cost-effective for sustained traffic
Lambda + API Gateway:
- ✅ Zero ops
- ✅ Pay-per-request suits variable traffic
- ⚠️ Cold starts can affect p99 latency
- ⚠️ Needs Provisioned Concurrency if latency is critical
Why not EC2: Over-provisioning, high ops overhead
Why not EKS: A 4-person team without K8s expertise, plus the control plane cost

📝 Scenario 2: ML Training Pipeline

Description: You need to run ML training jobs:

Requires GPUs (NVIDIA A100)
Jobs run for 2-8 hours
Input data comes from S3, model output goes back to S3
Runs daily and can be paused if needed

Options:

A) EC2 Spot with p4d instances
B) ECS Fargate
C) Lambda
D) SageMaker Training Jobs

💡 Show answer

Answer: A) EC2 Spot with p4d instances or D) SageMaker

Analysis:

EC2 Spot + GPU:
- ✅ GPU support (p4d/g5)
- ✅ Saves 60-90% with Spot
- ✅ Full control to optimize
- ⚠️ Requires checkpointing to survive Spot interruptions
SageMaker Training:
- ✅ Managed, integrated with the ML workflow
- ✅ Automatic Spot handling
- ✅ Easier distributed training
- ⚠️ Costs more than EC2
Why not Fargate: No GPU support
Why not Lambda: 15-minute limit, no GPU

📝 Scenario 3: Image Processing Pipeline

Description: Process images uploaded to S3:

Each image needs resizing, watermarking, and thumbnail generation
Processing takes 10-30 seconds per image
Volume: 10,000 images/day, peaking at 1,000/hour
Real-time processing is not required

Options:

A) EC2 with an SQS consumer
B) ECS Fargate with SQS
C) Lambda triggered by S3
D) Step Functions with Lambda

💡 Show answer

Answer: C) Lambda triggered by S3 or D) Step Functions

Analysis:

Lambda with S3 trigger:
- ✅ Perfect fit: event-driven, short duration
- ✅ Auto-scales from 0 to thousands
- ✅ Pay only when there are images to process
- ✅ Simple implementation
Step Functions + Lambda:
- ✅ Useful when the workflow is complex (retry, branching)
- ✅ Visibility into processing state
- ⚠️ Adds complexity

Cost estimate (Lambda):

10,000 images × 20s × 1GB memory
= 200,000 GB-seconds × $0.0000166667 = ~$3.33/day = ~$100/month
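
The S3-triggered option in Scenario 3 can be sketched as a small handler. This is a minimal illustration, not a definitive implementation: `process_image` is a hypothetical placeholder for the resize, watermark, and thumbnail work, and the bucket and key names in the usage note are made up. The event shape follows the standard S3 notification payload, in which object keys arrive URL-encoded.

```python
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 notifications (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def process_image(bucket, key):
    # Hypothetical placeholder for the resize / watermark / thumbnail work.
    return f"processed s3://{bucket}/{key}"


def handler(event, context):
    """Lambda entry point: process every uploaded object named in the event."""
    results = [process_image(b, k) for b, k in extract_s3_objects(event)]
    return {"processed": len(results), "results": results}
```

Calling `handler` with an event containing one record for bucket `uploads` and key `photos/my+cat.jpg` would process the object `photos/my cat.jpg`.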

📝 Scenario 4: Microservices Platform

Description: Migrate 20 microservices from on-prem:

Team of 15 developers and 3 DevOps engineers
Existing Kubernetes expertise from on-prem
Needs a service mesh and canary deployments
Multi-cloud strategy on the roadmap

Options:

A) EC2 with Docker Compose
B) ECS with Application Load Balancer
C) EKS with Istio
D) Lambda for all services

💡 Show answer

Answer: C) EKS with Istio

Analysis:

EKS:
- ✅ Team already has K8s expertise
- ✅ Service mesh (Istio) support
- ✅ Canary/Blue-green with ArgoCD
- ✅ Multi-cloud portable (GKE, AKS)
- ⚠️ ~$72/month control plane (acceptable at this scale)

Why EKS is the right choice in this case:

The team already has the expertise, so there is no learning curve
20 services + service mesh = a good fit for the K8s ecosystem
Multi-cloud requirement = portability matters
3 DevOps engineers can manage EKS

Decision Framework

Quick Decision Matrix

┌─────────────────────────────────────────────────────────────────┐
│                 COMPUTE DECISION MATRIX                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Need full OS control?                                          │
│  ├── YES → EC2                                                  │
│  └── NO → Continue...                                           │
│                                                                 │
│  Workload duration?                                             │
│  ├── < 15 minutes, event-driven → Lambda                        │
│  └── Long-running → Continue...                                 │
│                                                                 │
│  Need Kubernetes ecosystem?                                     │
│  ├── YES → EKS                                                  │
│  └── NO → Continue...                                           │
│                                                                 │
│  Container orchestration needs?                                 │
│  ├── Simple → ECS Fargate                                       │
│  └── Complex, cost-sensitive → ECS EC2                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
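
The matrix above can be read as an ordered series of checks. The sketch below encodes that order in a single function; the parameter names are illustrative assumptions, and real decisions will weigh more factors (team expertise, cost, compliance) than these five inputs.

```python
def choose_compute(*, needs_os_control, event_driven, max_runtime_minutes,
                   needs_kubernetes, complex_or_cost_sensitive):
    """Walk the decision matrix top to bottom and return the first match."""
    if needs_os_control:
        return "EC2"
    # Lambda only fits event-driven work that finishes within 15 minutes.
    if event_driven and max_runtime_minutes < 15:
        return "Lambda"
    if needs_kubernetes:
        return "EKS"
    # Remaining workloads are containers on ECS; pick the launch type.
    if complex_or_cost_sensitive:
        return "ECS EC2"
    return "ECS Fargate"
```

For example, a long-running service with no OS or Kubernetes requirements and simple orchestration needs falls through every check and lands on ECS Fargate.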

Detailed Comparison

Factor	EC2	ECS Fargate	ECS EC2	EKS	Lambda
Control	Full	Container	Container + Host	Full K8s	Function
Scaling	Manual/ASG	Automatic	ASG	HPA/Karpenter	Automatic
Pricing	Per instance	Per vCPU/memory	Per instance	Control plane + nodes	Per invocation
Cold Start	Minutes	Seconds	Seconds	Seconds	Milliseconds-seconds
Max Runtime	Unlimited	Unlimited	Unlimited	Unlimited	15 minutes
Ops Overhead	High	Low	Medium	High	Very Low

EC2 Deep Dive

Instance Selection Strategy

Instance Family Guide

Family	Use Case	Example Workloads
M	General purpose	Web servers, small DBs
C	Compute optimized	Batch processing, gaming
R	Memory optimized	In-memory caching, analytics
I	Storage optimized	Data warehousing, Elasticsearch
P/G	GPU	ML training, video encoding
T	Burstable	Dev/test, low-traffic sites

Graviton (ARM) Considerations

💡 Graviton Benefits

20-40% better price/performance vs x86
Lower power consumption
Native support for most workloads
Requires ARM-compatible AMIs and containers

┌─────────────────────────────────────────────────────────────────┐
│              GRAVITON MIGRATION CHECKLIST                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ✅ Compatible:                                                 │
│  • Most interpreted languages (Python, Node.js, Ruby)           │
│  • Java (with ARM JDK)                                          │
│  • Go (cross-compile for ARM)                                   │
│  • Containerized workloads (multi-arch images)                  │
│                                                                 │
│  ⚠️ Check Compatibility:                                        │
│  • Native compiled binaries                                     │
│  • Third-party agents/tools                                     │
│  • Specific library dependencies                                │
│                                                                 │
│  ❌ Not Compatible:                                             │
│  • x86-specific assembly code                                   │
│  • Windows workloads                                            │
│  • Some legacy applications                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
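
During a Graviton migration it can help to confirm which architecture a process is actually running on, for instance in a smoke test after switching instance types. A minimal sketch using only the standard library; the helper name `is_arm64` is an assumption for illustration.

```python
import platform

# Machine strings commonly reported for 64-bit ARM (Linux and macOS).
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine=None):
    """Return True when the given (or current) machine string names 64-bit ARM."""
    machine = machine if machine is not None else platform.machine()
    return machine.lower() in ARM64_NAMES
```

On a Graviton instance running Linux, `platform.machine()` is expected to report `aarch64`, so `is_arm64()` would return `True` there.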

Container Services

ECS vs EKS Decision

┌─────────────────────────────────────────────────────────────────┐
│                    ECS vs EKS                                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  CHOOSE ECS WHEN:                    CHOOSE EKS WHEN:           │
│  ─────────────────                   ─────────────────          │
│  • AWS-native preferred              • K8s ecosystem needed     │
│  • Simpler operations                • Multi-cloud strategy     │
│  • Tight AWS integration             • Existing K8s expertise   │
│  • Cost-sensitive (no control plane) • Complex networking       │
│  • Smaller teams                     • Service mesh (Istio)     │
│                                                                 │
│  ECS Pricing:                        EKS Pricing:               │
│  • $0 for control plane              • $0.10/hour control plane │
│  • Pay for compute only              • + compute costs          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Fargate vs EC2 Launch Type

Aspect	Fargate	EC2 Launch Type
Management	Serverless	You manage instances
Pricing	Per task vCPU/memory	Per instance
Cost at Scale	Higher	Lower (with Reserved)
GPU Support	No	Yes
Spot Support	Yes (Fargate Spot)	Yes
Customization	Limited	Full

ECS Task Definition Example

json

{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "app"
        }
      },
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ]
    }
  ]
}
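
Fargate accepts only specific `cpu` and `memory` pairings, so a quick pre-deploy check on the task definition can catch an invalid combination early. The sketch below covers the task sizes from 0.25 to 4 vCPU only; larger sizes are deliberately left out, and the function name is an assumption.

```python
# Valid Fargate memory values (MiB) per CPU setting, for 0.25 to 4 vCPU tasks.
FARGATE_MEMORY_MIB = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def validate_fargate_size(task_def):
    """Return None when cpu/memory are a valid pair, else an error message."""
    cpu = int(task_def["cpu"])
    memory = int(task_def["memory"])
    if cpu not in FARGATE_MEMORY_MIB:
        return f"cpu value {cpu} is not covered by this check"
    if memory not in FARGATE_MEMORY_MIB[cpu]:
        return f"memory {memory} MiB is not valid for cpu {cpu}"
    return None
```

The task definition above (`"cpu": "512"`, `"memory": "1024"`) passes this check.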

Lambda Considerations

When Lambda Shines

┌─────────────────────────────────────────────────────────────────┐
│                 LAMBDA SWEET SPOTS                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ✅ IDEAL FOR:                                                  │
│  • Event-driven processing (S3, SQS, DynamoDB Streams)          │
│  • API backends with variable traffic                           │
│  • Scheduled tasks (cron jobs)                                  │
│  • Data transformation pipelines                                │
│  • Webhooks and integrations                                    │
│                                                                 │
│  ⚠️ CONSIDER ALTERNATIVES WHEN:                                 │
│  • Execution > 15 minutes                                       │
│  • Consistent high traffic (cost inefficient)                   │
│  • Cold start latency unacceptable                              │
│  • Need persistent connections (WebSockets - use API GW)        │
│  • Large memory requirements (> 10GB)                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Cold Start Mitigation
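
Two mitigations this page names for cold starts are lazy loading and keeping expensive setup out of the per-request path. A minimal sketch, assuming a hypothetical expensive client (`_create_client` stands in for an SDK client, database connection, or model load): the object is created once per execution environment and reused by warm invocations.

```python
import time

_INIT_COUNT = 0
_client = None  # cached across warm invocations of the same execution environment


def _create_client():
    """Hypothetical expensive setup (SDK client, DB connection, model load)."""
    global _INIT_COUNT
    _INIT_COUNT += 1
    return {"created_at": time.time()}


def get_client():
    """Lazily create the client on first use, then reuse it."""
    global _client
    if _client is None:
        _client = _create_client()
    return _client


def handler(event, context):
    client = get_client()
    return {"init_count": _INIT_COUNT, "ok": client is not None}
```

Only the first invocation pays the setup cost. Provisioned Concurrency addresses the remaining initialization time by keeping environments initialized ahead of traffic.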

Lambda Cost Optimization

┌─────────────────────────────────────────────────────────────────┐
│              LAMBDA COST FORMULA                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Cost = (Requests × $0.20/1M) + (GB-seconds × $0.0000166667)   │
│                                                                 │
│  Example: 1M requests/month, 512MB, 200ms average               │
│  ─────────────────────────────────────────────────────────────  │
│  Request cost: 1M × $0.20/1M = $0.20                            │
│  Compute cost: 1M × 0.5GB × 0.2s × $0.0000166667 = $1.67       │
│  Total: $1.87/month                                             │
│                                                                 │
│  OPTIMIZATION TIPS:                                             │
│  • Right-size memory (affects CPU allocation)                   │
│  • Use ARM (Graviton2) for 20% cost reduction                   │
│  • Batch processing where possible                              │
│  • Consider Fargate for consistent high traffic                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
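
The formula in the box translates directly into a small calculator. This sketch uses the two prices quoted above and ignores free tier, Provisioned Concurrency, and data transfer; the function name is an assumption.

```python
REQUEST_PRICE_PER_MILLION = 0.20   # USD, from the formula above
GB_SECOND_PRICE = 0.0000166667     # USD, from the formula above


def lambda_monthly_cost(requests, memory_mb, avg_duration_ms):
    """Return (request_cost, compute_cost, total) in USD for one month."""
    request_cost = requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    gb_seconds = requests * (memory_mb / 1024) * (avg_duration_ms / 1000)
    compute_cost = gb_seconds * GB_SECOND_PRICE
    return request_cost, compute_cost, request_cost + compute_cost
```

With the example inputs (1M requests, 512MB, 200ms), the total rounds to the $1.87/month shown in the box.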

Spot Instances Strategy

Spot Best Practices

Diversify across instance types and AZs
Use Spot Fleet or ASG with mixed instances
Implement graceful shutdown handling
Set a Spot maximum price only when you need a hard cap (the default cap is the On-Demand price)
Use Spot for stateless, fault-tolerant workloads
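
The diversification advice above maps onto the `MixedInstancesPolicy` structure of an Auto Scaling group. The sketch below only builds that structure as a plain dictionary; the launch template name, the instance types, and the choice of the `price-capacity-optimized` allocation strategy are illustrative assumptions.

```python
def build_mixed_instances_policy(launch_template_name, instance_types,
                                 on_demand_base=1, on_demand_pct_above_base=0):
    """Build a MixedInstancesPolicy dict that spreads Spot across instance types."""
    if len(instance_types) < 2:
        raise ValueError("diversify: provide at least two instance types")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group is allowed to launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Keep a small On-Demand floor; everything above it runs on Spot.
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }
```

The result could be passed as the `MixedInstancesPolicy` argument when creating the group, together with subnets in several AZs to cover the multi-AZ half of the advice.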

Spot Interruption Handling

python

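# Check for a Spot interruption notice (issued about 2 minutes before reclaim).
# Uses IMDSv2: fetch a session token first, then query the metadata endpoint.
import requests

IMDS = "http://169.254.169.254/latest"

def initiate_graceful_shutdown(action):
    # Placeholder: drain connections, save state, deregister from the load balancer
    print(f"Spot {action.get('action')} scheduled at {action.get('time')}")

def check_spot_interruption():
    try:
        token = requests.put(
            f"{IMDS}/api/token",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
            timeout=2,
        ).text
        response = requests.get(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
            timeout=2,
        )
    except requests.exceptions.RequestException:
        # Metadata service unreachable (for example, not running on EC2)
        return False
    if response.status_code == 200:
        # Interruption notice received; the body names the action and its time
        initiate_graceful_shutdown(response.json())
        return True
    # A 404 means no interruption is currently scheduled
    return False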

Best Practices Checklist

[ ] Use Graviton instances where compatible
[ ] Implement auto-scaling for all compute
[ ] Use Spot for fault-tolerant workloads
[ ] Right-size instances based on metrics
[ ] Use Reserved Instances for baseline capacity
[ ] Implement proper health checks
[ ] Use IMDSv2 for metadata service
[ ] Enable detailed monitoring

⚖️ Trade-offs

Trade-off 1: ECS vs EKS

Aspect	ECS	EKS
Learning curve	AWS-native, easy to learn	K8s complexity, steep curve
Control plane cost	$0	~$72/month ($0.10/hour)
Ecosystem	AWS-only integrations	Rich K8s ecosystem
Portability	AWS lock-in	Multi-cloud portable
Team size	Small (2-5 people)	Larger (5+ with DevOps)

Enterprise context: A fintech startup initially chose EKS because "most big tech companies use K8s". After a year, they realized:

40% of DevOps time went to maintaining K8s
Upgrading the control plane was a nightmare
They had only 10 services and did not need Istio

They migrated to ECS and reduced DevOps headcount from 3 to 1.

Trade-off 2: Fargate vs EC2 Launch Type

Aspect	Fargate	EC2 Launch Type
Management	Serverless	Self-managed instances
Cost/task	Higher	Lower with Reserved
Cost at low scale	Low (pay only while running)	High (always-on instances)
Cost at high scale	High	Low with Reserved
GPU	Not supported	Supported

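Break-even analysis:

Fargate: 0.25 vCPU, 0.5GB = $0.01244/hour = $8.96/month per task (720 hours)

EC2 (t3.micro Reserved 1yr): $3.80/month
  - Can run 2-3 small tasks
  - Break-even: ~3 concurrent tasks

Conclusion:
- < 3 concurrent tasks: Fargate is usually cheaper, since tasks are billed only while running and there are no instances to manage (a task that runs 24/7 at $8.96/month still costs more than the $3.80 instance)
- > 3 stable concurrent tasks: EC2 is cheaper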
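
The comparison can be reproduced with a few lines of arithmetic. This sketch uses the two monthly figures quoted above and assumes two tasks fit on one t3.micro (the conservative end of the "2-3 small tasks" estimate). It models only raw compute price for tasks running 24/7; the operational cost of managing instances is not modeled.

```python
import math

FARGATE_TASK_MONTHLY = 8.96   # USD, 0.25 vCPU / 0.5GB task, figure from the text
EC2_INSTANCE_MONTHLY = 3.80   # USD, t3.micro 1-year Reserved, figure from the text
TASKS_PER_INSTANCE = 2        # assumption: conservative end of "2-3 small tasks"


def monthly_costs(concurrent_tasks):
    """Return (fargate_cost, ec2_cost) in USD for always-on tasks."""
    fargate = concurrent_tasks * FARGATE_TASK_MONTHLY
    instances = math.ceil(concurrent_tasks / TASKS_PER_INSTANCE)
    ec2 = instances * EC2_INSTANCE_MONTHLY
    return fargate, ec2
```

Swapping in current prices for your region and your own packing density gives a break-even tailored to the workload.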

Trade-off 3: Lambda vs Always-On Containers

Aspect	Lambda	ECS/EKS
Billing	Per-invocation	Per-second
Cold starts	Yes (100ms - 10s)	None
Max duration	15 minutes	Unlimited
Concurrent limit	1000 (soft)	Unlimited
Cost at variable load	Low	High (over-provision)
Cost at steady high load	High	Low
Cost crossover point:

Lambda (1GB, 200ms avg):
- Per request: $0.0000166667 × 1GB × 0.2s + $0.0000002 = $0.00000353
- 1M requests = $3.53

Fargate (0.5 vCPU, 1GB, running 24/7):
- $29.55/month
- Break-even: ~8.4M requests/month

Above ~8.4M stable requests/month: Fargate is cheaper than Lambda
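
The crossover can be worked out from the Lambda unit prices and a fixed monthly Fargate figure. A minimal sketch, assuming a single always-on Fargate task can absorb the whole load; the function names are illustrative.

```python
GB_SECOND_PRICE = 0.0000166667     # USD per GB-second
REQUEST_PRICE = 0.20 / 1_000_000   # USD per request


def lambda_cost_per_request(memory_gb, duration_s):
    """Compute charge for one invocation plus the per-request charge."""
    return GB_SECOND_PRICE * memory_gb * duration_s + REQUEST_PRICE


def break_even_requests(fargate_monthly, memory_gb, duration_s):
    """Monthly request count at which Lambda costs as much as Fargate."""
    return fargate_monthly / lambda_cost_per_request(memory_gb, duration_s)
```

For a 1GB function averaging 200ms, `lambda_cost_per_request(1, 0.2)` is about $0.00000353, so a $29.55/month Fargate task breaks even at roughly 8.4M requests per month.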

🚨 Failure Modes

Failure Mode 1: Cold Start Impact

🔥 Real-world incident

A Lambda function served a payment API. Cold starts of 3-5 seconds violated the 2s response-time SLA, and customers received timeout errors. The team was not aware of Provisioned Concurrency.

How to detect	How to prevent
CloudWatch `Duration` metric spikes	Provisioned Concurrency for latency-critical functions
p99 latency >> p50	Optimize package size, lazy loading
Init duration in X-Ray	Use Graviton (may shorten cold starts)
User complaints	Keep-warm strategy for low-traffic functions

Failure Mode 2: Unhandled Spot Interruption

┌─────────────────────────────────────────────────────────────────┐
│                 SPOT INTERRUPTION DISASTER                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Timeline:                                                      │
│  T=0      Spot instance running ML training job (hour 6 of 8)   │
│  T=0      AWS sends interruption notice                         │
│  T=0      App ignores notice (no handler implemented)           │
│  T+2min   Instance terminated                                   │
│  T+2min   6 hours of training LOST, restart from scratch        │
│                                                                 │
│  Impact: 8-hour rerun, cost nearly doubles (14 h vs 8 h)        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

How to detect	How to prevent
CloudWatch Spot interruption events	Implement a handler for the 2-minute warning
Incomplete jobs	Checkpoint state frequently
Repeated job failures	Multi-AZ, multi-instance-type Spot Fleet
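
Frequent checkpointing is what turns an interruption into a short delay instead of a full restart. The sketch below is a simplified illustration under stated assumptions: state is a small JSON document on local disk (a real training job would more likely write model checkpoints to S3), and `stop_after` exists only to simulate an interruption.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an interruption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state when no checkpoint exists."""
    if not os.path.exists(path):
        return {"epoch": 0}
    with open(path) as f:
        return json.load(f)


def train(path, total_epochs, stop_after=None):
    """Resume from the last checkpoint and save progress after every epoch."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            return state  # simulated Spot interruption
        # ... one epoch of (hypothetical) training work would run here ...
        state = {"epoch": epoch + 1}
        save_checkpoint(path, state)
    return state
```

A job interrupted after epoch 6 of 8 resumes at epoch 6 on the replacement instance, so only the work since the last checkpoint is repeated.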

Failure Mode 3: Container Image Vulnerabilities

How to detect	How to prevent
ECR scan findings	Enable ECR scanning, block critical CVEs
Security Hub container findings	Scan in CI/CD before pushing
Third-party scanner (Trivy, Snyk)	Base image update policy
Runtime security (Falco)	Minimal base images (distroless, Alpine)
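
Blocking critical CVEs usually means a gate in the deployment pipeline that inspects the scan result. The sketch below evaluates a response shaped like the output of ECR's image scan findings call (`imageScanStatus` and `imageScanFindings.findingSeverityCounts`); the function name and the default of blocking only `CRITICAL` are assumptions.

```python
BLOCKING_SEVERITIES = ("CRITICAL",)


def image_is_deployable(scan_response, blocking=BLOCKING_SEVERITIES):
    """Return True only for a completed scan with no blocking findings."""
    status = scan_response.get("imageScanStatus", {}).get("status")
    if status != "COMPLETE":
        # Treat a missing, pending, or failed scan as not deployable.
        return False
    counts = scan_response.get("imageScanFindings", {}).get(
        "findingSeverityCounts", {})
    return all(counts.get(severity, 0) == 0 for severity in blocking)
```

Passing `blocking=("CRITICAL", "HIGH")` tightens the gate for stricter environments.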

🔐 Security Baseline

Compute Security Requirements

Requirement	EC2	ECS/EKS	Lambda
IMDSv2	Required	N/A	N/A
Container scanning	N/A	ECR scan	N/A
Code signing	AMI signing	Image signing	Code signing
Secrets	Secrets Manager	Secrets Manager/K8s Secrets	Secrets Manager
IAM	Instance Profile	Task Role/IRSA	Execution Role
Network	Security Groups	Security Groups	VPC (optional)

EC2 Security Hardening

json

{
  "MetadataOptions": {
    "HttpTokens": "required",
    "HttpPutResponseHopLimit": 1,
    "HttpEndpoint": "enabled"
  }
}
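
To check whether existing instances already enforce these options, one approach is to audit the `MetadataOptions` that EC2 reports for each instance. The sketch below works on a dictionary shaped like a describe-instances response; the function name is an assumption, and fetching the response (for example with an AWS SDK) is left out.

```python
def instances_allowing_imdsv1(describe_instances_response):
    """Return IDs of instances whose metadata endpoint still accepts IMDSv1."""
    offenders = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            endpoint_on = options.get("HttpEndpoint", "enabled") == "enabled"
            # HttpTokens "optional" means token-less (IMDSv1) calls still work.
            if endpoint_on and options.get("HttpTokens") != "required":
                offenders.append(instance["InstanceId"])
    return offenders
```

An empty list means every instance in the response either requires session tokens or has the metadata endpoint disabled.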

Container Security Checklist

┌─────────────────────────────────────────────────────────────────┐
│               CONTAINER SECURITY LAYERS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. BUILD TIME:                                                 │
│     ☐ Minimal base images (distroless/Alpine)                   │
│     ☐ No secrets in the image                                   │
│     ☐ Scan in the CI pipeline                                   │
│     ☐ Sign images                                               │
│                                                                 │
│  2. REGISTRY:                                                   │
│     ☐ ECR scanning enabled                                      │
│     ☐ Immutable tags                                            │
│     ☐ Lifecycle policies                                        │
│                                                                 │
│  3. RUNTIME:                                                    │
│     ☐ Read-only root filesystem                                 │
│     ☐ Non-root user                                             │
│     ☐ Drop all capabilities                                     │
│     ☐ Resource limits                                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
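
The runtime items in the checklist correspond to fields of an ECS container definition: `readonlyRootFilesystem`, `user`, and `linuxParameters.capabilities.drop`. The sketch below flags which of those are missing; it is a simple lint under the assumption that hardening is declared in the task definition (a non-root `USER` set only in the image would not be detected), and it does not check resource limits.

```python
def runtime_hardening_gaps(container_def):
    """Return a list of runtime hardening items missing from a container definition."""
    gaps = []
    if not container_def.get("readonlyRootFilesystem", False):
        gaps.append("root filesystem is writable")
    # "user" may be "uid", "uid:gid", or a name; look at the user part only.
    user = str(container_def.get("user", "")).split(":")[0]
    if user in ("", "0", "root"):
        gaps.append("no non-root user set in the task definition")
    dropped = (container_def.get("linuxParameters", {})
               .get("capabilities", {})
               .get("drop", []))
    if "ALL" not in dropped:
        gaps.append("capabilities are not dropped")
    return gaps
```

Running it over every entry in `containerDefinitions` during CI gives an early warning before a task definition is registered.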

📊 Ops Readiness

Metrics to Monitor

Compute Type	Key Metrics	Alert Threshold
EC2	CPUUtilization, StatusCheckFailed	> 80%, > 0
ECS	CPUUtilization, MemoryUtilization, RunningTaskCount	> 80%, < desired
EKS	Pod CPU/Memory, Node status, PodRestarts	> 80%, NotReady, > 5
Lambda	Duration, Errors, Throttles, ConcurrentExecutions	> 80% timeout, > 0, > 0, > 80% limit

Alerting Configuration

json

{
  "AlarmName": "HighLambdaErrors",
  "MetricName": "Errors",
  "Namespace": "AWS/Lambda",
  "Dimensions": [
    {"Name": "FunctionName", "Value": "my-function"}
  ],
  "Statistic": "Sum",
  "Period": 300,
  "Threshold": 10,
  "ComparisonOperator": "GreaterThanThreshold",
  "EvaluationPeriods": 1,
  "TreatMissingData": "notBreaching",
  "AlarmActions": ["arn:aws:sns:region:account:compute-alerts"]
}
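
When many functions need the same alarm, generating the definition from a template avoids copy-paste drift. The sketch below builds a dictionary with the same fields as the JSON above; the function name, the alarm naming scheme, and the example ARN in the usage note are assumptions.

```python
def build_lambda_error_alarm(function_name, sns_topic_arn,
                             threshold=10, period=300):
    """Build a CloudWatch alarm definition for Lambda errors on one function."""
    return {
        "AlarmName": f"HighLambdaErrors-{function_name}",
        "MetricName": "Errors",
        "Namespace": "AWS/Lambda",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        # Functions with no traffic emit no datapoints; do not alarm on that.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }
```

For example, `build_lambda_error_alarm("my-function", "arn:aws:sns:us-east-1:123456789012:compute-alerts")` yields a definition whose keys match the parameters of CloudWatch's PutMetricAlarm operation.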

Runbook Entry Points

Situation	Runbook
EC2 instance failed health check	`runbook/ec2-health-check-failure.md`
ECS task failing to start	`runbook/ecs-task-troubleshooting.md`
EKS pod CrashLoopBackOff	`runbook/eks-pod-debugging.md`
Lambda throttling	`runbook/lambda-concurrency-increase.md`
Spot interruption spike	`runbook/spot-capacity-management.md`
Container CVE detected	`runbook/container-vulnerability-response.md`

✅ Design Review Checklist

Compute Selection

[ ] Compute platform matches team expertise
[ ] Cost model matches the traffic pattern
[ ] Scaling strategy defined (HPA/ASG/Lambda concurrency)
[ ] Reserved capacity for baseline (if applicable)

Security

[ ] IMDSv2 enforced (EC2)
[ ] Container images scanned and free of critical CVEs
[ ] No hardcoded secrets; use Secrets Manager
[ ] IAM roles least-privilege

Reliability

[ ] Multi-AZ deployment
[ ] Health checks configured
[ ] Graceful shutdown implemented (Spot, SIGTERM)
[ ] Auto-scaling tested

Operations

[ ] Monitoring and alerting configured
[ ] Logs centralized
[ ] Runbooks documented
[ ] Cost monitoring per workload

📎 Links

📎 GCP Compute Patterns - Comparison with GCE, GKE, Cloud Run, Cloud Functions
📎 VPC & Networking - Network requirements for compute options
📎 Cost Governance - Compute cost optimization strategies
📎 Reliability & DR - High availability patterns for compute
📎 Terraform Modules - IaC patterns for compute deployments
