
🔴 Testing & CI/CD

Level: Advanced
Solves: Ensuring quality and automating deployment for Terraform infrastructure

🎯 Outcomes

After applying the material on this page, you will be able to:

  • Set up a Testing Pyramid for IaC
  • Use static analysis (tflint, checkov)
  • Implement Policy-as-Code with OPA/Conftest
  • Write integration tests with Terratest
  • Design a safe CI/CD pipeline
  • Configure OIDC authentication

When to use

| Test Type | Use Case | Reason |
| --- | --- | --- |
| terraform validate | Every commit | Syntax check |
| TFLint | Every commit | Best practices |
| Checkov | Every PR | Security scan |
| OPA policies | Every PR | Custom policies |
| Terratest | Module release | Integration test |

When NOT to use

| Pattern | Problem | Alternative |
| --- | --- | --- |
| Skip validate | Broken commits | Always validate |
| No security scan | Vulnerabilities shipped | Checkov/tfsec |
| Local apply | No audit, conflicts | CI/CD only |
| No plan review | Accidental destroys | Require approval |

⚠️ Warning from Raizo

"A developer pushed directly and applied without review, destroying the production database. After that, the team made PRs, approvals, and CI/CD mandatory. Zero incidents since."

Tại sao cần Testing cho IaC?

💡 Professor Tom

The IaC version of "It works on my machine" is "terraform plan looks good". A plan does not catch logic errors, security misconfigurations, or integration issues. Testing IaC is not a luxury; it is a necessity for production.

Testing Pyramid cho IaC

┌─────────────────────────────────────────────────────────────────┐
│                    IAC TESTING PYRAMID                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                         ▲                                       │
│                        /│\                                      │
│                       / │ \     End-to-End Tests                │
│                      /  │  \    (Full infrastructure)           │
│                     /   │   \   Slow, expensive                 │
│                    ─────┼─────                                  │
│                   /     │     \  Integration Tests              │
│                  /      │      \ (Module + cloud)               │
│                 /       │       \ Medium speed                  │
│                ─────────┼─────────                              │
│               /         │         \  Unit Tests                 │
│              /          │          \ (Static analysis)          │
│             /           │           \ Fast, cheap               │
│            ─────────────┴─────────────                          │
│                                                                 │
│  TESTING TYPES:                                                 │
│  • Static Analysis: terraform validate, fmt, tflint             │
│  • Policy Tests: OPA, Sentinel, Checkov                         │
│  • Unit Tests: Module logic without cloud                       │
│  • Integration Tests: Terratest with real cloud                 │
│  • E2E Tests: Full environment deployment                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Static Analysis

terraform validate & fmt

bash
# Validate syntax
terraform validate

# Check formatting
terraform fmt -check -recursive

# Auto-format
terraform fmt -recursive

TFLint

bash
# Install
brew install tflint

# Initialize with plugins
tflint --init

# Run linting
tflint --recursive

hcl
# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.27.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}

rule "terraform_documented_variables" {
  enabled = true
}

rule "aws_instance_invalid_type" {
  enabled = true
}

Checkov (Security Scanning)

bash
# Install
pip install checkov

# Scan directory
checkov -d .

# Scan with specific framework
checkov -d . --framework terraform

# Output as JSON
checkov -d . -o json > checkov-results.json

yaml
# .checkov.yaml
skip-check:
  - CKV_AWS_79  # Skip specific check
  - CKV_AWS_88
  
framework:
  - terraform
  
compact: true

Policy-as-Code

Open Policy Agent (OPA)

rego
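# policy/terraform.rego
package terraform

# `if`/`contains` rule syntax; needs OPA 0.59+ (mandatory by default in OPA 1.x)
import rego.v1

# These S3 rules assume ACL and encryption are declared inline on aws_s3_bucket
# (AWS provider v3 style). With provider v4+, check the separate aws_s3_bucket_acl
# and aws_s3_bucket_server_side_encryption_configuration resources instead.

# Deny public S3 buckets
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.after.acl == "public-read"
  msg := sprintf("S3 bucket '%s' cannot be public", [resource.address])
}

# Require encryption (resources being deleted have `after == null` and are skipped)
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.after != null
  not resource.change.after.server_side_encryption_configuration
  msg := sprintf("S3 bucket '%s' must have encryption enabled", [resource.address])
}

# Enforce tagging (resources being deleted are skipped)
deny contains msg if {
  resource := input.resource_changes[_]
  resource.change.after != null
  required_tags := {"Environment", "Owner", "Project"}
  provided_tags := {tag | resource.change.after.tags[tag]}
  missing := required_tags - provided_tags
  count(missing) > 0
  msg := sprintf("Resource '%s' missing required tags: %v", [resource.address, missing])
}

bash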
# Generate plan JSON
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json

# Evaluate with OPA
opa eval --data policy/ --input tfplan.json "data.terraform.deny"

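To make the structure these policies walk over more concrete, the following is a minimal Go sketch that applies the same required-tags check to the JSON produced by `terraform show -json`. The names `planFile`, `resourceChange`, and `missingTags` are hypothetical, and only the fields the tagging rule reads are modelled. It is meant as an illustration of the plan layout, not as a replacement for OPA.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// planFile models only the parts of `terraform show -json` output that the
// tagging rule reads. The struct names are illustrative, not an official schema.
type planFile struct {
	ResourceChanges []resourceChange `json:"resource_changes"`
}

type resourceChange struct {
	Address string `json:"address"`
	Change  struct {
		After struct {
			Tags map[string]string `json:"tags"`
		} `json:"after"`
	} `json:"change"`
}

var requiredTags = []string{"Environment", "Owner", "Project"}

// missingTags returns, per resource address, the required tags that are absent.
func missingTags(planJSON []byte) (map[string][]string, error) {
	var p planFile
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, err
	}
	result := map[string][]string{}
	for _, rc := range p.ResourceChanges {
		var missing []string
		for _, tag := range requiredTags {
			if _, ok := rc.Change.After.Tags[tag]; !ok {
				missing = append(missing, tag)
			}
		}
		if len(missing) > 0 {
			sort.Strings(missing)
			result[rc.Address] = missing
		}
	}
	return result, nil
}

func main() {
	// Hand-written sample in the shape of tfplan.json; not real plan output.
	sample := []byte(`{"resource_changes":[{"address":"aws_s3_bucket.logs",
		"change":{"after":{"tags":{"Owner":"platform"}}}}]}`)

	violations, err := missingTags(sample)
	if err != nil {
		panic(err)
	}
	for addr, tags := range violations {
		fmt.Printf("Resource '%s' missing required tags: %v\n", addr, tags)
	}
}
```

Like the Rego rule, this sketch reports every resource in `resource_changes`, including types that do not support tags, so a real policy would likely need an allow-list of taggable resource types.
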
Conftest (OPA Wrapper)

bash
# Install
brew install conftest

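# Run policy tests
# --namespace must match `package terraform`; by default Conftest only evaluates package `main`
conftest test tfplan.json -p policy/ --namespace terraform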

# Output
# FAIL - tfplan.json - terraform - S3 bucket 'aws_s3_bucket.public' cannot be public

HashiCorp Sentinel

sentinel
# policy/require-tags.sentinel
import "tfplan/v2" as tfplan

required_tags = ["Environment", "Owner", "Project"]

# Get all resources
all_resources = filter tfplan.resource_changes as _, rc {
  rc.mode is "managed" and
  rc.change.actions contains "create"
}

# Check tags
main = rule {
  all all_resources as _, resource {
    all required_tags as tag {
      resource.change.after.tags contains tag
    }
  }
}

Integration Testing with Terratest

Basic Terratest Example

go
// test/vpc_test.go
package test

import (
    "testing"
    
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestVpcModule(t *testing.T) {
    t.Parallel()
    
    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../modules/vpc",
        Vars: map[string]interface{}{
            "name":       "test",
            "cidr_block": "10.0.0.0/16",
        },
    })
    
    // Clean up after test
    defer terraform.Destroy(t, terraformOptions)
    
    // Deploy infrastructure
    terraform.InitAndApply(t, terraformOptions)
    
    // Get outputs
    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    cidrBlock := terraform.Output(t, terraformOptions, "vpc_cidr_block")
    
    // Assertions
    assert.NotEmpty(t, vpcId)
    assert.Equal(t, "10.0.0.0/16", cidrBlock)
}

Testing with AWS SDK

go
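// Additional import needed beyond the basic example:
//   "github.com/gruntwork-io/terratest/modules/aws"

func TestVpcWithAwsSdk(t *testing.T) {
    t.Parallel()
    
    terraformOptions := &terraform.Options{
        TerraformDir: "../modules/vpc",
        Vars: map[string]interface{}{
            "name":       "test",
            "cidr_block": "10.0.0.0/16",
        },
    }
    
    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)
    
    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    
    // Look the VPC up through the AWS API instead of trusting Terraform outputs
    awsRegion := "us-east-1"
    vpc := aws.GetVpcById(t, vpcId, awsRegion)
    
    assert.Equal(t, vpcId, vpc.Id)
    // CIDR and DNS settings can be asserted the same way, for example through the
    // EC2 DescribeVpcs / DescribeVpcAttribute calls of the AWS SDK. Check which
    // helpers your Terratest version exposes before relying on a helper name.
}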

Test Stages (Speed Optimization)

go
// Additional import needed beyond the basic example:
//   test_structure "github.com/gruntwork-io/terratest/modules/test-structure"

func TestVpcStages(t *testing.T) {
    t.Parallel()
    
    terraformOptions := &terraform.Options{
        TerraformDir: "../modules/vpc",
    }
    
    // Teardown: deferred so it runs last; skipped when SKIP_teardown is set
    defer test_structure.RunTestStage(t, "teardown", func() {
        terraform.Destroy(t, terraformOptions)
    })
    
    // Setup: deploy the module; skipped when SKIP_setup is set
    test_structure.RunTestStage(t, "setup", func() {
        terraform.InitAndApply(t, terraformOptions)
    })
    
    // Validate: assert against the deployed infrastructure
    test_structure.RunTestStage(t, "validate", func() {
        vpcId := terraform.Output(t, terraformOptions, "vpc_id")
        assert.NotEmpty(t, vpcId)
    })
}

bash
# Keep the infrastructure after the run (skip the teardown stage)
SKIP_teardown=true go test -v -run TestVpcStages

# Reuse the existing infrastructure (skip the setup stage)
SKIP_setup=true go test -v -run TestVpcStages

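The `SKIP_<stage>` convention is simple enough to sketch in a few lines. The code below is a simplified stand-in for `test_structure.RunTestStage`, written under the assumption that a stage is skipped whenever the matching environment variable is set to any non-empty value. The helper names `shouldSkip` and `runStage` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"os"
)

// shouldSkip reports whether a stage is disabled. The environment lookup is
// passed in as a function so the rule can be exercised without touching the
// real environment.
func shouldSkip(stage string, getenv func(string) string) bool {
	return getenv("SKIP_"+stage) != ""
}

// runStage runs fn unless SKIP_<stage> is set to a non-empty value.
func runStage(stage string, getenv func(string) string, fn func()) {
	if shouldSkip(stage, getenv) {
		fmt.Printf("skipping stage %q\n", stage)
		return
	}
	fmt.Printf("running stage %q\n", stage)
	fn()
}

func main() {
	// Deferred first so that teardown runs last, mirroring the test layout.
	defer runStage("teardown", os.Getenv, func() { fmt.Println("  terraform destroy") })

	runStage("setup", os.Getenv, func() { fmt.Println("  terraform init + apply") })
	runStage("validate", os.Getenv, func() { fmt.Println("  assertions") })
}
```

Running it with `SKIP_teardown=true` set skips only the destroy step, which is the usual way to iterate on the validate stage against infrastructure that is already deployed.
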
CI/CD Pipeline Design

GitHub Actions Pipeline

yaml
# .github/workflows/terraform.yml
name: Terraform CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

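env:
  TF_VERSION: "1.6.0"
  AWS_REGION: "us-east-1"

# Required for OIDC (id-token) and for uploading SARIF results (security-events)
permissions:
  id-token: write
  contents: read
  security-events: write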

jobs:
  # Stage 1: Static Analysis
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      
      - name: Terraform Validate
        run: |
          terraform init -backend=false
          terraform validate
      
      - name: Setup TFLint
        uses: terraform-linters/setup-tflint@v4
      
      - name: Run TFLint
        run: |
          tflint --init
          tflint --recursive

  # Stage 2: Security Scan
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Checkov Scan
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: .
          framework: terraform
          output_format: cli,sarif
          output_file_path: console,checkov.sarif
      
      - name: Upload SARIF
        # Run even when Checkov fails the job, so findings still reach code scanning
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: checkov.sarif

  # Stage 3: Plan
  plan:
    needs: [validate, security]
    runs-on: ubuntu-latest
    environment: ${{ github.event_name == 'pull_request' && 'dev' || 'prod' }}
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      
      - name: Terraform Init
        run: terraform init
      
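      - name: Terraform Plan
        id: plan
        run: terraform plan -out=tfplan -no-color
      
      # Separate step so steps.plan.outputs.stdout keeps the human-readable plan
      - name: Export Plan as JSON
        run: terraform show -json tfplan > tfplan.json
      
      - name: Policy Check (OPA)
        # Assumes conftest is already installed on the runner.
        # --namespace must match the policy's `package terraform`.
        run: |
          conftest test tfplan.json -p policy/ --namespace terraform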
      
      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

  # Stage 4: Apply (main branch only)
  apply:
    needs: plan
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    environment: prod  # Requires approval
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      
      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: tfplan
      
      - name: Terraform Init
        run: terraform init
      
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan

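A reviewer approving the prod environment mostly needs to know whether the plan destroys anything. As a sketch of how the `tfplan.json` produced in the plan stage could be summarized for that purpose, the Go program below lists every resource whose planned actions include `delete`. The function name `destroyedResources` and the sample input are hypothetical; a real gate would read the file from disk and exit non-zero when the list is not empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan models only the fields of `terraform show -json` output used here.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// destroyedResources returns the addresses whose planned actions include
// "delete". This covers plain destroys as well as replacements, which are
// reported as a delete/create pair.
func destroyedResources(planJSON []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, err
	}
	var destroyed []string
	for _, rc := range p.ResourceChanges {
		for _, action := range rc.Change.Actions {
			if action == "delete" {
				destroyed = append(destroyed, rc.Address)
				break
			}
		}
	}
	return destroyed, nil
}

func main() {
	// Hand-written sample in the shape of tfplan.json; not real plan output.
	sample := []byte(`{"resource_changes":[
		{"address":"aws_vpc.main","change":{"actions":["create"]}},
		{"address":"aws_db_instance.main","change":{"actions":["delete"]}},
		{"address":"aws_instance.web","change":{"actions":["delete","create"]}}
	]}`)

	destroyed, err := destroyedResources(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("plan destroys or replaces %d resource(s): %v\n", len(destroyed), destroyed)
}
```

The same check can also be expressed as a Conftest rule over `resource.change.actions`, which keeps all gating logic in one policy directory.
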
PR Comment with Plan

yaml
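# Needs `pull-requests: write` permission; steps.plan.outputs.stdout is provided
# by the setup-terraform wrapper for the step with `id: plan`.
- name: Comment Plan on PR
  uses: actions/github-script@v7
  if: github.event_name == 'pull_request'
  env:
    # Pass the plan through the environment instead of interpolating it into the
    # script, so backticks or ${...} in the output cannot break or inject code
    PLAN: ${{ steps.plan.outputs.stdout }}
  with:
    script: |
      const output = [
        '#### Terraform Plan 📖',
        '',
        '```',
        process.env.PLAN,
        '```',
        '',
        `*Pushed by: @${context.actor}*`,
      ].join('\n');
      
      await github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: output
      });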

Pipeline Best Practices

1. OIDC Authentication (No Long-Lived Credentials)

yaml
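# GitHub Actions OIDC
# The job (or workflow) must grant `permissions: id-token: write`,
# otherwise the runner cannot request an OIDC token.
- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
    aws-region: us-east-1
    # No access keys needed!

hcl
# AWS IAM Role for GitHub Actions
data "aws_caller_identity" "current" {}
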
resource "aws_iam_role" "github_actions" {
  name = "GitHubActionsRole"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/token.actions.githubusercontent.com"
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
          }
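          StringLike = {
            # The trailing "*" admits every branch, tag, pull request, and environment
            # of this repository; narrow it for roles that can change production
            "token.actions.githubusercontent.com:sub" = "repo:company/infra:*"
          }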
        }
      }
    ]
  })
}

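How much the trust policy allows depends almost entirely on the pattern used for the `sub` claim. The Go sketch below approximates IAM's `StringLike` matching for patterns that contain only `*`, and compares the wildcard pattern from the role above with a branch-pinned one. The function `matchStringLike` is a hypothetical helper written for this illustration (it ignores the `?` wildcard), and the sample claims follow the formats GitHub documents for branch pushes, pull requests, and environment-bound jobs.

```go
package main

import (
	"fmt"
	"strings"
)

// matchStringLike approximates IAM StringLike for patterns that use only "*",
// where "*" matches any run of characters, including none and including "/".
func matchStringLike(pattern, value string) bool {
	parts := strings.Split(pattern, "*")
	if len(parts) == 1 {
		return pattern == value // no wildcard: exact match
	}
	// The text before the first "*" must be a prefix.
	if !strings.HasPrefix(value, parts[0]) {
		return false
	}
	value = value[len(parts[0]):]
	// Every middle fragment must appear in order, left to right.
	for _, mid := range parts[1 : len(parts)-1] {
		idx := strings.Index(value, mid)
		if idx < 0 {
			return false
		}
		value = value[idx+len(mid):]
	}
	// The text after the last "*" must be a suffix of what remains.
	return strings.HasSuffix(value, parts[len(parts)-1])
}

func main() {
	patterns := []string{
		"repo:company/infra:*",                   // pattern used in the role above
		"repo:company/infra:ref:refs/heads/main", // pinned to the main branch
	}
	subs := []string{
		"repo:company/infra:ref:refs/heads/main",
		"repo:company/infra:pull_request",
		"repo:company/infra:environment:prod",
		"repo:other/infra:ref:refs/heads/main",
	}
	for _, p := range patterns {
		for _, s := range subs {
			fmt.Printf("%-42s %-42s %v\n", p, s, matchStringLike(p, s))
		}
	}
}
```

Note that a job declaring `environment: prod` presents `repo:company/infra:environment:prod` as its subject, so a role meant only for the apply job could be pinned to that exact value instead of a wildcard.
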
2. Environment Protection Rules

yaml
# Require approval for production
environment: prod

# GitHub Settings:
# - Required reviewers: 2
# - Wait timer: 5 minutes
# - Restrict to specific branches

3. Drift Detection Schedule

yaml
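name: Drift Detection
on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

permissions:
  id-token: write
  contents: read

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      # Credential and Terraform setup steps are the same as in the plan job
      
      - name: Terraform Init
        run: terraform init
      
      - name: Check for Drift
        id: plan
        run: |
          # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
          set +e
          terraform plan -detailed-exitcode -no-color
          echo "exitcode=$?" >> "$GITHUB_OUTPUT"
      
      - name: Notify on Drift
        if: steps.plan.outputs.exitcode == '2'
        run: |
          # Send Slack notification
          curl -X POST -H 'Content-Type: application/json' \
            -d '{"text": "⚠️ Infrastructure drift detected!"}' \
            "${{ secrets.SLACK_WEBHOOK }}"
      
      - name: Fail on Plan Error
        if: steps.plan.outputs.exitcode == '1'
        run: exit 1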

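`terraform plan -detailed-exitcode` distinguishes three outcomes, and only one of them means drift: 0 is "no changes", 1 is "error", and 2 is "changes present". A notifier that treats every non-zero status as drift will also fire on expired credentials or a locked state. The Go sketch below shows the mapping a drift job would want; the type and function names are hypothetical.

```go
package main

import "fmt"

// planResult is the interpretation of a `terraform plan -detailed-exitcode` run.
type planResult string

const (
	noChanges planResult = "no-changes"
	drift     planResult = "drift"
	planError planResult = "error"
)

// classifyPlanExit maps the process exit status to a result:
// 0 = succeeded with an empty diff, 2 = succeeded with a non-empty diff.
// Status 1 and any unexpected status are treated as errors, so a broken
// plan is never reported as drift.
func classifyPlanExit(code int) planResult {
	switch code {
	case 0:
		return noChanges
	case 2:
		return drift
	default:
		return planError
	}
}

func main() {
	for _, code := range []int{0, 1, 2} {
		fmt.Printf("exit %d -> %s\n", code, classifyPlanExit(code))
	}
}
```

In a scheduled job, the "error" case usually deserves its own alert, since a drift check that cannot run is as much a blind spot as undetected drift.
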
Best Practices Checklist

  • [ ] CI validates every commit
  • [ ] Security scan on every PR
  • [ ] Plan requires review
  • [ ] Apply from CI only
  • [ ] OIDC authentication
  • [ ] Environment protection rules
  • [ ] Drift detection scheduled
  • [ ] Test modules before release

⚖️ Trade-offs

Trade-off 1: Test Coverage vs Speed

| Level | Coverage | Speed | Cost |
| --- | --- | --- | --- |
| Static only | Low | Fast | Free |
| + Policies | Medium | Fast | Free |
| + Terratest | High | Slow | Cloud $$ |
| + E2E | Highest | Very slow | Cloud $$$ |

Recommendation: Static + Policies for every PR, Terratest for module releases.


Trade-off 2: Policy Strictness

| Level | Security | Developer Experience |
| --- | --- | --- |
| Loose | Low | Easy |
| Moderate | Medium | Good |
| Strict | High | Friction |

Trade-off 3: CI/CD Platform

| Platform | Integration | Cost |
| --- | --- | --- |
| GitHub Actions | Excellent | Free tier |
| GitLab CI | Excellent | Free tier |
| Terraform Cloud | Built-in | $$$ |
| Jenkins | Manual | Self-hosted |

🚨 Failure Modes

Failure Mode 1: CI Credentials Leaked

🔥 Real-world incident

CI secrets were stored as env vars. A log file exposed the credentials. An attacker gained AWS admin access. Cleanup cost $50K.

| How to detect | How to prevent |
| --- | --- |
| Unusual API activity | OIDC, no static keys |
| GuardDuty alerts | Secret scanning |
| Bill spike | Least privilege |

Failure Mode 2: Apply Without Review

| How to detect | How to prevent |
| --- | --- |
| Unexpected changes | Branch protection |
| Production issues | Environment approval |
| Audit failures | Require reviewers |

Failure Mode 3: Flaky Tests

| How to detect | How to prevent |
| --- | --- |
| Random failures | Retry logic |
| Rate limits | Test throttling |
| Resource conflicts | Unique naming |

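Two of the preventions above, retry logic and unique naming, can be sketched with the standard library alone. Terratest ships its own helpers for both, so the functions `uniqueName` and `retry` below are hypothetical stand-ins meant to show the idea, and the throttled call in `main` is simulated rather than a real cloud API request.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

// errThrottled simulates a transient rate-limit error from a cloud API.
var errThrottled = errors.New("rate limit exceeded")

// uniqueName appends a random hex suffix so that parallel test runs do not
// collide on resource names.
func uniqueName(prefix string) string {
	b := make([]byte, 3)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return prefix + "-" + hex.EncodeToString(b)
}

// retry calls fn up to attempts times (at least once), doubling the wait
// after each failure, and returns the last error if every attempt fails.
func retry(attempts int, wait time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	name := uniqueName("test-vpc")

	calls := 0
	err := retry(4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errThrottled // first two calls are throttled
		}
		return nil
	})
	fmt.Println("name:", name, "calls:", calls, "err:", err)
}
```

Passing a generated name into the module's `name` variable, instead of the fixed "test" value, lets several runs of the same test share one AWS account without resource conflicts.
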
🔐 Security Baseline

CI/CD Security Requirements

| Requirement | Implementation | Verification |
| --- | --- | --- |
| No static credentials | OIDC | Audit secrets |
| Least privilege | Scoped roles | IAM review |
| Secret scanning | GitHub secret scanning | Alert review |
| Audit logging | CloudTrail | Log analysis |
| Environment protection | Required reviewers | Config check |

Pipeline Security Checklist

| Item | Status |
| --- | --- |
| OIDC authentication | ☑ Required |
| No hardcoded secrets | ☑ Required |
| Branch protection | ☑ Required |
| Required reviewers | ☑ Required |
| Environment approvals | ☑ Production |
| Audit logging | ☑ Required |

📊 Ops Readiness

Metrics to Monitor

| Metric | Source | Alert Threshold |
| --- | --- | --- |
| Pipeline failures | CI/CD | > 5% failure rate |
| Apply duration | CI/CD | > 30 min |
| Security findings | Checkov | Any high/critical |
| Policy violations | OPA | Any |
| Drift detected | Scheduled plan | Any |

Runbook Entry Points

| Situation | Runbook |
| --- | --- |
| Pipeline failed | runbook/ci-pipeline-debug.md |
| Security finding | runbook/security-finding-triage.md |
| OIDC auth error | runbook/oidc-troubleshooting.md |
| Test flaky | runbook/terratest-debug.md |
| Apply stuck | runbook/terraform-apply-stuck.md |

Design Review Checklist

Pipeline

  • [ ] All stages defined
  • [ ] OIDC configured
  • [ ] Branch protection
  • [ ] Environment approvals

Testing

  • [ ] Static analysis
  • [ ] Security scanning
  • [ ] Policy checks
  • [ ] Integration tests (modules)

Security

  • [ ] No static credentials
  • [ ] Secrets masked
  • [ ] Least privilege
  • [ ] Audit logging

Operations

  • [ ] Drift detection
  • [ ] PR comments with plan
  • [ ] Failure notifications
  • [ ] Runbooks documented

📎 Links