
# 🔴 Environments Strategy

Level: Advanced | Solves: Managing multiple environments (dev, staging, prod) with Terraform in a scalable, maintainable way

## 🎯 Outcomes

After applying what this page covers, you will be able to:

- Choose the right environment strategy for a project
- Use workspaces correctly (and know when not to use them)
- Design a directory-per-environment structure
- Configure Terragrunt for DRY environments
- Implement a safe promotion workflow
- Prevent accidental prod changes

## When to Use

| Strategy | Use Case | Reason |
|----------|----------|--------|
| Workspaces | Simple, same configs | Quick, no duplication |
| Directory per env | Most teams | Clear separation, recommended |
| Terragrunt | Large orgs, DRY | Minimal duplication |
| Branch per env | GitOps workflows | Clear promotion |

## When NOT to Use

| Pattern | Problem | Alternative |
|---------|---------|-------------|
| Workspaces for prod | Wrong-workspace risk | Directory per env |
| Single state for all envs | Blast radius | Separate states |
| Different module versions | Env parity broken | Same versions |
| No prod protection | Accidental changes | `prevent_destroy`, approvals |

::: warning ⚠️ Warning from Raizo
"An engineer forgot to switch workspaces and applied the dev config to prod. Four hours of downtime. Since then, we use only directory per env for production."
:::

## Why Does Environment Strategy Matter?

::: tip 💡 Professor Tom
"'Works on my machine' in IaC becomes 'works in dev'. Environment parity is critical: prod must match staging, and staging must match dev, in structure. They should differ only in scale and data. The wrong strategy means deployment nightmares."
:::

### Environment Challenges

```
┌─────────────────────────────────────────────────────────────────┐
│                    ENVIRONMENT CHALLENGES                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │     DEV     │    │   STAGING   │    │    PROD     │          │
│  │             │    │             │    │             │          │
│  │  t3.micro   │    │  t3.small   │    │  t3.large   │          │
│  │  1 replica  │    │  2 replicas │    │  5 replicas │          │
│  │  no HA      │    │  basic HA   │    │  full HA    │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│                                                                 │
│  QUESTIONS:                                                     │
│  • How to share code but vary configuration?                    │
│  • How to prevent dev changes from affecting prod?              │
│  • How to promote changes through environments?                 │
│  • How to manage state isolation?                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Strategy Comparison

### Overview

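| Strategy | State Isolation | Code Duplication | Complexity | Best For |
|----------|-----------------|------------------|------------|----------|
| Workspaces | Same backend, different state | None | Low | Simple projects |
| Directory per env | Separate backends | Some | Medium | Most teams |
| Terragrunt | Separate backends | Minimal | High | Large orgs |
| Branch per env | Separate backends | High | Medium | GitOps workflows |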

## Strategy 1: Terraform Workspaces

### How Workspaces Work

```
┌─────────────────────────────────────────────────────────────────┐
│                    TERRAFORM WORKSPACES                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Same Configuration                                             │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  main.tf, variables.tf, outputs.tf                      │    │
│  └─────────────────────────────────────────────────────────┘    │
│                           │                                     │
│           ┌───────────────┼───────────────┐                     │
│           ▼               ▼               ▼                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  Workspace  │  │  Workspace  │  │  Workspace  │              │
│  │    "dev"    │  │  "staging"  │  │   "prod"    │              │
│  │             │  │             │  │             │              │
│  │  state:     │  │  state:     │  │  state:     │              │
│  │  env:/dev/  │  │env:/staging/│  │  env:/prod/ │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│                                                                 │
│  S3 Backend Structure:                                          │
│  bucket/                                                        │
│  ├── env:/dev/terraform.tfstate                                 │
│  ├── env:/staging/terraform.tfstate                             │
│  └── env:/prod/terraform.tfstate                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
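The layout in the diagram follows the S3 backend's naming rule. The `default` workspace uses the configured `key` as-is. Every other workspace is stored under `<workspace_key_prefix>/<workspace>/<key>`, and the prefix defaults to `env:`. Below is a minimal sketch of that rule in bash. The function name `s3_workspace_key` is hypothetical and not part of Terraform.

```bash
#!/usr/bin/env bash
# s3_workspace_key KEY WORKSPACE [PREFIX]
# Print the object key the S3 backend would use for WORKSPACE's state.
# PREFIX mirrors the backend's workspace_key_prefix (default "env:").
s3_workspace_key() {
  local key="$1" ws="$2" prefix="${3:-env:}"
  if [[ "$ws" == "default" ]]; then
    # The default workspace stores state directly at the configured key
    printf '%s\n' "$key"
  else
    printf '%s/%s/%s\n' "$prefix" "$ws" "$key"
  end_marker=1
  fi
}
```

For example, `s3_workspace_key terraform.tfstate prod` prints `env:/prod/terraform.tfstate`. This is the same path shown in the diagram.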

### Workspace Commands

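```bash
# List workspaces (the current one is marked with *)
terraform workspace list

# Create workspaces (each `new` also switches to the workspace it creates)
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

# Switch workspace
terraform workspace select prod

# Show current workspace
terraform workspace show

# Delete workspace: Terraform refuses to delete the currently selected
# workspace, and refuses a workspace that still tracks resources unless -force
terraform workspace select default
terraform workspace delete dev
```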

### Using the Workspace in Configuration

```hcl
# variables.tf
variable "environment_config" {
  description = "Per-workspace settings, keyed by workspace name"
  type = map(object({
    instance_type  = string
    instance_count = number
    enable_ha      = bool
  }))

  default = {
    dev = {
      instance_type  = "t3.micro"
      instance_count = 1
      enable_ha      = false
    }
    staging = {
      instance_type  = "t3.small"
      instance_count = 2
      enable_ha      = true
    }
    prod = {
      instance_type  = "t3.large"
      instance_count = 5
      enable_ha      = true
    }
  }
}

# AMI to launch (aws_instance requires one)
variable "ami_id" {
  type = string
}

# main.tf
locals {
  env = terraform.workspace
  # Indexing fails with an error in any workspace that has no entry
  # (including "default"), which stops a plan in an unexpected workspace
  config = var.environment_config[local.env]
}

resource "aws_instance" "web" {
  count         = local.config.instance_count
  ami           = var.ami_id
  instance_type = local.config.instance_type

  tags = {
    Environment = local.env
  }
}
```

### Workspace Pros & Cons

✅ Pros

- Zero code duplication
- Simple to understand
- Built into Terraform
- Easy to switch environments

❌ Cons

- Same backend = shared access control
- Easy to accidentally apply to the wrong workspace
- Limited flexibility for env-specific resources
- HashiCorp's documentation says CLI workspaces are not appropriate for deployments that need separate credentials and access controls, which production usually does
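On a workstation, a wrapper can blunt the wrong-workspace risk. It refuses to run unless the selected workspace matches the one you name. This is a sketch that assumes a hypothetical `tf_guarded` helper sourced into your shell. Only `terraform workspace show` is a real Terraform command here.

```bash
#!/usr/bin/env bash
# tf_guarded EXPECTED_WORKSPACE TERRAFORM_ARGS...
# Run terraform only if the currently selected workspace is the expected one.
tf_guarded() {
  local expected="$1"; shift
  local current
  current="$(terraform workspace show)" || return 1
  if [[ "$current" != "$expected" ]]; then
    echo "refusing: selected workspace is '$current', expected '$expected'" >&2
    return 1
  fi
  terraform "$@"
}
```

With this helper, `tf_guarded prod apply` runs `terraform apply` only when `prod` is selected. It reduces the risk but does not remove it. Restricting prod applies to CI remains the stronger control.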

## Strategy 2: Directory per Environment

### Directory Structure

```
infrastructure/
├── modules/                    # Shared modules
│   ├── vpc/
│   ├── eks/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── backend.tf          # Dev-specific backend
│   │   └── terraform.tfvars    # Dev values
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── backend.tf          # Staging-specific backend
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       ├── outputs.tf
│       ├── backend.tf          # Prod-specific backend
│       └── terraform.tfvars
└── global/                     # Shared resources
    ├── iam/
    └── dns/
```
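You can bootstrap the layout above with a few `mkdir`/`touch` calls. This sketch uses a hypothetical `scaffold_envs` function. It creates empty files only, so you still write the backend and module wiring per environment.

```bash
#!/usr/bin/env bash
# scaffold_envs ROOT ENV...
# Create ROOT/modules, ROOT/global and, per ENV, the five files of the layout.
scaffold_envs() {
  local root="$1"; shift
  local env f
  mkdir -p "$root/modules" "$root/global"
  for env in "$@"; do
    mkdir -p "$root/environments/$env"
    for f in main.tf variables.tf outputs.tf backend.tf terraform.tfvars; do
      # touch creates missing files and leaves existing content unchanged
      touch "$root/environments/$env/$f"
    done
  done
}
```

Usage: `scaffold_envs infrastructure dev staging prod`.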

### Environment Configuration

```hcl
# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "dev/infrastructure/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# environments/dev/main.tf
locals {
  # Tags applied to every resource in this environment
  common_tags = {
    Environment = var.environment
  }
}

module "vpc" {
  source = "../../modules/vpc"

  name               = "dev"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a"]  # Single AZ for dev
  enable_nat_gateway = true
  single_nat_gateway = true            # Cost saving

  tags = local.common_tags
}

# environments/dev/terraform.tfvars
# (each key must be declared in environments/dev/variables.tf)
environment    = "dev"
instance_type  = "t3.micro"
instance_count = 1
```
```hcl
# environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state-prod"  # Separate bucket
    key            = "prod/infrastructure/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock-prod"
    encrypt        = true
    # PROD_ACCOUNT is a placeholder for the prod AWS account ID. Newer
    # Terraform versions (1.6+) deprecate role_arn here in favor of assume_role.
    role_arn       = "arn:aws:iam::PROD_ACCOUNT:role/TerraformRole"
  }
}

# environments/prod/main.tf
locals {
  common_tags = {
    Environment = var.environment
  }
}

module "vpc" {
  source = "../../modules/vpc"

  name               = "prod"
  cidr_block         = "10.0.0.0/16"  # Same as dev; use distinct ranges if envs will ever be peered
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  enable_nat_gateway = true
  single_nat_gateway = false  # HA for prod

  tags = local.common_tags
}

# environments/prod/terraform.tfvars
environment    = "prod"
instance_type  = "t3.large"
instance_count = 5
```
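Each directory carries its own `backend.tf`, so isolation is only as good as those files. If two environments point at the same `bucket` + `key`, they share one state. Below is a rough check. It assumes single-line `bucket = "..."` and `key = "..."` attributes, as in the examples above. The name `check_state_isolation` is hypothetical, and the check is a text scan, not an HCL parser.

```bash
#!/usr/bin/env bash
# check_state_isolation ROOT
# Print the S3 location of each ROOT/environments/*/backend.tf and return 1
# if two environments share the same bucket + key.
check_state_isolation() {
  local root="$1" f bucket key loc seen="" status=0
  for f in "$root"/environments/*/backend.tf; do
    [[ -f "$f" ]] || continue
    # First single-line `bucket = "..."` / `key = "..."` value in the file
    bucket="$(sed -n 's/^[[:space:]]*bucket[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$f" | head -n 1)"
    key="$(sed -n 's/^[[:space:]]*key[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$f" | head -n 1)"
    loc="s3://$bucket/$key"
    echo "$f -> $loc"
    if grep -Fxq -- "$loc" <<< "$seen"; then
      echo "duplicate state location: $loc" >&2
      status=1
    fi
    seen+="$loc"$'\n'
  done
  return "$status"
}
```

Running `check_state_isolation infrastructure` prints one `s3://bucket/key` line per environment. It exits non-zero when it finds a duplicate, which makes it usable as a CI step.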

### Directory Strategy Pros & Cons

✅ Pros

- Complete state isolation
- Separate access control per environment
- Clear visibility of env-specific config
- Can have different resources per env
- Matches HashiCorp's guidance: put common elements in reusable modules and make each environment a separate configuration with its own backend

❌ Cons

- Some code duplication (a `main.tf` per environment)
- Environments must be kept in sync manually
- More files to manage

## Strategy 3: Terragrunt (DRY Approach)

### Terragrunt Structure

```
infrastructure/
├── modules/
│   └── vpc/
├── terragrunt.hcl              # Root config
└── environments/
    ├── terragrunt.hcl          # Env-level config
    ├── dev/
    │   ├── terragrunt.hcl      # Dev-specific
    │   └── vpc/
    │       └── terragrunt.hcl
    ├── staging/
    │   └── ...
    └── prod/
        └── ...
```
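In this tree, a unit such as `environments/dev/vpc` has two ancestor files named `terragrunt.hcl`. Terragrunt's `find_in_parent_folders()` walks up from the calling config and returns the first match. So the nearer `environments/dev/terragrunt.hcl` wins over `environments/terragrunt.hcl`. Below is a rough emulation of that upward search. `find_up` is hypothetical and is not Terragrunt's own code.

```bash
#!/usr/bin/env bash
# find_up START NAME
# Check START's parent, then each further ancestor, for a file called NAME;
# print the first match as an absolute path, or return 1 if none exists.
find_up() {
  local dir
  dir="$(cd "$1" && pwd)" || return 1
  while [[ "$dir" != "/" ]]; do
    dir="$(dirname "$dir")"
    if [[ -f "$dir/$2" ]]; then
      printf '%s\n' "$dir/$2"
      return 0
    fi
  done
  return 1
}
```

In this layout, `find_up environments/dev/vpc terragrunt.hcl` prints the absolute path of `environments/dev/terragrunt.hcl`, not the env-level file one directory higher.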

### Terragrunt Configuration

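```hcl
# environments/terragrunt.hcl (parent)
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "company-terraform-state"
    # Resolves to the child's path relative to this file, e.g. "dev/vpc"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}

# environments/dev/vpc/terragrunt.hcl
include "root" {
  # find_in_parent_folders() would return the nearest terragrunt.hcl above
  # this directory, i.e. environments/dev/terragrunt.hcl, not the parent
  # above, so point at environments/terragrunt.hcl explicitly
  path = "${get_terragrunt_dir()}/../../terragrunt.hcl"
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  name               = "dev"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a"]
  enable_nat_gateway = true
  single_nat_gateway = true
}
```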
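The parent's `key = "${path_relative_to_include()}/terraform.tfstate"` is what gives every unit its own state object. The function returns the child's directory relative to the included parent file. The sketch below reproduces the resulting mapping as plain string handling. `state_key_for` is hypothetical, and it mimics the outcome for this layout, not Terragrunt's implementation.

```bash
#!/usr/bin/env bash
# state_key_for PARENT_DIR UNIT_DIR
# Print "<UNIT_DIR relative to PARENT_DIR>/terraform.tfstate".
state_key_for() {
  local parent="${1%/}" unit="${2%/}"   # drop trailing slashes
  case "$unit" in
    "$parent"/*) printf '%s/terraform.tfstate\n' "${unit#"$parent"/}" ;;
    *) echo "unit $unit is not under $parent" >&2; return 1 ;;
  esac
}
```

For example, `state_key_for environments environments/prod/vpc` prints `prod/vpc/terraform.tfstate`, so dev and prod units never share a key.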

### Terragrunt Commands

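```bash
# Apply a single module (run from the repository root)
(cd environments/dev/vpc && terragrunt apply)

# Apply all modules in the environment, in dependency order
# (recent Terragrunt releases also offer `terragrunt run --all <command>`)
cd environments/dev
terragrunt run-all apply

# Plan all modules (still inside environments/dev)
terragrunt run-all plan

# Destroy in reverse dependency order
terragrunt run-all destroy
```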

## Promotion Workflow

### GitOps Promotion

### Environment Promotion Script

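```bash
#!/usr/bin/env bash
# promote.sh - Preview promoting tfvars from one environment to another
# Usage: ./promote.sh <source_env> <target_env>   (run from the repo root)
set -euo pipefail

SOURCE_ENV="${1:?usage: promote.sh <source_env> <target_env>}"
TARGET_ENV="${2:?usage: promote.sh <source_env> <target_env>}"
TARGET_DIR="environments/${TARGET_ENV}"

echo "Promoting from ${SOURCE_ENV} to ${TARGET_ENV}"

# Copy the source tfvars as a candidate file. cp copies EVERY value,
# env-specific ones included (environment, instance_type, ...), so review
# the diff and restore target values before planning.
cp "environments/${SOURCE_ENV}/terraform.tfvars" "${TARGET_DIR}/terraform.tfvars.new"

# Show diff (diff exits 1 when files differ; don't let set -e stop here)
diff -u "${TARGET_DIR}/terraform.tfvars" "${TARGET_DIR}/terraform.tfvars.new" || true

# Plan only, never apply, in the target environment.
# -var-file values override the auto-loaded terraform.tfvars.
cd "${TARGET_DIR}"
terraform init -input=false
terraform plan -var-file=terraform.tfvars.new
```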
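A plain `cp` of tfvars carries env-specific values such as `environment` and `instance_type` into the target. The sketch below merges instead: it keeps the target's own values for the keys you name and copies everything else. It assumes flat, one-line `key = value` tfvars and does not handle multi-line maps or lists. `merge_tfvars` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# merge_tfvars SRC TARGET KEY...
# Print SRC line by line, but for every KEY listed, emit TARGET's line for
# that key instead, so env-specific values are not promoted.
merge_tfvars() {
  local src="$1" target="$2"; shift 2
  local line key k excluded
  while IFS= read -r line || [[ -n "$line" ]]; do
    key="${line%%=*}"            # text before the first '='
    key="${key//[[:space:]]/}"   # strip whitespace around the key
    excluded=0
    for k in "$@"; do
      if [[ "$key" == "$k" ]]; then excluded=1; break; fi
    done
    if (( excluded )); then
      # Keep the target's value; if the target lacks the key, drop the line
      grep -m1 -E "^[[:space:]]*${key}[[:space:]]*=" "$target" || true
    else
      printf '%s\n' "$line"
    fi
  done < "$src"
}
```

Usage: `merge_tfvars environments/staging/terraform.tfvars environments/prod/terraform.tfvars environment instance_type instance_count > environments/prod/terraform.tfvars.new`. Then review and plan as before.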

## Best Practices

### 1. Environment Parity

```hcl
# Use same module versions across environments
module "vpc" {
  source  = "company/vpc/aws"
  version = "2.1.0"  # Same version everywhere

  # Only vary configuration, not structure
  availability_zones = var.availability_zones
}
```

### 2. Separate State Buckets for Prod

```hcl
# Inside backend "s3" { ... } for dev/staging: shared bucket
bucket = "company-terraform-state"

# Inside backend "s3" { ... } for prod: separate bucket, separate account
bucket   = "company-terraform-state-prod"
role_arn = "arn:aws:iam::PROD_ACCOUNT:role/TerraformRole"
```

### 3. Environment-Specific Variables

```hcl
# variables.tf
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod."
  }
}

# Derive other settings from environment
# (staging falls into the non-prod branch and gets t3.micro here)
locals {
  is_production = var.environment == "prod"

  instance_type = local.is_production ? "t3.large" : "t3.micro"
  multi_az      = local.is_production
}
```

### 4. Prevent Accidental Prod Changes

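```hcl
# prod/main.tf
resource "aws_vpc" "main" {
  # ...

  lifecycle {
    # Plans that would destroy this resource fail; deleting the resource
    # block from the configuration removes this protection as well
    prevent_destroy = true
  }
}
```

```yaml
# .github/workflows/terraform.yml - require approval for prod
on:
  push:
    branches: [main]
jobs:
  apply-prod:
    runs-on: ubuntu-latest
    # Waits for approval if "production" has required reviewers configured
    environment: production
    defaults:
      run:
        working-directory: environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # AWS credential setup omitted
      - run: terraform init -input=false
      - run: terraform apply -auto-approve -input=false
```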
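`prevent_destroy` only protects resources that declare it. A broader CI tripwire fails the job whenever a plan would destroy anything. The sketch below reads the summary line of `terraform plan -no-color` output. The helper names and the `ALLOW_DESTROY` override are hypothetical. Human-readable output can change between versions, so `terraform show -json` on a saved plan is the sturdier source for real pipelines.

```bash
#!/usr/bin/env bash
# destroy_count: read `terraform plan -no-color` output on stdin and print
# N from the "Plan: ... N to destroy." summary line (empty if there is none).
destroy_count() {
  sed -n 's/^Plan: .* \([0-9][0-9]*\) to destroy.*/\1/p' | tail -n 1
}

# guard_no_destroy: fail if the plan on stdin destroys anything,
# unless the caller sets ALLOW_DESTROY=1.
guard_no_destroy() {
  local n
  n="$(destroy_count)"
  n="${n:-0}"
  if (( n > 0 )) && [[ "${ALLOW_DESTROY:-0}" != "1" ]]; then
    echo "plan destroys $n resource(s); set ALLOW_DESTROY=1 to override" >&2
    return 1
  fi
  return 0
}
```

In CI this could run as `terraform plan -no-color | tee plan.txt | guard_no_destroy` under `set -o pipefail`. Without `pipefail`, a failed plan produces no summary line and the guard would pass.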

## Anti-Patterns

### Single State for All Environments

```hcl
# BAD - All environments in one state
resource "aws_instance" "dev_web" { ... }
resource "aws_instance" "staging_web" { ... }
resource "aws_instance" "prod_web" { ... }
```

### Hardcoded Environment Values

```hcl
# BAD - Hardcoded in main.tf
resource "aws_instance" "web" {
  instance_type = "t3.micro"  # What about prod?
}

# GOOD - Variable
resource "aws_instance" "web" {
  instance_type = var.instance_type
}
```

### Different Module Versions per Environment

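```hcl
# BAD - Version drift
# dev/main.tf
module "vpc" {
  source  = "company/vpc/aws"
  version = "3.0.0"  # Latest
}

# prod/main.tf
module "vpc" {
  source  = "company/vpc/aws"
  version = "2.1.0"  # Older: prod no longer runs the code tested in dev
}
```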
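Drift like this is easy to catch mechanically. The sketch below lists every `source` + `version` pair per environment directory and fails when environments disagree. Provider pins in `required_providers` are picked up the same way. It assumes `version` follows `source` inside each block (true for the examples here) and scans text rather than parsing HCL. `audit_module_versions` is a hypothetical name.

```bash
#!/usr/bin/env bash
# audit_module_versions ROOT
# For each ROOT/environments/<env>/, print "<env>: source@version ..." and
# return 1 if any environment's set of pins differs from the first one's.
audit_module_versions() {
  local root="$1" dir env pins ref="" first=1 status=0
  for dir in "$root"/environments/*/; do
    [[ -d "$dir" ]] || continue
    env="$(basename "$dir")"
    # Remember the last `source = "..."` and pair it with the next `version = "..."`
    pins="$(awk '
      /^[[:space:]]*source[[:space:]]*=/  { split($0, a, "\""); src = a[2] }
      /^[[:space:]]*version[[:space:]]*=/ { split($0, a, "\""); print src "@" a[2] }
    ' "$dir"*.tf 2>/dev/null | sort -u | paste -sd ' ' -)"
    echo "$env: $pins"
    if (( first )); then
      ref="$pins"; first=0
    elif [[ "$pins" != "$ref" ]]; then
      status=1
    fi
  done
  return "$status"
}
```

Usage: `audit_module_versions infrastructure` prints one line per environment, for example `dev: company/vpc/aws@3.0.0`. It exits non-zero while versions differ.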

## Best Practices Checklist

- [ ] Environment parity (same modules, same versions)
- [ ] State isolation per environment
- [ ] Separate prod state bucket/account
- [ ] prevent_destroy on prod resources
- [ ] Approval required for prod changes
- [ ] Clear promotion workflow
- [ ] No hardcoded env values
- [ ] Environment validation in variables

## ⚖️ Trade-offs

### Trade-off 1: Workspaces vs Directories

| Aspect | Workspaces | Directory per Env |
|--------|------------|-------------------|
| **Simplicity** | High | Medium |
| **State isolation** | Low | High |
| **Access control** | Shared | Separate |
| **Wrong env risk** | High | Low |

**Recommendation**: Directory per env for production workloads.

---

### Trade-off 2: Code Duplication vs DRY

| Approach | Duplication | Complexity | Flexibility |
|----------|-------------|------------|-------------|
| **Full duplication** | High | Low | Highest |
| **Shared modules** | Low | Medium | High |
| **Terragrunt** | Minimal | High | Medium |

---

### Trade-off 3: Promotion Strategy

| Strategy | Safety | Speed |
|----------|--------|-------|
| **Manual promotion** | High | Slow |
| **GitOps (branches)** | High | Medium |
| **Auto-promotion** | Low | Fast |

## 🚨 Failure Modes

### Failure Mode 1: Wrong Workspace Applied to Prod

::: danger 🔥 Real incident
*An engineer forgot to switch workspaces and ran `terraform apply` with the dev configuration while the prod workspace was still selected. The dev config was deployed to prod, causing a 4-hour outage.*
:::

| How to detect | How to prevent |
|---------------|----------------|
| Monitoring alerts | Directory per env |
| User reports | Apply only via CI/CD |
| Plan review | Show the current workspace in prompts |

---

### Failure Mode 2: Module Version Drift

| How to detect | How to prevent |
|----------------|------------------|
| Different behavior | Pin same versions |
| Promotion failures | Audit version spread |
| Env parity issues | Version management |

---

### Failure Mode 3: Env Parity Broken

| How to detect | How to prevent |
|----------------|------------------|
| Works in dev, fails in prod | Same structure |
| Missing resources | Shared modules |
| Config differences | Only vary scale |

## 🔐 Security Baseline

### Environment Security

| Requirement | Implementation | Verification |
|-------------|----------------|---------------|
| **Prod state separate** | Separate bucket/account | Config review |
| **Prod access restricted** | IAM/approvals | Access audit |
| **prevent_destroy on prod** | lifecycle block | Code review |
| **Promotion workflow** | PRs, approvals | Process review |

### Access Control per Environment

| Environment | Who Can Apply | Approval |
|-------------|---------------|----------|
| Dev | Engineers | No |
| Staging | Engineers | Optional |
| Prod | CI/CD only | Required |
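A wrapper script can mirror the table's "CI/CD only" rule for prod as a speed bump. The sketch below assumes your CI sets `CI=true`, as GitHub Actions does. `allow_apply` is hypothetical. Anyone can export `CI=true` locally, so real enforcement still belongs in IAM and environment approvals.

```bash
#!/usr/bin/env bash
# allow_apply ENV
# Succeed if ENV may be applied from the current context:
# dev and staging always, prod only when CI=true.
allow_apply() {
  case "$1" in
    dev|staging)
      return 0
      ;;
    prod)
      if [[ "${CI:-}" == "true" ]]; then return 0; fi
      echo "prod applies run only in CI (CI=true not set)" >&2
      return 1
      ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}
```

A wrapper would call `allow_apply "$ENV" || exit 1` before invoking `terraform apply`.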

## 📊 Ops Readiness

### Metrics to Monitor

| Metric | Source | Alert Threshold |
|--------|--------|-----------------|
| Env parity | Module version audit | Any drift |
| Wrong env apply | Audit logs | Any dev->prod |
| Promotion time | CI/CD | > SLA |
| Prod change frequency | CI/CD | Anomaly |

### Runbook Entry Points

| Situation | Runbook |
|------------|---------|
| Wrong env applied | `runbook/wrong-env-recovery.md` |
| Module version drift | `runbook/version-alignment.md` |
| Promotion blocked | `runbook/promotion-troubleshoot.md` |
| Env parity issue | `runbook/env-parity-debug.md` |

## ✅ Design Review Checklist

### Strategy

- [ ] Strategy chosen fits team size
- [ ] State isolation appropriate
- [ ] Code duplication acceptable
- [ ] Terragrunt if needed

### Per Environment

- [ ] Backend configured
- [ ] State isolated
- [ ] Access restricted (prod)
- [ ] prevent_destroy (prod)

### Parity

- [ ] Same modules all envs
- [ ] Same versions all envs
- [ ] Only config differs
- [ ] Promotion tested

### Operations

- [ ] Promotion workflow defined
- [ ] Approvals for prod
- [ ] Rollback plan
- [ ] Runbooks documented

## 📎 Related Links

- 📎 [State Management](/terraform/foundation/state) - Backend configuration
- 📎 [Module Design](/terraform/core/modules) - Reusable modules
- 📎 [Testing & CI/CD](/terraform/advanced/testing) - Environment pipelines
- 📎 [AWS Landing Zone](/aws/foundation/landing-zone) - Multi-account strategy
- 📎 [GCP Hierarchy](/gcp/foundation/hierarchy) - Project-based environments
- 📎 [Terraform Security](/terraform/security/security) - Secure environments