📚 State Management

Level: Foundation
Solves: Managing Terraform state safely, collaboratively, and scalably for enterprise teams

🎯 Outcomes

After applying the material on this page, you will be able to:

  • Understand the state file and why it matters
  • Configure a remote backend (S3, GCS) with locking
  • Perform state operations safely (mv, rm, import)
  • Secure state with encryption and access control
  • Organize state using appropriate patterns
  • Handle state issues (corruption, stuck locks, migration)

When to use

| Backend | Use Case | Reason |
|---------|----------|--------|
| S3 + DynamoDB | AWS primary | Native, locking |
| GCS | GCP primary | Built-in locking |
| Terraform Cloud | Multi-cloud, SaaS | Managed, features |
| Azure Blob | Azure primary | Native integration |

When NOT to use

| Pattern | Problem | Alternative |
|---------|---------|-------------|
| Local state for a team | Conflicts, loss | Remote backend |
| State in Git | Secrets exposed, conflicts | S3/GCS backend |
| Shared state without locking | Corruption | DynamoDB lock |
| Single state for everything | Blast radius | Split by component |

⚠️ Warning from Raizo

"A team committed the state file to Git. Six months later, a security audit found database passwords in the Git history. NEVER commit state to version control."

Why does state matter?

💡 Professor Tom

The state file is Terraform's "source of truth". Lose the state and you lose control; corrupt the state and your infrastructure descends into chaos. State management is not optional: it is the foundation of every enterprise Terraform deployment.

What is the state file?

┌─────────────────────────────────────────────────────────────────┐
│                    TERRAFORM STATE                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────┐         ┌─────────────────┐                │
│  │   Config (.tf)  │         │   Real World    │                │
│  │                 │         │   Resources     │                │
│  │  resource "x"   │         │                 │                │
│  │  resource "y"   │         │   EC2, VPC,     │                │
│  │  resource "z"   │         │   S3, RDS...    │                │
│  └────────┬────────┘         └────────┬────────┘                │
│           │                           │                         │
│           │    ┌─────────────────┐    │                         │
│           └───►│   STATE FILE    │◄───┘                         │
│                │                 │                              │
│                │  Maps config    │                              │
│                │  to real        │                              │
│                │  resources      │                              │
│                │                 │                              │
│                │  resource_id    │                              │
│                │  attributes     │                              │
│                │  dependencies   │                              │
│                └─────────────────┘                              │
│                                                                 │
│  STATE CONTAINS:                                                │
│  • Resource IDs (how Terraform tracks what it manages)          │
│  • Resource attributes (current values)                         │
│  • Dependencies (ordering information)                          │
│  • Metadata (Terraform version, provider versions)              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

State File Structure

json
{
  "version": 4,
  "terraform_version": "1.6.0",
  "serial": 42,
  "lineage": "abc123-def456-...",
  "outputs": {
    "vpc_id": {
      "value": "vpc-0123456789abcdef0",
      "type": "string"
    }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "aws_vpc",
      "name": "main",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "id": "vpc-0123456789abcdef0",
            "cidr_block": "10.0.0.0/16",
            "tags": {
              "Name": "main-vpc"
            }
          }
        }
      ]
    }
  ]
}

Local vs Remote State

Local State Problems

┌─────────────────────────────────────────────────────────────────┐
│                    LOCAL STATE PROBLEMS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Engineer A                    Engineer B                       │
│  ┌─────────────┐              ┌─────────────┐                   │
│  │ terraform   │              │ terraform   │                   │
│  │ .tfstate    │              │ .tfstate    │                   │
│  │ (local)     │              │ (local)     │                   │
│  └──────┬──────┘              └──────┬──────┘                   │
│         │                            │                          │
│         ▼                            ▼                          │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    AWS Resources                        │    │
│  │                                                         │    │
│  │  Both engineers think they own the same resources!      │    │
│  │  Conflicting changes, state corruption, chaos!          │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  PROBLEMS:                                                      │
│  ❌ No collaboration - each engineer has different state        │
│  ❌ No locking - concurrent applies corrupt state               │
│  ❌ No backup - laptop dies = state lost                        │
│  ❌ Secrets in state - local file = security risk               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Remote State Solution

┌─────────────────────────────────────────────────────────────────┐
│                    REMOTE STATE SOLUTION                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Engineer A                    Engineer B                       │
│  ┌─────────────┐              ┌─────────────┐                   │
│  │ terraform   │              │ terraform   │                   │
│  │ (no local   │              │ (no local   │                   │
│  │  state)     │              │  state)     │                   │
│  └──────┬──────┘              └──────┬──────┘                   │
│         │                            │                          │
│         └──────────┬─────────────────┘                          │
│                    ▼                                            │
│         ┌─────────────────────┐                                 │
│         │   REMOTE BACKEND    │                                 │
│         │   (S3 + DynamoDB)   │                                 │
│         │                     │                                 │
│         │  • Single source    │                                 │
│         │    of truth         │                                 │
│         │  • State locking    │                                 │
│         │  • Encryption       │                                 │
│         │  • Versioning       │                                 │
│         └──────────┬──────────┘                                 │
│                    ▼                                            │
│         ┌─────────────────────┐                                 │
│         │   AWS Resources     │                                 │
│         └─────────────────────┘                                 │
│                                                                 │
│  BENEFITS:                                                      │
│  ✅ Team collaboration with single state                        │
│  ✅ State locking prevents concurrent modifications             │
│  ✅ Automatic backup and versioning                             │
│  ✅ Encryption at rest and in transit                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Backend Configuration

S3 Backend (AWS)

hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    
    # Optional: Assume role for cross-account
    role_arn       = "arn:aws:iam::123456789012:role/TerraformStateAccess"
  }
}

GCS Backend (GCP)

hcl
terraform {
  backend "gcs" {
    bucket  = "company-terraform-state"
    prefix  = "prod/vpc"
  }
}

Backend Infrastructure Setup

hcl
# state-backend/main.tf
# Run this ONCE to create backend infrastructure

resource "aws_s3_bucket" "terraform_state" {
  bucket = "company-terraform-state"
  
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  
  attribute {
    name = "LockID"
    type = "S"
  }
  
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_kms_key" "terraform_state" {
  description             = "KMS key for Terraform state encryption"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

State Locking

How Locking Works

Lock Table Entry

json
{
  "LockID": "company-terraform-state/prod/vpc/terraform.tfstate",
  "Info": {
    "ID": "abc123-def456",
    "Operation": "OperationTypeApply",
    "Who": "engineer@company.com",
    "Version": "1.6.0",
    "Created": "2024-01-15T10:30:00Z",
    "Path": "prod/vpc/terraform.tfstate"
  }
}

Force Unlock (Emergency Only)

bash
# Only use when lock is stuck (e.g., CI/CD crashed)
terraform force-unlock LOCK_ID

# Example
terraform force-unlock abc123-def456-ghi789

⚠️ Force Unlock Warning

Only use force-unlock when you are certain that no process is still running. An incorrect force-unlock can corrupt the state.
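
Before force-unlocking, it can help to look at who holds the lock. The following is a minimal sketch, assuming the S3 + DynamoDB backend shown on this page and that the lock item's `Info` attribute is stored as a JSON string. `show_lock` is a hypothetical helper name, not part of Terraform or the AWS CLI.

```shell
# Sketch: print the lock metadata (who, operation, created) for one state file.
# Assumption: the lock item stores its metadata in a string attribute named Info.
show_lock() {
  local table="$1" lock_id="$2"
  aws dynamodb get-item \
    --table-name "$table" \
    --key "{\"LockID\": {\"S\": \"${lock_id}\"}}" \
    --query 'Item.Info.S' \
    --output text
}
```

With the example values from this page, the call would be `show_lock terraform-state-lock company-terraform-state/prod/vpc/terraform.tfstate`. The `ID` field inside the returned metadata is the value that `terraform force-unlock` expects.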

State Operations

State Commands

bash
# List all resources in state
terraform state list

# Show specific resource
terraform state show aws_vpc.main

# Move resource (rename)
terraform state mv aws_vpc.main aws_vpc.primary

# Remove resource from state (doesn't destroy)
terraform state rm aws_vpc.main

# Import existing resource
terraform import aws_vpc.main vpc-0123456789abcdef0

# Pull remote state to local
terraform state pull > state.json

# Push local state to remote (dangerous!)
terraform state push state.json
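
Because `state mv`, `state rm`, and `state push` modify state directly, one cautious habit is to snapshot the state first. The sketch below combines `terraform state pull` with `terraform state mv`; the helper name `state_mv_with_backup` and the backup file naming are assumptions made for illustration.

```shell
# Sketch: snapshot the state before moving a resource address.
# state_mv_with_backup is a hypothetical helper, not a Terraform command.
state_mv_with_backup() {
  local from="$1" to="$2"
  local backup="state-backup-$(date +%Y%m%d-%H%M%S).json"

  # Pull the current state; drop the file again if the pull fails.
  if ! terraform state pull > "$backup"; then
    rm -f "$backup"
    echo "could not pull state, nothing was changed" >&2
    return 1
  fi
  echo "backup written to $backup"

  # Rename the address in state; real infrastructure is not touched.
  terraform state mv "$from" "$to"
}
```

Usage would look like `state_mv_with_backup aws_vpc.main aws_vpc.primary`. The backup file contains the same secrets as the state itself, so it should be stored securely or deleted once the change has been verified.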

State Migration

hcl
# Old backend
terraform {
  backend "local" {}
}

# New backend
terraform {
  backend "s3" {
    bucket = "new-state-bucket"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }
}

bash
# Migrate state
terraform init -migrate-state
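
After a migration, a plan that reports no changes is a reasonable sign that the new backend holds the same state. This sketch relies on `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The helper name `verify_migration` and the `plan.log` file name are assumptions.

```shell
# Sketch: check that the migrated state still matches the configuration.
# verify_migration is a hypothetical helper; plan output goes to plan.log.
verify_migration() {
  local rc=0
  terraform plan -detailed-exitcode -input=false > plan.log 2>&1 || rc=$?

  case "$rc" in
    0) echo "OK: no changes after migration" ;;
    2) echo "WARNING: plan shows changes, review plan.log before applying"
       return 2 ;;
    *) echo "ERROR: plan failed, see plan.log"
       return 1 ;;
  esac
}
```

A plan with changes does not necessarily mean the migration failed, since drift may have existed beforehand. Running the same check before migrating gives a baseline to compare against.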

State File Security

Sensitive Data in State

⚠️ State Contains Secrets

The state file contains every attribute of your resources, including passwords, API keys, and other sensitive data. You MUST encrypt it and restrict access.

hcl
# Passwords end up in state!
resource "aws_db_instance" "main" {
  password = var.db_password  # This is stored in state
}

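# Better: read the password from Secrets Manager.
# This keeps it out of code and tfvars, but the value is still written to state.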
resource "aws_db_instance" "main" {
  password = aws_secretsmanager_secret_version.db.secret_string
}
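
To get a rough idea of how much sensitive data a state file holds, one option is to count attribute names that look like secrets. This is only a sketch: the name pattern is an illustrative assumption rather than a complete list, and `count_secret_like_keys` is a hypothetical helper.

```shell
# Sketch: count lines in a pulled state file whose attribute name suggests a secret.
# The pattern below is an assumption; extend it for the providers you use.
count_secret_like_keys() {
  local state_file="$1"
  grep -E -i -c '"[a-z_]*(password|secret|token|private_key)[a-z_]*" *:' "$state_file"
}
```

A possible workflow is `terraform state pull > state.json`, then `count_secret_like_keys state.json`, then deleting `state.json` straight away. Note that `grep -c` exits with status 1 when the count is zero.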

State Access Control

hcl
# S3 bucket policy - restrict access
resource "aws_s3_bucket_policy" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "DenyUnencryptedUploads"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:PutObject"
        Resource  = "${aws_s3_bucket.terraform_state.arn}/*"
        Condition = {
          StringNotEquals = {
            "s3:x-amz-server-side-encryption" = "aws:kms"
          }
        }
      },
      {
        Sid       = "DenyInsecureTransport"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource  = [
          aws_s3_bucket.terraform_state.arn,
          "${aws_s3_bucket.terraform_state.arn}/*"
        ]
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      }
    ]
  })
}

State Organization Patterns

Per-Environment State

states/
├── dev/terraform.tfstate
├── staging/terraform.tfstate
└── prod/terraform.tfstate

Per-Component State

states/
├── networking/terraform.tfstate
├── compute/terraform.tfstate
├── database/terraform.tfstate
└── monitoring/terraform.tfstate

Hybrid State

states/
├── global/
│   ├── iam/terraform.tfstate
│   └── dns/terraform.tfstate
├── dev/
│   ├── vpc/terraform.tfstate
│   ├── eks/terraform.tfstate
│   └── rds/terraform.tfstate
├── staging/
│   └── ...
└── prod/
    └── ...
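
A layout like this stays consistent more easily when the backend key is derived rather than typed by hand. The sketch below assumes the `backend "s3"` block leaves `key` unset (partial configuration) so that it can be supplied at init time. `state_key` and `tf_init` are hypothetical helper names.

```shell
# Sketch: build the state key from environment and component.
state_key() {
  local env="$1" component="$2"
  echo "${env}/${component}/terraform.tfstate"
}

# Sketch: initialize a root module against the derived key.
# Assumes the backend block omits `key` (partial backend configuration).
tf_init() {
  terraform init -input=false -backend-config="key=$(state_key "$1" "$2")"
}
```

For example, `tf_init prod vpc` would initialize against `prod/vpc/terraform.tfstate`, and `tf_init global iam` against `global/iam/terraform.tfstate`.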

Remote State Data Source

hcl
# Read outputs from another state
data "terraform_remote_state" "vpc" {
  backend = "s3"
  
  config = {
    bucket = "company-terraform-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use outputs
resource "aws_instance" "web" {
  # Outputs are read as data.terraform_remote_state.vpc.outputs.<output_name>
  # ...
}

## Best Practices Checklist

- [ ] Use remote backend (S3/GCS)
- [ ] Enable state locking
- [ ] Encrypt state at rest
- [ ] Restrict state access via IAM
- [ ] Enable versioning for recovery
- [ ] Split state by component/environment
- [ ] Avoid sensitive data in state when possible
- [ ] Regular state backup verification

## ⚖️ Trade-offs

### Trade-off 1: Single State vs Split State

| Approach | Blast Radius | Complexity | Plan Time |
|----------|--------------|------------|------------|
| **Single state** | Very high | Low | Slow |
| **Per-env** | Medium | Medium | Fast |
| **Per-component** | Low | High | Fast |
| **Hybrid** | Low | Highest | Varies |

**Recommendation**: Hybrid (global, plus per-environment and per-component) for enterprise setups.

---

### Trade-off 2: Backend Options

| Backend | Locking | Encryption | Cost |
|---------|---------|------------|------|
| **S3 + DynamoDB** | DynamoDB | KMS | Low |
| **GCS** | Built-in | Google-managed | Low |
| **Terraform Cloud** | Built-in | Managed | $$$ |
| **Consul** | Built-in | Manual | Ops overhead |

---

### Trade-off 3: State Access Model

| Model | Security | Convenience |
|-------|----------|-------------|
| **Strict IAM** | High | Low |
| **Role-based** | High | Medium |
| **Open access** | Low | High |

## 🚨 Failure Modes

### Failure Mode 1: State Corruption

::: danger 🔥 Real-world incident
*Two engineers applied at the same time with no locking. The state ended up with partial updates, and Terraform no longer knew which resources existed. Reconciling it by hand took three days.*
:::

| How to detect | How to prevent |
|---------------|----------------|
| Plan shows unexpected changes | Enable state locking |
| Resource conflicts | Apply only through CI/CD |
| State serial mismatch | Enable versioning |

---

### Failure Mode 2: State Lock Stuck

| How to detect | How to prevent |
|---------------|----------------|
| "Error acquiring lock" | Timeout policies |
| Blocked applies | Proper cleanup in CI/CD |
| DynamoDB item stuck | Monitoring + alerts |

**Recovery**: `terraform force-unlock LOCK_ID` (after verifying that no process is still running)

---

### Failure Mode 3: State Loss

| How to detect | How to prevent |
|----------------|------------------|
| Empty state | S3 versioning |
| Missing resources | Regular backups |
| Init failures | Cross-region replication |
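
With S3 versioning enabled, an earlier copy of the state can be listed and downloaded for inspection. The following is a sketch using the AWS CLI `s3api` commands; `list_state_versions` and `fetch_state_version` are hypothetical helper names, and the bucket and key are whatever your backend uses.

```shell
# Sketch: list the stored versions of one state object.
list_state_versions() {
  local bucket="$1" key="$2"
  aws s3api list-object-versions \
    --bucket "$bucket" --prefix "$key" \
    --query 'Versions[].[VersionId,LastModified,IsLatest]' --output text
}

# Sketch: download one specific version to a local file for review.
fetch_state_version() {
  local bucket="$1" key="$2" version_id="$3" out="$4"
  aws s3api get-object \
    --bucket "$bucket" --key "$key" --version-id "$version_id" "$out"
}
```

After reviewing the downloaded file, it could be restored with `terraform state push`. Terraform refuses the push when the file's serial is lower than the remote one unless `-force` is given, so treat that step with the same care as any other state surgery.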

## 🔐 Security Baseline

### State Security Requirements

| Requirement | Implementation | Verification |
|-------------|----------------|---------------|
| **Encryption at rest** | KMS/Google-managed | Bucket config |
| **Encryption in transit** | HTTPS only | Bucket policy |
| **Access restricted** | IAM/IAP | Access audit |
| **No public access** | Block public access | Security scan |
| **Versioning** | Enabled | Bucket config |

### State Security Checklist

| Item | Status |
|------|--------|
| S3/GCS bucket encrypted | ☑ Required |
| Versioning enabled | ☑ Required |
| Public access blocked | ☑ Required |
| IAM access restricted | ☑ Required |
| HTTPS enforced | ☑ Required |
| Audit logging enabled | ☑ Required |

## 📊 Ops Readiness

### Metrics to Monitor

| Metric | Source | Alert Threshold |
|--------|--------|-----------------|
| Lock acquisition time | DynamoDB | > 30s |
| State file size | S3/GCS | > 50MB |
| State version count | S3 | Delta > 100/day |
| Lock stuck duration | DynamoDB | > 1 hour |
| State access errors | CloudTrail | Any |
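
As one way to check the state file size threshold from the table, the object size can be read from S3 and compared against a limit. This is a sketch under assumptions: the helper names are hypothetical, and "MB" is interpreted here as 1024 × 1024 bytes.

```shell
# Sketch: read the size in bytes of a state object from S3.
state_size_bytes() {
  aws s3api head-object --bucket "$1" --key "$2" \
    --query 'ContentLength' --output text
}

# Sketch: succeed (exit 0) when the size is above the limit, default 50 MB.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
exceeds_threshold() {
  local bytes="$1" limit_mb="${2:-50}"
  [ "$bytes" -gt $((limit_mb * 1024 * 1024)) ]
}
```

A scheduled job might run `exceeds_threshold "$(state_size_bytes company-terraform-state prod/vpc/terraform.tfstate)"` and raise an alert when it succeeds. A state that large is usually a hint to split it further.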

### Runbook Entry Points

| Situation | Runbook |
|------------|---------|
| State locked | `runbook/terraform-state-lock.md` |
| State corrupted | `runbook/terraform-state-recovery.md` |
| State lost | `runbook/terraform-state-restore.md` |
| Migration needed | `runbook/terraform-state-migration.md` |
| Backend access denied | `runbook/terraform-backend-access.md` |

## ✅ Design Review Checklist

### Backend

- [ ] Remote backend configured
- [ ] Locking enabled
- [ ] Encryption at rest
- [ ] Access restricted

### Organization

- [ ] State split appropriately
- [ ] Naming convention consistent
- [ ] Path structure logical
- [ ] Cross-state references documented

### Security

- [ ] No secrets in code
- [ ] State access audited
- [ ] Versioning enabled
- [ ] Backup strategy in place

### Operations

- [ ] Lock monitoring
- [ ] State size monitoring
- [ ] Recovery runbooks
- [ ] Team trained on state ops

## 📎 Links

- 📎 [IaC Fundamentals](./fundamentals) - Terraform basics and workflow
- 📎 [Drift Management](/terraform/core/drift) - Handling state drift
- 📎 [Security for IaC](/terraform/security/security) - Securing secrets in state
- 📎 [AWS S3](/aws/core/storage) - S3 backend best practices
- 📎 [GCP Cloud Storage](/gcp/data/platforms) - GCS backend patterns
- 📎 [AWS KMS](/aws/security/secrets) - State encryption with KMS