# 📚 State Management
**Level:** Foundation. **Solves:** managing Terraform state safely, collaboratively, and scalably for enterprise teams.
## 🎯 Outcomes
After applying the material on this page, you will be able to:
- Understand the state file and why it matters
- Configure a remote backend (S3, GCS) with locking
- Perform state operations safely (mv, rm, import)
- Secure state with encryption and access control
- Organize state using patterns that fit your setup
- Handle state issues (corruption, stuck locks, migration)
## ✅ When to use
| Backend | Use Case | Reason |
|---|---|---|
| S3 + DynamoDB | AWS primary | Native, locking |
| GCS | GCP primary | Built-in locking |
| Terraform Cloud | Multi-cloud, SaaS | Managed, features |
| Azure Blob | Azure primary | Native integration |
## ❌ When NOT to use
| Pattern | Problem | Alternative |
|---|---|---|
| Local state for a team | Conflicts, loss | Remote backend |
| State in Git | Secrets exposed, conflicts | S3/GCS backend |
| Shared state without locking | Corruption | DynamoDB lock |
| Single state for everything | Blast radius | Split by component |
::: warning ⚠️ Warning from Raizo
"A team committed its state file to Git. Six months later, a security audit found database passwords in the Git history. NEVER commit state to version control."
:::
## Why does state matter?
::: tip 💡 Professor Tom
The state file is Terraform's "source of truth". Lose the state and you lose control; corrupt the state and your infrastructure descends into chaos. State management is not optional: it is the foundation of every enterprise Terraform deployment.
:::
### What is a state file?
```
┌─────────────────────────────────────────────────────────────────┐
│                         TERRAFORM STATE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────────┐                 ┌─────────────────┐       │
│   │  Config (.tf)   │                 │   Real World    │       │
│   │                 │                 │   Resources     │       │
│   │  resource "x"   │                 │                 │       │
│   │  resource "y"   │                 │   EC2, VPC,     │       │
│   │  resource "z"   │                 │   S3, RDS...    │       │
│   └────────┬────────┘                 └────────┬────────┘       │
│            │                                   │                │
│            │        ┌─────────────────┐        │                │
│            └───────►│   STATE FILE    │◄───────┘                │
│                     │                 │                         │
│                     │  Maps config    │                         │
│                     │  to real        │                         │
│                     │  resources      │                         │
│                     │                 │                         │
│                     │  resource_id    │                         │
│                     │  attributes     │                         │
│                     │  dependencies   │                         │
│                     └─────────────────┘                         │
│                                                                 │
│   STATE CONTAINS:                                               │
│   • Resource IDs (how Terraform tracks what it manages)         │
│   • Resource attributes (current values)                        │
│   • Dependencies (ordering information)                         │
│   • Metadata (Terraform version, provider versions)             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### State File Structure
```json
{
  "version": 4,
  "terraform_version": "1.6.0",
  "serial": 42,
  "lineage": "abc123-def456-...",
  "outputs": {
    "vpc_id": {
      "value": "vpc-0123456789abcdef0",
      "type": "string"
    }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "aws_vpc",
      "name": "main",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "id": "vpc-0123456789abcdef0",
            "cidr_block": "10.0.0.0/16",
            "tags": {
              "Name": "main-vpc"
            }
          }
        }
      ]
    }
  ]
}
```
## Local vs Remote State
### Local State Problems
```
┌─────────────────────────────────────────────────────────────────┐
│                      LOCAL STATE PROBLEMS                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Engineer A                          Engineer B                │
│   ┌─────────────┐                     ┌─────────────┐           │
│   │ terraform   │                     │ terraform   │           │
│   │ .tfstate    │                     │ .tfstate    │           │
│   │ (local)     │                     │ (local)     │           │
│   └──────┬──────┘                     └──────┬──────┘           │
│          │                                   │                  │
│          ▼                                   ▼                  │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                      AWS Resources                      │   │
│   │                                                         │   │
│   │   Both engineers think they own the same resources!     │   │
│   │   Conflicting changes, state corruption, chaos!         │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   PROBLEMS:                                                     │
│   ❌ No collaboration - each engineer has different state      │
│   ❌ No locking - concurrent applies corrupt state             │
│   ❌ No backup - laptop dies = state lost                       │
│   ❌ Secrets in state - local file = security risk              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### Remote State Solution
```
┌─────────────────────────────────────────────────────────────────┐
│                      REMOTE STATE SOLUTION                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Engineer A                          Engineer B                │
│   ┌─────────────┐                     ┌─────────────┐           │
│   │ terraform   │                     │ terraform   │           │
│   │ (no local   │                     │ (no local   │           │
│   │  state)     │                     │  state)     │           │
│   └──────┬──────┘                     └──────┬──────┘           │
│          │                                   │                  │
│          └─────────────────┬─────────────────┘                  │
│                            ▼                                    │
│                 ┌─────────────────────┐                         │
│                 │   REMOTE BACKEND    │                         │
│                 │   (S3 + DynamoDB)   │                         │
│                 │                     │                         │
│                 │  • Single source    │                         │
│                 │    of truth         │                         │
│                 │  • State locking    │                         │
│                 │  • Encryption       │                         │
│                 │  • Versioning       │                         │
│                 └──────────┬──────────┘                         │
│                            ▼                                    │
│                 ┌─────────────────────┐                         │
│                 │    AWS Resources    │                         │
│                 └─────────────────────┘                         │
│                                                                 │
│   BENEFITS:                                                     │
│   ✅ Team collaboration with single state                       │
│   ✅ State locking prevents concurrent modifications            │
│   ✅ Automatic backup and versioning                            │
│   ✅ Encryption at rest and in transit                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
## Backend Configuration
### S3 Backend (AWS)
```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # Optional: assume a role for cross-account access.
    # Terraform 1.6+ prefers an assume_role { role_arn = "..." } block.
    role_arn = "arn:aws:iam::123456789012:role/TerraformStateAccess"
  }
}
```
### GCS Backend (GCP)
```hcl
terraform {
  backend "gcs" {
    bucket = "company-terraform-state"
    prefix = "prod/vpc"
  }
}
```
### Backend Infrastructure Setup
```hcl
# state-backend/main.tf
# Run this ONCE to create backend infrastructure
resource "aws_s3_bucket" "terraform_state" {
  bucket = "company-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_kms_key" "terraform_state" {
  description             = "KMS key for Terraform state encryption"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}
```
## State Locking
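### How Locking Works
With the S3 backend, Terraform writes an item to the DynamoDB lock table before any operation that modifies state and deletes it when the operation finishes. A second run against the same state fails to acquire the lock until the first one releases it.
### Lock Table Entry
```json
{
  "LockID": "company-terraform-state/prod/vpc/terraform.tfstate",
  "Info": {
    "ID": "abc123-def456",
    "Operation": "OperationTypeApply",
    "Who": "engineer@company.com",
    "Version": "1.6.0",
    "Created": "2024-01-15T10:30:00Z",
    "Path": "prod/vpc/terraform.tfstate"
  }
}
```
### Force Unlock (Emergency Only)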
```bash
# Only use when lock is stuck (e.g., CI/CD crashed)
terraform force-unlock LOCK_ID

# Example
terraform force-unlock abc123-def456-ghi789
```
::: warning ⚠️ Force Unlock Warning
Use force-unlock only when you are certain that no process is still running. An incorrect force unlock can corrupt the state.
:::
## State Operations
### State Commands
```bash
# List all resources in state
terraform state list

# Show specific resource
terraform state show aws_vpc.main

# Move resource (rename)
terraform state mv aws_vpc.main aws_vpc.primary

# Remove resource from state (doesn't destroy)
terraform state rm aws_vpc.main

# Import existing resource
terraform import aws_vpc.main vpc-0123456789abcdef0

# Pull remote state to local
terraform state pull > state.json

# Push local state to remote (dangerous!)
terraform state push state.json
```
### State Migration
```hcl
# Old backend
terraform {
  backend "local" {}
}

# New backend (replaces the block above; a configuration declares only one backend)
terraform {
  backend "s3" {
    bucket = "new-state-bucket"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }
}
```
```bash
# Migrate state
terraform init -migrate-state
```
## State File Security
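### Sensitive Data in State
::: warning ⚠️ State Contains Secrets
The state file stores every attribute of every resource, including passwords, API keys, and other sensitive data. You MUST encrypt it and restrict access to it.
:::
```hcl
# Passwords end up in state!
resource "aws_db_instance" "main" {
  password = var.db_password # This is stored in state
}

# Alternative to the block above (use one or the other): read the value from
# Secrets Manager. This keeps the password out of variables and tfvars files,
# but the resolved value is still written to state, so encryption and access
# control remain mandatory.
resource "aws_db_instance" "main" {
  password = aws_secretsmanager_secret_version.db.secret_string
}
```
### State Access Control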
```hcl
# S3 bucket policy - restrict access
resource "aws_s3_bucket_policy" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Rejects any upload that is not SSE-KMS. The backend must therefore
        # set kms_key_id: with encrypt = true alone, the S3 backend by default
        # uploads with AES256, which this statement would deny.
        Sid       = "DenyUnencryptedUploads"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:PutObject"
        Resource  = "${aws_s3_bucket.terraform_state.arn}/*"
        Condition = {
          StringNotEquals = {
            "s3:x-amz-server-side-encryption" = "aws:kms"
          }
        }
      },
      {
        # Rejects every request that does not use HTTPS.
        Sid       = "DenyInsecureTransport"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          aws_s3_bucket.terraform_state.arn,
          "${aws_s3_bucket.terraform_state.arn}/*"
        ]
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      }
    ]
  })
}
```
## State Organization Patterns
### Per-Environment State
```
states/
├── dev/terraform.tfstate
├── staging/terraform.tfstate
└── prod/terraform.tfstate
```
### Per-Component State
```
states/
├── networking/terraform.tfstate
├── compute/terraform.tfstate
├── database/terraform.tfstate
└── monitoring/terraform.tfstate
```
### Hybrid Approach (Recommended)
```
states/
├── global/
│   ├── iam/terraform.tfstate
│   └── dns/terraform.tfstate
├── dev/
│   ├── vpc/terraform.tfstate
│   ├── eks/terraform.tfstate
│   └── rds/terraform.tfstate
├── staging/
│   └── ...
└── prod/
    └── ...
```
### Remote State Data Source
```hcl
# Read outputs from another state
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use outputs
resource "aws_instance" "web" {
  # (resource body truncated in the source)
}
```
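
As a minimal sketch of how the truncated example might continue: the VPC stack has to expose the value as a root-module output, and the consuming stack then reads it through `outputs`. The `vpc_id` output and `aws_vpc.main` match the sample state file earlier on this page; the security group and its name are hypothetical stand-ins, not the original author's resource.

```hcl
# --- In the VPC stack (state key: prod/vpc/terraform.tfstate) ---
# Only root-module outputs are visible through terraform_remote_state.
output "vpc_id" {
  value = aws_vpc.main.id
}

# --- In the consuming stack ---
# Reads the output via the data "terraform_remote_state" "vpc" block shown above.
resource "aws_security_group" "web" {
  name   = "web" # hypothetical name
  vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}
```

Anyone who can read the remote state can read all of it, not only its outputs, so the reading stack needs the same care around access as the writing one.
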
## Best Practices Checklist
- [ ] Use remote backend (S3/GCS)
- [ ] Enable state locking
- [ ] Encrypt state at rest
- [ ] Restrict state access via IAM
- [ ] Enable versioning for recovery
- [ ] Split state by component/environment
- [ ] Avoid sensitive data in state when possible
- [ ] Regular state backup verification
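
For the "avoid sensitive data in state" item, one option on AWS is to let RDS own the master password so that Terraform never sees it. The sketch below assumes a recent AWS provider that supports `manage_master_user_password` (4.61 or later, as far as the provider changelog indicates); the identifier, engine, sizing, and username are hypothetical placeholders.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "app-db" # hypothetical
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app_admin" # hypothetical

  # RDS generates the master password and stores it in Secrets Manager.
  # Terraform never receives the value, so it is not written to state.
  manage_master_user_password = true
}

# Only the ARN of the secret lands in state, not the password itself.
output "db_master_secret_arn" {
  value = aws_db_instance.main.master_user_secret[0].secret_arn
}
```

Applications then fetch the password from Secrets Manager at runtime using the ARN, which keeps the credential out of both the configuration and the state file.
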
## ⚖️ Trade-offs
### Trade-off 1: Single State vs Split State
| Approach | Blast Radius | Complexity | Plan Time |
|----------|--------------|------------|------------|
| **Single state** | Very high | Low | Slow |
| **Per-env** | Medium | Medium | Fast |
| **Per-component** | Low | High | Fast |
| **Hybrid** | Low | Highest | Varies |
**Recommendation**: Hybrid (global state plus per-environment, per-component state) for enterprise setups.
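
In practice, the hybrid layout means one root module per leaf of the directory tree, each with its own backend key. A minimal sketch, assuming the bucket, region, and lock table from the backend examples earlier on this page and a key that mirrors the directory path:

```hcl
# prod/vpc/backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # The key mirrors the directory path, so each stack gets its own state:
    #   global/iam -> "global/iam/terraform.tfstate"
    #   dev/vpc    -> "dev/vpc/terraform.tfstate"
    #   prod/vpc   -> "prod/vpc/terraform.tfstate" (this stack)
    key = "prod/vpc/terraform.tfstate"
  }
}
```

Because the lock ID is derived from bucket and key, stacks with different keys lock independently: an apply in `prod/vpc` does not block one in `dev/vpc`.
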
---
### Trade-off 2: Backend Options
| Backend | Locking | Encryption | Cost |
|---------|---------|------------|------|
| **S3 + DynamoDB** | DynamoDB | KMS | Low |
| **GCS** | Built-in | Google-managed | Low |
| **Terraform Cloud** | Built-in | Managed | $$$ |
| **Consul** | Built-in | Manual | Ops overhead |
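
The table lists DynamoDB as the lock mechanism for S3. As a hedged aside, recent Terraform releases (1.10 and later, as far as the S3 backend documentation describes) can also lock through S3 itself with the `use_lockfile` argument, which removes the DynamoDB table from the setup. A minimal sketch reusing the bucket and key from earlier on this page; verify the argument against the release notes for the Terraform version you run:

```hcl
terraform {
  required_version = ">= 1.10" # assumption: first release with use_lockfile

  backend "s3" {
    bucket  = "company-terraform-state"
    key     = "prod/vpc/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # Lock by writing a .tflock object next to the state file
    # instead of an item in a DynamoDB table.
    use_lockfile = true
  }
}
```

With this variant, the identity running Terraform needs permission to create and delete the lock object in the bucket, even for plan-only runs.
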
---
### Trade-off 3: State Access Model
| Model | Security | Convenience |
|-------|----------|-------------|
| **Strict IAM** | High | Low |
| **Role-based** | High | Medium |
| **Open access** | Low | High |
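
For the strict IAM and role-based rows, a least-privilege policy might look like the sketch below. It grants access to a single state key plus the lock table, using the bucket, key, table, and account ID that appear in earlier examples; the policy name is hypothetical. For the SSE-KMS bucket shown earlier, `kms:Encrypt`, `kms:Decrypt`, and `kms:GenerateDataKey` on the state key would also be needed.

```hcl
data "aws_iam_policy_document" "state_access" {
  # Terraform lists the bucket to find the state object.
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::company-terraform-state"]
  }

  # Read and write exactly one state file, not the whole bucket.
  statement {
    sid       = "ReadWriteOneStateFile"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::company-terraform-state/prod/vpc/terraform.tfstate"]
  }

  # Acquire and release locks in the DynamoDB table.
  statement {
    sid = "UseLockTable"
    actions = [
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = ["arn:aws:dynamodb:us-east-1:123456789012:table/terraform-state-lock"]
  }
}

resource "aws_iam_policy" "state_access" {
  name   = "terraform-state-prod-vpc" # hypothetical name
  policy = data.aws_iam_policy_document.state_access.json
}
```

Attaching one such policy per stack to the role that applies it is what makes the role-based model workable: broad convenience for the pipeline, narrow blast radius per state file.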
## 🚨 Failure Modes
### Failure Mode 1: State Corruption
::: danger 🔥 Real-world incident
*Two engineers ran apply at the same time with no locking. The state ended up with partial updates, and Terraform no longer knew which resources existed. Manual reconciliation took three days.*
:::
| How to detect | How to prevent |
|----------------|------------------|
| Plan shows unexpected changes | Enable state locking |
| Resource conflicts | CI/CD only apply |
| State serial mismatch | Versioning enabled |
---
### Failure Mode 2: State Lock Stuck
| How to detect | How to prevent |
|----------------|------------------|
| "Error acquiring lock" | Timeout policies |
| Blocked applies | CI/CD proper cleanup |
| DynamoDB item stuck | Monitoring + alerts |
**Recovery**: `terraform force-unlock LOCK_ID` (after verifying that no process is still running)
---
### Failure Mode 3: State Loss
| How to detect | How to prevent |
|----------------|------------------|
| Empty state | S3 versioning |
| Missing resources | Regular backups |
| Init failures | Cross-region replication |
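
S3 versioning keeps every revision of the state file, which is what makes recovery possible, but the version history grows without bound unless a lifecycle rule trims it. A minimal sketch, assuming the `aws_s3_bucket.terraform_state` and `aws_s3_bucket_versioning.terraform_state` resources defined earlier; the 90-day window and the 20 retained versions are illustrative numbers, not recommendations from the source.

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  # Lifecycle rules for noncurrent versions only make sense once versioning is on.
  depends_on = [aws_s3_bucket_versioning.terraform_state]

  rule {
    id     = "trim-old-state-versions"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket.
    filter {}

    noncurrent_version_expiration {
      # Always keep the 20 most recent noncurrent versions; older ones
      # expire after they have been noncurrent for 90 days.
      newer_noncurrent_versions = 20
      noncurrent_days           = 90
    }
  }
}
```

The current version is never touched by this rule, so the live state is unaffected; only the depth of the recovery history is bounded.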
## 🔐 Security Baseline
### State Security Requirements
| Requirement | Implementation | Verification |
|-------------|----------------|---------------|
| **Encryption at rest** | KMS/Google-managed | Bucket config |
| **Encryption in transit** | HTTPS only | Bucket policy |
| **Access restricted** | IAM/IAP | Access audit |
| **No public access** | Block public access | Security scan |
| **Versioning** | Enabled | Bucket config |
### State Security Checklist
| Item | Status |
|------|--------|
| S3/GCS bucket encrypted | ☑ Required |
| Versioning enabled | ☑ Required |
| Public access blocked | ☑ Required |
| IAM access restricted | ☑ Required |
| HTTPS enforced | ☑ Required |
| Audit logging enabled | ☑ Required |
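
For the audit logging item, one simple option is S3 server access logging on the state bucket. The sketch below assumes the `aws_s3_bucket.terraform_state` resource from earlier and an already existing log bucket that permits S3 log delivery; the variable and prefix are hypothetical. CloudTrail data events, which the metrics table below points to, are an alternative source.

```hcl
variable "access_log_bucket" {
  description = "Name of an existing bucket that receives S3 server access logs (hypothetical input)"
  type        = string
}

# Record every request made against the state bucket.
resource "aws_s3_bucket_logging" "terraform_state" {
  bucket        = aws_s3_bucket.terraform_state.id
  target_bucket = var.access_log_bucket
  target_prefix = "terraform-state/"
}
```

Keeping the logs in a separate bucket, ideally in a separate account, means that someone with write access to state cannot also erase the record of having used it.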
## 📊 Ops Readiness
### Metrics to monitor
| Metric | Source | Alert Threshold |
|--------|--------|-----------------|
| Lock acquisition time | DynamoDB | > 30s |
| State file size | S3/GCS | > 50MB |
| State version count | S3 | Delta > 100/day |
| Lock stuck duration | DynamoDB | > 1 hour |
| State access errors | CloudTrail | Any |
### Runbook Entry Points
| Situation | Runbook |
|------------|---------|
| State locked | `runbook/terraform-state-lock.md` |
| State corrupted | `runbook/terraform-state-recovery.md` |
| State lost | `runbook/terraform-state-restore.md` |
| Migration needed | `runbook/terraform-state-migration.md` |
| Backend access denied | `runbook/terraform-backend-access.md` |
## ✅ Design Review Checklist
### Backend
- [ ] Remote backend configured
- [ ] Locking enabled
- [ ] Encryption at rest
- [ ] Access restricted
### Organization
- [ ] State split appropriately
- [ ] Naming convention consistent
- [ ] Path structure logical
- [ ] Cross-state references documented
### Security
- [ ] No secrets in code
- [ ] State access audited
- [ ] Versioning enabled
- [ ] Backup strategy in place
### Operations
- [ ] Lock monitoring
- [ ] State size monitoring
- [ ] Recovery runbooks
- [ ] Team trained on state ops
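
When training a team on state operations, it can help to show the declarative counterparts of the `mv`, `import`, and `rm` commands covered earlier, since they go through code review and `terraform plan` instead of editing state by hand. A sketch under stated assumptions: the version requirements in the comments reflect the Terraform documentation as far as I can tell, and `aws_vpc.legacy` and `aws_subnet.old` are hypothetical addresses.

```hcl
# Rename without recreating (Terraform >= 1.1); counterpart of `terraform state mv`.
# A resource "aws_vpc" "primary" block must exist in the configuration.
moved {
  from = aws_vpc.main
  to   = aws_vpc.primary
}

# Adopt an existing resource (Terraform >= 1.5); counterpart of `terraform import`.
import {
  to = aws_vpc.legacy
  id = "vpc-0123456789abcdef0"
}

resource "aws_vpc" "legacy" {
  cidr_block = "10.0.0.0/16"
}

# Stop managing without destroying (Terraform >= 1.7); counterpart of `terraform state rm`.
# The resource "aws_subnet" "old" block itself must be deleted from the configuration.
removed {
  from = aws_subnet.old

  lifecycle {
    destroy = false
  }
}
```

Each block shows up in the plan before anything changes, so a mistaken address is caught in review instead of after the state has already been modified.
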
## 📎 Links
- 📎 [IaC Fundamentals](./fundamentals) - Terraform basics and workflow
- 📎 [Drift Management](/terraform/core/drift) - Handling state drift
- 📎 [Security for IaC](/terraform/security/security) - Securing state and secrets
- 📎 [AWS S3](/aws/core/storage) - S3 backend best practices
- 📎 [GCP Cloud Storage](/gcp/data/platforms) - GCS backend patterns
- 📎 [AWS KMS](/aws/security/secrets) - State encryption with KMS