
💾 Drift Management

Level: Core
Solves: Detecting and handling configuration drift between Terraform state and actual infrastructure

🎯 Outcomes

After applying the material on this page, you will be able to:

  • Detect drift with terraform plan and refresh
  • Handle drift with an appropriate strategy
  • Import resources into Terraform management
  • Prevent drift with policies and automation
  • Set up scheduled drift detection
  • Use lifecycle ignore_changes correctly

When to use

| Strategy | Use Case | Reason |
| --- | --- | --- |
| terraform plan | Any change | Detect differences |
| terraform import | Existing resources | Bring under TF control |
| ignore_changes | Auto-managed attrs | Prevent unwanted changes |
| Scheduled detection | Production | Catch manual changes |

When NOT to use

| Pattern | Problem | Alternative |
| --- | --- | --- |
| ignore_changes = all | Lose control | Specific attributes |
| Manual refresh | State mismatch | Plan first |
| Force-unlock without checking | State corruption | Verify lock holder |
| Import without writing config | Next plan fails | Write config first, then import |

⚠️ Warning from Raizo

"Emergency fix in the console at 3 AM. Forgot to update Terraform. The next apply reverted the fix. Production went down a second time. Always document manual changes and update your IaC."

What is drift?

💡 Professor Tom

Drift occurs when reality does not match expectation. Someone clicks in the console, a script runs outside Terraform, or auto-scaling changes resources. Drift is inevitable; how you handle it is what matters.

Drift Scenarios

┌─────────────────────────────────────────────────────────────────┐
│                    CONFIGURATION DRIFT                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │  Terraform  │    │   State     │    │   Actual    │          │
│  │   Config    │    │   File      │    │   Infra     │          │
│  │             │    │             │    │             │          │
│  │  t3.micro   │    │  t3.micro   │    │  t3.large   │ ← DRIFT! │
│  │  port 443   │    │  port 443   │    │  port 443   │          │
│  │  3 replicas │    │  3 replicas │    │  5 replicas │ ← DRIFT! │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│                                                                 │
│  DRIFT CAUSES:                                                  │
│  • Manual console changes (emergency fixes)                     │
│  • Scripts running outside Terraform                            │
│  • Auto-scaling events                                          │
│  • AWS/GCP service updates                                      │
│  • Another team's Terraform (overlapping resources)             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Detecting Drift

terraform plan

bash
# Most common drift detection (plan refreshes state before diffing against config)
terraform plan

# Output reports the change made outside Terraform, for example:
# aws_instance.web has changed
#   ~ instance_type = "t3.micro" -> "t3.large"
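
For scripting, the plan's exit status is easier to act on than its text. The sketch below relies on `terraform plan -detailed-exitcode`, which exits 0 when there are no changes, 1 on error, and 2 when changes are present. The helper name `plan_status` and its three labels are hypothetical.

```shell
# plan_status: hypothetical helper that classifies the exit status of
# `terraform plan -detailed-exitcode` (0 = no changes, 1 = error, 2 = changes).
plan_status() {
  rc=0
  # `|| rc=$?` captures the status without tripping `set -e` in the caller.
  terraform plan -detailed-exitcode -input=false -no-color >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "in-sync" ;;
    2) echo "changes-pending" ;;
    *) echo "plan-error" ;;
  esac
}
```

Run in an initialized working directory, `plan_status` prints one of the three labels. Note that `changes-pending` covers unapplied config edits as well as drift, so it signals drift only when the config matches what was last applied.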

terraform refresh

bash
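# Update state to match reality (without changing infra)
# Deprecated: writes the refreshed state immediately, with no review step
terraform refresh

# Preferred: preview the state changes first, then accept them
terraform plan -refresh-only
terraform apply -refresh-only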

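⚠️ Refresh Warning

terraform refresh updates state to match reality and writes the result immediately, with no chance to review it. If someone deleted a resource outside Terraform, refresh removes it from state; Terraform then no longer tracks it, and the next plan proposes to create it again only if it is still in the config.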
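
Because a refresh can drop deleted resources from state without review, it can help to list them from a refresh-only plan before accepting it. This is a minimal sketch: the function name `deleted_outside_terraform` is hypothetical, and the `# <address> has been deleted` line format is an assumption about Terraform's human-readable plan output, which is not a stable interface.

```shell
# deleted_outside_terraform: hypothetical filter. Reads plan text on stdin and
# prints the address of every object reported as "has been deleted".
# ASSUMPTION: the wording matches the human-readable output of your Terraform
# version; verify it against a real plan before relying on it.
deleted_outside_terraform() {
  sed -n 's/^ *# \(.*\) has been deleted$/\1/p'
}

# Usage (not run here):
#   terraform plan -refresh-only -no-color | deleted_outside_terraform
```

An empty result suggests the refresh would only update attributes. Any listed address deserves a decision (re-import, recreate, or remove from config) before the state is rewritten.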

Drift Detection Workflow

Handling Drift

Option 1: Revert to Terraform (Most Common)

bash
# Terraform will change infrastructure back to match config
terraform apply

# Example output:
# aws_instance.web will be updated in-place
#   ~ instance_type = "t3.large" -> "t3.micro"

Option 2: Accept Reality (Update Config)

hcl
# If the manual change was intentional, update config
resource "aws_instance" "web" {
  instance_type = "t3.large"  # Changed from t3.micro
}

bash
# Now plan shows no changes
terraform plan
# No changes. Infrastructure is up-to-date.

Option 3: Ignore Specific Attributes

hcl
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type
  
  # Ignore changes to tags made by auto-tagging systems
  lifecycle {
    ignore_changes = [
      tags["LastModified"],
      tags["AutoScaleGroup"],
    ]
  }
}

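# Alternative lifecycle block (placed inside the resource): ignore all tag changes
lifecycle {
  ignore_changes = [tags]
}

# Alternative lifecycle block: ignore all changes (resource managed elsewhere).
# Terraform then never corrects drift on this resource, so use it sparingly.
lifecycle {
  ignore_changes = all
}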
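
Because `ignore_changes = all` hides every future drift on a resource, it may be worth flagging during review. Below is a minimal sketch of such a check. The function name `find_ignore_all` is hypothetical, and the pattern is a plain text match, so it also reports commented-out lines.

```shell
# find_ignore_all: hypothetical lint helper. Prints file:line:text for every
# `ignore_changes = all` found in *.tf files under the given directory
# (defaults to the current directory).
find_ignore_all() {
  # /dev/null as an extra operand makes grep print file names even when
  # find hands it a single file.
  find "${1:-.}" -type f -name '*.tf' \
    -exec grep -nE 'ignore_changes[[:space:]]*=[[:space:]]*all([^[:alnum:]_]|$)' /dev/null {} +
}
```

The function exits non-zero when nothing matches, which makes it usable as a gate, for example `if find_ignore_all environments/; then echo "review needed"; fi` (the directory name is illustrative).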

Importing Resources

When to Import

  • Existing infrastructure created before Terraform
  • Resources created manually during emergency
  • Migrating from another IaC tool
  • Taking over resources from another team

Import Workflow

┌─────────────────────────────────────────────────────────────────┐
│                    IMPORT WORKFLOW                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. IDENTIFY                                                    │
│     ┌─────────────────────────────────────────────────────┐     │
│     │  Find resource ID in cloud console                  │     │
│     │  Example: vpc-0123456789abcdef0                     │     │
│     └─────────────────────────────────────────────────────┘     │
│                           │                                     │
│                           ▼                                     │
│  2. WRITE CONFIG                                                │
│     ┌─────────────────────────────────────────────────────┐     │
│     │  resource "aws_vpc" "imported" {                    │     │
│     │    # Empty or minimal config                        │     │
│     │  }                                                  │     │
│     └─────────────────────────────────────────────────────┘     │
│                           │                                     │
│                           ▼                                     │
│  3. IMPORT                                                      │
│     ┌─────────────────────────────────────────────────────┐     │
│     │  terraform import aws_vpc.imported vpc-xxx          │     │
│     └─────────────────────────────────────────────────────┘     │
│                           │                                     │
│                           ▼                                     │
│  4. COMPLETE CONFIG                                             │
│     ┌─────────────────────────────────────────────────────┐     │
│     │  terraform show -no-color > imported.txt            │     │
│     │  Copy attributes to config                          │     │
│     │  Run terraform plan (should show no changes)        │     │
│     └─────────────────────────────────────────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Import Commands

bash
# Basic import
terraform import aws_vpc.main vpc-0123456789abcdef0

# Import with provider alias: set `provider = aws.eu` in the resource block;
# the -provider flag is deprecated
terraform import aws_vpc.eu_main vpc-xxx

# Import module resource
terraform import module.vpc.aws_vpc.main vpc-xxx

# Import with index (count/for_each)
terraform import 'aws_subnet.public[0]' subnet-xxx
terraform import 'aws_subnet.public["us-east-1a"]' subnet-xxx

Import Block (Terraform 1.5+)

hcl
# import.tf - Declarative import
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0"
}

import {
  to = aws_subnet.public[0]
  id = "subnet-abc123"
}

bash
# Generate config for import targets that have no resource block yet
# (the output file must not already exist)
terraform plan -generate-config-out=generated.tf
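
When many resources need importing, the import blocks can be generated rather than typed. The sketch below assumes an input of whitespace-separated `ADDRESS ID` pairs, one per line. The function name `gen_import_blocks` and the input format are hypothetical, and IDs containing double quotes are not escaped.

```shell
# gen_import_blocks: hypothetical generator. Reads "ADDRESS ID" pairs on stdin
# and prints one Terraform import block per pair.
gen_import_blocks() {
  while read -r addr id; do
    # Skip blank lines and lines starting with '#'.
    case "$addr" in ''|'#'*) continue ;; esac
    printf 'import {\n  to = %s\n  id = "%s"\n}\n\n' "$addr" "$id"
  done
}

# Usage (not run here):
#   gen_import_blocks < resources.txt > import.tf
```

Each input line must end with a newline, otherwise `read` skips the final pair. Review the generated file before planning, since a wrong ID imports the wrong object.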

State Manipulation

Moving Resources

bash
# Rename resource
terraform state mv aws_vpc.old aws_vpc.new

# Move to module
terraform state mv aws_vpc.main module.vpc.aws_vpc.main

# Move between modules
terraform state mv module.old.aws_vpc.main module.new.aws_vpc.main

Removing from State

bash
# Remove resource from state (doesn't destroy actual resource)
terraform state rm aws_vpc.main

# Use case: Resource will be managed by different Terraform config
# Use case: Resource was imported by mistake
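
Since `state mv` and `state rm` rewrite state directly, taking a snapshot first gives a way back. This is a minimal sketch using `terraform state pull`, which prints the current state to stdout. The function name `backup_state` and the file naming scheme are hypothetical.

```shell
# backup_state [DIR]: hypothetical helper. Saves the current state to a
# timestamped file in DIR (default: current directory) and prints its path.
backup_state() {
  backup_file="${1:-.}/terraform-$(date -u +%Y%m%dT%H%M%SZ).tfstate.backup"
  # Keep the file only if the pull succeeds; otherwise remove the partial file.
  if terraform state pull > "$backup_file"; then
    echo "$backup_file"
  else
    rm -f "$backup_file"
    return 1
  fi
}
```

A snapshot taken this way can be restored with `terraform state push`, which overwrites the remote state and is best treated as a last resort. State files may contain secrets, so store backups accordingly.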

Replacing Resources

bash
# Force recreation of resource
terraform apply -replace=aws_instance.web

# Old way (deprecated)
terraform taint aws_instance.web
terraform apply

Drift Prevention Strategies

1. Restrict Console Access

SCP to prevent manual changes:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyManualEC2Changes",
      "Effect": "Deny",
      "Action": [
        "ec2:ModifyInstanceAttribute",
        "ec2:TerminateInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/TerraformRole"
        }
      }
    }
  ]
}

2. Scheduled Drift Detection

yaml
# GitHub Actions - Daily drift check
name: Drift Detection
on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        
      - name: Terraform Init
        run: terraform init
        
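      - name: Check for Drift
        id: plan
        # Exit code 0 = no changes, 1 = error, 2 = changes detected (drift)
        run: terraform plan -detailed-exitcode -input=false
        continue-on-error: true
        
      - name: Notify on Drift
        # continue-on-error keeps the job green, so failure() would never fire;
        # check the exit code exposed by the setup-terraform wrapper instead
        if: steps.plan.outputs.exitcode == '2'
        run: |
          # Send Slack/email notification
          echo "Drift detected!"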
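
The workflow above leaves the notification step as a placeholder. One way to fill it is sketched below, assuming a Slack incoming webhook whose URL is supplied in a `SLACK_WEBHOOK_URL` environment variable. That variable name is hypothetical (typically set from a repository secret), as is the function name `notify_drift`.

```shell
# notify_drift [LABEL]: hypothetical helper. Posts a short drift message to a
# Slack incoming webhook. ASSUMPTION: SLACK_WEBHOOK_URL holds the webhook URL.
notify_drift() {
  if [ -z "${SLACK_WEBHOOK_URL:-}" ]; then
    echo "SLACK_WEBHOOK_URL must be set" >&2
    return 1
  fi
  # The label is embedded in JSON as-is, so keep it free of double quotes
  # and backslashes.
  message="Terraform drift detected in ${1:-unknown workspace}"
  curl --silent --show-error --fail \
    -X POST -H 'Content-Type: application/json' \
    --data "{\"text\": \"${message}\"}" \
    "$SLACK_WEBHOOK_URL"
}
```

In the workflow, the `run:` body of the notify step could then call `notify_drift "environments/prod/vpc"` (the label is illustrative) after sourcing a script that defines the function.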

3. Tagging for Terraform-Managed Resources

hcl
locals {
  terraform_tags = {
    ManagedBy     = "terraform"
    TerraformRepo = "github.com/company/infra"
    TerraformPath = "environments/prod/vpc"
  }
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  
  tags = merge(var.tags, local.terraform_tags)
}

4. Resource Locks

hcl
# Prevent accidental deletion
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  
  lifecycle {
    prevent_destroy = true
  }
}

Reconciliation Patterns

Pattern 1: Terraform Wins

bash
# Always revert to Terraform config
terraform apply -auto-approve

Pattern 2: Reality Wins

bash
# Update state to match reality
terraform apply -refresh-only -auto-approve

# Then update config to match
# (manual process)

Pattern 3: Selective Reconciliation

hcl
# Some attributes managed by Terraform
# Some attributes managed externally
resource "aws_autoscaling_group" "web" {
  min_size = 2
  max_size = 10
  
  lifecycle {
    ignore_changes = [
      desired_capacity,  # Managed by auto-scaling
    ]
  }
}

Troubleshooting Drift

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Resource not in state | Deleted outside TF | Re-import or recreate |
| Attribute mismatch | Manual change | Apply or update config |
| Provider version drift | API changes | Update provider version |
| State corruption | Concurrent access | Restore from backup |

Best Practices Checklist

  • [ ] Scheduled drift detection (daily)
  • [ ] Alert on drift detected
  • [ ] Document manual changes immediately
  • [ ] Use ignore_changes sparingly
  • [ ] Tag Terraform-managed resources
  • [ ] Restrict console access with SCPs
  • [ ] Import before recreate
  • [ ] State backup before major operations

⚖️ Trade-offs

Trade-off 1: Terraform Wins vs Reality Wins

| Approach | Consistency | Flexibility |
| --- | --- | --- |
| Terraform wins | High | Low |
| Reality wins | Low | High |
| Selective (ignore_changes) | Medium | Medium |

Recommendation: Terraform wins, with ignore_changes for attributes managed by auto-scaling.


Trade-off 2: Drift Detection Frequency

| Frequency | Detection Speed | CI Cost |
| --- | --- | --- |
| Every push | Instant | High |
| Hourly | < 1 hour | Medium |
| Daily | < 24 hours | Low |

Trade-off 3: Import vs Recreate

| Approach | Downtime | Complexity |
| --- | --- | --- |
| Import | Zero | High |
| Recreate | | Low |
| Move (mv) | Zero | Medium |

🚨 Failure Modes

Failure Mode 1: Accidental Revert

🔥 Real incident

Manual fix in the console at 3 AM. A Terraform apply at 9 AM reverted the fix. Production went down a second time.

| How to detect | How to prevent |
| --- | --- |
| User reports | Document manual changes |
| Plan shows changes | Update TF before apply |
| Monitoring alerts | Review plan carefully |

Failure Mode 2: Import Without Config

| How to detect | How to prevent |
| --- | --- |
| Plan shows changes | Write config first |
| Missing attributes | Use terraform show |
| Next apply modifies | Plan verification |

Failure Mode 3: State/Reality Mismatch

| How to detect | How to prevent |
| --- | --- |
| Resources not found | Regular drift check |
| Unexpected changes | Lock state properly |
| Apply errors | State backup |

🔐 Security Baseline

Drift Prevention Security

| Requirement | Implementation | Verification |
| --- | --- | --- |
| Restrict console | SCPs, IAM | Policy audit |
| Audit trail | CloudTrail, Audit logs | Log review |
| Terraform-only changes | CI/CD enforcement | Access review |
| Tag managed resources | Standard tags | Compliance scan |

Access Control

| Action | Who Can Do |
| --- | --- |
| Console read | Engineers |
| Console write | Emergency only |
| Terraform apply | CI/CD |
| State access | Terraform role only |

📊 Ops Readiness

Metrics to Monitor

| Metric | Source | Alert Threshold |
| --- | --- | --- |
| Drift detected | CI/CD plan | Any |
| Manual console changes | CloudTrail | Any |
| Import count | CI logs | Tracking |
| Time since last sync | CI metadata | > 24 hours |
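
The `Time since last sync` threshold can be checked with plain epoch arithmetic. This is a minimal sketch, assuming the CI job records the Unix timestamp of its last successful drift check. The function name `sync_is_stale` is hypothetical, and the 24-hour default mirrors the threshold in the table.

```shell
# sync_is_stale LAST_SYNC_EPOCH [NOW_EPOCH] [MAX_AGE_SECONDS]
# Hypothetical helper: returns 0 (stale) when more than MAX_AGE_SECONDS have
# passed since the last successful drift check, 1 otherwise.
sync_is_stale() {
  last_sync=$1
  now=${2:-$(date +%s)}
  max_age=${3:-86400}   # 24 hours
  [ $(( now - last_sync )) -gt "$max_age" ]
}
```

A monitoring job might call `sync_is_stale "$LAST_SYNC"` and raise an alert on success, where `LAST_SYNC` is whatever timestamp your CI metadata exposes.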

Runbook Entry Points

| Situation | Runbook |
| --- | --- |
| Drift detected | runbook/terraform-drift-resolution.md |
| Manual change needed | runbook/emergency-manual-change.md |
| Import resource | runbook/terraform-import.md |
| Accidental revert | runbook/revert-recovery.md |
| State mismatch | runbook/terraform-state-reconcile.md |

Design Review Checklist

Detection

  • [ ] Scheduled drift detection
  • [ ] Alert on drift
  • [ ] Plan before apply
  • [ ] Manual change tracking

Prevention

  • [ ] SCPs restricting console
  • [ ] Terraform tags
  • [ ] CI/CD only apply
  • [ ] Audit logging

Handling

  • [ ] Documented reconciliation process
  • [ ] ignore_changes for auto-managed
  • [ ] Import workflow
  • [ ] Rollback plan

Operations

  • [ ] Runbooks documented
  • [ ] Team trained
  • [ ] Emergency procedures
  • [ ] State backup

📎 Links