💾 Drift Management
Level: Core
Solves: Detecting and handling configuration drift between Terraform state and actual infrastructure
🎯 Outcomes
After working through this page, you will be able to:
- Detect drift with terraform plan and refresh
- Handle drift with the appropriate strategy
- Import resources into Terraform management
- Prevent drift with policies and automation
- Set up scheduled drift detection
- Use lifecycle ignore_changes correctly
✅ When to use
| Strategy | Use Case | Reason |
|---|---|---|
| terraform plan | Bất kỳ change | Detect differences |
| terraform import | Existing resources | Bring under TF control |
| ignore_changes | Auto-managed attrs | Prevent unwanted changes |
| Scheduled detection | Production | Catch manual changes |
❌ When NOT to use
| Pattern | Problem | Alternative |
|---|---|---|
| ignore_changes = all | Lose control | Specific attributes |
| Manual refresh | State mismatch | Plan first |
| Force-unlock without checking | State corruption | Verify lock holder |
| Import without writing config | Next plan fails | Config first, import second |
⚠️ Warning from Raizo
"An emergency fix in the console at 3 AM. Nobody updated Terraform. The next apply reverted the fix, and production went down a second time. Always document manual changes and update your IaC."
What is drift?
💡 Professor Tom
Drift happens when reality does not match expectation: someone clicks in the console, a script runs outside Terraform, or auto-scaling changes resources. Drift is inevitable; what matters is how you handle it.
Drift Scenarios
┌─────────────────────────────────────────────────────────────────┐
│ CONFIGURATION DRIFT │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Terraform │ │ State │ │ Actual │ │
│ │ Config │ │ File │ │ Infra │ │
│ │ │ │ │ │ │ │
│ │ t3.micro │ │ t3.micro │ │ t3.large │ ← DRIFT! │
│ │ port 443 │ │ port 443 │ │ port 443 │ │
│ │ 3 replicas │ │ 3 replicas │ │ 5 replicas │ ← DRIFT! │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ DRIFT CAUSES: │
│ • Manual console changes (emergency fixes) │
│ • Scripts running outside Terraform │
│ • Auto-scaling events │
│ • AWS/GCP service updates │
│ • Another team's Terraform (overlapping resources) │
│ │
└─────────────────────────────────────────────────────────────────┘
Detecting Drift
terraform plan
```bash
# Most common drift detection
terraform plan
# Output shows drift
# aws_instance.web:
#   ~ instance_type = "t3.micro" -> "t3.large"  # Changed outside Terraform
```
terraform refresh
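```bash
# Update state to match reality (without changing infra).
# terraform refresh is deprecated since Terraform v0.15.4 in favor of -refresh-only.
terraform refresh
# Preview what a refresh would change in state, without writing anything
terraform plan -refresh-only
# Review, then write the refreshed values to state
terraform apply -refresh-only
```
⚠️ Refresh Warning
terraform refresh updates state to match reality without asking for review. If someone deletes a resource outside Terraform, refresh removes it from state, and the next plan proposes creating it again as if it had never existed. Prefer terraform plan -refresh-only so the state changes are visible before they are written.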
Drift Detection Workflow
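One possible shape for the detection step, sketched as a small shell function. This is a minimal sketch, not a definitive implementation: the name check_drift is hypothetical, and the mapping relies on the documented meaning of terraform plan -detailed-exitcode (0 = no changes, 1 = error, 2 = changes present).

```shell
#!/usr/bin/env bash
# check_drift is a hypothetical helper name, not part of Terraform.
# It runs a plan in the current directory and maps the documented
# -detailed-exitcode values to a one-word status on stdout.
check_drift() {
  local rc=0
  # -input=false stops Terraform from prompting when run in automation
  terraform plan -detailed-exitcode -input=false -no-color >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "in-sync" ;;  # plan is empty
    2) echo "drift" ;;    # plan contains changes
    *) echo "error" ;;    # the plan itself failed
  esac
  return "$rc"
}
```

A caller could branch on the status, for example `status=$(check_drift) || true`. Note that exit code 2 only says the plan is non-empty: config changes that were merged but not yet applied report the same way as drift, so this check is most meaningful on a branch that has already been applied.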
Handling Drift
Option 1: Revert to Terraform (Most Common)
```bash
# Terraform will change infrastructure back to match config
terraform apply
# Example output:
# aws_instance.web will be updated in-place
#   ~ instance_type = "t3.large" -> "t3.micro"
```
Option 2: Accept Reality (Update Config)
```hcl
# If the manual change was intentional, update config
resource "aws_instance" "web" {
  instance_type = "t3.large" # Changed from t3.micro
}
```
```bash
# Now plan shows no changes
terraform plan
# No changes. Infrastructure is up-to-date.
```
Option 3: Ignore Specific Attributes
```hcl
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type
  # Ignore changes to tags made by auto-tagging systems
  lifecycle {
    ignore_changes = [
      tags["LastModified"],
      tags["AutoScaleGroup"],
    ]
  }
}
# The two lifecycle variants below are fragments: each belongs inside a resource block
# Ignore all tag changes
lifecycle {
  ignore_changes = [tags]
}
# Ignore all changes (resource managed elsewhere)
lifecycle {
  ignore_changes = all
}
```
Importing Resources
When to Import
- Existing infrastructure created before Terraform
- Resources created manually during emergency
- Migrating from another IaC tool
- Taking over resources from another team
Import Workflow
┌─────────────────────────────────────────────────────────────────┐
│ IMPORT WORKFLOW │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. IDENTIFY │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Find resource ID in cloud console │ │
│ │ Example: vpc-0123456789abcdef0 │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 2. WRITE CONFIG │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ resource "aws_vpc" "imported" { │ │
│ │ # Empty or minimal config │ │
│ │ } │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 3. IMPORT │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ terraform import aws_vpc.imported vpc-xxx │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 4. COMPLETE CONFIG │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ terraform show -no-color > imported.txt │ │
│ │ Copy attributes to config │ │
│ │ Run terraform plan (should show no changes) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Import Commands
```bash
# Basic import
terraform import aws_vpc.main vpc-0123456789abcdef0
# Import with provider alias: set provider = aws.eu in the resource block;
# import then uses that provider configuration (the -provider flag is deprecated)
terraform import aws_vpc.eu_main vpc-xxx
# Import module resource
terraform import module.vpc.aws_vpc.main vpc-xxx
# Import with index (count/for_each); quote the address so the shell leaves the brackets alone
terraform import 'aws_subnet.public[0]' subnet-xxx
terraform import 'aws_subnet.public["us-east-1a"]' subnet-xxx
```
Import Block (Terraform 1.5+)
```hcl
# import.tf - Declarative import
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0"
}
import {
  to = aws_subnet.public[0]
  id = "subnet-abc123"
}
```
```bash
# Generate config automatically for import targets that have no resource block yet
terraform plan -generate-config-out=generated.tf
```
State Manipulation
Moving Resources
```bash
# Rename resource
terraform state mv aws_vpc.old aws_vpc.new
# Move to module
terraform state mv aws_vpc.main module.vpc.aws_vpc.main
# Move between modules
terraform state mv module.old.aws_vpc.main module.new.aws_vpc.main
```
Removing from State
```bash
# Remove resource from state (doesn't destroy actual resource)
terraform state rm aws_vpc.main
# Use case: Resource will be managed by different Terraform config
# Use case: Resource was imported by mistake
```
Replacing Resources
```bash
# Force recreation of resource
terraform apply -replace=aws_instance.web
# Old way (deprecated)
terraform taint aws_instance.web
terraform apply
```
Drift Prevention Strategies
1. Restrict Console Access
SCP to prevent manual changes (the policy document is JSON, so it cannot carry inline comments):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyManualEC2Changes",
      "Effect": "Deny",
      "Action": [
        "ec2:ModifyInstanceAttribute",
        "ec2:TerminateInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/TerraformRole"
        }
      }
    }
  ]
}
```
2. Scheduled Drift Detection
```yaml
# GitHub Actions - Daily drift check
name: Drift Detection
on:
  schedule:
    - cron: '0 6 * * *' # Daily at 06:00 UTC
jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      - name: Terraform Init
        run: terraform init
      - name: Check for Drift
        id: plan
        run: terraform plan -detailed-exitcode -input=false
        # Exit code 2 = changes detected (drift)
        continue-on-error: true
      - name: Notify on Drift
        # failure() stays false after a continue-on-error step, so test the exit code;
        # the exitcode output comes from the setup-terraform wrapper (enabled by default)
        if: steps.plan.outputs.exitcode == '2'
        run: |
          # Send Slack/email notification
          echo "Drift detected!"
```
3. Tagging for Terraform-Managed Resources
```hcl
locals {
  terraform_tags = {
    ManagedBy     = "terraform"
    TerraformRepo = "github.com/company/infra"
    TerraformPath = "environments/prod/vpc"
  }
}
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags       = merge(var.tags, local.terraform_tags)
}
```
4. Resource Locks
```hcl
# Prevent accidental deletion
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  lifecycle {
    prevent_destroy = true
  }
}
```
Reconciliation Patterns
Pattern 1: Terraform Wins
```bash
# Always revert to Terraform config
terraform apply -auto-approve
```
Pattern 2: Reality Wins
```bash
# Update state to match reality
terraform apply -refresh-only -auto-approve
# Then update config to match
# (manual process)
```
Pattern 3: Selective Reconciliation
```hcl
# Some attributes managed by Terraform
# Some attributes managed externally
resource "aws_autoscaling_group" "web" {
  min_size = 2
  max_size = 10
  lifecycle {
    ignore_changes = [
      desired_capacity, # Managed by auto-scaling
    ]
  }
}
```
Troubleshooting Drift
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Resource not in state | Deleted outside TF | Re-import or recreate |
| Attribute mismatch | Manual change | Apply or update config |
| Provider version drift | API changes | Update provider version |
| State corruption | Concurrent access | Restore from backup |
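For the "Resource not in state" row, it can help to confirm whether Terraform tracks an address before choosing between re-import and recreate. The sketch below assumes a hypothetical helper named in_state; it relies only on terraform state list printing one resource address per line.

```shell
# in_state is a hypothetical helper: it succeeds if the exact resource
# address is tracked in the current state.
in_state() {
  local address="$1"
  # -F treats the address as a literal string (brackets and dots are not regex),
  # -x requires a whole-line match, -q suppresses output
  terraform state list | grep -Fxq -- "$address"
}
```

Example use: `in_state 'aws_subnet.public[0]' || echo "not tracked: import or recreate"`.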
Best Practices Checklist
- [ ] Scheduled drift detection (daily)
- [ ] Alert on drift detected
- [ ] Document manual changes immediately
- [ ] Use ignore_changes sparingly
- [ ] Tag Terraform-managed resources
- [ ] Restrict console access with SCPs
- [ ] Import before recreate
- [ ] State backup before major operations
⚖️ Trade-offs
Trade-off 1: Terraform Wins vs Reality Wins
| Approach | Consistency | Flexibility |
|---|---|---|
| Terraform wins | High | Low |
| Reality wins | Low | High |
| Selective (ignore_changes) | Medium | Medium |
Recommendation: Terraform wins, with ignore_changes for attributes managed by auto-scaling.
Trade-off 2: Drift Detection Frequency
| Frequency | Detection Speed | CI Cost |
|---|---|---|
| Every push | Instant | High |
| Hourly | < 1 hour | Medium |
| Daily | < 24 hours | Low |
Trade-off 3: Import vs Recreate
| Approach | Downtime | Complexity |
|---|---|---|
| Import | Zero | High |
| Recreate | Yes | Low |
| Move (mv) | Zero | Medium |
🚨 Failure Modes
Failure Mode 1: Accidental Revert
🔥 Real incident
A manual fix was made in the console at 3 AM. A terraform apply at 9 AM reverted it, and production went down a second time.
| How to detect | How to prevent |
|---|---|
| User reports | Document manual changes |
| Plan shows changes | Update TF before apply |
| Monitoring alerts | Review plan carefully |
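One way to make "review plan carefully" enforceable is to apply only a saved plan file, so that what was reviewed is exactly what gets applied. The helper names plan_for_review and apply_reviewed below are hypothetical; the sketch assumes the standard terraform plan -out, terraform show, and terraform apply PLANFILE commands.

```shell
# plan_for_review and apply_reviewed are hypothetical helper names.
# Step 1: write the plan to a file and print it for a human to review.
plan_for_review() {
  local plan_file="${1:-tfplan}"
  terraform plan -input=false -out="$plan_file" &&
    terraform show -no-color "$plan_file"
}

# Step 2: apply only the saved plan file. Terraform rejects a saved plan
# as stale if the state changed after the plan was created.
apply_reviewed() {
  local plan_file="${1:-tfplan}"
  if [ ! -f "$plan_file" ]; then
    echo "no saved plan at $plan_file; run plan_for_review first" >&2
    return 1
  fi
  terraform apply -input=false "$plan_file"
}
```

With this split, a revert of an undocumented 3 AM fix would at least appear in the reviewed plan output before anything changes.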
Failure Mode 2: Import Without Config
| How to detect | How to prevent |
|---|---|
| Plan shows changes | Write config first |
| Missing attributes | Use terraform show |
| Next apply modifies | Plan verification |
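The "plan verification" step can be sketched as an import followed by a plan whose exit code is checked. The helper name import_and_verify is hypothetical, and the sketch assumes the matching resource block already exists in the configuration.

```shell
# import_and_verify is a hypothetical helper. It assumes the matching
# resource block has already been written in the configuration.
import_and_verify() {
  local address="$1" id="$2" rc=0
  # Send import output to stderr so stdout carries only the status word
  terraform import "$address" "$id" >&2 || return 1
  # A clean plan (exit code 0) means config and imported state agree
  terraform plan -detailed-exitcode -input=false >/dev/null || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "clean"
  elif [ "$rc" -eq 2 ]; then
    echo "config-mismatch"  # config still differs from the real resource
  else
    echo "plan-error"
  fi
  return "$rc"
}
```

The plan covers the whole configuration, so unrelated pending changes also report config-mismatch; running this on an otherwise clean workspace keeps the result meaningful.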
Failure Mode 3: State/Reality Mismatch
| How to detect | How to prevent |
|---|---|
| Resources not found | Regular drift check |
| Unexpected changes | Lock state properly |
| Apply errors | State backup |
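A state backup before risky operations (state mv, state rm, import) could look like the following. The helper name backup_state and the default directory state-backups are assumptions for illustration; terraform state pull itself prints the current state from the configured backend to stdout.

```shell
# backup_state is a hypothetical helper. It writes the current state,
# as returned by the configured backend, to a timestamped local file.
backup_state() {
  local dir="${1:-state-backups}"
  local file
  mkdir -p "$dir" || return 1
  file="$dir/terraform-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
  # Keep the file only if the pull succeeded and produced content
  if terraform state pull > "$file" && [ -s "$file" ]; then
    echo "$file"
  else
    rm -f "$file"
    return 1
  fi
}
```

State files can contain secrets, so the backup directory deserves the same protection as the backend. A backup taken this way can be restored with terraform state push.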
🔐 Security Baseline
Drift Prevention Security
| Requirement | Implementation | Verification |
|---|---|---|
| Restrict console | SCPs, IAM | Policy audit |
| Audit trail | CloudTrail, Audit logs | Log review |
| Terraform-only changes | CI/CD enforcement | Access review |
| Tag managed resources | Standard tags | Compliance scan |
Access Control
| Action | Who Can Do |
|---|---|
| Console read | Engineers |
| Console write | Emergency only |
| Terraform apply | CI/CD |
| State access | Terraform role only |
📊 Ops Readiness
Metrics to Monitor
| Metric | Source | Alert Threshold |
|---|---|---|
| Drift detected | CI/CD plan | Any |
| Manual console changes | CloudTrail | Any |
| Import count | CI logs | Tracking |
| Time since last sync | CI metadata | > 24 hours |
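The "Time since last sync" threshold reduces to simple arithmetic on timestamps. A minimal sketch, assuming a hypothetical helper is_stale and that the CI system records the last successful drift check as Unix epoch seconds:

```shell
# is_stale is a hypothetical helper for the "Time since last sync" metric.
# Arguments: last successful drift check and current time, both as Unix
# epoch seconds, plus an optional threshold in hours (default 24).
is_stale() {
  local last="$1" now="$2" max_hours="${3:-24}"
  local age=$(( now - last ))
  # Succeeds only when the age strictly exceeds the threshold
  [ "$age" -gt $(( max_hours * 3600 )) ]
}
```

Example use: `is_stale "$last_sync" "$(date +%s)" && echo "alert: no drift check in over 24 hours"`, where last_sync is whatever timestamp the pipeline stores.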
Runbook Entry Points
| Situation | Runbook |
|---|---|
| Drift detected | runbook/terraform-drift-resolution.md |
| Manual change needed | runbook/emergency-manual-change.md |
| Import resource | runbook/terraform-import.md |
| Accidental revert | runbook/revert-recovery.md |
| State mismatch | runbook/terraform-state-reconcile.md |
✅ Design Review Checklist
Detection
- [ ] Scheduled drift detection
- [ ] Alert on drift
- [ ] Plan before apply
- [ ] Manual change tracking
Prevention
- [ ] SCPs restricting console
- [ ] Terraform tags
- [ ] CI/CD only apply
- [ ] Audit logging
Handling
- [ ] Documented reconciliation process
- [ ] ignore_changes for auto-managed
- [ ] Import workflow
- [ ] Rollback plan
Operations
- [ ] Runbooks documented
- [ ] Team trained
- [ ] Emergency procedures
- [ ] State backup
📎 Links
- 📎 State Management - State file fundamentals
- 📎 Module Design - Module lifecycle management
- 📎 Testing & CI/CD - Automated drift detection
- 📎 AWS Observability - CloudTrail for change tracking
- 📎 GCP Audit - GCP audit logging
- 📎 Terraform Security - Access control patterns