
💾 Storage & Data Protection

Level: Core · Solves: choosing the right storage service and implementing data protection strategies for enterprise workloads

🎯 Outcomes

After applying the material on this page, you will be able to:

  • Choose the right storage service based on access patterns and durability requirements
  • Design an S3 strategy with storage classes, lifecycle policies, and replication
  • Configure EBS with appropriate volume types and encryption
  • Deploy EFS for shared file systems with the right performance mode
  • Implement a backup strategy with AWS Backup and cross-region replication
  • Ensure compliance with Object Lock and retention policies

When to use

| Storage Service | Use Case | Why |
|---|---|---|
| S3 Standard | Hot data that is accessed frequently | 11 9s durability, high availability |
| S3 Intelligent-Tiering | Unknown access patterns | Automatic tiering, no retrieval fees |
| S3 Glacier | Archive, compliance retention | Low cost, retrieval in minutes to hours |
| EBS gp3 | General workloads, databases | Configurable IOPS/throughput |
| EBS io2 | High-performance databases | Up to 64,000 IOPS |
| EFS | Shared file system, multi-AZ | NFS compatible, auto-scaling |

When NOT to use

| Pattern | Problem | Alternative |
|---|---|---|
| S3 for database storage | Latency is too high | EBS io2/gp3 |
| EBS for file sharing | Single-attach limitation | EFS or FSx |
| Instance Store for persistent data | Data is lost on stop/terminate | EBS |
| S3 Standard for archive | Unnecessarily high cost | Glacier Deep Archive |
| EFS for high-IOPS workloads | Insufficient performance | FSx for Lustre |

⚠️ Warning from Raizo

"A startup kept 500 TB of logs in S3 Standard for two years, at a cost of $11,500/month. After a review, they realized that 95% of the data was never accessed. Applying Intelligent-Tiering plus a lifecycle transition to Glacier cut the cost to $800/month. That is $128,400 saved per year."

Storage Service Selection

Decision Matrix

┌─────────────────────────────────────────────────────────────────┐
│                 STORAGE DECISION MATRIX                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Access Pattern?                                                │
│  ├── Object storage (files, backups) → S3                       │
│  ├── Block storage (databases, OS) → EBS                        │
│  ├── Shared file system → EFS/FSx                               │
│  └── High-performance parallel → FSx for Lustre                 │
│                                                                 │
│  Latency Requirements?                                          │
│  ├── Sub-millisecond → io2 Block Express, Instance Store        │
│  ├── Milliseconds → EBS gp3, EFS, S3                            │
│  └── Minutes to hours acceptable → S3 Glacier                   │
│                                                                 │
│  Durability Requirements?                                       │
│  ├── 11 9s (99.999999999%) → S3                                 │
│  ├── 99.999% → EBS io2; 99.8-99.9% → EBS gp3                    │
│  └── Ephemeral OK → Instance Store                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
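The access-pattern branch of the matrix can be written down as a small lookup. This is only a sketch: the `AccessPattern` enum, the `pick_storage` function, and the `ephemeral_ok` flag are hypothetical names introduced here, and a real decision would also weigh latency, durability, and cost.

```python
from enum import Enum


class AccessPattern(Enum):
    OBJECT = "object"            # files, backups
    BLOCK = "block"              # databases, OS volumes
    SHARED_FILE = "shared_file"  # shared file system
    PARALLEL_HPC = "parallel"    # high-performance parallel I/O


def pick_storage(pattern: AccessPattern, ephemeral_ok: bool = False) -> str:
    """Return the service the matrix points to for a given access pattern."""
    if pattern is AccessPattern.OBJECT:
        return "S3"
    if pattern is AccessPattern.BLOCK:
        # Instance Store only when losing the data is acceptable.
        return "Instance Store" if ephemeral_ok else "EBS"
    if pattern is AccessPattern.SHARED_FILE:
        return "EFS/FSx"
    if pattern is AccessPattern.PARALLEL_HPC:
        return "FSx for Lustre"
    raise ValueError(f"unknown access pattern: {pattern!r}")


print(pick_storage(AccessPattern.BLOCK))
print(pick_storage(AccessPattern.BLOCK, ephemeral_ok=True))
```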

Service Comparison

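| Service | Type | Durability | Latency | Use Case |
|---|---|---|---|---|
| S3 | Object | 11 9s | ms | Files, backups, data lake |
| EBS gp3 | Block | 99.8-99.9% | single-digit ms | General workloads |
| EBS io2 | Block | 5 9s (99.999%) | sub-ms | High-IOPS databases |
| EFS | File (NFS) | 11 9s | ms | Shared file systems |
| FSx Lustre | File (Lustre) | - | sub-ms | HPC, ML training |
| Instance Store | Block | None (ephemeral) | sub-ms | Temp data, caches |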

S3 Deep Dive

Storage Classes

┌─────────────────────────────────────────────────────────────────┐
│                    S3 STORAGE CLASSES                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Frequent Access:                                               │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ S3 Standard          │ Default, highest availability    │    │
│  │ S3 Intelligent-Tier  │ Auto-tiering, monitoring fee     │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  Infrequent Access:                                             │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ S3 Standard-IA       │ 30-day minimum, retrieval fee    │    │
│  │ S3 One Zone-IA       │ Single AZ, 20% cheaper           │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  Archive:                                                       │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ S3 Glacier Instant   │ Milliseconds retrieval           │    │
│  │ S3 Glacier Flexible  │ Minutes to hours retrieval       │    │
│  │ S3 Glacier Deep      │ 12-48 hours, lowest cost         │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Lifecycle Policy Example

```json
{
  "Rules": [
    {
      "ID": "ArchiveOldLogs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
```
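To see what this rule does to a single object over time, the sketch below walks the transitions for a given key and age. `storage_class_at` is a hypothetical helper, not an AWS API. It assumes objects are uploaded as STANDARD and ignores the fact that S3 applies lifecycle actions asynchronously, so real transitions can lag the configured day.

```python
import json

# Same rule as the lifecycle policy above, compacted.
POLICY = json.loads("""
{"Rules": [{"ID": "ArchiveOldLogs", "Status": "Enabled",
  "Filter": {"Prefix": "logs/"},
  "Transitions": [
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 90, "StorageClass": "GLACIER"},
    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}],
  "Expiration": {"Days": 2555}}]}
""")


def storage_class_at(rule: dict, key: str, age_days: int) -> str:
    """Approximate where an object sits after age_days under one rule."""
    applies = rule["Status"] == "Enabled" and key.startswith(rule["Filter"]["Prefix"])
    if not applies:
        return "STANDARD"  # rule does not apply; assume the upload class
    if age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = POLICY["Rules"][0]
for age in (10, 30, 100, 400, 2555):
    print(age, storage_class_at(rule, "logs/app.log", age))
```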

S3 Security Best Practices

┌─────────────────────────────────────────────────────────────────┐
│                 S3 SECURITY CHECKLIST                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Account Level:                                                 │
│  ☑ S3 Block Public Access enabled (account-wide)                │
│  ☑ S3 Access Points for application access                      │
│                                                                 │
│  Bucket Level:                                                  │
│  ☑ Bucket policy with least privilege                           │
│  ☑ Server-side encryption (SSE-S3 or SSE-KMS)                   │
│  ☑ Versioning enabled for critical data                         │
│  ☑ Object Lock for compliance (WORM)                            │
│  ☑ Access logging enabled                                       │
│                                                                 │
│  Object Level:                                                  │
│  ☑ Encryption at rest                                           │
│  ☑ Pre-signed URLs for temporary access                         │
│  ☑ Object tagging for classification                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

EBS Configuration

Volume Types

| Type | IOPS | Throughput | Use Case |
|---|---|---|---|
| gp3 | 3,000-16,000 | 125-1,000 MB/s | General purpose |
| gp2 | 100-16,000 | 128-250 MB/s | Legacy, burstable |
| io2 | 100-64,000 | 1,000 MB/s | High-performance DB |
| st1 | 500 | 500 MB/s | Big data, throughput |
| sc1 | 250 | 250 MB/s | Cold data, lowest cost |

EBS Optimization Tips

┌─────────────────────────────────────────────────────────────────┐
│                 EBS OPTIMIZATION                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Use gp3 over gp2:                                           │
│     • 20% cheaper baseline                                      │
│     • Independent IOPS/throughput scaling                       │
│     • No burst credits to manage                                │
│                                                                 │
│  2. Right-size volumes:                                         │
│     • Monitor CloudWatch metrics                                │
│     • VolumeReadOps, VolumeWriteOps                             │
│     • BurstBalance (for gp2)                                    │
│                                                                 │
│  3. Use EBS-optimized instances:                                │
│     • Dedicated bandwidth to EBS                                │
│     • Most current-gen instances are EBS-optimized              │
│                                                                 │
│  4. Consider io2 Block Express:                                 │
│     • Up to 256,000 IOPS                                        │
│     • Sub-millisecond latency                                   │
│     • For most demanding workloads                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
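When planning a gp2 to gp3 migration, it helps to know how many IOPS the existing gp2 volume actually delivers at baseline, because gp2 ties IOPS to size (3 IOPS per GiB, minimum 100, maximum 16,000) while gp3 includes a flat 3,000. The sketch below computes the extra IOPS to provision on gp3 so that performance does not regress. The function names are hypothetical, and gp2 burst behavior is deliberately left out.

```python
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume regardless of size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return max(100, min(16_000, 3 * size_gib))


def gp3_extra_iops_needed(size_gib: int) -> int:
    """IOPS to provision on gp3 above its baseline to match same-size gp2."""
    return max(0, gp2_baseline_iops(size_gib) - GP3_BASELINE_IOPS)


for size in (20, 500, 1_000, 2_000, 6_000):
    print(size, gp2_baseline_iops(size), gp3_extra_iops_needed(size))
```

Volumes up to 1,000 GiB need no extra provisioned IOPS on gp3 under this model; larger ones do.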

Encryption Strategy

Encryption Options

KMS Key Policy Example

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Enable IAM policies",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Allow S3 service",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "123456789012"
        }
      }
    }
  ]
}
```

Backup Strategy

AWS Backup Configuration

┌─────────────────────────────────────────────────────────────────┐
│                 BACKUP STRATEGY                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Backup Plan:                                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Rule: DailyBackup                                       │    │
│  │ Schedule: cron(0 5 ? * * *)  # 5 AM UTC daily           │    │
│  │ Lifecycle:                                              │    │
│  │   - Move to cold storage: 30 days                       │    │
│  │   - Delete after: 365 days                              │    │
│  │ Copy to: us-west-2 (cross-region)                       │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Rule: WeeklyBackup                                      │    │
│  │ Schedule: cron(0 5 ? * SUN *)  # Sunday 5 AM UTC        │    │
│  │ Lifecycle:                                              │    │
│  │   - Move to cold storage: 90 days                       │    │
│  │   - Delete after: 2555 days (7 years)                   │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  Resources:                                                     │
│  • EBS volumes (tag: Backup=true)                               │
│  • RDS databases                                                │
│  • EFS file systems                                             │
│  • DynamoDB tables                                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
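The plan in the diagram could be expressed as the dictionary that boto3's `create_backup_plan(BackupPlan=...)` accepts, roughly as sketched below. The plan name, the `Default` vault name, and the destination vault ARN are placeholders assumed for illustration. The helper also enforces the AWS Backup constraint that a recovery point must stay in cold storage for at least 90 days before deletion.

```python
def backup_rule(name, schedule, cold_after_days, delete_after_days,
                copy_to_vault_arn=None):
    """Build one rule in the shape used by AWS Backup plan definitions."""
    # Backups moved to cold storage must remain there for at least 90 days.
    if delete_after_days < cold_after_days + 90:
        raise ValueError(
            "DeleteAfterDays must be >= MoveToColdStorageAfterDays + 90")
    rule = {
        "RuleName": name,
        "TargetBackupVaultName": "Default",  # assumed vault name
        "ScheduleExpression": schedule,
        "Lifecycle": {
            "MoveToColdStorageAfterDays": cold_after_days,
            "DeleteAfterDays": delete_after_days,
        },
    }
    if copy_to_vault_arn:
        # Cross-region copy: the destination vault lives in another region.
        rule["CopyActions"] = [{"DestinationBackupVaultArn": copy_to_vault_arn}]
    return rule


PLAN = {
    "BackupPlanName": "enterprise-storage",  # hypothetical name
    "Rules": [
        backup_rule("DailyBackup", "cron(0 5 ? * * *)", 30, 365,
                    "arn:aws:backup:us-west-2:123456789012:backup-vault:Default"),
        backup_rule("WeeklyBackup", "cron(0 5 ? * SUN *)", 90, 2555),
    ],
}

# With credentials configured, this dict could be submitted as:
#   boto3.client("backup").create_backup_plan(BackupPlan=PLAN)
print([rule["RuleName"] for rule in PLAN["Rules"]])
```

Resource assignment (for example by the `Backup=true` tag) is a separate backup selection and is not covered here.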

Cross-Region Replication

Data Protection Compliance

Retention Requirements

| Compliance | Retention | Storage Class |
|---|---|---|
| SOX | 7 years | Glacier Deep Archive |
| HIPAA | 6 years | Glacier + Object Lock |
| PCI-DSS | 1 year | Standard-IA |
| GDPR | As needed | With deletion capability |

Object Lock (WORM)

```json
{
  "ObjectLockConfiguration": {
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "COMPLIANCE",
        "Years": 7
      }
    }
  }
}
```

⚠️ Compliance Mode

COMPLIANCE mode cannot be overridden by any user, including root. Use GOVERNANCE mode for testing. Once set, COMPLIANCE retention cannot be shortened.
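The difference between the two modes can be modeled in a few lines. This is a toy model, not S3's implementation: `retain_until` and `can_delete_version` are hypothetical helpers, the calendar-year arithmetic is an assumption about how a `Years` retention is counted, and `bypass_governance` stands in for a caller that holds `s3:BypassGovernanceRetention` and sends the bypass header.

```python
from datetime import datetime, timezone


def retain_until(created: datetime, years: int) -> datetime:
    """Date until which a default retention of `years` keeps a version locked."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


def can_delete_version(mode, created, years, now, bypass_governance=False):
    """COMPLIANCE never allows early deletion; GOVERNANCE allows it with bypass."""
    if now >= retain_until(created, years):
        return True
    return mode == "GOVERNANCE" and bypass_governance


created = datetime(2024, 1, 15, tzinfo=timezone.utc)
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(can_delete_version("COMPLIANCE", created, 7, now, bypass_governance=True))
print(can_delete_version("GOVERNANCE", created, 7, now, bypass_governance=True))
```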

Best Practices Checklist

  • [ ] Enable S3 Block Public Access account-wide
  • [ ] Use SSE-KMS for sensitive data
  • [ ] Implement lifecycle policies for cost optimization
  • [ ] Enable versioning for critical buckets
  • [ ] Configure cross-region replication for DR
  • [ ] Use AWS Backup for centralized backup management
  • [ ] Enable EBS encryption by default
  • [ ] Monitor storage metrics and costs

⚖️ Trade-offs

Trade-off 1: S3 Storage Classes - Cost vs Availability

| Storage Class | Cost/GB/month | Retrieval | Best For |
|---|---|---|---|
| Standard | $0.023 | Instant | Hot data |
| Intelligent-Tiering | $0.023 + monitoring | Instant | Unknown patterns |
| Standard-IA | $0.0125 | Instant, $0.01/GB fee | 30+ days, infrequent |
| Glacier Instant | $0.004 | Instant, per-GB fee | Archives that need fast access |
| Glacier Deep | $0.00099 | 12-48 hours | Long-term archive |

Worked example (100 TB of logs):

S3 Standard:     100,000 GB × $0.023 = $2,300/month
Glacier Deep:    100,000 GB × $0.00099 = $99/month

Savings: $2,201/month = $26,412/year
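The same arithmetic can be reproduced for any volume and pair of classes. The sketch below uses the per-GB prices quoted in the table; treat them as illustrative, since actual prices vary by region and over time. It covers storage charges only and leaves out request, retrieval, monitoring, and minimum-duration fees. The dictionary keys and function names are chosen here for illustration.

```python
# Storage price per GB-month, taken from the table above (illustrative values).
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_cost(gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD, rounded to cents."""
    return round(gb * PRICE_PER_GB_MONTH[storage_class], 2)


def annual_savings(gb: float, from_class: str, to_class: str) -> float:
    """Yearly storage savings from moving all data between two classes."""
    monthly = monthly_cost(gb, from_class) - monthly_cost(gb, to_class)
    return round(12 * monthly, 2)


print(monthly_cost(100_000, "STANDARD"))
print(monthly_cost(100_000, "DEEP_ARCHIVE"))
print(annual_savings(100_000, "STANDARD", "DEEP_ARCHIVE"))
```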

Trade-off 2: EBS Volume Types

| Volume Type | IOPS | Throughput | Cost | Use Case |
|---|---|---|---|---|
| gp3 | 3,000-16,000 | 125-1,000 MB/s | $0.08/GB | General |
| io2 | Up to 64,000 | 1,000 MB/s | $0.125/GB + IOPS | High-perf DB |
| st1 | N/A | 500 MB/s | $0.045/GB | Throughput-intensive |
| sc1 | N/A | 250 MB/s | $0.015/GB | Cold data |

Recommendations:

  • Default: gp3 (IOPS and throughput can be configured independently)
  • Databases: io2 only when you need more than 16,000 IOPS
  • Big data: st1 for sequential reads
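These three recommendations translate into a short selection rule. The sketch below is one possible encoding: `VolumeChoice` and `recommend_volume` are hypothetical names, and the 16,000 IOPS cutoff is the gp3 ceiling used in the table above.

```python
from dataclasses import dataclass
from typing import Optional

GP3_BASE_IOPS = 3_000   # included with every gp3 volume
GP3_MAX_IOPS = 16_000   # gp3 ceiling used in the table above


@dataclass(frozen=True)
class VolumeChoice:
    volume_type: str
    provisioned_iops: Optional[int]  # None where IOPS are not provisioned


def recommend_volume(required_iops: int, sequential_reads: bool = False) -> VolumeChoice:
    """Apply the recommendations: st1 for sequential, io2 above 16,000 IOPS, else gp3."""
    if sequential_reads:
        return VolumeChoice("st1", None)
    if required_iops > GP3_MAX_IOPS:
        return VolumeChoice("io2", required_iops)
    # gp3 never goes below its included baseline.
    return VolumeChoice("gp3", max(GP3_BASE_IOPS, required_iops))


print(recommend_volume(2_000))
print(recommend_volume(20_000))
print(recommend_volume(500, sequential_reads=True))
```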

Trade-off 3: Cross-Region Replication Cost vs RPO

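| Strategy | RPO | Cost | Complexity |
|---|---|---|---|
| No replication | Time since the last full backup | Low | Low |
| Same-region replication | Near-zero (asynchronous) | +100% storage | Medium |
| Cross-region replication | Near-zero (asynchronous) | +100% storage + transfer | High |
| Glacier CRR | Hours | Lower | Medium |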

🚨 Failure Modes

Failure Mode 1: Accidental Deletion

🔥 Real incident

A developer ran a script to delete "temp" files, but a wrong regex wiped the entire production bucket. There was no versioning and no replication. 2 TB of data was lost permanently, and it took 3 days of downtime to reconstruct it from logs.

| How to detect | How to prevent |
|---|---|
| CloudTrail S3 data events | Enable versioning + MFA Delete |
| S3 Inventory reports | Cross-region replication |
| Storage metrics drop | Object Lock for critical data |
| User reports missing data | Restrict s3:DeleteObject in IAM |
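On the scripting side, a cleanup job can defend against an over-broad regex before any delete call is made. The sketch below is one possible guard, with hypothetical names and thresholds: it requires an explicit prefix, matches the whole key with `re.fullmatch`, and aborts when the selection covers more than a set fraction of the listing. It complements the bucket-level controls in the table and does not replace them.

```python
import re


def keys_to_delete(keys, pattern, required_prefix="temp/", max_fraction=0.2):
    """Select keys for deletion, refusing patterns that match too broadly."""
    regex = re.compile(pattern)
    # fullmatch forces the pattern to describe the whole key, not a substring.
    matched = [key for key in keys
               if key.startswith(required_prefix) and regex.fullmatch(key)]
    if keys and len(matched) / len(keys) > max_fraction:
        raise RuntimeError(
            f"pattern matches {len(matched)}/{len(keys)} keys; refusing to continue")
    return matched


listing = ["temp/a.tmp", "temp/b.tmp"] + [f"prod/data-{i}.parquet" for i in range(8)]

# Dry run: print the selection and review it before deleting anything.
print(keys_to_delete(listing, r"temp/.*\.tmp"))
```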

Failure Mode 2: Faulty Lifecycle Policy

┌─────────────────────────────────────────────────────────────────┐
│                LIFECYCLE POLICY DISASTER                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Mistake: transition to Glacier set after 1 day                 │
│           instead of 30 days                                    │
│                                                                 │
│  Consequences:                                                  │
│  - Hot data is archived and must be restored                    │
│  - Glacier restore cost: 10,000 GB × $0.03/GB = $300            │
│  - Application failures caused by latency                       │
│  - User-facing downtime                                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

| How to detect | How to prevent |
|---|---|
| S3 Storage Lens class distribution | Test lifecycle rules in a non-prod bucket |
| Unexpected Glacier restore costs | PR review for lifecycle changes |
| Application latency spikes | Minimum 30-day wait before Glacier |
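The "minimum 30-day wait" guard can be automated as a check in the PR pipeline. The sketch below lints a lifecycle configuration dictionary before it is applied. `lint_lifecycle` is a hypothetical helper, and the 30-day floor is this page's team convention, not an S3 limit.

```python
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def lint_lifecycle(policy, min_archive_days=30):
    """Return human-readable problems found in a lifecycle configuration dict."""
    problems = []
    for rule in policy.get("Rules", []):
        rule_id = rule.get("ID", "<no id>")
        previous_days = None
        for transition in rule.get("Transitions", []):
            days, target = transition["Days"], transition["StorageClass"]
            if target in ARCHIVE_CLASSES and days < min_archive_days:
                problems.append(
                    f"{rule_id}: {target} transition at day {days} "
                    f"is earlier than day {min_archive_days}")
            if previous_days is not None and days <= previous_days:
                problems.append(f"{rule_id}: transitions are not in increasing order")
            previous_days = days
    return problems


bad_policy = {"Rules": [{"ID": "ArchiveOldLogs",
                         "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}]}]}
for problem in lint_lifecycle(bad_policy):
    print(problem)
```

A CI job could fail the build whenever the returned list is non-empty.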

Failure Mode 3: EBS Volume Exhaustion

| How to detect | How to prevent |
|---|---|
| CloudWatch VolumeQueueLength high | Right-size volume and IOPS |
| Application I/O timeouts | Monitor burst balance (gp2) |
| DiskSpaceUtilization alerts | Auto-expand with Lambda + CloudWatch |

🔐 Security Baseline

Encryption Requirements

| Storage | Encryption | Key Management |
|---|---|---|
| S3 | SSE-KMS (set as bucket default) | Customer-managed CMK |
| EBS | Encryption by default enabled | Account-level CMK |
| EFS | Encryption at rest + in transit | Managed CMK |
| Backups | AWS Backup with CMK | Vault Lock |

S3 Security Configuration

```json
{
  "PublicAccessBlockConfiguration": {
    "BlockPublicAcls": true,
    "IgnorePublicAcls": true,
    "BlockPublicPolicy": true,
    "RestrictPublicBuckets": true
  }
}
```
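An audit script can verify that all four flags are actually on. The sketch below checks a dictionary in the same shape as the configuration above, which is also the shape of the `get_public_access_block` response in boto3. `missing_public_access_blocks` is a hypothetical helper name.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config: dict) -> list:
    """Return the Block Public Access flags that are absent or not true."""
    settings = config.get("PublicAccessBlockConfiguration", {})
    return [flag for flag in REQUIRED_FLAGS if settings.get(flag) is not True]


partial = {"PublicAccessBlockConfiguration": {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
}}
print(missing_public_access_blocks(partial))
```

An empty result means the bucket or account passes this particular check.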

Access Control Matrix

| Requirement | Implementation | Verification |
|---|---|---|
| No public buckets | Account-level Block Public Access | Config Rule |
| Encryption | SSE-KMS default | Bucket policy deny unencrypted |
| Logging | S3 access logging to central bucket | CloudTrail data events |
| MFA Delete | Enabled for critical buckets | AWS CLI check |

📊 Ops Readiness

Metrics to Monitor

| Service | Metric | Alert Threshold |
|---|---|---|
| S3 | BucketSizeBytes | > budget threshold |
| S3 | NumberOfObjects | Spike pattern |
| EBS | VolumeQueueLength | > 1 sustained |
| EBS | BurstBalance | < 20% |
| EFS | PercentIOLimit | > 80% |
| Backup | NumberOfBackupJobsFailed | > 0 |

Alerting Configuration

```json
{
  "AlarmName": "EBSHighQueueLength",
  "MetricName": "VolumeQueueLength",
  "Namespace": "AWS/EBS",
  "Dimensions": [{"Name": "VolumeId", "Value": "vol-xxx"}],
  "Statistic": "Average",
  "Threshold": 1,
  "ComparisonOperator": "GreaterThanThreshold",
  "EvaluationPeriods": 5,
  "Period": 60
}
```
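With `EvaluationPeriods` set to 5 and `Period` set to 60, this alarm fires only when the queue length stays above the threshold for five consecutive one-minute datapoints, which is what "sustained" means in the metrics table. The sketch below simulates that rule locally. It is a simplification with a hypothetical function name: it assumes the default where every evaluated period must breach and ignores CloudWatch's handling of missing data.

```python
def alarm_state(datapoints, threshold=1, evaluation_periods=5):
    """Simplified alarm evaluation over the most recent datapoints."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    # GreaterThanThreshold: every evaluated period must breach.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


print(alarm_state([0.5, 2, 2, 2, 2, 2]))  # five breaching minutes in a row
print(alarm_state([2, 2, 2, 0.5, 2]))     # one healthy minute resets the run
print(alarm_state([2, 2]))                # not enough history yet
```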

Runbook Entry Points

| Situation | Runbook |
|---|---|
| S3 bucket publicly exposed | runbook/s3-public-bucket-remediation.md |
| EBS volume exhaustion | runbook/ebs-volume-expansion.md |
| Backup job failure | runbook/backup-failure-investigation.md |
| Data restoration request | runbook/data-restoration-procedure.md |
| Lifecycle policy issue | runbook/lifecycle-policy-review.md |
| Cross-region replication lag | runbook/crr-troubleshooting.md |

Design Review Checklist

Storage Selection

  • [ ] Storage service matches the access patterns
  • [ ] Durability requirements are met
  • [ ] Performance (IOPS/throughput) is sized correctly
  • [ ] Cost is optimized with appropriate storage classes

Security

  • [ ] Encryption at rest enabled (SSE-KMS)
  • [ ] Block Public Access enabled account-wide
  • [ ] No public buckets
  • [ ] MFA Delete for critical data

Data Protection

  • [ ] Versioning enabled for critical buckets
  • [ ] Lifecycle policies configured and tested
  • [ ] Cross-region replication for DR
  • [ ] AWS Backup configured with retention

Operations

  • [ ] Storage metrics monitored
  • [ ] Cost alerts configured
  • [ ] Backup job monitoring
  • [ ] Runbooks documented

📎 Links