💾 Storage & Data Protection
Level: Core · Solves: choosing the right storage service and implementing data protection strategies for enterprise workloads
🎯 Outcomes
After applying the material on this page, you will be able to:
- Choose the right storage service based on access patterns and durability requirements
- Design an S3 strategy with storage classes, lifecycle policies, and replication
- Configure EBS with appropriate volume types and encryption
- Deploy EFS for shared file systems with the right performance mode
- Implement a backup strategy with AWS Backup and cross-region replication
- Ensure compliance with Object Lock and retention policies
✅ When to use
| Storage Service | Use Case | Why |
|---|---|---|
| S3 Standard | Hot data, accessed frequently | 11 9s durability, high availability |
| S3 Intelligent-Tiering | Unknown access patterns | Auto-optimization, no retrieval fees |
| S3 Glacier | Archive, compliance retention | Lowest cost, minutes-hours retrieval |
| EBS gp3 | General workloads, databases | Configurable IOPS/throughput |
| EBS io2 | High-performance databases | Up to 64,000 IOPS |
| EFS | Shared file system, multi-AZ | NFS compatible, auto-scaling |
❌ When NOT to use
| Pattern | Problem | Alternative |
|---|---|---|
| S3 for database storage | Latency too high | EBS io2/gp3 |
| EBS for file sharing | Single-attach limitation | EFS or FSx |
| Instance Store for persistent data | Data is lost on stop/terminate | EBS |
| S3 Standard for archive | Unnecessarily high cost | Glacier Deep Archive |
| EFS for high-IOPS workloads | Insufficient performance | FSx for Lustre |
⚠️ A warning from Raizo
"A startup kept 500 TB of logs in S3 Standard for two years, at a cost of $11,500/month. After a review, they realized that 95% of the data was never accessed. Applying Intelligent-Tiering plus a lifecycle transition to Glacier cut the cost to $800/month. That is $128,400 saved per year."
Storage Service Selection
Decision Matrix
┌─────────────────────────────────────────────────────────────────┐
│ STORAGE DECISION MATRIX │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Access Pattern? │
│ ├── Object storage (files, backups) → S3 │
│ ├── Block storage (databases, OS) → EBS │
│ ├── Shared file system → EFS/FSx │
│ └── High-performance parallel → FSx for Lustre │
│ │
│ Latency Requirements? │
│ ├── Sub-millisecond → EBS io2, Instance Store │
│ ├── Milliseconds → EBS gp3, EFS, S3 │
│ └── Minutes to hours acceptable → S3 Glacier │
│ │
│ Durability Requirements? │
│ ├── 11 9s (99.999999999%) → S3 │
│ ├── 5 9s (99.999%) → EBS io2 (gp3: 99.8-99.9%) │
│ └── Ephemeral OK → Instance Store │
│ │
└─────────────────────────────────────────────────────────────────┘
Service Comparison
| Service | Type | Durability | Latency | Use Case |
|---|---|---|---|---|
| S3 | Object | 11 9s | ms | Files, backups, data lake |
| EBS gp3 | Block | 99.8-99.9% | single-digit ms | General workloads |
| EBS io2 | Block | 5 9s | sub-ms | High IOPS databases |
| EFS | File (NFS) | 11 9s | ms | Shared file systems |
| FSx Lustre | File (Lustre) | - | sub-ms | HPC, ML training |
| Instance Store | Block | None (ephemeral) | sub-ms | Temp data, caches |
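The access-pattern branch of the decision matrix can be written down as a small lookup. This is only a sketch of the matrix as presented on this page; the `Access` enum, the `choose_storage` function, and the `ephemeral_ok` flag are hypothetical names introduced for illustration.

```python
from enum import Enum


class Access(Enum):
    OBJECT = "object"              # files, backups
    BLOCK = "block"                # databases, OS volumes
    SHARED_FILE = "shared_file"    # shared file system
    PARALLEL_HPC = "parallel_hpc"  # high-performance parallel access


# Direct transcription of the "Access Pattern?" branch of the matrix.
_BY_ACCESS = {
    Access.OBJECT: "S3",
    Access.BLOCK: "EBS",
    Access.SHARED_FILE: "EFS/FSx",
    Access.PARALLEL_HPC: "FSx for Lustre",
}


def choose_storage(access: Access, ephemeral_ok: bool = False) -> str:
    """Pick a storage service from the access pattern.

    Instance Store is only suggested for block storage when losing the
    data on stop/terminate is acceptable.
    """
    if access is Access.BLOCK and ephemeral_ok:
        return "Instance Store"
    return _BY_ACCESS[access]


print(choose_storage(Access.SHARED_FILE))
print(choose_storage(Access.BLOCK, ephemeral_ok=True))
```

A real selection would also weigh latency, durability, and cost, which this lookup leaves to the reviewer.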
S3 Deep Dive
Storage Classes
┌─────────────────────────────────────────────────────────────────┐
│ S3 STORAGE CLASSES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Frequent Access: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ S3 Standard │ Default, highest availability │ │
│ │ S3 Intelligent-Tier │ Auto-tiering, monitoring fee │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Infrequent Access: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ S3 Standard-IA │ 30-day minimum, retrieval fee │ │
│ │ S3 One Zone-IA │ Single AZ, 20% cheaper than Standard-IA │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Archive: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ S3 Glacier Instant │ Milliseconds retrieval │ │
│ │ S3 Glacier Flexible │ Minutes to hours retrieval │ │
│ │ S3 Glacier Deep │ 12-48 hours, lowest cost │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Lifecycle Policy Example
```json
{
  "Rules": [
    {
      "ID": "ArchiveOldLogs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
```
S3 Security Best Practices
┌─────────────────────────────────────────────────────────────────┐
│ S3 SECURITY CHECKLIST │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Account Level: │
│ ☑ S3 Block Public Access enabled (account-wide) │
│ ☑ S3 Access Points for application access │
│ │
│ Bucket Level: │
│ ☑ Bucket policy with least privilege │
│ ☑ Server-side encryption (SSE-S3 or SSE-KMS) │
│ ☑ Versioning enabled for critical data │
│ ☑ Object Lock for compliance (WORM) │
│ ☑ Access logging enabled │
│ │
│ Object Level: │
│ ☑ Encryption at rest │
│ ☑ Pre-signed URLs for temporary access │
│ ☑ Object tagging for classification │
│ │
└─────────────────────────────────────────────────────────────────┘
EBS Configuration
Volume Types
| Type | IOPS | Throughput | Use Case |
|---|---|---|---|
| gp3 | 3,000-16,000 | 125-1,000 MB/s | General purpose |
| gp2 | 100-16,000 | 128-250 MB/s | Legacy, burstable |
| io2 | 100-64,000 | 1,000 MB/s | High-performance DB |
| st1 | 500 | 500 MB/s | Big data, throughput |
| sc1 | 250 | 250 MB/s | Cold data, lowest cost |
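The gp2 range of 100-16,000 IOPS comes from its size-based baseline of 3 IOPS per GiB, clamped at both ends. The sketch below shows that relationship and compares it with the fixed 3,000 IOPS floor of gp3 from the table; `gp2_baseline_iops` is a hypothetical helper name, not an AWS API.

```python
GP3_BASELINE_IOPS = 3_000  # lower bound of the gp3 range in the table


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, never below 100 or above 16,000."""
    return max(100, min(16_000, 3 * size_gib))


for size in (20, 500, 1_000, 10_000):
    gp2 = gp2_baseline_iops(size)
    verdict = "below" if gp2 < GP3_BASELINE_IOPS else "at or above"
    print(f"{size} GiB gp2 -> {gp2} IOPS ({verdict} the gp3 baseline)")
```

Under this formula a gp2 volume needs about 1,000 GiB before its baseline matches what gp3 provides at any size, which is one reason small volumes tend to benefit from the switch.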
EBS Optimization Tips
┌─────────────────────────────────────────────────────────────────┐
│ EBS OPTIMIZATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. Use gp3 over gp2: │
│ • 20% cheaper baseline │
│ • Independent IOPS/throughput scaling │
│ • No burst credits to manage │
│ │
│ 2. Right-size volumes: │
│ • Monitor CloudWatch metrics │
│ • VolumeReadOps, VolumeWriteOps │
│ • BurstBalance (for gp2) │
│ │
│ 3. Use EBS-optimized instances: │
│ • Dedicated bandwidth to EBS │
│ • Most current-gen instances are EBS-optimized │
│ │
│ 4. Consider io2 Block Express: │
│ • Up to 256,000 IOPS │
│ • Sub-millisecond latency │
│ • For most demanding workloads │
│ │
└─────────────────────────────────────────────────────────────────┘
Encryption Strategy
Encryption Options
KMS Key Policy Example
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Enable IAM policies",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Allow S3 service",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "123456789012"
        }
      }
    }
  ]
}
```
Backup Strategy
AWS Backup Configuration
┌─────────────────────────────────────────────────────────────────┐
│ BACKUP STRATEGY │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Backup Plan: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Rule: DailyBackup │ │
│ │ Schedule: cron(0 5 ? * * *) # 5 AM UTC daily │ │
│ │ Lifecycle: │ │
│ │ - Move to cold storage: 30 days │ │
│ │ - Delete after: 365 days │ │
│ │ Copy to: us-west-2 (cross-region) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Rule: WeeklyBackup │ │
│ │ Schedule: cron(0 5 ? * SUN *) # Sunday 5 AM UTC │ │
│ │ Lifecycle: │ │
│ │ - Move to cold storage: 90 days │ │
│ │ - Delete after: 2555 days (7 years) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Resources: │
│ • EBS volumes (tag: Backup=true) │
│ • RDS databases │
│ • EFS file systems │
│ • DynamoDB tables │
│ │
└─────────────────────────────────────────────────────────────────┘
Cross-Region Replication
Data Protection Compliance
Retention Requirements
| Compliance | Retention | Storage Class |
|---|---|---|
| SOX | 7 years | Glacier Deep Archive |
| HIPAA | 6 years | Glacier + Object Lock |
| PCI-DSS | 1 year | Standard-IA |
| GDPR | As needed | With deletion capability |
Object Lock (WORM)
```json
{
  "ObjectLockConfiguration": {
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "COMPLIANCE",
        "Years": 7
      }
    }
  }
}
```
⚠️ Compliance Mode
COMPLIANCE mode cannot be overridden by any user, including root. Use GOVERNANCE mode for testing. Once set, COMPLIANCE retention cannot be shortened.
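The difference between the two modes can be modeled as a single rule: retention may always be extended, and may only be shortened in GOVERNANCE mode by a caller holding the `s3:BypassGovernanceRetention` permission. The function below is a local model of that rule for reasoning and tests, not a call to S3; `can_change_retention` and its parameters are hypothetical names.

```python
from datetime import datetime, timezone


def can_change_retention(mode: str, current_until: datetime, new_until: datetime,
                         can_bypass_governance: bool = False) -> bool:
    """Model whether a change to an object's retain-until date is accepted."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unknown Object Lock mode: {mode}")
    if new_until >= current_until:
        return True  # extending retention is allowed in both modes
    if mode == "GOVERNANCE":
        # Shortening requires the s3:BypassGovernanceRetention permission.
        return can_bypass_governance
    return False  # COMPLIANCE: nobody can shorten, including root


locked_until = datetime(2031, 1, 1, tzinfo=timezone.utc)
earlier = datetime(2026, 1, 1, tzinfo=timezone.utc)

print(can_change_retention("COMPLIANCE", locked_until, earlier))
print(can_change_retention("GOVERNANCE", locked_until, earlier,
                           can_bypass_governance=True))
```

This is why a 7-year COMPLIANCE default is best rehearsed with GOVERNANCE first: a mistake in COMPLIANCE mode stays in place for the full period.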
Best Practices Checklist
- [ ] Enable S3 Block Public Access account-wide
- [ ] Use SSE-KMS for sensitive data
- [ ] Implement lifecycle policies for cost optimization
- [ ] Enable versioning for critical buckets
- [ ] Configure cross-region replication for DR
- [ ] Use AWS Backup for centralized backup management
- [ ] Enable EBS encryption by default
- [ ] Monitor storage metrics and costs
⚖️ Trade-offs
Trade-off 1: S3 Storage Classes - Cost vs Availability
| Storage Class | Cost/GB/month | Retrieval | Best For |
|---|---|---|---|
| Standard | $0.023 | Instant | Hot data |
| Intelligent-Tiering | $0.023 + monitoring | Instant | Unknown patterns |
| Standard-IA | $0.0125 | $0.01/GB | 30+ days, infrequent |
| Glacier Instant | $0.004 | Instant | Archives that need fast access |
| Glacier Deep | $0.00099 | 12-48 hours | Long-term archive |
Worked example (100 TB of logs):
S3 Standard: 100,000 GB × $0.023 = $2,300/month
Glacier Deep: 100,000 GB × $0.00099 = $99/month
Savings: $2,201/month = $26,412/year
Trade-off 2: EBS Volume Types
| Volume Type | IOPS | Throughput | Cost (per GB-month) | Use Case |
|---|---|---|---|---|
| gp3 | 3,000-16,000 | 125-1,000 MB/s | $0.08/GB | General |
| io2 | Up to 64,000 | 1,000 MB/s | $0.125/GB + IOPS | High-perf DB |
| st1 | 500 | 500 MB/s | $0.045/GB | Throughput-intensive |
| sc1 | 250 | 250 MB/s | $0.015/GB | Cold data |
Recommendations:
- Default: gp3 (IOPS and throughput can be configured independently)
- Databases: io2 only when more than 16,000 IOPS are needed
- Big Data: st1 for sequential reads
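The recommendations above reduce to a short rule chain. The sketch below encodes them as stated on this page; `recommend_volume` and its flags are hypothetical, and the `cold` branch is taken from the "Cold data" row of the table rather than from the recommendation list.

```python
GP3_MAX_IOPS = 16_000  # upper bound of the gp3 range in the table


def recommend_volume(required_iops: int, sequential: bool = False,
                     cold: bool = False) -> str:
    """Suggest an EBS volume type from coarse workload traits."""
    if cold:
        return "sc1"   # rarely accessed data, lowest cost per GB
    if sequential:
        return "st1"   # throughput-oriented sequential reads
    if required_iops > GP3_MAX_IOPS:
        return "io2"   # only when gp3 cannot reach the IOPS target
    return "gp3"       # default choice


print(recommend_volume(4_000))
print(recommend_volume(40_000))
print(recommend_volume(300, sequential=True))
```

The order of the checks is a design choice of this sketch: workload shape (cold, sequential) is considered before the raw IOPS number.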
Trade-off 3: Cross-Region Replication Cost vs RPO
| Strategy | RPO | Cost | Complexity |
|---|---|---|---|
| No replication | Time since last full backup | Low | Low |
| Same-region replication | Near-zero | +100% storage | Medium |
| Cross-region replication | Near-zero | +100% storage + transfer | High |
| Glacier CRR | Hours | Lower | Medium |
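The "+100% storage + transfer" entries can be turned into a rough monthly estimate. In this sketch the storage price is the S3 Standard rate from Trade-off 1, while the changed-data volume and the `0.02` per-GB transfer price are purely hypothetical inputs; actual inter-region rates depend on the region pair and are not given on this page.

```python
def replication_monthly_cost(size_gb: float, storage_price: float,
                             changed_gb: float = 0.0,
                             transfer_price: float = 0.0) -> float:
    """Extra monthly cost of keeping one replica.

    storage: the replica doubles stored bytes (+100% storage)
    transfer: only newly written or changed data crosses regions
    """
    storage = size_gb * storage_price
    transfer = changed_gb * transfer_price
    return round(storage + transfer, 2)


# Same-region replication: storage only.
print(replication_monthly_cost(100_000, 0.023))
# Cross-region: storage plus transfer for 5,000 GB of monthly changes
# at an assumed (hypothetical) $0.02/GB.
print(replication_monthly_cost(100_000, 0.023, changed_gb=5_000,
                               transfer_price=0.02))
```

Request and replication-feature charges are left out, so treat the result as a lower bound.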
🚨 Failure Modes
Failure Mode 1: Accidental Deletion
🔥 Real-world incident
A developer ran a script to delete "temp" files, but a faulty regex wiped the entire production bucket. There was no versioning and no replication. 2 TB of data was lost permanently, followed by 3 days of downtime to reconstruct it from logs.
| Detection | Prevention |
|---|---|
| CloudTrail S3 data events | Enable versioning + MFA Delete |
| S3 Inventory reports | Cross-region replication |
| Storage metrics drop | Object Lock for critical data |
| User reports missing data | IAM restrict s3:DeleteObject |
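Why versioning would have saved the bucket in this incident: on a versioned bucket, a plain delete only adds a delete marker on top of the existing versions. The toy model below illustrates that behavior in memory; `VersionedBucket` is a hypothetical class for illustration and does not talk to S3.

```python
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy in-memory model of S3 versioning semantics (not an AWS client)."""

    def __init__(self) -> None:
        # Each key maps to its version history; None stands for a delete marker.
        self._versions: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, body: bytes) -> None:
        self._versions.setdefault(key, []).append(body)

    def delete(self, key: str) -> None:
        # A simple delete stacks a delete marker instead of erasing data.
        self._versions.setdefault(key, []).append(None)

    def get(self, key: str) -> bytes:
        history = self._versions.get(key, [])
        if not history or history[-1] is None:
            raise KeyError(key)
        return history[-1]

    def undelete(self, key: str) -> None:
        # Removing the marker makes the previous version current again.
        history = self._versions.get(key, [])
        if history and history[-1] is None:
            history.pop()


bucket = VersionedBucket()
bucket.put("prod/data.csv", b"important")
bucket.delete("prod/data.csv")    # the faulty cleanup script
bucket.undelete("prod/data.csv")  # recovery
print(bucket.get("prod/data.csv"))
```

Deleting a specific version is a separate, permanent operation, which is what MFA Delete is meant to guard.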
Failure Mode 2: Faulty Lifecycle Policy
┌─────────────────────────────────────────────────────────────────┐
│ LIFECYCLE POLICY DISASTER │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Mistake: transition to Glacier configured after 1 day │
│ instead of 30 days │
│ │
│ Consequences: │
│ - Hot data gets archived and must be restored │
│ - Glacier restore cost: 10 TB (10,000 GB) × $0.03/GB = $300 │
│ - Application failures caused by latency │
│ - User-facing downtime │
│ │
└─────────────────────────────────────────────────────────────────┘
| Detection | Prevention |
|---|---|
| S3 Storage Lens class distribution | Test lifecycle rules in a non-prod bucket |
| Unexpected Glacier restore costs | PR review for lifecycle changes |
| Application latency spikes | Minimum 30-day wait before Glacier |
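The "minimum 30-day wait" and "PR review" items can be partly automated with a lint step that runs on the lifecycle JSON before it is applied. The following is a minimal sketch, assuming the configuration has the same shape as the lifecycle policy example on this page; `lint_lifecycle` and `ARCHIVE_CLASSES` are hypothetical names.

```python
# Storage classes treated as slow to read back for this check (an assumption
# of this sketch; Glacier Instant Retrieval is deliberately left out).
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def lint_lifecycle(config: dict, min_archive_days: int = 30) -> list:
    """Return human-readable problems found in an S3 lifecycle configuration."""
    problems = []
    for rule in config.get("Rules", []):
        if rule.get("Status") != "Enabled":
            continue  # disabled rules never run, so skip them
        rule_id = rule.get("ID", "<no id>")
        previous_days = -1
        for transition in rule.get("Transitions", []):
            days = transition["Days"]
            storage_class = transition["StorageClass"]
            if storage_class in ARCHIVE_CLASSES and days < min_archive_days:
                problems.append(
                    f"{rule_id}: {storage_class} after {days} days "
                    f"(minimum is {min_archive_days})"
                )
            if days <= previous_days:
                problems.append(f"{rule_id}: transitions are not in increasing order")
            previous_days = days
    return problems


# The mistake described above: Glacier after 1 day instead of 30.
bad = {"Rules": [{"ID": "ArchiveOldLogs", "Status": "Enabled",
                  "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}]}]}
print(lint_lifecycle(bad))
```

Run in CI, a non-empty result would fail the pull request before the policy reaches a production bucket.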
Failure Mode 3: EBS Volume Exhaustion
| Detection | Prevention |
|---|---|
| CloudWatch VolumeQueueLength high | Right-size volume and IOPS |
| Application I/O timeouts | Monitor burst balance (gp2) |
| DiskSpaceUtilization alerts | Auto-expand with Lambda + CloudWatch |
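The sizing decision inside an auto-expand function could look like the sketch below. The 80% trigger, 20% growth step, and 16,384 GiB ceiling are assumptions chosen for illustration, and `next_volume_size` is a hypothetical helper; the actual resize call and the filesystem extension on the instance are out of scope here.

```python
def next_volume_size(current_gib: int, used_pct: float,
                     threshold_pct: float = 80.0, growth_pct: int = 20,
                     max_gib: int = 16_384) -> int:
    """Return the size to request, or the current size if no growth is needed.

    EBS volumes can only grow, so the result is never below current_gib.
    """
    if used_pct < threshold_pct:
        return current_gib
    # Integer ceiling division avoids floating-point rounding surprises.
    grown = -(-current_gib * (100 + growth_pct) // 100)
    return max(current_gib, min(max_gib, grown))


print(next_volume_size(100, used_pct=50.0))
print(next_volume_size(100, used_pct=85.0))
print(next_volume_size(16_000, used_pct=90.0))
```

Growing by a percentage rather than a fixed amount keeps the number of resize operations roughly constant as volumes get larger.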
🔐 Security Baseline
Encryption Requirements
| Storage | Encryption | Key Management |
|---|---|---|
| S3 | SSE-KMS (default) | Customer-managed CMK |
| EBS | Encryption-by-default enabled | Account-level CMK |
| EFS | Encryption at rest + transit | Managed CMK |
| Backups | AWS Backup with CMK | Vault Lock |
S3 Security Configuration
```json
{
  "PublicAccessBlockConfiguration": {
    "BlockPublicAcls": true,
    "IgnorePublicAcls": true,
    "BlockPublicPolicy": true,
    "RestrictPublicBuckets": true
  }
}
```
Access Control Matrix
| Requirement | Implementation | Verification |
|---|---|---|
| No public buckets | Account-level Block Public Access | Config Rule |
| Encryption | SSE-KMS default | Bucket policy deny unencrypted |
| Logging | S3 access logging to central bucket | CloudTrail data events |
| MFA Delete | Enabled for critical buckets | AWS CLI check |
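The "No public buckets" verification can be sketched as a check over a configuration document shaped like the S3 security configuration above. The function only inspects a dictionary; how that dictionary is fetched (CLI, SDK, Config rule) is left open, and `missing_public_access_blocks` is a hypothetical name.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(document: dict) -> list:
    """Return the Block Public Access flags that are not set to true."""
    config = document.get("PublicAccessBlockConfiguration", {})
    return [flag for flag in REQUIRED_FLAGS if config.get(flag) is not True]


partial = {"PublicAccessBlockConfiguration": {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
}}
print(missing_public_access_blocks(partial))
```

An empty list means all four settings are on; anything else names exactly what needs to be fixed.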
📊 Ops Readiness
Metrics to Monitor
| Service | Metric | Alert Threshold |
|---|---|---|
| S3 | BucketSizeBytes | > budget threshold |
| S3 | NumberOfObjects | Spike pattern |
| EBS | VolumeQueueLength | > 1 sustained |
| EBS | BurstBalance | < 20% |
| EFS | PercentIOLimit | > 80% |
| Backup | NumberOfBackupJobsFailed | > 0 |
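A threshold such as "> 1 sustained" means the metric must stay above the limit for several consecutive periods, not just spike once. The sketch below models that evaluation locally; `breaches_sustained` is a hypothetical helper and the sample values are made up for illustration.

```python
def breaches_sustained(datapoints: list, threshold: float,
                       evaluation_periods: int) -> bool:
    """True if the most recent `evaluation_periods` datapoints all exceed threshold."""
    if evaluation_periods <= 0:
        raise ValueError("evaluation_periods must be positive")
    if len(datapoints) < evaluation_periods:
        return False  # not enough data to call it sustained
    recent = datapoints[-evaluation_periods:]
    return all(value > threshold for value in recent)


# One-minute VolumeQueueLength samples (illustrative values only).
spike = [0.2, 0.3, 4.0, 0.4, 0.3, 0.2]
saturated = [0.5, 1.4, 1.8, 2.1, 1.6, 1.9]

print(breaches_sustained(spike, threshold=1, evaluation_periods=5))
print(breaches_sustained(saturated, threshold=1, evaluation_periods=5))
```

Requiring several periods trades a few minutes of detection delay for far fewer false alarms on short bursts.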
Alerting Configuration
```json
{
  "AlarmName": "EBSHighQueueLength",
  "MetricName": "VolumeQueueLength",
  "Namespace": "AWS/EBS",
  "Statistic": "Average",
  "Dimensions": [{"Name": "VolumeId", "Value": "vol-xxx"}],
  "Threshold": 1,
  "ComparisonOperator": "GreaterThanThreshold",
  "EvaluationPeriods": 5,
  "Period": 60
}
```
Runbook Entry Points
| Scenario | Runbook |
|---|---|
| S3 bucket publicly exposed | runbook/s3-public-bucket-remediation.md |
| EBS volume exhaustion | runbook/ebs-volume-expansion.md |
| Backup job failure | runbook/backup-failure-investigation.md |
| Data restoration request | runbook/data-restoration-procedure.md |
| Lifecycle policy issue | runbook/lifecycle-policy-review.md |
| Cross-region replication lag | runbook/crr-troubleshooting.md |
✅ Design Review Checklist
Storage Selection
- [ ] Storage service matches the access patterns
- [ ] Durability requirements are met
- [ ] Performance (IOPS/throughput) is sized correctly
- [ ] Cost is optimized with appropriate storage classes
Security
- [ ] Encryption at rest enabled (SSE-KMS)
- [ ] Block Public Access enabled account-wide
- [ ] No public buckets
- [ ] MFA Delete for critical data
Data Protection
- [ ] Versioning enabled for critical buckets
- [ ] Lifecycle policies configured and tested
- [ ] Cross-region replication for DR
- [ ] AWS Backup configured with retention
Operations
- [ ] Storage metrics monitored
- [ ] Cost alerts configured
- [ ] Backup job monitoring
- [ ] Runbooks documented
📎 Related links
- 📎 GCP Data Platforms - Comparison with GCS, Persistent Disk
- 📎 Key & Secrets Management - KMS integration patterns
- 📎 Reliability & DR - Storage in the DR strategy
- 📎 Cost Governance - Storage cost optimization
- 📎 Terraform State - S3 backend configuration