🏗️ Multi-Cloud Patterns
Level: Advanced
Solves: Designing and deploying infrastructure across multiple cloud providers with Terraform
🎯 Outcomes
After applying the material on this page, you will be able to:
- Evaluate multi-cloud requirements and trade-offs
- Design a directory structure for multi-cloud
- Implement suitable abstraction patterns
- Configure cross-cloud networking (VPN, Interconnect)
- Avoid common anti-patterns
- Set up unified monitoring across clouds
✅ When to use
| Pattern | Use Case | Reason |
|---|---|---|
| Best-of-breed | BigQuery + Lambda | Tận dụng strengths |
| DR cross-cloud | Regulatory | True resilience |
| M&A | Inherited infra | Business requirement |
| Data residency | Regional compliance | Legal requirement |
❌ When NOT to use
| Pattern | Problem | Alternative |
|---|---|---|
| Unjustified "avoid lock-in" | Complexity outweighs the benefit | Single cloud |
| Resume-driven | No business value | Single cloud |
| Universal abstraction | Lose features | Cloud-specific modules |
| Shared state across all clouds | Blast radius | Separate state |
⚠️ Warning from Raizo
"A team decided to go multi-cloud 'to avoid lock-in'. The result: 2x ops cost and 3x complexity. After one year, 95% of the workload was still on AWS. Go multi-cloud only when there is a clear business justification."
Why Multi-Cloud?
💡 Professor Tom
Multi-cloud is not a silver bullet. It adds complexity, cost, and operational overhead. Adopt it only with a clear business justification: a concrete vendor lock-in risk, regulatory requirements, or best-of-breed services.
Multi-Cloud Motivations
┌─────────────────────────────────────────────────────────────────┐
│                     MULTI-CLOUD MOTIVATIONS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  VALID REASONS:                                                 │
│  ✅ Regulatory requirements (data residency)                    │
│  ✅ M&A - acquired company uses different cloud                 │
│  ✅ Best-of-breed services (GCP BigQuery + AWS Lambda)          │
│  ✅ Disaster recovery across providers                          │
│  ✅ Negotiating leverage with vendors                           │
│                                                                 │
│  INVALID REASONS:                                               │
│  ❌ "Avoid vendor lock-in" without specific requirements        │
│  ❌ "Everyone is doing it"                                      │
│  ❌ Resume-driven development                                   │
│  ❌ Premature optimization                                      │
│                                                                 │
│  COSTS:                                                         │
│  • 2x+ operational complexity                                   │
│  • Team needs expertise in multiple clouds                      │
│  • Cross-cloud networking is expensive                          │
│  • Lowest common denominator abstractions                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Multi-Cloud Architecture Patterns
Pattern 1: Cloud-Specific Workloads
┌─────────────────────────────────────────────────────────────────┐
│                    CLOUD-SPECIFIC WORKLOADS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│      ┌─────────────────────┐       ┌─────────────────────┐      │
│      │         AWS         │       │         GCP         │      │
│      │                     │       │                     │      │
│      │  • Web Application  │       │  • Data Analytics   │      │
│      │  • API Gateway      │       │  • BigQuery         │      │
│      │  • Lambda Functions │       │  • ML Training      │      │
│      │  • RDS Database     │       │  • Vertex AI        │      │
│      │                     │       │                     │      │
│      └──────────┬──────────┘       └──────────┬──────────┘      │
│                 │                             │                 │
│                 └──────────────┬──────────────┘                 │
│                                │                                │
│                     ┌──────────▼──────────┐                     │
│                     │     Cross-Cloud     │                     │
│                     │    Data Transfer    │                     │
│                     │ (VPN/Interconnect)  │                     │
│                     └─────────────────────┘                     │
│                                                                 │
│  USE CASE: Best-of-breed services                               │
│  TERRAFORM: Separate state per cloud, shared modules            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Pattern 2: Active-Passive DR
┌─────────────────────────────────────────────────────────────────┐
│                        ACTIVE-PASSIVE DR                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│      ┌─────────────────────┐       ┌─────────────────────┐      │
│      │    AWS (Primary)    │       │      GCP (DR)       │      │
│      │                     │       │                     │      │
│      │  ┌───────────────┐  │       │  ┌───────────────┐  │      │
│      │  │  Application  │  │       │  │  Application  │  │      │
│      │  │   (Active)    │  │       │  │   (Standby)   │  │      │
│      │  └──────┬────────┘  │       │  └──────┬────────┘  │      │
│      │         │           │       │         │           │      │
│      │  ┌──────▼────────┐  │       │  ┌──────▼────────┐  │      │
│      │  │   Database    │──┼───────┼──│   Database    │  │      │
│      │  │   (Primary)   │  │       │  │   (Replica)   │  │      │
│      │  └───────────────┘  │       │  └───────────────┘  │      │
│      │                     │       │                     │      │
│      └─────────────────────┘       └─────────────────────┘      │
│                                                                 │
│  USE CASE: Disaster recovery, regulatory compliance             │
│  TERRAFORM: Same modules, different provider configurations     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Terraform Multi-Cloud Structure
Directory Organization
infrastructure/
├── modules/
│   ├── common/                 # Cloud-agnostic abstractions
│   │   ├── network/
│   │   └── compute/
│   ├── aws/                    # AWS-specific modules
│   │   ├── vpc/
│   │   ├── eks/
│   │   └── rds/
│   └── gcp/                    # GCP-specific modules
│       ├── vpc/
│       ├── gke/
│       └── cloudsql/
├── environments/
│   ├── aws-prod/
│   │   ├── main.tf
│   │   ├── providers.tf
│   │   └── backend.tf
│   ├── gcp-prod/
│   │   ├── main.tf
│   │   ├── providers.tf
│   │   └── backend.tf
│   └── multi-cloud-prod/       # Cross-cloud resources
│       ├── main.tf
│       ├── providers.tf
│       └── backend.tf
└── global/
    └── dns/                    # Global DNS (Route53/Cloud DNS)
Multiple Providers Configuration
hcl
# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

# AWS Provider
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = var.project
    }
  }
}

# GCP Provider
provider "google" {
  project = var.gcp_project_id
  region  = "us-central1"
}

# Provider aliases for multi-region
provider "aws" {
  alias  = "eu"
  region = "eu-west-1"
}

provider "google" {
  alias   = "eu"
  project = var.gcp_project_id
  region  = "europe-west1"
}
Abstraction Patterns
Pattern 1: Cloud-Agnostic Interface
hcl
# modules/common/compute/variables.tf
variable "name" {
  description = "Instance name"
  type        = string
}

variable "size" {
  description = "Instance size (small, medium, large)"
  type        = string

  validation {
    condition     = contains(["small", "medium", "large"], var.size)
    error_message = "Size must be small, medium, or large."
  }
}

variable "cloud_provider" {
  description = "Cloud provider (aws or gcp)"
  type        = string

  validation {
    condition     = contains(["aws", "gcp"], var.cloud_provider)
    error_message = "Cloud provider must be aws or gcp."
  }
}

# Used to build the GCP zone name in main.tf
variable "region" {
  description = "GCP region, e.g. us-central1"
  type        = string
}

# modules/common/compute/main.tf
locals {
  # Map abstract sizes to cloud-specific instance types
  aws_instance_types = {
    small  = "t3.micro"
    medium = "t3.small"
    large  = "t3.large"
  }

  gcp_machine_types = {
    small  = "e2-micro"
    medium = "e2-small"
    large  = "e2-standard-2"
  }
}

# AMI lookup, only evaluated when targeting AWS.
# The name filter is an assumption; adjust it to the image you actually use.
data "aws_ami" "amazon_linux" {
  count       = var.cloud_provider == "aws" ? 1 : 0
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

# AWS Instance
resource "aws_instance" "main" {
  count         = var.cloud_provider == "aws" ? 1 : 0
  ami           = data.aws_ami.amazon_linux[0].id
  instance_type = local.aws_instance_types[var.size]

  tags = {
    Name = var.name
  }
}

# GCP Instance
resource "google_compute_instance" "main" {
  count        = var.cloud_provider == "gcp" ? 1 : 0
  name         = var.name
  machine_type = local.gcp_machine_types[var.size]
  zone         = "${var.region}-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }
}
Pattern 2: Separate Modules, Common Interface
hcl
# modules/aws/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

# modules/gcp/vpc/outputs.tf
output "vpc_id" {
  value = google_compute_network.main.id
}

output "private_subnet_ids" {
  value = google_compute_subnetwork.private[*].id
}

output "public_subnet_ids" {
  value = google_compute_subnetwork.public[*].id
}

# Usage - same interface, different implementations
module "aws_network" {
  source     = "../../modules/aws/vpc"
  name       = "prod"
  cidr_block = "10.0.0.0/16"
}

module "gcp_network" {
  source     = "../../modules/gcp/vpc"
  name       = "prod"
  cidr_block = "10.1.0.0/16"
}
Cross-Cloud Networking
VPN Connection
hcl
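# AWS VPN Gateway
resource "aws_vpn_gateway" "main" {
  vpc_id = module.aws_vpc.vpc_id

  tags = {
    Name = "aws-to-gcp"
  }
}

# The customer gateway represents the GCP VPN endpoint (its static public IP)
resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = google_compute_address.vpn.address
  type       = "ipsec.1"

  tags = {
    Name = "gcp-gateway"
  }
}

resource "aws_vpn_connection" "to_gcp" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.gcp.id
  type                = "ipsec.1"
  static_routes_only  = true
}

# With static routing, AWS needs an explicit route to the GCP CIDR
resource "aws_vpn_connection_route" "gcp" {
  destination_cidr_block = "10.1.0.0/16"
  vpn_connection_id      = aws_vpn_connection.to_gcp.id
}

# GCP VPN (Classic VPN gateway)
resource "google_compute_vpn_gateway" "main" {
  name    = "gcp-to-aws"
  network = module.gcp_vpc.vpc_id
}

resource "google_compute_address" "vpn" {
  name = "vpn-static-ip"
}

# A Classic VPN gateway needs forwarding rules for ESP, UDP 500, and UDP 4500
resource "google_compute_forwarding_rule" "vpn" {
  for_each = {
    esp     = { protocol = "ESP", port = null }
    udp500  = { protocol = "UDP", port = "500" }
    udp4500 = { protocol = "UDP", port = "4500" }
  }

  name        = "vpn-${each.key}"
  ip_protocol = each.value.protocol
  port_range  = each.value.port
  ip_address  = google_compute_address.vpn.address
  target      = google_compute_vpn_gateway.main.id
}

resource "google_compute_vpn_tunnel" "to_aws" {
  name                    = "tunnel-to-aws"
  peer_ip                 = aws_vpn_connection.to_gcp.tunnel1_address
  shared_secret           = aws_vpn_connection.to_gcp.tunnel1_preshared_key
  target_vpn_gateway      = google_compute_vpn_gateway.main.id
  local_traffic_selector  = ["10.1.0.0/16"]
  remote_traffic_selector = ["10.0.0.0/16"]

  # The forwarding rules must exist before the tunnel is created
  depends_on = [google_compute_forwarding_rule.vpn]
}
# Not shown: a google_compute_route sending 10.0.0.0/16 through the tunnel
Cloud Interconnect (Dedicated)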
hcl
# For high-bandwidth, low-latency connections.
# Requires physical setup with both cloud providers: the two circuits do not
# connect to each other directly, they terminate on a router that you (or a
# partner) operate in a colocation facility.

# AWS Direct Connect
resource "aws_dx_connection" "main" {
  name      = "cross-cloud-interconnect"
  bandwidth = "10Gbps"
  location  = "EqDC2"
}

# GCP Dedicated Interconnect (google_compute_router.main is defined elsewhere)
resource "google_compute_interconnect_attachment" "main" {
  name         = "cross-cloud-interconnect"
  router       = google_compute_router.main.id
  type         = "DEDICATED"
  interconnect = "https://www.googleapis.com/compute/v1/projects/${var.gcp_project_id}/global/interconnects/my-interconnect"
}
Anti-Patterns to Avoid
❌ Lowest Common Denominator
hcl
# BAD: Avoiding cloud-specific features
resource "aws_instance" "web" {
  # Not using AWS-specific features like:
  # - Instance profiles
  # - Placement groups
  # - Enhanced networking
  # Just to maintain "portability"
}

# GOOD: Use cloud-specific features when beneficial
# (ami, instance_type, and other required arguments omitted for brevity)
resource "aws_instance" "web" {
  iam_instance_profile = aws_iam_instance_profile.web.name

  metadata_options {
    http_tokens = "required" # IMDSv2
  }

  root_block_device {
    encrypted = true
  }
}
❌ Single Abstraction Layer
hcl
# BAD: Trying to abstract everything
module "compute" {
  source = "./modules/universal-compute"
  cloud  = "aws" # or "gcp" or "azure"
  size   = "medium"
  # Lost: 90% of cloud-specific capabilities
}

# GOOD: Cloud-specific modules with common patterns
module "aws_compute" {
  source = "./modules/aws/compute"
  # Full AWS capabilities
}

module "gcp_compute" {
  source = "./modules/gcp/compute"
  # Full GCP capabilities
}
❌ Shared State Across Clouds
hcl
# BAD: Single state for multi-cloud
terraform {
  backend "s3" {
    bucket = "state"
    key    = "all-clouds/terraform.tfstate" # AWS + GCP in one state
  }
}

# GOOD: Separate state per cloud
# aws/backend.tf
terraform {
  backend "s3" {
    bucket = "aws-state"
    key    = "prod/terraform.tfstate"
  }
}

# gcp/backend.tf
terraform {
  backend "gcs" {
    bucket = "gcp-state"
    prefix = "prod"
  }
}
Best Practices
1. Start Single-Cloud
Phase 1: Master one cloud
Phase 2: Add second cloud for specific use case
Phase 3: Optimize cross-cloud operations
2. Separate State per Cloud
hcl
# Each cloud has its own state
# Cross-cloud resources use remote state data sources
data "terraform_remote_state" "aws_vpc" {
  backend = "s3"

  config = {
    bucket = "aws-state"
    key    = "prod/vpc/terraform.tfstate"
  }
}

data "terraform_remote_state" "gcp_vpc" {
  backend = "gcs"

  config = {
    bucket = "gcp-state"
    prefix = "prod/vpc"
  }
}
3. Common Tagging Strategy
hcl
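locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project
    ManagedBy   = "terraform"
    CostCenter  = var.cost_center
  }

  # GCP label keys and values must be lowercase, so normalize the shared tags.
  # Values containing other disallowed characters would need further cleanup.
  gcp_labels = { for k, v in local.common_tags : lower(k) => lower(v) }
}

# AWS
resource "aws_instance" "web" {
  tags = local.common_tags
}

# GCP
resource "google_compute_instance" "web" {
  labels = local.gcp_labels
}
4. Unified Monitoring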
hcl
# Send metrics to single observability platform
# (Datadog, Grafana Cloud, etc.)

# AWS CloudWatch to Datadog
resource "aws_cloudwatch_metric_stream" "datadog" {
  name          = "datadog-stream"
  role_arn      = aws_iam_role.metric_stream.arn
  firehose_arn  = aws_kinesis_firehose_delivery_stream.datadog.arn
  output_format = "opentelemetry0.7"
}

# GCP to Datadog: a notification channel forwards alert notifications
# (not raw metrics) to the webhook
resource "google_monitoring_notification_channel" "datadog" {
  display_name = "datadog-webhook"
  type         = "webhook_tokenauth"

  labels = {
    url = "https://app.datadoghq.com/intake/webhook/..."
  }
}
Best Practices Checklist
- [ ] Business justification documented
- [ ] Separate state per cloud
- [ ] Cloud-specific modules (not LCD)
- [ ] Common tagging strategy
- [ ] Unified monitoring
- [ ] Cross-cloud networking secure
- [ ] Team trained on both clouds
- [ ] Runbooks per cloud
⚖️ Trade-offs
Trade-off 1: Multi-Cloud vs Single-Cloud
| Aspect | Single Cloud | Multi-Cloud |
|---|---|---|
| Complexity | Low | High |
| Cost | Optimized | Higher (egress) |
| Team skills | Focused | Broad |
| Vendor lock-in | High | Low |
| Best-of-breed | Limited | Possible |
Recommendation: Stay on a single cloud unless there is a compelling business reason.
Trade-off 2: Abstraction Level
| Level | Portability | Feature Access |
|---|---|---|
| Universal abstraction | High | Low (LCD) |
| Common interface | Medium | Medium |
| Cloud-specific | Low | High |
Trade-off 3: Cross-Cloud Networking
| Method | Cost | Latency | Complexity |
|---|---|---|---|
| VPN | Low | Medium | Low |
| Interconnect | High | Low | High |
| Public (encrypted) | Low | Variable | Low |
🚨 Failure Modes
Failure Mode 1: Cross-Cloud Networking Failure
🔥 Real-world incident
The VPN tunnel between AWS and GCP flapped and data sync was interrupted. Debugging took 6 hours because the team had no visibility into both sides.
| How to detect | How to prevent |
|---|---|
| Replication lag | Redundant tunnels |
| App errors | Cross-cloud monitoring |
| Alerts from both sides | Unified alerting |
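The "Redundant tunnels" row can be sketched in Terraform. An AWS Site-to-Site VPN connection provisions two tunnel endpoints, while the VPN Connection example earlier on this page only wires up the first. The sketch below assumes the `aws_vpn_connection.to_gcp`, `google_compute_vpn_gateway.main`, and `module.gcp_vpc` objects from that example already exist, along with the forwarding rules a Classic VPN gateway needs. The names `to_aws_secondary`, `tunnel-to-aws-2`, and `route-to-aws-2` and the route priority are hypothetical.

```hcl
# Second GCP tunnel, pointed at the other AWS tunnel endpoint
resource "google_compute_vpn_tunnel" "to_aws_secondary" {
  name                    = "tunnel-to-aws-2"
  peer_ip                 = aws_vpn_connection.to_gcp.tunnel2_address
  shared_secret           = aws_vpn_connection.to_gcp.tunnel2_preshared_key
  target_vpn_gateway      = google_compute_vpn_gateway.main.id
  local_traffic_selector  = ["10.1.0.0/16"]
  remote_traffic_selector = ["10.0.0.0/16"]
}

# Backup route to the AWS CIDR through the second tunnel.
# A higher priority number means lower preference (assumed value).
resource "google_compute_route" "to_aws_secondary" {
  name                = "route-to-aws-2"
  network             = module.gcp_vpc.vpc_id
  dest_range          = "10.0.0.0/16"
  priority            = 1100
  next_hop_vpn_tunnel = google_compute_vpn_tunnel.to_aws_secondary.id
}
```

Both tunnels still share one GCP gateway and one static IP, so this likely covers an AWS endpoint failure or maintenance window rather than a failure on the GCP side.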
Failure Mode 2: LCD Abstraction Limits
| How to detect | How to prevent |
|---|---|
| Missing features | Cloud-specific modules |
| Performance issues | Don't abstract everything |
| Security gaps | Use native features |
Failure Mode 3: State Confusion
| How to detect | How to prevent |
|---|---|
| Wrong resources modified | Separate state |
| Cross-cloud dependencies | Remote state data |
| Apply confusion | Clear directory structure |
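As a rough illustration of "Separate state" combined with "Remote state data", a cross-cloud environment can read each cloud's VPC ID from that cloud's own state instead of managing the VPCs itself. This is a minimal sketch: it assumes the per-cloud VPC configurations expose a root-level `vpc_id` output, that the state bucket lives in `us-east-1`, and that the file path in the first comment is how the environment is laid out.

```hcl
# environments/multi-cloud-prod/main.tf (hypothetical placement)

data "terraform_remote_state" "aws_vpc" {
  backend = "s3"

  config = {
    bucket = "aws-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1" # assumed region of the state bucket
  }
}

data "terraform_remote_state" "gcp_vpc" {
  backend = "gcs"

  config = {
    bucket = "gcp-state"
    prefix = "prod/vpc"
  }
}

# Cross-cloud resources only read the VPC IDs; they never modify the VPCs
resource "aws_vpn_gateway" "main" {
  vpc_id = data.terraform_remote_state.aws_vpc.outputs.vpc_id
}

resource "google_compute_vpn_gateway" "main" {
  name    = "gcp-to-aws"
  network = data.terraform_remote_state.gcp_vpc.outputs.vpc_id
}
```

With this layout, an apply in the cross-cloud environment cannot change resources tracked in either cloud's own state, which is the point of keeping the states separate.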
🔐 Security Baseline
Multi-Cloud Security Requirements
| Requirement | Implementation | Verification |
|---|---|---|
| Separate state per cloud | Different backends | Config review |
| Cross-cloud encrypted | VPN/TLS | Network audit |
| OIDC per cloud | GitHub Actions | Auth config |
| Unified audit | Central logging | Log review |
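The "OIDC per cloud" row could look roughly like the following when GitHub Actions runs Terraform. This is a sketch under assumptions, not a complete setup: the variable `github_repo`, the role name, and the pool and provider IDs are hypothetical, and the permissions granted to each identity are left out.

```hcl
variable "github_repo" {
  description = "Repository allowed to deploy, in owner/name form (hypothetical input)"
  type        = string
}

# AWS: trust GitHub's OIDC issuer.
# Older AWS provider releases also require thumbprint_list on this resource.
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows from the named repository may assume the role
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:${var.github_repo}:*"]
    }
  }
}

resource "aws_iam_role" "terraform_aws" {
  name               = "github-terraform-aws"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}

# GCP: workload identity federation for the same issuer
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-actions"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github"

  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Reject tokens issued for any other repository
  attribute_condition = "assertion.repository == \"${var.github_repo}\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}
```

Keeping one identity per cloud means a compromised workflow credential for one provider does not automatically grant access to the other.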
Cross-Cloud Security Checklist
| Item | Status |
|---|---|
| Separate state buckets | ☑ Required |
| VPN/Interconnect encrypted | ☑ Required |
| OIDC for both clouds | ☑ Required |
| Unified monitoring | ☑ Required |
| Cross-cloud IAM audit | ☑ Required |
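For the "Separate state buckets" item, a hedged sketch of one hardened bucket per cloud might look like this. The bucket names reuse the placeholders from the backend examples on this page (real bucket names must be globally unique), and the `US` location and KMS default encryption are assumptions.

```hcl
# AWS state bucket: versioned, encrypted, never public
resource "aws_s3_bucket" "tf_state" {
  bucket = "aws-state"
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# GCP state bucket: versioned, uniform access, never public
resource "google_storage_bucket" "tf_state" {
  name                        = "gcp-state"
  location                    = "US" # assumed location
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  versioning {
    enabled = true
  }
}
```

Versioning on both sides gives a way to roll back a corrupted state file without touching the other cloud's state.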
📊 Ops Readiness
Metrics to Monitor
| Metric | Source | Alert Threshold |
|---|---|---|
| Cross-cloud latency | VPN/Interconnect | > 50ms |
| Data sync lag | Application | > 5 min |
| VPN tunnel status | Both clouds | Any down |
| Egress cost | Billing | > budget |
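The "VPN tunnel status: any down" threshold can be approximated on the AWS side with a CloudWatch alarm on the `TunnelState` metric. The sketch assumes the `aws_vpn_connection.to_gcp` resource from the VPN example on this page; the variable `alert_topic_arn`, the alarm name, and the five-minute period are hypothetical choices. An equivalent alert is still needed on the GCP side.

```hcl
variable "alert_topic_arn" {
  description = "SNS topic that receives the alarm (hypothetical input)"
  type        = string
}

# Fires when any tunnel of the connection reports down within a period
resource "aws_cloudwatch_metric_alarm" "vpn_tunnel_down" {
  alarm_name          = "cross-cloud-vpn-tunnel-down"
  namespace           = "AWS/VPN"
  metric_name         = "TunnelState"
  statistic           = "Minimum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "LessThanThreshold"
  threshold           = 1

  # Treat missing datapoints as a failure rather than silently passing
  treat_missing_data = "breaching"

  dimensions = {
    VpnId = aws_vpn_connection.to_gcp.id
  }

  alarm_actions = [var.alert_topic_arn]
}
```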
Runbook Entry Points
| Situation | Runbook |
|---|---|
| VPN tunnel down | runbook/cross-cloud-vpn.md |
| Data sync failed | runbook/cross-cloud-sync.md |
| Provider outage | runbook/cloud-failover.md |
| Egress spike | runbook/egress-investigation.md |
✅ Design Review Checklist
Business
- [ ] Multi-cloud justified
- [ ] Cost analysis done
- [ ] Team capacity assessed
- [ ] Single cloud considered
Architecture
- [ ] Separate state per cloud
- [ ] Cloud-specific modules
- [ ] Cross-cloud networking designed
- [ ] Failover tested
Operations
- [ ] Unified monitoring
- [ ] Alerting both clouds
- [ ] Runbooks per cloud
- [ ] Team cross-trained
Security
- [ ] OIDC per cloud
- [ ] Cross-cloud encrypted
- [ ] Audit logging unified
- [ ] Access control per cloud
📎 Related Links
- 📎 Module Design - Reusable module patterns
- 📎 Environments Strategy - Multi-env management
- 📎 AWS Networking - AWS VPC patterns
- 📎 GCP Networking - GCP VPC patterns
- 📎 Terraform Security - Secure multi-cloud
- 📎 State Management - State per cloud