🏗️ Multi-Cloud Patterns

Level: Advanced
Solves: Designing and deploying infrastructure across multiple cloud providers with Terraform

🎯 Outcomes

After applying the material on this page, you will be able to:

  • Evaluate multi-cloud requirements and trade-offs
  • Design a directory structure for multi-cloud
  • Implement appropriate abstraction patterns
  • Configure cross-cloud networking (VPN, Interconnect)
  • Avoid common anti-patterns
  • Set up unified monitoring across clouds

When to use

| Pattern | Use Case | Reason |
|---|---|---|
| Best-of-breed | BigQuery + Lambda | Leverage each cloud's strengths |
| DR cross-cloud | Regulatory | True resilience |
| M&A | Inherited infra | Business requirement |
| Data residency | Regional compliance | Legal requirement |

When NOT to use

| Pattern | Problem | Alternative |
|---|---|---|
| Unjustified "avoid lock-in" | Complexity not worth it | Single cloud |
| Resume-driven | No business value | Single cloud |
| Universal abstraction | Lose features | Cloud-specific modules |
| Shared state all clouds | Blast radius | Separate state |

⚠️ Warning from Raizo

"A team decided to go multi-cloud 'to avoid lock-in'. The result: 2x ops cost, 3x complexity. After one year, 95% of the workload was still on AWS. Go multi-cloud only when there is a clear business justification."

Why Multi-Cloud?

💡 Professor Tom

Multi-cloud is not a silver bullet. It adds complexity, cost, and operational overhead. Adopt it only when there is a clear business justification: a specific, documented vendor lock-in risk, regulatory requirements, or best-of-breed services.

Multi-Cloud Motivations

┌─────────────────────────────────────────────────────────────────┐
│                    MULTI-CLOUD MOTIVATIONS                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  VALID REASONS:                                                 │
│  ✅ Regulatory requirements (data residency)                    │
│  ✅ M&A - acquired company uses different cloud                 │
│  ✅ Best-of-breed services (GCP BigQuery + AWS Lambda)          │
│  ✅ Disaster recovery across providers                          │
│  ✅ Negotiating leverage with vendors                           │
│                                                                 │
│  INVALID REASONS:                                               │
│  ❌ "Avoid vendor lock-in" without specific requirements        │
│  ❌ "Everyone is doing it"                                      │
│  ❌ Resume-driven development                                   │
│  ❌ Premature optimization                                      │
│                                                                 │
│  COSTS:                                                         │
│  • 2x+ operational complexity                                   │
│  • Team needs expertise in multiple clouds                      │
│  • Cross-cloud networking is expensive                          │
│  • Lowest common denominator abstractions                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Multi-Cloud Architecture Patterns

Pattern 1: Cloud-Specific Workloads

┌─────────────────────────────────────────────────────────────────┐
│                    CLOUD-SPECIFIC WORKLOADS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐    ┌─────────────────────┐             │
│  │        AWS          │    │        GCP          │             │
│  │                     │    │                     │             │
│  │  • Web Application  │    │  • Data Analytics   │             │
│  │  • API Gateway      │    │  • BigQuery         │             │
│  │  • Lambda Functions │    │  • ML Training      │             │
│  │  • RDS Database     │    │  • Vertex AI        │             │
│  │                     │    │                     │             │
│  └──────────┬──────────┘    └──────────┬──────────┘             │
│             │                          │                        │
│             └──────────┬───────────────┘                        │
│                        │                                        │
│             ┌──────────▼──────────┐                             │
│             │   Cross-Cloud       │                             │
│             │   Data Transfer     │                             │
│             │   (VPN/Interconnect)│                             │
│             └─────────────────────┘                             │
│                                                                 │
│  USE CASE: Best-of-breed services                               │
│  TERRAFORM: Separate state per cloud, shared modules            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Pattern 2: Active-Passive DR

┌─────────────────────────────────────────────────────────────────┐
│                    ACTIVE-PASSIVE DR                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐    ┌─────────────────────┐             │
│  │   AWS (Primary)     │    │   GCP (DR)          │             │
│  │                     │    │                     │             │
│  │  ┌───────────────┐  │    │  ┌───────────────┐  │             │
│  │  │  Application  │  │    │  │  Application  │  │             │
│  │  │  (Active)     │  │    │  │  (Standby)    │  │             │
│  │  └───────────────┘  │    │  └───────────────┘  │             │
│  │         │           │    │         │           │             │
│  │  ┌──────▼────────┐  │    │  ┌──────▼────────┐  │             │
│  │  │   Database    │──┼────┼──│   Database    │  │             │
│  │  │   (Primary)   │  │    │  │   (Replica)   │  │             │
│  │  └───────────────┘  │    │  └───────────────┘  │             │
│  │                     │    │                     │             │
│  └─────────────────────┘    └─────────────────────┘             │
│                                                                 │
│  USE CASE: Disaster recovery, regulatory compliance             │
│  TERRAFORM: Same module interface, per-cloud implementations    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Terraform Multi-Cloud Structure

Directory Organization

infrastructure/
├── modules/
│   ├── common/                 # Cloud-agnostic abstractions
│   │   ├── network/
│   │   └── compute/
│   ├── aws/                    # AWS-specific modules
│   │   ├── vpc/
│   │   ├── eks/
│   │   └── rds/
│   └── gcp/                    # GCP-specific modules
│       ├── vpc/
│       ├── gke/
│       └── cloudsql/
├── environments/
│   ├── aws-prod/
│   │   ├── main.tf
│   │   ├── providers.tf
│   │   └── backend.tf
│   ├── gcp-prod/
│   │   ├── main.tf
│   │   ├── providers.tf
│   │   └── backend.tf
│   └── multi-cloud-prod/       # Cross-cloud resources
│       ├── main.tf
│       ├── providers.tf
│       └── backend.tf
└── global/
    └── dns/                    # Global DNS (Route53/Cloud DNS)

Multiple Providers Configuration

hcl
# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

# AWS Provider
provider "aws" {
  region = "us-east-1"
  
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = var.project
    }
  }
}

# GCP Provider
provider "google" {
  project = var.gcp_project_id
  region  = "us-central1"
}

# Provider aliases for multi-region
provider "aws" {
  alias  = "eu"
  region = "eu-west-1"
}

provider "google" {
  alias   = "eu"
  project = var.gcp_project_id
  region  = "europe-west1"
}
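
An aliased provider only takes effect when a module or resource selects it explicitly. The following is a minimal sketch, assuming the `modules/aws/vpc` and `modules/gcp/vpc` modules from the directory layout accept `name` and `cidr_block` inputs; the module instance names and CIDR ranges are hypothetical.

```hcl
# Deploy the AWS VPC module into eu-west-1 through the aliased provider.
module "aws_network_eu" {
  source = "../../modules/aws/vpc"

  # Map the module's default "aws" provider to the "eu" alias.
  providers = {
    aws = aws.eu
  }

  name       = "prod-eu"     # hypothetical name
  cidr_block = "10.2.0.0/16" # hypothetical range
}

# Same idea for GCP: the module's "google" provider becomes google.eu.
module "gcp_network_eu" {
  source = "../../modules/gcp/vpc"

  providers = {
    google = google.eu
  }

  name       = "prod-eu"     # hypothetical name
  cidr_block = "10.3.0.0/16" # hypothetical range
}
```

Modules that omit the `providers` argument keep using the default (unaliased) provider configurations.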

Abstraction Patterns

Pattern 1: Cloud-Agnostic Interface

hcl
# modules/common/compute/variables.tf
variable "name" {
  description = "Instance name"
  type        = string
}

variable "size" {
  description = "Instance size (small, medium, large)"
  type        = string
  
  validation {
    condition     = contains(["small", "medium", "large"], var.size)
    error_message = "Size must be small, medium, or large."
  }
}

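variable "cloud_provider" {
  description = "Cloud provider (aws or gcp)"
  type        = string

  validation {
    condition     = contains(["aws", "gcp"], var.cloud_provider)
    error_message = "cloud_provider must be aws or gcp."
  }
}

# Referenced below to derive the GCP zone
variable "region" {
  description = "Region used to derive the GCP zone"
  type        = string
}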

# modules/common/compute/main.tf
locals {
  # Map abstract sizes to cloud-specific instance types
  aws_instance_types = {
    small  = "t3.micro"
    medium = "t3.small"
    large  = "t3.large"
  }
  
  gcp_machine_types = {
    small  = "e2-micro"
    medium = "e2-small"
    large  = "e2-standard-2"
  }
}

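# AMI lookup used by the AWS instance below
# (the name filter is an assumption; adjust it to the image you actually use)
data "aws_ami" "amazon_linux" {
  count = var.cloud_provider == "aws" ? 1 : 0

  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

# AWS Instance
resource "aws_instance" "main" {
  count = var.cloud_provider == "aws" ? 1 : 0
  
  ami           = data.aws_ami.amazon_linux[0].id
  instance_type = local.aws_instance_types[var.size]
  
  tags = {
    Name = var.name
  }
}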

# GCP Instance
resource "google_compute_instance" "main" {
  count = var.cloud_provider == "gcp" ? 1 : 0
  
  name         = var.name
  machine_type = local.gcp_machine_types[var.size]
  zone         = "${var.region}-a"
  
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  
  network_interface {
    network = "default"
  }
}

Pattern 2: Separate Modules, Common Interface

hcl
# modules/aws/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

# modules/gcp/vpc/outputs.tf
output "vpc_id" {
  value = google_compute_network.main.id
}

output "private_subnet_ids" {
  value = google_compute_subnetwork.private[*].id
}

output "public_subnet_ids" {
  value = google_compute_subnetwork.public[*].id
}

# Usage - same interface, different implementations
module "aws_network" {
  source = "../../modules/aws/vpc"
  
  name       = "prod"
  cidr_block = "10.0.0.0/16"
}

module "gcp_network" {
  source = "../../modules/gcp/vpc"
  
  name       = "prod"
  cidr_block = "10.1.0.0/16"
}
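
Because both modules expose the same output names, calling code can treat them uniformly. A minimal sketch of that idea, assuming the two module calls above; the local and output names are hypothetical.

```hcl
# Collect the common outputs of both network modules under one structure.
locals {
  networks = {
    aws = {
      vpc_id             = module.aws_network.vpc_id
      private_subnet_ids = module.aws_network.private_subnet_ids
    }
    gcp = {
      vpc_id             = module.gcp_network.vpc_id
      private_subnet_ids = module.gcp_network.private_subnet_ids
    }
  }
}

# Expose private subnets keyed by cloud, regardless of the implementation.
output "private_subnet_ids_by_cloud" {
  description = "Private subnet IDs per cloud"
  value       = { for cloud, net in local.networks : cloud => net.private_subnet_ids }
}
```

The interface is shared, but the values are not interchangeable: an AWS subnet ID still only makes sense to AWS resources.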

Cross-Cloud Networking

VPN Connection

hcl
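# AWS side: virtual private gateway attached to the VPC
resource "aws_vpn_gateway" "main" {
  vpc_id = module.aws_network.vpc_id

  tags = {
    Name = "aws-to-gcp"
  }
}

# Customer gateway: the static public IP of the GCP VPN endpoint
resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = google_compute_address.vpn.address
  type       = "ipsec.1"

  tags = {
    Name = "gcp-gateway"
  }
}

resource "aws_vpn_connection" "to_gcp" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.gcp.id
  type                = "ipsec.1"
  static_routes_only  = true
}

# Static routing: tell AWS which CIDR sits behind the tunnel
resource "aws_vpn_connection_route" "to_gcp" {
  destination_cidr_block = "10.1.0.0/16"
  vpn_connection_id      = aws_vpn_connection.to_gcp.id
}

# GCP side: Classic VPN gateway and its static IP
resource "google_compute_vpn_gateway" "main" {
  name    = "gcp-to-aws"
  network = module.gcp_network.vpc_id
}

resource "google_compute_address" "vpn" {
  name = "vpn-static-ip"
}

# Classic VPN needs forwarding rules for ESP, UDP 500, and UDP 4500
resource "google_compute_forwarding_rule" "esp" {
  name        = "vpn-esp"
  ip_protocol = "ESP"
  ip_address  = google_compute_address.vpn.address
  target      = google_compute_vpn_gateway.main.id
}

resource "google_compute_forwarding_rule" "udp500" {
  name        = "vpn-udp500"
  ip_protocol = "UDP"
  port_range  = "500"
  ip_address  = google_compute_address.vpn.address
  target      = google_compute_vpn_gateway.main.id
}

resource "google_compute_forwarding_rule" "udp4500" {
  name        = "vpn-udp4500"
  ip_protocol = "UDP"
  port_range  = "4500"
  ip_address  = google_compute_address.vpn.address
  target      = google_compute_vpn_gateway.main.id
}

# Only tunnel 1 is wired up here; AWS provisions two tunnels per connection
resource "google_compute_vpn_tunnel" "to_aws" {
  name          = "tunnel-to-aws"
  peer_ip       = aws_vpn_connection.to_gcp.tunnel1_address
  shared_secret = aws_vpn_connection.to_gcp.tunnel1_preshared_key

  target_vpn_gateway = google_compute_vpn_gateway.main.id

  local_traffic_selector  = ["10.1.0.0/16"]
  remote_traffic_selector = ["10.0.0.0/16"]

  depends_on = [
    google_compute_forwarding_rule.esp,
    google_compute_forwarding_rule.udp500,
    google_compute_forwarding_rule.udp4500,
  ]
}

# Send traffic for the AWS CIDR through the tunnel
resource "google_compute_route" "to_aws" {
  name                = "route-to-aws"
  network             = module.gcp_network.vpc_id
  dest_range          = "10.0.0.0/16"
  next_hop_vpn_tunnel = google_compute_vpn_tunnel.to_aws.id
}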

Cloud Interconnect (Dedicated)

hcl
# For high-bandwidth, low-latency connections.
# Both links are physical: each one terminates in a colocation facility, and
# traffic between the clouds is routed by your own (or a partner's) equipment there.

# AWS Direct Connect
resource "aws_dx_connection" "main" {
  name      = "cross-cloud-interconnect"
  bandwidth = "10Gbps"
  location  = "EqDC2"
}

# GCP Dedicated Interconnect
resource "google_compute_interconnect_attachment" "main" {
  name         = "cross-cloud-interconnect"
  router       = google_compute_router.main.id
  type         = "DEDICATED"
  interconnect = "https://www.googleapis.com/compute/v1/projects/${var.gcp_project_id}/global/interconnects/my-interconnect"
}

Anti-Patterns to Avoid

Lowest Common Denominator

hcl
# BAD: Avoiding cloud-specific features
resource "aws_instance" "web" {
  # Not using AWS-specific features like:
  # - Instance profiles
  # - Placement groups
  # - Enhanced networking
  # Just to maintain "portability"
}

# GOOD: Use cloud-specific features when beneficial
resource "aws_instance" "web" {
  ami           = var.ami_id # hypothetical variable
  instance_type = "t3.small"

  iam_instance_profile = aws_iam_instance_profile.web.name
  
  metadata_options {
    http_tokens = "required"  # IMDSv2
  }
  
  root_block_device {
    encrypted = true
  }
}

Single Abstraction Layer

hcl
# BAD: Trying to abstract everything
module "compute" {
  source = "./modules/universal-compute"
  
  cloud    = "aws"  # or "gcp" or "azure"
  size     = "medium"
  # Lost: 90% of cloud-specific capabilities
}

# GOOD: Cloud-specific modules with common patterns
module "aws_compute" {
  source = "./modules/aws/compute"
  # Full AWS capabilities
}

module "gcp_compute" {
  source = "./modules/gcp/compute"
  # Full GCP capabilities
}

Shared State Across Clouds

hcl
# BAD: Single state for multi-cloud
terraform {
  backend "s3" {
    bucket = "state"
    key    = "all-clouds/terraform.tfstate"  # AWS + GCP in one state
    region = "us-east-1"                     # region of the state bucket
  }
}

# GOOD: Separate state per cloud
# aws/backend.tf
terraform {
  backend "s3" {
    bucket = "aws-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1" # region of the state bucket
  }
}

# gcp/backend.tf
terraform {
  backend "gcs" {
    bucket = "gcp-state"
    prefix = "prod"
  }
}

Best Practices

1. Start Single-Cloud

Phase 1: Master one cloud
Phase 2: Add second cloud for specific use case
Phase 3: Optimize cross-cloud operations

2. Separate State per Cloud

hcl
# Each cloud has its own state
# Cross-cloud resources use remote state data sources

data "terraform_remote_state" "aws_vpc" {
  backend = "s3"
  config = {
    bucket = "aws-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1" # region of the state bucket
  }
}

data "terraform_remote_state" "gcp_vpc" {
  backend = "gcs"
  config = {
    bucket = "gcp-state"
    prefix = "prod/vpc"
  }
}
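
The cross-cloud stack then reads values from those states instead of owning the networks itself. A minimal sketch, assuming each per-cloud VPC state exports a root-level output named `vpc_id` (an assumption; only root module outputs are visible through `terraform_remote_state`).

```hcl
# Read network IDs published by the per-cloud states.
locals {
  aws_vpc_id = data.terraform_remote_state.aws_vpc.outputs.vpc_id
  gcp_vpc_id = data.terraform_remote_state.gcp_vpc.outputs.vpc_id
}

# Cross-cloud resources reference the IDs but never manage the VPCs themselves.
resource "aws_vpn_gateway" "main" {
  vpc_id = local.aws_vpc_id
}

resource "google_compute_vpn_gateway" "main" {
  name    = "gcp-to-aws"
  network = local.gcp_vpc_id
}
```

This keeps the blast radius small: a mistake in the cross-cloud stack cannot destroy either VPC, because neither lives in its state.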

3. Common Tagging Strategy

hcl
locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project
    ManagedBy   = "terraform"
    CostCenter  = var.cost_center
  }
}

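# AWS
resource "aws_instance" "web" {
  # ... other arguments omitted
  tags = local.common_tags
}

# GCP: label keys and values must be lowercase, so normalize the shared map
resource "google_compute_instance" "web" {
  # ... other arguments omitted
  labels = { for k, v in local.common_tags : lower(k) => lower(v) }
}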

4. Unified Monitoring

hcl
# Send metrics to single observability platform
# (Datadog, Grafana Cloud, etc.)

# AWS CloudWatch to Datadog
resource "aws_cloudwatch_metric_stream" "datadog" {
  name          = "datadog-stream"
  role_arn      = aws_iam_role.metric_stream.arn
  firehose_arn  = aws_kinesis_firehose_delivery_stream.datadog.arn
  output_format = "opentelemetry0.7"
}

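# GCP alert notifications to Datadog via webhook
# (metrics themselves are typically collected by Datadog's GCP integration)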
resource "google_monitoring_notification_channel" "datadog" {
  type = "webhook_tokenauth"
  
  labels = {
    url = "https://app.datadoghq.com/intake/webhook/..."
  }
}

Best Practices Checklist

  • [ ] Business justification documented
  • [ ] Separate state per cloud
  • [ ] Cloud-specific modules (not LCD)
  • [ ] Common tagging strategy
  • [ ] Unified monitoring
  • [ ] Cross-cloud networking secure
  • [ ] Team trained on both clouds
  • [ ] Runbooks per cloud

⚖️ Trade-offs

Trade-off 1: Multi-Cloud vs Single-Cloud

| Aspect | Single Cloud | Multi-Cloud |
|---|---|---|
| Complexity | Low | High |
| Cost | Optimized | Higher (egress) |
| Team skills | Focused | Broad |
| Vendor lock-in | High | Low |
| Best-of-breed | Limited | Possible |

Recommendation: stay on a single cloud unless there is a compelling business reason.


Trade-off 2: Abstraction Level

| Level | Portability | Feature Access |
|---|---|---|
| Universal abstraction | High | Low (LCD) |
| Common interface | Medium | Medium |
| Cloud-specific | Low | High |

Trade-off 3: Cross-Cloud Networking

| Method | Cost | Latency | Complexity |
|---|---|---|---|
| VPN | Low | Medium | Low |
| Interconnect | High | Low | High |
| Public (encrypted) | Low | Variable | Low |

🚨 Failure Modes

Failure Mode 1: Cross-Cloud Networking Failure

🔥 Real-world incident

The VPN tunnel between AWS and GCP flapped and data sync was interrupted. Debugging took 6 hours because the team had no visibility into both sides.

| How to detect | How to prevent |
|---|---|
| Replication lag | Redundant tunnels |
| App errors | Cross-cloud monitoring |
| Alerts from both sides | Unified alerting |
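
Redundant tunnels can be sketched by terminating a second GCP tunnel on the second endpoint that AWS provisions for every VPN connection. This assumes the `aws_vpn_connection.to_gcp` and `google_compute_vpn_gateway.main` resources from the VPN section and that the Classic VPN forwarding rules (ESP, UDP 500, UDP 4500) already exist; the resource names and route priority are hypothetical.

```hcl
# Second tunnel to the other AWS endpoint of the same VPN connection.
resource "google_compute_vpn_tunnel" "to_aws_secondary" {
  name          = "tunnel-to-aws-2"
  peer_ip       = aws_vpn_connection.to_gcp.tunnel2_address
  shared_secret = aws_vpn_connection.to_gcp.tunnel2_preshared_key

  target_vpn_gateway = google_compute_vpn_gateway.main.id

  local_traffic_selector  = ["10.1.0.0/16"]
  remote_traffic_selector = ["10.0.0.0/16"]
}

# Route for the AWS CIDR through the second tunnel.
resource "google_compute_route" "to_aws_secondary" {
  name                = "route-to-aws-2"
  network             = google_compute_vpn_gateway.main.network
  dest_range          = "10.0.0.0/16"
  next_hop_vpn_tunnel = google_compute_vpn_tunnel.to_aws_secondary.id
  priority            = 1000
}
```

Two tunnels to one AWS connection protect against a single endpoint failing or being patched; they do not protect against a regional outage on either side.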

Failure Mode 2: LCD Abstraction Limits

| How to detect | How to prevent |
|---|---|
| Missing features | Cloud-specific modules |
| Performance issues | Don't abstract everything |
| Security gaps | Use native features |

Failure Mode 3: State Confusion

| How to detect | How to prevent |
|---|---|
| Wrong resources modified | Separate state |
| Cross-cloud dependencies | Remote state data |
| Apply confusion | Clear directory structure |
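
Alongside separate state, one additional guard against modifying the wrong resources is to pin each environment's provider to the account it is meant to manage. A minimal sketch for the AWS side, where `var.aws_account_id` is a hypothetical variable:

```hcl
variable "aws_account_id" {
  description = "The only AWS account this environment may touch (hypothetical variable)"
  type        = string
}

provider "aws" {
  region = "us-east-1"

  # Plan and apply fail if the active credentials belong to any other account.
  allowed_account_ids = [var.aws_account_id]
}
```

With this in place, running `terraform apply` in `environments/aws-prod/` with credentials for a different account stops before any resource is changed.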

🔐 Security Baseline

Multi-Cloud Security Requirements

| Requirement | Implementation | Verification |
|---|---|---|
| Separate state per cloud | Different backends | Config review |
| Cross-cloud encrypted | VPN/TLS | Network audit |
| OIDC per cloud | GitHub Actions | Auth config |
| Unified audit | Central logging | Log review |
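
"OIDC per cloud" means the CI system exchanges a short-lived token for credentials in each cloud, so no long-lived keys are stored. A minimal sketch for GitHub Actions, where the variables, role name, and pool and provider IDs are all hypothetical and the permissions granted to each identity are left out:

```hcl
variable "github_repository" {
  description = "Repository allowed to assume the identities, as org/repo (hypothetical)"
  type        = string
}

variable "github_oidc_thumbprints" {
  description = "Certificate thumbprints for the GitHub OIDC issuer; take these from AWS documentation (hypothetical)"
  type        = list(string)
}

# AWS: trust GitHub's OIDC issuer and let one repository assume a role.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = var.github_oidc_thumbprints
}

data "aws_iam_policy_document" "github_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:${var.github_repository}:*"]
    }
  }
}

resource "aws_iam_role" "terraform" {
  name               = "terraform-github-actions"
  assume_role_policy = data.aws_iam_policy_document.github_assume.json
}

# GCP: workload identity pool and provider for the same issuer.
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-pool"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"

  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Only tokens issued for the named repository are accepted.
  attribute_condition = "assertion.repository == \"${var.github_repository}\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}
```

Keeping one identity per cloud, each scoped to its own state and resources, mirrors the separate-state rule on the credentials side.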

Cross-Cloud Security Checklist

| Item | Status |
|---|---|
| Separate state buckets | ☑ Required |
| VPN/Interconnect encrypted | ☑ Required |
| OIDC for both clouds | ☑ Required |
| Unified monitoring | ☑ Required |
| Cross-cloud IAM audit | ☑ Required |

📊 Ops Readiness

Metrics to Monitor

| Metric | Source | Alert Threshold |
|---|---|---|
| Cross-cloud latency | VPN/Interconnect | > 50ms |
| Data sync lag | Application | > 5 min |
| VPN tunnel status | Both clouds | Any down |
| Egress cost | Billing | > budget |
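
The "VPN tunnel status: any down" row can be sketched on the AWS side with a CloudWatch alarm on the `TunnelState` metric. This assumes the `aws_vpn_connection.to_gcp` resource from the VPN section; the alarm name and `var.alert_sns_topic_arn` are hypothetical.

```hcl
variable "alert_sns_topic_arn" {
  description = "SNS topic that receives alarm notifications (hypothetical variable)"
  type        = string
}

# Fires when at least one tunnel of the connection reports DOWN (0).
resource "aws_cloudwatch_metric_alarm" "vpn_tunnel_down" {
  alarm_name = "cross-cloud-vpn-tunnel-down"
  namespace  = "AWS/VPN"

  metric_name = "TunnelState"
  dimensions = {
    VpnId = aws_vpn_connection.to_gcp.id
  }

  # Minimum across the connection's tunnels: below 1 means a tunnel is down.
  statistic           = "Minimum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "LessThanThreshold"
  threshold           = 1

  # Missing data is treated as a failure rather than silently ignored.
  treat_missing_data = "breaching"
  alarm_actions      = [var.alert_sns_topic_arn]
}
```

An equivalent alert on the GCP side is still needed so that both ends of the tunnel are observed.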

Runbook Entry Points

| Scenario | Runbook |
|---|---|
| VPN tunnel down | runbook/cross-cloud-vpn.md |
| Data sync failed | runbook/cross-cloud-sync.md |
| Provider outage | runbook/cloud-failover.md |
| Egress spike | runbook/egress-investigation.md |

Design Review Checklist

Business

  • [ ] Multi-cloud justified
  • [ ] Cost analysis done
  • [ ] Team capacity assessed
  • [ ] Single cloud considered

Architecture

  • [ ] Separate state per cloud
  • [ ] Cloud-specific modules
  • [ ] Cross-cloud networking designed
  • [ ] Failover tested

Operations

  • [ ] Unified monitoring
  • [ ] Alerting both clouds
  • [ ] Runbooks per cloud
  • [ ] Team cross-trained

Security

  • [ ] OIDC per cloud
  • [ ] Cross-cloud encrypted
  • [ ] Audit logging unified
  • [ ] Access control per cloud

📎 Links