🌐 VPC & Networking

Level: Core
Solves: Designing network architecture for enterprise workloads with security, scalability, and hybrid connectivity

🎯 Outcomes

After applying the material on this page, you will be able to:

  • Design a VPC architecture with CIDR planning for 3-5 years of expansion
  • Implement network segmentation following defense-in-depth principles
  • Build private connectivity with VPC Endpoints and PrivateLink
  • Set up hybrid connectivity with Direct Connect and/or VPN
  • Deploy Transit Gateway for multi-VPC and multi-account architectures
  • Configure VPC Flow Logs for network visibility and troubleshooting

When to use

| Architecture | Use Case | Reason |
| --- | --- | --- |
| Multi-tier VPC | All production workloads | Isolation between public/private/data tiers |
| Transit Gateway | 3 or more VPCs | Scalable hub-spoke, route table segmentation |
| VPC Endpoints | Workloads that need AWS services | Lower latency, better security, lower NAT cost |
| Direct Connect | Stable hybrid connectivity > 100 Mbps | Consistent latency, lower data transfer costs |
| VPN | DR backup or low-bandwidth hybrid | Fast setup, built-in encryption |

When NOT to use

| Pattern | Problem | Alternative |
| --- | --- | --- |
| VPC Peering for > 10 VPCs | O(n²) connections, not transitive | Transit Gateway |
| Public subnets for databases | Exposes attack surface | Private subnets + VPC Endpoints |
| NAT Gateway for AWS services | Unnecessary cost and latency | VPC Endpoints (Gateway/Interface) |
| Single-AZ deployment | Single point of failure | Multi-AZ with subnets in each AZ |
| /16 for every VPC | CIDR exhaustion, blocks peering | Right-size CIDRs to actual needs |

⚠️ Warning from Raizo

"I have seen a team use overlapping CIDR ranges for their Dev and Prod VPCs. Six months later, when they needed VPC Peering to migrate data, they discovered the VPCs could not be peered. They had to re-IP the entire Dev environment: 3 weeks of downtime. Getting CIDR planning right at the start is critical."

VPC Design Principles

Enterprise VPC Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    ENTERPRISE VPC DESIGN                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Region: us-east-1                                              │
│  VPC CIDR: 10.0.0.0/16 (65,536 IPs)                            │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    Availability Zone A                   │    │
│  │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐        │    │
│  │  │   Public    │ │   Private   │ │  Database   │        │    │
│  │  │  Subnet     │ │   Subnet    │ │   Subnet    │        │    │
│  │  │ 10.0.1.0/24 │ │ 10.0.11.0/24│ │ 10.0.21.0/24│        │    │
│  │  │             │ │             │ │             │        │    │
│  │  │ • NAT GW    │ │ • App Tier  │ │ • RDS       │        │    │
│  │  │ • ALB       │ │ • ECS/EKS   │ │ • ElastiCache│       │    │
│  │  └─────────────┘ └─────────────┘ └─────────────┘        │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    Availability Zone B                   │    │
│  │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐        │    │
│  │  │   Public    │ │   Private   │ │  Database   │        │    │
│  │  │  Subnet     │ │   Subnet    │ │   Subnet    │        │    │
│  │  │ 10.0.2.0/24 │ │ 10.0.12.0/24│ │ 10.0.22.0/24│        │    │
│  │  └─────────────┘ └─────────────┘ └─────────────┘        │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Subnet Sizing Guidelines

| Subnet Type | Recommended Size | Rationale |
| --- | --- | --- |
| Public | /24 (256 IPs) | NAT GW, ALB, Bastion - limited resources |
| Private | /20 (4,096 IPs) | Application tier - room for scaling |
| Database | /24 (256 IPs) | Managed services - predictable count |
| Reserved | /22 per AZ | Future expansion |

⚠️ CIDR Planning

Always plan CIDR ranges with a 3-5 year horizon. A VPC's primary CIDR cannot be changed after creation (you can only add secondary CIDRs). Overlapping CIDRs block VPC peering and routing through Transit Gateway.
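The overlap check is easy to automate before any VPC is created. The following is a minimal sketch using only Python's standard `ipaddress` module; the VPC names and the allocation plan are hypothetical and exist only to illustrate the check, and `find_overlaps` / `carve_subnets` are illustrative helpers, not part of any AWS tooling.

```python
import ipaddress
from itertools import combinations, islice


def find_overlaps(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


def carve_subnets(vpc_cidr, new_prefix, count):
    """Return the first `count` subnets of size /new_prefix inside the VPC."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [str(net) for net in islice(vpc.subnets(new_prefix=new_prefix), count)]


# Hypothetical allocation plan: "dev" and "prod" collide, "shared" does not.
plan = {
    "dev": "10.0.0.0/16",
    "prod": "10.0.128.0/17",
    "shared": "10.100.0.0/16",
}
overlaps = find_overlaps(plan)
first_private = carve_subnets("10.0.0.0/16", 20, 2)
print(overlaps)
print(first_private)
```

Running a check like this in CI against the organization's CIDR allocation document is one way to catch a collision before it turns into a re-IP project.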

Network Segmentation

Security Groups vs NACLs

┌─────────────────────────────────────────────────────────────────┐
│              SECURITY GROUPS vs NACLs                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  SECURITY GROUPS (Stateful)          NACLs (Stateless)          │
│  ─────────────────────────           ──────────────────         │
│  • Instance level                    • Subnet level             │
│  • Allow rules only                  • Allow + Deny rules       │
│  • Return traffic auto-allowed       • Must allow return traffic│
│  • Evaluated as group                • Evaluated in order       │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                      SUBNET                              │    │
│  │  ┌─────────────────────────────────────────────────┐    │    │
│  │  │                    NACL                          │    │    │
│  │  │  ┌─────────────────────────────────────────┐    │    │    │
│  │  │  │           Security Group                │    │    │    │
│  │  │  │  ┌─────────────────────────────────┐    │    │    │    │
│  │  │  │  │         EC2 Instance            │    │    │    │    │
│  │  │  │  └─────────────────────────────────┘    │    │    │    │
│  │  │  └─────────────────────────────────────────┘    │    │    │
│  │  └─────────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  RECOMMENDATION: Use Security Groups as primary control,        │
│  NACLs for subnet-level deny rules (e.g., block known bad IPs)  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Security Group Best Practices

```json
{
  "SecurityGroupRules": [
    {
      "Description": "Allow HTTPS from ALB",
      "IpProtocol": "tcp",
      "FromPort": 443,
      "ToPort": 443,
      "SourceSecurityGroupId": "sg-alb-12345"
    },
    {
      "Description": "Allow health check from ALB",
      "IpProtocol": "tcp",
      "FromPort": 8080,
      "ToPort": 8080,
      "SourceSecurityGroupId": "sg-alb-12345"
    }
  ]
}
```

💡 Security Group Chaining

Reference Security Groups instead of CIDR ranges wherever possible. This creates dynamic rules that update automatically as instances change.
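As a rough sketch of what chaining looks like in code, the helper below builds ingress permissions that name a source security group rather than an IP range. The dictionaries follow the `IpPermissions` shape accepted by the EC2 AuthorizeSecurityGroupIngress API (where a group reference goes under `UserIdGroupPairs`); the helper name `sg_ingress_rule` is hypothetical, and the actual API call is deliberately left out.

```python
def sg_ingress_rule(port, source_sg_id, description, protocol="tcp"):
    """Build one ingress permission that references a source security group."""
    return {
        "IpProtocol": protocol,
        "FromPort": port,
        "ToPort": port,
        # A group reference instead of "IpRanges": no CIDR appears in the rule.
        "UserIdGroupPairs": [
            {"GroupId": source_sg_id, "Description": description}
        ],
    }


ALB_SG = "sg-alb-12345"  # placeholder ID reused from the JSON example above
app_tier_ingress = [
    sg_ingress_rule(443, ALB_SG, "Allow HTTPS from ALB"),
    sg_ingress_rule(8080, ALB_SG, "Allow health check from ALB"),
]
print(app_tier_ingress)
```

Because the rule points at the ALB's group, any node that joins or leaves that group is covered without editing the application tier's rules.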

Private Connectivity

VPC Endpoints

┌─────────────────────────────────────────────────────────────────┐
│                    VPC ENDPOINTS                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  GATEWAY ENDPOINTS (Free)           INTERFACE ENDPOINTS (Paid)  │
│  ─────────────────────────          ────────────────────────    │
│  • S3                               • Most AWS services         │
│  • DynamoDB                         • Creates ENI in subnet     │
│  • Route table entry                • Private DNS               │
│  • No data processing charge        • $0.01/hour + data         │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                        VPC                               │    │
│  │                                                          │    │
│  │  ┌──────────┐    Gateway      ┌──────────────────────┐  │    │
│  │  │ Instance │───Endpoint─────►│        S3            │  │    │
│  │  └──────────┘                 └──────────────────────┘  │    │
│  │       │                                                  │    │
│  │       │         Interface     ┌──────────────────────┐  │    │
│  │       └────────Endpoint──────►│    Secrets Manager   │  │    │
│  │                  (ENI)        └──────────────────────┘  │    │
│  │                                                          │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  Traffic stays within AWS network - no Internet Gateway needed  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Hybrid Connectivity

Direct Connect vs VPN

| Aspect | Direct Connect | Site-to-Site VPN |
| --- | --- | --- |
| Bandwidth | 1-100 Gbps | Up to 1.25 Gbps |
| Latency | Consistent, low | Variable |
| Setup Time | Weeks-months | Minutes |
| Cost | Higher (port + data) | Lower |
| Encryption | Optional (MACsec) | Always (IPsec) |
| Redundancy | Requires 2nd connection | Built-in |

Transit Gateway Architecture

┌─────────────────────────────────────────────────────────────────┐
│                 TRANSIT GATEWAY HUB-SPOKE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                    ┌─────────────────┐                          │
│                    │ Transit Gateway │                          │
│                    └────────┬────────┘                          │
│                             │                                   │
│     ┌───────────────────────┼───────────────────────┐           │
│     │           │           │           │           │           │
│     ▼           ▼           ▼           ▼           ▼           │
│ ┌───────┐  ┌───────┐  ┌───────┐  ┌───────┐  ┌───────────┐      │
│ │VPC Dev│  │VPC Stg│  │VPC Prd│  │VPC Sec│  │On-Premises│      │
│ └───────┘  └───────┘  └───────┘  └───────┘  └───────────┘      │
│                                                                 │
│  Route Tables:                                                  │
│  • Dev/Stg → can reach Shared Services                          │
│  • Prod → isolated, only specific routes                        │
│  • Security → can reach all for monitoring                      │
│  • On-Prem → controlled access via route tables                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Transit Gateway Route Tables

┌─────────────────────────────────────────────────────────────────┐
│              TGW ROUTE TABLE SEGMENTATION                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Production Route Table:                                        │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Destination      │ Target          │ Notes              │    │
│  ├─────────────────────────────────────────────────────────┤    │
│  │ 10.1.0.0/16      │ VPC-Prod        │ Production VPC     │    │
│  │ 10.100.0.0/16    │ VPC-Shared      │ Shared Services    │    │
│  │ 192.168.0.0/16   │ VPN-OnPrem      │ On-premises        │    │
│  │ 0.0.0.0/0        │ Blackhole       │ Block internet     │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  Development Route Table:                                       │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Destination      │ Target          │ Notes              │    │
│  ├─────────────────────────────────────────────────────────┤    │
│  │ 10.2.0.0/16      │ VPC-Dev         │ Development VPC    │    │
│  │ 10.100.0.0/16    │ VPC-Shared      │ Shared Services    │    │
│  │ 0.0.0.0/0        │ NAT-Gateway     │ Internet access    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
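To reason about which target a packet would hit, the two route tables above can be modeled with a longest-prefix-match lookup, which is how Transit Gateway picks among matching routes. This is a small simulation using the standard `ipaddress` module, not a call to any AWS API; the table contents are copied from the diagram and `lookup` is a hypothetical helper.

```python
import ipaddress

PROD_ROUTES = [
    ("10.1.0.0/16", "VPC-Prod"),
    ("10.100.0.0/16", "VPC-Shared"),
    ("192.168.0.0/16", "VPN-OnPrem"),
    ("0.0.0.0/0", "Blackhole"),
]
DEV_ROUTES = [
    ("10.2.0.0/16", "VPC-Dev"),
    ("10.100.0.0/16", "VPC-Shared"),
    ("0.0.0.0/0", "NAT-Gateway"),
]


def lookup(routes, destination_ip):
    """Return the target of the most specific route matching destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route at all: the packet would be dropped
    # Longest prefix wins, so a /16 beats the 0.0.0.0/0 default route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(lookup(PROD_ROUTES, "10.100.4.20"))
print(lookup(PROD_ROUTES, "8.8.8.8"))
print(lookup(DEV_ROUTES, "10.1.0.5"))
```

One thing the simulation makes visible: in the Development table as drawn, traffic addressed to the Production range (10.1.0.0/16) has no specific route and therefore follows the default route, so isolation from Prod depends on what sits behind that default target.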

VPC Flow Logs

Log Format

version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status

Example:
2 123456789012 eni-abc123 10.0.1.5 10.0.2.10 443 49152 6 25 5000 1620140761 1620140821 ACCEPT OK
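For ad-hoc analysis outside Athena, a record in this default format can be split into named fields. The parser below is a minimal sketch that assumes the 14-field default format shown above (custom formats would need a different field list); `parse_flow_record` is a hypothetical helper name.

```python
FIELDS = (
    "version account-id interface-id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log-status"
).split()
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_flow_record(line):
    """Parse one default-format flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        # NODATA and SKIPDATA records carry "-" in place of missing values.
        record[name] = None if record[name] == "-" else int(record[name])
    return record


example = ("2 123456789012 eni-abc123 10.0.1.5 10.0.2.10 443 49152 6 25 5000 "
           "1620140761 1620140821 ACCEPT OK")
record = parse_flow_record(example)
print(record["srcaddr"], record["dstaddr"], record["bytes"], record["action"])
```

In the example record, protocol 6 is TCP and the capture window (`end` minus `start`) is 60 seconds.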

Analysis Queries (Athena)

```sql
-- Top talkers by bytes
SELECT srcaddr, dstaddr, SUM(bytes) as total_bytes
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 10;

-- Rejected connections (potential security issues)
SELECT srcaddr, dstaddr, dstport, COUNT(*) as reject_count
FROM vpc_flow_logs
WHERE action = 'REJECT'
GROUP BY srcaddr, dstaddr, dstport
ORDER BY reject_count DESC
LIMIT 20;
```

Best Practices Checklist

  • [ ] Use multiple AZs for high availability
  • [ ] Separate subnets by tier (public/private/database)
  • [ ] Enable VPC Flow Logs to S3 for analysis
  • [ ] Use VPC Endpoints for AWS services
  • [ ] Implement Transit Gateway for multi-VPC
  • [ ] Plan CIDR ranges for future growth
  • [ ] Use Security Groups over NACLs when possible
  • [ ] Enable DNS hostnames and DNS resolution

⚖️ Trade-offs

Trade-off 1: Transit Gateway vs VPC Peering

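| Aspect | Transit Gateway | VPC Peering |
| --- | --- | --- |
| Scalability | Hub-spoke, easy to scale | O(n²) connections |
| Cost | $0.02/GB processed + $0.05/hour per attachment (us-east-1 list prices; verify current pricing) | No hourly charge (data transfer only) |
| Transitivity | Yes, through the TGW | No, point-to-point only |
| Route management | Centralized route tables | Per-peering routes |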

Enterprise context: An e-commerce company with 5 VPCs initially used VPC Peering (10 connections). Scaling to 15 VPCs would have required 105 peering connections, and the management cost became unsustainable. Migrating to Transit Gateway took 2 weeks but cut operational overhead by 80%.

Recommendation:

  • ≤ 5 VPCs: VPC Peering may be enough
  • > 5 VPCs or transitive routing required: Transit Gateway
  • Hybrid connectivity: Transit Gateway (attach VPN/Direct Connect)
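The connection counts quoted in the enterprise example come from the full-mesh formula n(n-1)/2. A quick sketch to reproduce them, assuming every VPC must reach every other VPC (the function names are illustrative):

```python
def peering_connections(vpc_count):
    """Full-mesh VPC Peering needs one connection per VPC pair: n(n-1)/2."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count):
    """Hub-spoke via Transit Gateway needs one attachment per VPC."""
    return vpc_count


for n in (3, 5, 10, 15):
    print(n, peering_connections(n), tgw_attachments(n))
```

Peering grows quadratically while attachments grow linearly, which is why the crossover in the recommendation sits at a small VPC count.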

Trade-off 2: NAT Gateway vs VPC Endpoints

| Aspect | NAT Gateway | VPC Endpoints |
| --- | --- | --- |
| Use case | Internet access for private instances | AWS service access |
| Cost | $0.045/hour + $0.045/GB | Gateway: Free, Interface: $0.01/hour + $0.01/GB |
| Security | Traffic takes the internet-facing path | Traffic stays within the AWS network |
| Performance | Through the NAT GW | Direct to the service |

Worked example:

S3 data transfer: 10 TB/month

NAT Gateway:
- Processing: 10,000 GB × $0.045 = $450
- NAT GW hours: 720h × $0.045 = $32.40
- Total: ~$482/month

S3 Gateway Endpoint:
- Cost: $0 (free)
- Savings: $482/month = $5,784/year
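The same arithmetic can be parameterized to test other traffic volumes. This sketch uses the rates quoted in the table above as defaults; they are list prices that vary by region, so treat them as assumptions. The `az_count` parameter reflects that interface endpoints are billed per Availability Zone, and the function names are illustrative.

```python
def nat_gateway_monthly_cost(gb_processed, hours=720, hourly_rate=0.045, per_gb_rate=0.045):
    """Hourly charge for one NAT Gateway plus per-GB data processing."""
    return round(hours * hourly_rate + gb_processed * per_gb_rate, 2)


def interface_endpoint_monthly_cost(gb_processed, az_count=1, hours=720,
                                    hourly_rate=0.01, per_gb_rate=0.01):
    """Hourly charge per AZ for one interface endpoint plus per-GB processing."""
    return round(az_count * hours * hourly_rate + gb_processed * per_gb_rate, 2)


GATEWAY_ENDPOINT_MONTHLY_COST = 0.0  # S3 and DynamoDB gateway endpoints are free

nat_cost = nat_gateway_monthly_cost(10_000)
interface_cost = interface_endpoint_monthly_cost(10_000)
print(nat_cost, interface_cost, nat_cost - GATEWAY_ENDPOINT_MONTHLY_COST)
```

Note that the full NAT Gateway figure is only saved if the gateway can be removed entirely; if private instances still need internet egress, only the per-GB processing portion for S3 traffic goes away.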

Trade-off 3: Security Groups vs NACLs

| Aspect | Security Groups | NACLs |
| --- | --- | --- |
| Stateful | Yes (return traffic auto-allowed) | No (return traffic must be allowed explicitly) |
| Rule type | Allow only | Allow + Deny |
| Level | Instance/ENI | Subnet |
| Evaluation | All rules | In rule order |

Recommendation:

  • Primary control: Security Groups (stateful, easy to manage)
  • Secondary control: NACLs for subnet-level deny rules (block known malicious IPs)

🚨 Failure Modes

Failure Mode 1: CIDR Exhaustion

┌─────────────────────────────────────────────────────────────────┐
│                    CIDR EXHAUSTION TIMELINE                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Year 1:              Year 2:              Year 3:              │
│  VPC: 10.0.0.0/24    VPC: 10.0.0.0/24     VPC: 10.0.0.0/24      │
│  Used: 50 IPs        Used: 200 IPs        Used: 250 IPs         │
│  Free: 200 IPs       Free: 50 IPs         Free: 0 IPs ❌        │
│                                                                 │
│  Impact: Cannot add instances, Lambda ENIs, or ELBs             │
│  Fix: Migrate to a new VPC with a larger CIDR                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
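
| How to detect | How to prevent |
| --- | --- |
| Available IP count per subnet (AvailableIpAddressCount from DescribeSubnets, published as a custom CloudWatch metric) | Plan VPC CIDRs as /16 or /20, not /24 |
| Alert when < 20% of IPs are available | Reserve secondary CIDRs for expansion |
| Monthly capacity review | Document the CIDR allocation policy |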
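When applying the 20% threshold, keep in mind that AWS reserves five addresses in every subnet (the first four and the last), so a /24 offers 251 usable IPs rather than 256. A minimal sketch of the threshold check, assuming the available count has already been fetched from the subnet description; the function names are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one in each subnet


def usable_ips(subnet_cidr):
    """Addresses in the subnet that can actually be assigned to resources."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def is_low_on_ips(subnet_cidr, available, threshold=0.20):
    """True when the free share of usable addresses drops below the threshold."""
    return available / usable_ips(subnet_cidr) < threshold


print(usable_ips("10.0.0.0/24"))
print(is_low_on_ips("10.0.0.0/24", available=50))
print(is_low_on_ips("10.0.0.0/24", available=200))
```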

Failure Mode 2: DNS Resolution Failures

⚠️ Real-world incident

An application inside the VPC could not resolve S3 bucket DNS names after a VPC endpoint was enabled. Root cause: enableDnsHostnames was disabled on the VPC. The team spent 4 hours debugging "connection timeout" errors without suspecting DNS.

| How to detect | How to prevent |
| --- | --- |
| Test DNS resolution after every VPC change | Checklist: enableDnsHostnames + enableDnsSupport = true |
| Monitor Route 53 Resolver query logs | Verify VPC Endpoint private DNS |
| Application DNS resolution metrics | Infrastructure tests in CI/CD |

Failure Mode 3: Transit Gateway Route Conflicts

| How to detect | How to prevent |
| --- | --- |
| Asymmetric routing symptoms | Document route table associations |
| Blackhole routes in the TGW | Automated route validation |
| VPC Flow Logs with unexpected drops | Test routing changes in a sandbox TGW |

Debug commands:

```bash
# List all route tables that belong to one TGW
# (this command filters by TGW ID; it has no --transit-gateway-id option)
aws ec2 describe-transit-gateway-route-tables \
  --filters "Name=transit-gateway-id,Values=tgw-xxx"

# Check the routes in a specific table
aws ec2 search-transit-gateway-routes \
  --transit-gateway-route-table-id tgw-rtb-xxx \
  --filters "Name=type,Values=static,propagated"
```

🔐 Security Baseline

Network Security Requirements

| Requirement | Implementation | Verification |
| --- | --- | --- |
| No public databases | Database subnets have no route to an IGW | Config Rule: rds-instance-public-access-check |
| Encrypted transit | TLS everywhere, VPN/DX encryption | VPC Flow Logs analysis |
| Egress control | NAT Gateway + Security Groups | Outbound traffic audit |
| Private AWS access | VPC Endpoints for all AWS services | Endpoint policy review |

Network Segmentation Matrix

┌─────────────────────────────────────────────────────────────────┐
│               NETWORK SEGMENTATION MATRIX                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  From \ To      │ Public  │ Private │ Database │ Management     │
│  ───────────────────────────────────────────────────────────     │
│  Public        │ -       │ ✅ ALB  │ ❌        │ ❌             │
│  Private       │ ❌ NAT   │ ✅ SG   │ ✅ DB SG  │ ✅ Mgmt SG     │
│  Database      │ ❌       │ ❌      │ -        │ ❌             │
│  Management    │ ❌       │ ✅ SSH  │ ✅ Admin  │ -              │
│                                                                 │
│  ✅ = Allowed via a specific Security Group                      │
│  ❌ = Blocked by default                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
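The matrix above can also be kept as data so that observed flows (for example, tier pairs derived from VPC Flow Logs) are checked against it automatically. This is a sketch under the assumption that each flow has already been mapped to a source and destination tier; the tier names and the `violations` helper are illustrative, and the allowed pairs are transcribed from the matrix.

```python
# Allowed (source tier, destination tier) pairs and the control that permits them.
ALLOWED = {
    ("public", "private"): "ALB",
    ("private", "private"): "SG",
    ("private", "database"): "DB SG",
    ("private", "management"): "Mgmt SG",
    ("management", "private"): "SSH",
    ("management", "database"): "Admin",
}


def is_allowed(src_tier, dst_tier):
    """True when the matrix explicitly allows traffic from src_tier to dst_tier."""
    return (src_tier, dst_tier) in ALLOWED


def violations(observed_flows):
    """Return cross-tier flows that the matrix does not allow."""
    return [
        (src, dst)
        for src, dst in observed_flows
        if src != dst and not is_allowed(src, dst)
    ]


flows = [("public", "private"), ("database", "private"), ("public", "database")]
print(violations(flows))
```

Same-tier flows are skipped in `violations` because the matrix marks most of the diagonal as not applicable; tighten that if intra-tier traffic should be audited too.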

VPC Endpoint Policy Example

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictToOrgBuckets",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::company-*/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-xxxxxxxxxx"
        }
      }
    }
  ]
}
```

📊 Ops Readiness

Metrics to Monitor

| Metric | Source | Alert Threshold |
| --- | --- | --- |
| Available IP addresses | VPC API | < 20% per subnet |
| NAT Gateway bandwidth | CloudWatch | > 80% capacity |
| TGW bytes | CloudWatch | Spike > 2x baseline |
| VPC Flow Logs rejected | Athena analysis | > 100/min unexpected |
| DNS query failures | Route 53 Resolver | > 0 for critical services |

Alerting Configuration

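Note: per-subnet free-IP counts are not a built-in CloudWatch metric. This alarm assumes a custom metric, published under a placeholder namespace, that is fed from the AvailableIpAddressCount field of the EC2 DescribeSubnets API.

```json
{
  "AlarmName": "LowSubnetIPAvailability",
  "MetricName": "AvailableIPAddressCount",
  "Namespace": "Custom/VPC",
  "Dimensions": [
    {"Name": "SubnetId", "Value": "subnet-xxx"}
  ],
  "Statistic": "Minimum",
  "Period": 300,
  "Threshold": 50,
  "ComparisonOperator": "LessThanThreshold",
  "EvaluationPeriods": 1,
  "AlarmActions": ["arn:aws:sns:region:account:network-alerts"]
}
```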

Runbook Entry Points

| Situation | Runbook |
| --- | --- |
| CIDR exhaustion alert | runbook/cidr-expansion.md |
| NAT Gateway failure | runbook/nat-gateway-failover.md |
| Transit Gateway routing issue | runbook/tgw-troubleshooting.md |
| VPC Endpoint connectivity | runbook/vpc-endpoint-debug.md |
| DNS resolution failure | runbook/dns-troubleshooting.md |
| Security Group audit request | runbook/sg-audit-process.md |

Design Review Checklist

VPC Architecture

  • [ ] CIDR ranges do not overlap with any other VPC
  • [ ] Secondary CIDRs reserved for future expansion
  • [ ] Multi-AZ subnets for all tiers
  • [ ] Subnet sizing matches expected growth

Security

  • [ ] No public subnets for databases/backend services
  • [ ] VPC Endpoints for all AWS services in use
  • [ ] Security Group chaining instead of CIDR ranges
  • [ ] VPC Flow Logs enabled and analyzed

Connectivity

  • [ ] Transit Gateway for multi-VPC (if > 3 VPCs)
  • [ ] Route table segmentation matches security requirements
  • [ ] DNS resolution configured correctly
  • [ ] Hybrid connectivity (DX/VPN) redundancy

Operations

  • [ ] IP address monitoring configured
  • [ ] NAT Gateway bandwidth alerting
  • [ ] Network documentation up-to-date
  • [ ] Runbooks cho common network issues

📎 Links