🌐 VPC & Networking
Level: Core
Solves: Designing network architecture for enterprise workloads with security, scalability, and hybrid connectivity
🎯 Outcomes
After applying the material on this page, you will be able to:
- Design a VPC architecture with CIDR planning that covers 3-5 years of expansion
- Implement network segmentation following defense-in-depth principles
- Build private connectivity with VPC Endpoints and PrivateLink
- Set up hybrid connectivity with Direct Connect and/or VPN
- Deploy Transit Gateway for multi-VPC and multi-account architectures
- Configure VPC Flow Logs for network visibility and troubleshooting
✅ When to use
| Architecture | Use Case | Rationale |
|---|---|---|
| Multi-tier VPC | All production workloads | Isolation between public/private/data tiers |
| Transit Gateway | 3 or more VPCs | Scalable hub-spoke, route table segmentation |
| VPC Endpoints | Workloads that call AWS services | Lower latency, stronger security, lower NAT cost |
| Direct Connect | Stable hybrid connectivity > 100 Mbps | Consistent latency, lower data transfer costs |
| VPN | DR backup or low-bandwidth hybrid | Fast to set up, built-in encryption |
❌ When NOT to use
| Pattern | Problem | Alternative |
|---|---|---|
| VPC Peering for > 10 VPCs | O(n²) connections, not transitive | Transit Gateway |
| Public subnets for databases | Exposes attack surface | Private subnets + VPC Endpoints |
| NAT Gateway for AWS services | Unnecessary cost and latency | VPC Endpoints (Gateway/Interface) |
| Single-AZ deployment | Single point of failure | Multi-AZ with subnets in each AZ |
| /16 for every VPC | CIDR exhaustion, blocks peering | Right-size CIDR to actual needs |
⚠️ Warning from Raizo
"I have seen a team use overlapping CIDR ranges for their Dev and Prod VPCs. Six months later, when they needed VPC Peering to migrate data, they discovered the VPCs could not be peered. They had to re-IP the entire Dev environment, which meant 3 weeks of downtime. Getting CIDR planning right at the start is critical."
VPC Design Principles
Enterprise VPC Architecture
┌─────────────────────────────────────────────────────────────────┐
│ ENTERPRISE VPC DESIGN │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Region: us-east-1 │
│ VPC CIDR: 10.0.0.0/16 (65,536 IPs) │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Availability Zone A │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Public │ │ Private │ │ Database │ │ │
│ │ │ Subnet │ │ Subnet │ │ Subnet │ │ │
│ │ │ 10.0.1.0/24 │ │ 10.0.11.0/24│ │ 10.0.21.0/24│ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • NAT GW │ │ • App Tier │ │ • RDS │ │ │
│ │ │ • ALB │ │ • ECS/EKS │ │ • ElastiCache│ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Availability Zone B │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Public │ │ Private │ │ Database │ │ │
│ │ │ Subnet │ │ Subnet │ │ Subnet │ │ │
│ │ │ 10.0.2.0/24 │ │ 10.0.12.0/24│ │ 10.0.22.0/24│ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Subnet Sizing Guidelines
| Subnet Type | Recommended Size | Rationale |
|---|---|---|
| Public | /24 (256 IPs) | NAT GW, ALB, Bastion - limited resources |
| Private | /20 (4,096 IPs) | Application tier - room for scaling |
| Database | /24 (256 IPs) | Managed services - predictable count |
| Reserved | /22 per AZ | Future expansion |
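As a rough illustration of the sizing table, the sketch below uses Python's standard `ipaddress` module to check a per-AZ subnet plan against the 10.0.0.0/16 VPC from the diagram. The helper name `usable_ips` and the specific subnet placements are assumptions made for this example; the only AWS behavior relied on is that five addresses are reserved in every subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one


def usable_ips(cidr: str) -> int:
    """Return how many addresses remain assignable in a subnet of this size."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


vpc = ipaddress.ip_network("10.0.0.0/16")

# Hypothetical layout for one AZ, following the sizes in the table above.
plan = {
    "public": "10.0.1.0/24",
    "private": "10.0.32.0/20",
    "database": "10.0.21.0/24",
    "reserved": "10.0.4.0/22",
}

for tier, cidr in plan.items():
    net = ipaddress.ip_network(cidr)
    # Every subnet must sit inside the VPC range.
    assert net.subnet_of(vpc), f"{cidr} is outside the VPC range"
    print(f"{tier:<9} {cidr:<15} usable={usable_ips(cidr)}")
```

A /24 therefore yields 251 assignable addresses rather than 256, which matters most for the small public and database tiers.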
⚠️ CIDR Planning
Always plan CIDR ranges with a 3-5 year horizon. A VPC's primary CIDR cannot be changed after creation (you can only add secondary CIDRs). Overlapping CIDRs block VPC peering and cause routing conflicts on Transit Gateway.
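An overlap check is cheap to automate before any VPC is created. The following is a minimal sketch using only the standard library; the function name `find_overlaps` and the sample allocation are hypothetical and not part of any AWS tooling.

```python
import ipaddress
from itertools import combinations


def find_overlaps(vpc_cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of VPC names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical allocation: "dev" was carved out of the prod range by mistake.
allocation = {
    "prod": "10.1.0.0/16",
    "dev": "10.1.128.0/20",
    "shared": "10.100.0.0/16",
}

print(find_overlaps(allocation))  # [('prod', 'dev')]
```

Running a check like this in CI against the documented allocation list catches the Dev/Prod conflict described in the warning above before it reaches production.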
Network Segmentation
Security Groups vs NACLs
┌─────────────────────────────────────────────────────────────────┐
│ SECURITY GROUPS vs NACLs │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SECURITY GROUPS (Stateful) NACLs (Stateless) │
│ ───────────────────────── ────────────────── │
│ • Instance level • Subnet level │
│ • Allow rules only • Allow + Deny rules │
│ • Return traffic auto-allowed • Must allow return traffic│
│ • Evaluated as group • Evaluated in order │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ SUBNET │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ NACL │ │ │
│ │ │ ┌─────────────────────────────────────────┐ │ │ │
│ │ │ │ Security Group │ │ │ │
│ │ │ │ ┌─────────────────────────────────┐ │ │ │ │
│ │ │ │ │ EC2 Instance │ │ │ │ │
│ │ │ │ └─────────────────────────────────┘ │ │ │ │
│ │ │ └─────────────────────────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ RECOMMENDATION: Use Security Groups as primary control, │
│ NACLs for subnet-level deny rules (e.g., block known bad IPs) │
│ │
└─────────────────────────────────────────────────────────────────┘
Security Group Best Practices
json
{
"SecurityGroupRules": [
{
"Description": "Allow HTTPS from ALB",
"IpProtocol": "tcp",
"FromPort": 443,
"ToPort": 443,
"SourceSecurityGroupId": "sg-alb-12345"
},
{
"Description": "Allow health check from ALB",
"IpProtocol": "tcp",
"FromPort": 8080,
"ToPort": 8080,
"SourceSecurityGroupId": "sg-alb-12345"
}
]
}
💡 Security Group Chaining
Reference Security Groups instead of CIDR ranges where possible. This creates dynamic rules that update automatically as instances change.
Private Connectivity
VPC Endpoints
┌─────────────────────────────────────────────────────────────────┐
│ VPC ENDPOINTS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ GATEWAY ENDPOINTS (Free) INTERFACE ENDPOINTS (Paid) │
│ ───────────────────────── ──────────────────────── │
│ • S3 • Most AWS services │
│ • DynamoDB • Creates ENI in subnet │
│ • Route table entry • Private DNS │
│ • No data processing charge • $0.01/hour + data │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ VPC │ │
│ │ │ │
│ │ ┌──────────┐ Gateway ┌──────────────────────┐ │ │
│ │ │ Instance │───Endpoint─────►│ S3 │ │ │
│ │ └──────────┘ └──────────────────────┘ │ │
│ │ │ │ │
│ │ │ Interface ┌──────────────────────┐ │ │
│ │ └────────Endpoint──────►│ Secrets Manager │ │ │
│ │ (ENI) └──────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Traffic stays within AWS network - no Internet Gateway needed │
│ │
└─────────────────────────────────────────────────────────────────┘
PrivateLink for Services
Hybrid Connectivity
Direct Connect vs VPN
| Aspect | Direct Connect | Site-to-Site VPN |
|---|---|---|
| Bandwidth | 1-100 Gbps | Up to 1.25 Gbps per tunnel |
| Latency | Consistent, low | Variable |
| Setup Time | Weeks-months | Minutes |
| Cost | Higher (port + data) | Lower |
| Encryption | Optional (MACsec) | Always (IPsec) |
| Redundancy | Requires 2nd connection | Built-in (two tunnels per connection) |
Transit Gateway Architecture
┌─────────────────────────────────────────────────────────────────┐
│ TRANSIT GATEWAY HUB-SPOKE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ │
│ │ Transit Gateway │ │
│ └────────┬────────┘ │
│ │ │
│ ┌───────────────────────┼───────────────────────┐ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────────┐ │
│ │VPC Dev│ │VPC Stg│ │VPC Prd│ │VPC Sec│ │On-Premises│ │
│ └───────┘ └───────┘ └───────┘ └───────┘ └───────────┘ │
│ │
│ Route Tables: │
│ • Dev/Stg → can reach Shared Services │
│ • Prod → isolated, only specific routes │
│ • Security → can reach all for monitoring │
│ • On-Prem → controlled access via route tables │
│ │
└─────────────────────────────────────────────────────────────────┘
Transit Gateway Route Tables
┌─────────────────────────────────────────────────────────────────┐
│ TGW ROUTE TABLE SEGMENTATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Production Route Table: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Destination │ Target │ Notes │ │
│ ├─────────────────────────────────────────────────────────┤ │
│ │ 10.1.0.0/16 │ VPC-Prod │ Production VPC │ │
│ │ 10.100.0.0/16 │ VPC-Shared │ Shared Services │ │
│ │ 192.168.0.0/16 │ VPN-OnPrem │ On-premises │ │
│ │ 0.0.0.0/0 │ Blackhole │ Block internet │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Development Route Table: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Destination │ Target │ Notes │ │
│ ├─────────────────────────────────────────────────────────┤ │
│ │ 10.2.0.0/16 │ VPC-Dev │ Development VPC │ │
│ │ 10.100.0.0/16 │ VPC-Shared │ Shared Services │ │
│ │ 0.0.0.0/0 │ NAT-Gateway │ Internet access │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
VPC Flow Logs
Log Format
version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
Example:
2 123456789012 eni-abc123 10.0.1.5 10.0.2.10 443 49152 6 25 5000 1620140761 1620140821 ACCEPT OK
Analysis Queries (Athena)
sql
-- Top talkers by bytes
SELECT srcaddr, dstaddr, SUM(bytes) as total_bytes
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 10;
-- Rejected connections (potential security issues)
SELECT srcaddr, dstaddr, dstport, COUNT(*) as reject_count
FROM vpc_flow_logs
WHERE action = 'REJECT'
GROUP BY srcaddr, dstaddr, dstport
ORDER BY reject_count DESC
LIMIT 20;
Best Practices Checklist
- [ ] Use multiple AZs for high availability
- [ ] Separate subnets by tier (public/private/database)
- [ ] Enable VPC Flow Logs to S3 for analysis
- [ ] Use VPC Endpoints for AWS services
- [ ] Implement Transit Gateway for multi-VPC
- [ ] Plan CIDR ranges for future growth
- [ ] Use Security Groups over NACLs when possible
- [ ] Enable DNS hostnames and DNS resolution
⚖️ Trade-offs
Trade-off 1: Transit Gateway vs VPC Peering
| Aspect | Transit Gateway | VPC Peering |
|---|---|---|
| Scalability | Hub-spoke, easy to scale | O(n²) connections |
| Cost | $0.05/hour per attachment + $0.02/GB processed (us-east-1 list price) | Free (data transfer only) |
| Transitivity | Yes, through the TGW | No, point-to-point only |
| Route management | Centralized route tables | Per-peering routes |
Enterprise context: An e-commerce company with 5 VPCs initially used VPC Peering (10 connections). After scaling to 15 VPCs, a full mesh required 105 peering connections, and the management cost became unsustainable. Migrating to Transit Gateway took 2 weeks but cut operational overhead by 80%.
Recommendations:
- ≤ 5 VPCs: VPC Peering may be sufficient
- > 5 VPCs or transitive routing required: Transit Gateway
- Hybrid connectivity: Transit Gateway (attach VPN/Direct Connect)
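The connection counts in the enterprise example follow from the full-mesh formula n(n-1)/2. The sketch below reproduces them and encodes the rule of thumb from the recommendations; both function names are hypothetical, and the thresholds are the ones stated on this page rather than AWS limits.

```python
def peering_connections(vpc_count: int) -> int:
    """A full mesh needs one peering connection per pair of VPCs: n * (n - 1) / 2."""
    return vpc_count * (vpc_count - 1) // 2


def suggest_connectivity(vpc_count: int, needs_transitive: bool = False,
                         hybrid: bool = False) -> str:
    """Apply the recommendation list above as a simple decision rule."""
    if hybrid or needs_transitive or vpc_count > 5:
        return "Transit Gateway"
    return "VPC Peering"


for n in (5, 15):
    print(n, peering_connections(n), suggest_connectivity(n))
# 5 10 VPC Peering
# 15 105 Transit Gateway
```

With Transit Gateway the same 15 VPCs need 15 attachments, which is why the operational cost grows linearly instead of quadratically.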
Trade-off 2: NAT Gateway vs VPC Endpoints
| Aspect | NAT Gateway | VPC Endpoints |
|---|---|---|
| Use case | Internet access for private instances | AWS service access |
| Cost | $0.045/hour + $0.045/GB | Gateway: free; Interface: $0.01/hour + $0.01/GB |
| Security | Traffic exits through the Internet Gateway to public endpoints | Traffic stays within the AWS network |
| Performance | Through the NAT GW | Direct to the service |
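The cost row can be turned into a small calculator. This is a sketch under stated assumptions: prices are the ones listed in the table (actual rates vary by region), a month is 720 hours, the workload is the 10 TB/month S3 scenario treated as 10,000 GB, and the interface endpoint figure covers a single endpoint in one AZ. The function names are hypothetical.

```python
from decimal import Decimal

HOURS_PER_MONTH = 720  # assumes a 30-day month


def nat_gateway_monthly(gb: int,
                        hourly: Decimal = Decimal("0.045"),
                        per_gb: Decimal = Decimal("0.045")) -> Decimal:
    """Hourly charge plus data processing for one NAT Gateway."""
    return HOURS_PER_MONTH * hourly + gb * per_gb


def interface_endpoint_monthly(gb: int,
                               hourly: Decimal = Decimal("0.01"),
                               per_gb: Decimal = Decimal("0.01")) -> Decimal:
    """Hourly charge plus data processing for one interface endpoint in one AZ."""
    return HOURS_PER_MONTH * hourly + gb * per_gb


GATEWAY_ENDPOINT_MONTHLY = Decimal("0")  # S3/DynamoDB gateway endpoints are free

monthly_gb = 10_000
nat = nat_gateway_monthly(monthly_gb)
print(f"NAT Gateway:        ${nat:.2f}")                                   # $482.40
print(f"Interface endpoint: ${interface_endpoint_monthly(monthly_gb):.2f}")  # $107.20
print(f"Gateway endpoint:   ${GATEWAY_ENDPOINT_MONTHLY:.2f}")              # $0.00
```

`Decimal` is used instead of floats so the dollar amounts stay exact.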
Example calculation:
S3 data transfer: 10 TB/month
NAT Gateway:
- Processing: 10,000 GB × $0.045 = $450
- NAT GW hours: 720h × $0.045 = $32.40
- Total: ~$482/month
S3 Gateway Endpoint:
- Cost: $0 (free)
- Savings: $482/month = $5,784/year
Trade-off 3: Security Groups vs NACLs
| Aspect | Security Groups | NACLs |
|---|---|---|
| Stateful | Yes (return traffic auto-allowed) | No (return traffic must be explicitly allowed) |
| Rule type | Allow only | Allow + Deny |
| Level | Instance/ENI | Subnet |
| Evaluation | All rules | In rule order |
Recommendations:
- Primary control: Security Groups (stateful, easy to manage)
- Secondary control: NACLs for subnet-level deny (block known malicious IPs)
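The difference in evaluation order is easiest to see in a toy model. The sketch below is a simplified illustration, not an AWS API: it handles inbound TCP only, ignores statefulness and ephemeral ports, and all class and function names are invented for the example. The 203.0.113.0/24 and 198.51.100.0/24 ranges are documentation prefixes standing in for real addresses.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower rule numbers are evaluated first
    cidr: str
    port: int
    allow: bool


@dataclass(frozen=True)
class SgRule:
    cidr: str
    port: int     # security groups only carry allow rules


def _matches(cidr: str, port: int, src_ip: str, dst_port: int) -> bool:
    return port == dst_port and ipaddress.ip_address(src_ip) in ipaddress.ip_network(cidr)


def nacl_allows(rules, src_ip: str, dst_port: int) -> bool:
    """The first matching rule in ascending number order decides; default is deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if _matches(rule.cidr, rule.port, src_ip, dst_port):
            return rule.allow
    return False


def sg_allows(rules, src_ip: str, dst_port: int) -> bool:
    """Traffic is allowed if any rule matches; there is no way to express a deny."""
    return any(_matches(r.cidr, r.port, src_ip, dst_port) for r in rules)


nacl = [
    NaclRule(100, "203.0.113.0/24", 443, allow=False),  # block a known-bad range
    NaclRule(200, "0.0.0.0/0", 443, allow=True),
]
sg = [SgRule("0.0.0.0/0", 443)]

print(nacl_allows(nacl, "203.0.113.7", 443))   # False: rule 100 denies first
print(nacl_allows(nacl, "198.51.100.7", 443))  # True: falls through to rule 200
print(sg_allows(sg, "203.0.113.7", 443))       # True: the SG cannot block this source
```

This is why the deny for known malicious ranges belongs in the NACL while day-to-day allow rules stay in Security Groups.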
🚨 Failure Modes
Failure Mode 1: CIDR Exhaustion
┌─────────────────────────────────────────────────────────────────┐
│ CIDR EXHAUSTION TIMELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Year 1: Year 2: Year 3: │
│ VPC: 10.0.0.0/24 VPC: 10.0.0.0/24 VPC: 10.0.0.0/24 │
│ Used: 50 IPs Used: 200 IPs Used: 250 IPs │
│ Free: 200 IPs Free: 50 IPs Free: 0 IPs ❌ │
│ │
│ Impact: Cannot add instances, Lambda ENIs, or ELBs │
│ Fix: Migrate to a new VPC with a larger CIDR │
│ │
└─────────────────────────────────────────────────────────────────┘
| How to detect | How to prevent |
|---|---|
| Subnet AvailableIpAddressCount (from DescribeSubnets, published as a custom CloudWatch metric) | Plan VPC CIDR as /16 or /20, not /24 |
| Alert when < 20% of IPs are available | Reserve secondary CIDRs for expansion |
| Monthly capacity review | Document CIDR allocation policy |
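The 20% rule from the detection column can be expressed as a small check. This sketch assumes the available-address count has already been fetched elsewhere (for example from the subnet description returned by the EC2 API); the function names and the default threshold parameter are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS keeps for itself in every subnet


def free_ratio(cidr: str, available: int) -> float:
    """Fraction of assignable addresses in the subnet that are still free."""
    assignable = ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
    return available / assignable


def needs_alert(cidr: str, available: int, threshold: float = 0.20) -> bool:
    """True when fewer than `threshold` of the assignable addresses remain."""
    return free_ratio(cidr, available) < threshold


# Year 1 and Year 2 figures from the timeline above, for a /24.
print(needs_alert("10.0.0.0/24", available=200))  # False
print(needs_alert("10.0.0.0/24", available=50))   # True
```

A percentage threshold scales across subnet sizes, whereas a fixed count such as 50 is generous for a /24 but far too late for a /20.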
Failure Mode 2: DNS Resolution Failures
⚠️ Real-world incident
An application in the VPC could not resolve S3 bucket DNS names after a VPC endpoint was enabled. Root cause: enableDnsHostnames was disabled on the VPC. The team spent 4 hours debugging "connection timeout" errors without suspecting DNS.
| How to detect | How to prevent |
|---|---|
| Test DNS resolution after every VPC change | Checklist: enableDnsHostnames + enableDnsSupport = true |
| Monitor Route 53 Resolver query logs | VPC Endpoint private DNS verification |
| Application DNS resolution metrics | Infrastructure tests in CI/CD |
Failure Mode 3: Transit Gateway Route Conflicts
| How to detect | How to prevent |
|---|---|
| Asymmetric routing symptoms | Document route table associations |
| Blackhole routes in the TGW | Automated route validation |
| VPC Flow Logs with unexpected drops | Test routing changes in a sandbox TGW |
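Automated route validation can start as a plain longest-prefix-match lookup run against the intended route table. The sketch below models the Production route table shown earlier on this page as a dictionary; `lookup`, the `BLACKHOLE` constant, and the expectation list are assumptions for illustration and do not call any AWS API.

```python
import ipaddress
from typing import Optional

BLACKHOLE = "Blackhole"


def lookup(route_table: dict[str, str], dst_ip: str) -> Optional[str]:
    """Longest-prefix match: the most specific route containing dst_ip wins."""
    ip = ipaddress.ip_address(dst_ip)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        return None  # no route at all
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


# Production route table as documented in the TGW segmentation diagram.
production = {
    "10.1.0.0/16": "VPC-Prod",
    "10.100.0.0/16": "VPC-Shared",
    "192.168.0.0/16": "VPN-OnPrem",
    "0.0.0.0/0": BLACKHOLE,
}

# Hypothetical expectations derived from the segmentation policy.
expectations = {
    "10.100.4.20": "VPC-Shared",
    "192.168.10.5": "VPN-OnPrem",
    "10.2.0.9": BLACKHOLE,  # the Dev VPC must not be reachable from Prod
    "8.8.8.8": BLACKHOLE,   # internet egress is blocked
}

for dst, want in expectations.items():
    got = lookup(production, dst)
    assert got == want, f"{dst}: expected {want}, got {got}"
print("route table matches policy")
```

The same expectations can be replayed against the routes exported from the live TGW to catch an unintended propagated route before it causes asymmetric routing.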
Debug command:
bash
# List all route tables for a TGW
aws ec2 describe-transit-gateway-route-tables \
  --filters "Name=transit-gateway-id,Values=tgw-xxx"
# Check routes in a specific table
aws ec2 search-transit-gateway-routes \
  --transit-gateway-route-table-id tgw-rtb-xxx \
  --filters "Name=type,Values=static,propagated"
🔐 Security Baseline
Network Security Requirements
| Requirement | Implementation | Verification |
|---|---|---|
| No public databases | Database subnets have no route to an IGW | Config Rule: rds-instance-public-access-check |
| Encrypted transit | TLS everywhere, VPN/DX encryption | VPC Flow Logs analysis |
| Egress control | NAT Gateway + Security Groups | Outbound traffic audit |
| Private AWS access | VPC Endpoints for all AWS services in use | Endpoint policy review |
Network Segmentation Matrix
┌─────────────────────────────────────────────────────────────────┐
│ NETWORK SEGMENTATION MATRIX │
├─────────────────────────────────────────────────────────────────┤
│ │
│ From \ To │ Public │ Private │ Database │ Management │
│ ─────────────────────────────────────────────────────────── │
│ Public │ - │ ✅ ALB │ ❌ │ ❌ │
│ Private │ ❌ NAT │ ✅ SG │ ✅ DB SG │ ✅ Mgmt SG │
│ Database │ ❌ │ ❌ │ - │ ❌ │
│ Management │ ❌ │ ✅ SSH │ ✅ Admin │ - │
│ │
│ ✅ = Allowed with a specific Security Group │
│ ❌ = Blocked by default │
│ │
└─────────────────────────────────────────────────────────────────┘
VPC Endpoint Policy Example
json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "RestrictToOrgBuckets",
"Effect": "Allow",
"Principal": "*",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::company-*/*",
"Condition": {
"StringEquals": {
"aws:PrincipalOrgID": "o-xxxxxxxxxx"
}
}
}
]
}
📊 Ops Readiness
Metrics to Monitor
| Metric | Source | Alert Threshold |
|---|---|---|
| Available IP addresses | VPC API | < 20% per subnet |
| NAT Gateway bandwidth | CloudWatch | > 80% capacity |
| TGW bytes | CloudWatch | Spike > 2x baseline |
| VPC Flow Logs rejected | Athena analysis | > 100/min unexpected |
| DNS query failures | Route 53 Resolver | > 0 for critical services |
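The rejected-flow threshold in the table can be prototyped locally before building the Athena pipeline. The sketch below parses records in the default flow log format shown earlier on this page and counts REJECT records per minute; the function names and the REJECT and NODATA sample lines are made up for illustration, while the ACCEPT line is the example record from this page.

```python
from collections import Counter

# Field order of the default flow log format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def rejects_per_minute(lines) -> Counter:
    """Count REJECT records per one-minute bucket (epoch seconds // 60)."""
    counts = Counter()
    for line in lines:
        values = line.split()
        if len(values) != len(FIELDS):
            continue  # not a default-format record
        record = dict(zip(FIELDS, values))
        if record["log_status"] != "OK" or record["action"] != "REJECT":
            continue  # NODATA/SKIPDATA rows carry "-" placeholders
        counts[int(record["start"]) // 60] += 1
    return counts


def minutes_over_threshold(counts: Counter, threshold: int = 100) -> list:
    """Return the minute buckets whose reject count exceeds the threshold."""
    return sorted(minute for minute, n in counts.items() if n > threshold)


sample = [
    "2 123456789012 eni-abc123 10.0.1.5 10.0.2.10 443 49152 6 25 5000 1620140761 1620140821 ACCEPT OK",
    "2 123456789012 eni-abc123 198.51.100.7 10.0.2.10 50000 22 6 1 40 1620140765 1620140825 REJECT OK",
    "2 123456789012 eni-abc123 198.51.100.7 10.0.2.10 50001 22 6 1 40 1620140770 1620140830 REJECT OK",
    "2 123456789012 eni-abc123 - - - - - - - 1620140761 1620140821 - NODATA",
]

counts = rejects_per_minute(sample)
print(sum(counts.values()))            # 2
print(minutes_over_threshold(counts))  # [] (nothing above 100 per minute)
```

Bucketing on the record's `start` timestamp is an approximation, since each record aggregates a capture window rather than a single packet.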
Alerting Configuration
json
{
"AlarmName": "LowSubnetIPAvailability",
"MetricName": "AvailableIPAddressCount",
"Namespace": "AWS/EC2",
"Dimensions": [
{"Name": "SubnetId", "Value": "subnet-xxx"}
],
"Threshold": 50,
"ComparisonOperator": "LessThanThreshold",
"EvaluationPeriods": 1,
"AlarmActions": ["arn:aws:sns:region:account:network-alerts"]
}
Runbook Entry Points
| Situation | Runbook |
|---|---|
| CIDR exhaustion alert | runbook/cidr-expansion.md |
| NAT Gateway failure | runbook/nat-gateway-failover.md |
| Transit Gateway routing issue | runbook/tgw-troubleshooting.md |
| VPC Endpoint connectivity | runbook/vpc-endpoint-debug.md |
| DNS resolution failure | runbook/dns-troubleshooting.md |
| Security Group audit request | runbook/sg-audit-process.md |
✅ Design Review Checklist
VPC Architecture
- [ ] CIDR ranges do not overlap with any other VPC
- [ ] Secondary CIDRs reserved for future expansion
- [ ] Multi-AZ subnets for all tiers
- [ ] Subnet sizing matches expected growth
Security
- [ ] No public subnets for databases/backend services
- [ ] VPC Endpoints for all AWS services in use
- [ ] Security Group chaining instead of CIDR ranges
- [ ] VPC Flow Logs enabled and analyzed
Connectivity
- [ ] Transit Gateway for multi-VPC (if > 3 VPCs)
- [ ] Route table segmentation matches security requirements
- [ ] DNS resolution configured correctly
- [ ] Hybrid connectivity (DX/VPN) redundancy
Operations
- [ ] IP address monitoring configured
- [ ] NAT Gateway bandwidth alerting
- [ ] Network documentation up-to-date
- [ ] Runbooks for common network issues
📎 Links
- 📎 GCP VPC & Networking - Comparison with GCP's networking model
- 📎 Compute Decisioning - Network requirements for different compute options
- 📎 Security Posture - Network security within the overall security strategy
- 📎 Terraform Modules - IaC patterns for VPC deployment
- 📎 Reliability & DR - Multi-region networking for DR