📺 DESIGN YOUTUBE
Video Streaming & Content Delivery at Scale
🎓 Professor Tom
YouTube is the classic problem of the Video Processing Pipeline and the Content Delivery Network. How do you upload, transcode, and stream video to billions of users with low latency? This is where you learn about chunked upload, adaptive bitrate streaming, and CDN architecture.
📊 Back-of-Envelope Calculations
Scale Assumptions
| Metric | Value | Rationale |
|---|---|---|
| Monthly Active Users (MAU) | 2B | Global video platform |
| Daily Active Users (DAU) | 800M | ~40% of MAU |
| Videos watched per user per day | 5 | Average viewing sessions |
| Average video duration | 5 minutes | Mix of short and long content |
| Video uploads per day | 500K | Creator uploads |
| Average upload size (raw) | 500 MB | HD video before compression |
| Storage per minute (compressed) | 50 MB | After transcoding to multiple resolutions |
Video Upload & Storage Calculations
Daily Upload Volume:
Videos uploaded/day = 500,000 videos
Raw upload size = 500K × 500 MB = 250 TB/day
Transcoded Storage (per video):
- 360p: ~10 MB/min
- 720p: ~25 MB/min
- 1080p: ~50 MB/min
- 4K: ~200 MB/min
Total per minute = ~285 MB/min (all resolutions)
Average 5-min video = 285 MB × 5 = 1.425 GB (all resolutions)
Daily transcoded storage = 500K × 1.425 GB = ~712 TB/day
Monthly storage growth = 712 TB × 30 = ~21 PB/month
Storage Breakdown by Resolution
| Resolution | Bitrate | Storage/min | % of Views |
|---|---|---|---|
| 360p | 1 Mbps | 7.5 MB | 15% |
| 720p | 3 Mbps | 22.5 MB | 35% |
| 1080p | 6 Mbps | 45 MB | 40% |
| 4K | 25 Mbps | 187.5 MB | 10% |
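As a sanity check, the per-resolution figures above can be combined into the daily storage estimate (a quick sketch using the section's own assumed numbers):

```python
# Back-of-envelope: daily transcoded storage, using the rounded
# per-resolution figures from the calculation above.
MB_PER_MIN = {"360p": 10, "720p": 25, "1080p": 50, "4k": 200}

uploads_per_day = 500_000
avg_minutes = 5

per_min_all = sum(MB_PER_MIN.values())            # 285 MB/min across all renditions
per_video_mb = per_min_all * avg_minutes          # 1,425 MB per 5-min video
daily_tb = per_video_mb * uploads_per_day / 1e6   # MB -> TB (decimal): ~712 TB/day
monthly_pb = daily_tb * 30 / 1000                 # ~21 PB/month

print(per_min_all, per_video_mb, daily_tb, round(monthly_pb, 1))  # 285 1425 712.5 21.4
```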
Transcoding Capacity Calculations
Transcoding Requirements:
Videos to transcode/day = 500,000
Average video duration = 5 minutes
Total minutes to transcode = 500K × 5 = 2.5M minutes/day
Transcoding time (per resolution):
- Real-time transcoding: 1 min video = 1 min processing
- With GPU acceleration: 1 min video = 0.1 min processing (10x faster)
Total transcoding work = 2.5M min × 4 resolutions = 10M min/day
With GPU workers: 10M / 10 = 1M GPU-minutes/day
Workers needed (24/7):
1M GPU-min / 1440 min/day = ~700 GPU workers (sustained)
Peak capacity (3x): ~2,100 GPU workers
CDN Bandwidth Calculations
Video Streaming Volume:
Daily views = 800M DAU × 5 videos = 4B video views/day
Average video duration watched = 3 minutes (not all finish)
Total watch time = 4B × 3 min = 12B minutes/day
Bandwidth per resolution (weighted average):
Weighted bitrate = (0.15 × 1) + (0.35 × 3) + (0.40 × 6) + (0.10 × 25)
= 0.15 + 1.05 + 2.4 + 2.5 = 6.1 Mbps average
Daily bandwidth = 12B min × 60 sec × 6.1 Mbps
= 4.4 × 10^12 Mb = 4.4 Exabits/day
= 550 PB/day egress
Peak bandwidth:
Average egress = 550 PB / 86400 sec = 6.4 TB/s = 51 Tbps
Peak egress (3x) = ~150 Tbps
🔧 Raizo's Note
CDN is YouTube's single largest cost. 550 PB/day of egress at $0.02/GB is $11M/day in bandwidth alone! This is why YouTube invests heavily in edge caching and peering agreements with ISPs.
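The egress and cost figures above can be reproduced from the view-share table (a back-of-envelope sketch; decimal units throughout, so the result lands at ~549 PB, i.e. the ~550 PB quoted):

```python
# Reproduce the CDN egress estimate from the weighted average bitrate.
view_share = {"360p": 0.15, "720p": 0.35, "1080p": 0.40, "4k": 0.10}
mbps = {"360p": 1, "720p": 3, "1080p": 6, "4k": 25}

weighted_mbps = sum(view_share[r] * mbps[r] for r in mbps)   # 6.1 Mbps average

watch_minutes_per_day = 4e9 * 3                  # 4B views x 3 min watched
megabits = watch_minutes_per_day * 60 * weighted_mbps
petabytes = megabits / 8 / 1e9                   # Mb -> MB -> PB (decimal)
cost_per_day = petabytes * 1e6 * 0.02            # $0.02/GB negotiated CDN rate

print(round(weighted_mbps, 1), round(petabytes), f"${cost_per_day/1e6:.0f}M")
```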
QPS Calculations
Video Metadata Requests:
- Video page loads = 4B/day
- Search queries = 1B/day
- Recommendations = 4B/day
Total metadata QPS = 9B / 86400 = ~104,000 QPS
Peak QPS = 104K × 3 = ~312,000 QPS
Upload API:
- Upload initiations = 500K/day
- Chunk uploads = 500K × 100 chunks = 50M/day
Upload QPS = 50M / 86400 = ~580 QPS (relatively low)
🏗️ High-Level Architecture
Component Responsibilities
| Component | Responsibility | Technology |
|---|---|---|
| CDN Edge Servers | Cache & serve video segments globally | Akamai/CloudFront/Google Edge |
| Upload Service | Handle chunked uploads, resumable uploads | Go microservice |
| Video Service | Video metadata CRUD, status management | Go/Java microservice |
| Transcoding Workers | Convert raw video to multiple resolutions | FFmpeg on GPU instances |
| Thumbnail Generator | Extract frames, generate thumbnails | FFmpeg + ImageMagick |
| Search Service | Full-text search on titles, descriptions | Elasticsearch cluster |
| Recommendation Service | Personalized video suggestions | TensorFlow/PyTorch ML models |
| Raw Storage | Temporary storage for uploaded videos | S3/GCS (delete after processing) |
| Transcoded Storage | Permanent storage for all resolutions | S3/GCS with lifecycle policies |
| Metadata DB | Video info, user data, comments | PostgreSQL/Vitess (sharded) |
🔧 Raizo's Note
Why separate Raw and Transcoded storage?
Raw videos are large (500 MB+) and only need to be kept temporarily. Once transcoding finishes, delete the raw file to save on storage cost. Transcoded videos are much smaller and are kept permanently in multiple resolutions.
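One way to make the raw-file cleanup robust is an S3 lifecycle rule as a safety net behind the transcoding job's own delete; a minimal sketch (the bucket name, `raw/` prefix, and 2-day window are assumptions):

```python
# Lifecycle rule: expire objects under raw/ automatically, so raw
# uploads are cleaned up even if the post-transcode delete is missed.
lifecycle_rules = {
    "Rules": [{
        "ID": "expire-raw-uploads",
        "Filter": {"Prefix": "raw/"},       # raw uploads only
        "Status": "Enabled",
        "Expiration": {"Days": 2},          # well past normal transcode time
    }]
}

# Applied with boto3 (requires AWS credentials; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="yt-raw-uploads", LifecycleConfiguration=lifecycle_rules)
print(lifecycle_rules["Rules"][0]["ID"])
```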
🔄 Core Flows
Flow 1: Video Upload & Processing
Flow 2: Video Playback (Streaming)
🎓 Professor Tom
Adaptive Bitrate Streaming (ABR) is the key feature:
- The player continuously monitors bandwidth
- It switches between quality levels automatically
- No user intervention is required
- Less buffering, more watch time
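The ABR decision loop above can be sketched as a simple rendition picker: estimate bandwidth from the last segment download, then choose the highest rendition that fits with headroom (the 20% safety factor and the rendition table are illustrative assumptions):

```python
# Player-side ABR sketch: bandwidth estimate -> rendition choice.
RENDITIONS = [("360p", 1.0), ("720p", 3.0), ("1080p", 6.0), ("4k", 25.0)]  # Mbps

def pick_quality(segment_bytes: int, download_seconds: float) -> str:
    est_mbps = segment_bytes * 8 / 1e6 / download_seconds
    budget = est_mbps * 0.8                  # keep 20% headroom for jitter
    best = RENDITIONS[0][0]                  # never drop below the lowest rung
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            best = name
    return best

# 2 MB segment downloaded in 0.8s -> 20 Mbps estimate -> 1080p fits, 4K doesn't
print(pick_quality(2_000_000, 0.8))  # 1080p
```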
💡 Deep Dive: Video Processing Pipeline
The Core Challenge
Uploading a 500 MB video and converting it into 4 resolutions with HLS segments is a complex process. How do you handle 500K uploads/day reliably and efficiently?
Chunked Upload Mechanism
┌─────────────────────────────────────────────────────────────────┐
│ CHUNKED UPLOAD FLOW │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Original Video: 500 MB │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Split into 5MB chunks │ │
│ │ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐ │ │
│ │ │ 1 │ 2 │ 3 │ 4 │ 5 │ ... │ 99 │ 100 │ │ │
│ │ │ 5MB │ 5MB │ 5MB │ 5MB │ 5MB │ │ 5MB │ 5MB │ │ │
│ │ └──┬──┴──┬──┴──┬──┴──┬──┴──┬──┴─────┴──┬──┴──┬──┘ │ │
│ └─────┼─────┼─────┼─────┼─────┼───────────┼─────┼────────┘ │
│ │ │ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Parallel Upload to S3 (Presigned URLs) │ │
│ │ │ │
│ │ • Each chunk gets unique presigned URL │ │
│ │ • Upload in parallel (5-10 concurrent) │ │
│ │ • Retry failed chunks independently │ │
│ │ • Track progress: chunks_uploaded / total_chunks │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Benefits: │
│ ✅ Resumable: Network fails? Resume from last chunk │
│ ✅ Parallel: 10x faster than single stream │
│ ✅ Progress: Show accurate upload percentage │
│ ✅ Retry: Only retry failed chunks, not entire file │
│ │
└─────────────────────────────────────────────────────────────────┘
Transcoding Pipeline Architecture
┌─────────────────────────────────────────────────────────────────┐
│ TRANSCODING PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Raw Video (S3) │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Job Scheduler │ │
│ │ • Priority queue (premium creators first) │ │
│ │ • Distribute to available GPU workers │ │
│ │ • Handle retries and dead letter queue │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌─────────────────┼─────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GPU Worker 1│ │ GPU Worker 2│ │ GPU Worker N│ │
│ │ (FFmpeg) │ │ (FFmpeg) │ │ (FFmpeg) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Per-Worker Processing │ │
│ │ │ │
│ │ 1. Download raw video from S3 │ │
│ │ 2. Analyze: codec, resolution, framerate, duration │ │
│ │ 3. Transcode to each target resolution: │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ Input: 4K raw video │ │ │
│ │ │ │ │ │ │
│ │ │ ┌────┴────┬────────┬────────┐ │ │ │
│ │ │ ▼ ▼ ▼ ▼ │ │ │
│ │ │ 360p 720p 1080p 4K │ │ │
│ │ │ H.264 H.264 H.264 H.265 │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ │ 4. Segment each resolution into HLS chunks │ │
│ │ 5. Generate manifest files (.m3u8) │ │
│ │ 6. Extract thumbnails (every 10 seconds) │ │
│ │ 7. Upload all outputs to transcoded storage │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
FFmpeg Transcoding Commands
```bash
# Transcode to 720p with HLS segmentation
ffmpeg -i input.mp4 \
  -vf scale=1280:720 \
  -c:v libx264 -preset fast -crf 23 \
  -c:a aac -b:a 128k \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_filename '720p/segment_%03d.ts' \
  720p/playlist.m3u8

# Generate a thumbnail sprite (1 frame every 10 seconds)
ffmpeg -i input.mp4 \
  -vf "fps=1/10,scale=160:90,tile=10x10" \
  -frames:v 1 \
  thumbnail_sprite.jpg
```
HLS/DASH Adaptive Bitrate Streaming
┌─────────────────────────────────────────────────────────────────┐
│ HLS (HTTP Live Streaming) Structure │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Master Playlist (master.m3u8): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ #EXTM3U │ │
│ │ #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360 │ │
│ │ 360p/playlist.m3u8 │ │
│ │ #EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720│ │
│ │ 720p/playlist.m3u8 │ │
│ │ #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080│ │
│ │ 1080p/playlist.m3u8 │ │
│ │ #EXT-X-STREAM-INF:BANDWIDTH=14000000,RESOLUTION=3840x2160│ │
│ │ 4k/playlist.m3u8 │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Resolution Playlist (720p/playlist.m3u8): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ #EXTM3U │ │
│ │ #EXT-X-VERSION:3 │ │
│ │ #EXT-X-TARGETDURATION:6 │ │
│ │ #EXT-X-MEDIA-SEQUENCE:0 │ │
│ │ #EXTINF:6.000, │ │
│ │ segment_000.ts │ │
│ │ #EXTINF:6.000, │ │
│ │ segment_001.ts │ │
│ │ #EXTINF:6.000, │ │
│ │ segment_002.ts │ │
│ │ ... │ │
│ │ #EXT-X-ENDLIST │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Segment Files: │
│ ┌─────────┬─────────┬─────────┬─────────┬─────────┐ │
│ │ seg_000 │ seg_001 │ seg_002 │ seg_003 │ ... │ │
│ │ 6 sec │ 6 sec │ 6 sec │ 6 sec │ │ │
│ │ ~2 MB │ ~2 MB │ ~2 MB │ ~2 MB │ │ │
│ └─────────┴─────────┴─────────┴─────────┴─────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
🎓 Professor Tom
Why 6-second segments?
- Too short (2s): many HTTP requests, high overhead
- Too long (30s): slow quality switching, large buffers
- 6 seconds is the sweet spot: it balances latency and efficiency
- Netflix uses 4s, YouTube uses 5-6s
🔧 Raizo's Note
HLS vs DASH:
- HLS (Apple): uses .m3u8 playlists and .ts segments. Supported everywhere.
- DASH (industry standard): uses an .mpd manifest and .m4s segments. More flexible.
- YouTube uses both: HLS for iOS/Safari, DASH for Android/Chrome.
- In an interview, mention both and their trade-offs.
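Whichever protocol is used, the server must emit a master manifest listing every rendition. A sketch that generates the HLS master playlist shown earlier (the variant list mirrors the section's bitrates):

```python
# Generate an HLS master playlist from the rendition table above.
VARIANTS = [
    (800_000, "640x360", "360p"),
    (2_800_000, "1280x720", "720p"),
    (5_000_000, "1920x1080", "1080p"),
    (14_000_000, "3840x2160", "4k"),
]

def master_playlist(variants) -> str:
    lines = ["#EXTM3U"]
    for bandwidth, resolution, name in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/playlist.m3u8")   # path to the variant playlist
    return "\n".join(lines) + "\n"

print(master_playlist(VARIANTS))
```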
🚀 Optimization Techniques
1. Pre-fetching Next Segments
┌─────────────────────────────────────────────────────────────────┐
│ SEGMENT PRE-FETCHING │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Current playback position: Segment 5 │
│ │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐ │
│ │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │ │
│ │ ✓ │ ✓ │ ✓ │ ✓ │ ▶️ │ ⏳ │ ⏳ │ │ │ │
│ │played│played│played│played│playing│buffer│buffer│ │ │
│ └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘ │
│ │
│ Pre-fetch Strategy: │
│ • Always keep 2-3 segments buffered ahead │
│ • Start fetching segment N+1 when N is 50% played │
│ • Adjust buffer size based on network conditions │
│ │
│ Bandwidth Estimation: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Download time for segment 5: 0.8 seconds │ │
│ │ Segment size: 2 MB │ │
│ │ Estimated bandwidth: 2 MB / 0.8s = 2.5 MB/s = 20 Mbps │ │
│ │ → Safe to use 1080p (requires 6 Mbps) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
2. Video Deduplication
┌─────────────────────────────────────────────────────────────────┐
│ VIDEO DEDUPLICATION FLOW │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Problem: Same video uploaded multiple times │
│ • Re-uploads of viral videos │
│ • Same content from different users │
│ • Wastes storage and transcoding resources │
│ │
│ Solution: Content-based hashing │
│ │
│ ┌──────────────┐ │
│ │ New Upload │ │
│ │ (500 MB) │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Step 1: Calculate perceptual hash │ │
│ │ │ │
│ │ • Extract keyframes (every 30 seconds) │ │
│ │ • Calculate pHash for each keyframe │ │
│ │ • Combine into video fingerprint │ │
│ │ │ │
│ │ Video Hash: a3f2b8c1d4e5f6a7b8c9d0e1f2a3b4c5 │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Step 2: Check hash index │ │
│ │ │ │
│ │ SELECT video_id FROM video_hashes │ │
│ │ WHERE hash = 'a3f2b8c1...' │ │
│ │ OR hamming_distance(hash, 'a3f2b8c1...') < 5 │ │
│ └────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌─────────────────┴─────────────────┐ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Hash Found │ │ Hash Not │ │
│ │ (Duplicate) │ │ Found (New) │ │
│ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Link to existing│ │ Process normally│ │
│ │ transcoded files│ │ Add hash to │ │
│ │ Skip transcoding│ │ index │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Deduplication Implementation
```python
from typing import Optional

from PIL import Image
import imagehash

# Assumed helpers (not shown): extract_keyframes() decodes frames via
# FFmpeg/OpenCV, and `db` is the metadata database client.

def calculate_video_fingerprint(video_path: str) -> str:
    """
    Calculate a perceptual-hash fingerprint for video deduplication.
    Uses keyframe extraction + perceptual hashing.
    """
    keyframes = extract_keyframes(video_path, interval_seconds=30)

    # Calculate a perceptual hash for each keyframe
    frame_hashes = []
    for frame in keyframes:
        img = Image.fromarray(frame)
        phash = imagehash.phash(img, hash_size=16)
        frame_hashes.append(str(phash))

    # Concatenate the per-frame pHashes. Keep the raw hex rather than
    # hashing it again: Hamming distance is only meaningful on the pHash
    # bits themselves (a cryptographic digest would scramble similarity).
    return ''.join(frame_hashes)

def check_duplicate(fingerprint: str, threshold: int = 5) -> Optional[str]:
    """
    Check whether a video is a duplicate using Hamming distance.
    Returns the existing video_id if a duplicate is found.
    """
    # Prefix filter narrows candidates before the expensive distance check
    candidates = db.query("""
        SELECT video_id, fingerprint
        FROM video_fingerprints
        WHERE fingerprint_prefix = %s
    """, fingerprint[:8])

    for candidate in candidates:
        distance = hamming_distance(fingerprint, candidate.fingerprint)
        if distance < threshold:
            return candidate.video_id
    return None

def hamming_distance(hash1: str, hash2: str) -> int:
    """Hamming distance between two equal-length hex hashes."""
    return bin(int(hash1, 16) ^ int(hash2, 16)).count('1')
```
🔧 Raizo's Note
Perceptual Hash vs Cryptographic Hash:
- MD5/SHA: exact match only. A 1-bit change = a completely different hash.
- pHash: similar content = similar hash. Tolerant of re-encoding, cropping, watermarks.
- YouTube uses a combination: pHash for dedup, SHA for integrity verification.
3. CDN Optimization Strategies
| Strategy | Description | Impact |
|---|---|---|
| Geographic Distribution | Edge servers in 100+ PoPs globally | Reduce latency by 50-80% |
| Hot Content Caching | Popular videos cached at all edges | 95%+ cache hit ratio |
| Origin Shield | Middle-tier cache before origin | Reduce origin load 10x |
| Predictive Caching | Pre-cache trending videos | Zero cold-start for viral content |
| ISP Peering | Direct connections to major ISPs | Bypass public internet |
🎓 Professor Tom
Cache hit ratio is the most important CDN KPI:
- A 95% hit ratio means only 5% of requests reach the origin
- With 150 Tbps peak traffic, 5% = 7.5 Tbps of origin load
- If the hit ratio drops to 90%, origin load doubles!
⚖️ Trade-offs Analysis
Architecture Decision Matrix
| Decision | Option A | Option B | Chosen | Rationale |
|---|---|---|---|---|
| Upload Method | Single stream | Chunked upload | Chunked | Resumable, parallel, better UX |
| Transcoding | On-demand | Pre-transcode all | Pre-transcode | Instant playback, predictable costs |
| Storage Format | Single file per resolution | HLS segments | HLS segments | Adaptive streaming, CDN-friendly |
| Video Codec | H.264 only | H.264 + H.265 + AV1 | Multiple codecs | Balance compatibility vs efficiency |
| CDN Strategy | Single CDN | Multi-CDN | Multi-CDN | Redundancy, cost optimization |
| Thumbnail Generation | On-demand | Pre-generate | Pre-generate | Instant display, better UX |
Codec Trade-offs
| Codec | Compression | CPU Cost | Browser Support | Use Case |
|---|---|---|---|---|
| H.264 | Baseline | Low | 100% | Default, mobile |
| H.265 (HEVC) | 50% better | 2x | 60% (Safari, Edge) | 4K content |
| VP9 | 50% better | 2x | 80% (Chrome, Firefox) | YouTube default |
| AV1 | 30% better than VP9 | 4x | 40% (growing) | Future standard |
🔧 Raizo's Note
Codec selection strategy:
- Transcode to H.264 + VP9 for all videos (covers 99% devices)
- Add H.265 for 4K content (Safari users)
- AV1 for new uploads only (save storage long-term)
- Serve best codec based on client capability header
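On the serving side, the strategy above might look like the sketch below (the capability sets, the preference order, and the "H.265 only for 4K" rule are taken from the list above; how clients advertise support varies by protocol):

```python
# Server-side codec selection following the multi-codec strategy above.
CODEC_PREFERENCE = ["av1", "vp9", "h265", "h264"]   # best compression first

def choose_codec(client_supports: set, is_4k: bool = False) -> str:
    for codec in CODEC_PREFERENCE:
        if codec == "h265" and not is_4k:
            continue                  # H.265 reserved for 4K (Safari) per strategy
        if codec in client_supports:
            return codec
    return "h264"                     # universal fallback

print(choose_codec({"h264", "vp9"}))               # vp9
print(choose_codec({"h264", "h265"}, is_4k=True))  # h265
```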
🚨 Failure Scenarios & Mitigations
Scenario 1: Transcoding Worker Failure
| Aspect | Details |
|---|---|
| Impact | Video stuck in "Processing" state |
| Detection | Job timeout (> 30 min for 10-min video) |
| Mitigation | Auto-retry with exponential backoff |
| Fallback | Move to dead letter queue after 3 retries |
| Recovery | Manual review, re-queue with different worker |
┌─────────────────────────────────────────────────────────────────┐
│ TRANSCODING FAILURE HANDLING │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Job Queue ──► Worker 1 ──► FAIL (OOM) │
│ │ │
│ │ (retry_count < 3) │
│ ▼ │
│ Job Queue ──► Worker 2 ──► FAIL (corrupt file) │
│ │ │
│ │ (retry_count < 3) │
│ ▼ │
│ Job Queue ──► Worker 3 ──► SUCCESS ✓ │
│ │
│ OR if retry_count >= 3: │
│ │ │
│ ▼ │
│ Dead Letter Queue ──► Alert ──► Manual Review │
│ │
└─────────────────────────────────────────────────────────────────┘
Scenario 2: CDN Edge Server Failure
| Aspect | Details |
|---|---|
| Impact | Users in affected region experience buffering |
| Detection | Health checks fail, error rate spike |
| Mitigation | DNS failover to next closest PoP |
| Failover Time | < 30 seconds with anycast routing |
| User Impact | Brief rebuffer, then normal playback |
🔧 Raizo's Note
Multi-CDN strategy:
- Primary: Google Edge (owned infrastructure)
- Secondary: Akamai/CloudFront (backup)
- Real-time switching based on performance metrics
- Cost: 10-20% premium, but worth it for reliability
Scenario 3: Origin Storage (S3/GCS) Outage
| Aspect | Details |
|---|---|
| Impact | Cache misses fail, new videos unavailable |
| Detection | 5xx errors from origin, CDN cache miss failures |
| Mitigation | Multi-region replication (3 regions minimum) |
| Fallback | Serve from secondary region |
| Data Durability | 99.999999999% (11 nines) with replication |
Scenario 4: Metadata Database Failure
| Aspect | Details |
|---|---|
| Impact | Video pages fail to load, search broken |
| Detection | Database connection errors, latency spike |
| Mitigation | Read replicas in multiple AZs |
| Failover | Automatic promotion of replica (< 60s) |
| Cache Strategy | Redis cache absorbs read traffic during failover |
Scenario 5: Upload Service Overload
| Aspect | Details |
|---|---|
| Impact | Creators can't upload new videos |
| Detection | Queue depth spike, upload latency increase |
| Mitigation | Auto-scaling upload workers |
| Rate Limiting | Per-user upload limits (10 videos/hour) |
| Graceful Degradation | Queue uploads, process async |
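The per-user upload limit (10 videos/hour) can be sketched as a sliding-window counter. In production this would live in Redis (e.g. a sorted set per user) rather than process memory; the in-memory version below is illustrative:

```python
import time
from collections import defaultdict, deque
from typing import Optional

LIMIT = 10              # uploads allowed per user per window
WINDOW_SECONDS = 3600   # 1 hour

_uploads = defaultdict(deque)   # user_id -> timestamps of recent uploads

def allow_upload(user_id: str, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    q = _uploads[user_id]
    while q and q[0] <= now - WINDOW_SECONDS:   # drop entries outside the window
        q.popleft()
    if len(q) >= LIMIT:
        return False                            # over the limit: reject
    q.append(now)
    return True

# 10 uploads pass, the 11th within the hour is rejected
results = [allow_upload("creator_1", now=1000.0 + i) for i in range(11)]
print(results.count(True), results[-1])  # 10 False
```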
⚠️ Critical Failure: Viral Video Surge
Scenario: New viral video gets 100M views in 1 hour
Problem: CDN cache cold, all requests hit origin
Solution:
- Predictive caching: ML model detects viral potential early
- Origin shield: Middle-tier cache absorbs surge
- Request coalescing: Dedupe concurrent requests for same segment
- Backpressure: Rate limit if origin overwhelmed
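Request coalescing from the list above can be sketched as a single-flight wrapper around the origin fetch: concurrent cache misses for the same segment share one origin request instead of stampeding (a simplified threading version; real CDNs and origin shields implement this natively):

```python
import threading

class SingleFlight:
    """Coalesce concurrent fetches for the same key into one origin hit."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> Event signalled when the fetch completes
        self._results = {}    # key -> fetched bytes

    def fetch(self, key: str, origin_fetch) -> bytes:
        with self._lock:
            if key in self._results:            # already fetched and cached
                return self._results[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:                          # first caller becomes the leader
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            data = origin_fetch(key)            # exactly one origin request
            with self._lock:
                self._results[key] = data
                del self._inflight[key]
            event.set()
            return data
        event.wait()                            # followers wait for the leader
        return self._results[key]

calls = []
sf = SingleFlight()
def origin(key):
    calls.append(key)
    return b"segment-bytes"

threads = [threading.Thread(target=sf.fetch, args=("seg_042", origin)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
```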
💰 Cost Estimation
Monthly Infrastructure Costs (2B MAU Scale)
| Service | Specification | Unit Cost | Monthly Cost |
|---|---|---|---|
| Transcoding (GPU) | 1,000 × p3.2xlarge (spot) | $1.00/hr (spot) | $720,000 |
| API Servers | 500 × c5.2xlarge | $0.34/hr | $122,000 |
| Upload Workers | 100 × c5.xlarge | $0.17/hr | $12,000 |
| Raw Storage (S3) | 250 TB (temporary) | $0.023/GB | $6,000 |
| Transcoded Storage | 100 PB | $0.021/GB (Glacier IA) | $2,100,000 |
| CDN Bandwidth | 550 PB egress | $0.02/GB (negotiated) | $11,000,000 |
| CDN Storage (Edge) | 10 PB hot cache | $0.10/GB | $1,000,000 |
| Metadata DB | 100 TB PostgreSQL/Vitess | $0.115/GB | $12,000 |
| Search Cluster | 50 × Elasticsearch nodes | $0.50/hr | $18,000 |
| Redis Cache | 1 TB cluster | $0.068/GB/hr | $50,000 |
| Message Queue | Pub/Sub 100B messages | $0.04/M | $4,000 |
| ML Inference | 200 × GPU instances | $0.90/hr | $130,000 |
Cost Summary
| Category | Monthly Cost | % of Total |
|---|---|---|
| CDN & Bandwidth | $12,000,000 | 75% |
| Storage | $2,106,000 | 13% |
| Compute (Transcoding) | $720,000 | 4.5% |
| Compute (API/Workers) | $134,000 | 0.8% |
| ML/Recommendations | $130,000 | 0.8% |
| Database & Cache | $80,000 | 0.5% |
| Other | $830,000 | 5.4% |
| Total | ~$16,000,000 | 100% |
🎓 Professor Tom
CDN is 75% of the cost! This is why YouTube:
- Built Google Global Cache (GGC), placing servers inside ISP data centers
- Negotiates peering agreements with major ISPs
- Invests heavily in video compression (VP9, AV1)
- Every 1% improvement in compression = $120K/month in savings
Cost Optimization Strategies
| Strategy | Savings | Implementation |
|---|---|---|
| Spot Instances for Transcoding | 70% | Interruptible workloads OK |
| S3 Intelligent Tiering | 40% | Auto-move cold videos |
| Reserved Instances (API) | 30% | 1-year commitment |
| VP9/AV1 Codec | 30-50% bandwidth | Better compression |
| ISP Peering | 50% CDN | Direct connections |
| Delete Raw After Transcode | 100% raw storage | Immediate cleanup |
Cost per User Metrics
Monthly cost: $16,000,000
MAU: 2,000,000,000
Cost per MAU per month = $16M / 2B = $0.008
Cost per MAU per year = $0.008 × 12 = $0.096
Revenue per user (ads): ~$7-15/year
Gross margin: Healthy but thin!
Cost per video view:
Daily views: 4B
Monthly views: 120B
Cost per view = $16M / 120B = $0.00013 (~0.01 cents)
🔧 Raizo's Note
Hidden costs not in the table:
- Content moderation: ML + human review = $50M+/month
- Legal/Copyright: Content ID system, licensing = $100M+/month
- Cross-region data transfer: $0.02/GB adds up
- Monitoring/Logging: petabytes of logs = $1M+/month
- Security/DDoS protection: $500K+/month
YouTube's actual operating cost is likely $50-100M/month.
🎯 Interview Checklist
Must-Mention Items ✅
| Topic | Key Points |
|---|---|
| Scale Estimation | 2B MAU, 500K uploads/day, 550 PB/day egress |
| Chunked Upload | Resumable, parallel, presigned URLs |
| Transcoding Pipeline | Async workers, multiple resolutions, GPU acceleration |
| HLS/DASH Streaming | Adaptive bitrate, 6-second segments, manifest files |
| CDN Architecture | Edge caching, 95%+ hit ratio, multi-region |
| Video Deduplication | Perceptual hashing, storage optimization |
Bonus Points 🌟
- Codec Evolution: H.264 → VP9 → AV1 trade-offs
- Live Streaming: Different architecture (RTMP ingest, low-latency HLS)
- Content ID: Copyright detection using audio/video fingerprinting
- Recommendation System: Collaborative filtering, watch history, engagement signals
- Thumbnail A/B Testing: Multiple thumbnails, CTR optimization
- Video Quality Metrics: VMAF score, buffering ratio, startup time
Common Mistakes ❌
| Mistake | Why It's Wrong | Better Approach |
|---|---|---|
| Single file upload | Can't resume, poor UX | Chunked upload with presigned URLs |
| Transcode on-demand | High latency, unpredictable | Pre-transcode all resolutions |
| Single resolution | Wastes bandwidth on slow connections | Adaptive bitrate streaming |
| Store raw videos | Massive storage cost | Delete after transcoding |
| Single CDN | Single point of failure | Multi-CDN with failover |
| Ignore deduplication | Wasted storage/compute | Perceptual hashing |
⚠️ Interview Red Flags
- Not mentioning chunked upload for large files
- Not knowing HLS/DASH adaptive streaming
- Not addressing CDN and edge caching
- Designing synchronous transcoding (blocks the upload)
- Having no video deduplication strategy
- Underestimating bandwidth costs
🎓 Key Takeaways
- CDN is the biggest cost: 75% of the total, so optimize relentlessly
- Chunked upload is a must-have for large file uploads
- Pre-transcode all resolutions for instant playback
- HLS/DASH for adaptive streaming: there is no alternative
- Perceptual hashing for video deduplication: saves storage + compute
- Multi-codec strategy: balance compatibility vs compression
- Async processing: decouple upload from transcoding
🔗 Navigation
Related Case Studies
| Case Study | Key Learning | Link |
|---|---|---|
| Twitter | Fan-out patterns, timeline caching | Design Twitter → |
| Uber | Real-time location, geospatial indexing | Design Uber → |
| WhatsApp | Messaging, E2EE, connection management | Design WhatsApp → |