📺 DESIGN YOUTUBE

Video Streaming & Content Delivery at Scale

🎓 Professor Tom

YouTube is the classic problem of combining a Video Processing Pipeline with a Content Delivery Network. How do you upload, transcode, and stream video to billions of users with low latency? This is where you learn about chunked upload, adaptive bitrate streaming, and CDN architecture.

📊 Back-of-Envelope Calculations

Scale Assumptions

| Metric | Value | Rationale |
|---|---|---|
| Monthly Active Users (MAU) | 2B | Global video platform |
| Daily Active Users (DAU) | 800M | ~40% of MAU |
| Videos watched per user per day | 5 | Average viewing sessions |
| Average video duration | 5 minutes | Mix of short and long content |
| Video uploads per day | 500K | Creator uploads |
| Average upload size (raw) | 500 MB | HD video before compression |
| Storage per minute (compressed) | 50 MB | After transcoding to multiple resolutions |

Video Upload & Storage Calculations

Daily Upload Volume:
Videos uploaded/day = 500,000 videos
Raw upload size = 500K × 500 MB = 250 TB/day

Transcoded Storage (per video):
- 360p: ~10 MB/min
- 720p: ~25 MB/min  
- 1080p: ~50 MB/min
- 4K: ~200 MB/min
Total per minute = ~285 MB/min (all resolutions)

Average 5-min video = 285 MB × 5 = 1.425 GB (all resolutions)
Daily transcoded storage = 500K × 1.425 GB = ~712 TB/day
Monthly storage growth = 712 TB × 30 = ~21 PB/month

Storage Breakdown by Resolution

| Resolution | Bitrate | Storage/min | % of Views |
|---|---|---|---|
| 360p | 1 Mbps | 7.5 MB | 15% |
| 720p | 3 Mbps | 22.5 MB | 35% |
| 1080p | 6 Mbps | 45 MB | 40% |
| 4K | 25 Mbps | 187.5 MB | 10% |

Transcoding Capacity Calculations

Transcoding Requirements:
Videos to transcode/day = 500,000
Average video duration = 5 minutes
Total minutes to transcode = 500K × 5 = 2.5M minutes/day

Transcoding time (per resolution):
- Real-time transcoding: 1 min video = 1 min processing
- With GPU acceleration: 1 min video = 0.1 min processing (10x faster)

Total transcoding work = 2.5M min × 4 resolutions = 10M min/day
With GPU workers: 10M / 10 = 1M GPU-minutes/day

Workers needed (24/7):
1M GPU-min / 1440 min/day = ~700 GPU workers (sustained)
Peak capacity (3x): ~2,100 GPU workers
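A quick script to sanity-check the worker math above (same assumptions as the calculation: 500K uploads/day, 5-minute average, 4 output resolutions, 10x GPU speedup):

```python
# Sanity-check the transcoding capacity estimate above.
uploads_per_day = 500_000
avg_minutes = 5
resolutions = 4
gpu_speedup = 10            # 1 min of video ~= 0.1 min of GPU time

video_minutes = uploads_per_day * avg_minutes      # 2.5M video-min/day
work_minutes = video_minutes * resolutions         # 10M min of transcoding/day
gpu_minutes = work_minutes / gpu_speedup           # 1M GPU-min/day

sustained_workers = gpu_minutes / (24 * 60)        # ~694, i.e. "~700 sustained"
peak_workers = sustained_workers * 3               # ~2,083, i.e. "~2,100 at peak"
```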

CDN Bandwidth Calculations

Video Streaming Volume:
Daily views = 800M DAU × 5 videos = 4B video views/day
Average video duration watched = 3 minutes (not all finish)
Total watch time = 4B × 3 min = 12B minutes/day

Bandwidth per resolution (weighted average):
Weighted bitrate = (0.15 × 1) + (0.35 × 3) + (0.40 × 6) + (0.10 × 25)
                 = 0.15 + 1.05 + 2.4 + 2.5 = 6.1 Mbps average

Daily bandwidth = 12B min × 60 sec × 6.1 Mbps
                = 4.4 × 10^12 Mb = 4.4 Exabits/day
                = 550 PB/day egress

Peak bandwidth:
Average egress = 550 PB / 86400 sec = 6.4 TB/s = 51 Tbps
Peak egress (3x) = ~150 Tbps

🔧 Raizo's Note

The CDN is YouTube's biggest cost. At a list price of $0.02/GB, 550 PB/day of egress works out to $11M/day in bandwidth alone! This is why YouTube invests heavily in edge caching and peering agreements with ISPs, driving its effective rate far below list price.

QPS Calculations

Video Metadata Requests:
- Video page loads = 4B/day
- Search queries = 1B/day
- Recommendations = 4B/day
Total metadata QPS = 9B / 86400 = ~104,000 QPS
Peak QPS = 104K × 3 = ~312,000 QPS

Upload API:
- Upload initiations = 500K/day
- Chunk uploads = 500K × 100 chunks = 50M/day
Upload QPS = 50M / 86400 = ~580 QPS (relatively low)
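The bandwidth and QPS estimates above can be re-derived in a few lines:

```python
# Re-derive the CDN bandwidth and QPS envelope numbers above.
dau = 800e6
views = dau * 5                        # 4B video views/day
watch_minutes = views * 3              # 12B watch-minutes/day

# Views-weighted average bitrate (Mbps), per the resolution mix table
avg_mbps = 0.15 * 1 + 0.35 * 3 + 0.40 * 6 + 0.10 * 25   # 6.1 Mbps

daily_megabits = watch_minutes * 60 * avg_mbps
daily_pb = daily_megabits / 8 / 1e9    # megabits -> megabytes -> petabytes (~550)
avg_tbps = daily_megabits / 86_400 / 1e6   # 1e6 Mb = 1 Tb; average ~51 Tbps

metadata_qps = (4e9 + 1e9 + 4e9) / 86_400  # page loads + search + recs, ~104K QPS
```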

🏗️ High-Level Architecture

Component Responsibilities

| Component | Responsibility | Technology |
|---|---|---|
| CDN Edge Servers | Cache & serve video segments globally | Akamai/CloudFront/Google Edge |
| Upload Service | Handle chunked, resumable uploads | Go microservice |
| Video Service | Video metadata CRUD, status management | Go/Java microservice |
| Transcoding Workers | Convert raw video to multiple resolutions | FFmpeg on GPU instances |
| Thumbnail Generator | Extract frames, generate thumbnails | FFmpeg + ImageMagick |
| Search Service | Full-text search on titles, descriptions | Elasticsearch cluster |
| Recommendation Service | Personalized video suggestions | TensorFlow/PyTorch ML models |
| Raw Storage | Temporary storage for uploaded videos | S3/GCS (deleted after processing) |
| Transcoded Storage | Permanent storage for all resolutions | S3/GCS with lifecycle policies |
| Metadata DB | Video info, user data, comments | PostgreSQL/Vitess (sharded) |

🔧 Raizo's Note

Why separate Raw and Transcoded storage?

Raw videos are very large (500 MB+) and only need to be kept temporarily. Once transcoding finishes, the raw file is deleted to save storage cost. Transcoded videos are much smaller and are kept permanently in multiple resolutions.

🔄 Core Flows

Flow 1: Video Upload & Processing

Flow 2: Video Playback (Streaming)

🎓 Professor Tom

Adaptive Bitrate Streaming (ABR) is the key feature:

  • The player continuously monitors available bandwidth
  • It switches automatically between quality levels
  • No user intervention is needed
  • Less buffering, more watch time

💡 Deep Dive: Video Processing Pipeline

The Core Challenge

Uploading a 500 MB video and converting it into 4 resolutions with HLS segments is a complex process. How do you handle 500K uploads/day reliably and efficiently?

Chunked Upload Mechanism

┌─────────────────────────────────────────────────────────────────┐
│                    CHUNKED UPLOAD FLOW                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Original Video: 500 MB                                         │
│       │                                                         │
│       ▼                                                         │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Split into 5MB chunks                       │   │
│  │  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐     │   │
│  │  │  1  │  2  │  3  │  4  │  5  │ ... │ 99  │ 100 │     │   │
│  │  │ 5MB │ 5MB │ 5MB │ 5MB │ 5MB │     │ 5MB │ 5MB │     │   │
│  │  └──┬──┴──┬──┴──┬──┴──┬──┴──┬──┴─────┴──┬──┴──┬──┘     │   │
│  └─────┼─────┼─────┼─────┼─────┼───────────┼─────┼────────┘   │
│        │     │     │     │     │           │     │             │
│        ▼     ▼     ▼     ▼     ▼           ▼     ▼             │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │           Parallel Upload to S3 (Presigned URLs)        │   │
│  │                                                          │   │
│  │   • Each chunk gets unique presigned URL                │   │
│  │   • Upload in parallel (5-10 concurrent)                │   │
│  │   • Retry failed chunks independently                   │   │
│  │   • Track progress: chunks_uploaded / total_chunks      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Benefits:                                                      │
│  ✅ Resumable: Network fails? Resume from last chunk           │
│  ✅ Parallel: 10x faster than single stream                    │
│  ✅ Progress: Show accurate upload percentage                  │
│  ✅ Retry: Only retry failed chunks, not entire file           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
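The flow above can be sketched as a small client-side helper. This is a minimal sketch under stated assumptions: `put_chunk` stands in for an HTTP PUT to a chunk's presigned URL, and the upload-service call that issues those URLs is omitted.

```python
# Minimal sketch of a resumable chunked upload client.
# `put_chunk(index, data)` is a hypothetical stand-in for an HTTP PUT
# to that chunk's presigned URL.
import concurrent.futures
import os

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching the diagram above

def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE):
    """Yield (index, offset, length) for each chunk of the file."""
    for index, offset in enumerate(range(0, file_size, chunk_size)):
        yield index, offset, min(chunk_size, file_size - offset)

def upload_file(path: str, put_chunk, already_done=frozenset(),
                concurrency: int = 8):
    """Upload every chunk not in `already_done`; return completed indices.
    On failure, call again with the returned set to resume/retry."""
    size = os.path.getsize(path)
    done = set(already_done)
    with open(path, "rb") as f, \
         concurrent.futures.ThreadPoolExecutor(concurrency) as pool:
        futures = {}
        for index, offset, length in chunk_ranges(size):
            if index in done:
                continue  # resume: skip chunks that already made it
            f.seek(offset)
            data = f.read(length)
            futures[pool.submit(put_chunk, index, data)] = index
        for fut in concurrent.futures.as_completed(futures):
            if fut.exception() is None:   # only successful chunks are marked done
                done.add(futures[fut])
    return done
```

Progress is simply `len(done) / total_chunks`, and a retry only re-sends the chunks missing from `done`, matching the benefits listed in the diagram.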

Transcoding Pipeline Architecture

┌─────────────────────────────────────────────────────────────────┐
│                 TRANSCODING PIPELINE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Raw Video (S3)                                                 │
│       │                                                         │
│       ▼                                                         │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                  Job Scheduler                           │   │
│  │  • Priority queue (premium creators first)              │   │
│  │  • Distribute to available GPU workers                  │   │
│  │  • Handle retries and dead letter queue                 │   │
│  └────────────────────────┬────────────────────────────────┘   │
│                           │                                     │
│         ┌─────────────────┼─────────────────┐                  │
│         ▼                 ▼                 ▼                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐            │
│  │ GPU Worker 1│  │ GPU Worker 2│  │ GPU Worker N│            │
│  │   (FFmpeg)  │  │   (FFmpeg)  │  │   (FFmpeg)  │            │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘            │
│         │                │                │                    │
│         └────────────────┼────────────────┘                    │
│                          ▼                                      │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Per-Worker Processing                       │   │
│  │                                                          │   │
│  │  1. Download raw video from S3                          │   │
│  │  2. Analyze: codec, resolution, framerate, duration     │   │
│  │  3. Transcode to each target resolution:                │   │
│  │     ┌────────────────────────────────────────────┐      │   │
│  │     │  Input: 4K raw video                       │      │   │
│  │     │         │                                  │      │   │
│  │     │    ┌────┴────┬────────┬────────┐          │      │   │
│  │     │    ▼         ▼        ▼        ▼          │      │   │
│  │     │  360p      720p    1080p      4K          │      │   │
│  │     │  H.264     H.264   H.264    H.265         │      │   │
│  │     └────────────────────────────────────────────┘      │   │
│  │  4. Segment each resolution into HLS chunks             │   │
│  │  5. Generate manifest files (.m3u8)                     │   │
│  │  6. Extract thumbnails (every 10 seconds)               │   │
│  │  7. Upload all outputs to transcoded storage            │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

FFmpeg Transcoding Commands

```bash
# Transcode to 720p with HLS segmentation
ffmpeg -i input.mp4 \
  -vf scale=1280:720 \
  -c:v libx264 -preset fast -crf 23 \
  -c:a aac -b:a 128k \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_filename '720p/segment_%03d.ts' \
  720p/playlist.m3u8

# Generate thumbnail sprite (1 frame every 10 seconds)
ffmpeg -i input.mp4 \
  -vf "fps=1/10,scale=160:90,tile=10x10" \
  -frames:v 1 \
  thumbnail_sprite.jpg
```

HLS/DASH Adaptive Bitrate Streaming

┌─────────────────────────────────────────────────────────────────┐
│              HLS (HTTP Live Streaming) Structure                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Master Playlist (master.m3u8):                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  #EXTM3U                                                 │   │
│  │  #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360  │   │
│  │  360p/playlist.m3u8                                      │   │
│  │  #EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720│   │
│  │  720p/playlist.m3u8                                      │   │
│  │  #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080│  │
│  │  1080p/playlist.m3u8                                     │   │
│  │  #EXT-X-STREAM-INF:BANDWIDTH=14000000,RESOLUTION=3840x2160│ │
│  │  4k/playlist.m3u8                                        │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Resolution Playlist (720p/playlist.m3u8):                      │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  #EXTM3U                                                 │   │
│  │  #EXT-X-VERSION:3                                        │   │
│  │  #EXT-X-TARGETDURATION:6                                 │   │
│  │  #EXT-X-MEDIA-SEQUENCE:0                                 │   │
│  │  #EXTINF:6.000,                                          │   │
│  │  segment_000.ts                                          │   │
│  │  #EXTINF:6.000,                                          │   │
│  │  segment_001.ts                                          │   │
│  │  #EXTINF:6.000,                                          │   │
│  │  segment_002.ts                                          │   │
│  │  ...                                                     │   │
│  │  #EXT-X-ENDLIST                                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Segment Files:                                                 │
│  ┌─────────┬─────────┬─────────┬─────────┬─────────┐          │
│  │ seg_000 │ seg_001 │ seg_002 │ seg_003 │   ...   │          │
│  │  6 sec  │  6 sec  │  6 sec  │  6 sec  │         │          │
│  │  ~2 MB  │  ~2 MB  │  ~2 MB  │  ~2 MB  │         │          │
│  └─────────┴─────────┴─────────┴─────────┴─────────┘          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
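The master playlist above is plain text and easy to generate. A minimal sketch (the bandwidth/resolution values simply mirror the example box, not real encoder output):

```python
# Build an HLS master playlist like the one shown above.
# Bandwidth/resolution values mirror the example box; a real encoder
# would report measured bitrates per rendition.
RENDITIONS = [
    ("360p", 800_000, "640x360"),
    ("720p", 2_800_000, "1280x720"),
    ("1080p", 5_000_000, "1920x1080"),
    ("4k", 14_000_000, "3840x2160"),
]

def master_playlist(renditions=RENDITIONS) -> str:
    lines = ["#EXTM3U"]
    for name, bandwidth, resolution in renditions:
        # One STREAM-INF line per rendition, followed by its playlist URI
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}"
        )
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"
```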

🎓 Professor Tom

Why 6-second segments?

  • Too short (2s): many HTTP requests, high overhead
  • Too long (30s): slow quality switches, large buffers
  • 6 seconds is the sweet spot: a balance between latency and efficiency
  • Netflix uses 4s, YouTube uses 5-6s

🔧 Raizo's Note

HLS vs DASH:

  • HLS (Apple): uses .m3u8 playlists and .ts segments. Supported everywhere.
  • DASH (industry standard): uses .mpd manifests and .m4s segments. More flexible.
  • YouTube uses both: HLS for iOS/Safari, DASH for Android/Chrome.
  • In an interview, mention both and their trade-offs.

🚀 Optimization Techniques

1. Pre-fetching Next Segments

┌─────────────────────────────────────────────────────────────────┐
│                    SEGMENT PRE-FETCHING                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Current playback position: Segment 5                           │
│                                                                 │
│  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐      │
│  │  1  │  2  │  3  │  4  │  5  │  6  │  7  │  8  │  9  │      │
│  │ ✓   │ ✓   │ ✓   │ ✓   │ ▶️  │ ⏳  │ ⏳  │     │     │      │
│  │played│played│played│played│playing│buffer│buffer│     │     │
│  └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘      │
│                                                                 │
│  Pre-fetch Strategy:                                            │
│  • Always keep 2-3 segments buffered ahead                     │
│  • Start fetching segment N+1 when N is 50% played             │
│  • Adjust buffer size based on network conditions              │
│                                                                 │
│  Bandwidth Estimation:                                          │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Download time for segment 5: 0.8 seconds               │   │
│  │  Segment size: 2 MB                                      │   │
│  │  Estimated bandwidth: 2 MB / 0.8s = 2.5 MB/s = 20 Mbps  │   │
│  │  → Safe to use 1080p (requires 6 Mbps)                  │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
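The buffer-and-estimate logic above can be sketched as follows; `safety` (an assumed 0.8 headroom factor) keeps the player from picking a rendition at the very edge of the measured throughput:

```python
# Sketch of the player-side ABR logic above: estimate throughput from the
# last segment download, then pick the highest sustainable rendition.
BITRATES_MBPS = {"360p": 1, "720p": 3, "1080p": 6, "4k": 25}

def estimate_mbps(segment_bytes: int, download_seconds: float) -> float:
    """Throughput estimate from one segment download (bytes -> megabits/s)."""
    return segment_bytes * 8 / download_seconds / 1e6

def pick_rendition(estimated_mbps: float, safety: float = 0.8) -> str:
    """Best rendition whose bitrate fits within a safety margin."""
    budget = estimated_mbps * safety
    best = "360p"  # always keep a floor rendition to avoid stalling out
    for name, mbps in BITRATES_MBPS.items():
        if mbps <= budget and mbps > BITRATES_MBPS[best]:
            best = name
    return best
```

With the numbers from the box (2 MB in 0.8 s, ~20 Mbps), the budget is 16 Mbps, so the player picks 1080p (6 Mbps) and not 4K (25 Mbps).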

2. Video Deduplication

┌─────────────────────────────────────────────────────────────────┐
│                  VIDEO DEDUPLICATION FLOW                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Problem: Same video uploaded multiple times                    │
│  • Re-uploads of viral videos                                   │
│  • Same content from different users                            │
│  • Wastes storage and transcoding resources                     │
│                                                                 │
│  Solution: Content-based hashing                                │
│                                                                 │
│  ┌──────────────┐                                              │
│  │ New Upload   │                                              │
│  │ (500 MB)     │                                              │
│  └──────┬───────┘                                              │
│         │                                                       │
│         ▼                                                       │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Step 1: Calculate perceptual hash                       │   │
│  │                                                          │   │
│  │  • Extract keyframes (every 30 seconds)                 │   │
│  │  • Calculate pHash for each keyframe                    │   │
│  │  • Combine into video fingerprint                       │   │
│  │                                                          │   │
│  │  Video Hash: a3f2b8c1d4e5f6a7b8c9d0e1f2a3b4c5           │   │
│  └────────────────────────┬────────────────────────────────┘   │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Step 2: Check hash index                                │   │
│  │                                                          │   │
│  │  SELECT video_id FROM video_hashes                      │   │
│  │  WHERE hash = 'a3f2b8c1...'                             │   │
│  │  OR hamming_distance(hash, 'a3f2b8c1...') < 5           │   │
│  └────────────────────────┬────────────────────────────────┘   │
│                           │                                     │
│         ┌─────────────────┴─────────────────┐                  │
│         ▼                                   ▼                  │
│  ┌─────────────┐                    ┌─────────────┐           │
│  │ Hash Found  │                    │ Hash Not    │           │
│  │ (Duplicate) │                    │ Found (New) │           │
│  └──────┬──────┘                    └──────┬──────┘           │
│         │                                  │                   │
│         ▼                                  ▼                   │
│  ┌─────────────────┐              ┌─────────────────┐         │
│  │ Link to existing│              │ Process normally│         │
│  │ transcoded files│              │ Add hash to     │         │
│  │ Skip transcoding│              │ index           │         │
│  └─────────────────┘              └─────────────────┘         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Deduplication Implementation

```python
from typing import Optional

from PIL import Image
import imagehash

def calculate_video_fingerprint(video_path: str) -> str:
    """
    Calculate a perceptual-hash fingerprint for video deduplication.
    Uses keyframe extraction + perceptual hashing.

    Note: the fingerprint is the concatenation of per-frame pHashes, NOT a
    cryptographic digest of them -- running the result through SHA-256 would
    destroy the locality that makes Hamming-distance comparison meaningful.
    """
    # extract_keyframes is an assumed helper (e.g. built on FFmpeg/OpenCV)
    keyframes = extract_keyframes(video_path, interval_seconds=30)

    # Calculate a perceptual hash for each keyframe
    frame_hashes = []
    for frame in keyframes:
        img = Image.fromarray(frame)
        phash = imagehash.phash(img, hash_size=16)
        frame_hashes.append(str(phash))

    # Concatenate frame hashes into the video fingerprint
    return ''.join(frame_hashes)

def check_duplicate(fingerprint: str, threshold: int = 5) -> Optional[str]:
    """
    Check whether a video is a duplicate using Hamming distance.
    Returns the existing video_id if a duplicate is found.
    """
    # Query videos with similar fingerprints (db is an assumed DB client)
    candidates = db.query("""
        SELECT video_id, fingerprint
        FROM video_fingerprints
        WHERE fingerprint_prefix = %s
    """, fingerprint[:8])  # use the prefix for initial filtering

    for candidate in candidates:
        distance = hamming_distance(fingerprint, candidate.fingerprint)
        if distance < threshold:
            return candidate.video_id

    return None

def hamming_distance(hash1: str, hash2: str) -> int:
    """Hamming distance between two equal-length hex hashes."""
    return bin(int(hash1, 16) ^ int(hash2, 16)).count('1')
```

🔧 Raizo's Note

Perceptual hash vs cryptographic hash:

  • MD5/SHA: exact match only. A 1-bit change produces a completely different hash.
  • pHash: similar content yields similar hashes. Tolerant to re-encoding, cropping, and watermarks.
  • YouTube uses a combination: pHash for dedup, SHA for integrity verification.

3. CDN Optimization Strategies

| Strategy | Description | Impact |
|---|---|---|
| Geographic Distribution | Edge servers in 100+ PoPs globally | Reduce latency by 50-80% |
| Hot Content Caching | Popular videos cached at all edges | 95%+ cache hit ratio |
| Origin Shield | Middle-tier cache before origin | Reduce origin load 10x |
| Predictive Caching | Pre-cache trending videos | Zero cold-start for viral content |
| ISP Peering | Direct connections to major ISPs | Bypass the public internet |

🎓 Professor Tom

Cache hit ratio is the most important CDN KPI:

  • 95% hit ratio = only 5% of requests reach the origin
  • With 150 Tbps peak traffic, 5% = 7.5 Tbps of origin load
  • If the hit ratio drops to 90%, origin load doubles!
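A one-liner makes that sensitivity concrete (peak figure from the bandwidth calculations earlier):

```python
# Origin load as a function of CDN cache hit ratio, at 150 Tbps peak.
def origin_tbps(hit_ratio: float, peak_tbps: float = 150.0) -> float:
    """Traffic that misses the cache and must be served by the origin."""
    return peak_tbps * (1.0 - hit_ratio)
```

Going from a 95% to a 90% hit ratio doubles origin load from 7.5 to 15 Tbps.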

⚖️ Trade-offs Analysis

Architecture Decision Matrix

| Decision | Option A | Option B | Chosen | Rationale |
|---|---|---|---|---|
| Upload Method | Single stream | Chunked upload | Chunked | Resumable, parallel, better UX |
| Transcoding | On-demand | Pre-transcode all | Pre-transcode | Instant playback, predictable costs |
| Storage Format | Single file per resolution | HLS segments | HLS segments | Adaptive streaming, CDN-friendly |
| Video Codec | H.264 only | H.264 + H.265 + AV1 | Multiple codecs | Balance compatibility vs efficiency |
| CDN Strategy | Single CDN | Multi-CDN | Multi-CDN | Redundancy, cost optimization |
| Thumbnail Generation | On-demand | Pre-generate | Pre-generate | Instant display, better UX |

Codec Trade-offs

| Codec | Compression | CPU Cost | Browser Support | Use Case |
|---|---|---|---|---|
| H.264 | Baseline | Low | 100% | Default, mobile |
| H.265 (HEVC) | 50% better | 2x | 60% (Safari, Edge) | 4K content |
| VP9 | 50% better | 2x | 80% (Chrome, Firefox) | YouTube default |
| AV1 | 30% better than VP9 | 4x | 40% (growing) | Future standard |

🔧 Raizo's Note

Codec selection strategy:

  • Transcode to H.264 + VP9 for all videos (covers 99% devices)
  • Add H.265 for 4K content (Safari users)
  • AV1 for new uploads only (save storage long-term)
  • Serve best codec based on client capability header
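A sketch of the last bullet, assuming the client reports which codecs it supports (the codec labels here are illustrative, not real capability-header values):

```python
# Serve the most bandwidth-efficient codec the client supports.
# Preference order follows the strategy above: AV1 > VP9 > H.265 > H.264.
PREFERENCE = ["av1", "vp9", "h265", "h264"]  # best compression first

def pick_codec(client_supported: set) -> str:
    for codec in PREFERENCE:
        if codec in client_supported:
            return codec
    return "h264"  # universal fallback
```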

🚨 Failure Scenarios & Mitigations

Scenario 1: Transcoding Worker Failure

| Aspect | Details |
|---|---|
| Impact | Video stuck in "Processing" state |
| Detection | Job timeout (> 30 min for a 10-min video) |
| Mitigation | Auto-retry with exponential backoff |
| Fallback | Move to dead letter queue after 3 retries |
| Recovery | Manual review, re-queue with a different worker |

┌─────────────────────────────────────────────────────────────────┐
│              TRANSCODING FAILURE HANDLING                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Job Queue ──► Worker 1 ──► FAIL (OOM)                         │
│      │                                                          │
│      │ (retry_count < 3)                                       │
│      ▼                                                          │
│  Job Queue ──► Worker 2 ──► FAIL (corrupt file)                │
│      │                                                          │
│      │ (retry_count < 3)                                       │
│      ▼                                                          │
│  Job Queue ──► Worker 3 ──► SUCCESS ✓                          │
│                                                                 │
│  OR if retry_count >= 3:                                       │
│      │                                                          │
│      ▼                                                          │
│  Dead Letter Queue ──► Alert ──► Manual Review                 │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
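The retry/dead-letter loop above, as a minimal single-process sketch (a real scheduler would also sleep with exponential backoff between attempts, per the mitigation table):

```python
# Sketch of the retry/dead-letter flow above: re-queue a failed job until
# the retry budget is exhausted, then move it to a dead letter queue.
from collections import deque

MAX_RETRIES = 3

def run_queue(jobs, process, dead_letter_queue):
    """Process jobs; on failure, retry up to MAX_RETRIES total attempts,
    then append the job to `dead_letter_queue` for alerting/manual review."""
    queue = deque((job, 0) for job in jobs)
    while queue:
        job, attempts = queue.popleft()
        try:
            process(job)
        except Exception:
            if attempts + 1 < MAX_RETRIES:
                queue.append((job, attempts + 1))  # re-queue (possibly to another worker)
            else:
                dead_letter_queue.append(job)      # alert + manual review
```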

Scenario 2: CDN Edge Server Failure

| Aspect | Details |
|---|---|
| Impact | Users in the affected region experience buffering |
| Detection | Health checks fail, error rate spikes |
| Mitigation | DNS failover to the next-closest PoP |
| Failover Time | < 30 seconds with anycast routing |
| User Impact | Brief rebuffering, then normal playback |

🔧 Raizo's Note

Multi-CDN strategy:

  • Primary: Google Edge (owned infrastructure)
  • Secondary: Akamai/CloudFront (backup)
  • Real-time switching based on performance metrics
  • Cost: 10-20% premium, but worth it for reliability

Scenario 3: Origin Storage (S3/GCS) Outage

| Aspect | Details |
|---|---|
| Impact | Cache misses fail, new videos unavailable |
| Detection | 5xx errors from origin, CDN cache-miss failures |
| Mitigation | Multi-region replication (3 regions minimum) |
| Fallback | Serve from a secondary region |
| Data Durability | 99.999999999% (11 nines) with replication |

Scenario 4: Metadata Database Failure

| Aspect | Details |
|---|---|
| Impact | Video pages fail to load, search broken |
| Detection | Database connection errors, latency spikes |
| Mitigation | Read replicas in multiple AZs |
| Failover | Automatic promotion of a replica (< 60s) |
| Cache Strategy | Redis cache absorbs read traffic during failover |

Scenario 5: Upload Service Overload

| Aspect | Details |
|---|---|
| Impact | Creators can't upload new videos |
| Detection | Queue depth spikes, upload latency increases |
| Mitigation | Auto-scaling upload workers |
| Rate Limiting | Per-user upload limits (10 videos/hour) |
| Graceful Degradation | Queue uploads, process async |
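The per-user upload limit can be sketched as a sliding-window counter (an in-memory sketch; production would back this with Redis or similar):

```python
# Sketch of the per-user upload rate limit above (10 videos/hour),
# implemented as a sliding-window counter of upload timestamps.
from collections import defaultdict, deque

LIMIT = 10
WINDOW_SECONDS = 3600

class UploadRateLimiter:
    def __init__(self):
        self._events = defaultdict(deque)  # user_id -> recent upload timestamps

    def allow(self, user_id: str, now: float) -> bool:
        events = self._events[user_id]
        while events and events[0] <= now - WINDOW_SECONDS:
            events.popleft()               # drop uploads outside the window
        if len(events) >= LIMIT:
            return False                   # over the hourly limit
        events.append(now)
        return True
```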

⚠️ Critical Failure: Viral Video Surge

Scenario: New viral video gets 100M views in 1 hour

Problem: CDN cache cold, all requests hit origin

Solution:

  1. Predictive caching: ML model detects viral potential early
  2. Origin shield: Middle-tier cache absorbs surge
  3. Request coalescing: Dedupe concurrent requests for same segment
  4. Backpressure: Rate limit if origin overwhelmed
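Step 3, request coalescing, is often called "singleflight": concurrent cache-miss requests for the same segment share one origin fetch. A minimal threaded sketch:

```python
# Sketch of request coalescing ("singleflight"): concurrent requests for
# the same key trigger only one underlying fetch; the rest wait and reuse
# the leader's result.
import threading

class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, fetch):
        with self._lock:
            if key in self._inflight:
                event, holder = self._inflight[key]
                leader = False
            else:
                event, holder = threading.Event(), {}
                self._inflight[key] = (event, holder)
                leader = True
        if leader:
            try:
                holder["value"] = fetch()   # the single origin request
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()
        else:
            event.wait()                    # piggyback on the leader's fetch
        return holder["value"]
```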

💰 Cost Estimation

Monthly Infrastructure Costs (2B MAU Scale)

| Service | Specification | Unit Cost | Monthly Cost |
|---|---|---|---|
| Transcoding (GPU) | 1,000 × p3.2xlarge (spot) | $1.00/hr (spot) | $720,000 |
| API Servers | 500 × c5.2xlarge | $0.34/hr | $122,000 |
| Upload Workers | 100 × c5.xlarge | $0.17/hr | $12,000 |
| Raw Storage (S3) | 250 TB (temporary) | $0.023/GB | $6,000 |
| Transcoded Storage | 100 PB | $0.021/GB (Glacier IA) | $2,100,000 |
| CDN Bandwidth | 16.5 EB/month (550 PB/day) | ~$0.0007/GB effective (owned edge + peering) | $11,000,000 |
| CDN Storage (Edge) | 10 PB hot cache | $0.10/GB | $1,000,000 |
| Metadata DB | 100 TB PostgreSQL/Vitess | $0.115/GB | $12,000 |
| Search Cluster | 50 × Elasticsearch nodes | $0.50/hr | $18,000 |
| Redis Cache | 1 TB cluster | $0.068/GB/hr | $50,000 |
| Message Queue | Pub/Sub, 100B messages | $0.04/M | $4,000 |
| ML Inference | 200 × GPU instances | $0.90/hr | $130,000 |

Note: at the $0.02/GB list price, 16.5 EB/month of egress would cost ~$330M. The $11M figure assumes the heavily discounted effective rate YouTube achieves with owned edge infrastructure and ISP peering.

Cost Summary

| Category | Monthly Cost | % of Total |
|---|---|---|
| CDN & Bandwidth | $12,000,000 | 75% |
| Storage | $2,106,000 | 13% |
| Compute (Transcoding) | $720,000 | 4.5% |
| Compute (API/Workers) | $134,000 | 0.8% |
| ML/Recommendations | $130,000 | 0.8% |
| Database & Cache | $80,000 | 0.5% |
| Other | $830,000 | 5.4% |
| Total | ~$16,000,000 | 100% |

🎓 Professor Tom

The CDN is 75% of the cost! This is why YouTube:

  1. Built Google Global Cache (GGC), placing servers inside ISP data centers
  2. Negotiates peering agreements with major ISPs
  3. Invests heavily in video compression (VP9, AV1)
  4. Every 1% improvement in compression = $120K/month in savings

Cost Optimization Strategies

| Strategy | Savings | Implementation |
|---|---|---|
| Spot instances for transcoding | 70% | Interruptible workloads are OK |
| S3 Intelligent-Tiering | 40% | Auto-move cold videos |
| Reserved instances (API) | 30% | 1-year commitment |
| VP9/AV1 codecs | 30-50% bandwidth | Better compression |
| ISP peering | 50% of CDN | Direct connections |
| Delete raw after transcode | 100% of raw storage | Immediate cleanup |

Cost per User Metrics

Monthly cost: $16,000,000
MAU: 2,000,000,000

Cost per MAU per month = $16M / 2B = $0.008
Cost per MAU per year = $0.008 × 12 = $0.096

Revenue per user (ads): ~$7-15/year
Infrastructure gross margin: very healthy, but hidden costs (moderation, licensing) eat into it.

Cost per video view:
Daily views: 4B
Monthly views: 120B
Cost per view = $16M / 120B = $0.00013 (~0.01 cents)

🔧 Raizo's Note

Hidden costs not in the table:

  • Content moderation: ML + human review = $50M+/month
  • Legal/copyright: Content ID system, licensing = $100M+/month
  • Inter-region data transfer: $0.02/GB adds up
  • Monitoring/logging: petabytes of logs = $1M+/month
  • Security/DDoS protection: $500K+/month

Adding these up, YouTube's actual operating cost is likely well over $150M/month, an order of magnitude above the infrastructure-only figure.

🎯 Interview Checklist

Must-Mention Items

| Topic | Key Points |
|---|---|
| Scale Estimation | 2B MAU, 500K uploads/day, 550 PB/day egress |
| Chunked Upload | Resumable, parallel, presigned URLs |
| Transcoding Pipeline | Async workers, multiple resolutions, GPU acceleration |
| HLS/DASH Streaming | Adaptive bitrate, 6-second segments, manifest files |
| CDN Architecture | Edge caching, 95%+ hit ratio, multi-region |
| Video Deduplication | Perceptual hashing, storage optimization |

Bonus Points 🌟

  • Codec Evolution: H.264 → VP9 → AV1 trade-offs
  • Live Streaming: Different architecture (RTMP ingest, low-latency HLS)
  • Content ID: Copyright detection using audio/video fingerprinting
  • Recommendation System: Collaborative filtering, watch history, engagement signals
  • Thumbnail A/B Testing: Multiple thumbnails, CTR optimization
  • Video Quality Metrics: VMAF score, buffering ratio, startup time

Common Mistakes

| Mistake | Why It's Wrong | Better Approach |
|---|---|---|
| Single-file upload | Can't resume, poor UX | Chunked upload with presigned URLs |
| Transcode on-demand | High latency, unpredictable | Pre-transcode all resolutions |
| Single resolution | Wastes bandwidth on slow connections | Adaptive bitrate streaming |
| Store raw videos | Massive storage cost | Delete after transcoding |
| Single CDN | Single point of failure | Multi-CDN with failover |
| Ignore deduplication | Wasted storage/compute | Perceptual hashing |

⚠️ Interview Red Flags

  • Not mentioning chunked upload for large files
  • Not knowing HLS/DASH adaptive streaming
  • Not addressing CDNs and edge caching
  • Designing synchronous transcoding (it blocks the upload)
  • Having no video deduplication strategy
  • Underestimating bandwidth costs

🎓 Key Takeaways

  1. The CDN is the biggest cost: 75% of the total, so optimize relentlessly
  2. Chunked upload is a must-have for large file uploads
  3. Pre-transcode all resolutions for instant playback
  4. HLS/DASH for adaptive streaming: there is no real alternative
  5. Perceptual hashing for video deduplication saves storage and compute
  6. A multi-codec strategy balances compatibility and compression
  7. Async processing decouples upload from transcoding

🔗 Navigation

| Case Study | Key Learning | Link |
|---|---|---|
| Twitter | Fan-out patterns, timeline caching | Design Twitter → |
| Uber | Real-time location, geospatial indexing | Design Uber → |
| WhatsApp | Messaging, E2EE, connection management | Design WhatsApp → |
