
🚗 DESIGN UBER

Real-time Location Services & Driver Matching

🎓 Professor Tom

Uber is the classic geospatial indexing problem: how do you find the nearest driver among millions of drivers that are constantly moving? This is where you learn why ordinary SQL queries don't work, and why specialized data structures such as Quadtree or S2 Geometry are needed.

📊 Back-of-Envelope Calculations

Scale Assumptions

| Metric | Value | Rationale |
|---|---|---|
| Monthly Active Users (MAU) | 100M | Global ride-sharing platform |
| Daily Active Users (DAU) | 30M | ~30% of MAU |
| Active Drivers (peak) | 2M | Supply side at peak hours |
| Trips per day | 15M | Average across all markets |
| Location updates per driver | 4/second | GPS ping frequency |
| Average trip duration | 15 minutes | City rides |
| Average trip distance | 8 km | Urban environment |

Location Update Calculations

Driver Location Updates:
Active drivers at peak = 2,000,000
Updates per driver per second = 4

Location Update QPS = 2M × 4 = 8,000,000 QPS (8M QPS!)

This is MASSIVE - one of the highest write loads in tech.

Location data per update = 50 bytes (lat, lng, timestamp, driver_id, status)
Bandwidth for location updates = 8M × 50 bytes = 400 MB/s = 3.2 Gbps
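A quick sanity check of these numbers in a few lines of Python (the constants are the assumptions above):

```python
# Back-of-envelope check for the location-update load.
active_drivers = 2_000_000
updates_per_driver_per_sec = 4
bytes_per_update = 50  # lat, lng, timestamp, driver_id, status

update_qps = active_drivers * updates_per_driver_per_sec
bandwidth_mb_s = update_qps * bytes_per_update / 1_000_000
bandwidth_gbps = bandwidth_mb_s * 8 / 1000

print(f"{update_qps:,} QPS")         # 8,000,000 QPS
print(f"{bandwidth_mb_s:.0f} MB/s")  # 400 MB/s
print(f"{bandwidth_gbps:.1f} Gbps")  # 3.2 Gbps
```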

🔧 Raizo's Note

8 million writes/second is a staggering number. No database can handle it directly. This is why Uber uses in-memory data structures (Redis Geo) and only persists aggregated data.

Ride Request Calculations

Ride Requests:
Daily trips = 15,000,000
Peak hours = 4 hours (morning + evening rush)
Peak trip percentage = 40% of daily trips in peak hours

Peak trips = 15M × 0.4 = 6M trips in 4 hours
Peak ride request QPS = 6M / (4 × 3600) = ~420 QPS

With retries and failed matches: ~1,000 QPS peak

Matching Calculations

For each ride request:
1. Find nearby drivers (query geospatial index)
2. Calculate ETA for top 10 candidates
3. Send request to best driver
4. Wait for acceptance (timeout 15s)
5. If rejected, try next driver

Average drivers queried per request = 10
ETA calculations per request = 10
Maps API calls per request = 10

Peak Maps API calls = 1,000 QPS × 10 = 10,000 QPS

Storage Calculations

Trip Data:
Daily trips = 15M
Trip record size = 2 KB (pickup, dropoff, route, fare, timestamps)
Daily trip storage = 15M × 2 KB = 30 GB/day
Monthly trip storage = 30 GB × 30 = 900 GB/month
Yearly trip storage = 900 GB × 12 = ~11 TB/year

Driver Location History (for analytics):
Sampled at 1/minute (not every 4/sec)
2M drivers × 60 samples/hour × 24 hours = 2.88B records/day
Record size = 30 bytes
Daily location history = 2.88B × 30 = 86 GB/day

Real-time Location Index (Redis):
2M drivers × 50 bytes = 100 MB (fits in memory easily!)

Calculation Breakdown

  • Peak hours: 7-9 AM and 5-7 PM are rush hours in most cities
  • 4 updates/second: a balance between accuracy and bandwidth
  • Trip record 2 KB: includes polyline route, fare breakdown, ratings
  • Location sampling: real-time uses 4/sec; history only needs 1/min

Bandwidth Calculations

Inbound (Location Updates):
8M QPS × 50 bytes = 400 MB/s = 3.2 Gbps

Outbound (Driver App):
- Ride requests: 1,000 QPS × 1 KB = 1 MB/s
- Navigation updates: 2M drivers × 1 update/5s × 500 bytes = 200 MB/s

Outbound (Rider App):
- Driver location during trip: at peak, ~420 trips start/s × 900 s avg duration
  ≈ 400K concurrent trips (Little's law: concurrency = arrival rate × duration)
  = 400K × 1 update/s × 100 bytes = 40 MB/s
- ETA updates: ~40 MB/s

Total Outbound: ~280 MB/s ≈ 2.2 Gbps

🏗️ High-Level Architecture

Component Responsibilities

| Component | Responsibility | Technology |
|---|---|---|
| Location Service | Ingest 8M location updates/sec, update geo index | Go + Redis Geo |
| Matching Engine | Find optimal driver for rider, handle dispatch | Go microservice |
| Trip Service | Manage trip lifecycle (request → complete) | Go + PostgreSQL |
| Pricing Service | Calculate fares, surge pricing | Go microservice |
| Redis Geo | Real-time driver location index | Redis 6.0+ GEOADD |
| Maps Service | ETA calculation, routing, distance | Google Maps API |
| WebSocket Servers | Push real-time updates to apps | Go + gorilla/websocket |
| Kafka | Event streaming for analytics, notifications | Apache Kafka |

🔧 Raizo's Note

Why use Go for the Location Service?

Go's goroutines handle millions of concurrent connections efficiently. A single Go server can handle 100K+ concurrent WebSocket connections with a small memory footprint. Java/Node.js would need significantly more servers.

🔄 Core Flows

Flow 1: Driver Location Update

Flow 2: Ride Request & Driver Matching

Flow 3: Trip Lifecycle

🎓 Professor Tom

A state machine is critical for the Trip Service. Every trip moves through the states: REQUESTED → ACCEPTED → DRIVER_ARRIVED → IN_PROGRESS → COMPLETED

Every transition must be validated. For example: a trip cannot be COMPLETED if it has not yet been started (IN_PROGRESS).
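A minimal sketch of that state machine with transition validation (the CANCELLED state and the InvalidTransition exception are additions for illustration):

```python
# Allowed trip-state transitions. Any transition not in this table is rejected.
# CANCELLED is an assumed extra state, not listed in the text above.
TRANSITIONS = {
    "REQUESTED":      {"ACCEPTED", "CANCELLED"},
    "ACCEPTED":       {"DRIVER_ARRIVED", "CANCELLED"},
    "DRIVER_ARRIVED": {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS":    {"COMPLETED"},
    "COMPLETED":      set(),
    "CANCELLED":      set(),
}

class InvalidTransition(Exception):
    pass

class Trip:
    def __init__(self):
        self.state = "REQUESTED"

    def transition(self, new_state: str):
        # Validate every transition against the table before applying it
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state}")
        self.state = new_state

trip = Trip()
trip.transition("ACCEPTED")
# trip.transition("COMPLETED")  # would raise: cannot complete before IN_PROGRESS
```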

💡 Deep Dive: Geospatial Indexing

The Core Problem

When a rider requests a ride at position (10.8231, 106.6297), how do you find the nearest driver among 2 million drivers currently online?

Why SQL BETWEEN Queries Don't Work

```sql
-- Naive approach: find drivers within a bounding box
SELECT driver_id, lat, lng
FROM driver_locations
WHERE lat BETWEEN 10.82 AND 10.84
  AND lng BETWEEN 106.62 AND 106.64;
```

⚠️ Problems with SQL Approach

  1. Full table scan: with 2M rows, the query takes 2-5 seconds
  2. Inefficient index: a B-tree index on (lat, lng) can only narrow the search by one column
  3. Bounding box is inaccurate: the corners of the box are farther from the rider than the edges
  4. Doesn't scale: 8M updates/sec would keep the database constantly locked

┌─────────────────────────────────────────────────────────────┐
│           WHY BOUNDING BOX IS INACCURATE                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│     ┌─────────────────────────────┐                        │
│     │    ×         ×         ×    │  × = Drivers in box    │
│     │         ┌─────────┐         │  ● = Rider location    │
│     │    ×    │    ●    │    ×    │  ○ = Actual 3km radius │
│     │         │  ○○○○○  │         │                        │
│     │    ×    │ ○     ○ │    ×    │  Drivers at corners    │
│     │         │○   ●   ○│         │  are ~4.2km away!      │
│     │    ×    │ ○     ○ │    ×    │  (3km × √2 = 4.24km)   │
│     │         │  ○○○○○  │         │                        │
│     │    ×    └─────────┘    ×    │                        │
│     │                             │                        │
│     └─────────────────────────────┘                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Solution 1: Quadtree

A quadtree recursively divides 2D space into 4 quadrants.

┌─────────────────────────────────────────────────────────────┐
│                      QUADTREE                                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Level 0 (Root):                                            │
│  ┌─────────────────────────────────┐                       │
│  │               │                 │                       │
│  │      NW       │       NE        │                       │
│  │               │                 │                       │
│  ├───────────────┼─────────────────┤                       │
│  │               │                 │                       │
│  │      SW       │       SE        │                       │
│  │               │                 │                       │
│  └─────────────────────────────────┘                       │
│                                                             │
│  Level 2 (Subdivided):                                      │
│  ┌────────┬────────┬────────┬────────┐                     │
│  │   │    │   │    │   │    │   │    │                     │
│  ├───┼────┼───┼────┼───┼────┼───┼────┤                     │
│  │   │ 🚗 │   │    │   │ 🚗 │   │    │                     │
│  ├────────┼────────┼────────┼────────┤                     │
│  │   │    │   │ 📍 │   │    │   │ 🚗 │  📍 = Rider        │
│  ├───┼────┼───┼────┼───┼────┼───┼────┤  🚗 = Drivers      │
│  │ 🚗│    │   │    │   │    │   │    │                     │
│  └────────┴────────┴────────┴────────┘                     │
│                                                             │
│  Search: Start at root → traverse to rider's cell          │
│          → check adjacent cells → filter by distance       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
| Pros | Cons |
|---|---|
| ✅ O(log n) search | ❌ Unbalanced with clustered data |
| ✅ Easy to implement | ❌ Rebalancing expensive |
| ✅ Good for static data | ❌ Not great for moving objects |
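To make the structure concrete, here is a toy point quadtree with insert and range query (a simplification: fixed capacity, no rebalancing, and a rectangular query instead of a radius):

```python
class QuadTree:
    """Toy point quadtree: each node holds up to `capacity` points,
    then splits into four children (NW/NE/SW/SE)."""

    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)  # half-open: [x0, x1) x [y0, y1)
        self.capacity = capacity
        self.points = []       # (x, y, payload)
        self.children = None   # four QuadTree nodes after a split

    def insert(self, x, y, payload):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False  # point outside this node
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y, payload))
                return True
            self._split()
        return any(c.insert(x, y, payload) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, my, mx, y1), QuadTree(mx, my, x1, y1),
                         QuadTree(x0, y0, mx, my), QuadTree(mx, y0, x1, my)]
        for p in self.points:  # push existing points down into children
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect all payloads whose point lies inside the query rectangle."""
        out = [] if out is None else out
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return out  # query box does not overlap this node
        for (x, y, payload) in self.points:
            if qx0 <= x <= qx1 and qy0 <= y <= qy1:
                out.append(payload)
        if self.children:
            for c in self.children:
                c.query(qx0, qy0, qx1, qy1, out)
        return out
```

In practice (x, y) would be (lng, lat); the last bullet in the search diagram above — filter by distance — would run on the candidates the rectangle query returns.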

Solution 2: Geohash

Geohash encodes lat/lng as a string; nearby locations share a prefix.

┌─────────────────────────────────────────────────────────────┐
│                      GEOHASH                                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Location: (10.8231, 106.6297) → Geohash: "w3gvd6"         │
│                                                             │
│  Precision levels:                                          │
│  ┌──────────┬─────────────┬──────────────────────┐         │
│  │ Length   │ Cell Size   │ Example              │         │
│  ├──────────┼─────────────┼──────────────────────┤         │
│  │ 1        │ 5000 km     │ w                    │         │
│  │ 2        │ 1250 km     │ w3                   │         │
│  │ 3        │ 156 km      │ w3g                  │         │
│  │ 4        │ 39 km       │ w3gv                 │         │
│  │ 5        │ 4.9 km      │ w3gvd                │         │
│  │ 6        │ 1.2 km      │ w3gvd6               │  ← Good │
│  │ 7        │ 153 m       │ w3gvd6m              │         │
│  │ 8        │ 38 m        │ w3gvd6m3             │         │
│  └──────────┴─────────────┴──────────────────────┘         │
│                                                             │
│  Search: Find all drivers with geohash prefix "w3gvd"      │
│          + 8 adjacent cells (edge cases)                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
| Pros | Cons |
|---|---|
| ✅ String prefix search (fast) | ❌ Edge problem (adjacent cells may have different prefixes) |
| ✅ Easy to index in a DB | ❌ Fixed cell sizes |
| ✅ Human readable | ❌ Cells not uniform near the poles |
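The prefix property is easy to see in a minimal pure-Python encoder of the standard geohash algorithm (a sketch, not a production library):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base32 alphabet

def geohash_encode(lat: float, lng: float, length: int = 6) -> str:
    """Interleave longitude/latitude bits (longitude first),
    then pack 5 bits per base32 character."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, bit_count, result, even = 0, 0, [], True  # even bit = longitude
    while len(result) < length:
        rng, val = (lng_range, lng) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bits = (bits << 1) | (1 if val >= mid else 0)
        if val >= mid:
            rng[0] = mid  # keep the upper half
        else:
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:
            result.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(result)

# Nearby points in Ho Chi Minh City share a prefix
print(geohash_encode(10.8231, 106.6297))  # starts with "w3gv"
print(geohash_encode(10.8260, 106.6310))  # same "w3gv" prefix
```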

Solution 3: S2 Geometry (Google's Choice)

S2 projects Earth onto a cube, then uses Hilbert curve to map 2D → 1D.

┌─────────────────────────────────────────────────────────────┐
│                    S2 GEOMETRY                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Earth → Cube → Hilbert Curve → 64-bit Cell ID             │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │    ┌─────┬─────┬─────┬─────┐                       │   │
│  │    │ L14 │ L14 │ L14 │ L14 │  Level 14: ~0.3 km²   │   │
│  │    │Cell1│Cell2│Cell3│Cell4│                       │   │
│  │    ├─────┼─────┼─────┼─────┤                       │   │
│  │    │  🚗 │     │ 🚗  │     │  Drivers in cells     │   │
│  │    │  🚗 │ 🚗  │     │ 🚗  │                       │   │
│  │    ├─────┼─────┼─────┼─────┤                       │   │
│  │    │     │ 📍  │     │     │  📍 = Rider location  │   │
│  │    │ 🚗  │     │ 🚗  │     │                       │   │
│  │    └─────┴─────┴─────┴─────┘                       │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Cell Hierarchy:                                            │
│  Level 0:  Earth face (~85M km²)                           │
│  Level 12: ~5.1 km² (city district)                        │
│  Level 14: ~0.3 km² (neighborhood) ← Uber uses this        │
│  Level 16: ~0.02 km² (city block)                          │
│  Level 30: ~0.7 cm² (maximum precision)                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Why S2 is Superior:

| Feature | Geohash | S2 |
|---|---|---|
| Cell shape | Rectangle (varies) | Nearly square (uniform) |
| Edge handling | 8 neighbors needed | Hierarchical covering |
| Variable precision | Fixed levels | Any level 0-30 |
| Covering algorithm | Manual | Built-in GetCovering() |
| Used by | Many apps | Google Maps, Uber, Foursquare |

Implementation: Redis Geo with S2

```python
import time

import redis
from s2sphere import CellId, LatLng

# decode_responses=True so Redis returns str instead of bytes
redis_client = redis.Redis(decode_responses=True)

def update_driver_location(driver_id: str, lat: float, lng: float):
    """Update driver location in Redis Geo index"""
    # Redis GEOADD uses geohash internally, but we also store the S2 cell
    redis_client.geoadd("drivers:active", (lng, lat, driver_id))

    # Store S2 cell for advanced queries
    cell_id = CellId.from_lat_lng(LatLng.from_degrees(lat, lng))
    cell_token = cell_id.parent(14).to_token()  # Level 14

    redis_client.hset(f"driver:{driver_id}", mapping={
        "lat": lat,
        "lng": lng,
        "s2_cell": cell_token,
        "last_update": int(time.time()),
        "status": "available"
    })

    # Add to cell-based index for fast lookup
    redis_client.sadd(f"cell:{cell_token}:drivers", driver_id)

    # TTL on the hash cleans up stale locations; the availability check
    # below filters out drivers whose hash has expired
    redis_client.expire(f"driver:{driver_id}", 30)

def find_nearby_drivers(lat: float, lng: float, radius_km: float = 3) -> list:
    """Find drivers within radius using Redis GEORADIUS"""
    # Primary: use Redis Geo (geohash-based); returns (member, distance) pairs
    nearby = redis_client.georadius(
        "drivers:active",
        lng, lat,
        radius_km, unit="km",
        withdist=True,
        count=20,
        sort="ASC"
    )

    # Filter by availability
    available_drivers = []
    for driver_id, distance in nearby:
        driver_data = redis_client.hgetall(f"driver:{driver_id}")
        if driver_data.get("status") == "available":
            available_drivers.append({
                "driver_id": driver_id,
                "distance_km": float(distance),
                "lat": float(driver_data["lat"]),
                "lng": float(driver_data["lng"])
            })

    return available_drivers
```

🔧 Raizo's Note

Redis GEORADIUS uses geohash internally, not S2. For a 3-5 km radius, though, the accuracy is good enough. Uber uses a custom S2 implementation for advanced features such as:

  • Variable cell sizes based on driver density
  • Efficient range queries across cell boundaries
  • Hierarchical caching

🎯 Deep Dive: Driver Matching Logic

The Matching Problem

When there are 10 drivers near the rider, which one do you pick? It is not just about "the nearest driver"; you also have to optimize for:

  • Rider experience: low ETA, high driver rating
  • Driver fairness: distribute rides evenly
  • Platform efficiency: minimize empty miles

Strategy 1: Greedy Matching (Simple)

┌─────────────────────────────────────────────────────────────┐
│                   GREEDY MATCHING                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Algorithm:                                                 │
│  1. Find all available drivers within 3km                  │
│  2. Calculate ETA for each driver                          │
│  3. Pick driver with lowest ETA                            │
│  4. Send request, wait for acceptance                      │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │    🚗 D1 (2.1km, ETA 4min)                         │   │
│  │         \                                          │   │
│  │          \                                         │   │
│  │    🚗 D2 (1.5km, ETA 3min) ←── SELECTED           │   │
│  │            \                                       │   │
│  │             📍 Rider                               │   │
│  │            /                                       │   │
│  │    🚗 D3 (1.8km, ETA 5min)                        │   │
│  │         /                                          │   │
│  │    🚗 D4 (2.5km, ETA 6min)                        │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
| Pros | Cons |
|---|---|
| ✅ Simple, fast O(n) | ❌ Locally optimal, not globally |
| ✅ Low latency | ❌ Can cause driver starvation |
| ✅ Easy to understand | ❌ Doesn't consider future requests |

Strategy 2: Global Optimization (Batch Matching)

┌─────────────────────────────────────────────────────────────┐
│               GLOBAL OPTIMIZATION                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Batch requests every 2 seconds, solve assignment problem  │
│                                                             │
│  Riders:  R1    R2    R3    R4                             │
│            │     │     │     │                              │
│            ▼     ▼     ▼     ▼                              │
│  ┌─────────────────────────────────────────────────────┐   │
│  │         HUNGARIAN ALGORITHM                          │   │
│  │         (Bipartite Matching)                         │   │
│  │                                                      │   │
│  │  Cost Matrix (ETA in minutes):                      │   │
│  │         D1    D2    D3    D4                        │   │
│  │    R1 [ 4     3     5     6  ]                      │   │
│  │    R2 [ 7     2     4     3  ]                      │   │
│  │    R3 [ 3     5     2     4  ]                      │   │
│  │    R4 [ 5     4     6     2  ]                      │   │
│  │                                                      │   │
│  │  Optimal: R1→D1, R2→D2, R3→D3, R4→D4               │   │
│  │  Total ETA: 4+2+2+2 = 10 minutes                   │   │
│  │                                                      │   │
│  │  vs Greedy: R1→D2, R2→D2(taken)→D4, ...            │   │
│  │  Greedy Total: 3+3+2+5 = 13 minutes                │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
| Pros | Cons |
|---|---|
| ✅ Globally optimal | ❌ Higher latency (batching delay) |
| ✅ Better overall efficiency | ❌ Complex implementation |
| ✅ Fairer distribution | ❌ O(n³) Hungarian algorithm |
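The trade-off can be checked directly on the diagram's cost matrix. For a tiny 4×4 batch, an exhaustive search stands in for the Hungarian algorithm (which finds the same optimum in O(n³)):

```python
from itertools import permutations

# ETA cost matrix from the diagram: rows = riders R1..R4, cols = drivers D1..D4
cost = [
    [4, 3, 5, 6],
    [7, 2, 4, 3],
    [3, 5, 2, 4],
    [5, 4, 6, 2],
]

def optimal_total(cost):
    """Exhaustive assignment search (fine for tiny batches)."""
    n = len(cost)
    return min(sum(cost[r][c] for r, c in enumerate(perm))
               for perm in permutations(range(n)))

def greedy_total(cost):
    """Each rider, in arrival order, grabs the cheapest driver still free."""
    taken, total = set(), 0
    for row in cost:
        c = min((c for c in range(len(row)) if c not in taken),
                key=lambda c: row[c])
        taken.add(c)
        total += row[c]
    return total

print(optimal_total(cost))  # batch optimum: 10 minutes
print(greedy_total(cost))   # greedy: 13 minutes
```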

Uber's Hybrid Approach

```python
def match_driver(rider_location, ride_type, urgency="normal"):
    """
    Hybrid matching: greedy with a local optimization window.
    Helpers like find_nearby_drivers, get_eta_from_maps and try_lock_driver
    are calls into the services described above.
    """
    # 1. Get candidate drivers
    candidates = find_nearby_drivers(
        rider_location.lat,
        rider_location.lng,
        radius_km=3
    )

    if not candidates:
        # Expand search radius
        candidates = find_nearby_drivers(
            rider_location.lat,
            rider_location.lng,
            radius_km=5
        )

    # 2. Filter by ride type and availability
    eligible = []
    for driver in candidates:
        if not is_driver_eligible(driver, ride_type):
            continue
        if is_driver_locked(driver.id):
            continue
        eligible.append(driver)

    if not eligible:
        return None

    # 3. Calculate scores for each driver
    scored_drivers = []
    for driver in eligible[:10]:  # Limit ETA calculations
        eta = get_eta_from_maps(driver.location, rider_location)

        score = calculate_match_score(
            eta=eta,
            driver_rating=driver.rating,
            acceptance_rate=driver.acceptance_rate,
            trips_today=driver.trips_today,
            idle_time=driver.idle_time
        )

        scored_drivers.append((driver, score, eta))

    # 4. Sort by score (higher is better)
    scored_drivers.sort(key=lambda x: x[1], reverse=True)

    # 5. Try to lock and dispatch
    for driver, score, eta in scored_drivers:
        if try_lock_driver(driver.id, timeout_seconds=15):
            return dispatch_ride(driver, rider_location, eta)

    return None  # No driver available

def calculate_match_score(eta, driver_rating, acceptance_rate,
                          trips_today, idle_time):
    """
    Multi-factor scoring function

    Weights tuned based on business priorities:
    - ETA: primary factor (rider experience)
    - Rating: quality signal
    - Acceptance rate: reliability
    - Trips today: fairness (give rides to drivers with fewer trips)
    - Idle time: fairness (prioritize waiting drivers)
    """
    # Normalize factors to a 0-1 scale
    eta_score = max(0, 1 - eta / 15)  # 15 min max
    rating_score = max(0, min(1, driver_rating - 4.0))  # clamp 4.0-5.0 range
    acceptance_score = acceptance_rate
    fairness_score = max(0, 1 - trips_today / 20)  # 20 trips/day target
    idle_score = min(1, idle_time / 600)  # 10 min idle = max score

    # Weighted combination
    score = (
        0.40 * eta_score +
        0.20 * rating_score +
        0.15 * acceptance_score +
        0.15 * fairness_score +
        0.10 * idle_score
    )

    return score
```
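Plugging sample numbers into the scoring formula above (the driver values are illustrative):

```python
# Worked example of the weighted match score, using the same normalization
# as above. Sample driver: ETA 4 min, rating 4.8, 90% acceptance,
# 5 trips today, idle for 300 s.
eta_score = max(0, 1 - 4 / 15)        # ≈ 0.733
rating_score = (4.8 - 4.0) / 1.0      # 0.8
acceptance_score = 0.9
fairness_score = max(0, 1 - 5 / 20)   # 0.75
idle_score = min(1, 300 / 600)        # 0.5

score = (0.40 * eta_score + 0.20 * rating_score + 0.15 * acceptance_score
         + 0.15 * fairness_score + 0.10 * idle_score)
print(round(score, 3))  # ≈ 0.751
```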

Driver Locking Mechanism

┌─────────────────────────────────────────────────────────────┐
│                 DRIVER LOCKING                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Problem: Two riders request at same time, both get same   │
│           driver → one rider left hanging                  │
│                                                             │
│  Solution: Distributed lock with Redis                     │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │  Rider A requests → Lock Driver 1 (15s TTL)        │   │
│  │                     SET driver:1:lock ride_A EX 15 │   │
│  │                                                     │   │
│  │  Rider B requests → Try lock Driver 1              │   │
│  │                     SETNX fails (already locked)   │   │
│  │                     → Try Driver 2 instead         │   │
│  │                                                     │   │
│  │  Driver 1 accepts → Extend lock, start trip        │   │
│  │  Driver 1 rejects → Release lock (DEL)             │   │
│  │  Timeout (15s)    → Lock auto-expires              │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```python
import uuid

# Release only if we still own the lock: the compare-and-delete must be
# atomic, so it runs as a Lua script on the Redis server
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def try_lock_driver(driver_id: str, timeout_seconds: int = 15):
    """
    Attempt to acquire an exclusive lock on a driver.
    Returns the lock token if acquired, None if the driver is already locked.
    """
    lock_key = f"driver:{driver_id}:lock"
    lock_value = str(uuid.uuid4())  # Unique lock identifier

    # SET NX with expiry - atomic operation
    acquired = redis_client.set(
        lock_key,
        lock_value,
        nx=True,  # Only set if not exists
        ex=timeout_seconds
    )

    # Caller keeps the token to release the lock later
    return lock_value if acquired else None

def release_driver_lock(driver_id: str, lock_value: str):
    """Release the driver lock, but only if lock_value still owns it"""
    lock_key = f"driver:{driver_id}:lock"
    redis_client.eval(RELEASE_SCRIPT, 1, lock_key, lock_value)
```

ETA Calculation

┌─────────────────────────────────────────────────────────────┐
│                 ETA CALCULATION                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Naive: Straight-line distance / average speed             │
│         ETA = 2km / 30km/h = 4 minutes                     │
│         ❌ Ignores roads, traffic, one-way streets         │
│                                                             │
│  Better: Google Maps Directions API                        │
│         ✅ Real road network                               │
│         ✅ Real-time traffic                               │
│         ✅ Turn-by-turn routing                            │
│         ❌ Expensive at scale ($5/1000 requests)           │
│                                                             │
│  Uber's Approach: Hybrid                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │  1. Pre-compute base ETAs between S2 cells         │   │
│  │     (offline, updated hourly)                      │   │
│  │                                                     │   │
│  │  2. Apply real-time traffic multiplier             │   │
│  │     (from live driver data)                        │   │
│  │                                                     │   │
│  │  3. Use Maps API only for final routing            │   │
│  │     (after driver accepts)                         │   │
│  │                                                     │   │
│  │  ETA = base_eta[cell_A][cell_B] × traffic_factor   │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
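The lookup in the last line of the diagram can be sketched as follows (the cell tokens, base ETAs, and traffic factors are made-up illustrative values):

```python
# Pre-computed base ETAs (minutes) between S2 cells, refreshed offline.
# Cell tokens and numbers are illustrative, not real data.
base_eta = {
    ("w3gvd", "w3gvf"): 6.0,
    ("w3gvd", "w3gve"): 4.5,
}

# Real-time traffic multiplier per destination cell, derived from live
# driver GPS speeds (1.0 = free flow)
traffic_factor = {"w3gvf": 1.4, "w3gve": 1.0}

def estimate_eta(cell_a, cell_b):
    base = base_eta.get((cell_a, cell_b))
    if base is None:
        return None  # fall back to the Maps API
    return base * traffic_factor.get(cell_b, 1.0)

print(round(estimate_eta("w3gvd", "w3gvf"), 1))  # 6.0 × 1.4 = 8.4 minutes
```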

🎓 Professor Tom

ETA accuracy is a critical metric. Uber tracks:

  • ETA accuracy: predicted vs. actual arrival time
  • ETA variance: consistency of predictions
  • User trust: users stop using the app if ETAs are unreliable

Target: 90% of ETAs within ±2 minutes of actual.

🗺️ Build vs Buy: Maps Service

The Maps Dilemma

Maps are a critical dependency for Uber. Every ride needs:

  • Geocoding: Address → Coordinates
  • Reverse Geocoding: Coordinates → Address
  • Routing: Point A → Point B directions
  • ETA: Estimated time of arrival
  • Distance: Actual road distance (not straight-line)

Option 1: Google Maps API

┌─────────────────────────────────────────────────────────────┐
│                 GOOGLE MAPS API                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Pricing (as of 2024):                                     │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ API                    │ Price per 1000 requests    │   │
│  ├────────────────────────┼────────────────────────────┤   │
│  │ Directions             │ $5.00                      │   │
│  │ Distance Matrix        │ $5.00                      │   │
│  │ Geocoding              │ $5.00                      │   │
│  │ Places                 │ $17.00                     │   │
│  │ Roads (Snap to Roads)  │ $10.00                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Uber's Usage (estimated):                                 │
│  - 15M trips/day × 10 API calls/trip = 150M calls/day     │
│  - Monthly: 150M × 30 = 4.5B calls                        │
│  - Cost: 4.5B / 1000 × $5 = $22.5M/month 😱               │
│                                                             │
│  With volume discounts: ~$500K/month (still huge!)        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
| Pros | Cons |
|---|---|
| ✅ Best-in-class accuracy | ❌ Extremely expensive at scale |
| ✅ Global coverage | ❌ Vendor lock-in |
| ✅ Real-time traffic | ❌ Rate limits |
| ✅ No maintenance | ❌ No customization |
| ✅ Continuous updates | ❌ Dependency risk |

Option 2: Self-Hosted OpenStreetMap (OSM)

┌─────────────────────────────────────────────────────────────┐
│              SELF-HOSTED OSM STACK                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Components:                                                │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │  ┌─────────────┐    ┌─────────────┐               │   │
│  │  │ Nominatim   │    │   OSRM      │               │   │
│  │  │ (Geocoding) │    │ (Routing)   │               │   │
│  │  └─────────────┘    └─────────────┘               │   │
│  │         │                  │                       │   │
│  │         └────────┬─────────┘                       │   │
│  │                  ▼                                 │   │
│  │         ┌─────────────┐                           │   │
│  │         │ PostgreSQL  │                           │   │
│  │         │ + PostGIS   │                           │   │
│  │         │ (OSM Data)  │                           │   │
│  │         └─────────────┘                           │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Infrastructure Cost:                                      │
│  - 50 routing servers: $15,000/month                      │
│  - 10 geocoding servers: $3,000/month                     │
│  - Database cluster: $5,000/month                         │
│  - Engineering team (5 people): $100,000/month            │
│  - Total: ~$125,000/month                                 │
│                                                             │
│  vs Google Maps: $500,000/month                           │
│  Savings: $375,000/month = $4.5M/year                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
| Pros | Cons |
|---|---|
| ✅ Much cheaper at scale | ❌ No real-time traffic |
| ✅ Full control | ❌ Data quality varies by region |
| ✅ No rate limits | ❌ Requires dedicated team |
| ✅ Customizable | ❌ Map updates lag behind |
| ✅ No vendor lock-in | ❌ Complex infrastructure |

Uber's Actual Approach: Hybrid

┌─────────────────────────────────────────────────────────────┐
│              UBER'S HYBRID APPROACH                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │  Use Case              │ Solution                  │   │
│  │  ──────────────────────┼─────────────────────────  │   │
│  │  Driver matching ETA   │ Pre-computed + OSM        │   │
│  │  (high volume, approx) │                           │   │
│  │                        │                           │   │
│  │  Turn-by-turn nav      │ Google Maps API           │   │
│  │  (accuracy critical)   │                           │   │
│  │                        │                           │   │
│  │  Fare calculation      │ OSM (distance) +          │   │
│  │                        │ Internal (time)           │   │
│  │                        │                           │   │
│  │  Geocoding             │ Google (primary) +        │   │
│  │                        │ OSM (fallback)            │   │
│  │                        │                           │   │
│  │  Traffic data          │ Internal (driver GPS) +   │   │
│  │                        │ Google (supplement)       │   │
│  │                        │                           │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Key Insight: Uber has MILLIONS of drivers sending GPS    │
│  data every second. This is better traffic data than      │
│  Google has in many markets!                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘

🔧 Raizo's Note

Uber's secret weapon: Real-time traffic from driver fleet.

With 2M active drivers sending their location 4 times per second, Uber has a traffic data density that Google lacks. They use this data to:

  1. Detect traffic jams before Google does
  2. Find shortcuts drivers actually use
  3. Improve ETA accuracy in their markets

This is why Uber invests heavily in its internal maps team.
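To make the fleet-as-sensor idea concrete, a toy aggregation of driver pings into per-road-segment average speeds might look like this (the flat input format and segment IDs are illustrative assumptions, not Uber's actual pipeline):

```python
from collections import defaultdict

def segment_speeds(pings):
    """pings: iterable of (segment_id, speed_mps) tuples from driver GPS.

    Returns the mean observed speed per road segment -- a crude
    real-time traffic layer built entirely from the fleet.
    """
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [speed sum, count]
    for seg, speed in pings:
        totals[seg][0] += speed
        totals[seg][1] += 1
    return {seg: total / count for seg, (total, count) in totals.items()}
```

In production this would run as a streaming job (e.g. over Kafka) with windowing, but the core reduction is the same.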

Decision Framework

┌─────────────────────────────────────────────────────────────┐
│           MAPS BUILD VS BUY DECISION                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Choose Google Maps API if:                                │
│  □ < 1M API calls/month (cost manageable)                  │
│  □ Global coverage needed immediately                      │
│  □ No engineering bandwidth for maps                       │
│  □ Real-time traffic is critical                          │
│  □ Startup phase (focus on core product)                  │
│                                                             │
│  Choose Self-hosted OSM if:                                │
│  □ > 100M API calls/month (cost savings significant)       │
│  □ Operating in limited geographies                        │
│  □ Have dedicated maps engineering team                    │
│  □ Need customization (e.g., bike lanes, truck routes)    │
│  □ Have alternative traffic data source                   │
│                                                             │
│  Choose Hybrid if:                                         │
│  □ High volume but need accuracy for some use cases       │
│  □ Want to reduce dependency gradually                    │
│  □ Have some engineering capacity                         │
│  □ Can tolerate complexity                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

🎓 Giáo sư Tom

Interview tip: When asked about Maps, don't just say "use the Google Maps API." Discuss:

  1. Cost implications at scale
  2. Build vs buy trade-offs
  3. How to reduce dependency
  4. Alternative data sources (driver fleet)

This shows you think about long-term architecture, not just the MVP.

⚖️ Trade-offs Analysis

Architecture Decision Matrix

| Decision | Option A | Option B | Chosen | Rationale |
|----------|----------|----------|--------|-----------|
| Location Index | PostGIS (SQL) | Redis Geo | Redis Geo | In-memory, handles 8M writes/sec |
| Geospatial Algorithm | Geohash | S2 Geometry | S2 | Variable cell sizes, better covering |
| Matching Strategy | Greedy (first available) | Global optimization | Hybrid | Balance latency vs efficiency |
| Map Data | Google Maps API | Self-hosted OSM | Hybrid | Cost vs accuracy trade-off |
| ETA Calculation | Real-time routing | Pre-computed | Hybrid | Pre-computed base + real-time adjustment |
| Driver State | PostgreSQL | Redis | Redis | Fast reads, TTL for stale data |
| Trip Storage | NoSQL | PostgreSQL | PostgreSQL | ACID for financial data |
| Real-time Updates | HTTP Polling | WebSocket | WebSocket | Lower latency, less bandwidth |

Location Index: PostGIS vs Redis Geo

┌─────────────────────────────────────────────────────────────┐
│           POSTGIS vs REDIS GEO                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  PostGIS:                                                   │
│  + Rich geospatial queries (polygons, intersections)       │
│  + Persistent storage                                      │
│  + Complex spatial joins                                   │
│  - Disk I/O bottleneck                                     │
│  - ~1000 writes/sec max                                    │
│  - Query latency 10-100ms                                  │
│                                                             │
│  Redis Geo:                                                 │
│  + In-memory (microsecond latency)                         │
│  + 100K+ writes/sec per node                               │
│  + Simple GEORADIUS queries                                │
│  - Limited query types                                     │
│  - Memory-bound                                            │
│  - No persistence (need backup strategy)                   │
│                                                             │
│  Verdict: Redis Geo for real-time, PostGIS for analytics  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

🚨 Failure Scenarios & Mitigations

Scenario 1: Location Service Lag

| Aspect | Details |
|--------|---------|
| Impact | Stale driver locations → bad matches, wrong ETAs |
| Cause | Network issues, Redis overload, driver app bugs |
| Detection | Monitor last_update timestamp, alert if > 30s |
| Mitigation | Location TTL (auto-expire after 30s) |
| Fallback | Use last known location + velocity vector prediction |
```python
import time
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def get_driver_location(driver_id: str) -> Optional[Location]:
    """Fetch a driver's location from Redis, rejecting stale data."""
    driver_data = redis_client.hgetall(f"driver:{driver_id}")

    if not driver_data:
        return None

    last_update = int(driver_data.get("last_update", 0))
    age_seconds = time.time() - last_update

    if age_seconds > 30:
        # Location too stale, don't use it
        logger.warning(f"Stale location for driver {driver_id}: {age_seconds:.0f}s old")
        return None

    if age_seconds > 10:
        # Predict current position from last known velocity
        return predict_location(driver_data, age_seconds)

    return Location(
        lat=float(driver_data["lat"]),
        lng=float(driver_data["lng"])
    )
```

🔧 Raizo's Note

Location prediction dùng simple physics:

new_lat = old_lat + (speed × cos(heading) × time) / 111000
new_lng = old_lng + (speed × sin(heading) × time) / (111000 × cos(lat))

111000 = meters per degree of latitude. Not perfect, but better than stale data.
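The predict_location helper referenced in Scenario 1 follows directly from these formulas. This is a standalone sketch taking explicit fields instead of the Redis hash; the heading convention (degrees clockwise from true north) is an assumption:

```python
import math

METERS_PER_DEG_LAT = 111_000  # approximate meters per degree of latitude

def predict_location(lat, lng, speed_mps, heading_deg, age_seconds):
    """Dead-reckon a stale GPS fix forward along the last known heading.

    heading_deg is measured clockwise from true north, so cos(heading)
    projects the travelled distance onto the north-south (latitude) axis
    and sin(heading) onto the east-west (longitude) axis.
    """
    h = math.radians(heading_deg)
    d = speed_mps * age_seconds  # meters travelled since the last fix
    new_lat = lat + (d * math.cos(h)) / METERS_PER_DEG_LAT
    # Longitude degrees shrink with latitude, hence the cos(lat) divisor
    new_lng = lng + (d * math.sin(h)) / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lng
```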

Scenario 2: Matching Service Overload (Concert Ends)

| Aspect | Details |
|--------|---------|
| Impact | Long wait times, failed matches, angry users |
| Cause | 50,000 people request rides simultaneously |
| Detection | Queue depth monitoring, latency spikes |
| Mitigation | Queue-based matching with priority |
| Solution | Surge pricing to balance supply/demand |
┌─────────────────────────────────────────────────────────────┐
│           HANDLING DEMAND SPIKES                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Normal: 1,000 requests/sec → Direct matching              │
│                                                             │
│  Spike: 10,000 requests/sec → Queue + Surge                │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │  1. Detect spike (requests > 3x normal)            │   │
│  │                                                     │   │
│  │  2. Enable surge pricing                           │   │
│  │     - Reduces demand (price-sensitive users wait)  │   │
│  │     - Increases supply (drivers come online)       │   │
│  │                                                     │   │
│  │  3. Queue requests with estimated wait time        │   │
│  │     "Your ride will be ready in ~8 minutes"       │   │
│  │                                                     │   │
│  │  4. Pre-position drivers for known events          │   │
│  │     (concerts, sports games, conferences)          │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
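Steps 1-2 of the playbook above can be sketched as a single pure function. The 3x spike threshold comes from the diagram; the linear scaling of the multiplier and the 3.0x cap are illustrative assumptions:

```python
def surge_multiplier(current_qps, baseline_qps, spike_factor=3.0, cap=3.0):
    """Return a price multiplier; 1.0 means no surge.

    Surge kicks in once demand exceeds spike_factor x baseline and then
    scales linearly with the excess, capped to avoid runaway pricing.
    """
    if current_qps <= spike_factor * baseline_qps:
        return 1.0
    excess_ratio = current_qps / (spike_factor * baseline_qps)
    return min(cap, round(excess_ratio, 2))
```

A real system would smooth the input over a sliding window and compute it per geographic cell, not globally.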

Scenario 3: Maps API Failure

| Aspect | Details |
|--------|---------|
| Impact | No ETA calculation, no turn-by-turn navigation |
| Cause | Google Maps outage, rate limiting, network issues |
| Detection | API error rates, latency monitoring |
| Mitigation | Cache recent routes, multiple providers |
| Fallback | Straight-line distance × 1.4 factor |
```python
import logging

logger = logging.getLogger(__name__)

def get_eta(origin: Location, destination: Location) -> int:
    """Get ETA in seconds with a fallback chain."""

    # Try primary: Google Maps
    try:
        eta = google_maps_client.get_eta(origin, destination)
        cache_route(origin, destination, eta)
        return eta
    except GoogleMapsError as e:
        logger.warning(f"Google Maps failed: {e}")

    # Try cache of recently computed routes
    cached_eta = get_cached_eta(origin, destination)
    if cached_eta:
        return cached_eta

    # Try secondary: self-hosted OSRM routing on OSM data
    try:
        return osrm_client.get_eta(origin, destination)
    except OSRMError as e:
        logger.warning(f"OSRM failed: {e}")

    # Last resort: straight-line estimate
    distance_km = haversine_distance(origin, destination)
    avg_speed_kmh = 25  # Urban average
    eta_hours = distance_km / avg_speed_kmh
    eta_seconds = int(eta_hours * 3600 * 1.4)  # 1.4x factor for road winding

    logger.warning(f"Using fallback ETA: {eta_seconds}s")
    return eta_seconds
```
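The haversine_distance helper used in the last-resort branch is standard great-circle math. A self-contained sketch, with a namedtuple standing in for the app's Location type:

```python
import math
from collections import namedtuple

Location = namedtuple("Location", ["lat", "lng"])  # stand-in for the app's type

EARTH_RADIUS_KM = 6371.0

def haversine_distance(origin: Location, destination: Location) -> float:
    """Great-circle distance in km between two points given in degrees."""
    lat1, lng1 = math.radians(origin.lat), math.radians(origin.lng)
    lat2, lng2 = math.radians(destination.lat), math.radians(destination.lng)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    # Haversine formula: a is the square of half the chord length
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of longitude at the equator comes out to roughly 111 km, which is where the 111,000 m/degree constant used elsewhere in this design comes from.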

Scenario 4: Redis Cluster Failure

| Aspect | Details |
|--------|---------|
| Impact | Cannot find nearby drivers, matching fails |
| Cause | Node failure, network partition, memory exhaustion |
| Detection | Redis health checks, connection errors |
| Mitigation | Redis Cluster with replicas (3 replicas per shard) |
| Failover | Automatic promotion of replica (< 10 seconds) |

🔧 Raizo's Note

Redis Cluster gotchas:

  1. GEORADIUS doesn't work across slots - make sure all geo data for one query lives in the same slot
  2. Use hash tags: drivers:{city}:active → all drivers in the same city map to the same slot
  3. Monitor memory: Redis OOM = instant outage
  4. Set maxmemory-policy volatile-ttl to auto-evict stale locations
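Point 2 as code: a hypothetical key-builder that wraps the city in a Redis hash tag, so GEOADD/GEOSEARCH on that key always hash to a single cluster slot (the function name and key layout are assumptions):

```python
def driver_geo_key(city: str) -> str:
    """Build a geo-set key with a Redis Cluster hash tag.

    Only the text inside {...} is hashed for slot assignment, so every
    key for the same city lands on the same slot and GEOSEARCH stays a
    single-slot operation.
    """
    return f"drivers:{{{city}}}:active"
```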

Scenario 5: Payment Gateway Failure

| Aspect | Details |
|--------|---------|
| Impact | Cannot complete trips, drivers not paid |
| Cause | Stripe/Braintree outage, network issues |
| Detection | Payment error rates, timeout monitoring |
| Mitigation | Multiple payment providers, retry queue |
| Fallback | Complete trip, charge later (trust-based) |
┌─────────────────────────────────────────────────────────────┐
│           PAYMENT FAILURE HANDLING                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Trip completes → Payment fails                            │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                                                     │   │
│  │  1. Mark trip as COMPLETED (don't block driver)    │   │
│  │                                                     │   │
│  │  2. Queue payment for retry                        │   │
│  │     - Retry 3x with exponential backoff            │   │
│  │     - Try alternate payment method                 │   │
│  │                                                     │   │
│  │  3. If all retries fail:                          │   │
│  │     - Notify rider to update payment              │   │
│  │     - Block future rides until resolved           │   │
│  │     - Pay driver from Uber's float                │   │
│  │                                                     │   │
│  │  4. Reconcile later (batch job)                   │   │
│  │                                                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Key: Never block trip completion due to payment          │
│       Driver experience > payment timing                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
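Step 2 of the flow above (retry with exponential backoff) might be sketched like this; the function name and the generic Exception catch are assumptions, and a production version would enqueue the retry rather than block:

```python
import time

def retry_payment(charge_fn, max_attempts=3, base_delay=1.0):
    """Retry a payment callable with exponential backoff: 1s, 2s, 4s...

    Returns True on success, False once every attempt has failed,
    at which point the caller escalates (alternate method, notify rider).
    """
    for attempt in range(max_attempts):
        try:
            charge_fn()
            return True
        except Exception:
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return False
```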

💰 Cost Estimation

Monthly Infrastructure Costs (100M MAU Scale)

| Service | Specification | Unit Cost | Monthly Cost |
|---------|---------------|-----------|--------------|
| Location Service | 100 × c5.4xlarge (16 vCPU, 32GB) | $0.68/hr | $49,000 |
| Matching Engine | 50 × c5.2xlarge (8 vCPU, 16GB) | $0.34/hr | $12,200 |
| Trip Service | 30 × c5.xlarge (4 vCPU, 8GB) | $0.17/hr | $3,700 |
| WebSocket Servers | 100 × c5.xlarge | $0.17/hr | $12,200 |
| Redis Geo Cluster | 200GB, 6 nodes (3 primary + 3 replica) | $0.068/GB/hr | $8,000 |
| Redis Cache | 100GB for sessions, locks | $0.068/GB/hr | $5,000 |
| Trip Database | PostgreSQL 10TB (Multi-AZ) | $0.115/GB/mo | $5,000 |
| Analytics DB | ClickHouse 50TB | $0.10/GB/mo | $5,000 |
| Kafka Cluster | 10 brokers (m5.2xlarge) | $0.384/hr | $12,000 |
| Maps API | 1B requests/month (with discounts) | $0.50/1000 | $500,000 |
| Load Balancers | 4 × ALB (high throughput) | $0.025/hr + LCU | $5,000 |
| Data Transfer | 500TB egress | $0.05/GB | $25,000 |

Cost Summary

| Category | Monthly Cost | % of Total |
|----------|--------------|------------|
| Compute (Services) | $77,100 | 12% |
| Caching (Redis) | $13,000 | 2% |
| Database (PostgreSQL + ClickHouse) | $10,000 | 2% |
| Message Queue (Kafka) | $12,000 | 2% |
| Maps API | $500,000 | 78% |
| Networking | $30,000 | 5% |
| **Total** | ~$642,000 | 100% |

⚠️ Maps API Dominates Cost!

78% of the infrastructure cost is the Maps API. This is why Uber invests heavily in:

  1. Internal maps team
  2. Pre-computed ETAs
  3. Driver fleet data for traffic
  4. OSM-based routing for non-critical paths

With the hybrid approach, Maps cost can be cut to ~$100K/month.

Cost Optimization Strategies

┌─────────────────────────────────────────────────────────────┐
│           COST OPTIMIZATION                                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Maps API Reduction:                                    │
│     - Pre-compute cell-to-cell ETAs (offline)             │
│     - Use driver GPS for traffic instead of API           │
│     - Cache routes for popular corridors                  │
│     - Potential savings: $400K/month                      │
│                                                             │
│  2. Compute Optimization:                                  │
│     - Reserved Instances (1-year): 30% savings            │
│     - Spot Instances for analytics: 70% savings           │
│     - Right-sizing based on actual usage                  │
│     - Potential savings: $25K/month                       │
│                                                             │
│  3. Data Transfer:                                         │
│     - Keep services in same AZ when possible              │
│     - Compress WebSocket messages                         │
│     - Use regional endpoints                              │
│     - Potential savings: $10K/month                       │
│                                                             │
│  Optimized Total: ~$200K/month (vs $642K baseline)        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
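Item 1 above (pre-computed cell-to-cell ETAs) boils down to an offline-built lookup table keyed by coarse cell pairs. A minimal sketch, with hypothetical cell tokens and key format:

```python
def cell_pair_key(origin_cell: str, dest_cell: str) -> str:
    """Key for the offline ETA table. Coarse S2 cells (e.g. level 14)
    collapse many nearby origin/destination points into one entry,
    which is what makes pre-computation tractable."""
    return f"eta:{origin_cell}:{dest_cell}"

def lookup_eta(table, origin_cell, dest_cell):
    """Return a pre-computed ETA in seconds, or None so the caller can
    fall back to live routing for the (rare) uncached pair."""
    return table.get(cell_pair_key(origin_cell, dest_cell))
```

The table itself would be rebuilt periodically by a batch job from fleet GPS history, then adjusted in real time with current traffic.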

Cost per Trip Metrics

Monthly cost: $642,000 (baseline) or $200,000 (optimized)
Monthly trips: 15M × 30 = 450M trips

Baseline cost per trip = $642,000 / 450M = $0.0014
Optimized cost per trip = $200,000 / 450M = $0.0004

Average fare: $15
Platform fee (25%): $3.75
Infrastructure cost: $0.0014

Gross margin per trip: $3.75 - $0.0014 = $3.7486 (99.96% margin!)

🎓 Giáo sư Tom

The unit economics of ride-sharing are very healthy on the infrastructure side. The main costs are:

  1. Driver incentives (subsidies, bonuses)
  2. Customer acquisition (marketing, promos)
  3. Insurance & compliance

Infrastructure accounts for < 1% of revenue. This is why ride-sharing can scale globally.

🎯 Interview Checklist

Must-Mention Items

| Topic | Key Points |
|-------|------------|
| Scale Estimation | 100M MAU, 8M location updates/sec, 15M trips/day |
| Geospatial Indexing | Why SQL doesn't work; Quadtree/Geohash/S2 comparison |
| S2 Geometry | Cell hierarchy; Level 14 for neighborhood-level |
| Driver Matching | Greedy vs global optimization; scoring function |
| Driver Locking | Redis SETNX to prevent double-dispatch |
| ETA Calculation | Pre-computed + real-time adjustment |
| Maps Trade-off | Google API cost vs self-hosted OSM |
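The Driver Locking row refers to the SET-NX-with-TTL pattern. A self-contained sketch, where a tiny in-memory class stands in for Redis so the snippet runs without a server (all names are illustrative):

```python
import time

class FakeRedis:
    """Minimal stand-in for Redis `SET key value NX EX ttl`."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set_nx_ex(self, key, value, ttl):
        now = time.time()
        cur = self.store.get(key)
        if cur and cur[1] > now:
            return False  # key exists and hasn't expired: lock is held
        self.store[key] = (value, now + ttl)
        return True

def try_lock_driver(r, driver_id, trip_id, ttl=15):
    """Atomically claim a driver for one trip. The TTL bounds the hold,
    so a crashed matcher cannot dead-lock the driver forever."""
    return r.set_nx_ex(f"lock:driver:{driver_id}", trip_id, ttl)
```

Against real Redis the same logic is a single `SET ... NX EX` call, which is what makes it safe under concurrent matchers.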

Bonus Points 🌟

  • Surge Pricing: Dynamic pricing algorithm based on supply/demand ratio
  • Driver Pre-positioning: ML model to predict demand hotspots
  • Fraud Detection: Fake GPS, driver collusion, promo abuse
  • Multi-destination: Optimize route for multiple stops
  • Pool/Share Rides: Matching multiple riders going same direction
  • Driver Earnings: Fair distribution algorithm

Common Mistakes

| Mistake | Why It's Wrong | Better Approach |
|---------|----------------|-----------------|
| SQL for location queries | Can't handle 8M writes/sec | Redis Geo (in-memory) |
| Simple lat/lng range | Inaccurate, slow | S2 Geometry cells |
| First-available matching | Suboptimal, unfair | Multi-factor scoring |
| No driver locking | Race conditions | Redis distributed lock |
| Google Maps for everything | $500K+/month | Hybrid approach |
| Sync ETA calculation | Blocks matching | Pre-computed + cache |

⚠️ Interview Red Flags

  • Not knowing why SQL BETWEEN doesn't work for geo queries
  • Not mentioning a driver locking mechanism
  • Not discussing Maps API cost at scale
  • A single point of failure in the architecture
  • No fallback strategy for external dependencies

🎓 Key Takeaways

  1. Geospatial indexing is the core challenge - S2 Geometry is the industry standard
  2. 8M writes/sec demands in-memory solutions (Redis), not a traditional database
  3. Driver matching must balance multiple factors, not just distance
  4. Distributed locking is critical to prevent race conditions
  5. The Maps API is the biggest cost driver - a hybrid approach saves millions
  6. Real-time updates go over WebSocket, not polling
  7. Graceful degradation for every external dependency

🔗 Navigation

| Case Study | Key Learning | Link |
|------------|--------------|------|
| Twitter | Fan-out patterns, timeline caching | Design Twitter → |
| YouTube | Video processing, CDN, adaptive streaming | Design YouTube → |
| WhatsApp | Real-time messaging, E2EE, connection management | Design WhatsApp → |
