🚗 DESIGN UBER
Real-time Location Services & Driver Matching
🎓 Professor Tom
Uber is the classic geospatial indexing problem: how do you find the nearest driver among millions of drivers that are constantly moving? This is where you learn why ordinary SQL queries don't work, and why you need specialized data structures like Quadtree or S2 Geometry.
📊 Back-of-Envelope Calculations
Scale Assumptions
| Metric | Value | Rationale |
|---|---|---|
| Monthly Active Users (MAU) | 100M | Global ride-sharing platform |
| Daily Active Users (DAU) | 30M | ~30% of MAU |
| Active Drivers (peak) | 2M | Supply side at peak hours |
| Trips per day | 15M | Average across all markets |
| Location updates per driver | 4/second | GPS ping frequency |
| Average trip duration | 15 minutes | City rides |
| Average trip distance | 8 km | Urban environment |
Location Update Calculations
Driver Location Updates:
Active drivers at peak = 2,000,000
Updates per driver per second = 4
Location Update QPS = 2M × 4 = 8,000,000 QPS (8M QPS!)
This is MASSIVE - one of the highest write loads in tech.
Location data per update = 50 bytes (lat, lng, timestamp, driver_id, status)
Bandwidth for location updates = 8M × 50 bytes = 400 MB/s = 3.2 Gbps
🔧 Raizo's Note
8 million writes/second is a staggering number. No database can handle it directly. This is why Uber uses in-memory data structures (Redis Geo) and only persists aggregated data.
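The arithmetic above is easy to sanity-check in a few lines (inputs are the assumed values from the table, not measurements):

```python
# Back-of-envelope check for the location-update load (assumed inputs)
active_drivers = 2_000_000          # peak supply
updates_per_driver_per_sec = 4      # GPS ping frequency
bytes_per_update = 50               # lat, lng, timestamp, driver_id, status

qps = active_drivers * updates_per_driver_per_sec
bandwidth_mb_s = qps * bytes_per_update / 1_000_000   # decimal MB/s
bandwidth_gbps = bandwidth_mb_s * 8 / 1000            # bits, not bytes

print(qps, bandwidth_mb_s, bandwidth_gbps)
```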
Ride Request Calculations
Ride Requests:
Daily trips = 15,000,000
Peak hours = 4 hours (morning + evening rush)
Peak trip percentage = 40% of daily trips in peak hours
Peak trips = 15M × 0.4 = 6M trips in 4 hours
Peak ride request QPS = 6M / (4 × 3600) = ~420 QPS
With retries and failed matches: ~1,000 QPS peak
Matching Calculations
For each ride request:
1. Find nearby drivers (query geospatial index)
2. Calculate ETA for top 10 candidates
3. Send request to best driver
4. Wait for acceptance (timeout 15s)
5. If rejected, try next driver
Average drivers queried per request = 10
ETA calculations per request = 10
Maps API calls per request = 10
Peak Maps API calls = 1,000 QPS × 10 = 10,000 QPS
Storage Calculations
Trip Data:
Daily trips = 15M
Trip record size = 2 KB (pickup, dropoff, route, fare, timestamps)
Daily trip storage = 15M × 2 KB = 30 GB/day
Monthly trip storage = 30 GB × 30 = 900 GB/month
Yearly trip storage = 900 GB × 12 = ~11 TB/year
Driver Location History (for analytics):
Sampled at 1/minute (not every 4/sec)
2M drivers × 60 samples/hour × 24 hours = 2.88B records/day
Record size = 30 bytes
Daily location history = 2.88B × 30 = 86 GB/day
Real-time Location Index (Redis):
2M drivers × 50 bytes = 100 MB (fits in memory easily!)
Calculation Breakdown
- Peak hours: 7-9 AM and 5-7 PM are rush hours in most cities
- 4 updates/second: balances accuracy against bandwidth
- Trip record 2 KB: includes the polyline route, fare breakdown, and ratings
- Location sampling: real time uses 4/sec; history only needs 1/min
Bandwidth Calculations
Inbound (Location Updates):
8M QPS × 50 bytes = 400 MB/s = 3.2 Gbps
Outbound (Driver App):
- Ride requests: 1,000 QPS × 1 KB = 1 MB/s
- Navigation updates: 2M drivers × 1 update/5s × 500 bytes = 200 MB/s
Outbound (Rider App):
- Driver location during trip: 15M trips/day × 15 min avg ≈ 156K concurrent trips on average; assume ~1M concurrent at peak
= 1M concurrent trips × 1 update/s × 100 bytes = 100 MB/s
- ETA updates: 100 MB/s
Total Outbound: ~400 MB/s = 3.2 Gbps
🏗️ High-Level Architecture
Component Responsibilities
| Component | Responsibility | Technology |
|---|---|---|
| Location Service | Ingest 8M location updates/sec, update geo index | Go + Redis Geo |
| Matching Engine | Find optimal driver for rider, handle dispatch | Go microservice |
| Trip Service | Manage trip lifecycle (request → complete) | Go + PostgreSQL |
| Pricing Service | Calculate fares, surge pricing | Go microservice |
| Redis Geo | Real-time driver location index | Redis 6.0+ GEOADD |
| Maps Service | ETA calculation, routing, distance | Google Maps API |
| WebSocket Servers | Push real-time updates to apps | Go + gorilla/websocket |
| Kafka | Event streaming for analytics, notifications | Apache Kafka |
🔧 Raizo's Note
Why use Go for the Location Service?
Go's goroutines handle millions of concurrent connections efficiently. A single Go server can handle 100K+ concurrent WebSocket connections with a small memory footprint. Java/Node.js would need significantly more servers.
🔄 Core Flows
Flow 1: Driver Location Update
Flow 2: Ride Request & Driver Matching
Flow 3: Trip Lifecycle
🎓 Professor Tom
A state machine is critical for the Trip Service. Each trip moves through the states: REQUESTED → ACCEPTED → DRIVER_ARRIVED → IN_PROGRESS → COMPLETED
Every transition must be validated. For example: a trip cannot be COMPLETED if it was never started.
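A minimal sketch of that validation, with the transition table as data (the CANCELLED state and the exception type are illustrative additions, not from the note above):

```python
# Allowed trip-state transitions; anything not listed is rejected.
TRANSITIONS = {
    "REQUESTED":      {"ACCEPTED", "CANCELLED"},
    "ACCEPTED":       {"DRIVER_ARRIVED", "CANCELLED"},
    "DRIVER_ARRIVED": {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS":    {"COMPLETED"},
    "COMPLETED":      set(),   # terminal
    "CANCELLED":      set(),   # terminal
}

class InvalidTransition(Exception):
    pass

def transition(current: str, new: str) -> str:
    """Validate and apply a trip state change."""
    if new not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new
```

Keeping the table as data makes the rule "you cannot COMPLETE a trip that never started" a one-line check rather than scattered if-statements.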
💡 Deep Dive: Geospatial Indexing
The Core Problem
When a rider requests a ride at (10.8231, 106.6297), how do you find the nearest driver among 2 million drivers currently online?
Why SQL BETWEEN Queries Don't Work
```sql
-- Naive approach: find drivers within a bounding box
SELECT driver_id, lat, lng
FROM driver_locations
WHERE lat BETWEEN 10.82 AND 10.84
  AND lng BETWEEN 106.62 AND 106.64;
```

⚠️ Problems with SQL Approach
- Full table scan: with 2M rows, the query takes 2-5 seconds
- Index is ineffective: a B-tree index on (lat, lng) can only use one column for the range scan
- Bounding box is inaccurate: the corners of the box are farther away than the edges
- Doesn't scale: 8M updates/sec would lock the database continuously
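The corner inaccuracy is easy to verify with a quick haversine check (a sketch; the 3 km offsets are converted to degrees at this latitude):

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

lat, lng = 10.8231, 106.6297
dlat = 3 / 111.0                                   # ~3 km north, in degrees
dlng = 3 / (111.0 * math.cos(math.radians(lat)))   # ~3 km east, in degrees

edge = haversine_km(lat, lng, lat + dlat, lng)            # middle of a box edge
corner = haversine_km(lat, lng, lat + dlat, lng + dlng)   # box corner
print(f"edge: {edge:.2f} km, corner: {corner:.2f} km")    # corner ≈ 3 × √2
```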
┌─────────────────────────────────────────────────────────────┐
│ WHY BOUNDING BOX IS INACCURATE │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────┐ │
│ │ × × × │ × = Drivers in box │
│ │ ┌─────────┐ │ ● = Rider location │
│ │ × │ ● │ × │ ○ = Actual 3km radius │
│ │ │ ○○○○○ │ │ │
│ │ × │ ○ ○ │ × │ Drivers at corners │
│ │ │○ ● ○│ │ are ~4.2km away! │
│ │ × │ ○ ○ │ × │ (3km × √2 = 4.24km) │
│ │ │ ○○○○○ │ │ │
│ │ × └─────────┘ × │ │
│ │ │ │
│ └─────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Solution 1: Quadtree
A Quadtree recursively divides 2D space into four quadrants.
┌─────────────────────────────────────────────────────────────┐
│ QUADTREE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Level 0 (Root): │
│ ┌─────────────────────────────────┐ │
│ │ │ │ │
│ │ NW │ NE │ │
│ │ │ │ │
│ ├───────────────┼─────────────────┤ │
│ │ │ │ │
│ │ SW │ SE │ │
│ │ │ │ │
│ └─────────────────────────────────┘ │
│ │
│ Level 2 (Subdivided): │
│ ┌────────┬────────┬────────┬────────┐ │
│ │ │ │ │ │ │ │ │ │ │
│ ├───┼────┼───┼────┼───┼────┼───┼────┤ │
│ │ │ 🚗 │ │ │ │ 🚗 │ │ │ │
│ ├────────┼────────┼────────┼────────┤ │
│ │ │ │ │ 📍 │ │ │ │ 🚗 │ 📍 = Rider │
│ ├───┼────┼───┼────┼───┼────┼───┼────┤ 🚗 = Drivers │
│ │ 🚗│ │ │ │ │ │ │ │ │
│ └────────┴────────┴────────┴────────┘ │
│ │
│ Search: Start at root → traverse to rider's cell │
│ → check adjacent cells → filter by distance │
│ │
└─────────────────────────────────────────────────────────────┘

| Pros | Cons |
|---|---|
| ✅ O(log n) search | ❌ Unbalanced with clustered data |
| ✅ Easy to implement | ❌ Rebalancing expensive |
| ✅ Good for static data | ❌ Not great for moving objects |
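To make the structure concrete, here is a toy point quadtree (a sketch, not a production implementation; the capacity of 4 points per node is arbitrary):

```python
class Quadtree:
    """Toy point quadtree: stores (x, y, payload) points, answers rectangular range queries."""
    CAPACITY = 4  # points per node before subdividing

    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.points = []
        self.children = None  # NW, NE, SW, SE once subdivided

    def insert(self, x, y, payload):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False  # outside this node
        if self.children is None:
            if len(self.points) < self.CAPACITY:
                self.points.append((x, y, payload))
                return True
            self._subdivide()
        return any(c.insert(x, y, payload) for c in self.children)

    def _subdivide(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [Quadtree(x0, my, mx, y1), Quadtree(mx, my, x1, y1),
                         Quadtree(x0, y0, mx, my), Quadtree(mx, y0, x1, my)]
        for p in self.points:  # push existing points down one level
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return payloads of all points inside the query rectangle."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # query rectangle does not overlap this node
        hits = [p for x, y, p in self.points if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for c in self.children or []:
            hits += c.query(qx0, qy0, qx1, qy1)
        return hits
```

Clustered drivers (e.g. downtown at rush hour) force deep subdivision in one branch, which is exactly the imbalance the cons column refers to.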
Solution 2: Geohash
Geohash encodes lat/lng into a string; nearby locations share a prefix.
┌─────────────────────────────────────────────────────────────┐
│ GEOHASH │
├─────────────────────────────────────────────────────────────┤
│ │
│ Location: (10.8231, 106.6297) → Geohash: "w3gvk1" │
│ │
│ Precision levels: │
│ ┌──────────┬─────────────┬──────────────────────┐ │
│ │ Length │ Cell Size │ Example │ │
│ ├──────────┼─────────────┼──────────────────────┤ │
│ │ 1 │ 5000 km │ w │ │
│ │ 2 │ 1250 km │ w3 │ │
│ │ 3 │ 156 km │ w3g │ │
│ │ 4 │ 39 km │ w3gv │ │
│ │ 5 │ 4.9 km │ w3gvk │ │
│ │ 6 │ 1.2 km │ w3gvk1 │ ← Good │
│ │ 7 │ 153 m │ w3gvk1p │ │
│ │ 8 │ 38 m │ w3gvk1pq │ │
│ └──────────┴─────────────┴──────────────────────┘ │
│ │
│ Search: Find all drivers with geohash prefix "w3gvk" │
│ + 8 adjacent cells (edge cases) │
│ │
└─────────────────────────────────────────────────────────────┘

| Pros | Cons |
|---|---|
| ✅ String prefix search (fast) | ❌ Edge problem (adjacent cells different prefix) |
| ✅ Easy to index in DB | ❌ Fixed cell sizes |
| ✅ Human readable | ❌ Cells not uniform near poles |
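Encoding is just alternating binary subdivision of longitude and latitude, packed into base-32. A from-scratch sketch (production code would normally use a geohash library):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode lat/lng as a geohash: bits alternate lng, lat; 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    even = True  # even bit positions refine longitude, odd refine latitude
    while len(chars) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Shorter hashes are prefixes of longer ones by construction, which is what makes prefix search work; the edge problem is that two points a few meters apart can still fall in cells with different prefixes, hence the 8 adjacent cells in the search above.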
Solution 3: S2 Geometry (Google's Choice) ⭐
S2 projects Earth onto a cube, then uses Hilbert curve to map 2D → 1D.
┌─────────────────────────────────────────────────────────────┐
│ S2 GEOMETRY │
├─────────────────────────────────────────────────────────────┤
│ │
│ Earth → Cube → Hilbert Curve → 64-bit Cell ID │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌─────┬─────┬─────┬─────┐ │ │
│ │ │ L14 │ L14 │ L14 │ L14 │ Level 14: ~0.8 km² │ │
│ │ │Cell1│Cell2│Cell3│Cell4│ │ │
│ │ ├─────┼─────┼─────┼─────┤ │ │
│ │ │ 🚗 │ │ 🚗 │ │ Drivers in cells │ │
│ │ │ 🚗 │ 🚗 │ │ 🚗 │ │ │
│ │ ├─────┼─────┼─────┼─────┤ │ │
│ │ │ │ 📍 │ │ │ 📍 = Rider location │ │
│ │ │ 🚗 │ │ 🚗 │ │ │ │
│ │ └─────┴─────┴─────┴─────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Cell Hierarchy: │
│ Level 0: Earth face (~85M km²) │
│ Level 12: ~3.3 km² (city district) │
│ Level 14: ~0.8 km² (neighborhood) ← Uber uses this │
│ Level 16: ~0.05 km² (city block) │
│ Level 30: ~1 cm² (maximum precision) │
│ │
└─────────────────────────────────────────────────────────────┘
Why S2 is Superior:
| Feature | Geohash | S2 |
|---|---|---|
| Cell shape | Rectangle (varies) | Nearly square (uniform) |
| Edge handling | 8 neighbors needed | Hierarchical covering |
| Variable precision | Fixed levels | Any level 0-30 |
| Covering algorithm | Manual | Built-in GetCovering() |
| Used by | Many apps | Google Maps, Uber, Foursquare |
Implementation: Redis Geo with S2
```python
import time

import redis
from s2sphere import CellId, LatLng

redis_client = redis.Redis(decode_responses=True)

def update_driver_location(driver_id: str, lat: float, lng: float):
    """Update driver location in Redis Geo index"""
    # Redis GEOADD uses geohash internally, but we also store the S2 cell
    redis_client.geoadd("drivers:active", (lng, lat, driver_id))
    # Store S2 cell for advanced queries
    cell_id = CellId.from_lat_lng(LatLng.from_degrees(lat, lng))
    cell_token = cell_id.parent(14).to_token()  # Level 14
    redis_client.hset(f"driver:{driver_id}", mapping={
        "lat": lat,
        "lng": lng,
        "s2_cell": cell_token,
        "last_update": int(time.time()),
        "status": "available"
    })
    # Add to cell-based index for fast lookup
    redis_client.sadd(f"cell:{cell_token}:drivers", driver_id)
    # Set TTL for stale location cleanup
    redis_client.expire(f"driver:{driver_id}", 30)

def find_nearby_drivers(lat: float, lng: float, radius_km: float = 3) -> list:
    """Find drivers within radius using Redis GEORADIUS"""
    # Primary: use Redis Geo (geohash-based)
    nearby = redis_client.georadius(
        "drivers:active",
        lng, lat,
        radius_km, unit="km",
        withdist=True,
        count=20,
        sort="ASC"
    )
    # Filter by availability
    available_drivers = []
    for driver_id, distance in nearby:
        driver_data = redis_client.hgetall(f"driver:{driver_id}")
        if driver_data.get("status") == "available":
            available_drivers.append({
                "driver_id": driver_id,
                "distance_km": distance,
                "lat": float(driver_data["lat"]),
                "lng": float(driver_data["lng"])
            })
    return available_drivers
```

🔧 Raizo's Note
Redis GEORADIUS uses geohash internally, not S2. But at a 3-5 km radius the accuracy is good enough. Uber uses a custom S2 implementation for advanced features such as:
- Variable cell sizes based on driver density
- Efficient range queries across cell boundaries
- Hierarchical caching
🎯 Deep Dive: Driver Matching Logic
The Matching Problem
With 10 drivers near a rider, which one do you pick? This is not just "the nearest driver"; you also have to optimize for:
- Rider experience: low ETA, high driver rating
- Driver fairness: distribute rides evenly
- Platform efficiency: minimize empty miles
Strategy 1: Greedy Matching (Simple)
┌─────────────────────────────────────────────────────────────┐
│ GREEDY MATCHING │
├─────────────────────────────────────────────────────────────┤
│ │
│ Algorithm: │
│ 1. Find all available drivers within 3km │
│ 2. Calculate ETA for each driver │
│ 3. Pick driver with lowest ETA │
│ 4. Send request, wait for acceptance │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 🚗 D1 (2.1km, ETA 4min) │ │
│ │ \ │ │
│ │ \ │ │
│ │ 🚗 D2 (1.5km, ETA 3min) ←── SELECTED │ │
│ │ \ │ │
│ │ 📍 Rider │ │
│ │ / │ │
│ │ 🚗 D3 (1.8km, ETA 5min) │ │
│ │ / │ │
│ │ 🚗 D4 (2.5km, ETA 6min) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

| Pros | Cons |
|---|---|
| ✅ Simple, fast O(n) | ❌ Locally optimal, not globally |
| ✅ Low latency | ❌ Can cause driver starvation |
| ✅ Easy to understand | ❌ Doesn't consider future requests |
Strategy 2: Global Optimization (Batch Matching)
┌─────────────────────────────────────────────────────────────┐
│ GLOBAL OPTIMIZATION │
├─────────────────────────────────────────────────────────────┤
│ │
│ Batch requests every 2 seconds, solve assignment problem │
│ │
│ Riders: R1 R2 R3 R4 │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ HUNGARIAN ALGORITHM │ │
│ │ (Bipartite Matching) │ │
│ │ │ │
│ │ Cost Matrix (ETA in minutes): │ │
│ │ D1 D2 D3 D4 │ │
│ │ R1 [ 4 3 5 6 ] │ │
│ │ R2 [ 7 2 4 3 ] │ │
│ │ R3 [ 3 5 2 4 ] │ │
│ │ R4 [ 5 4 6 2 ] │ │
│ │ │ │
│ │ Optimal: R1→D2, R2→D4, R3→D3, R4→D1 │ │
│ │ Total ETA: 3+3+2+5 = 13 minutes │ │
│ │ │ │
│ │ vs Greedy: R1→D2, R2→D2(taken)→D4, ... │ │
│ │ Greedy Total: 3+3+4+6 = 16 minutes │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

| Pros | Cons |
|---|---|
| ✅ Globally optimal | ❌ Higher latency (batching delay) |
| ✅ Better overall efficiency | ❌ Complex implementation |
| ✅ Fairer distribution | ❌ O(n³) Hungarian algorithm |
Uber's Hybrid Approach
```python
def match_driver(rider_location, ride_type, urgency="normal"):
    """
    Hybrid matching: greedy with a local optimization window
    """
    # 1. Get candidate drivers
    candidates = find_nearby_drivers(
        rider_location.lat,
        rider_location.lng,
        radius_km=3
    )
    if not candidates:
        # Expand search radius
        candidates = find_nearby_drivers(
            rider_location.lat,
            rider_location.lng,
            radius_km=5
        )
    # 2. Filter by ride type and availability
    eligible = []
    for driver in candidates:
        if not is_driver_eligible(driver, ride_type):
            continue
        if is_driver_locked(driver.id):
            continue
        eligible.append(driver)
    if not eligible:
        return None
    # 3. Calculate scores for each driver
    scored_drivers = []
    for driver in eligible[:10]:  # Limit ETA calculations
        eta = get_eta_from_maps(driver.location, rider_location)
        score = calculate_match_score(
            eta=eta,
            driver_rating=driver.rating,
            acceptance_rate=driver.acceptance_rate,
            trips_today=driver.trips_today,
            idle_time=driver.idle_time
        )
        scored_drivers.append((driver, score, eta))
    # 4. Sort by score (higher is better)
    scored_drivers.sort(key=lambda x: x[1], reverse=True)
    # 5. Try to lock and dispatch
    for driver, score, eta in scored_drivers:
        if try_lock_driver(driver.id, timeout_seconds=15):
            return dispatch_ride(driver, rider_location, eta)
    return None  # No driver available

def calculate_match_score(eta, driver_rating, acceptance_rate,
                          trips_today, idle_time):
    """
    Multi-factor scoring function.
    Weights tuned to business priorities:
    - ETA: primary factor (rider experience)
    - Rating: quality signal
    - Acceptance rate: reliability
    - Trips today: fairness (favor drivers with fewer trips)
    - Idle time: fairness (prioritize waiting drivers)
    """
    # Normalize factors to a 0-1 scale
    eta_score = max(0, 1 - eta / 15)                    # 15 min max
    rating_score = max(0, (driver_rating - 4.0) / 1.0)  # 4.0-5.0 range
    acceptance_score = acceptance_rate
    fairness_score = max(0, 1 - trips_today / 20)       # 20 trips/day target
    idle_score = min(1, idle_time / 600)                # 10 min idle = max score
    # Weighted combination
    score = (
        0.40 * eta_score +
        0.20 * rating_score +
        0.15 * acceptance_score +
        0.15 * fairness_score +
        0.10 * idle_score
    )
    return score
```

Driver Locking Mechanism
┌─────────────────────────────────────────────────────────────┐
│ DRIVER LOCKING │
├─────────────────────────────────────────────────────────────┤
│ │
│ Problem: Two riders request at same time, both get same │
│ driver → one rider left hanging │
│ │
│ Solution: Distributed lock with Redis │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Rider A requests → Lock Driver 1 (15s TTL) │ │
│ │ SET driver:1:lock ride_A EX 15 │ │
│ │ │ │
│ │ Rider B requests → Try lock Driver 1 │ │
│ │ SETNX fails (already locked) │ │
│ │ → Try Driver 2 instead │ │
│ │ │ │
│ │ Driver 1 accepts → Extend lock, start trip │ │
│ │ Driver 1 rejects → Release lock (DEL) │ │
│ │ Timeout (15s) → Lock auto-expires │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

```python
import uuid

# Release must check ownership: delete only if the lock still holds our token
# (otherwise we could free a lock that expired and was re-acquired by another request)
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""

def try_lock_driver(driver_id: str, timeout_seconds: int = 15):
    """
    Attempt to acquire an exclusive lock on a driver.
    Returns the lock token if acquired, None if the driver is already locked.
    """
    lock_key = f"driver:{driver_id}:lock"
    lock_value = str(uuid.uuid4())  # Unique lock identifier
    # SET NX EX - atomic set-if-not-exists with expiry
    acquired = redis_client.set(
        lock_key,
        lock_value,
        nx=True,  # Only set if not exists
        ex=timeout_seconds
    )
    return lock_value if acquired else None

def release_driver_lock(driver_id: str, lock_value: str):
    """Release the driver lock only if we still own it"""
    lock_key = f"driver:{driver_id}:lock"
    redis_client.eval(RELEASE_SCRIPT, 1, lock_key, lock_value)
```

ETA Calculation
┌─────────────────────────────────────────────────────────────┐
│ ETA CALCULATION │
├─────────────────────────────────────────────────────────────┤
│ │
│ Naive: Straight-line distance / average speed │
│ ETA = 2km / 30km/h = 4 minutes │
│ ❌ Ignores roads, traffic, one-way streets │
│ │
│ Better: Google Maps Directions API │
│ ✅ Real road network │
│ ✅ Real-time traffic │
│ ✅ Turn-by-turn routing │
│ ❌ Expensive at scale ($5/1000 requests) │
│ │
│ Uber's Approach: Hybrid │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 1. Pre-compute base ETAs between S2 cells │ │
│ │ (offline, updated hourly) │ │
│ │ │ │
│ │ 2. Apply real-time traffic multiplier │ │
│ │ (from live driver data) │ │
│ │ │ │
│ │ 3. Use Maps API only for final routing │ │
│ │ (after driver accepts) │ │
│ │ │ │
│ │ ETA = base_eta[cell_A][cell_B] × traffic_factor │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
🎓 Professor Tom
ETA accuracy is a critical metric. Uber tracks:
- ETA accuracy: predicted vs actual arrival time
- ETA variance: consistency of predictions
- User trust: users stop using the app if ETAs are unreliable
Target: 90% of ETAs within ±2 minutes of actual.
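The hybrid lookup described above can be sketched in a few lines (the cell tokens, table contents, and traffic multiplier are illustrative assumptions, not Uber's data):

```python
# Pre-computed base ETAs between S2 cells (seconds), refreshed offline.
# Keys are (origin_cell, destination_cell) tokens - illustrative values.
BASE_ETA = {
    ("w3gvk1", "w3gvk4"): 240,   # ~4 min free-flow
    ("w3gvk1", "w3gvh7"): 420,
}

def estimate_eta(origin_cell: str, dest_cell: str, traffic_factor: float) -> int:
    """ETA = pre-computed base x live traffic multiplier (1.0 = free flow)."""
    base = BASE_ETA.get((origin_cell, dest_cell))
    if base is None:
        return -1  # in the real system: fall back to a Maps API call
    return int(base * traffic_factor)

print(estimate_eta("w3gvk1", "w3gvk4", 1.5))  # rush hour: 240 * 1.5 = 360 s
```

The expensive Maps API call is reserved for the cache-miss path, which is what makes the 10,000-QPS ETA load affordable.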
🗺️ Build vs Buy: Maps Service
The Maps Dilemma
Maps are a critical dependency for Uber. Every ride needs:
- Geocoding: Address → Coordinates
- Reverse Geocoding: Coordinates → Address
- Routing: Point A → Point B directions
- ETA: Estimated time of arrival
- Distance: Actual road distance (not straight-line)
Option 1: Google Maps API
┌─────────────────────────────────────────────────────────────┐
│ GOOGLE MAPS API │
├─────────────────────────────────────────────────────────────┤
│ │
│ Pricing (as of 2024): │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ API │ Price per 1000 requests │ │
│ ├────────────────────────┼────────────────────────────┤ │
│ │ Directions │ $5.00 │ │
│ │ Distance Matrix │ $5.00 │ │
│ │ Geocoding │ $5.00 │ │
│ │ Places │ $17.00 │ │
│ │ Roads (Snap to Roads) │ $10.00 │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Uber's Usage (estimated): │
│ - 15M trips/day × 10 API calls/trip = 150M calls/day │
│ - Monthly: 150M × 30 = 4.5B calls │
│ - Cost: 4.5B / 1000 × $5 = $22.5M/month 😱 │
│ │
│ With volume discounts: ~$500K/month (still huge!) │
│ │
└─────────────────────────────────────────────────────────────┘

| Pros | Cons |
|---|---|
| ✅ Best-in-class accuracy | ❌ Extremely expensive at scale |
| ✅ Global coverage | ❌ Vendor lock-in |
| ✅ Real-time traffic | ❌ Rate limits |
| ✅ No maintenance | ❌ No customization |
| ✅ Continuous updates | ❌ Dependency risk |
Option 2: Self-Hosted OpenStreetMap (OSM)
┌─────────────────────────────────────────────────────────────┐
│ SELF-HOSTED OSM STACK │
├─────────────────────────────────────────────────────────────┤
│ │
│ Components: │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Nominatim │ │ OSRM │ │ │
│ │ │ (Geocoding) │ │ (Routing) │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ │ │ │ │ │
│ │ └────────┬─────────┘ │ │
│ │ ▼ │ │
│ │ ┌─────────────┐ │ │
│ │ │ PostgreSQL │ │ │
│ │ │ + PostGIS │ │ │
│ │ │ (OSM Data) │ │ │
│ │ └─────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Infrastructure Cost: │
│ - 50 routing servers: $15,000/month │
│ - 10 geocoding servers: $3,000/month │
│ - Database cluster: $5,000/month │
│ - Engineering team (5 people): $100,000/month │
│ - Total: ~$125,000/month │
│ │
│ vs Google Maps: $500,000/month │
│ Savings: $375,000/month = $4.5M/year │
│ │
└─────────────────────────────────────────────────────────────┘

| Pros | Cons |
|---|---|
| ✅ Much cheaper at scale | ❌ No real-time traffic |
| ✅ Full control | ❌ Data quality varies by region |
| ✅ No rate limits | ❌ Requires dedicated team |
| ✅ Customizable | ❌ Map updates lag behind |
| ✅ No vendor lock-in | ❌ Complex infrastructure |
Uber's Actual Approach: Hybrid
┌─────────────────────────────────────────────────────────────┐
│ UBER'S HYBRID APPROACH │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Use Case │ Solution │ │
│ │ ──────────────────────┼───────────────────────── │ │
│ │ Driver matching ETA │ Pre-computed + OSM │ │
│ │ (high volume, approx) │ │ │
│ │ │ │ │
│ │ Turn-by-turn nav │ Google Maps API │ │
│ │ (accuracy critical) │ │ │
│ │ │ │ │
│ │ Fare calculation │ OSM (distance) + │ │
│ │ │ Internal (time) │ │
│ │ │ │ │
│ │ Geocoding │ Google (primary) + │ │
│ │ │ OSM (fallback) │ │
│ │ │ │ │
│ │ Traffic data │ Internal (driver GPS) + │ │
│ │ │ Google (supplement) │ │
│ │ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Key Insight: Uber has MILLIONS of drivers sending GPS │
│ data every second. This is better traffic data than │
│ Google has in many markets! │
│ │
└─────────────────────────────────────────────────────────────┘
🔧 Raizo's Note
Uber's secret weapon: real-time traffic from its driver fleet.
With 2M active drivers sending their location 4 times per second, Uber has a traffic-data density Google doesn't have. They use this data to:
- Detect traffic jams before Google does
- Find the shortcuts drivers actually use
- Improve ETA accuracy in their markets
This is why Uber invests heavily in its internal maps team.
Decision Framework
┌─────────────────────────────────────────────────────────────┐
│ MAPS BUILD VS BUY DECISION │
├─────────────────────────────────────────────────────────────┤
│ │
│ Choose Google Maps API if: │
│ □ < 1M API calls/month (cost manageable) │
│ □ Global coverage needed immediately │
│ □ No engineering bandwidth for maps │
│ □ Real-time traffic is critical │
│ □ Startup phase (focus on core product) │
│ │
│ Choose Self-hosted OSM if: │
│ □ > 100M API calls/month (cost savings significant) │
│ □ Operating in limited geographies │
│ □ Have dedicated maps engineering team │
│ □ Need customization (e.g., bike lanes, truck routes) │
│ □ Have alternative traffic data source │
│ │
│ Choose Hybrid if: │
│ □ High volume but need accuracy for some use cases │
│ □ Want to reduce dependency gradually │
│ □ Have some engineering capacity │
│ □ Can tolerate complexity │
│ │
└─────────────────────────────────────────────────────────────┘
🎓 Professor Tom
Interview tip: when asked about Maps, don't just say "use the Google Maps API". Discuss:
- Cost implications at scale
- Build vs buy trade-offs
- How to reduce the dependency
- Alternative data sources (the driver fleet)
This shows you think about long-term architecture, not just an MVP.
⚖️ Trade-offs Analysis
Architecture Decision Matrix
| Decision | Option A | Option B | Chosen | Rationale |
|---|---|---|---|---|
| Location Index | PostGIS (SQL) | Redis Geo | Redis Geo | In-memory, handles 8M writes/sec |
| Geospatial Algorithm | Geohash | S2 Geometry | S2 | Variable cell sizes, better covering |
| Matching Strategy | Greedy (first available) | Global optimization | Hybrid | Balance latency vs efficiency |
| Map Data | Google Maps API | Self-hosted OSM | Hybrid | Cost vs accuracy trade-off |
| ETA Calculation | Real-time routing | Pre-computed | Hybrid | Pre-computed base + real-time adjustment |
| Driver State | PostgreSQL | Redis | Redis | Fast reads, TTL for stale data |
| Trip Storage | NoSQL | PostgreSQL | PostgreSQL | ACID for financial data |
| Real-time Updates | HTTP Polling | WebSocket | WebSocket | Lower latency, less bandwidth |
Location Index: PostGIS vs Redis Geo
┌─────────────────────────────────────────────────────────────┐
│ POSTGIS vs REDIS GEO │
├─────────────────────────────────────────────────────────────┤
│ │
│ PostGIS: │
│ + Rich geospatial queries (polygons, intersections) │
│ + Persistent storage │
│ + Complex spatial joins │
│ - Disk I/O bottleneck │
│ - ~1000 writes/sec max │
│ - Query latency 10-100ms │
│ │
│ Redis Geo: │
│ + In-memory (microsecond latency) │
│ + 100K+ writes/sec per node │
│ + Simple GEORADIUS queries │
│ - Limited query types │
│ - Memory-bound │
│ - No persistence (need backup strategy) │
│ │
│ Verdict: Redis Geo for real-time, PostGIS for analytics │
│ │
└─────────────────────────────────────────────────────────────┘
🚨 Failure Scenarios & Mitigations
Scenario 1: Location Service Lag
| Aspect | Details |
|---|---|
| Impact | Stale driver locations → bad matches, wrong ETAs |
| Cause | Network issues, Redis overload, driver app bugs |
| Detection | Monitor last_update timestamp, alert if > 30s |
| Mitigation | Location TTL (auto-expire after 30s) |
| Fallback | Use last known location + velocity vector prediction |
```python
import logging
import time
from typing import Optional

logger = logging.getLogger(__name__)

def get_driver_location(driver_id: str) -> Optional[Location]:
    driver_data = redis_client.hgetall(f"driver:{driver_id}")
    if not driver_data:
        return None
    last_update = int(driver_data.get("last_update", 0))
    age_seconds = time.time() - last_update
    if age_seconds > 30:
        # Location too stale, don't use
        logger.warning(f"Stale location for driver {driver_id}: {age_seconds:.0f}s old")
        return None
    if age_seconds > 10:
        # Predict current position based on last known velocity
        return predict_location(driver_data, age_seconds)
    return Location(
        lat=float(driver_data["lat"]),
        lng=float(driver_data["lng"])
    )
```

🔧 Raizo's Note
Location prediction uses simple physics:

new_lat = old_lat + (speed × cos(heading) × time) / 111000
new_lng = old_lng + (speed × sin(heading) × time) / (111000 × cos(lat))

111000 ≈ meters per degree of latitude. Not perfect, but better than stale data.
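A hedged sketch of that dead-reckoning step (the `speed` and `heading` fields are assumptions not stored in the earlier Redis hash; speed in m/s, heading in degrees clockwise from north):

```python
import math

def predict_location(driver_data: dict, age_seconds: float) -> tuple:
    """Dead-reckon a stale position forward along the last known velocity."""
    lat = float(driver_data["lat"])
    lng = float(driver_data["lng"])
    speed = float(driver_data.get("speed", 0.0))                    # m/s
    heading = math.radians(float(driver_data.get("heading", 0.0)))  # 0 = north
    meters_per_deg = 111_000.0  # meters per degree of latitude
    new_lat = lat + (speed * math.cos(heading) * age_seconds) / meters_per_deg
    new_lng = lng + (speed * math.sin(heading) * age_seconds) / (
        meters_per_deg * math.cos(math.radians(lat)))
    return (new_lat, new_lng)
```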
Scenario 2: Matching Service Overload (Concert Ends)
| Aspect | Details |
|---|---|
| Impact | Long wait times, failed matches, angry users |
| Cause | 50,000 people request rides simultaneously |
| Detection | Queue depth monitoring, latency spikes |
| Mitigation | Queue-based matching with priority |
| Solution | Surge pricing to balance supply/demand |
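The detection row above can be approximated with a rolling-rate comparison (a sketch; the window sizes and 3x factor are assumptions):

```python
from collections import deque

class SpikeDetector:
    """Flag a demand spike when the recent request rate exceeds a
    multiple of the longer-term baseline rate."""

    def __init__(self, window: int = 60, baseline_window: int = 900, factor: float = 3.0):
        self.window = window                  # "recent" window, seconds
        self.baseline_window = baseline_window
        self.factor = factor
        self.events = deque()                 # request timestamps (seconds)

    def record(self, ts: float):
        self.events.append(ts)
        # Drop anything older than the baseline window
        while self.events and self.events[0] < ts - self.baseline_window:
            self.events.popleft()

    def is_spike(self, now: float) -> bool:
        recent = sum(1 for t in self.events if t >= now - self.window)
        recent_rate = recent / self.window
        baseline_rate = len(self.events) / self.baseline_window
        return baseline_rate > 0 and recent_rate > self.factor * baseline_rate
```

In practice the spike signal would feed both surge pricing and the request queue described below, rather than just a boolean.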
┌─────────────────────────────────────────────────────────────┐
│ HANDLING DEMAND SPIKES │
├─────────────────────────────────────────────────────────────┤
│ │
│ Normal: 1,000 requests/sec → Direct matching │
│ │
│ Spike: 10,000 requests/sec → Queue + Surge │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 1. Detect spike (requests > 3x normal) │ │
│ │ │ │
│ │ 2. Enable surge pricing │ │
│ │ - Reduces demand (price-sensitive users wait) │ │
│ │ - Increases supply (drivers come online) │ │
│ │ │ │
│ │ 3. Queue requests with estimated wait time │ │
│ │ "Your ride will be ready in ~8 minutes" │ │
│ │ │ │
│ │ 4. Pre-position drivers for known events │ │
│ │ (concerts, sports games, conferences) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Scenario 3: Maps API Failure
| Aspect | Details |
|---|---|
| Impact | No ETA calculation, no turn-by-turn navigation |
| Cause | Google Maps outage, rate limiting, network issues |
| Detection | API error rates, latency monitoring |
| Mitigation | Cache recent routes, multiple providers |
| Fallback | Straight-line distance × 1.4 factor |
```python
def get_eta(origin: Location, destination: Location) -> int:
    """Get ETA in seconds with a fallback chain"""
    # Try primary: Google Maps
    try:
        eta = google_maps_client.get_eta(origin, destination)
        cache_route(origin, destination, eta)
        return eta
    except GoogleMapsError as e:
        logger.warning(f"Google Maps failed: {e}")
    # Try cache
    cached_eta = get_cached_eta(origin, destination)
    if cached_eta:
        return cached_eta
    # Try secondary: OSM/OSRM
    try:
        return osrm_client.get_eta(origin, destination)
    except OSRMError as e:
        logger.warning(f"OSRM failed: {e}")
    # Last resort: straight-line estimate
    distance_km = haversine_distance(origin, destination)
    avg_speed_kmh = 25  # Urban average
    eta_hours = distance_km / avg_speed_kmh
    eta_seconds = int(eta_hours * 3600 * 1.4)  # 1.4x factor for road detours
    logger.warning(f"Using fallback ETA: {eta_seconds}s")
    return eta_seconds
```

Scenario 4: Redis Cluster Failure
| Aspect | Details |
|---|---|
| Impact | Cannot find nearby drivers, matching fails |
| Cause | Node failure, network partition, memory exhaustion |
| Detection | Redis health checks, connection errors |
| Mitigation | Redis Cluster with replicas (3 replicas per shard) |
| Failover | Automatic promotion of replica (< 10 seconds) |
🔧 Raizo's Note
Redis Cluster gotchas:
- GEORADIUS does not work across slots - make sure all geo data for one query lives in the same slot
- Use hash tags: `drivers:{city}:active` → all drivers in the same city map to the same slot
- Monitor memory: Redis OOM = instant outage
- Set `maxmemory-policy volatile-ttl` to auto-evict stale locations
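Redis derives the cluster slot from the substring between the first `{` and the next `}` when one is present, so keys sharing that tag land in the same slot. A tiny sketch of the tag-extraction rule:

```python
def hash_tag(key: str) -> str:
    """Return the hash-tag portion of a Redis key, or the whole key if none.
    Per the Redis Cluster spec: first '{', then the next '}', and the tag
    must be non-empty - otherwise the entire key is hashed."""
    start = key.find("{")
    if start == -1:
        return key
    end = key.find("}", start + 1)
    if end == -1 or end == start + 1:
        return key
    return key[start + 1:end]

# All keys for one city share the tag -> same slot -> GEORADIUS works
print(hash_tag("drivers:{saigon}:active"))
print(hash_tag("cell:{saigon}:w3gvk1:drivers"))
```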
Scenario 5: Payment Gateway Failure
| Aspect | Details |
|---|---|
| Impact | Cannot complete trips, drivers not paid |
| Cause | Stripe/Braintree outage, network issues |
| Detection | Payment error rates, timeout monitoring |
| Mitigation | Multiple payment providers, retry queue |
| Fallback | Complete trip, charge later (trust-based) |
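A hedged sketch of the retry step from the table above (the gateway client and backoff parameters are illustrative):

```python
import time

def charge_with_retries(charge_fn, trip_id: str, amount_cents: int,
                        max_attempts: int = 3, base_delay: float = 1.0) -> bool:
    """Try to charge; on failure, retry with exponential backoff.
    Returns True on success, False if all attempts failed - the caller then
    queues the trip for reconciliation while the trip itself stays COMPLETED."""
    for attempt in range(max_attempts):
        try:
            charge_fn(trip_id, amount_cents)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return False
```

The key property matches the diagram below: payment failure never blocks trip completion, it only changes which queue the trip lands in.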
┌─────────────────────────────────────────────────────────────┐
│ PAYMENT FAILURE HANDLING │
├─────────────────────────────────────────────────────────────┤
│ │
│ Trip completes → Payment fails │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 1. Mark trip as COMPLETED (don't block driver) │ │
│ │ │ │
│ │ 2. Queue payment for retry │ │
│ │ - Retry 3x with exponential backoff │ │
│ │ - Try alternate payment method │ │
│ │ │ │
│ │ 3. If all retries fail: │ │
│ │ - Notify rider to update payment │ │
│ │ - Block future rides until resolved │ │
│ │ - Pay driver from Uber's float │ │
│ │ │ │
│ │ 4. Reconcile later (batch job) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Key: Never block trip completion due to payment │
│ Driver experience > payment timing │
│ │
└─────────────────────────────────────────────────────────────┘
💰 Cost Estimation
Monthly Infrastructure Costs (100M MAU Scale)
| Service | Specification | Unit Cost | Monthly Cost |
|---|---|---|---|
| Location Service | 100 × c5.4xlarge (16 vCPU, 32GB) | $0.68/hr | $49,000 |
| Matching Engine | 50 × c5.2xlarge (8 vCPU, 16GB) | $0.34/hr | $12,200 |
| Trip Service | 30 × c5.xlarge (4 vCPU, 8GB) | $0.17/hr | $3,700 |
| WebSocket Servers | 100 × c5.xlarge | $0.17/hr | $12,200 |
| Redis Geo Cluster | 200GB, 6 nodes (3 primary + 3 replica) | $0.068/GB/hr | $8,000 |
| Redis Cache | 100GB for sessions, locks | $0.068/GB/hr | $5,000 |
| Trip Database | PostgreSQL 10TB (Multi-AZ) | $0.115/GB/mo | $5,000 |
| Analytics DB | ClickHouse 50TB | $0.10/GB/mo | $5,000 |
| Kafka Cluster | 10 brokers (m5.2xlarge) | $0.384/hr | $12,000 |
| Maps API | 1B requests/month (with discounts) | $0.50/1000 | $500,000 |
| Load Balancers | 4 × ALB (high throughput) | $0.025/hr + LCU | $5,000 |
| Data Transfer | 500TB egress | $0.05/GB | $25,000 |
Cost Summary
| Category | Monthly Cost | % of Total |
|---|---|---|
| Compute (Services) | $77,100 | 12% |
| Caching (Redis) | $13,000 | 2% |
| Database (PostgreSQL + ClickHouse) | $10,000 | 2% |
| Message Queue (Kafka) | $12,000 | 2% |
| Maps API | $500,000 | 78% |
| Networking | $30,000 | 5% |
| Total | ~$642,000 | 100% |
⚠️ Maps API Dominates Cost!
78% of the infrastructure cost is the Maps API. This is why Uber invests heavily in:
- Internal maps team
- Pre-computed ETAs
- Driver fleet data for traffic
- OSM-based routing for non-critical paths
With a hybrid approach, Maps cost can be brought down to ~$100K/month.
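The hybrid approach can be sketched as a tiered lookup: serve ETAs from a pre-computed cell-to-cell table adjusted by a live traffic factor (derived from driver GPS), and only fall back to the paid Maps API on a cache miss. The table contents, factor values, and function names here are illustrative assumptions:

```python
# Hypothetical hybrid ETA lookup. In practice the pre-computed table and
# traffic factors would be produced by offline jobs over driver GPS data.
PRECOMPUTED_ETA = {("cell_a", "cell_b"): 420}   # seconds at free-flow speed
TRAFFIC_FACTOR = {"cell_b": 1.5}                # live congestion multiplier

def eta_seconds(src_cell, dst_cell, maps_api_fallback):
    """Return an ETA estimate, touching the paid API only on cache miss."""
    base = PRECOMPUTED_ETA.get((src_cell, dst_cell))
    if base is None:
        return maps_api_fallback(src_cell, dst_cell)  # rare, paid path
    return base * TRAFFIC_FACTOR.get(dst_cell, 1.0)   # free, common path
```

Because popular corridors dominate traffic, even a modest table hit rate keeps most requests off the paid path.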
Cost Optimization Strategies
┌─────────────────────────────────────────────────────────────┐
│ COST OPTIMIZATION │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. Maps API Reduction: │
│ - Pre-compute cell-to-cell ETAs (offline) │
│ - Use driver GPS for traffic instead of API │
│ - Cache routes for popular corridors │
│ - Potential savings: $400K/month │
│ │
│ 2. Compute Optimization: │
│ - Reserved Instances (1-year): 30% savings │
│ - Spot Instances for analytics: 70% savings │
│ - Right-sizing based on actual usage │
│ - Potential savings: $25K/month │
│ │
│ 3. Data Transfer: │
│ - Keep services in same AZ when possible │
│ - Compress WebSocket messages │
│ - Use regional endpoints │
│ - Potential savings: $10K/month │
│ │
│ Optimized Total: ~$200K/month (vs $642K baseline) │
│ │
└─────────────────────────────────────────────────────────────┘
Cost per Trip Metrics
Monthly cost: $642,000 (baseline) or $200,000 (optimized)
Monthly trips: 15M × 30 = 450M trips
Baseline cost per trip = $642,000 / 450M = $0.0014
Optimized cost per trip = $200,000 / 450M = $0.0004
Average fare: $15
Platform fee (25%): $3.75
Infrastructure cost: $0.0014
Gross margin per trip: $3.75 - $0.0014 = $3.7486 (99.96% margin!)
🎓 Giáo sư Tom
Ride-sharing unit economics are very healthy on the infrastructure side. The main costs are:
- Driver incentives (subsidies, bonuses)
- Customer acquisition (marketing, promos)
- Insurance & compliance
Infrastructure accounts for < 1% of revenue. This is why ride-sharing can scale globally.
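The per-trip figures above can be reproduced with a quick script (numbers taken directly from the baseline table; the rounding is only for display):

```python
# Reproduce the cost-per-trip metrics from the baseline numbers above.
monthly_cost = 642_000                     # baseline infrastructure, USD
monthly_trips = 15_000_000 * 30            # 15M trips/day -> 450M/month
cost_per_trip = monthly_cost / monthly_trips   # ~$0.0014

fare = 15.0
platform_fee = fare * 0.25                 # $3.75 Uber take per trip
margin = platform_fee - cost_per_trip      # ~$3.7486
margin_pct = margin / platform_fee * 100   # ~99.96%
```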
🎯 Interview Checklist
Must-Mention Items ✅
| Topic | Key Points |
|---|---|
| Scale Estimation | 100M MAU, 8M location updates/sec, 15M trips/day |
| Geospatial Indexing | Why SQL doesn't work, Quadtree/Geohash/S2 comparison |
| S2 Geometry | Cell hierarchy, Level 14 for neighborhood-level |
| Driver Matching | Greedy vs global optimization, scoring function |
| Driver Locking | Redis SETNX, prevent double-dispatch |
| ETA Calculation | Pre-computed + real-time adjustment |
| Maps Trade-off | Google API cost vs self-hosted OSM |
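The "Driver Locking" row above deserves a concrete sketch. In production this is a Redis `SET driver:{id}:lock <ride_id> NX EX 15`; the in-memory stand-in below mimics those SETNX-with-TTL semantics so the double-dispatch behavior is visible (class and key names are assumptions):

```python
import time

class DriverLock:
    """In-memory stand-in for Redis `SET driver:{id}:lock NX EX <ttl>`.

    Prevents double-dispatch: if two concurrent matches target the same
    driver, only the first acquire succeeds until release or TTL expiry.
    """
    def __init__(self, clock=time.monotonic):
        self._locks = {}     # driver_id -> (ride_id, expires_at)
        self._clock = clock  # injectable for testing

    def acquire(self, driver_id, ride_id, ttl=15):
        now = self._clock()
        holder = self._locks.get(driver_id)
        if holder and holder[1] > now:       # still held by another ride
            return False
        self._locks[driver_id] = (ride_id, now + ttl)  # NX + EX in one step
        return True

    def release(self, driver_id, ride_id):
        # Only the lock holder may release (compare ride_id, like a lock token)
        if self._locks.get(driver_id, (None,))[0] == ride_id:
            del self._locks[driver_id]
```

Note the release path checks the ride_id token, mirroring the Redis pattern of compare-and-delete to avoid releasing someone else's lock.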
Bonus Points 🌟
- Surge Pricing: Dynamic pricing algorithm based on supply/demand ratio
- Driver Pre-positioning: ML model to predict demand hotspots
- Fraud Detection: Fake GPS, driver collusion, promo abuse
- Multi-destination: Optimize route for multiple stops
- Pool/Share Rides: Matching multiple riders going same direction
- Driver Earnings: Fair distribution algorithm
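The surge-pricing bonus point above can be illustrated with a toy model: the multiplier grows with the open-requests-to-available-drivers ratio, quantized in 0.25x steps and capped. The step, cap, and curve shape are illustrative assumptions, not Uber's actual pricing function:

```python
def surge_multiplier(open_requests, available_drivers,
                     base=1.0, cap=3.0, step=0.25):
    """Toy surge model: price rises with the demand/supply ratio.

    The step and cap are illustrative; real surge curves are tuned per market.
    """
    if available_drivers == 0:
        return cap                             # no supply at all: max surge
    ratio = open_requests / available_drivers
    if ratio <= 1.0:
        return base                            # supply meets demand
    surge = base + step * round(ratio - 1.0)   # quantize to 0.25x increments
    return min(round(surge, 2), cap)
```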
Common Mistakes ❌
| Mistake | Why It's Wrong | Better Approach |
|---|---|---|
| SQL for location queries | Can't handle 8M writes/sec | Redis Geo (in-memory) |
| Simple lat/lng range | Inaccurate, slow | S2 Geometry cells |
| First-available matching | Suboptimal, unfair | Multi-factor scoring |
| No driver locking | Race conditions | Redis distributed lock |
| Google Maps for everything | $500K+/month | Hybrid approach |
| Sync ETA calculation | Blocks matching | Pre-computed + cache |
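The "first-available matching" mistake in the table has a simple fix: a multi-factor score. The sketch below normalizes each factor to [0, 1] and combines them with weights; the weights, horizons, and factor choices are hypothetical, meant only to show the shape of the scoring function:

```python
def driver_score(eta_min, rating, acceptance_rate, idle_min,
                 w_eta=0.5, w_rating=0.2, w_accept=0.2, w_idle=0.1):
    """Hypothetical weighted score; higher is better. Weights are illustrative."""
    eta_term = max(0.0, 1 - eta_min / 15)   # closer is better, 15-min horizon
    idle_term = min(1.0, idle_min / 30)     # fairness: favor long-idle drivers
    return (w_eta * eta_term
            + w_rating * (rating / 5)       # 5-star scale -> [0, 1]
            + w_accept * acceptance_rate
            + w_idle * idle_term)

def best_driver(candidates):
    """candidates: list of (driver_id, eta_min, rating, acceptance_rate, idle_min)."""
    return max(candidates, key=lambda c: driver_score(*c[1:]))[0]
```

ETA still dominates (w_eta = 0.5), but rating, acceptance rate, and idle time break ties and keep dispatch fair, which pure nearest-driver matching cannot do.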
⚠️ Interview Red Flags
- Not knowing why SQL BETWEEN doesn't work for geo queries
- Not mentioning the driver locking mechanism
- Not discussing Maps API cost at scale
- A single point of failure in the architecture
- No fallback strategy for external dependencies
🎓 Key Takeaways
- Geospatial indexing is the core challenge - S2 Geometry is the industry standard
- 8M writes/sec demands in-memory solutions (Redis), not a traditional DB
- Driver matching must balance multiple factors, not just distance
- Distributed locking is critical to prevent race conditions
- Maps API is the biggest cost driver - a hybrid approach saves millions
- Real-time updates via WebSocket, not polling
- Graceful degradation for every external dependency
🔗 Navigation
Related Case Studies
| Case Study | Key Learning | Link |
|---|---|---|
| Twitter | Fan-out patterns, timeline caching | Design Twitter → |
| YouTube | Video processing, CDN, adaptive streaming | Design YouTube → |
| WhatsApp | Real-time messaging, E2EE, connection management | Design WhatsApp → |