Giao diện
Quadtree - Spatial Indexing Engine
"Uber tìm tài xế gần bạn trong O(log N), không phải O(N)." - HPN
Định nghĩa
Quadtree là cấu trúc dữ liệu dạng cây, trong đó mỗi node có tối đa 4 children, tương ứng với 4 góc phần tư của không gian 2D:
- NW (Northwest - Tây Bắc)
- NE (Northeast - Đông Bắc)
- SW (Southwest - Tây Nam)
- SE (Southeast - Đông Nam)
📘 Recursive Decomposition
Quadtree hoạt động theo nguyên lý "Chia để trị" (Divide & Conquer):
- Nếu vùng hiện tại có quá nhiều points → Chia làm 4
- Lặp lại cho đến khi mỗi vùng đủ nhỏ
Tại sao cần Quadtree?
Bài toán: Tìm Tài xế Gần Nhất
Uber có 1 triệu tài xế online tại một thành phố.
| Approach | Complexity | 1000 queries/sec |
|---|---|---|
| Brute-force (duyệt list) | 1 tỷ operations/sec | |
| Quadtree | 20 triệu operations/sec |
🎯 CRITICAL
Brute-force: 1 triệu × 1000 = 1 tỷ distance calculations/second Quadtree: Chỉ cần xét vài chục candidates trong vùng lân cận.
Mô hình hóa
Recursive Subdivision
Bước 1: Toàn bộ bản đồ là 1 ô lớn
┌─────────────────────────────┐
│ │
│ • • │
│ • │
│ • │
│ • │
│ │
│ • • • │
│ │
└─────────────────────────────┘
Bước 2: Quá nhiều points → Chia làm 4
┌──────────────┬──────────────┐
│ │ │
│ • • │ │
│ • │ │
├──────────────┼──────────────┤
│ │ • │
│ │ • │
│ • │ • • │
│ │ │
└──────────────┴──────────────┘
NW NE
SW SE
Bước 3: Tiếp tục chia nếu cần...Mermaid Diagram
Cơ chế Hoạt động
Insert Point
1. Bắt đầu từ ROOT
2. Nếu node chưa đầy (< capacity) → Thêm point vào node
3. Nếu node đầy:
a. Chia node thành 4 children (nếu chưa có)
b. Tìm quadrant phù hợp cho point
c. Đệ quy insert vào child đóQuery Range (Find Nearby)
1. Kiểm tra boundary của node có giao với search area không
2. Nếu KHÔNG giao → Bỏ qua hoàn toàn (PRUNING!)
3. Nếu có giao:
a. Kiểm tra tất cả points trong node
b. Đệ quy vào các childrenProduction Implementation
python
# HPN Engineering Standard
# Implementation: Quadtree for Spatial Indexing
from typing import List, Optional, Tuple
from dataclasses import dataclass
@dataclass
class Point:
"""Represents a 2D point with optional data payload."""
x: float
y: float
data: Optional[any] = None # e.g., driver_id, restaurant_id
@dataclass
class Rectangle:
"""Axis-aligned bounding box (AABB)."""
x: float # Center x
y: float # Center y
width: float # Half-width
height: float # Half-height
def contains(self, point: Point) -> bool:
"""Check if point lies within this rectangle."""
return (
self.x - self.width <= point.x <= self.x + self.width and
self.y - self.height <= point.y <= self.y + self.height
)
def intersects(self, other: 'Rectangle') -> bool:
"""Check if two rectangles overlap."""
return not (
other.x - other.width > self.x + self.width or
other.x + other.width < self.x - self.width or
other.y - other.height > self.y + self.height or
other.y + other.height < self.y - self.height
)
class Quadtree:
"""
Quadtree for efficient 2D spatial queries.
Time Complexity:
- Insert: O(log N) average, O(N) worst case (degenerate)
- Query: O(log N + K) where K = number of results
Space Complexity: O(N)
Use Cases:
- Uber/Grab: Find nearby drivers
- Game engines: Collision detection
- GIS: Spatial database indexing
"""
MAX_CAPACITY = 4 # Points per node before subdivision
MAX_DEPTH = 10 # Prevent infinite recursion
def __init__(
self,
boundary: Rectangle,
depth: int = 0
) -> None:
self.boundary = boundary
self.depth = depth
self.points: List[Point] = []
self.divided = False
# Children (lazily initialized)
self.northwest: Optional['Quadtree'] = None
self.northeast: Optional['Quadtree'] = None
self.southwest: Optional['Quadtree'] = None
self.southeast: Optional['Quadtree'] = None
def insert(self, point: Point) -> bool:
"""
Insert a point into the quadtree.
Returns:
True if point was inserted, False if outside boundary
"""
# Ignore points outside boundary
if not self.boundary.contains(point):
return False
# If we have room and haven't subdivided, add here
if len(self.points) < self.MAX_CAPACITY and not self.divided:
self.points.append(point)
return True
# Need to subdivide (if we haven't already)
if not self.divided:
if self.depth >= self.MAX_DEPTH:
# At max depth, just store here
self.points.append(point)
return True
self._subdivide()
# Try to insert into children
return (
self.northwest.insert(point) or
self.northeast.insert(point) or
self.southwest.insert(point) or
self.southeast.insert(point)
)
def query(self, search_range: Rectangle) -> List[Point]:
"""
Find all points within the search range.
This is the core "find nearby" operation.
Args:
search_range: Rectangle defining the search area
Returns:
List of points within the range
"""
found: List[Point] = []
# PRUNING: If search range doesn't intersect, skip entirely
if not self.boundary.intersects(search_range):
return found
# Check points in this node
for point in self.points:
if search_range.contains(point):
found.append(point)
# Recursively check children
if self.divided:
found.extend(self.northwest.query(search_range))
found.extend(self.northeast.query(search_range))
found.extend(self.southwest.query(search_range))
found.extend(self.southeast.query(search_range))
return found
def query_radius(
self,
center: Point,
radius: float
) -> List[Tuple[Point, float]]:
"""
Find all points within a circular radius.
Returns:
List of (point, distance) tuples, sorted by distance
"""
# First, use rectangular query (fast pruning)
search_rect = Rectangle(center.x, center.y, radius, radius)
candidates = self.query(search_rect)
# Then, filter by actual circular distance
results = []
for point in candidates:
dist = ((point.x - center.x) ** 2 +
(point.y - center.y) ** 2) ** 0.5
if dist <= radius:
results.append((point, dist))
# Sort by distance (nearest first)
results.sort(key=lambda x: x[1])
return results
def _subdivide(self) -> None:
"""Create 4 children quadrants."""
x, y = self.boundary.x, self.boundary.y
hw, hh = self.boundary.width / 2, self.boundary.height / 2
self.northwest = Quadtree(
Rectangle(x - hw, y - hh, hw, hh),
self.depth + 1
)
self.northeast = Quadtree(
Rectangle(x + hw, y - hh, hw, hh),
self.depth + 1
)
self.southwest = Quadtree(
Rectangle(x - hw, y + hh, hw, hh),
self.depth + 1
)
self.southeast = Quadtree(
Rectangle(x + hw, y + hh, hw, hh),
self.depth + 1
)
self.divided = True
# Re-insert existing points into children
for point in self.points:
(self.northwest.insert(point) or
self.northeast.insert(point) or
self.southwest.insert(point) or
self.southeast.insert(point))
self.points = [] # Clear parent's points
def __len__(self) -> int:
"""Count total points in tree."""
count = len(self.points)
if self.divided:
count += len(self.northwest)
count += len(self.northeast)
count += len(self.southwest)
count += len(self.southeast)
return count
# ============================================
# USAGE EXAMPLE: Find Nearby Drivers
# ============================================
if __name__ == "__main__":
import random
# Create quadtree for a 100x100 km city
city_bounds = Rectangle(50, 50, 50, 50) # Center at (50,50)
qt = Quadtree(city_bounds)
# Insert 10,000 drivers at random locations
drivers = []
for i in range(10000):
driver = Point(
x=random.uniform(0, 100),
y=random.uniform(0, 100),
data=f"driver_{i}"
)
qt.insert(driver)
drivers.append(driver)
print(f"Inserted {len(qt)} drivers into quadtree")
# User at location (25, 30) wants to find drivers within 5km
user_location = Point(25, 30)
nearby = qt.query_radius(user_location, radius=5.0)
print(f"\nFound {len(nearby)} drivers within 5km:")
for driver, dist in nearby[:5]: # Show first 5
print(f" {driver.data}: {dist:.2f} km away")
# Performance comparison
import time
# Quadtree query
start = time.perf_counter()
for _ in range(1000):
qt.query_radius(user_location, 5.0)
qt_time = time.perf_counter() - start
# Brute-force
start = time.perf_counter()
for _ in range(1000):
results = []
for d in drivers:
dist = ((d.x - user_location.x)**2 + (d.y - user_location.y)**2)**0.5
if dist <= 5.0:
results.append((d, dist))
bf_time = time.perf_counter() - start
print(f"\n=== Performance (1000 queries) ===")
print(f"Quadtree: {qt_time*1000:.2f} ms")
print(f"Brute-force: {bf_time*1000:.2f} ms")
print(f"Speedup: {bf_time/qt_time:.1f}x faster")Phân tích Độ phức tạp
Time Complexity
| Operation | Average | Worst Case |
|---|---|---|
| Insert | ||
| Query Range | ||
| Query Radius |
Worst case xảy ra khi tất cả points tập trung ở một góc → Tree bị lệch (degenerate).
Space Complexity
Geohash: Alternative Approach
Geohash encode tọa độ (lat, lon) thành string có thể sort:
(10.762622, 106.660172) → "w3gvk1t5"
Độ dài string ↔ Độ chính xác:
- 4 chars: ~20km precision
- 6 chars: ~1km precision
- 8 chars: ~20m precisionGeohash vs Quadtree
| Tiêu chí | Quadtree | Geohash |
|---|---|---|
| Storage | Tree structure | String |
| Query type | Range/Circle | Prefix match |
| Database | In-memory | SQL/Redis LIKE |
| Complexity | More code | Simpler |
💡 HPN's Rule
Geohash tốt cho database indexing (SQL WHERE geohash LIKE 'w3g%'). Quadtree tốt cho in-memory spatial queries.
Ứng dụng Thực tế
| System | Use Case |
|---|---|
| Uber/Grab | Match riders với nearby drivers |
| Game Engines | Collision detection, frustum culling |
| Google Maps | Poi search, tile rendering |
| Elasticsearch | Geo queries với geo_point type |
Anti-patterns
❌ TRÁNH
1. Không giới hạn MAX_DEPTH
python
# ❌ WRONG: Infinite recursion nếu points trùng tọa độ
def subdivide(self):
self._subdivide() # Loop forever with identical points
# ✅ RIGHT: Limit depth
if self.depth >= self.MAX_DEPTH:
self.points.append(point)
return2. Rectangular query cho circular search
python
# ❌ WRONG: Return points outside radius
def find_nearby(center, radius):
return qt.query(Rectangle(center.x, center.y, radius, radius))
# Corners of rectangle are outside circle!
# ✅ RIGHT: Filter by actual distance
candidates = qt.query(rect)
return [p for p in candidates if distance(p, center) <= radius]💡 HPN's Insight
"Quadtree là cấu trúc dữ liệu spatial cơ bản nhất. Master nó trước khi học R-Tree, KD-Tree."