Skip to content

Quadtree - Spatial Indexing Engine

"Uber tìm tài xế gần bạn trong O(log N), không phải O(N)." - HPN

Định nghĩa

Quadtree là cấu trúc dữ liệu dạng cây, trong đó mỗi node có tối đa 4 children, tương ứng với 4 góc phần tư của không gian 2D:

  • NW (Northwest - Tây Bắc)
  • NE (Northeast - Đông Bắc)
  • SW (Southwest - Tây Nam)
  • SE (Southeast - Đông Nam)

📘 Recursive Decomposition

Quadtree hoạt động theo nguyên lý "Chia để trị" (Divide & Conquer):

  1. Nếu vùng hiện tại có quá nhiều points → Chia làm 4
  2. Lặp lại cho đến khi mỗi vùng đủ nhỏ

Tại sao cần Quadtree?

Bài toán: Tìm Tài xế Gần Nhất

Uber có 1 triệu tài xế online tại một thành phố.

ApproachComplexity1000 queries/sec
Brute-force (duyệt list)O(N) per query1 tỷ operations/sec
QuadtreeO(logN) per query20 triệu operations/sec

🎯 CRITICAL

Brute-force: 1 triệu × 1000 = 1 tỷ distance calculations/second Quadtree: Chỉ cần xét vài chục candidates trong vùng lân cận.

Mô hình hóa

Recursive Subdivision

Bước 1: Toàn bộ bản đồ là 1 ô lớn
┌─────────────────────────────┐
│                             │
│      •  •                   │
│         •                   │
│                    •        │
│              •              │
│                             │
│    •            •     •     │
│                             │
└─────────────────────────────┘

Bước 2: Quá nhiều points → Chia làm 4
┌──────────────┬──────────────┐
│              │              │
│      •  •    │              │
│         •    │              │
├──────────────┼──────────────┤
│              │     •        │
│              │  •           │
│    •         │    •     •   │
│              │              │
└──────────────┴──────────────┘
       NW            NE
       SW            SE

Bước 3: Tiếp tục chia nếu cần...

Mermaid Diagram

Cơ chế Hoạt động

Insert Point

1. Bắt đầu từ ROOT
2. Nếu node chưa đầy (< capacity) → Thêm point vào node
3. Nếu node đầy:
   a. Chia node thành 4 children (nếu chưa có)
   b. Tìm quadrant phù hợp cho point
   c. Đệ quy insert vào child đó

Query Range (Find Nearby)

1. Kiểm tra boundary của node có giao với search area không
2. Nếu KHÔNG giao → Bỏ qua hoàn toàn (PRUNING!)
3. Nếu có giao:
   a. Kiểm tra tất cả points trong node
   b. Đệ quy vào các children

Production Implementation

python
# HPN Engineering Standard
# Implementation: Quadtree for Spatial Indexing

from typing import List, Optional, Tuple
from dataclasses import dataclass


@dataclass
class Point:
    """Represents a 2D point with optional data payload."""
    x: float
    y: float
    data: Optional[any] = None  # e.g., driver_id, restaurant_id


@dataclass
class Rectangle:
    """Axis-aligned bounding box (AABB)."""
    x: float      # Center x
    y: float      # Center y
    width: float  # Half-width
    height: float # Half-height
    
    def contains(self, point: Point) -> bool:
        """Check if point lies within this rectangle."""
        return (
            self.x - self.width <= point.x <= self.x + self.width and
            self.y - self.height <= point.y <= self.y + self.height
        )
    
    def intersects(self, other: 'Rectangle') -> bool:
        """Check if two rectangles overlap."""
        return not (
            other.x - other.width > self.x + self.width or
            other.x + other.width < self.x - self.width or
            other.y - other.height > self.y + self.height or
            other.y + other.height < self.y - self.height
        )


class Quadtree:
    """
    Quadtree for efficient 2D spatial queries.
    
    Time Complexity:
        - Insert: O(log N) average, O(N) worst case (degenerate)
        - Query: O(log N + K) where K = number of results
    
    Space Complexity: O(N)
    
    Use Cases:
        - Uber/Grab: Find nearby drivers
        - Game engines: Collision detection
        - GIS: Spatial database indexing
    """
    
    MAX_CAPACITY = 4  # Points per node before subdivision
    MAX_DEPTH = 10    # Prevent infinite recursion
    
    def __init__(
        self, 
        boundary: Rectangle, 
        depth: int = 0
    ) -> None:
        self.boundary = boundary
        self.depth = depth
        self.points: List[Point] = []
        self.divided = False
        
        # Children (lazily initialized)
        self.northwest: Optional['Quadtree'] = None
        self.northeast: Optional['Quadtree'] = None
        self.southwest: Optional['Quadtree'] = None
        self.southeast: Optional['Quadtree'] = None
    
    def insert(self, point: Point) -> bool:
        """
        Insert a point into the quadtree.
        
        Returns:
            True if point was inserted, False if outside boundary
        """
        # Ignore points outside boundary
        if not self.boundary.contains(point):
            return False
        
        # If we have room and haven't subdivided, add here
        if len(self.points) < self.MAX_CAPACITY and not self.divided:
            self.points.append(point)
            return True
        
        # Need to subdivide (if we haven't already)
        if not self.divided:
            if self.depth >= self.MAX_DEPTH:
                # At max depth, just store here
                self.points.append(point)
                return True
            self._subdivide()
        
        # Try to insert into children
        return (
            self.northwest.insert(point) or
            self.northeast.insert(point) or
            self.southwest.insert(point) or
            self.southeast.insert(point)
        )
    
    def query(self, search_range: Rectangle) -> List[Point]:
        """
        Find all points within the search range.
        
        This is the core "find nearby" operation.
        
        Args:
            search_range: Rectangle defining the search area
            
        Returns:
            List of points within the range
        """
        found: List[Point] = []
        
        # PRUNING: If search range doesn't intersect, skip entirely
        if not self.boundary.intersects(search_range):
            return found
        
        # Check points in this node
        for point in self.points:
            if search_range.contains(point):
                found.append(point)
        
        # Recursively check children
        if self.divided:
            found.extend(self.northwest.query(search_range))
            found.extend(self.northeast.query(search_range))
            found.extend(self.southwest.query(search_range))
            found.extend(self.southeast.query(search_range))
        
        return found
    
    def query_radius(
        self, 
        center: Point, 
        radius: float
    ) -> List[Tuple[Point, float]]:
        """
        Find all points within a circular radius.
        
        Returns:
            List of (point, distance) tuples, sorted by distance
        """
        # First, use rectangular query (fast pruning)
        search_rect = Rectangle(center.x, center.y, radius, radius)
        candidates = self.query(search_rect)
        
        # Then, filter by actual circular distance
        results = []
        for point in candidates:
            dist = ((point.x - center.x) ** 2 + 
                    (point.y - center.y) ** 2) ** 0.5
            if dist <= radius:
                results.append((point, dist))
        
        # Sort by distance (nearest first)
        results.sort(key=lambda x: x[1])
        return results
    
    def _subdivide(self) -> None:
        """Create 4 children quadrants."""
        x, y = self.boundary.x, self.boundary.y
        hw, hh = self.boundary.width / 2, self.boundary.height / 2
        
        self.northwest = Quadtree(
            Rectangle(x - hw, y - hh, hw, hh), 
            self.depth + 1
        )
        self.northeast = Quadtree(
            Rectangle(x + hw, y - hh, hw, hh), 
            self.depth + 1
        )
        self.southwest = Quadtree(
            Rectangle(x - hw, y + hh, hw, hh), 
            self.depth + 1
        )
        self.southeast = Quadtree(
            Rectangle(x + hw, y + hh, hw, hh), 
            self.depth + 1
        )
        
        self.divided = True
        
        # Re-insert existing points into children
        for point in self.points:
            (self.northwest.insert(point) or
             self.northeast.insert(point) or
             self.southwest.insert(point) or
             self.southeast.insert(point))
        
        self.points = []  # Clear parent's points
    
    def __len__(self) -> int:
        """Count total points in tree."""
        count = len(self.points)
        if self.divided:
            count += len(self.northwest)
            count += len(self.northeast)
            count += len(self.southwest)
            count += len(self.southeast)
        return count


# ============================================
# USAGE EXAMPLE: Find Nearby Drivers
# ============================================
if __name__ == "__main__":
    import random
    
    # Create quadtree for a 100x100 km city
    city_bounds = Rectangle(50, 50, 50, 50)  # Center at (50,50)
    qt = Quadtree(city_bounds)
    
    # Insert 10,000 drivers at random locations
    drivers = []
    for i in range(10000):
        driver = Point(
            x=random.uniform(0, 100),
            y=random.uniform(0, 100),
            data=f"driver_{i}"
        )
        qt.insert(driver)
        drivers.append(driver)
    
    print(f"Inserted {len(qt)} drivers into quadtree")
    
    # User at location (25, 30) wants to find drivers within 5km
    user_location = Point(25, 30)
    nearby = qt.query_radius(user_location, radius=5.0)
    
    print(f"\nFound {len(nearby)} drivers within 5km:")
    for driver, dist in nearby[:5]:  # Show first 5
        print(f"  {driver.data}: {dist:.2f} km away")
    
    # Performance comparison
    import time
    
    # Quadtree query
    start = time.perf_counter()
    for _ in range(1000):
        qt.query_radius(user_location, 5.0)
    qt_time = time.perf_counter() - start
    
    # Brute-force
    start = time.perf_counter()
    for _ in range(1000):
        results = []
        for d in drivers:
            dist = ((d.x - user_location.x)**2 + (d.y - user_location.y)**2)**0.5
            if dist <= 5.0:
                results.append((d, dist))
    bf_time = time.perf_counter() - start
    
    print(f"\n=== Performance (1000 queries) ===")
    print(f"Quadtree: {qt_time*1000:.2f} ms")
    print(f"Brute-force: {bf_time*1000:.2f} ms")
    print(f"Speedup: {bf_time/qt_time:.1f}x faster")

Phân tích Độ phức tạp

Time Complexity

OperationAverageWorst Case
InsertO(logN)O(N)
Query RangeO(logN+K)O(N)
Query RadiusO(logN+K)O(N)

Worst case xảy ra khi tất cả points tập trung ở một góc → Tree bị lệch (degenerate).

Space Complexity

O(N) để lưu N points + overhead cho tree structure.

Geohash: Alternative Approach

Geohash encode tọa độ (lat, lon) thành string có thể sort:

(10.762622, 106.660172) → "w3gvk1t5"

Độ dài string ↔ Độ chính xác:
- 4 chars: ~20km precision
- 6 chars: ~1km precision
- 8 chars: ~20m precision

Geohash vs Quadtree

Tiêu chíQuadtreeGeohash
StorageTree structureString
Query typeRange/CirclePrefix match
DatabaseIn-memorySQL/Redis LIKE
ComplexityMore codeSimpler

💡 HPN's Rule

Geohash tốt cho database indexing (SQL WHERE geohash LIKE 'w3g%'). Quadtree tốt cho in-memory spatial queries.

Ứng dụng Thực tế

SystemUse Case
Uber/GrabMatch riders với nearby drivers
Game EnginesCollision detection, frustum culling
Google MapsPoi search, tile rendering
ElasticsearchGeo queries với geo_point type

Anti-patterns

❌ TRÁNH

1. Không giới hạn MAX_DEPTH

python
# ❌ WRONG: Infinite recursion nếu points trùng tọa độ
def subdivide(self):
    self._subdivide()  # Loop forever with identical points

# ✅ RIGHT: Limit depth
if self.depth >= self.MAX_DEPTH:
    self.points.append(point)
    return

2. Rectangular query cho circular search

python
# ❌ WRONG: Return points outside radius
def find_nearby(center, radius):
    return qt.query(Rectangle(center.x, center.y, radius, radius))
    # Corners of rectangle are outside circle!

# ✅ RIGHT: Filter by actual distance
candidates = qt.query(rect)
return [p for p in candidates if distance(p, center) <= radius]

💡 HPN's Insight

"Quadtree là cấu trúc dữ liệu spatial cơ bản nhất. Master nó trước khi học R-Tree, KD-Tree."