Skip to content

Timsort - The Real-World Champion

"Timsort là lý do Python sort() và Java Arrays.sort() nhanh đến vậy trên dữ liệu thực tế." - HPN

What is Timsort?

Timsort là thuật toán hybrid kết hợp Merge SortInsertion Sort, được thiết kế bởi Tim Peters năm 2002 cho Python.

Nó được sử dụng làm thuật toán sort mặc định trong:

  • 🐍 Python (list.sort(), sorted())
  • Java (Arrays.sort() cho objects)
  • 🦀 Rust (slice::sort)
  • 🍎 Swift (Array.sort())
  • 📱 Android (Java collections)

📘 Tại sao không dùng Quick Sort?

Quick Sort có worst case O(N²) và không stable.

Timsort: O(N log N) guaranteed + stable + tối ưu cho real-world data.

Why Timsort is Fast on Real Data

Key Insight: Real Data Has "Runs"

Dữ liệu thực tế thường có runs - các đoạn đã sorted sẵn.

Real-world data examples:
- Log files: Mostly sorted by timestamp
- Database exports: Often partially sorted
- User lists: Alphabetically grouped
- Stock prices: Runs of increasing/decreasing

Random data: [7, 2, 9, 1, 5, 3, 8]  → No runs
Real data:   [1, 2, 3, 9, 8, 7, 4, 5, 6]  → Runs: [1,2,3], [9,8,7], [4,5,6]

Timsort detects và exploits các runs này!

The Algorithm

Phase 1: Identify Runs

python
def find_run(arr, start):
    """
    Find a run starting at 'start'.
    A run is either:
    - Ascending: a[i] <= a[i+1]
    - Descending: a[i] > a[i+1] (will be reversed)
    """
    if start >= len(arr) - 1:
        return start + 1
    
    pos = start + 1
    
    if arr[pos] < arr[start]:
        # Descending run
        while pos < len(arr) and arr[pos] < arr[pos - 1]:
            pos += 1
        # Reverse to make ascending
        reverse(arr, start, pos)
    else:
        # Ascending run
        while pos < len(arr) and arr[pos] >= arr[pos - 1]:
            pos += 1
    
    return pos

Phase 2: Extend Short Runs with Insertion Sort

python
MIN_RUN = 32  # Typically 32-64

def extend_run(arr, start, run_end, min_run):
    """
    If run is shorter than MIN_RUN, extend it using Insertion Sort.
    Insertion Sort is fast for small arrays (cache-friendly).
    """
    target_end = min(start + min_run, len(arr))
    
    # Binary Insertion Sort for the extension
    for i in range(run_end, target_end):
        # Use binary search to find insertion point
        # Then shift and insert
        ...

Phase 3: Merge Runs with Merge Sort

Key Optimizations

1. Galloping Mode

Khi merge 2 runs mà các elements từ một run liên tục thắng, switch sang galloping mode (tìm kiếm nhị phân thay vì so sánh từng cái).

python
def gallop_right(arr, key, base, length, hint):
    """
    Find position where key should be inserted using galloping.
    
    Instead of: 1, 2, 3, 4, 5, 6, 7, 8... (linear)
    Gallop:     1, 2, 4, 8, 16... then binary search
    
    Very fast when runs have different "densities"!
    """
    ...

2. Memory-Efficient Merge

Chỉ cần O(min(run1, run2)) extra memory thay vì O(N).

python
def merge_low(arr, run1_start, run1_len, run2_start, run2_len):
    """
    Copy smaller run to temp, merge in-place.
    """
    # If run1 is smaller, copy run1 to temp
    temp = arr[run1_start:run1_start + run1_len]
    # Merge temp and run2 back into arr
    ...

3. Stack-Based Merge Order

Dùng stack để ensure merge order tối ưu:

  • Merge runs có size tương tự trước
  • Avoid expensive large-small merges
python
# Merge invariants (maintain on stack):
# 1. run[i-2] > run[i-1] + run[i]
# 2. run[i-1] > run[i]

Implementation

python
# HPN Engineering Standard
# Implementation: Simplified Timsort

from typing import List, Callable, TypeVar
from functools import cmp_to_key

T = TypeVar('T')

MIN_RUN = 32


def calc_min_run(n: int) -> int:
    """
    Calculate minimum run length.
    Returns value in [32, 64] such that n/min_run is a power of 2 or close.
    """
    r = 0
    while n >= MIN_RUN:
        r |= n & 1
        n >>= 1
    return n + r


def insertion_sort_range(arr: List[T], left: int, right: int):
    """Insertion sort for [left, right)."""
    for i in range(left + 1, right):
        key = arr[i]
        j = i - 1
        while j >= left and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key


def merge(arr: List[T], left: int, mid: int, right: int):
    """Merge two sorted runs [left, mid) and [mid, right)."""
    left_part = arr[left:mid]
    right_part = arr[mid:right]
    
    i = j = 0
    k = left
    
    while i < len(left_part) and j < len(right_part):
        if left_part[i] <= right_part[j]:  # <= for stability
            arr[k] = left_part[i]
            i += 1
        else:
            arr[k] = right_part[j]
            j += 1
        k += 1
    
    while i < len(left_part):
        arr[k] = left_part[i]
        i += 1
        k += 1
    
    while j < len(right_part):
        arr[k] = right_part[j]
        j += 1
        k += 1


def timsort(arr: List[T]) -> None:
    """
    Simplified Timsort implementation.
    
    Time: O(N log N) worst case
    Space: O(N) for merging
    Stable: Yes
    
    Note: Real Timsort has more optimizations (galloping, etc.)
    """
    n = len(arr)
    if n <= 1:
        return
    
    min_run = calc_min_run(n)
    
    # Step 1: Create runs of min_run using insertion sort
    for start in range(0, n, min_run):
        end = min(start + min_run, n)
        insertion_sort_range(arr, start, end)
    
    # Step 2: Merge runs
    size = min_run
    while size < n:
        for left in range(0, n, 2 * size):
            mid = min(left + size, n)
            right = min(left + 2 * size, n)
            
            if mid < right:
                merge(arr, left, mid, right)
        
        size *= 2


# ============================================
# COMPARISON WITH BUILT-IN
# ============================================

def benchmark_sorts(arr_size: int = 10000, runs: int = 100):
    """Compare timsort with built-in sort."""
    import time
    import random
    
    # Partially sorted data (realistic)
    def generate_partial_sorted():
        arr = list(range(arr_size))
        # Shuffle 10% of elements
        for _ in range(arr_size // 10):
            i, j = random.randint(0, arr_size-1), random.randint(0, arr_size-1)
            arr[i], arr[j] = arr[j], arr[i]
        return arr
    
    # Our timsort
    total_ours = 0
    for _ in range(runs):
        arr = generate_partial_sorted()
        start = time.perf_counter()
        timsort(arr)
        total_ours += time.perf_counter() - start
    
    # Built-in sort
    total_builtin = 0
    for _ in range(runs):
        arr = generate_partial_sorted()
        start = time.perf_counter()
        arr.sort()
        total_builtin += time.perf_counter() - start
    
    print(f"Our Timsort: {total_ours/runs*1000:.3f} ms/run")
    print(f"Built-in:    {total_builtin/runs*1000:.3f} ms/run")


# ============================================
# USAGE EXAMPLE
# ============================================

if __name__ == "__main__":
    # Basic usage
    arr = [5, 2, 4, 6, 1, 3, 2, 6]
    print(f"Before: {arr}")
    timsort(arr)
    print(f"After:  {arr}")
    
    # Stability test
    print("\n=== Stability Test ===")
    data = [('apple', 3), ('banana', 1), ('apple', 1), ('cherry', 2)]
    # Sort by name (stable should preserve order of same names)
    data_sorted = sorted(data, key=lambda x: x[0])
    print(f"Sorted by name: {data_sorted}")
    # apple(3) should still come before apple(1) if stable
    
    # Benchmark
    print("\n=== Benchmark (partial sorted data) ===")
    benchmark_sorts(10000, 50)

Complexity Analysis

CaseTimeNotes
BestO(N)Already sorted!
AverageO(N log N)Typical case
WorstO(N log N)Guaranteed
SpaceO(N)For merge buffer

Timsort vs QuickSort

AspectTimsortQuickSort
Worst caseO(N log N) ✅O(N²) ❌
StableYes ✅No ❌
AdaptiveYes (fast on partial sorted)No
Cache-friendlyYesYes
In-placeNo (O(N) space)Yes (O(log N))
Best forReal-world dataRandom data, memory-constrained

💡 HPN's Rule

"Dùng built-in sort() trong 99% cases. Nó đã là Timsort và được tối ưu hàng thập kỷ."