Giao diện
Radix Sort & Counting Sort - Breaking the O(N log N) Barrier
"Không so sánh = Không bị giới hạn bởi O(N log N). Đây là sức mạnh của non-comparison sorts." - HPN
The Comparison Sort Barrier
Mọi comparison-based sort (Quick, Merge, Heap, Tim) có lower bound Ω(N log N).
📘 Tại sao O(N log N)?
Với N elements, có N! permutations. Mỗi comparison loại bỏ tối đa 1/2 possibilities. → Cần ít nhất log₂(N!) ≈ N log N comparisons.
Non-comparison sorts bypass barrier này bằng cách không so sánh elements, mà dựa vào giá trị của chúng.
Counting Sort - The Foundation
When to Use
- Integers trong range nhỏ [0, K)
- K = O(N) hoặc nhỏ hơn
Algorithm
Input: [4, 2, 2, 8, 3, 3, 1]
Range: 0-8 (K = 9)
Step 1: Count occurrences
count = [0, 1, 2, 1, 1, 0, 0, 0, 1]
0 1 2 3 4 5 6 7 8
Step 2: Cumulative sum (for stable sort)
count = [0, 1, 3, 4, 5, 5, 5, 5, 6]
0 1 2 3 4 5 6 7 8
Step 3: Place elements (right to left for stability)
Output: [1, 2, 2, 3, 3, 4, 8]Implementation
python
from typing import List
def counting_sort(arr: List[int], max_val: int = None) -> List[int]:
"""
Counting Sort - Stable, O(N + K).
Args:
arr: Array of non-negative integers
max_val: Maximum value in array (K)
Time: O(N + K)
Space: O(N + K)
Stable: Yes
"""
if not arr:
return []
if max_val is None:
max_val = max(arr)
n = len(arr)
k = max_val + 1
# Step 1: Count occurrences
count = [0] * k
for num in arr:
count[num] += 1
# Step 2: Cumulative sum
for i in range(1, k):
count[i] += count[i - 1]
# Step 3: Build output (right to left for stability)
output = [0] * n
for i in range(n - 1, -1, -1):
num = arr[i]
count[num] -= 1
output[count[num]] = num
return output
def counting_sort_inplace(arr: List[int], max_val: int) -> None:
"""
In-place version (not stable, but O(1) extra space).
Only works for arrays without duplicates or when order doesn't matter.
"""
count = [0] * (max_val + 1)
for num in arr:
count[num] += 1
idx = 0
for val in range(max_val + 1):
for _ in range(count[val]):
arr[idx] = val
idx += 1Radix Sort - Sorting Large Integers
When to Use
- Integers (or fixed-length strings)
- Counting Sort would be too expensive (K >> N)
Key Insight
Sort digit by digit, least significant digit (LSD) first.
Input: [170, 45, 75, 90, 802, 24, 2, 66]
Pass 1 (ones digit):
170, 90, 802, 2, 24, 45, 75, 66
0 0 2,2 2 4 5 5 6
Pass 2 (tens digit):
802, 2, 24, 45, 66, 170, 75, 90
0 0 2 4 6 7 7 9
Pass 3 (hundreds digit):
2, 24, 45, 66, 75, 90, 170, 802
0 0 0 0 0 0 1 8
Sorted! ✅Implementation
python
from typing import List
def radix_sort(arr: List[int]) -> List[int]:
"""
Radix Sort using Counting Sort as subroutine (LSD).
Time: O(D × (N + K)) where D = number of digits, K = base
Space: O(N + K)
Stable: Yes (crucial for correctness!)
With base 10 and 32-bit integers: D = 10, K = 10
→ O(10 × (N + 10)) = O(N) for practical purposes
"""
if not arr:
return []
# Find maximum to determine number of digits
max_val = max(arr)
# Process each digit
exp = 1 # 1, 10, 100, 1000, ...
result = arr[:]
while max_val // exp > 0:
result = counting_sort_by_digit(result, exp)
exp *= 10
return result
def counting_sort_by_digit(arr: List[int], exp: int) -> List[int]:
"""
Counting sort based on digit at position exp.
MUST be stable for Radix Sort to work!
"""
n = len(arr)
output = [0] * n
count = [0] * 10 # Digits 0-9
# Count occurrences of each digit
for num in arr:
digit = (num // exp) % 10
count[digit] += 1
# Cumulative sum
for i in range(1, 10):
count[i] += count[i - 1]
# Build output (right to left for stability)
for i in range(n - 1, -1, -1):
num = arr[i]
digit = (num // exp) % 10
count[digit] -= 1
output[count[digit]] = num
return output
def radix_sort_base256(arr: List[int]) -> List[int]:
"""
Radix Sort with base 256 (faster for 32-bit integers).
Only 4 passes for 32-bit integers instead of 10!
"""
if not arr:
return []
result = arr[:]
for byte_idx in range(4): # 4 bytes in 32-bit int
result = counting_sort_by_byte(result, byte_idx)
return result
def counting_sort_by_byte(arr: List[int], byte_idx: int) -> List[int]:
"""Sort by specific byte (0 = least significant)."""
n = len(arr)
output = [0] * n
count = [0] * 256
shift = byte_idx * 8
for num in arr:
byte_val = (num >> shift) & 0xFF
count[byte_val] += 1
for i in range(1, 256):
count[i] += count[i - 1]
for i in range(n - 1, -1, -1):
num = arr[i]
byte_val = (num >> shift) & 0xFF
count[byte_val] -= 1
output[count[byte_val]] = num
return output
))Production Code
HPN Engineering Standard
Implementation: Non-Comparison Sorts
from typing import List, Tuple, Callable import time import random
class NonComparisonSort: """Production-ready non-comparison sorting utilities."""
@staticmethod
def counting_sort(arr: List[int]) -> List[int]:
"""O(N + K) counting sort."""
if not arr:
return []
min_val = min(arr)
max_val = max(arr)
k = max_val - min_val + 1
count = [0] * k
for num in arr:
count[num - min_val] += 1
for i in range(1, k):
count[i] += count[i - 1]
output = [0] * len(arr)
for i in range(len(arr) - 1, -1, -1):
num = arr[i]
count[num - min_val] -= 1
output[count[num - min_val]] = num
return output
@staticmethod
def radix_sort_lsd(arr: List[int]) -> List[int]:
"""LSD Radix Sort (base 10)."""
if not arr:
return []
max_val = max(arr)
exp = 1
result = arr[:]
while max_val // exp > 0:
count = [0] * 10
output = [0] * len(result)
for num in result:
count[(num // exp) % 10] += 1
for i in range(1, 10):
count[i] += count[i - 1]
for i in range(len(result) - 1, -1, -1):
digit = (result[i] // exp) % 10
count[digit] -= 1
output[count[digit]] = result[i]
result = output
exp *= 10
return result
@staticmethod
def radix_sort_strings(arr: List[str], max_len: int = None) -> List[str]:
"""
Radix Sort for fixed-length strings (MSD).
Pads shorter strings with null character.
"""
if not arr:
return []
if max_len is None:
max_len = max(len(s) for s in arr)
# Pad strings
padded = [s.ljust(max_len, '\0') for s in arr]
# Sort by each character position (right to left = LSD)
for pos in range(max_len - 1, -1, -1):
# Counting sort by character at position
count = [0] * 256
output = [''] * len(padded)
for s in padded:
count[ord(s[pos])] += 1
for i in range(1, 256):
count[i] += count[i - 1]
for i in range(len(padded) - 1, -1, -1):
char = ord(padded[i][pos])
count[char] -= 1
output[count[char]] = padded[i]
padded = output
# Remove padding
return [s.rstrip('\0') for s in padded]
============================================
WHEN TO USE WHICH?
============================================
def analyze_and_recommend(arr: List[int]) -> str: """ Analyze array and recommend sorting algorithm. """ n = len(arr) if n == 0: return "Empty array"
min_val = min(arr)
max_val = max(arr)
range_size = max_val - min_val + 1
# Check if counting sort is viable
if min_val >= 0 and range_size <= 2 * n:
return f"Counting Sort (range {range_size} ≤ 2N)"
# Check if radix sort is viable
if min_val >= 0:
digits = len(str(max_val))
if digits <= 10: # 32-bit integers
return f"Radix Sort ({digits} digits)"
# Default to comparison sort
return "Timsort (built-in) - negative numbers or large range"
============================================
BENCHMARK
============================================
def benchmark_sorts(n: int = 100000, k: int = 1000): """Benchmark different sorting algorithms.""" print(f"Benchmarking on N={n}, K={k}")
# Generate data
arr = [random.randint(0, k) for _ in range(n)]
# Counting Sort
arr_copy = arr[:]
start = time.perf_counter()
NonComparisonSort.counting_sort(arr_copy)
counting_time = time.perf_counter() - start
print(f"Counting Sort: {counting_time*1000:.2f} ms")
# Radix Sort
arr_copy = arr[:]
start = time.perf_counter()
NonComparisonSort.radix_sort_lsd(arr_copy)
radix_time = time.perf_counter() - start
print(f"Radix Sort: {radix_time*1000:.2f} ms")
# Built-in (Timsort)
arr_copy = arr[:]
start = time.perf_counter()
arr_copy.sort()
timsort_time = time.perf_counter() - start
print(f"Timsort: {timsort_time*1000:.2f} ms")
return counting_time, radix_time, timsort_time
============================================
USAGE EXAMPLE
============================================
if name == "main": # Counting Sort print("=== Counting Sort ===") arr = [4, 2, 2, 8, 3, 3, 1] print(f"Input: {arr}") print(f"Output: {NonComparisonSort.counting_sort(arr)}")
# Radix Sort
print("\n=== Radix Sort ===")
arr = [170, 45, 75, 90, 802, 24, 2, 66]
print(f"Input: {arr}")
print(f"Output: {NonComparisonSort.radix_sort_lsd(arr)}")
# String Radix Sort
print("\n=== String Radix Sort ===")
strings = ["abc", "ab", "xyz", "xy", "a", "aaa"]
print(f"Input: {strings}")
print(f"Output: {NonComparisonSort.radix_sort_strings(strings)}")
# Recommendation
print("\n=== Algorithm Recommendation ===")
test_cases = [
[1, 5, 3, 2, 4],
list(range(1000000)),
[random.randint(0, 100) for _ in range(1000)],
[random.randint(-1000, 1000) for _ in range(1000)],
]
for arr in test_cases:
print(f"Array (N={len(arr)}, range=[{min(arr)}, {max(arr)}]):")
print(f" → {analyze_and_recommend(arr)}")
# Benchmark
print("\n=== Benchmark ===")
benchmark_sorts(100000, 1000)
))] benchmark_sorts(100000, 1000)
## Complexity Comparison
| Algorithm | Time | Space | Stable | In-place |
|-----------|------|-------|--------|----------|
| **Counting Sort** | O(N + K) | O(N + K) | ✅ | ❌ |
| **Radix Sort (base 10)** | O(D × N) | O(N) | ✅ | ❌ |
| **Radix Sort (base 256)** | O(4N) = O(N) | O(N) | ✅ | ❌ |
| **QuickSort** | O(N log N) | O(log N) | ❌ | ✅ |
| **Timsort** | O(N log N) | O(N) | ✅ | ❌ |
**D** = số digits, **K** = range của values
## When to Use Which?
```mermaid
graph TD
A["Sorting Problem"] --> B{"Values are integers?"}
B -->|"No"| C["Timsort (built-in)"]
B -->|"Yes"| D{"Range K <= 2N?"}
D -->|"Yes"| E["Counting Sort ✅"]
D -->|"No"| F{"Need negative numbers?"}
F -->|"Yes"| G["Timsort (built-in)"]
F -->|"No"| H{"Number of digits <= 10?"}
H -->|"Yes"| I["Radix Sort ✅"]
H -->|"No"| J["Timsort (built-in)"]
style E fill:#4ecdc4,color:#000
style I fill:#4ecdc4,color:#000Summary
| Use Case | Best Algorithm |
|---|---|
| Small integer range (K ≤ 2N) | Counting Sort |
| Large integers, positive | Radix Sort |
| Negative numbers | Timsort |
| Floating point | Timsort |
| Generic objects | Timsort |
| Fixed-length strings | Radix Sort (MSD) |
💡 HPN's Rule
"Counting khi range nhỏ. Radix khi integers lớn. Timsort cho mọi thứ khác. Đừng reinvent the wheel."