# Profiling (Intermediate)
"Premature optimization is the root of all evil" - Nhưng profiling trước khi optimize là bắt buộc
## Learning Outcomes
After completing this page, you will be able to:
- 🎯 Use cProfile to find CPU bottlenecks
- 🎯 Use line_profiler to profile code line by line
- 🎯 Read and analyze flame graphs
- 🎯 Identify bottlenecks and decide where to optimize
- 🎯 Avoid common production pitfalls when profiling
## Why Profile?
Profiling is the process of measuring your code's performance to find out which parts are slowest. It is a MANDATORY step before any optimization.
┌─────────────────────────────────────────────────────────┐
│ Profiling Workflow │
├─────────────────────────────────────────────────────────┤
│ 1. Measure First → Don't guess - measure │
│ 2. Find Bottleneck → 80% of time is in 20% of code │
│ 3. Optimize → Optimize only the bottleneck │
│ 4. Measure Again → Verify the improvement │
│ 5. Repeat → Until it is fast enough │
└─────────────────────────────────────────────────────────┘

### Pareto Principle (80/20 Rule)
In most programs:
- 80% of the runtime is spent in 20% of the code
- Optimizing that 20% yields a significant improvement
- Optimizing the other 80% is wasted effort
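The 80/20 claim is easy to check on your own programs. The sketch below (with made-up `hot`/`cold` stand-in functions) profiles a toy workload and sums how much of the total `tottime` the top few entries account for; it reads the `stats` dict that `pstats.Stats` exposes, where each value is a `(callcount, ncalls, tottime, cumtime, callers)` tuple - an internal but long-stable layout.

```python
import cProfile
import pstats

def hot(n):
    # the "20%" of code doing nearly all the work
    return sum(i * i for i in range(n))

def cold():
    # the "80%" of code that barely matters
    return len("hello")

def main():
    for _ in range(50):
        hot(100_000)
        cold()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

stats = pstats.Stats(profiler)
# Each value in stats.stats is (callcount, ncalls, tottime, cumtime, callers)
total = sum(entry[2] for entry in stats.stats.values())
ranked = sorted(stats.stats.values(), key=lambda e: e[2], reverse=True)
top_share = sum(entry[2] for entry in ranked[:2]) / total
print(f"Top 2 of {len(ranked)} functions hold {top_share:.0%} of the runtime")
```

On this toy workload the top two entries (the generator expression and the built-in `sum`) dominate - the distribution is even more skewed than 80/20.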
## cProfile - Built-in Profiler
### Basic Usage
```python
import cProfile
import pstats

def slow_function():
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total

def main():
    for _ in range(10):
        slow_function()

# Method 1: Command line
#   python -m cProfile -s cumtime script.py

# Method 2: In code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()

    stats = pstats.Stats(profiler)
    stats.sort_stats('cumulative')
    stats.print_stats(20)  # Top 20 functions
```
### Output Explained
```
         1000003 function calls in 2.345 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    2.300    0.230    2.300    0.230 script.py:4(slow_function)
        1    0.045    0.045    2.345    2.345 script.py:10(main)
        1    0.000    0.000    2.345    2.345 {built-in method builtins.exec}
```

| Column | Meaning |
|---|---|
| ncalls | Number of times the function was called |
| tottime | Total time spent inside the function itself (excluding sub-calls) |
| percall | tottime / ncalls |
| cumtime | Total time including sub-calls |
| percall | cumtime / ncalls |
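The difference between `tottime` and `cumtime` is the most common point of confusion. A minimal sketch with made-up `parent`/`child` functions: `parent` does almost nothing in its own body, so its `tottime` is near zero, but its `cumtime` includes everything `child` does on its behalf.

```python
import cProfile
import io
import pstats
import time

def child():
    time.sleep(0.05)   # the actual work (well, waiting)

def parent():
    for _ in range(3):
        child()        # parent only delegates

profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

out = io.StringIO()
# print_stats accepts a regex to filter the rows shown
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats('parent|child')
print(out.getvalue())
# Expect: parent has tottime ≈ 0 but cumtime ≈ 0.15s (3 × 0.05s inside child)
```

When `cumtime` is large but `tottime` is small, the time is being spent in callees - follow the call chain down instead of optimizing that function itself.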
### Sorting Options
```python
import pstats

# Sort by different criteria
stats.sort_stats('cumulative')  # Total time (including sub-calls)
stats.sort_stats('tottime')     # Time inside the function itself
stats.sort_stats('calls')       # Number of calls
stats.sort_stats('ncalls')      # Alias for calls
stats.sort_stats('time')        # Alias for tottime

# Multiple sort keys
stats.sort_stats('cumulative', 'calls')
```
### Filtering Results
```python
import pstats

stats = pstats.Stats(profiler)

# Show only functions from a specific file
stats.print_stats('mymodule.py')

# Show only functions matching a pattern
stats.print_stats('slow')

# Top N functions
stats.print_stats(10)

# Combine filters
stats.print_stats('mymodule', 10)
```
### Save and Load Profile Data
```python
import cProfile
import pstats

# Save to file
cProfile.run('main()', 'profile_output.prof')

# Load and analyze
stats = pstats.Stats('profile_output.prof')
stats.strip_dirs()  # Remove path prefixes
stats.sort_stats('cumulative')
stats.print_stats()

# Merge multiple profiles
stats = pstats.Stats('profile1.prof')
stats.add('profile2.prof')
stats.add('profile3.prof')
```
### Context Manager Pattern
```python
import cProfile
import pstats
from contextlib import contextmanager

@contextmanager
def profile_block(name: str = "profile"):
    """Profile a block of code."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield profiler
    finally:
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        print(f"\n=== Profile: {name} ===")
        stats.print_stats(15)

# Usage (process_large_dataset stands in for your own workload)
with profile_block("data_processing"):
    process_large_dataset()
```
## line_profiler - Line-by-Line Profiling
### Installation
```bash
pip install line_profiler
```
### Basic Usage
```python
# script.py
from line_profiler import profile

@profile
def slow_function(n: int) -> int:
    total = 0                   # Line 1
    for i in range(n):          # Line 2
        total += i ** 2         # Line 3
        if i % 1000 == 0:       # Line 4
            total = int(total)  # Line 5
    return total                # Line 6

if __name__ == "__main__":
    result = slow_function(100000)
    print(result)
```
```bash
# Run with kernprof
kernprof -l -v script.py
```
### Output Explained
```
Timer unit: 1e-06 s

Total time: 0.234567 s
File: script.py
Function: slow_function at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def slow_function(n: int) -> int:
     6         1          2.0      2.0      0.0      total = 0
     7    100001      45000.0      0.4     19.2      for i in range(n):
     8    100000     150000.0      1.5     63.9          total += i ** 2
     9    100000      35000.0      0.4     14.9          if i % 1000 == 0:
    10       100       4567.0     45.7      1.9              total = int(total)
    11         1          0.0      0.0      0.0      return total
```

| Column | Meaning |
|---|---|
| Hits | Number of times the line was executed |
| Time | Total time in timer units (here microseconds) |
| Per Hit | Time / Hits |
| % Time | Percentage of the total time |
### Programmatic Usage
```python
from line_profiler import LineProfiler

def function_to_profile(n):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

# Create profiler
lp = LineProfiler()
lp.add_function(function_to_profile)

# Wrap and run
lp_wrapper = lp(function_to_profile)
lp_wrapper(10000)

# Print results
lp.print_stats()
```
### Profile Multiple Functions
```python
from line_profiler import LineProfiler

def helper_function(x):
    return x ** 2

def main_function(n):
    total = 0
    for i in range(n):
        total += helper_function(i)
    return total

# Profile both functions
lp = LineProfiler()
lp.add_function(helper_function)
lp.add_function(main_function)

lp_wrapper = lp(main_function)
lp_wrapper(10000)
lp.print_stats()
```
## Flame Graphs
### Why Flame Graphs?
Flame graphs visualize the call stack and where time is spent. They are much easier to scan than text output.
┌─────────────────────────────────────────────────────────┐
│ main() │
├───────────────────────────────┬─────────────────────────┤
│ process_data() │ save_results() │
├─────────────┬─────────────────┤ │
│ parse() │ transform() │ │
├─────────────┴─────────────────┴─────────────────────────┤
│ Width = Time spent in function │
│ Height = Call stack depth │
└─────────────────────────────────────────────────────────┘

### py-spy - Sampling Profiler
```bash
# Install
pip install py-spy

# Generate flame graph (SVG)
py-spy record -o profile.svg -- python script.py

# Profile a running process
py-spy record -o profile.svg --pid 12345

# Top-like live view
py-spy top --pid 12345
```
### snakeviz - Interactive Visualization
```bash
# Install
pip install snakeviz

# Generate profile
python -m cProfile -o profile.prof script.py

# Visualize in browser
snakeviz profile.prof
```
### Scalene - Modern Profiler
```bash
# Install
pip install scalene

# Run
scalene script.py

# Output includes:
# - CPU time (Python vs native)
# - Memory allocation
# - GPU time (if applicable)
```
## Bottleneck Identification
### Common Bottleneck Patterns
#### Pattern 1: Nested Loops
```python
# ❌ O(n²) - Bottleneck!
def find_duplicates_slow(items: list) -> list:
    duplicates = []
    for i, item in enumerate(items):
        for j, other in enumerate(items):
            if i != j and item == other and item not in duplicates:
                duplicates.append(item)
    return duplicates

# ✅ O(n) - Fixed
from collections import Counter

def find_duplicates_fast(items: list) -> list:
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]
```
#### Pattern 2: String Concatenation
```python
# ❌ O(n²) - strings are immutable
def build_string_slow(n: int) -> str:
    result = ""
    for i in range(n):
        result += str(i)  # Creates a new string each time!
    return result

# ✅ O(n) - build a list, then join
def build_string_fast(n: int) -> str:
    parts = []
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)

# ✅ Even better - generator expression
def build_string_fastest(n: int) -> str:
    return "".join(str(i) for i in range(n))
```
#### Pattern 3: Repeated Lookups
```python
# ❌ O(n) lookup per target with list.count
def count_in_list(items: list, targets: list) -> dict:
    counts = {}
    for target in targets:
        counts[target] = items.count(target)  # O(n) each time!
    return counts

# ✅ O(1) lookups with Counter
from collections import Counter

def count_in_list_fast(items: list, targets: list) -> dict:
    counter = Counter(items)  # O(n) once
    return {target: counter[target] for target in targets}  # O(1) each
```
#### Pattern 4: I/O in a Loop
```python
# ❌ Many small I/O operations
def write_lines_slow(lines: list[str], path: str) -> None:
    for line in lines:
        with open(path, 'a') as f:  # Open/close on every iteration!
            f.write(line + '\n')

# ✅ Batch the I/O
def write_lines_fast(lines: list[str], path: str) -> None:
    with open(path, 'w') as f:
        f.writelines(line + '\n' for line in lines)
```
## Profiling Decision Tree
┌─────────────────────────────────────────────────────────┐
│ Is it slow enough to matter? │
│ │ │
│ ┌───────────┴───────────┐ │
│ ▼ ▼ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ Profile with cProfile Don't optimize! │
│ │ │
│ ▼ │
│ Is bottleneck in your code? │
│ │ │
│ ┌────────┴────────┐ │
│ ▼ ▼ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ line_profiler Check I/O, DB, │
│ to find line network, etc. │
│ │ │
│ ▼ │
│ Optimize that specific line │
└─────────────────────────────────────────────────────────┘

## Benchmarking with timeit
### Basic Usage
```python
import timeit

# Time a simple expression
elapsed = timeit.timeit('sum(range(1000))', number=10000)
print(f"Total: {elapsed:.4f}s, Per call: {elapsed/10000*1000:.4f}ms")

# Time with setup code (setup is not included in the measurement)
elapsed = timeit.timeit(
    'sorted(data)',
    setup='import random; data = [random.random() for _ in range(1000)]',
    number=1000
)
```
### Compare Implementations
```python
import timeit

def method1(n):
    return sum(i ** 2 for i in range(n))

def method2(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

def method3(n):
    return sum(map(lambda x: x ** 2, range(n)))

# Compare
n = 10000
for method in [method1, method2, method3]:
    elapsed = timeit.timeit(lambda: method(n), number=100)
    print(f"{method.__name__}: {elapsed:.4f}s")
```
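A single `timeit.timeit` number is noisy. For comparisons like the one above, `timeit.repeat` runs the whole measurement several times; taking the minimum gives the figure least distorted by background load:

```python
import timeit

# Run the 10,000-iteration measurement 5 times and keep each total
times = timeit.repeat('sum(range(1000))', repeat=5, number=10_000)

# The minimum is the least noisy estimate of the true cost
best = min(times) / 10_000
print(f"best of 5: {best * 1e6:.2f} µs per call")
```

This is exactly what IPython's `%timeit` reports as "best of 5".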
### IPython Magic
```python
# In Jupyter/IPython
%timeit sum(range(1000))
# 10000 loops, best of 5: 23.4 µs per loop

%timeit -n 1000 -r 5 expensive_function()
# -n: number of executions per repeat
# -r: number of repeats

# Time a whole cell
%%timeit
data = [i ** 2 for i in range(1000)]
sorted(data)
```
## Production Pitfalls
### Pitfall 1: Profiling Overhead
```python
# ❌ PROBLEM: the profiler itself adds overhead
# cProfile: roughly 10-20% overhead
# line_profiler: roughly 100-500% overhead

# ✅ FIX: use a sampling profiler in production
# py-spy has < 1% overhead:
# py-spy record --pid $(pgrep -f myapp) -o profile.svg
```
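You can see the overhead yourself with a quick sketch: time the same function with and without cProfile tracing active. The exact ratio depends on how call-heavy the code is - a generator expression resumes a frame per element, so this is close to a worst case.

```python
import cProfile
import timeit

def work():
    return sum(i * i for i in range(50_000))

# Baseline: no profiler attached
plain = timeit.timeit(work, number=20)

# Same work while cProfile traces every call and return
profiler = cProfile.Profile()
profiler.enable()
profiled = timeit.timeit(work, number=20)
profiler.disable()

print(f"plain:    {plain:.3f}s")
print(f"profiled: {profiled:.3f}s ({profiled / plain:.1f}x)")
```

This kind of slowdown is why deterministic profilers are fine on a dev box but risky on a live service.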
### Pitfall 2: Profiling the Wrong Environment
```python
# ❌ PROBLEM: profiling on a dev machine
# - Different hardware
# - Different data size
# - Different concurrent load

# ✅ FIX: profile under production-like conditions
# - Same hardware specs
# - Same data volume
# - Same concurrent users
# Or use py-spy directly on production (low overhead)
```
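A lightweight mitigation is to store a machine fingerprint next to every profile, so you at least know when two profiles were taken in different environments and should not be compared. The helper name and JSON layout here are just one possible convention:

```python
import json
import os
import platform
import sys

def environment_fingerprint() -> dict:
    """Facts worth saving alongside a .prof file."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }

# e.g. write this next to profile_output.prof as profile_output.env.json
print(json.dumps(environment_fingerprint(), indent=2))
```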
### Pitfall 3: Micro-benchmarking Pitfalls
```python
import timeit

# ❌ PROBLEM: unrealistic benchmark
def benchmark_bad():
    # The list is far too small
    data = [1, 2, 3]
    return timeit.timeit(lambda: sorted(data), number=1000000)

# ✅ FIX: benchmark with realistic data
def benchmark_good():
    import random
    data = [random.random() for _ in range(10000)]
    return timeit.timeit(lambda: sorted(data.copy()), number=1000)
```
### Pitfall 4: Ignoring Warmup
```python
# ❌ PROBLEM: the first run includes cache misses (and JIT warmup
# on runtimes like PyPy)
def benchmark_no_warmup():
    return timeit.timeit('func()', globals=globals(), number=1)

# ✅ FIX: warm up before benchmarking
def benchmark_with_warmup():
    # Warmup
    for _ in range(10):
        func()
    # Actual benchmark
    return timeit.timeit('func()', globals=globals(), number=100)
```
### Pitfall 5: Optimizing the Wrong Thing
```python
# ❌ PROBLEM: optimizing CPU when the real bottleneck is I/O
def process_files_wrong():
    for path in paths:
        data = read_file(path)    # 95% of the time is here (I/O)
        result = transform(data)  # 5% of the time is here (CPU)
        write_file(result)        # I/O

# A profile will show transform() taking only 5% of the time -
# optimizing transform() is a waste of effort!

# ✅ FIX: overlap the I/O with asyncio or threading
import asyncio
import aiofiles  # async file I/O, used inside process_file

async def process_files_right():
    tasks = [process_file(path) for path in paths]
    await asyncio.gather(*tasks)
```
## Quick Reference
```python
# === cProfile ===
#   python -m cProfile -s cumtime script.py
#   python -m cProfile -o output.prof script.py
import cProfile, pstats
profiler = cProfile.Profile()
profiler.enable()
# ... code ...
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(20)

# === line_profiler ===
#   Add the @profile decorator to functions, then:
#   kernprof -l -v script.py

# === py-spy (sampling) ===
#   py-spy record -o profile.svg -- python script.py
#   py-spy top --pid 12345

# === snakeviz (visualization) ===
#   python -m cProfile -o profile.prof script.py
#   snakeviz profile.prof

# === timeit ===
#   python -m timeit "sum(range(1000))"
import timeit
timeit.timeit('func()', number=1000)

# === IPython ===
#   %timeit expression
#   %%timeit  (cell magic - times the whole cell)
```
## Cross-links
- Prerequisites: Functions & Closures
- Next: Memory Optimization - `__slots__`, generators, memory_profiler
- Related: GIL & Threading - CPU-bound profiling
- Related: functools - lru_cache performance