
Profiling Intermediate

"Premature optimization is the root of all evil" - Nhưng profiling trước khi optimize là bắt buộc

Learning Outcomes

After completing this page, you will be able to:

  • 🎯 Use cProfile to find CPU bottlenecks
  • 🎯 Use line_profiler to profile code line by line
  • 🎯 Read and analyze flame graphs
  • 🎯 Identify bottlenecks and decide where to optimize
  • 🎯 Avoid common production pitfalls when profiling

Why Profile?

Profiling is the process of measuring your code's performance to find the slowest parts. It is a MANDATORY step before optimizing.

┌─────────────────────────────────────────────────────────┐
│                    Profiling Workflow                   │
├─────────────────────────────────────────────────────────┤
│  1. Measure First    →  Don't guess, just measure      │
│  2. Find Bottleneck  →  80% of time in 20% of code     │
│  3. Optimize         →  Optimize only the bottleneck   │
│  4. Measure Again    →  Verify the improvement         │
│  5. Repeat           →  Until it's fast enough         │
└─────────────────────────────────────────────────────────┘

Pareto Principle (80/20 Rule)

In most programs:

  • 80% of the run time is spent in 20% of the code
  • Optimizing that 20% = a significant improvement
  • Optimizing the other 80% = wasted effort
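The measure-optimize-measure loop above needs nothing more than the standard library. A minimal sketch, where `process` and `measure` are illustrative names for the code under investigation and a tiny timing helper:

```python
import time

def process(n: int) -> int:
    # Stand-in for the code you suspect is slow
    return sum(i * i for i in range(n))

def measure(fn, *args, repeats: int = 5) -> float:
    """Return the best wall-clock time over several runs (least noisy)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# 1. Measure first
before = measure(process, 100_000)
# 3. (optimize here) ... 4. Measure again to verify the improvement
after = measure(process, 100_000)
print(f"before={before:.4f}s after={after:.4f}s")
```

Taking the minimum over several repeats is the usual convention: background load can only make a run slower, never faster.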

cProfile - Built-in Profiler

Basic Usage

python
import cProfile
import pstats

def slow_function():
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total

def main():
    for _ in range(10):
        slow_function()

# Method 1: Command line
# python -m cProfile -s cumtime script.py

# Method 2: In code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    
    main()
    
    profiler.disable()
    stats = pstats.Stats(profiler)
    stats.sort_stats('cumulative')
    stats.print_stats(20)  # Top 20 functions

Output Explained

         1000003 function calls in 2.345 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    2.300    0.230    2.300    0.230 script.py:4(slow_function)
        1    0.045    0.045    2.345    2.345 script.py:10(main)
        1    0.000    0.000    2.345    2.345 {built-in method builtins.exec}

| Column  | Meaning                                                      |
|---------|--------------------------------------------------------------|
| ncalls  | Number of times the function was called                      |
| tottime | Total time inside the function itself (excluding sub-calls)  |
| percall | tottime / ncalls                                             |
| cumtime | Total time including sub-calls                               |
| percall | cumtime / ncalls                                             |
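The same columns are also available programmatically: since Python 3.9, `pstats.Stats.get_stats_profile()` returns the table as dataclasses, which is handy for asserting a performance budget in a test. A minimal sketch, where `work` is an illustrative function:

```python
import cProfile
import pstats

def work() -> int:
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.runcall(work)  # profile exactly one call

# get_stats_profile() (Python 3.9+) exposes ncalls/tottime/cumtime per function
profile = pstats.Stats(profiler).get_stats_profile()
entry = profile.func_profiles["work"]
print(entry.ncalls, entry.tottime, entry.cumtime)
```

Note that `cumtime >= tottime` always holds for a function with sub-calls, mirroring the column definitions above.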

Sorting Options

python
import pstats

# Sort by different criteria
stats.sort_stats('cumulative')  # Total time (including sub-calls)
stats.sort_stats('tottime')     # Time inside the function only
stats.sort_stats('calls')       # Number of calls
stats.sort_stats('ncalls')      # Alias for 'calls'
stats.sort_stats('time')        # Alias for 'tottime'

# Multiple sort keys
stats.sort_stats('cumulative', 'calls')

Filtering Results

python
import pstats

stats = pstats.Stats(profiler)

# Only show functions from a specific file
stats.print_stats('mymodule.py')

# Only show functions matching a pattern
stats.print_stats('slow')

# Top N functions
stats.print_stats(10)

# Combine filters
stats.print_stats('mymodule', 10)

Save và Load Profile Data

python
import cProfile
import pstats

# Save to file
cProfile.run('main()', 'profile_output.prof')

# Load và analyze
stats = pstats.Stats('profile_output.prof')
stats.strip_dirs()  # Remove path prefixes
stats.sort_stats('cumulative')
stats.print_stats()

# Merge multiple profiles
stats = pstats.Stats('profile1.prof')
stats.add('profile2.prof')
stats.add('profile3.prof')
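Inside a script you can write the same binary stats file directly with `Profile.dump_stats()`; the temp-file path and `work` function here are illustrative:

```python
import cProfile
import os
import pstats
import tempfile

def work() -> int:
    return sum(range(100_000))

profiler = cProfile.Profile()
profiler.runcall(work)

# Writes the same binary format that `python -m cProfile -o` produces
path = os.path.join(tempfile.mkdtemp(), "run.prof")
profiler.dump_stats(path)

# Read it back exactly like a file produced on the command line
stats = pstats.Stats(path)
stats.strip_dirs().sort_stats("tottime").print_stats(5)
```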

Context Manager Pattern

python
import cProfile
import pstats
from contextlib import contextmanager

@contextmanager
def profile_block(name: str = "profile"):
    """Profile a block of code."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield profiler
    finally:
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        print(f"\n=== Profile: {name} ===")
        stats.print_stats(15)

# Usage
with profile_block("data_processing"):
    process_large_dataset()

line_profiler - Line-by-Line Profiling

Installation

bash
pip install line_profiler

Basic Usage

python
# script.py
from line_profiler import profile  # line_profiler >= 4.1; older versions rely on kernprof injecting @profile

@profile
def slow_function(n: int) -> int:
    total = 0                    # Line 6
    for i in range(n):           # Line 7
        total += i ** 2          # Line 8
        if i % 1000 == 0:        # Line 9
            total = int(total)   # Line 10
    return total                 # Line 11

if __name__ == "__main__":
    result = slow_function(100000)
    print(result)
bash
# Run with kernprof
kernprof -l -v script.py

Output Explained

Timer unit: 1e-06 s

Total time: 0.234567 s
File: script.py
Function: slow_function at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def slow_function(n: int) -> int:
     6         1          2.0      2.0      0.0      total = 0
     7    100001      45000.0      0.4     19.2      for i in range(n):
     8    100000     150000.0      1.5     63.9          total += i ** 2
     9    100000      35000.0      0.4     14.9          if i % 1000 == 0:
    10       100       4567.0     45.7      1.9              total = int(total)
    11         1          0.0      0.0      0.0      return total

| Column  | Meaning                                                  |
|---------|----------------------------------------------------------|
| Hits    | Number of times the line was executed                    |
| Time    | Total time on the line (in timer units, here microseconds) |
| Per Hit | Time / Hits                                              |
| % Time  | Percentage of the function's total time                  |

Programmatic Usage

python
from line_profiler import LineProfiler

def function_to_profile(n):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

# Create profiler
lp = LineProfiler()
lp.add_function(function_to_profile)

# Wrap và run
lp_wrapper = lp(function_to_profile)
lp_wrapper(10000)

# Print results
lp.print_stats()

Profile Multiple Functions

python
from line_profiler import LineProfiler

def helper_function(x):
    return x ** 2

def main_function(n):
    total = 0
    for i in range(n):
        total += helper_function(i)
    return total

# Profile both functions
lp = LineProfiler()
lp.add_function(helper_function)
lp.add_function(main_function)

lp_wrapper = lp(main_function)
lp_wrapper(10000)
lp.print_stats()

Flame Graphs

Why Flame Graphs?

Flame graphs visualize the call stack and where time is spent. They are much easier to read than text output.

┌─────────────────────────────────────────────────────────┐
│                        main()                           │
├───────────────────────────────┬─────────────────────────┤
│        process_data()         │      save_results()     │
├─────────────┬─────────────────┤                         │
│  parse()    │   transform()   │                         │
├─────────────┴─────────────────┴─────────────────────────┤
│  Width = Time spent in function                         │
│  Height = Call stack depth                              │
└─────────────────────────────────────────────────────────┘

py-spy - Sampling Profiler

bash
# Install
pip install py-spy

# Generate flame graph (SVG)
py-spy record -o profile.svg -- python script.py

# Profile running process
py-spy record -o profile.svg --pid 12345

# Top-like view
py-spy top --pid 12345

snakeviz - Interactive Visualization

bash
# Install
pip install snakeviz

# Generate profile
python -m cProfile -o profile.prof script.py

# Visualize in browser
snakeviz profile.prof

Scalene - Modern Profiler

bash
# Install
pip install scalene

# Run
scalene script.py

# Output includes:
# - CPU time (Python vs native)
# - Memory allocation
# - GPU time (if applicable)

Bottleneck Identification

Common Bottleneck Patterns

Pattern 1: Nested Loops

python
# ❌ O(n²) - Bottleneck!
def find_duplicates_slow(items: list) -> list:
    duplicates = []
    for i, item in enumerate(items):
        for j, other in enumerate(items):
            if i != j and item == other and item not in duplicates:
                duplicates.append(item)
    return duplicates

# ✅ O(n) - Fixed
from collections import Counter

def find_duplicates_fast(items: list) -> list:
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]

Pattern 2: String Concatenation

python
# ❌ O(n²) - String immutability
def build_string_slow(n: int) -> str:
    result = ""
    for i in range(n):
        result += str(i)  # Creates new string each time!
    return result

# ✅ O(n) - Use list then join
def build_string_fast(n: int) -> str:
    parts = []
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)

# ✅ Even better - generator
def build_string_fastest(n: int) -> str:
    return "".join(str(i) for i in range(n))

Pattern 3: Repeated Lookups

python
# ❌ O(n) lookup in list
def count_in_list(items: list, targets: list) -> dict:
    counts = {}
    for target in targets:
        counts[target] = items.count(target)  # O(n) each time!
    return counts

# ✅ O(1) lookup with Counter
from collections import Counter

def count_in_list_fast(items: list, targets: list) -> dict:
    counter = Counter(items)  # O(n) once
    return {target: counter[target] for target in targets}  # O(1) each

Pattern 4: I/O in Loop

python
# ❌ Many small I/O operations
def write_lines_slow(lines: list[str], path: str) -> None:
    for line in lines:
        with open(path, 'a') as f:  # Open/close each iteration!
            f.write(line + '\n')

# ✅ Batch I/O
def write_lines_fast(lines: list[str], path: str) -> None:
    with open(path, 'w') as f:
        f.writelines(line + '\n' for line in lines)

Profiling Decision Tree

┌─────────────────────────────────────────────────────────┐
│             Is it slow enough to matter?                │
│                          │                              │
│              ┌───────────┴───────────┐                  │
│              ▼                       ▼                  │
│             YES                     NO                  │
│              │                       │                  │
│              ▼                       ▼                  │
│     Profile with cProfile      Don't optimize!          │
│              │                                          │
│              ▼                                          │
│     Is the bottleneck in your code?                     │
│              │                                          │
│     ┌────────┴────────┐                                 │
│     ▼                 ▼                                 │
│    YES               NO                                 │
│     │                 │                                 │
│     ▼                 ▼                                 │
│  line_profiler    Check I/O, DB,                        │
│  to find line     network, etc.                         │
│     │                                                   │
│     ▼                                                   │
│  Optimize that specific line                            │
└─────────────────────────────────────────────────────────┘

Benchmarking with timeit

Basic Usage

python
import timeit

# Time a simple expression
time = timeit.timeit('sum(range(1000))', number=10000)
print(f"Total: {time:.4f}s, Per call: {time/10000*1000:.4f}ms")

# Time with setup
time = timeit.timeit(
    'sorted(data)',
    setup='import random; data = [random.random() for _ in range(1000)]',
    number=1000
)
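For less noisy numbers, `timeit.repeat` runs the whole benchmark several separate times; reporting the minimum is the usual convention, since it is the run least contaminated by background load:

```python
import timeit

# Run the 1000-call benchmark 5 separate times
times = timeit.repeat(
    'sorted(data)',
    setup='import random; random.seed(0); data = [random.random() for _ in range(1000)]',
    repeat=5,
    number=1000,
)
# The minimum of the repeats is the most stable estimate
print(f"best of 5: {min(times):.4f}s")
```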

Compare Implementations

python
import timeit

def method1(n):
    return sum(i ** 2 for i in range(n))

def method2(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

def method3(n):
    return sum(map(lambda x: x ** 2, range(n)))

# Compare
n = 10000
for method in [method1, method2, method3]:
    time = timeit.timeit(lambda: method(n), number=100)
    print(f"{method.__name__}: {time:.4f}s")

IPython Magic

python
# In Jupyter/IPython
%timeit sum(range(1000))
# 10000 loops, best of 5: 23.4 µs per loop

%timeit -n 1000 -r 5 expensive_function()
# -n: number of executions
# -r: number of repeats

# Time a cell
%%timeit
data = [i ** 2 for i in range(1000)]
sorted(data)

Production Pitfalls

Pitfall 1: Profiling Overhead

python
# ❌ PROBLEM: The profiler itself adds overhead
# cProfile adds roughly 10-20%
# line_profiler adds roughly 100-500%

# ✅ FIX: Use a sampling profiler in production
# py-spy's overhead is typically < 1%

# py-spy record --pid $(pgrep -f myapp) -o profile.svg

Pitfall 2: Profiling Wrong Environment

python
# ❌ PROBLEM: Profiling on the dev machine
# - Different hardware
# - Different data size
# - Different concurrent load

# ✅ FIX: Profile under production-like conditions
# - Same hardware specs
# - Same data volume
# - Same concurrent users

# Or use py-spy on production itself (low overhead)

Pitfall 3: Micro-benchmarking Pitfalls

python
import timeit

# ❌ PROBLEM: The benchmark is not realistic
def benchmark_bad():
    # The list is far too small
    data = [1, 2, 3]
    return timeit.timeit(lambda: sorted(data), number=1000000)

# ✅ FIX: Benchmark with realistic data
def benchmark_good():
    import random
    data = [random.random() for _ in range(10000)]
    return timeit.timeit(lambda: sorted(data.copy()), number=1000)

Pitfall 4: Ignoring Warmup

python
# ❌ PROBLEM: The first run includes one-time costs (cold caches, lazy imports)
def benchmark_no_warmup():
    return timeit.timeit('func()', globals=globals(), number=1)

# ✅ FIX: Warm up before the real benchmark
def benchmark_with_warmup():
    # Warmup
    for _ in range(10):
        func()
    
    # Actual benchmark
    return timeit.timeit('func()', globals=globals(), number=100)

Pitfall 5: Optimizing Wrong Thing

python
# ❌ PROBLEM: Optimizing CPU when the bottleneck is I/O
def process_files_wrong():
    for path in paths:
        data = read_file(path)       # 95% of time here (I/O)
        result = transform(data)     # 5% of time here (CPU)
        write_file(result)           # I/O

# A profile will show transform() taking only 5% of the time
# Optimizing transform() = a waste of effort!

# ✅ FIX: Optimize the I/O with async or threading
import asyncio
import aiofiles

async def process_files_right():
    # process_file is an async read/transform/write helper (not shown)
    tasks = [process_file(path) for path in paths]
    await asyncio.gather(*tasks)
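When the file helpers are synchronous, a `ThreadPoolExecutor` achieves the same I/O overlap without rewriting everything as async. This self-contained sketch uses temp files as a stand-in for the real dataset, and `process_file` is an illustrative helper:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Stand-in dataset: a handful of small temp files
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(5):
    path = os.path.join(tmpdir, f"in_{i}.txt")
    with open(path, "w") as f:
        f.write(f"record {i}\n")
    paths.append(path)

def process_file(path: str) -> str:
    """Read, transform (uppercase), and write one file."""
    with open(path) as f:
        data = f.read()
    out_path = path + ".out"
    with open(out_path, "w") as f:
        f.write(data.upper())
    return out_path

# Threads overlap the waiting time of the individual reads/writes
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_file, paths))

print(results)
```

`pool.map` preserves input order, so results line up with `paths` even though the files are processed concurrently.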

Quick Reference

python
# === cProfile ===
#   python -m cProfile -s cumtime script.py
#   python -m cProfile -o output.prof script.py

import cProfile, pstats
profiler = cProfile.Profile()
profiler.enable()
# ... code ...
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(20)

# === line_profiler ===
# Add the @profile decorator to functions, then:
#   kernprof -l -v script.py

# === py-spy (sampling) ===
#   py-spy record -o profile.svg -- python script.py
#   py-spy top --pid 12345

# === snakeviz (visualization) ===
#   python -m cProfile -o profile.prof script.py
#   snakeviz profile.prof

# === timeit ===
#   python -m timeit "sum(range(1000))"
import timeit
timeit.timeit('func()', number=1000)

# === IPython ===
#   %timeit expression
#   %%timeit  (times the whole cell)