Skip to content

Memory Optimization Advanced

RAM không miễn phí - Tối ưu memory = Tiết kiệm tiền + Tăng throughput

Learning Outcomes

Sau khi hoàn thành trang này, bạn sẽ:

  • 🎯 Sử dụng __slots__ để giảm memory footprint của objects
  • 🎯 Chọn đúng giữa generators vs lists cho memory efficiency
  • 🎯 Dùng memory_profiler để tìm memory hogs
  • 🎯 Phát hiện và fix memory leaks
  • 🎯 Áp dụng memory optimization patterns trong production

Tại Sao Memory Matters?

┌─────────────────────────────────────────────────────────┐
│                Memory Impact Analysis                   │
├─────────────────────────────────────────────────────────┤
│  1. Cost         →  Cloud RAM = $$$                    │
│  2. Throughput   →  Less memory = More concurrent      │
│  3. GC Pressure  →  Less objects = Less GC pauses      │
│  4. Cache        →  Smaller data = Better cache hits   │
│  5. Swap         →  Exceed RAM = Disk swap = SLOW      │
└─────────────────────────────────────────────────────────┘

Memory Hierarchy

┌─────────────────────────────────────────────────────────┐
│  L1 Cache    │  ~1ns    │  64KB      │  Fastest        │
├──────────────┼──────────┼────────────┼─────────────────┤
│  L2 Cache    │  ~4ns    │  256KB     │                 │
├──────────────┼──────────┼────────────┼─────────────────┤
│  L3 Cache    │  ~12ns   │  8MB       │                 │
├──────────────┼──────────┼────────────┼─────────────────┤
│  RAM         │  ~100ns  │  16-64GB   │                 │
├──────────────┼──────────┼────────────┼─────────────────┤
│  SSD         │  ~100µs  │  TB        │                 │
├──────────────┼──────────┼────────────┼─────────────────┤
│  HDD         │  ~10ms   │  TB        │  Slowest        │
└─────────────────────────────────────────────────────────┘

__slots__ - Memory Savings

Problem: dict Overhead

Mỗi Python object có __dict__ để store attributes. Dict có overhead lớn.

python
import sys

class PointWithDict:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

point = PointWithDict(1.0, 2.0)
print(sys.getsizeof(point))           # 48 bytes (object)
print(sys.getsizeof(point.__dict__))  # 104 bytes (dict)
# Total: ~152 bytes per point!

Solution: __slots__

python
import sys

class PointWithSlots:
    __slots__ = ('x', 'y')
    
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

point = PointWithSlots(1.0, 2.0)
print(sys.getsizeof(point))  # 48 bytes total!
# hasattr(point, '__dict__')  # False - no dict!

Memory Comparison

python
import sys
from pympler import asizeof  # pip install pympler

class UserWithDict:
    def __init__(self, id: int, name: str, email: str):
        self.id = id
        self.name = name
        self.email = email

class UserWithSlots:
    __slots__ = ('id', 'name', 'email')
    
    def __init__(self, id: int, name: str, email: str):
        self.id = id
        self.name = name
        self.email = email

# Create 1 million users
users_dict = [UserWithDict(i, f"User{i}", f"user{i}@example.com") 
              for i in range(1_000_000)]
users_slots = [UserWithSlots(i, f"User{i}", f"user{i}@example.com") 
               for i in range(1_000_000)]

# Memory usage
print(f"With __dict__: {asizeof.asizeof(users_dict) / 1024 / 1024:.1f} MB")
print(f"With __slots__: {asizeof.asizeof(users_slots) / 1024 / 1024:.1f} MB")

# Typical results:
# With __dict__: ~400 MB
# With __slots__: ~150 MB (62% reduction!)

__slots__ với Inheritance

python
class Base:
    __slots__ = ('x',)

class Derived(Base):
    __slots__ = ('y',)  # Chỉ thêm slots mới
    
    def __init__(self, x, y):
        self.x = x
        self.y = y

# ⚠️ PITFALL: Nếu parent không có __slots__
class Parent:
    pass  # Có __dict__

class Child(Parent):
    __slots__ = ('x',)  # Vẫn có __dict__ từ Parent!

child = Child()
child.x = 1
child.y = 2  # Works! __dict__ từ Parent

__slots__ Limitations

python
class SlottedClass:
    __slots__ = ('x', 'y')

obj = SlottedClass()

# ❌ Cannot add new attributes
obj.z = 3  # AttributeError!

# ❌ Cannot use __dict__
obj.__dict__  # AttributeError!

# ❌ Cannot use weakref by default
import weakref
weakref.ref(obj)  # TypeError!

# ✅ FIX: Add __weakref__ to slots
class SlottedWithWeakref:
    __slots__ = ('x', 'y', '__weakref__')

When to Use __slots__

python
# ✅ USE __slots__ when:
# - Creating millions of instances
# - Memory is constrained
# - Attributes are fixed and known

# ❌ AVOID __slots__ when:
# - Need dynamic attributes
# - Using multiple inheritance with non-slotted classes
# - Need __dict__ for serialization
# - Class is rarely instantiated

Generators vs Lists

Memory Comparison

python
import sys

# List - stores all values in memory
numbers_list = [i ** 2 for i in range(1_000_000)]
print(f"List size: {sys.getsizeof(numbers_list) / 1024 / 1024:.1f} MB")
# ~8 MB

# Generator - computes on demand
numbers_gen = (i ** 2 for i in range(1_000_000))
print(f"Generator size: {sys.getsizeof(numbers_gen)} bytes")
# ~200 bytes!

Generator Patterns

python
# Pattern 1: File processing
def read_large_file(path: str):
    """Stream the file at *path*, yielding each line stripped of surrounding whitespace.

    Only one line is held in memory at a time, so arbitrarily large files
    can be processed with constant memory.
    """
    with open(path, 'r') as handle:
        yield from (raw.strip() for raw in handle)

# ❌ BAD: Load entire file
lines = open('huge.txt').readlines()  # Memory explosion!

# ✅ GOOD: Stream lines
for line in read_large_file('huge.txt'):
    process(line)


# Pattern 2: Data transformation pipeline
def pipeline(data):
    """Lazily transform *data*: double each value, add one, keep results > 10.

    Every stage is a generator expression, so no intermediate list is ever
    materialized — items flow through one at a time.
    """
    doubled = (item * 2 for item in data)
    incremented = (item + 1 for item in doubled)
    return (item for item in incremented if item > 10)

# Pattern 3: Infinite sequences
def fibonacci():
    """Yield the Fibonacci sequence forever: 0, 1, 1, 2, 3, 5, ...

    Being a generator, it keeps only the last two values in memory; take a
    finite prefix with itertools.islice.
    """
    current, nxt = 0, 1
    while True:
        yield current
        current, nxt = nxt, current + nxt

# Take first 100 Fibonacci numbers
from itertools import islice
first_100 = list(islice(fibonacci(), 100))

itertools for Memory Efficiency

python
from itertools import chain, islice, groupby, filterfalse

# chain - Combine iterables without creating new list
combined = chain(range(1000), range(1000, 2000))
# NOT: list(range(1000)) + list(range(1000, 2000))

# islice - Slice without creating list
first_10 = islice(huge_generator, 10)
# NOT: list(huge_generator)[:10]

# filterfalse - Filter without list
evens = filterfalse(lambda x: x % 2, range(1000000))
# NOT: [x for x in range(1000000) if x % 2 == 0]

When to Use Generators

python
# ✅ USE generators when:
# - Processing large datasets
# - Data is consumed once
# - Memory is constrained
# - Building pipelines

# ❌ USE lists when:
# - Need random access (data[500])
# - Need to iterate multiple times
# - Need len()
# - Data is small

memory_profiler

Installation

bash
pip install memory_profiler

Line-by-Line Memory Profiling

python
# script.py
from memory_profiler import profile

@profile
def memory_hungry_function():
    """Demo workload for line-by-line memory profiling.

    Allocates a 1M-element list, duplicates it, frees the original, and
    sums the copy — so the profiler report shows one allocation, one
    equally-sized increment for the copy, and one matching release.
    """
    # Line 1: Allocate large list
    data = [i ** 2 for i in range(1_000_000)]
    
    # Line 2: Create copy
    data_copy = data.copy()
    
    # Line 3: Delete original
    del data
    
    # Line 4: Process
    result = sum(data_copy)
    
    return result

if __name__ == "__main__":
    memory_hungry_function()
bash
python -m memory_profiler script.py

Output Explained

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4     50.0 MiB     50.0 MiB           1   @profile
     5                                         def memory_hungry_function():
     6     88.5 MiB     38.5 MiB           1       data = [i ** 2 for i in range(1_000_000)]
     7    127.0 MiB     38.5 MiB           1       data_copy = data.copy()
     8     88.5 MiB    -38.5 MiB           1       del data
     9     88.5 MiB      0.0 MiB           1       result = sum(data_copy)
    10     88.5 MiB      0.0 MiB           1       return result
Column        Meaning
────────────  ─────────────────────────────────────
Mem usage     Total memory reported at this line
Increment     Memory change from the previous line
Occurrences   Number of times the line was executed

mprof - Memory Over Time

bash
# Record memory usage over time
mprof run script.py

# Plot results
mprof plot

# Clean up
mprof clean

Programmatic Usage

python
from memory_profiler import memory_usage

def my_function():
    """Allocate a million-element list and return its sum (memory-measurement target)."""
    values = list(range(1_000_000))
    return sum(values)

# Measure memory usage
mem_usage = memory_usage((my_function,))
print(f"Peak memory: {max(mem_usage):.1f} MiB")

# With interval
mem_usage = memory_usage((my_function,), interval=0.1)

Memory Leak Detection

Common Leak Patterns

Pattern 1: Growing Collections

python
# ❌ LEAK: Cache grows forever
cache = {}

def get_data(key):
    if key not in cache:
        cache[key] = expensive_fetch(key)
    return cache[key]

# ✅ FIX: Bounded cache
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_data(key):
    """Fetch and memoize the value for *key*.

    maxsize=1000 bounds the cache with LRU eviction, so — unlike an
    unbounded module-level dict — memory cannot grow forever.
    """
    return expensive_fetch(key)

Pattern 2: Circular References

python
# ❌ LEAK: Circular reference với __del__
class Node:
    def __init__(self):
        self.children = []
        self.parent = None
    
    def add_child(self, child):
        self.children.append(child)
        child.parent = self  # Circular!
    
    def __del__(self):
        print("Deleted")  # May never be called!

# ✅ FIX: Use weakref
import weakref

class Node:
    """Tree node whose parent link is held through a weak reference.

    Because the child does not own a strong reference to its parent, a
    parent/child pair forms no reference cycle and both can be collected
    by plain refcounting.
    """

    def __init__(self):
        self.children = []
        # Either None or a weakref.ref to the parent node.
        self._parent = None

    @property
    def parent(self):
        """The parent node, or None if unset or already collected."""
        ref = self._parent
        return ref() if ref is not None else None

    @parent.setter
    def parent(self, node):
        if node is None:
            self._parent = None
        else:
            self._parent = weakref.ref(node)

Pattern 3: Event Handlers

python
# ❌ LEAK: Handler keeps object alive
class Button:
    handlers = []  # Class-level list!
    
    def on_click(self, handler):
        self.handlers.append(handler)

class Window:
    def __init__(self, button):
        button.on_click(self.handle_click)  # Window never GC'd!
    
    def handle_click(self):
        pass

# ✅ FIX: Instance-level handlers + cleanup
class Button:
    """Click source with per-instance handler registration and explicit cleanup.

    Handlers live on the instance (not the class), so they are released
    together with the button, and remove_handler lets callers break the
    reference before that.
    """

    def __init__(self):
        # Instance-level list: registrations die with this button.
        self.handlers = []

    def on_click(self, handler):
        """Register *handler* to be invoked on click."""
        self.handlers.append(handler)

    def remove_handler(self, handler):
        """Unregister *handler*; raises ValueError if it was never registered."""
        self.handlers.remove(handler)

tracemalloc - Built-in Leak Detection

python
import tracemalloc

# Start tracing
tracemalloc.start()

# Your code here
data = [i ** 2 for i in range(100000)]

# Take snapshot
snapshot = tracemalloc.take_snapshot()

# Top memory consumers
top_stats = snapshot.statistics('lineno')
print("Top 10 memory consumers:")
for stat in top_stats[:10]:
    print(stat)

# Compare snapshots to find leaks
snapshot1 = tracemalloc.take_snapshot()
# ... more code ...
snapshot2 = tracemalloc.take_snapshot()

top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("\nMemory changes:")
for stat in top_stats[:10]:
    print(stat)

objgraph - Object Graph Visualization

bash
pip install objgraph
python
import objgraph

# Show most common types
objgraph.show_most_common_types(limit=10)

# Show growth between calls
objgraph.show_growth()
# ... code that might leak ...
objgraph.show_growth()

# Find what's keeping object alive
objgraph.show_backrefs(my_object, filename='refs.png')

# Find reference chains
objgraph.show_chain(
    objgraph.find_backref_chain(
        my_object,
        objgraph.is_proper_module
    ),
    filename='chain.png'
)

Memory Optimization Patterns

Pattern 1: Lazy Loading

python
class LazyLoader:
    """Defer reading a file until its contents are first requested.

    The file at *path* is read at most once; the text is cached on the
    instance and every later access returns the cached value.
    """

    def __init__(self, path: str):
        self.path = path
        # Cache slot: None means "not loaded yet".
        self._data = None

    @property
    def data(self):
        """File contents, loaded lazily on first access."""
        if self._data is None:
            with open(self.path) as handle:
                self._data = handle.read()
        return self._data

# Data not loaded until accessed
loader = LazyLoader('huge_file.txt')
# ... later ...
print(loader.data)  # Now it loads

Pattern 2: Object Pooling

python
from typing import TypeVar, Generic
from collections import deque

T = TypeVar('T')

class ObjectPool(Generic[T]):
    """Bounded pool that recycles objects instead of allocating fresh ones.

    acquire() hands out a pooled object when one is available, otherwise
    builds a new one via *factory*; release() returns an object for reuse
    unless the pool already holds *max_size* objects (then it is dropped).
    """

    def __init__(self, factory, max_size: int = 100):
        self.factory = factory
        self.max_size = max_size
        self.pool: deque[T] = deque()

    def acquire(self) -> T:
        """Return a recycled object if one exists, else a freshly built one."""
        try:
            return self.pool.pop()
        except IndexError:
            return self.factory()

    def release(self, obj: T) -> None:
        """Hand *obj* back for reuse; silently dropped when the pool is full."""
        if len(self.pool) < self.max_size:
            self.pool.append(obj)

# Usage
buffer_pool = ObjectPool(lambda: bytearray(4096))

buffer = buffer_pool.acquire()
# ... use buffer ...
buffer_pool.release(buffer)  # Return to pool

Pattern 3: Memory-Mapped Files

python
import mmap

def process_large_file(path: str):
    """Scan the file at *path* through a memory map, without reading it into RAM.

    Demonstrates substring search and random access over the mapped bytes.
    The original version opened the file 'r+b' (needing write permission for a
    read-only scan) and called mm.close() unconditionally, leaking the map if
    any earlier line raised; both map and file are now closed via context
    managers, and the map is read-only.

    Requires the file to be at least 1000 bytes (for the seek demo) and
    non-empty (mmap cannot map an empty file).
    """
    with open(path, 'rb') as f:
        # length=0 maps the whole file; ACCESS_READ avoids needing write
        # permission and guarantees the file cannot be modified by accident.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Search the mapped bytes like a bytes object.
            if mm.find(b'pattern') != -1:
                print("Found!")

            # Random access: jump to offset 1000 and read 100 bytes.
            mm.seek(1000)
            chunk = mm.read(100)  # kept to demonstrate partial reads

Pattern 4: Streaming Processing

python
import json
from typing import Iterator, Any

def stream_json_array(path: str) -> Iterator[Any]:
    """Lazily yield the elements of the top-level JSON array in *path*.

    Uses ijson's incremental parser, so memory use stays constant no
    matter how large the file is.
    """
    import ijson  # pip install ijson — third-party streaming JSON parser

    with open(path, 'rb') as source:
        yield from ijson.items(source, 'item')

# Process 10GB JSON file with constant memory
for record in stream_json_array('huge_data.json'):
    process(record)

Pattern 5: Compact Data Structures

python
import array
import struct

# array - Compact numeric arrays
# List of ints: ~28 bytes per int object (plus an 8-byte pointer in the list)
numbers_list = [1, 2, 3, 4, 5]

# array of ints: ~4 bytes per int (unboxed C values, no per-item object)
numbers_array = array.array('i', [1, 2, 3, 4, 5])

# struct - Pack data into bytes
# Tuple: (int, float, bool) = ~100+ bytes
data_tuple = (42, 3.14, True)

# Packed: 9 bytes (4 + 4 + 1 for native 'i', 'f', '?'; see struct.calcsize('if?'))
packed = struct.pack('if?', 42, 3.14, True)
# NOTE: 'f' is a 32-bit float, so unpacking returns 3.14 with reduced precision.
unpacked = struct.unpack('if?', packed)

Production Pitfalls

Pitfall 1: String Interning Assumptions

python
# ❌ PROBLEM: Assuming strings are interned
a = "hello"
b = "hello"
print(a is b)  # True (interned)

a = "hello world"
b = "hello world"
print(a is b)  # May be False!

# ✅ FIX: Use == for string comparison
print(a == b)  # Always correct

Pitfall 2: Large Default Arguments

python
# ❌ PROBLEM: Default argument created once
def process(data, cache={}):  # Same dict for all calls!
    if data in cache:
        return cache[data]
    result = expensive_compute(data)
    cache[data] = result  # Cache grows forever!
    return result

# ✅ FIX: Use None as default
def process(data, cache=None):
    if cache is None:
        cache = {}
    # ...

Pitfall 3: Holding References in Exceptions

python
# ❌ PROBLEM: Exception holds reference to locals
def process_data():
    huge_data = load_huge_data()  # 1GB
    try:
        result = transform(huge_data)
    except Exception as e:
        # e.__traceback__ holds reference to huge_data!
        raise

# ✅ FIX: Clear traceback or use raise from
def process_data():
    huge_data = load_huge_data()
    try:
        result = transform(huge_data)
    except Exception as e:
        del huge_data  # Explicit cleanup
        raise

Pitfall 4: Pandas Memory Explosion

python
import pandas as pd

# ❌ PROBLEM: Default dtypes waste memory
df = pd.read_csv('data.csv')
# int64 for all integers (8 bytes each)
# object for strings (pointer + string)

# ✅ FIX: Specify dtypes
df = pd.read_csv('data.csv', dtype={
    'id': 'int32',           # 4 bytes instead of 8
    'count': 'int16',        # 2 bytes
    'category': 'category',  # Categorical encoding
    'flag': 'bool',          # 1 byte
})

# ✅ FIX: Use chunking for large files
for chunk in pd.read_csv('huge.csv', chunksize=10000):
    process(chunk)

Quick Reference

python
# === __slots__ ===
class Optimized:
    __slots__ = ('x', 'y', '__weakref__')

# === Generators ===
gen = (x ** 2 for x in range(1000000))
from itertools import chain, islice

# === memory_profiler ===
from memory_profiler import profile
@profile
def func(): ...
# python -m memory_profiler script.py

# === tracemalloc ===
import tracemalloc
tracemalloc.start()
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

# === objgraph ===
import objgraph
objgraph.show_most_common_types()
objgraph.show_growth()

# === Size checking ===
import sys
sys.getsizeof(obj)  # Shallow size

from pympler import asizeof
asizeof.asizeof(obj)  # Deep size