
Containerize ML Model


1. Purpose

Ship the code AND the environment. "It works on my machine" is unacceptable in ML. Docker ensures the exact versions of PyTorch/TensorFlow, the CUDA runtime libraries, and system dependencies travel with the model. (The NVIDIA driver itself stays on the host.)


2. When to Use / When Not to Use

Use This Workflow When

  • Deploying to K8s / SageMaker / Cloud Run.
  • Sharing a demo with colleagues.
  • Ensuring consistent inference results.

Do NOT Use This Workflow When

  • Training huge models on bare metal (container overhead can be an annoyance if not configured correctly).
  • Distributing a Python library (use PyPI instead).

3. Inputs

Required Inputs

  • [[MODEL_ARTIFACT]]: .pt, .pkl, .onnx file.
  • [[INFERENCE_CODE]]: predict.py or FastAPI app.
  • [[HARDWARE_TARGET]]: CPU or GPU (NVIDIA).

4. Outputs

  • Image: my-model:latest.
  • Scan: Vulnerability report.
  • Test: proof of a successful docker run (smoke test).

5. Preconditions

  • Docker Desktop / Daemon installed.
  • Model artifact exists locally.

6. Procedure

Phase 1: Preparation

  1. Action: Pin Dependencies.

    • Expected Output: requirements.txt with EXACT versions (e.g. torch==2.1.0).
    • Notes: Do not use latest. ML libraries frequently introduce breaking API changes between versions.
  2. Action: Select Base Image.

    • Expected Output:
      • CPU: python:3.9-slim.
      • GPU: nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04.
    • Notes: Avoid "devel" images for prod; they bundle compilers and headers and are several GB larger than the "runtime" variants.
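The preparation steps above can be sanity-checked with a short script: write the manifest, then reject any line that is not pinned to an exact version. The package names and versions here are illustrative; substitute your model's actual dependencies.

```shell
# Write a sample pinned manifest (names/versions are illustrative).
cat > requirements.txt <<'EOF'
torch==2.1.0
fastapi==0.104.1
uvicorn==0.24.0
EOF

# Reject any line that is not pinned with an exact == version.
if grep -qvE '^[A-Za-z0-9_.-]+==[0-9][0-9A-Za-z.]*$' requirements.txt; then
  echo "unpinned dependency found" >&2
  exit 1
fi
echo "all dependencies pinned"
```

Running this in CI before every build catches a floating dependency before it silently changes your image.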

Phase 2: Dockerfile

  1. Action: Write Instructions.

    • Expected Output: COPY, RUN pip install, CMD.
    • Notes: Bake the model into the image (if <2GB) or download it on startup (if >2GB). Baking yields a single immutable, reproducible artifact but increases image size and build time.
  2. Action: Optimization.

    • Expected Output: Multi-stage build to remove build tools (gcc). .dockerignore excludes training data/venv.
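The two Dockerfile steps above can be sketched as a single multi-stage file for the CPU target. `predict.py`, `model.pt`, and `appuser` are placeholder names; adjust them to your project.

```dockerfile
# --- Stage 1: build wheels with compilers available ---
FROM python:3.9-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends gcc \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip wheel --no-cache-dir -r requirements.txt -w /wheels

# --- Stage 2: slim runtime image, no build tools ---
FROM python:3.9-slim
RUN useradd --create-home appuser
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY predict.py model.pt ./
USER appuser
EXPOSE 8000
CMD ["python", "predict.py"]
```

Pair it with a .dockerignore that excludes .venv/, data/, and .git/ so the build context stays small.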

Phase 3: Build & Test

  1. Action: Build.

    • Expected Output: docker build -t my-model:latest . (tag matches the Output in section 4).
  2. Action: Smoke Test.

    • Expected Output: Container starts, listens on port, and responds to 1 request.
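The build and smoke-test steps above, as a command transcript (requires a running Docker daemon; image name, port, and /health endpoint are assumptions from this workflow):

```shell
# Build and tag (run from the directory containing the Dockerfile).
docker build -t my-model:latest .

# Start the container and check that it answers one request.
docker run -d --rm -p 8000:8000 --name smoke my-model:latest
curl -f http://localhost:8000/health
docker stop smoke
```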

7. Quality Gates

  • [ ] Size: Image is reasonably sized (remove apt cache, use slim base).
  • [ ] Security: No root user (USER appuser).
  • [ ] GPU: Use the NVIDIA Container Toolkit (docker run --gpus all) if GPU is required; the legacy nvidia-docker wrapper is deprecated.

8. Failure Handling

Image too large

  • Symptoms: 10 GB+ image; slow deploys and pulls.
  • Recovery: Do not include training data in image. Use .dockerignore. Download model weights from S3 at runtime instead of COPY.
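One way to apply the last recovery step is a small entrypoint that fetches weights at startup instead of baking them in. The bucket path and tool (AWS CLI) are assumptions; any object-store client works.

```shell
#!/bin/sh
# entrypoint.sh — download model weights at container start instead of COPY.
set -e
if [ ! -f /app/model.pt ]; then
  # Hypothetical bucket/key; requires the AWS CLI and credentials in the container.
  aws s3 cp "s3://my-bucket/models/model.pt" /app/model.pt
fi
exec python /app/predict.py
```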

CUDA Mismatch

  • Symptoms: RuntimeError: Found no NVIDIA driver on your system.
  • Recovery: Ensure host has drivers. Ensure container uses nvidia/cuda base. Ensure --gpus all flag passed.
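The three recovery checks above can be verified layer by layer (requires the NVIDIA Container Toolkit on the host; my-model:latest is the image built in this workflow):

```shell
# 1. Host driver present?
nvidia-smi

# 2. Container can see the GPU?
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# 3. Framework can see the GPU inside your image?
docker run --rm --gpus all my-model:latest \
  python -c "import torch; print(torch.cuda.is_available())"
```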

9. Paste Prompt

TIP

One-Click Agent Invocation: copy the prompt below, replace the placeholders, and paste it into your agent.

Role: Act as an MLOps Engineer.
Task: Execute the Containerize Model workflow.

## Objective
Build Docker image for [[MODEL_ARTIFACT]] running on [[HARDWARE_TARGET]].

## Inputs
- **Base**: Python Slim (CPU) or NVIDIA CUDA (GPU).

## Procedure
Execute the following phases:

1. **Manifest**:
   - Create `requirements.txt` (Pinned).
   - Create `.dockerignore` (Exclude .venv, data).

2. **Dockerfile**:
   - Base Image selection.
   - Install system deps (e.g. libgl1 for OpenCV).
   - Install Python Deps.
   - COPY code/model.
   - Define ENTRYPOINT.

3. **Build**:
   - Run `docker build`.
   - Verify size.

## Quality Gates
- [ ] Non-root user defined.
- [ ] Cache cleanup (`rm -rf /var/lib/apt/lists/*`).
- [ ] Healthcheck defined.

## Constraints
- Output: Dockerfile content.
- Optimize for Size.

## Command
Draft the Dockerfile.
