Containerize ML Model
1. Purpose
Ship the code AND the environment. "It works on my machine" is unacceptable in ML. Docker ensures the precise versions of PyTorch/TensorFlow, the CUDA runtime libraries, and system libs travel with the model.
2. When to Use / When Not to Use
Use This Workflow When
- Deploying to K8s / SageMaker / Cloud Run.
- Sharing a demo with colleagues.
- Ensuring consistent inference results.
Do NOT Use This Workflow When
- Training huge models on bare metal (container overhead and GPU passthrough can be a nuisance if not configured correctly).
- Distributing a Python library (use PyPI instead).
3. Inputs
Required Inputs
- [[MODEL_ARTIFACT]]: `.pt`, `.pkl`, or `.onnx` file.
- [[INFERENCE_CODE]]: `predict.py` or FastAPI app.
- [[HARDWARE_TARGET]]: CPU or GPU (NVIDIA).
4. Outputs
- Image: `my-model:latest`.
- Scan: Vulnerability report.
- Test: `docker run` success proof.
5. Preconditions
- Docker Desktop / Daemon installed.
- Model artifact exists locally.
6. Procedure
Phase 1: Preparation
Action: Pin Dependencies.
- Expected Output: `requirements.txt` with EXACT versions (`torch==2.1.0`).
- Notes: Do not use `latest`. ML libs break APIs frequently.
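As an illustration, a fully pinned manifest might look like this (the versions shown are examples, not recommendations):

```text
# requirements.txt -- every dependency pinned to an exact version
torch==2.1.0
fastapi==0.104.1
uvicorn==0.24.0
numpy==1.26.0
```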
Action: Select Base Image.
- Expected Output:
  - CPU: `python:3.9-slim`.
  - GPU: `nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04`.
- Notes: Avoid "devel" images for prod (too big).
Phase 2: Dockerfile
Action: Write Instructions.
- Expected Output: `COPY`, `RUN pip install`, `CMD`.
- Notes: Bake the model into the image (if <2GB) or download it on startup (if >2GB). Baking guarantees integrity but slows the build.
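A minimal CPU Dockerfile following these instructions might look like the sketch below (the file names `predict.py`/`model.pkl`, the port, and the user name are assumptions to be adapted):

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install pinned dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake code and model into the image (model <2GB; otherwise fetch on startup)
COPY predict.py model.pkl ./

# Run as a non-root user (see Quality Gates)
RUN useradd --create-home appuser
USER appuser

EXPOSE 8000
CMD ["python", "predict.py"]
```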
Action: Optimization.
- Expected Output: Multi-stage build to remove build tools (gcc); `.dockerignore` excludes training data/venv.
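A typical `.dockerignore` for an ML project might contain entries like these (examples; adjust to your repo layout):

```text
# .dockerignore -- keep the build context small
.venv/
data/
.git/
__pycache__/
*.ipynb
*.log
```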
Phase 3: Build & Test
Action: Build.
- Expected Output: `docker build -t [[MODEL_ARTIFACT]] .` completes successfully.
Action: Smoke Test.
- Expected Output: Container starts, listens on port, and responds to 1 request.
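A smoke test can be scripted as a stdlib-only probe like the sketch below, assuming the containerized service exposes an HTTP endpoint (e.g. `/health`) on a mapped port:

```python
# smoke_test.py -- poll the container's endpoint until it answers (sketch)
import time
import urllib.error
import urllib.request

def wait_for_service(url, timeout=30.0, interval=0.5):
    """Return True once `url` answers with HTTP 200, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; keep polling
        time.sleep(interval)
    return False
```

Typical usage: start the container with `docker run -d -p 8000:8000 my-model:latest`, then call `wait_for_service("http://localhost:8000/health")` and send one real prediction request.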
7. Quality Gates
- [ ] Size: Image is reasonably sized (remove apt cache, use slim base).
- [ ] Security: No root user (`USER appuser`).
- [ ] GPU: Use the `nvidia-docker` runtime if a GPU is required.
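The cache-cleanup, non-root, and healthcheck gates translate into Dockerfile lines like the fragment below (a sketch; the `/health` endpoint and port are assumptions):

```dockerfile
# System deps in one layer, with the apt cache removed in the same RUN
RUN apt-get update \
    && apt-get install -y --no-install-recommends libgl1 \
    && rm -rf /var/lib/apt/lists/*

# Drop root privileges
RUN useradd --create-home appuser
USER appuser

# Mark the container unhealthy if the service stops answering
HEALTHCHECK --interval=30s --timeout=3s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
```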
8. Failure Handling
Image too large
- Symptoms: 10GB image. Slow deploys.
- Recovery: Do not include training data in the image. Use `.dockerignore`. Download model weights from S3 at runtime instead of `COPY`.
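The download-on-startup pattern can be sketched with the stdlib alone, as below; the `WEIGHTS_URL`/`WEIGHTS_PATH` variables are placeholders, and a real deployment would use boto3 or a presigned S3 URL plus checksum verification:

```python
# fetch_weights.py -- download model weights at container startup if absent
import os
import urllib.request

WEIGHTS_PATH = os.environ.get("WEIGHTS_PATH", "/app/model.pt")
WEIGHTS_URL = os.environ.get("WEIGHTS_URL", "")  # e.g. a presigned S3 URL

def ensure_weights(path=WEIGHTS_PATH, url=WEIGHTS_URL):
    """Return `path`, downloading the weights first if they are missing."""
    if os.path.exists(path):
        return path  # already baked in or cached from a previous start
    if not url:
        raise RuntimeError("WEIGHTS_URL not set and no local weights found")
    urllib.request.urlretrieve(url, path)
    return path
```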
CUDA Mismatch
- Symptoms:
RuntimeError: Found no NVIDIA driver on your system. - Recovery: Ensure host has drivers. Ensure container uses
nvidia/cudabase. Ensure--gpus allflag passed.
9. Paste Prompt
TIP
One-Click Agent Invocation: copy the prompt below, replace the placeholders, and paste it into your agent.
Role: Act as an MLOps Engineer.
Task: Execute the Containerize Model workflow.
## Objective
Build Docker image for [[MODEL_ARTIFACT]] running on [[HARDWARE_TARGET]].
## Inputs
- **Base**: Python Slim (CPU) or NVIDIA CUDA (GPU).
## Procedure
Execute the following phases:
1. **Manifest**:
- Create `requirements.txt` (Pinned).
- Create `.dockerignore` (Exclude .venv, data).
2. **Dockerfile**:
- Base Image selection.
- Install System Deps (libgl1 etc).
- Install Python Deps.
- COPY code/model.
- Define ENTRYPOINT.
3. **Build**:
- Run `docker build`.
- Verify size.
## Quality Gates
- [ ] Non-root user defined.
- [ ] Cache cleanup (`rm -rf /var/lib/apt/lists/*`).
- [ ] Healthcheck defined.
## Constraints
- Output: Dockerfile content.
- Optimize for Size.
## Command
Draft the Dockerfile.