
Containerize ML Model


1. Purpose

Ship the code AND the environment. "It works on my machine" is unacceptable in ML. Docker ensures the exact versions of PyTorch/TensorFlow, the CUDA runtime libraries, and system dependencies travel with the model. (The NVIDIA driver itself stays on the host.)


2. When to Use / When Not to Use

Use This Workflow When

  • Deploying to K8s / SageMaker / Cloud Run.
  • Sharing a demo with colleagues.
  • Ensuring consistent inference results.

Do NOT Use This Workflow When

  • Training huge models on bare metal (container overhead can be an annoyance if not configured correctly).
  • Distributing a Python library (use PyPI instead).

3. Inputs

Required Inputs

  • [[MODEL_ARTIFACT]]: .pt, .pkl, .onnx file.
  • [[INFERENCE_CODE]]: predict.py or FastAPI app.
  • [[HARDWARE_TARGET]]: CPU or GPU (NVIDIA).

4. Outputs

  • Image: my-model:latest.
  • Scan: Vulnerability report.
  • Test: proof of a successful docker run (smoke test).

5. Preconditions

  • Docker Desktop / Daemon installed.
  • Model artifact exists locally.

6. Procedure

Phase 1: Preparation

  1. Action: Pin Dependencies.

    • Expected Output: requirements.txt with EXACT versions (e.g. torch==2.1.0).
    • Notes: Do not use latest. ML libraries frequently introduce breaking API changes between versions.
  2. Action: Select Base Image.

    • Expected Output:
      • CPU: python:3.9-slim.
      • GPU: nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04.
    • Notes: Avoid "devel" images for prod; they bundle compilers and headers and are several GB larger than the "runtime" variants.
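The preparation steps above can be sanity-checked with a short script: write the manifest, then reject any line that is not pinned to an exact version. The package names and versions here are illustrative; substitute your model's actual dependencies.

```shell
# Write a sample pinned manifest (names/versions are illustrative).
cat > requirements.txt <<'EOF'
torch==2.1.0
fastapi==0.104.1
uvicorn==0.24.0
EOF

# Reject any line that is not pinned with an exact == version.
if grep -qvE '^[A-Za-z0-9_.-]+==[0-9][0-9A-Za-z.]*$' requirements.txt; then
  echo "unpinned dependency found" >&2
  exit 1
fi
echo "all dependencies pinned"
```

Running this in CI before every build catches a floating dependency before it silently changes your image.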

Phase 2: Dockerfile

  1. Action: Write Instructions.

    • Expected Output: COPY, RUN pip install, CMD.
    • Notes: Bake the model into the image (if <2GB) or download it on startup (if >2GB). Baking yields a single immutable, reproducible artifact but increases image size and build time.
  2. Action: Optimization.

    • Expected Output: Multi-stage build to remove build tools (gcc). .dockerignore excludes training data/venv.
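The two Dockerfile steps above can be sketched as a single multi-stage file for the CPU target. `predict.py`, `model.pt`, and `appuser` are placeholder names; adjust them to your project.

```dockerfile
# --- Stage 1: build wheels with compilers available ---
FROM python:3.9-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends gcc \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip wheel --no-cache-dir -r requirements.txt -w /wheels

# --- Stage 2: slim runtime image, no build tools ---
FROM python:3.9-slim
RUN useradd --create-home appuser
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY predict.py model.pt ./
USER appuser
EXPOSE 8000
CMD ["python", "predict.py"]
```

Pair it with a .dockerignore that excludes .venv/, data/, and .git/ so the build context stays small.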

Phase 3: Build & Test

  1. Action: Build.

    • Expected Output: docker build -t my-model:latest . (tag matches the Output in section 4).
  2. Action: Smoke Test.

    • Expected Output: Container starts, listens on port, and responds to 1 request.
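The build and smoke-test steps above, as a command transcript (requires a running Docker daemon; image name, port, and /health endpoint are assumptions from this workflow):

```shell
# Build and tag (run from the directory containing the Dockerfile).
docker build -t my-model:latest .

# Start the container and check that it answers one request.
docker run -d --rm -p 8000:8000 --name smoke my-model:latest
curl -f http://localhost:8000/health
docker stop smoke
```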

7. Quality Gates

  • [ ] Size: Image is reasonably sized (remove apt cache, use slim base).
  • [ ] Security: No root user (USER appuser).
  • [ ] GPU: Use the NVIDIA Container Toolkit (docker run --gpus all) if GPU is required; the legacy nvidia-docker wrapper is deprecated.

8. Failure Handling

Image too large

  • Symptoms: 10 GB+ image; slow deploys and pulls.
  • Recovery: Do not include training data in image. Use .dockerignore. Download model weights from S3 at runtime instead of COPY.
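One way to apply the last recovery step is a small entrypoint that fetches weights at startup instead of baking them in. The bucket path and tool (AWS CLI) are assumptions; any object-store client works.

```shell
#!/bin/sh
# entrypoint.sh — download model weights at container start instead of COPY.
set -e
if [ ! -f /app/model.pt ]; then
  # Hypothetical bucket/key; requires the AWS CLI and credentials in the container.
  aws s3 cp "s3://my-bucket/models/model.pt" /app/model.pt
fi
exec python /app/predict.py
```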

CUDA Mismatch

  • Symptoms: RuntimeError: Found no NVIDIA driver on your system.
  • Recovery: Ensure host has drivers. Ensure container uses nvidia/cuda base. Ensure --gpus all flag passed.
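The three recovery checks above can be verified layer by layer (requires the NVIDIA Container Toolkit on the host; my-model:latest is the image built in this workflow):

```shell
# 1. Host driver present?
nvidia-smi

# 2. Container can see the GPU?
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# 3. Framework can see the GPU inside your image?
docker run --rm --gpus all my-model:latest \
  python -c "import torch; print(torch.cuda.is_available())"
```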

9. Paste Prompt

TIP

One-Click Agent Invocation: copy the prompt below, replace the placeholders, and paste it into your agent.

Role: Act as an MLOps Engineer.
Task: Execute the Containerize Model workflow.

## Objective
Build Docker image for [[MODEL_ARTIFACT]] running on [[HARDWARE_TARGET]].

## Inputs
- **Base**: Python Slim (CPU) or NVIDIA CUDA (GPU).

## Procedure
Execute the following phases:

1. **Manifest**:
   - Create `requirements.txt` (Pinned).
   - Create `.dockerignore` (Exclude .venv, data).

2. **Dockerfile**:
   - Base Image selection.
   - Install system deps (e.g. libgl1 for OpenCV).
   - Install Python Deps.
   - COPY code/model.
   - Define ENTRYPOINT.

3. **Build**:
   - Run `docker build`.
   - Verify size.

## Quality Gates
- [ ] Non-root user defined.
- [ ] Cache cleanup (`rm -rf /var/lib/apt/lists/*`).
- [ ] Healthcheck defined.

## Constraints
- Output: Dockerfile content.
- Optimize for Size.

## Command
Draft the Dockerfile.
