
Deploy to SageMaker


1. Purpose

Enterprise-grade serving. SageMaker handles the heavy lifting: OS patching, traffic-based auto-scaling, and multi-model hosting. Use this for production workloads that require SLAs.


2. When to Use / When Not to Use

Use This Workflow When

  • Moving from "Laptop Demo" to "Production Service".
  • You need Auto-Scaling (scale in overnight, scale out for Black Friday traffic).
  • Compliance requires AWS PrivateLink (No public internet access).

Do NOT Use This Workflow When

  • Quick prototyping (Use Local Docker).
  • Budget is $0 (SageMaker carries a markup over raw EC2).
  • You are locked into Google/Azure (Use Vertex AI / Azure ML).

3. Inputs

Required Inputs

  • [[MODEL_DATA_URL]]: S3 URI where model.tar.gz is stored.
  • [[INSTANCE_TYPE]]: e.g., ml.m5.xlarge (CPU) or ml.g4dn.xlarge (GPU).
  • [[ENDPOINT_NAME]]: Unique identifier.

4. Outputs

  • HTTPS Endpoint: https://runtime.sagemaker.../endpoints/[[ENDPOINT_NAME]]/invocations.
  • CloudWatch Logs: /aws/sagemaker/Endpoints/[[ENDPOINT_NAME]].

5. Preconditions

  • AWS CLI configured with AmazonSageMakerFullAccess.
  • Docker image in ECR (if using a custom container), OR a standard framework container (PyTorch/Sklearn).

6. Procedure

Phase 1: Model Creation

  1. Action: Define Model Object.
    • Expected Output: SageMaker Model resource linking the S3 Artifact + Container Image.
    • Notes: Specify execution_role_arn with correct S3 permissions.
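
Phase 1 can be sketched with the low-level boto3 API as below. This is a minimal outline: the builder function, model name, image URI, and role ARN are all placeholders, not a definitive implementation.

```python
# Sketch of Phase 1 (Model Creation) with the low-level boto3 API.
# All names, the image URI, and the role ARN are placeholders.

def build_create_model_request(model_name: str, image_uri: str,
                               model_data_url: str, role_arn: str) -> dict:
    """Assemble the kwargs for sagemaker.create_model()."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # framework or custom ECR image
            "ModelDataUrl": model_data_url,  # s3://.../model.tar.gz
        },
        # Role must grant s3:GetObject on the model artifact.
        "ExecutionRoleArn": role_arn,
    }

# To execute (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_model(**build_create_model_request(
#       "my-model", "<ecr-image-uri>", "[[MODEL_DATA_URL]]", "<role-arn>"))
```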

Phase 2: Configuration

  1. Action: Create Endpoint Config.

    • Expected Output: Endpoint Config defining the "Production Variant".
    • Notes:
      • Instance Count: 1 (Dev) or >=2 (Prod HA).
      • Weight: 1.0 (Traffic split).
  2. Action: Deploy.

    • Expected Output: create_endpoint() succeeds; status moves Creating -> InService.
    • Notes: This takes 5-10 minutes.
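
Phase 2 can be sketched as follows (an assumption-laden outline: variant name `AllTraffic` and the config name suffix are illustrative placeholders; the live boto3 calls are shown commented out):

```python
# Sketch of Phase 2 (Configuration + Deploy) with the low-level boto3 API.

def build_endpoint_config_request(config_name: str, model_name: str,
                                  instance_type: str,
                                  instance_count: int = 1) -> dict:
    """Assemble the kwargs for sagemaker.create_endpoint_config()."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",           # placeholder variant name
            "ModelName": model_name,
            "InstanceType": instance_type,         # e.g. ml.m5.xlarge
            "InitialInstanceCount": instance_count,  # >= 2 for Prod HA
            "InitialVariantWeight": 1.0,           # 100% of traffic
        }],
    }

# To execute (requires AWS credentials):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**build_endpoint_config_request(
#       "[[ENDPOINT_NAME]]-config", "my-model", "[[INSTANCE_TYPE]]"))
#   sm.create_endpoint(EndpointName="[[ENDPOINT_NAME]]",
#                      EndpointConfigName="[[ENDPOINT_NAME]]-config")
#   # Blocks until Creating -> InService (typically 5-10 minutes):
#   sm.get_waiter("endpoint_in_service").wait(EndpointName="[[ENDPOINT_NAME]]")
```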

Phase 3: Validation

  1. Action: Invoke Endpoint.
    • Expected Output: 200 OK with prediction payload.
    • Notes: Use boto3.client('sagemaker-runtime').invoke_endpoint().
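
The validation step can be sketched as below. The JSON payload shape is an assumption; match it to whatever your `inference.py` handler expects.

```python
import json

# Sketch of Phase 3 (Validation). The payload shape is an assumption --
# align it with your inference.py input handling.

def build_invoke_request(endpoint_name: str, payload: dict) -> dict:
    """Assemble the kwargs for sagemaker-runtime.invoke_endpoint()."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }

# To execute (requires AWS credentials):
#   import boto3
#   rt = boto3.client("sagemaker-runtime")
#   resp = rt.invoke_endpoint(**build_invoke_request(
#       "[[ENDPOINT_NAME]]", {"instances": [[1.0, 2.0, 3.0]]}))
#   print(resp["Body"].read())  # expect HTTP 200 with the prediction payload
```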

7. Quality Gates

  • [ ] Multi-AZ: At least 2 instances for Prod.
  • [ ] Auto-Scaling: Policy defined (Scale out when CPU > 50%).
  • [ ] Monitoring: Data Capture (Model Monitor) enabled if model drift is a concern.

8. Failure Handling

Deployment Failed

  • Symptoms: Status Failed.
  • Recovery: Check CloudWatch Logs. Common error: "Container crashed on startup" (Missing dependency) or "Health check timeout" (Model takes > 60s to load).

Latency Spike

  • Symptoms: Inference latency spikes (e.g., > 5s per request).
  • Recovery: Use "SageMaker Inference Recommender" to pick the right instance size. Switch to GPU if Compute bound.

9. Paste Prompt

TIP

One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.

text
Role: Act as an AWS ML Engineer.
Task: Execute the SageMaker Deployment workflow.

## Objective
Deploy [[MODEL_DATA_URL]] to Endpoint [[ENDPOINT_NAME]].

## Inputs
- **Instance**: [[INSTANCE_TYPE]]

## Procedure
Execute the following phases:

1. **Model**:
   - Use PyTorchModel/SKLearnModel SDK.
   - Point to `model.tar.gz` in S3.
   - Define Entrypoint `inference.py`.

2. **Config**:
   - Create Endpoint Config.
   - Set Initial Instance Count = 1.

3. **Deploy**:
   - Call `.deploy()`.
   - Wait for `InService`.

## Quality Gates
- [ ] IAM Role has S3 Read access.
- [ ] Instance type matches model needs (GPU/CPU).
- [ ] Health Check passes.

## Constraints
- Output: Python (Boto3/SageMaker SDK) script.
- Region: current default.

## Command
Write the deployment script using the SageMaker Python SDK.
