
Deploy to SageMaker


1. Purpose

Enterprise-grade serving. SageMaker handles the heavy lifting: OS patching, traffic-based auto-scaling, and multi-model hosting. Use this for production workloads that require SLAs.


2. When to Use / When Not to Use

Use This Workflow When

  • Moving from "Laptop Demo" to "Production Service".
  • You need Auto-Scaling (scale in overnight, scale out for Black Friday traffic).
  • Compliance requires AWS PrivateLink (No public internet access).

Do NOT Use This Workflow When

  • Quick prototyping (Use Local Docker).
  • Budget is $0 (SageMaker carries a markup over raw EC2).
  • You are locked into Google/Azure (Use Vertex AI / Azure ML).

3. Inputs

Required Inputs

  • [[MODEL_DATA_URL]]: S3 URI where model.tar.gz is stored.
  • [[INSTANCE_TYPE]]: e.g., ml.m5.xlarge (CPU) or ml.g4dn.xlarge (GPU).
  • [[ENDPOINT_NAME]]: Unique identifier.

4. Outputs

  • HTTPS Endpoint: https://runtime.sagemaker.../endpoints/[[ENDPOINT_NAME]]/invocations.
  • CloudWatch Logs: /aws/sagemaker/Endpoints/[[ENDPOINT_NAME]].

5. Preconditions

  • AWS CLI configured with AmazonSageMakerFullAccess.
  • Docker image in ECR (if using a custom container), OR a standard framework container (PyTorch/Sklearn).

6. Procedure

Phase 1: Model Creation

  1. Action: Define Model Object.
    • Expected Output: SageMaker Model resource linking the S3 Artifact + Container Image.
    • Notes: Specify execution_role_arn with correct S3 permissions.
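
Phase 1 can be sketched with the low-level boto3 API as below. This is a minimal outline: the builder function, model name, image URI, and role ARN are all placeholders, not a definitive implementation.

```python
# Sketch of Phase 1 (Model Creation) with the low-level boto3 API.
# All names, the image URI, and the role ARN are placeholders.

def build_create_model_request(model_name: str, image_uri: str,
                               model_data_url: str, role_arn: str) -> dict:
    """Assemble the kwargs for sagemaker.create_model()."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # framework or custom ECR image
            "ModelDataUrl": model_data_url,  # s3://.../model.tar.gz
        },
        # Role must grant s3:GetObject on the model artifact.
        "ExecutionRoleArn": role_arn,
    }

# To execute (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_model(**build_create_model_request(
#       "my-model", "<ecr-image-uri>", "[[MODEL_DATA_URL]]", "<role-arn>"))
```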

Phase 2: Configuration

  1. Action: Create Endpoint Config.

    • Expected Output: Endpoint Config defining the "Production Variant".
    • Notes:
      • Instance Count: 1 (Dev) or >=2 (Prod HA).
      • Weight: 1.0 (Traffic split).
  2. Action: Deploy.

    • Expected Output: create_endpoint() succeeds; status moves Creating -> InService.
    • Notes: This takes 5-10 minutes.
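
Phase 2 can be sketched as follows (an assumption-laden outline: variant name `AllTraffic` and the config name suffix are illustrative placeholders; the live boto3 calls are shown commented out):

```python
# Sketch of Phase 2 (Configuration + Deploy) with the low-level boto3 API.

def build_endpoint_config_request(config_name: str, model_name: str,
                                  instance_type: str,
                                  instance_count: int = 1) -> dict:
    """Assemble the kwargs for sagemaker.create_endpoint_config()."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",           # placeholder variant name
            "ModelName": model_name,
            "InstanceType": instance_type,         # e.g. ml.m5.xlarge
            "InitialInstanceCount": instance_count,  # >= 2 for Prod HA
            "InitialVariantWeight": 1.0,           # 100% of traffic
        }],
    }

# To execute (requires AWS credentials):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**build_endpoint_config_request(
#       "[[ENDPOINT_NAME]]-config", "my-model", "[[INSTANCE_TYPE]]"))
#   sm.create_endpoint(EndpointName="[[ENDPOINT_NAME]]",
#                      EndpointConfigName="[[ENDPOINT_NAME]]-config")
#   # Blocks until Creating -> InService (typically 5-10 minutes):
#   sm.get_waiter("endpoint_in_service").wait(EndpointName="[[ENDPOINT_NAME]]")
```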

Phase 3: Validation

  1. Action: Invoke Endpoint.
    • Expected Output: 200 OK with prediction payload.
    • Notes: Use boto3.client('sagemaker-runtime').invoke_endpoint().
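
The validation step can be sketched as below. The JSON payload shape is an assumption; match it to whatever your `inference.py` handler expects.

```python
import json

# Sketch of Phase 3 (Validation). The payload shape is an assumption --
# align it with your inference.py input handling.

def build_invoke_request(endpoint_name: str, payload: dict) -> dict:
    """Assemble the kwargs for sagemaker-runtime.invoke_endpoint()."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }

# To execute (requires AWS credentials):
#   import boto3
#   rt = boto3.client("sagemaker-runtime")
#   resp = rt.invoke_endpoint(**build_invoke_request(
#       "[[ENDPOINT_NAME]]", {"instances": [[1.0, 2.0, 3.0]]}))
#   print(resp["Body"].read())  # expect HTTP 200 with the prediction payload
```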

7. Quality Gates

  • [ ] Multi-AZ: At least 2 instances for Prod.
  • [ ] Auto-Scaling: Policy defined (Scale out when CPU > 50%).
  • [ ] Monitoring: Data Capture (Model Monitor) enabled if model drift is a concern.

8. Failure Handling

Deployment Failed

  • Symptoms: Status Failed.
  • Recovery: Check CloudWatch Logs. Common error: "Container crashed on startup" (Missing dependency) or "Health check timeout" (Model takes > 60s to load).

Latency Spike

  • Symptoms: Inference latency spikes (e.g., > 5s per request).
  • Recovery: Use "SageMaker Inference Recommender" to pick the right instance size. Switch to GPU if Compute bound.

9. Paste Prompt

TIP

One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.

text
Role: Act as an AWS ML Engineer.
Task: Execute the SageMaker Deployment workflow.

## Objective
Deploy [[MODEL_DATA_URL]] to Endpoint [[ENDPOINT_NAME]].

## Inputs
- **Instance**: [[INSTANCE_TYPE]]

## Procedure
Execute the following phases:

1. **Model**:
   - Use PyTorchModel/SKLearnModel SDK.
   - Point to `model.tar.gz` in S3.
   - Define Entrypoint `inference.py`.

2. **Config**:
   - Create Endpoint Config.
   - Set Initial Instance Count = 1.

3. **Deploy**:
   - Call `.deploy()`.
   - Wait for `InService`.

## Quality Gates
- [ ] IAM Role has S3 Read access.
- [ ] Instance type matches model needs (GPU/CPU).
- [ ] Health Check passes.

## Constraints
- Output: Python (Boto3/SageMaker SDK) script.
- Region: current default.

## Command
Write the deployment script using the SageMaker Python SDK.
