Deploy to SageMaker
1. Purpose
Enterprise-grade serving. SageMaker handles the heavy lifting: OS patching, auto-scaling instances based on traffic, and multi-model hosting. Use this for production workloads requiring SLAs.
2. When to Use / When Not to Use
Use This Workflow When
- Moving from "Laptop Demo" to "Production Service".
- You need Auto-Scaling (scale in at night, scale out during Black Friday).
- Compliance requires AWS PrivateLink (No public internet access).
Do NOT Use This Workflow When
- Quick prototyping (Use Local Docker).
- Budget is $0 (SageMaker has markup over EC2).
- You are locked into Google/Azure (Use Vertex AI / Azure ML).
3. Inputs
Required Inputs
- [[MODEL_DATA_URL]]: S3 URI where `model.tar.gz` is stored.
- [[INSTANCE_TYPE]]: e.g., `ml.m5.xlarge` (CPU) or `ml.g4dn.xlarge` (GPU).
- [[ENDPOINT_NAME]]: Unique identifier.
4. Outputs
- HTTPS Endpoint: `https://runtime.sagemaker.../endpoints/[[ENDPOINT_NAME]]/invocations`.
- CloudWatch Logs: `/aws/sagemaker/Endpoints/[[ENDPOINT_NAME]]`.
5. Preconditions
- AWS CLI configured with `AmazonSageMakerFullAccess`.
- Docker image (ECR) if using a custom container, OR a standard framework container (PyTorch/Sklearn).
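Before touching SageMaker it is worth confirming the artifact URI is well-formed and reachable. A minimal sketch, assuming `boto3` is installed; `split_s3_uri` is an illustrative helper, and the AWS calls are shown commented because they need live credentials:

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split s3://bucket/key into (bucket, key) for a HeadObject check."""
    p = urlparse(uri)
    if p.scheme != "s3" or not p.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return p.netloc, p.path.lstrip("/")

# import boto3
# boto3.client("sts").get_caller_identity()        # confirms the CLI/role is configured
# bucket, key = split_s3_uri("[[MODEL_DATA_URL]]")
# boto3.client("s3").head_object(Bucket=bucket, Key=key)  # confirms the artifact is readable
```

A failed `head_object` here is far cheaper to debug than a `Failed` endpoint ten minutes into deployment.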
6. Procedure
Phase 1: Model Creation
- Action: Define Model Object.
- Expected Output: SageMaker Model resource linking the S3 Artifact + Container Image.
- Notes: Specify `execution_role_arn` with correct S3 permissions.
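At the Boto3 level, Phase 1 is a single `create_model` call. A hedged sketch; `build_model_request` is an illustrative helper and the ARNs/URIs are placeholders:

```python
def build_model_request(model_name, image_uri, model_data_url, role_arn):
    """Assemble the CreateModel request linking the S3 artifact to a container image."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # framework image or custom ECR image
            "ModelDataUrl": model_data_url,  # s3://.../model.tar.gz
        },
        "ExecutionRoleArn": role_arn,        # must allow s3:GetObject on the artifact
    }

# import boto3
# req = build_model_request("my-model", "<ecr-image-uri>",
#                           "[[MODEL_DATA_URL]]", "<execution-role-arn>")
# boto3.client("sagemaker").create_model(**req)
```

The higher-level SDK classes (`PyTorchModel`, `SKLearnModel`) wrap exactly this request and pick the framework image for you.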
Phase 2: Configuration
- Action: Create Endpoint Config.
- Expected Output: Defines "Production Variant".
- Notes:
- Instance Count: 1 (Dev) or >=2 (Prod HA).
- Weight: 1.0 (Traffic split).
- Action: Deploy.
- Expected Output: `create_endpoint()` accepted; status moves `Creating` -> `InService`.
- Notes: This takes 5-10 minutes.
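Phase 2 maps to `create_endpoint_config` plus `create_endpoint`. A minimal sketch with an illustrative builder helper; the variant defaults mirror the notes above (count 1 for dev, weight 1.0), and the live calls are commented since they require AWS credentials:

```python
def build_endpoint_config_request(config_name, model_name, instance_type,
                                  instance_count=1, variant_weight=1.0):
    """One production variant; raise instance_count to >=2 for prod HA."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,          # e.g. ml.m5.xlarge
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": variant_weight, # traffic split across variants
        }],
    }

# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(
#     **build_endpoint_config_request("my-cfg", "my-model", "[[INSTANCE_TYPE]]"))
# sm.create_endpoint(EndpointName="[[ENDPOINT_NAME]]", EndpointConfigName="my-cfg")
# sm.get_waiter("endpoint_in_service").wait(EndpointName="[[ENDPOINT_NAME]]")  # ~5-10 min
```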
Phase 3: Validation
- Action: Invoke Endpoint.
- Expected Output: 200 OK with prediction payload.
- Notes: Use `boto3.client('sagemaker-runtime').invoke_endpoint()`.
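The validation call can be sketched as below. The JSON payload shape (`{"instances": ...}`) is an assumption that depends on your `inference.py`; `build_invoke_request` is an illustrative helper:

```python
import json

def build_invoke_request(endpoint_name, features):
    """Serialize a JSON payload for invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"instances": features}),
    }

# import boto3
# rt = boto3.client("sagemaker-runtime")
# resp = rt.invoke_endpoint(**build_invoke_request("[[ENDPOINT_NAME]]", [[1.0, 2.0]]))
# print(resp["Body"].read())  # expect HTTP 200 with the prediction payload
```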
7. Quality Gates
- [ ] Multi-AZ: At least 2 instances for Prod.
- [ ] Auto-Scaling: Policy defined (Scale out when CPU > 50%).
- [ ] Monitoring: Data Capture (Model Monitor) enabled if model drift is a concern.
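The auto-scaling gate is satisfied via Application Auto Scaling. A hedged sketch using the predefined `SageMakerVariantInvocationsPerInstance` metric (a common alternative to the CPU trigger mentioned above; a CPU-based target requires a custom metric); `build_scaling_policy` and its defaults are illustrative:

```python
def build_scaling_policy(endpoint_name, variant_name="AllTraffic",
                         min_capacity=2, max_capacity=6, target_invocations=70.0):
    """Target-tracking scaling on invocations per instance for one variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,   # >=2 keeps the Multi-AZ gate satisfied
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return target, policy

# import boto3
# aas = boto3.client("application-autoscaling")
# tgt, pol = build_scaling_policy("[[ENDPOINT_NAME]]")
# aas.register_scalable_target(**tgt)
# aas.put_scaling_policy(**pol)
```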
8. Failure Handling
Deployment Failed
- Symptoms: Status `Failed`.
- Recovery: Check CloudWatch Logs. Common errors: "Container crashed on startup" (missing dependency) or "Health check timeout" (model takes > 60s to load).
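The recovery steps above can be scripted: `describe_endpoint` surfaces the `FailureReason`, and the container logs live in the log group listed under Outputs. A sketch with an illustrative helper; live calls commented:

```python
def log_group_for(endpoint_name):
    """CloudWatch log group holding container stdout/stderr for an endpoint."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"

# import boto3
# sm = boto3.client("sagemaker")
# desc = sm.describe_endpoint(EndpointName="[[ENDPOINT_NAME]]")
# print(desc["EndpointStatus"], desc.get("FailureReason", ""))
# logs = boto3.client("logs")
# for s in logs.describe_log_streams(
#         logGroupName=log_group_for("[[ENDPOINT_NAME]]"))["logStreams"]:
#     print(s["logStreamName"])  # one stream per instance/container
```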
Latency Spike
- Symptoms: Inference latency spikes (e.g., requests take 5s).
- Recovery: Use "SageMaker Inference Recommender" to pick the right instance size. Switch to GPU if compute-bound.
9. Paste Prompt
TIP: One-Click Agent Invocation
Copy the prompt below, replace placeholders, and paste into your agent.
Role: Act as an AWS ML Engineer.
Task: Execute the SageMaker Deployment workflow.
## Objective
Deploy [[MODEL_DATA_URL]] to Endpoint [[ENDPOINT_NAME]].
## Inputs
- **Instance**: [[INSTANCE_TYPE]]
## Procedure
Execute the following phases:
1. **Model**:
- Use PyTorchModel/SKLearnModel SDK.
- Point to `model.tar.gz` in S3.
- Define Entrypoint `inference.py`.
2. **Config**:
- Create Endpoint Config.
- Set Initial Instance Count = 1.
3. **Deploy**:
- Call `.deploy()`.
- Wait for `InService`.
## Quality Gates
- [ ] IAM Role has S3 Read access.
- [ ] Instance type matches model needs (GPU/CPU).
- [ ] Health Check passes.
## Constraints
- Output: Python (Boto3/SageMaker SDK) script.
- Region: current default.
## Command
Write the deployment script using the SageMaker Python SDK.