
Monitor Model Drift


1. Purpose

Rust never sleeps: models degrade because the world changes (COVID hits, user behavior shifts). Drift monitoring warns you before the business complains about bad predictions.


2. When to Use / When Not to Use

Use This Workflow When

  • Model has been in production for > 1 week.
  • Input distribution is volatile (e.g., E-commerce trends).
  • You want automated triggers for retraining.

Do NOT Use This Workflow When

  • You just deployed yesterday (too early; there is no stable production window yet).
  • The model is a static rule engine.
  • You get ground truth almost instantly (use Accuracy Monitoring instead of Drift).

3. Inputs

Required Inputs

  • [[BASELINE_DATA_PATH]]: Reference set (Training Data).
  • [[CURRENT_DATA_PATH]]: New Inference logs (Production Data).
  • [[DRIFT_THRESHOLD]]: e.g., 0.05 (p-value cutoff for the KS test).

4. Outputs

  • Drift Metrics: PSI (Population Stability Index), KL Divergence.
  • Visuals: Distributions of Feature A (Train) vs Feature A (Prod).
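
PSI, the first output metric above, can be computed directly from binned histograms. A minimal NumPy sketch (the bin count and the 1e-6 floor are illustrative choices, not fixed by this workflow); the common rule of thumb reads PSI < 0.1 as stable and PSI > 0.25 as a significant shift:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between a reference and a current 1-D sample."""
    # Bin edges from reference quantiles, so each reference bin holds ~equal mass
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    # Clip both samples into the reference range so outliers land in the edge bins
    ref_pct = np.histogram(np.clip(reference, edges[0], edges[-1]), bins=edges)[0] / len(reference)
    cur_pct = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)[0] / len(current)
    # Floor proportions to avoid log(0) on empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```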

5. Preconditions

  • evidently or alibi-detect installed.
  • Access to production logs.

6. Procedure

Phase 1: Data Prep

  1. Action: Load Datasets.

    • Expected Output: Two DataFrames: reference and current.
    • Notes: Ensure schemas match exactly. Handle missing columns in logs.
  2. Action: Select Features.

    • Expected Output: List of Numerical and Categorical columns to monitor.
    • Notes: Don't monitor IDs or timestamps (They always drift).
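
The two prep steps above reduce to a small helper: intersect the schemas, then drop the always-drifting identifier and time columns. A sketch (the names in `exclude` are placeholders for your own ID/timestamp fields):

```python
def select_monitored_columns(reference_cols, current_cols,
                             exclude=("user_id", "timestamp")):
    """Columns present in both datasets, minus IDs/timestamps that always drift."""
    shared = set(reference_cols) & set(current_cols)
    return sorted(c for c in shared if c not in exclude)
```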

Phase 2: Analysis

  1. Action: Calculate Data Drift (Covariate Shift).

    • Expected Output: Statistical tests (KS-Test, Chi-Square) per column.
    • Notes: "Is the input different?".
  2. Action: Calculate Concept Drift (Prediction Shift).

    • Expected Output: Check distribution of Predictions.
    • Notes: "Is the model saying 'Yes' more often than before?".
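
Phase 2's per-column covariate-shift check might look like the SciPy sketch below (the dict-of-arrays input shape is an assumption; Evidently's `DataDriftPreset` wraps similar per-column tests). The same function applied to a `{"prediction": ...}` dict covers the prediction-shift check:

```python
import numpy as np
from scipy.stats import ks_2samp

def drifted_columns(reference, current, threshold=0.05):
    """Flag numeric columns where the two-sample KS test rejects 'same distribution'.

    reference / current: dicts mapping column name -> 1-D NumPy array.
    threshold: p-value cutoff, i.e. [[DRIFT_THRESHOLD]].
    """
    flagged = []
    for col in reference:
        _stat, p_value = ks_2samp(reference[col], current[col])
        if p_value < threshold:  # distributions differ significantly
            flagged.append(col)
    return flagged
```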

Phase 3: Reporting

  1. Action: Generate Report.

    • Expected Output: HTML dashboard showing Red/Green per feature.
  2. Action: Alerting.

    • Expected Output: If % of drifted features > 20%, send Alert.
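
The alert rule above is a one-liner; a sketch (the 20% cutoff comes from this workflow's quality gates, tune it for your feature count):

```python
def should_alert(drift_flags, max_drift_share=0.2):
    """Fire an alert when the share of drifted features exceeds the cutoff.

    drift_flags: dict mapping column name -> bool (drift detected).
    """
    share = sum(drift_flags.values()) / len(drift_flags)
    return share > max_drift_share
```

Note the strict inequality: 1 drifted feature out of 5 is exactly 20% and stays quiet; 2 out of 5 alerts.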

7. Quality Gates

  • [ ] Sample Size: current batch has enough rows (>1000) for statistical significance.
  • [ ] Sensitivity: Threshold isn't so low it alerts every day (Noise).
  • [ ] Actionable: Alert links to the Report.

8. Failure Handling

False Alarm (Seasonality)

  • Symptoms: Drift detected every Monday morning.
  • Recovery: Compare Current Monday vs Last Monday (Reference), not Current Monday vs Random Training Sample.
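
The recovery idea, picking a like-for-like reference window, can be sketched as follows (the `(date, value)` log shape is an assumption for illustration):

```python
from datetime import date, timedelta

def weekday_matched_reference(logs, current_day, lag_weeks=1):
    """Reference rows from the same weekday `lag_weeks` earlier,
    so Monday traffic is compared against Monday traffic."""
    reference_day = current_day - timedelta(weeks=lag_weeks)
    return [value for day, value in logs if day == reference_day]
```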

Schema Mismatch

  • Symptoms: Column 'age' not found in current.
  • Recovery: This is an upstream ETL/logging failure, not drift. Fix logging upstream first; a valid drift check requires a valid schema.
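
A fail-fast guard before the analysis keeps a broken schema from masquerading as drift; a minimal sketch:

```python
def validate_schema(reference_cols, current_cols):
    """Raise before drift analysis if production logs dropped expected columns."""
    missing = sorted(set(reference_cols) - set(current_cols))
    if missing:
        raise ValueError(f"Schema mismatch, fix logging upstream: {missing}")
```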

9. Paste Prompt

TIP

One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.

```text
Role: Act as an MLOps Engineer.
Task: Execute the Drift Monitoring workflow.

## Objective
Monitor drift for [[CURRENT_DATA_PATH]] against [[BASELINE_DATA_PATH]].

## Inputs
- **Threshold**: [[DRIFT_THRESHOLD]]

## Procedure
Execute the following phases:

1. **Load**:
   - Read Reference (Train) and Current (Prod).
   - Align columns.

2. **Detect**:
   - Use `Evidently` library.
   - Run `DataDriftPreset`.
   - Check PSI/KS-Test.

3. **Report**:
   - Save HTML report.
   - Return JSON summary of failed features.

## Quality Gates
- [ ] Exclude ID columns.
- [ ] Alert if > 20% features drifted.
- [ ] Check Target distribution (Concept Drift).

## Constraints
- Output: Python Script.
- Library: `evidently`.

## Command
Generate the drift report code.
```
