Dependency Cleanup (Tech Debt)
1. Objective
The objective of this workflow is to reduce the attack surface and "Software Bloat". AI/ML projects are notorious for running pip install on everything (Pandas, Numpy, Scipy, Matplotlib, Jupyter) and never removing any of it. Optimized Docker images require minimal dependencies. This workflow enforces hygiene.
2. Context & Scope
In Scope
This workflow covers auditing Python dependencies (unused packages), removing large unused model weights, deprecating old Feature Flags, and scanning for vulnerabilities.
Assumption: You have a requirements.txt, pyproject.toml, or poetry.lock.
Out of Scope
- Code Refactoring: We are removing Dependencies, not rewriting functions (See "Code Revamp").
- OS Updates: Upgrading Ubuntu/Alpine is an Infra task.
3. When to Use / When Not to Use
✅ Use This Workflow When
- Your Docker build takes > 10 minutes.
- You get a Security Alert (CVE) for a library you don't think you use.
- You are preparing a repository for handover/archival.
❌ Do NOT Use This Workflow When
- You are in the middle of a tight deadline (Risk of breaking the build).
- You are experimenting (Let the clutter exist until dev is done).
4. Inputs (Required/Optional)
Required Inputs
| Input | Description | Format | Example |
|---|---|---|---|
| REPO_PATH | Project root. | Path | ./ |
| ENVIRONMENT_FILE | Dep list. | File | requirements.txt |
Optional Inputs
| Input | Description | Default | Condition |
|---|---|---|---|
| AGGRESSIVE | Remove dev tools? | False | True for Production builds. |
5. Outputs (Artifacts)
| Artifact | Format | Destination | Quality Criteria |
|---|---|---|---|
| Clean Requirements | Text | Repo | Only imported libs listed. |
| Audit Log | Markdown | Issue | Vulnerabilities fixed. |
6. Operating Modes
⚡ Fast Mode
- Timebox: 30 minutes
- Scope: Vulnerability Scan.
- Details: Running pip-audit or safety check to find Critical CVEs and upgrading those specific packages.
🎯 Standard Mode (Default)
- Timebox: 2 hours
- Scope: Unused Dependency Removal.
- Details: Using deptry or pip-check-reqs to find packages in requirements.txt that are never imported in the src/ code, uninstalling them, and re-freezing the environment.
🔬 Deep Mode
- Timebox: 1 day
- Scope: Artifact Purge.
- Details: Scanning S3/Artifactory for old model versions (e.g., v1.0 from 2 years ago) that are no longer serving traffic. Deleting them to save storage costs.
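The Deep Mode selection rule (old and no longer serving traffic) can be sketched as a pure filter. This is a minimal illustration, not a storage client: the ModelArtifact record, the purge_candidates helper, and the 365-day default are all hypothetical, and listing or deleting objects in S3/Artifactory is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ModelArtifact:
    """Hypothetical record describing one stored model version."""
    version: str
    last_modified: datetime
    serving: bool  # True if this version still receives traffic


def purge_candidates(artifacts, now, max_age_days=365):
    """Return versions that are both older than the cutoff and not serving."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        a.version
        for a in artifacts
        if not a.serving and a.last_modified < cutoff
    ]
```

Treat the result as a review list rather than a delete list; a version that serves no traffic may still be needed for audits or rollbacks.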
7. Constraints & Guardrails
Technical Constraints
- Transitive Deps: Don't just remove numpy if pandas needs it. Use tools that understand the dependency tree (pipdeptree).
- Dynamic Imports: If your code does importlib.import_module("foo"), static analysis tools won't see "foo" as used. Manually verify.
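To support the manual check for dynamic imports, the sketch below walks a file's syntax tree and lists calls to import_module or __import__. It is a rough aid under stated assumptions: the helper name find_dynamic_imports is made up for this example, and it only recognizes those two call names, so imports hidden behind exec() or other wrappers will still be missed.

```python
import ast


def find_dynamic_imports(source: str) -> list[str]:
    """List module names passed to import_module(...) or __import__(...).

    String literals are returned as-is; anything computed at runtime is
    reported with a placeholder so a human knows to look at it.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        # Handles both `importlib.import_module(...)` and bare names.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name not in {"import_module", "__import__"}:
            continue
        first = node.args[0]
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            found.append(first.value)
        else:
            found.append("<computed at runtime>")
    return found


sample = (
    "import importlib\n"
    'mod = importlib.import_module("foo")\n'
    "other = __import__(name)\n"
)
print(find_dynamic_imports(sample))
```

Any module this reports should be kept in the requirements even if a static usage scanner marks it as unused.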
Security & Privacy
CAUTION: Supply Chain Attacks
Typosquatting (installing pandda instead of pandas) is real. Review the list for unfamiliar or misspelled names.
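One way to assist the manual review is to flag declared names that are close to, but not the same as, a package you expect. The sketch below uses difflib from the standard library. The KNOWN_PACKAGES list, the suspicious_names helper, and the 0.8 similarity cutoff are illustrative assumptions, not a vetted policy.

```python
import difflib

# Illustrative allow-list; a real one would come from your own policy.
KNOWN_PACKAGES = [
    "pandas", "numpy", "scipy", "matplotlib", "requests", "torch", "fastapi",
]


def suspicious_names(declared, known=KNOWN_PACKAGES, cutoff=0.8):
    """Map each near-miss name to the known package it resembles."""
    flagged = {}
    for name in declared:
        lowered = name.lower()
        if lowered in known:
            continue  # exact match, nothing to flag
        close = difflib.get_close_matches(lowered, known, n=1, cutoff=cutoff)
        if close:
            flagged[name] = close[0]
    return flagged


print(suspicious_names(["pandda", "numpy", "flask"]))
```

A hit is only a prompt to look closer; legitimate packages can have similar names, and a malicious one can have a name unlike anything on the list.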
Compliance
- License Check: Ensure you haven't accidentally upgraded a library to a version with a viral license (GPL) if your policy forbids it.
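A quick first pass over installed packages can be sketched with importlib.metadata. This is an assumption-laden heuristic: the helper names and the marker strings are made up for this example, license metadata is free text and often missing, and the match also catches LGPL, which your policy may treat differently.

```python
from importlib import metadata

# Substrings that suggest a GPL-family license (also matches LGPL/AGPL).
COPYLEFT_MARKERS = ("gpl", "general public license")


def looks_copyleft(license_field, classifiers):
    """Heuristic check over the License field and trove classifiers."""
    haystack = " ".join([license_field or "", *classifiers]).lower()
    return any(marker in haystack for marker in COPYLEFT_MARKERS)


def copyleft_distributions():
    """Names of installed distributions whose metadata looks GPL-family."""
    flagged = []
    for dist in metadata.distributions():
        meta = dist.metadata
        classifiers = meta.get_all("Classifier") or []
        if looks_copyleft(meta.get("License"), classifiers):
            flagged.append(meta.get("Name") or "<unknown>")
    return sorted(flagged)
```

Run copyleft_distributions() before and after an upgrade and compare the two lists; anything new deserves a manual license review.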
8. Procedure
Phase 1: Audit
Objective: Know what you have.
- Generate tree: pipdeptree > tree.txt.
- Check usage: deptry . (for Poetry/pyproject.toml) OR pip-extra-reqs src/ (for requirements.txt).
- Output: "Scipy is in requirements but not imported."
Verify: List of candidate removals.
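To show what the usage check does conceptually, here is a much-simplified sketch that compares declared names against top-level imports. It is not how deptry or pip-check-reqs are implemented, and the function names are hypothetical. It also assumes the distribution name equals the import name, which fails for cases such as scikit-learn (imported as sklearn).

```python
import ast
import re


def imported_top_level(source: str) -> set[str]:
    """Collect top-level module names from absolute import statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def declared_names(requirements_text: str) -> set[str]:
    """Extract package names from requirements.txt-style text."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def removal_candidates(requirements_text: str, sources: list[str]) -> set[str]:
    """Declared packages that no source file imports by the same name."""
    used = set()
    for source in sources:
        used |= {name.lower() for name in imported_top_level(source)}
    return declared_names(requirements_text) - used
```

For example, with requirements listing pandas and scipy and a single source file that only imports pandas, removal_candidates returns a set containing scipy.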
Phase 2: Prune
Objective: Delete.
- Remove unused lines from requirements.txt.
- Uninstall: pip uninstall -r to_remove.txt.
- Crucial Step: Run the tests (pytest). Why: Sometimes a package acts as a plugin and is loaded implicitly, without an explicit import in your code.
Verify: Tests pass without the package.
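Removing the unused lines can be scripted. The sketch below filters requirements.txt-style text by package name, comparing names in normalized form (case-insensitive, with runs of "-", "_" and "." treated as equal). The prune_requirements helper is hypothetical and handles only simple one-requirement-per-line files.

```python
import re


def _normalize(name: str) -> str:
    """Normalize a package name for comparison."""
    return re.sub(r"[-_.]+", "-", name).lower()


def prune_requirements(requirements_text: str, to_remove: set[str]) -> str:
    """Return the requirements text without lines for the given packages.

    Comments, blank lines, and pip options (lines starting with "-")
    are kept untouched.
    """
    targets = {_normalize(name) for name in to_remove}
    kept = []
    for line in requirements_text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
        if match and _normalize(match.group(1)) in targets:
            continue  # drop this requirement
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Write the result to a new file and diff it against the original before committing, then run the test suite as described above.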
Phase 3: Vulnerability Fix
Objective: Secure.
Run pip-audit. If a CVE is found:
- Can we upgrade? pip install package==newer.
- Can we remove? (If identified as unused in Phase 1).
- Can we suppress? (Only if it's a False Positive or Dev-only tool).
Verify: Audit returns "No known vulnerabilities".
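The three questions above form a small decision ladder, sketched here as a pure function. The Finding record and its fields are hypothetical and do not mirror pip-audit's output format; the final "escalate" branch is an added assumption for findings that match none of the three options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical summary of one vulnerable package."""
    package: str
    fixed_version: Optional[str]  # None when no patched release exists
    is_unused: bool               # flagged as unused in Phase 1
    is_dev_only: bool
    is_false_positive: bool = False


def triage(finding: Finding) -> str:
    """Apply the upgrade / remove / suppress questions in order."""
    if finding.fixed_version:
        return f"upgrade {finding.package} to {finding.fixed_version}"
    if finding.is_unused:
        return f"remove {finding.package}"
    if finding.is_false_positive or finding.is_dev_only:
        return f"suppress {finding.package} (record the reason)"
    return f"escalate {finding.package}"
```

Keeping suppression last and requiring a recorded reason makes it harder for a real vulnerability to be waved through.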
9. Technical Considerations
Dev vs Prod: Keep requirements-dev.txt (pytest, black, jupyter) separate from requirements.txt (torch, fastapi). Production images should NEVER contain dev dependencies.
Pinning: Always pin versions (==1.0.0), and use pip-compile (pip-tools) to resolve a valid combination of versions and hashes.
Large Files: Check for .h5 or .pt files committed to Git. Use git filter-repo to scrub them. This rewrites history, so coordinate with everyone who has a clone.
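A first pass at the large-file check can be sketched with pathlib. Note the limits: this hypothetical helper scans only the working tree, not Git history, and the 10 MiB default threshold is an arbitrary assumption.

```python
from pathlib import Path

MODEL_SUFFIXES = {".h5", ".pt"}


def find_large_model_files(root, min_bytes=10 * 1024 * 1024):
    """Return (path, size) pairs for model files at or above min_bytes,
    largest first. Anything under a .git directory is skipped."""
    hits = []
    for path in Path(root).rglob("*"):
        if ".git" in path.parts:
            continue
        if path.is_file() and path.suffix in MODEL_SUFFIXES:
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((str(path), size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Files found this way are candidates for external storage; whether they also exist in older commits still has to be checked against the repository history.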
10. Quality Gates (Definition of Done)
Checklist
- [ ] Unused packages removed.
- [ ] No Critical CVEs.
- [ ] Tests pass (Regression check).
- [ ] Docker image size reduced (optional).
Validation
| Criterion | Method | Threshold |
|---|---|---|
| Hygiene | deptry report | 0 unused, 0 missing |
| Security | CVSS Score | < 7.0 (No High/Critical) |
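The two thresholds in the table can be checked mechanically once the counts and scores are collected. The failed_gates helper below is a hypothetical sketch; how the inputs are gathered from the tools' reports is left to the reader.

```python
def failed_gates(unused, missing, cvss_scores, max_cvss=7.0):
    """Return a description of each gate that fails (empty list = pass)."""
    failures = []
    if unused:
        failures.append(f"{unused} unused dependencies")
    if missing:
        failures.append(f"{missing} missing dependencies")
    high = [score for score in cvss_scores if score >= max_cvss]
    if high:
        failures.append(f"{len(high)} findings with CVSS >= {max_cvss}")
    return failures
```

An empty list means both the Hygiene and Security criteria are met.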
11. Failure Modes & Recovery
| Failure Mode | Symptoms | Recovery Action |
|---|---|---|
| Hidden Dependency | Code fails at runtime with ImportError. | The library was used via exec() or importlib. Add it back with a comment # Required for dynamic import. |
| Version Conflict | pip can't resolve the dependency graph. | You pinned everything too strictly. Loosen pins (>=1.0 instead of ==1.0.1) and re-compile. |
| Test Gap | Tests pass, but Prod fails. | Your test suite didn't cover the code path that used the removed library. Improve coverage. |
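A cheap guard against the Hidden Dependency failure is a smoke check, run in the freshly built environment, that every module the application is known to need can still be located. The missing_modules helper and the idea of maintaining such a list by hand are assumptions of this sketch.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module is malformed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


print(missing_modules(["json", "surely_not_installed_pkg_xyz"]))
```

This only confirms that a module can be found, not that it imports cleanly, so it complements the test suite rather than replacing it.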
12. Copy-Paste Prompt
TIP: One-Click Agent Invocation
Copy the prompt below, replace placeholders, and paste into your agent.
Role: Act as a Senior ML Ops Engineer.
Task: Execute the Dependency Cleanup (Tech Debt) workflow.
## Objective & Scope
- **Goal**: Clean up unused dependencies and fix security vulnerabilities (CVEs).
- **Scope**: Auditing usage, Pruning requirements, fixing CVEs, and removing old artifacts.
## Inputs
- [ ] REPO_PATH: Path to project root.
- [ ] ENVIRONMENT_FILE: Dependencies list (requirements.txt/poetry.lock).
- [ ] AGGRESSIVE: Boolean to remove dev tools in prod.
## Output Artifacts
- [ ] Cleaned Requirements File (Text)
- [ ] Audit Report (Markdown)
## Execution Steps
1. **Scan**
- Audit usage with `pip-check-reqs`/`deptry` and scan vulnerabilities with `pip-audit`.
2. **Prune**
- Remove unused libraries. Upgrade vulnerable packages. Scrub old models/files.
3. **Verify**
- Re-install clean environment. Run full regression tests (pytest).
## Quality Gates
- [ ] No unused packages in requirements.
- [ ] No Critical CVEs present.
- [ ] Regression tests pass.
- [ ] Docker build succeeds.
## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.
## Constraints
- **Security**: Verify Supply Chain (typosquatting).
- **Compliance**: Check License compatibility.
## Command
Now execute this workflow step-by-step.
Appendix: Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |