Dependency Cleanup (Tech Debt)


1. Objective

The objective of this workflow is to reduce the attack surface and software bloat. AI/ML projects are notorious for pip-installing everything (pandas, NumPy, SciPy, Matplotlib, Jupyter) and never removing anything. Lean Docker images require minimal dependencies, and this workflow enforces that hygiene.


2. Context & Scope

In Scope

This workflow covers auditing Python dependencies (unused packages), removing large unused model weights, deprecating old Feature Flags, and scanning for vulnerabilities.

Assumption: You have a requirements.txt, pyproject.toml, or poetry.lock.

Out of Scope

  • Code Refactoring: We are removing Dependencies, not rewriting functions (See "Code Revamp").
  • OS Updates: Upgrading Ubuntu/Alpine is an Infra task.

3. When to Use / When Not to Use

Use This Workflow When

  • Your Docker build takes > 10 minutes.
  • You get a Security Alert (CVE) for a library you don't think you use.
  • You are preparing a repository for handover/archival.

Do NOT Use This Workflow When

  • You are in the middle of a tight deadline (Risk of breaking the build).
  • You are experimenting (Let the clutter exist until dev is done).

4. Inputs (Required/Optional)

Required Inputs

| Input | Description | Format | Example |
| --- | --- | --- | --- |
| REPO_PATH | Project root. | Path | ./ |
| ENVIRONMENT_FILE | Dep list. | File | requirements.txt |

Optional Inputs

| Input | Description | Default | Condition |
| --- | --- | --- | --- |
| AGGRESSIVE | Remove dev tools? | False | True for Production builds. |

5. Outputs (Artifacts)

| Artifact | Format | Destination | Quality Criteria |
| --- | --- | --- | --- |
| Clean Requirements | Text | Repo | Only imported libs listed. |
| Audit Log | Markdown | Issue | Vulnerabilities fixed. |

6. Operating Modes

Fast Mode

Timebox: 30 minutes
Scope: Vulnerability Scan.
Details: Run pip-audit or safety check to find Critical CVEs, then upgrade those specific packages.

🎯 Standard Mode (Default)

Timebox: 2 hours
Scope: Unused Dependency Removal.
Details: Use deptry or pip-check-reqs to find packages in requirements.txt that are never imported in the src/ code. Uninstall them, then re-freeze the environment.

🔬 Deep Mode

Timebox: 1 day
Scope: Artifact Purge.
Details: Scan S3/Artifactory for old model versions (e.g., v1.0 from 2 years ago) that are no longer serving traffic, and delete them to save storage costs.
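
The selection step in Deep Mode can be sketched as a pure function that runs before anything is deleted. This is a minimal sketch, not a storage client: the record fields (`key`, `version`, `last_modified`), the function name, and the 730-day default are assumptions for illustration. Listing objects from S3 or Artifactory is left to whatever client you already use.

```python
from datetime import datetime, timedelta, timezone


def purge_candidates(artifacts, serving_versions, now, max_age_days=730):
    """Return keys of artifacts that are both stale and not serving traffic.

    `artifacts` is a list of dicts with hypothetical fields:
    key (str), version (str), last_modified (timezone-aware datetime).
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        item["key"]
        for item in artifacts
        if item["version"] not in serving_versions
        and item["last_modified"] < cutoff
    ]
```

Treat the result as a dry-run list to review with the model owners. Deleting is a separate, deliberate step.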


7. Constraints & Guardrails

Technical Constraints

  • Transitive Deps: Don't just remove numpy if pandas needs it. Use tools that understand the dependency tree (pipdeptree).
  • Dynamic Imports: If your code does importlib.import_module("foo"), static analysis tools won't see "foo" as used. Manually verify.
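
To support the manual check for dynamic imports, the standard-library `ast` module can list the call sites that static dependency tools tend to miss. This is a minimal sketch under the assumption that dynamic imports go through `importlib.import_module` or `__import__`; it will not catch imports hidden inside `exec()` strings. The function name is hypothetical.

```python
import ast


def find_dynamic_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, target) for each dynamic import call in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches importlib.import_module(...) and a bare import_module(...)
        is_import_module = (
            isinstance(func, ast.Attribute) and func.attr == "import_module"
        ) or (isinstance(func, ast.Name) and func.id == "import_module")
        is_dunder_import = isinstance(func, ast.Name) and func.id == "__import__"
        if not (is_import_module or is_dunder_import):
            continue
        arg = node.args[0] if node.args else None
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            target = arg.value  # literal module name, easy to verify
        else:
            target = "<computed at runtime>"  # needs a human to trace it
        hits.append((node.lineno, target))
    return sorted(hits)
```

Any package that appears as a target here should stay out of the removal list until someone confirms it is no longer needed.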

Security & Privacy

CAUTION

Supply Chain Attacks: Typosquatting (for example, installing pandda instead of pandas) is a real threat. Review the dependency list for unfamiliar or misspelled names.
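
The review for suspicious names can be partly automated with a similarity check. The sketch below uses `difflib` from the standard library. The `KNOWN_PACKAGES` list, the function name, and the 0.8 cutoff are assumptions for illustration, and a similarity score is only a hint, not proof of a malicious package.

```python
import difflib

# Hypothetical allowlist; replace with the packages your team actually uses.
KNOWN_PACKAGES = ["pandas", "numpy", "scipy", "matplotlib", "requests", "torch"]


def suspicious_names(declared, known=KNOWN_PACKAGES, cutoff=0.8):
    """Map each declared name that resembles, but does not equal, a known name."""
    flagged = {}
    for name in declared:
        lowered = name.lower()
        if lowered in known:
            continue  # exact match with a trusted name
        close = difflib.get_close_matches(lowered, known, n=1, cutoff=cutoff)
        if close:
            flagged[name] = close[0]
    return flagged
```

For example, `suspicious_names(["pandda", "numpy", "fastapi"])` flags `pandda` as a near match for `pandas` and leaves the other two alone.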

Compliance

  • License Check: Ensure you haven't accidentally upgraded a library to a version with a viral license (GPL) if your policy forbids it.

8. Procedure

Phase 1: Audit

Objective: Know what you have.

1. Generate tree: pipdeptree > tree.txt.
2. Check usage: deptry . (for Poetry/pyproject.toml projects) OR pip-extra-reqs src/ (for requirements.txt; the command ships with pip-check-reqs).
3. Output: "Scipy is in requirements but not imported."

Verify: List of candidate removals.
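
To make the idea behind the usage check concrete, here is a minimal sketch that compares declared packages with imported top-level modules. It is not how deptry or pip-extra-reqs are implemented, and it assumes the distribution name equals the import name, which is false for packages such as scikit-learn (imported as `sklearn`). All function names are hypothetical.

```python
import ast
import re


def imported_top_level(source: str) -> set[str]:
    """Collect top-level module names from absolute import statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def declared_packages(requirements_text: str) -> set[str]:
    """Collect package names from requirements.txt-style text."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def unused_candidates(requirements_text: str, sources: list[str]) -> list[str]:
    """Declared packages that no source file imports (candidates only)."""
    imported = set()
    for source in sources:
        imported |= {name.lower() for name in imported_top_level(source)}
    return sorted(declared_packages(requirements_text) - imported)
```

The output is a list of candidates, not a verdict. Plugins, dynamic imports, and name mismatches all produce false positives, which is why Phase 2 reruns the tests.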

Phase 2: Prune

Objective: Delete.

1. Remove the unused lines from requirements.txt.
2. Uninstall: pip uninstall -r to_remove.txt.
3. Crucial step: run the tests with pytest. Why: some packages act as plugins and are loaded automatically (for example, pytest plugins discovered through entry points) without any explicit import in your code.

Verify: Tests pass without the package.
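
Editing requirements.txt by hand is error-prone when the removal list is long. A minimal sketch of the pruning step follows, assuming one requirement per line and treating `-`, `_`, and `.` as equivalent in package names. The function name is hypothetical.

```python
import re


def _normalize(name: str) -> str:
    """Lowercase a package name and collapse runs of -, _ and . into one dash."""
    return re.sub(r"[-_.]+", "-", name).lower()


def prune_requirements(requirements_text: str, to_remove: set[str]) -> str:
    """Return the requirements text without lines that declare removed packages."""
    removed = {_normalize(name) for name in to_remove}
    kept = []
    for line in requirements_text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
        if match and _normalize(match.group(1)) in removed:
            continue  # drop this requirement
        kept.append(line)  # comments, options and other packages stay
    return "\n".join(kept) + "\n"
```

Write the result to a new file or review the diff before committing, so the change is easy to revert if the tests fail.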

Phase 3: Vulnerability Fix

Objective: Secure.

Run pip-audit. If a CVE is found, ask:

  • Can we upgrade? pip install "package==<fixed version>".
  • Can we remove? (If identified as unused in Phase 1).
  • Can we suppress? (Only if it's a False Positive or Dev-only tool).

Verify: pip-audit reports "No known vulnerabilities found".
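
The triage questions above can be sketched as a small function over pip-audit's JSON output (`pip-audit --format json`). The report shape used here, a top-level `dependencies` list whose entries carry `name` and a `vulns` list with `fix_versions`, is an assumption; check it against the output of your installed pip-audit version. The function name and action strings are hypothetical.

```python
import json


def triage(report_json: str, unused: set[str]) -> dict[str, str]:
    """Suggest one action per vulnerable package from an audit report."""
    report = json.loads(report_json)
    actions = {}
    for dep in report.get("dependencies", []):
        vulns = dep.get("vulns", [])
        if not vulns:
            continue  # nothing reported for this package
        name = dep["name"]
        fixes = sorted({v for vuln in vulns for v in vuln.get("fix_versions", [])})
        if name.lower() in unused:
            actions[name] = "remove (unused in Phase 1)"
        elif fixes:
            actions[name] = "upgrade to one of: " + ", ".join(fixes)
        else:
            actions[name] = "no fix listed; review manually"
    return actions
```

This sketch checks removal before upgrade, on the reasoning that an unused package does not need a newer version. Suppression stays a manual decision and is never suggested automatically.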


9. Technical Considerations

Dev vs Prod: Keep a separate requirements-dev.txt (pytest, black, jupyter) alongside requirements.txt (torch, fastapi). Production images should NEVER contain dev dependencies.

Pinning: Always pin versions (==1.0.0), and use pip-compile (from pip-tools) to resolve a mutually compatible set of pinned versions, optionally with hashes.
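
A quick way to check the pinning rule is to list every requirement line that lacks an `==` pin. This is a minimal sketch with a deliberately simple regular expression. It does not parse the full requirement specifier grammar, and the function name is hypothetical.

```python
import re

# Package name, optional extras, then an == version specifier.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_lines(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned with ==."""
    loose = []
    for line in requirements_text.splitlines():
        stripped = line.split("#", 1)[0].strip()  # drop comments
        if not stripped or stripped.startswith("-"):  # skip options and hashes
            continue
        if not PINNED.match(stripped):
            loose.append(stripped)
    return loose
```

An empty result means every listed package carries an exact version. It says nothing about whether those versions are compatible with each other, which is the resolver's job.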

Large Files: Check for .h5 or .pt files committed to Git. Use git filter-repo to scrub them from history. This rewrites history, so coordinate with collaborators before force-pushing.
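
Before rewriting history, it helps to know which weight files are present in the working tree. The sketch below walks a directory with `pathlib` and reports large .h5 and .pt files. It only sees the current checkout, not files that exist solely in older commits, and the 10 MiB default threshold and function name are assumptions.

```python
from pathlib import Path


def large_model_files(root, suffixes=(".h5", ".pt"), min_bytes=10 * 1024 * 1024):
    """Return (path, size in bytes) for large model files, biggest first."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue  # skip Git's internal object store
        if path.is_file() and path.suffix in suffixes:
            size = path.stat().st_size
            if size >= min_bytes:
                found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `large_model_files(".")` from the project root gives a starting list of paths to move to artifact storage.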


10. Quality Gates (Definition of Done)

Checklist

  • [ ] Unused packages removed.
  • [ ] No Critical CVEs.
  • [ ] Tests pass (Regression check).
  • [ ] Docker image size reduced (optional).

Validation

| Criterion | Method | Threshold |
| --- | --- | --- |
| Hygiene | Deptry Score | 0 unused, 0 missing |
| Security | CVSS Score | < 7.0 (No High/Critical) |

11. Failure Modes & Recovery

| Failure Mode | Symptoms | Recovery Action |
| --- | --- | --- |
| Hidden Dependency | Code fails at runtime with ImportError. | The library was used via exec() or importlib. Add it back with a comment: # Required for dynamic import. |
| Version Conflict | Pip can't resolve the dependency graph. | You pinned everything too strictly. Loosen the constraints in the pip-compile input file (>=1.0 instead of ==1.0.1) and re-compile. |
| Test Gap | Tests pass, but Prod fails. | Your test suite didn't cover the code path that used the removed library. Improve coverage. |

12. Copy-Paste Prompt

TIP

One-Click Agent Invocation: Copy the prompt below, replace placeholders, and paste into your agent.

Role: Act as a Senior ML Ops Engineer.
Task: Execute the Dependency Cleanup (Tech Debt) workflow.

## Objective & Scope
- **Goal**: Clean up unused dependencies and fix security vulnerabilities (CVEs).
- **Scope**: Auditing usage, Pruning requirements, fixing CVEs, and removing old artifacts.

## Inputs
- [ ] REPO_PATH: Path to project root.
- [ ] ENVIRONMENT_FILE: Dependencies list (requirements.txt/poetry.lock).
- [ ] AGGRESSIVE: Boolean to remove dev tools in prod.

## Output Artifacts
- [ ] Cleaned Requirements File (Text)
- [ ] Audit Report (Markdown)

## Execution Steps
1. **Scan**
   - Audit usage with `pip-check-reqs`/`deptry` and scan vulnerabilities with `pip-audit`.
2. **Prune**
   - Remove unused libraries. Upgrade vulnerable packages. Scrub old models/files.
3. **Verify**
   - Re-install clean environment. Run full regression tests (pytest).

## Quality Gates
- [ ] No unused packages in requirements.
- [ ] No Critical CVEs present.
- [ ] Regression tests pass.
- [ ] Docker build succeeds.

## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.

## Constraints
- **Security**: Verify Supply Chain (typosquatting).
- **Compliance**: Check License compatibility.

## Command
Now execute this workflow step-by-step.

Appendix: Change Log

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |
