
Code Revamp (Optimization)


1. Objective

The objective of this workflow is to elevate "working" code to "high-performance" code. In ML, a slow loop can cost days of training time or thousands of dollars in inference. This workflow focuses on performance optimization (vectorization, caching) and strict enforcement of standards (Mypy strict mode, Ruff).


2. Context & Scope

In Scope

This workflow covers Static Analysis (Ruff/Mypy), Vectorization (replacing for loops with numpy/torch), Memory Profiling, and Dependency tightening.

Assumption: The code is already modular (functions/classes exist). If it is a messy notebook, use "ML Code Revamp (Refactor)" first.

Out of Scope

  • Algorithm Redesign: We are not changing the math, just the implementation efficiency.
  • Hardware Upgrade: We optimize for the current hardware.

3. When to Use / When Not to Use

Use This Workflow When

  • Your data loader is the bottleneck (GPU utilization < 50%).
  • Inference latency is too high for production SLA.
  • You are releasing an open-source library and need professional quality.

Do NOT Use This Workflow When

  • The code runs in 5 seconds and runs once a month. (Premature Optimization).
  • You are debugging logic errors. Fix bugs before optimizing speed.

4. Inputs (Required/Optional)

Required Inputs

| Input | Description | Format | Example |
| --- | --- | --- | --- |
| REPO_PATH | Code location. | Path | ./src/my_model |

Optional Inputs

| Input | Description | Format | Default |
| --- | --- | --- | --- |
| TARGET_LATENCY | Latency goal. | Float | 100ms |

5. Outputs (Artifacts)

| Artifact | Format | Destination | Quality Criteria |
| --- | --- | --- | --- |
| Optimized Code | Source | Repo | Passes strict checking. |
| Profile Report | HTML/Text | Artifacts | Shows 'Before' vs 'After'. |

6. Operating Modes

Fast Mode

Timebox: 1 hour. Scope: Linter sweep. Details: Run `ruff check --select ALL --fix` and `black`; clean unused imports, fix formatting, and apply simple rewrites (e.g., f-strings).

🎯 Standard Mode (Default)

Timebox: 4 hours. Scope: Vectorization & typing. Details: Identify explicit Python loops and replace them with pandas/NumPy vector operations. Add type hints to 100% of signatures.

🔬 Deep Mode

Timebox: 2 days. Scope: Cython/JIT. Details: Use `numba.jit` or write Cython extensions for the absolute critical path; analyze memory leaks with memray.


7. Constraints & Guardrails

Technical Constraints

  • Maintainability vs Speed: Don't write obscure "bit-hacking" NumPy code unless absolutely necessary; when you do, comment heavily.
  • Compatibility: Ensure optimized code works on all supported platforms (Windows/Linux/macOS). Numba/Cython can be tricky with wheels.

Security & Privacy

CAUTION

Safe Unpickling: If modernizing data loading, switch from pickle to safetensors or JSON where possible. Unpickling untrusted data can execute arbitrary code.
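As a minimal sketch of the swap, using the stdlib `json` module (safetensors needs a third-party install; `save_config`/`load_config` are illustrative names, not part of this workflow's tooling):

```python
import json


def save_config(config: dict[str, float], path: str) -> None:
    """Serialize with JSON instead of pickle: loading never executes code."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f)


def load_config(path: str) -> dict[str, float]:
    """Read the config back; only plain data types can come out."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Unlike `pickle.load`, `json.load` can only produce dicts, lists, strings, and numbers, so a tampered file cannot run code on load.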

Compliance

  • Dependencies: Do not introduce heavy dependencies (like pandas) into a lightweight inference library if numpy suffices.

8. Procedure

Phase 1: Profiling (Baseline)

Objective: Find the hotspot.

Instrument the code.

  • CPU: Use cProfile or py-spy, e.g. py-spy record -o profile.svg --pid 12345.
  • Memory: Use memray.

Identify the top 3 functions consuming time/RAM. Note: often the bottleneck is I/O (loading files) or data copying, not the math.

Verify: A Flame Graph showing where time is spent.
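For an in-process baseline, the stdlib `cProfile` is enough to surface hotspots. A minimal sketch (`slow_feature_sum` and `profile_hotspot` are hypothetical names for illustration):

```python
import cProfile
import io
import pstats


def slow_feature_sum(rows: list[list[float]]) -> list[float]:
    """Toy hotspot: sums each row with an explicit Python loop."""
    totals = []
    for row in rows:
        total = 0.0
        for value in row:
            total += value
        totals.append(total)
    return totals


def profile_hotspot() -> str:
    """Profile the toy function and return the top entries as text."""
    rows = [[float(i + j) for j in range(100)] for i in range(1000)]
    profiler = cProfile.Profile()
    profiler.enable()
    slow_feature_sum(rows)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return buffer.getvalue()


report = profile_hotspot()
```

The printed stats name the top functions by cumulative time; py-spy's flame graph gives the same information visually, and without modifying the code.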

Phase 2: Vectorization & Optimization

Objective: Hardware acceleration.

Refactor the Hotspots:

  • Loop Removal: Replace [x * 2 for x in values] with arr * 2 on a NumPy array.
  • Pre-allocation: Don't append to lists in a loop; allocate np.zeros(N) and fill it.
  • Parallelism: Use joblib.Parallel or ProcessPoolExecutor for CPU-bound tasks.
  • Caching: Use @functools.lru_cache for repeated calls with identical arguments.

Verify: Micro-benchmark the function. Speedup > 2x.
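The first, second, and fourth rewrites above can be sketched as follows (a minimal illustration assuming NumPy is available; `rolling_sum_preallocated` and `expensive_lookup` are hypothetical stand-ins for real hotspots):

```python
import functools

import numpy as np


def doubled_loop(values: list[float]) -> list[float]:
    """Baseline: explicit Python loop (slow for large inputs)."""
    out = []
    for x in values:
        out.append(x * 2)
    return out


def doubled_vectorized(arr: np.ndarray) -> np.ndarray:
    """Vectorized equivalent: one NumPy operation over the whole array."""
    return arr * 2


def rolling_sum_preallocated(arr: np.ndarray, window: int) -> np.ndarray:
    """Pre-allocate the output buffer instead of appending in a loop."""
    out = np.zeros(len(arr) - window + 1)
    for i in range(len(out)):
        out[i] = arr[i : i + window].sum()
    return out


@functools.lru_cache(maxsize=None)
def expensive_lookup(key: int) -> int:
    """Cache repeated calls with identical arguments."""
    return sum(range(key))  # stand-in for a costly computation


arr = np.arange(5, dtype=float)
assert doubled_vectorized(arr).tolist() == doubled_loop(arr.tolist())
```

The vectorized and loop versions must agree exactly on the same input; micro-benchmark both (e.g. with `timeit`) to confirm the speedup.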

Phase 3: Modernization (Strictness)

Objective: Bulletproof the code with strict static analysis.

Configure mypy.ini with strict = True, run mypy ., and fix all errors (no implicit Any; annotate generics, e.g. List[str]). Run ruff with the SIM (Simplify) and PERF (Performance) rule sets enabled.

  • PERF401: Use a list comprehension to create a transformed list.
  • SIM105: Use contextlib.suppress instead of try-except-pass.

Verify: Codebase is "Green" on strict settings.
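The two rule fixes above look like this in practice (a minimal sketch; the `_old`/`_new` function names are illustrative, not Ruff terminology):

```python
import contextlib
import os


def remove_if_exists_old(path: str) -> None:
    # Flagged by SIM105: try-except-pass.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def remove_if_exists_new(path: str) -> None:
    # SIM105 fix: contextlib.suppress replaces try-except-pass.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


def lengths_old(words: list[str]) -> list[int]:
    # Flagged by PERF401: manual loop building a transformed list.
    out: list[int] = []
    for w in words:
        out.append(len(w))
    return out


def lengths_new(words: list[str]) -> list[int]:
    # PERF401 fix: list comprehension.
    return [len(w) for w in words]
```

Both rewrites are behavior-preserving, which is exactly what makes them safe for an automated `ruff check --fix` sweep.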


9. Technical Considerations

Generators: Use generators (yield) for data processing. Do not load a 10GB CSV into RAM; stream it line by line or chunk by chunk.
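A minimal chunked-reading sketch (`iter_chunks` is an illustrative name; real pipelines would add CSV parsing on top):

```python
from collections.abc import Iterator


def iter_chunks(path: str, chunk_size: int = 1024) -> Iterator[list[str]]:
    """Yield lists of lines; only one chunk is ever held in RAM."""
    chunk: list[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:  # file iteration is itself lazy
            chunk.append(line.rstrip("\n"))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:  # flush the final partial chunk
            yield chunk
```

Because the function yields, a consumer like `for chunk in iter_chunks(path):` processes a 10GB file with memory bounded by `chunk_size`, not file size.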

Fusing Operations: In PyTorch, use torch.compile (2.0+) to fuse kernels automatically. Manual fusion is hard; let the compiler do it.

Dependency Trimming: Check pipdeptree. Remove unused transitive dependencies to speed up Docker builds and reduce attack surface.


10. Quality Gates (Definition of Done)

Checklist

  • [ ] Hotspots identified via Profiler.
  • [ ] Vectorization applied to critical path.
  • [ ] Mypy Strict passing (or annotated ignores).
  • [ ] requirements.txt minimized.

Validation

| Criterion | Method | Threshold |
| --- | --- | --- |
| Latency | Benchmark | Reduced by > 20% |
| Lint | Ruff | 0 errors |

11. Failure Modes & Recovery

| Failure Mode | Symptoms | Recovery Action |
| --- | --- | --- |
| OOM | Process killed (signal 9). | You optimized for speed (caching) but blew up memory. Reduce concurrency or cache size. |
| Regressions | Fast but wrong. | Run the unit test suite. If tests fail, your optimized logic is flawed. |
| Build Fail | Structurally sound but fails CI. | Check Mypy version differences. Pin dev dependencies. |

12. Copy-Paste Prompt

TIP

One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.

text
Role: Act as a Senior ML Engineer specializing in Performance Optimization.
Task: Execute the Code Revamp (Optimization) workflow.

## Objective & Scope
- **Goal**: Elevate "working" code to "high-performance" production standards (Speed + Types).
- **Scope**: Profiling, Vectorization, Strict Typing (Mypy), and Linting (Ruff).

## Inputs
- [ ] REPO_PATH: Path to source code.
- [ ] TARGET_LATENCY: Latency goal (optional).

## Output Artifacts
- [ ] Optimized Codebase (Source)
- [ ] Benchmark Report (Text/HTML)

## Execution Steps
1. **Diagnosis**
   - Instrument code with cProfile/py-spy/memray to identify top 3 hotspots.
2. **Optimize**
   - Refactor hotspots using Vectorization (Numpy/Torch), Pre-allocation, or Caching.
3. **Standardize**
   - Run strict linters (Ruff/Mypy) and fix all errors. Annotate generic types.

## Quality Gates
- [ ] Hotspots identified and optimized.
- [ ] Vectorization applied.
- [ ] Mypy Strict passing.
- [ ] Speedup > 20% verified.

## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.

## Constraints
- **Technical**: Prioritize Vectorization over Complexity. Maintain readability.
- **Security**: Use safe serialization (safetensors) where possible.

## Command
Now execute this workflow step-by-step.

Appendix: Change Log

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |
