Code Revamp (Optimization)
1. Objective
The objective of this workflow is to elevate "working" code to "high-performance" code. In ML, a slow loop can cost days of training time or thousands of dollars in inference. This workflow focuses on performance optimization (Vectorization, Caching) and strict enforcement of standards (Mypy Strict, Ruff).
2. Context & Scope
In Scope
This workflow covers Static Analysis (Ruff/Mypy), Vectorization (replacing for loops with numpy/torch), Memory Profiling, and Dependency tightening.
Assumption: The code is already modular (functions/classes exist). If it is a messy notebook, use "ML Code Revamp (Refactor)" first.
Out of Scope
- Algorithm Redesign: We are not changing the math, just the implementation efficiency.
- Hardware Upgrade: We optimize for the current hardware.
3. When to Use / When Not to Use
✅ Use This Workflow When
- Your data loader is the bottleneck (GPU utilization < 50%).
- Inference latency is too high for production SLA.
- You are releasing an open-source library and need professional quality.
❌ Do NOT Use This Workflow When
- The code takes 5 seconds and runs once a month (premature optimization).
- You are debugging logic errors. Fix bugs before optimizing speed.
4. Inputs (Required/Optional)
Required Inputs
| Input | Description | Format | Example |
|---|---|---|---|
| REPO_PATH | Code location. | Path | ./src/my_model |
Optional Inputs
| Input | Description | Format | Example |
|---|---|---|---|
| TARGET_LATENCY | Latency goal. | Float | 100ms |
5. Outputs (Artifacts)
| Artifact | Format | Destination | Quality Criteria |
|---|---|---|---|
| Optimized Code | Source | Repo | Passes Mypy strict and Ruff checks. |
| Profile Report | HTML/Text | Artifacts | Shows 'Before' vs 'After'. |
6. Operating Modes
⚡ Fast Mode
Timebox: 1 hour
Scope: Linter Sweep.
Details: Run ruff check --select ALL --fix and black to remove unused imports, fix formatting, and apply simple rewrites (e.g. f-strings).
🎯 Standard Mode (Default)
Timebox: 4 hours
Scope: Vectorization & Typing.
Details: Identify explicit Python loops and replace them with Pandas/NumPy vector operations. Add type hints to 100% of function signatures.
🔬 Deep Mode
Timebox: 2 days
Scope: Cython/JIT.
Details: Use numba.jit or write Cython extensions for the most critical path. Analyze memory leaks with memray.
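A minimal sketch of the JIT route, assuming numba is installed: the hypothetical pairwise_min_distance keeps a plain nested loop, which NumPy could only vectorize by materializing an (N, N) temporary, and numba's njit (jit in nopython mode) compiles it to machine code. The ImportError fallback is an illustration-only convenience so the sketch still runs, uncompiled, where numba is missing.

```python
import numpy as np

try:
    from numba import njit  # assumption: numba is installed for Deep Mode work
except ImportError:
    # Illustration-only fallback: run the same function as plain Python.
    def njit(func):
        return func


@njit
def pairwise_min_distance(points):
    """Smallest Euclidean distance between any two rows of an (N, D) array.

    Explicit loops are acceptable here: numba compiles them, and unlike a
    broadcasting solution this never builds an (N, N) temporary array.
    """
    n, d = points.shape
    best = np.inf
    for i in range(n):
        for j in range(i + 1, n):
            dist_sq = 0.0
            for k in range(d):
                diff = points[i, k] - points[j, k]
                dist_sq += diff * diff
            if dist_sq < best:
                best = dist_sq
    return np.sqrt(best)
```

When benchmarking, time the first call separately: with numba it includes compilation.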
7. Constraints & Guardrails
Technical Constraints
- Maintainability vs Speed: Avoid obscure "bit-hacking" NumPy code that becomes unreadable unless it is absolutely necessary; when you do use it, comment heavily.
- Compatibility: Ensure optimized code works on all supported platforms (Windows/Linux/Mac). Numba and Cython can complicate building and distributing wheels for each platform.
Security & Privacy
CAUTION
Safe Unpickling: If modernizing data loading, switch from pickle to safetensors or JSON where possible. Unpickling can execute arbitrary code, so loading a pickle from an untrusted source is a security hole.
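As a sketch of pickle-free persistence, assuming the artifacts are plain numeric arrays plus a JSON-serializable config: arrays go into NumPy's .npz format and are loaded with allow_pickle=False, while metadata goes into JSON. This swaps in NumPy as a dependency-light stand-in for safetensors; the helper names save_artifacts and load_artifacts are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def save_artifacts(directory: Path, weights: dict[str, np.ndarray], config: dict[str, object]) -> None:
    """Write arrays and metadata without using pickle."""
    directory.mkdir(parents=True, exist_ok=True)
    # Numeric arrays are stored in NumPy's .npz (zip of .npy files) format.
    np.savez(directory / "weights.npz", **weights)
    # Metadata goes into human-readable JSON instead of a pickled dict.
    (directory / "config.json").write_text(json.dumps(config, indent=2))


def load_artifacts(directory: Path) -> tuple[dict[str, np.ndarray], dict[str, object]]:
    """Read artifacts back, refusing anything that would require unpickling."""
    # allow_pickle=False makes NumPy raise instead of unpickling object arrays.
    with np.load(directory / "weights.npz", allow_pickle=False) as archive:
        weights = {name: archive[name] for name in archive.files}
    config = json.loads((directory / "config.json").read_text())
    return weights, config
```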
Compliance
- Dependencies: Do not introduce heavy dependencies (like pandas) into a lightweight inference library if numpy suffices.
8. Procedure
Phase 1: Profiling (Baseline)
Objective: Find the hotspot.
Instrument the code.
- CPU: Use cProfile or py-spy, e.g. py-spy record -o profile.svg --pid 12345.
- Memory: Use memray.
Identify the top 3 functions consuming time/RAM. Note: Often it is I/O (loading files) or copying data, not the math.
Verify: A Flame Graph showing where time is spent.
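A minimal cProfile sketch of the Phase 1 baseline, using only the standard library (py-spy and memray are separate tools and are not shown). The hotspot slow_feature_engineering and the helper profile_top_functions are hypothetical; the report lists the top entries sorted by cumulative time.

```python
import cProfile
import io
import pstats
from collections.abc import Callable


def slow_feature_engineering(n: int) -> list[float]:
    # Hypothetical hotspot: a pure-Python loop doing per-element math.
    out: list[float] = []
    for i in range(n):
        out.append((i % 7) ** 0.5 * 3.0)
    return out


def profile_top_functions(func: Callable[..., object], *args: object, top: int = 3) -> str:
    """Run func(*args) under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_top_functions(slow_feature_engineering, 200_000))
```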
Phase 2: Vectorization & Optimization
Objective: Hardware acceleration.
Refactor the Hotspots:
- Loop Removal: Replace [x * 2 for x in values] with arr * 2, where arr is a NumPy array.
- Pre-allocation: Don't append to lists in a loop. Allocate np.zeros(N) once and fill it in place.
- Parallelism: Use joblib.Parallel or ProcessPoolExecutor for CPU-bound tasks.
- Caching: Use @functools.lru_cache for functions called repeatedly with the same (hashable) arguments.
Verify: Micro-benchmark the function. Speedup > 2x.
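The sketch below illustrates three of the Phase 2 techniques (loop removal, pre-allocation, caching) plus a timeit micro-benchmark. All function names are hypothetical, and the measured speedup depends on hardware and array size, so no output is shown.

```python
import timeit
from functools import lru_cache

import numpy as np


def scale_loop(values: list[float]) -> list[float]:
    # Baseline: one interpreter round-trip per element.
    return [x * 2 for x in values]


def scale_vectorized(values: np.ndarray) -> np.ndarray:
    # Loop removal: one NumPy call; the loop runs in compiled code.
    return values * 2


def featurize(token: str) -> np.ndarray:
    # Hypothetical per-item feature function that cannot be vectorized directly.
    return np.array([len(token), token.count("a")], dtype=np.float64)


def build_feature_matrix(tokens: list[str]) -> np.ndarray:
    # Pre-allocation: create the (N, 2) output once and fill rows in place,
    # instead of appending to a list and converting at the end.
    out = np.zeros((len(tokens), 2), dtype=np.float64)
    for i, token in enumerate(tokens):
        out[i] = featurize(token)
    return out


@lru_cache(maxsize=1024)
def tokenize(text: str) -> tuple[str, ...]:
    # Caching: repeated calls with the same string return the stored result.
    # A tuple is returned so callers cannot mutate the cached value.
    return tuple(text.lower().split())


if __name__ == "__main__":
    data = np.random.default_rng(0).random(1_000_000)
    as_list = data.tolist()
    t_loop = timeit.timeit(lambda: scale_loop(as_list), number=5)
    t_vec = timeit.timeit(lambda: scale_vectorized(data), number=5)
    print(f"loop {t_loop:.3f}s, vectorized {t_vec:.3f}s, speedup {t_loop / t_vec:.1f}x")
```

Pre-allocation pays off when each row comes from logic that cannot be vectorized, as with featurize here; when a NumPy equivalent exists, prefer it over the loop entirely.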
Phase 3: Modernization (Strictness)
Objective: Bulletproof.
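Configure mypy.ini with strict = True in the [mypy] section, run mypy on the package (mypy .), and fix the errors (no untyped code or implicit Any; parameterize generics, e.g. List[str] instead of a bare List). Then run ruff with the SIM (Simplify) and PERF (Performance) rules enabled. Typical findings:
- PERF401: Use a list comprehension to create a transformed list.
- SIM105: Use contextlib.suppress instead of try-except-pass.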
Verify: Codebase is "Green" on strict settings.
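An illustrative before/after for the two Ruff rules named in Phase 3, written so the fixed versions also pass mypy strict mode. The functions are hypothetical examples; list[int] is the Python 3.9+ spelling of typing.List[int].

```python
import contextlib
import os

# Before (illustrative): Ruff flags the marked lines, and mypy strict mode
# rejects the missing annotations.
#
#     def squares(values):
#         result = []
#         for v in values:
#             result.append(v * v)      # PERF401
#         return result
#
#     def remove_quietly(path):
#         try:
#             os.remove(path)
#         except FileNotFoundError:     # SIM105
#             pass


def squares(values: list[int]) -> list[int]:
    # PERF401 fix: build the transformed list with a comprehension.
    return [v * v for v in values]


def remove_quietly(path: str) -> None:
    # SIM105 fix: contextlib.suppress replaces try/except/pass.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```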
9. Technical Considerations
Generators: Use generators (yield) for data processing. Do not load a 10 GB CSV into RAM; stream it line by line or chunk by chunk.
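A sketch of chunked streaming with a generator, using only the standard library csv module; iter_csv_chunks and column_mean are hypothetical names. Peak memory is bounded by chunk_size rows rather than by the file size. Projects that already depend on pandas can get a similar effect with pd.read_csv(..., chunksize=N).

```python
import csv
from collections.abc import Iterable, Iterator
from itertools import islice


def iter_csv_chunks(lines: Iterable[str], chunk_size: int = 10_000) -> Iterator[list[dict[str, str]]]:
    """Yield lists of at most chunk_size parsed rows."""
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def column_mean(path: str, column: str, chunk_size: int = 50_000) -> float:
    # Only one chunk of rows is held in memory at a time.
    total = 0.0
    count = 0
    with open(path, newline="") as f:
        for chunk in iter_csv_chunks(f, chunk_size):
            total += sum(float(row[column]) for row in chunk)
            count += len(chunk)
    return total / count if count else 0.0
```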
Fusing Operations: In PyTorch, use torch.compile (2.0+) to fuse kernels automatically. Manual fusion is hard; let the compiler do it.
Dependency Trimming: Inspect the dependency tree with pipdeptree. Remove unused direct dependencies, along with the transitive packages only they pull in, to speed up Docker builds and reduce the attack surface.
10. Quality Gates (Definition of Done)
Checklist
- [ ] Hotspots identified via Profiler.
- [ ] Vectorization applied to critical path.
- [ ] Mypy Strict passing (or annotated ignores).
- [ ] requirements.txt minimized.
Validation
| Criterion | Method | Threshold |
|---|---|---|
| Latency | Benchmark | Reduced by > 20% |
| Lint | Ruff | 0 Errors |
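One possible way to check the latency gate, assuming the baseline and optimized code can each be wrapped in a zero-argument callable (all names here are hypothetical). It reports the median over repeated runs after a warm-up, which is less sensitive to outliers than a single timing.

```python
import statistics
import time
from collections.abc import Callable


def median_latency_ms(func: Callable[[], object], repeats: int = 50, warmup: int = 5) -> float:
    """Median wall-clock latency of func() in milliseconds."""
    for _ in range(warmup):
        func()  # warm up caches, lazy imports, or JIT compilation before timing
    samples: list[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def latency_reduction(before_ms: float, after_ms: float) -> float:
    """Fractional reduction: 0.25 means the optimized code takes 25% less time."""
    return (before_ms - after_ms) / before_ms


def passes_latency_gate(before_ms: float, after_ms: float, threshold: float = 0.20) -> bool:
    # Validation gate: latency reduced by more than 20%.
    return latency_reduction(before_ms, after_ms) > threshold
```

For reference, a 2x speedup corresponds to a 50% latency reduction (latency_reduction(100.0, 50.0) == 0.5), so the Phase 2 micro-benchmark target is stricter than this end-to-end gate.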
11. Failure Modes & Recovery
| Failure Mode | Symptoms | Recovery Action |
|---|---|---|
| OOM | Process killed by the OS (SIGKILL, signal 9). | You optimized for speed (caching) but blew up memory. Reduce concurrency or cache size. |
| Regressions | Fast but wrong results. | Run the unit test suite. If tests fail, the optimized logic is flawed. |
| Build Fail | Passes locally but fails CI. | Check for Mypy version differences between environments. Pin dev dependencies. |
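For the Regressions row, a common safeguard is to keep the original implementation as a reference and assert that the optimized version matches it numerically, including edge cases. Below is a sketch with hypothetical row-normalization functions; in pytest, check_equivalence would be called from a test_ function.

```python
import numpy as np


def normalize_rows_reference(matrix: list[list[float]]) -> list[list[float]]:
    # Original "working" implementation with explicit Python loops.
    out: list[list[float]] = []
    for row in matrix:
        norm = sum(x * x for x in row) ** 0.5
        out.append([x / norm if norm else 0.0 for x in row])
    return out


def normalize_rows_fast(matrix: np.ndarray) -> np.ndarray:
    # Optimized implementation: vectorized L2 row normalization.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # All-zero rows stay zero, matching the reference (divide by 1 instead of 0).
    return matrix / np.where(norms == 0, 1.0, norms)


def check_equivalence(seed: int = 0) -> None:
    rng = np.random.default_rng(seed)
    matrix = rng.normal(size=(64, 16))
    matrix[3] = 0.0  # edge case: an all-zero row
    expected = np.array(normalize_rows_reference(matrix.tolist()))
    np.testing.assert_allclose(normalize_rows_fast(matrix), expected, rtol=1e-12, atol=1e-12)
```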
12. Copy-Paste Prompt
TIP
One-Click Agent Invocation: Copy the prompt below, replace the placeholders, and paste it into your agent.
```text
Role: Act as a Senior ML Engineer specializing in Performance Optimization.
Task: Execute the Code Revamp (Optimization) workflow.
## Objective & Scope
- **Goal**: Elevate "working" code to "high-performance" production standards (Speed + Types).
- **Scope**: Profiling, Vectorization, Strict Typing (Mypy), and Linting (Ruff).
## Inputs
- [ ] REPO_PATH: Path to source code.
- [ ] TARGET_LATENCY: Latency goal (optional).
## Output Artifacts
- [ ] Optimized Codebase (Source)
- [ ] Benchmark Report (Text/HTML)
## Execution Steps
1. **Diagnosis**
   - Instrument code with cProfile/py-spy/memray to identify top 3 hotspots.
2. **Optimize**
   - Refactor hotspots using Vectorization (Numpy/Torch), Pre-allocation, or Caching.
3. **Standardize**
   - Run strict linters (Ruff/Mypy) and fix all errors. Annotate generic types.
## Quality Gates
- [ ] Hotspots identified and optimized.
- [ ] Vectorization applied.
- [ ] Mypy Strict passing.
- [ ] Latency reduction > 20% verified.
## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.
## Constraints
- **Technical**: Prioritize Vectorization over Complexity. Maintain readability.
- **Security**: Use safe serialization (safetensors) where possible.
## Command
Now execute this workflow step-by-step.
```
Appendix: Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |