Hyperparameter Tuning
1. Objective
The objective of this workflow is to move beyond "Default Parameters". While Scikit-Learn defaults are decent, tuning parameters (like Learning Rate, Tree Depth, Regularization) can often squeeze out another 5-10% performance. This process enables systematic, automated search for the global optimum, rather than manual "grad student descent".
2. Context & Scope
In Scope
This workflow covers defining a Search Space, selecting a Search Strategy (Random, Bayesian), running the optimization loop (with Cross-Validation), and analysing the results.
Assumption: You have a working training pipeline and a validation metric.
Out of Scope
- Architecture Search (NAS): Designing a Neural Network architecture from scratch is "AutoML/NAS". This workflow assumes the architecture is fixed (e.g., XGBoost) and we are just tuning the knobs.
3. When to Use / When Not to Use
✅ Use This Workflow When
- The model is working but performance has plateaued.
- You are preparing a Kaggle submission or Production model.
- You switched algorithms and don't know the sensible defaults for your data.
❌ Do NOT Use This Workflow When
- You haven't cleaned the data yet. (Clean data > Tuned Params).
- The model takes 1 week to train once. (Tuning is too expensive; use heuristics or Transfer Learning).
4. Inputs (Required/Optional)
Required Inputs
| Input | Description | Format | Example |
|---|---|---|---|
| MODEL | The algorithm. | Class | XGBClassifier |
| SEARCH_SPACE | Valid ranges per parameter. | Dict | {'lr': (0.01, 0.1), 'depth': (3, 10)} |
| METRIC | Optimization target. | String | f1_macro, rmse |
Optional Inputs
| Input | Description | Default | Condition |
|---|---|---|---|
| TRIALS | Budget. | 50 | Stop after N attempts. |
5. Outputs (Artifacts)
| Artifact | Format | Destination | Quality Criteria |
|---|---|---|---|
| Best Params | JSON | Experiment Tracker | Replicable configuration. |
| Study Visuals | Plots | Notebook | Convergence plot showing improvement. |
6. Operating Modes
⚡ Fast Mode
Timebox: 1 hour. Scope: Random Search. Details: Use RandomizedSearchCV for a fixed number of iterations. Better coverage than Grid Search, but each draw is uninformed by past results.
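A minimal Fast Mode sketch, assuming a gradient-boosting classifier and `f1_macro` as stand-ins for the MODEL and METRIC placeholders (swap in your own):

```python
# Fast Mode: Random Search with a fixed iteration budget.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy dataset standing in for your training data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "learning_rate": loguniform(1e-2, 1e-1),  # log-uniform for rate-like params
        "max_depth": randint(3, 10),
    },
    n_iter=10,           # the fixed budget of random draws
    scoring="f1_macro",  # the METRIC placeholder
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`loguniform` ensures small and large learning rates are sampled with equal probability, matching the distribution advice in Phase 1.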
🎯 Standard Mode (Default)
Timebox: 4 hours. Scope: Bayesian Optimization (Optuna). Details: Use Optuna to define a dynamic search space. It uses past results (TPE, Tree-structured Parzen Estimator) to guide the search towards promising regions.
🔬 Deep Mode
Timebox: 2 days. Scope: Distributed Tuning. Details: Run Ray Tune or Optuna with Pruning (Successive Halving) across a cluster of workers. If a trial looks bad at Epoch 2, kill it early to save compute.
7. Constraints & Guardrails
Technical Constraints
- Overfitting Validation: If you tune excessively on the Validation Set, you will overfit the Validation Set. You MUST have a hold-out Test Set that is never touched during tuning.
- Compute Cost: Total tuning cost is roughly (number of trials) × (cost of one training run). Be mindful of cloud costs.
Security & Privacy
CAUTION
Data Privacy: Tuning creates many models. If the model memorizes PII, you now have 100 artifacts containing PII. Manage the artifact lifecycle carefully.
Compliance
- Reproducibility: You must save the seed and the exact parameters. A "black box" tuned model that cannot be retrained is a liability.
8. Procedure
Phase 1: Search Space Definition
Objective: Define the boundaries.
Consult documentation/literature for the algorithm. Determine:
- Critical Params: Have high impact (e.g., Learning Rate, Num Layers).
- Secondary Params: Minor impact (e.g., Seed, Verbosity).
Define the Distribution:
- LogUniform for Learning Rate (explore 0.001 and 0.1 equally).
- Int for Depth.
- Categorical for Solver type.
Verify: A dictionary defining the hyperparameter grid distributions.
Phase 2: Optimization Loop
Objective: Hunt for the best.
Initialize the Study (Optuna). Define the objective(trial) function:
- Sample params from space.
- Init Model with params.
- Cross-Validate (3-fold or 5-fold). Return mean score.
- (Optional) Report intermediate steps for Pruning.
Run study.optimize(n_trials=TRIALS). Monitor the logs. Is the metric improving?
Verify: The study completes. Best score > Baseline score.
Phase 3: Selection & Retrain
Objective: Finalize.
Extract study.best_params. Retrain the model on the Full Training Set (Train + Val) using these best parameters. Evaluate on the Test Set (Holdout). If Test Score << Validation Score, you overfit the hyperparameters. Simplify the space and repeat.
Verify: Final model artifact saved with metadata.
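A Phase 3 sketch of the retrain-and-holdout step; `best_params` here is a hard-coded stand-in for `study.best_params`, and the model/metric are illustrative:

```python
# Phase 3 sketch: retrain on Train+Val with best params, evaluate once on the holdout.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
# Carve out a holdout Test Set that was never touched during tuning.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

best_params = {"C": 1.0}  # stand-in for study.best_params
final_model = LogisticRegression(**best_params, max_iter=1000).fit(X_trainval, y_trainval)
test_score = f1_score(y_test, final_model.predict(X_test), average="macro")
print(round(test_score, 3))
```

Compare `test_score` against the tuning-time validation score: a large gap signals overfit hyperparameters.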
9. Technical Considerations
Grid Search vs Random Search: Grid Search cost grows exponentially with the number of parameters, and it wastes trials on low-impact dimensions. Random Search covers the high-impact dimensions far more efficiently and is almost always preferable beyond 2-3 parameters.
Pruning: "Median Pruning" stops a trial if it performs worse than the median of previous trials at the same step. This can speed up tuning by 2-5x.
Correlation: Some params are correlated (e.g., Learning Rate should go down as Batch Size goes down). Advanced optimizers handle this; manual tuning typically misses it.
10. Quality Gates (Definition of Done)
Checklist
- [ ] Search Space defined sensibly.
- [ ] Optimization method selected > Grid Search.
- [ ] Pruning enabled (if iterative).
- [ ] Hold-out test confirms improvement.
Validation
| Criterion | Method | Threshold |
|---|---|---|
| Improvement | Metric Delta | > 1-2% vs Defaults |
| Stability | CV Variance | Low variance across folds |
11. Failure Modes & Recovery
| Failure Mode | Symptoms | Recovery Action |
|---|---|---|
| Convergence Fail | Loss explodes (NaN). | Learning Rate upper bound is too high. Reduce it. |
| No Improvement | Tuned model == Default model. | Your search space didn't include the optimal region, or data is just noise. |
| Timeout | Optimization takes forever. | Reduce n_splits in CV; Reduce n_trials; Use a smaller data subsample for tuning. |
12. Copy-Paste Prompt
TIP
One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.
text
Role: Act as a Senior ML Engineer.
Task: Execute the Hyperparameter Tuning workflow.
## Objective & Scope
- **Goal**: Maximize model predictive performance by optimizing hyperparameters.
- **Scope**: Search Space definition, Bayesian Optimization (Optuna), and Final Model Training.
## Inputs
- [ ] MODEL: Model Class (e.g., XGBClassifier).
- [ ] SEARCH_SPACE: Dictionary of parameter ranges.
- [ ] METRIC: Optimization Target (e.g., F1 Score).
- [ ] TRIALS: Budget (e.g., 50 trials).
## Output Artifacts
- [ ] Best Parameters (JSON)
- [ ] Optimization History (Plot)
- [ ] Tuned Model Artifact
## Execution Steps
1. **Setup**
- Define Optuna Objective function. Implement Cross-Validation inside objective. Define Search Space.
2. **Optimize**
- Run Optuna study. Use Pruning (Median) to stop bad trials early.
3. **Finalize**
- Retrain model on full dataset using best params. Evaluate on separate Test Set.
## Quality Gates
- [ ] Optimization study completed.
- [ ] Valid Improvement over baseline.
- [ ] No leakage (CV usage).
- [ ] Best params persisted.
## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.
## Constraints
- **Resource**: Respect compute budget (Time/Cost).
- **Technical**: Avoid overfitting the Validation set (use final Test set).
## Command
Now execute this workflow step-by-step.
Appendix: Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |