Hyperparameter Tuning
1. Objective
The objective of this workflow is to move beyond "Default Parameters". While Scikit-Learn defaults are decent, tuning parameters (like Learning Rate, Tree Depth, Regularization) can often squeeze out another 5-10% performance. This process enables systematic, automated search for the global optimum, rather than manual "grad student descent".
2. Context & Scope
In Scope
This workflow covers defining a Search Space, selecting a Search Strategy (Random, Bayesian), running the optimization loop (with Cross-Validation), and analysing the results.
Assumption: You have a working training pipeline and a validation metric.
Out of Scope
- Architecture Search (NAS): Designing a Neural Network architecture from scratch is "AutoML/NAS". This workflow assumes the architecture is fixed (e.g., XGBoost) and we are just tuning the knobs.
3. When to Use / When Not to Use
✅ Use This Workflow When
- The model is working but performance has plateaued.
- You are preparing a Kaggle submission or Production model.
- You switched algorithms and don't know the sensible defaults for your data.
❌ Do NOT Use This Workflow When
- You haven't cleaned the data yet. (Clean data > Tuned Params).
- The model takes 1 week to train once. (Tuning is too expensive; use heuristics or Transfer Learning).
4. Inputs (Required/Optional)
Required Inputs
| Input | Description | Format | Example |
|---|---|---|---|
| MODEL | The algorithm. | Class | XGBClassifier |
| SEARCH_SPACE | Valid ranges per parameter. | Dict | {'lr': (0.01, 0.1), 'depth': (3, 10)} |
| METRIC | Optimization target. | String | f1_macro, rmse |
Optional Inputs
| Input | Description | Default | Condition |
|---|---|---|---|
| TRIALS | Budget. | 50 | Stop after N attempts. |
5. Outputs (Artifacts)
| Artifact | Format | Destination | Quality Criteria |
|---|---|---|---|
| Best Params | JSON | Experiment Tracker | Replicable configuration. |
| Study Visuals | Plots | Notebook | Convergence plot showing improvement. |
6. Operating Modes
⚡ Fast Mode
Timebox: 1 hour. Scope: Random Search. Details: Use RandomizedSearchCV for a fixed number of iterations. Better coverage than Grid Search, but each draw is uninformed by past results.
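A minimal Fast Mode sketch, assuming a gradient-boosting classifier and `f1_macro` as stand-ins for the MODEL and METRIC placeholders (swap in your own):

```python
# Fast Mode: Random Search with a fixed iteration budget.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy dataset standing in for your training data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "learning_rate": loguniform(1e-2, 1e-1),  # log-uniform for rate-like params
        "max_depth": randint(3, 10),
    },
    n_iter=10,           # the fixed budget of random draws
    scoring="f1_macro",  # the METRIC placeholder
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`loguniform` ensures small and large learning rates are sampled with equal probability, matching the distribution advice in Phase 1.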
🎯 Standard Mode (Default)
Timebox: 4 hours. Scope: Bayesian Optimization (Optuna). Details: Use Optuna to define a dynamic search space. It uses past results (TPE, Tree-structured Parzen Estimator) to guide the search towards promising regions.
🔬 Deep Mode
Timebox: 2 days. Scope: Distributed Tuning. Details: Run Ray Tune or Optuna with Pruning (Successive Halving) across a cluster of workers. If a trial looks bad at Epoch 2, kill it early to save compute.
7. Constraints & Guardrails
Technical Constraints
- Overfitting Validation: If you tune excessively on the Validation Set, you will overfit the Validation Set. You MUST have a hold-out Test Set that is never touched during tuning.
- Compute Cost: Total tuning cost is roughly (number of trials) × (cost of one training run). Be mindful of cloud costs.
Security & Privacy
CAUTION
Data Privacy: Tuning creates many models. If the model memorizes PII, you now have 100 artifacts containing PII. Manage the artifact lifecycle carefully.
Compliance
- Reproducibility: You must save the seed and the exact parameters. A "black box" tuned model that cannot be retrained is a liability.
8. Procedure
Phase 1: Search Space Definition
Objective: Define the boundaries.
Consult documentation/literature for the algorithm. Determine:
- Critical Params: Have high impact (e.g., Learning Rate, Num Layers).
- Secondary Params: Minor impact (e.g., Seed, Verbosity).
Define the Distribution:
- LogUniform for Learning Rate (explore 0.001 and 0.1 equally).
- Int for Depth.
- Categorical for Solver type.
Verify: A dictionary defining the hyperparameter grid distributions.
Phase 2: Optimization Loop
Objective: Hunt for the best.
Initialize the Study (Optuna). Define the objective(trial) function:
- Sample params from space.
- Init Model with params.
- Cross-Validate (3-fold or 5-fold). Return mean score.
- (Optional) Report intermediate steps for Pruning.
Run study.optimize(n_trials=TRIALS). Monitor the logs. Is the metric improving?
Verify: The study completes. Best score > Baseline score.
Phase 3: Selection & Retrain
Objective: Finalize.
Extract study.best_params. Retrain the model on the Full Training Set (Train + Val) using these best parameters. Evaluate on the Test Set (Holdout). If Test Score << Validation Score, you overfit the hyperparameters. Simplify the space and repeat.
Verify: Final model artifact saved with metadata.
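A Phase 3 sketch of the retrain-and-holdout step; `best_params` here is a hard-coded stand-in for `study.best_params`, and the model/metric are illustrative:

```python
# Phase 3 sketch: retrain on Train+Val with best params, evaluate once on the holdout.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
# Carve out a holdout Test Set that was never touched during tuning.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

best_params = {"C": 1.0}  # stand-in for study.best_params
final_model = LogisticRegression(**best_params, max_iter=1000).fit(X_trainval, y_trainval)
test_score = f1_score(y_test, final_model.predict(X_test), average="macro")
print(round(test_score, 3))
```

Compare `test_score` against the tuning-time validation score: a large gap signals overfit hyperparameters.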
9. Technical Considerations
Grid Search vs Random Search: Grid Search cost grows exponentially with the number of parameters, and it wastes trials on low-impact dimensions. Random Search covers the high-impact dimensions far more efficiently and is almost always preferable beyond 2-3 parameters.
Pruning: "Median Pruning" stops a trial if it performs worse than the median of previous trials at the same step. This can speed up tuning by 2-5x.
Correlation: Some params are correlated (e.g., Learning Rate should go down as Batch Size goes down). Advanced optimizers handle this; manual tuning typically misses it.
10. Quality Gates (Definition of Done)
Checklist
- [ ] Search Space defined sensibly.
- [ ] Optimization method selected > Grid Search.
- [ ] Pruning enabled (if iterative).
- [ ] Hold-out test confirms improvement.
Validation
| Criterion | Method | Threshold |
|---|---|---|
| Improvement | Metric Delta | > 1-2% vs Defaults |
| Stability | CV Variance | Low variance across folds |
11. Failure Modes & Recovery
| Failure Mode | Symptoms | Recovery Action |
|---|---|---|
| Convergence Fail | Loss explodes (NaN). | Learning Rate upper bound is too high. Reduce it. |
| No Improvement | Tuned model == Default model. | Your search space didn't include the optimal region, or data is just noise. |
| Timeout | Optimization takes forever. | Reduce n_splits in CV; Reduce n_trials; Use a smaller data subsample for tuning. |
12. Copy-Paste Prompt
TIP
One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.
text
Role: Act as a Senior ML Engineer.
Task: Execute the Hyperparameter Tuning workflow.
## Objective & Scope
- **Goal**: Maximize model predictive performance by optimizing hyperparameters.
- **Scope**: Search Space definition, Bayesian Optimization (Optuna), and Final Model Training.
## Inputs
- [ ] MODEL: Model Class (e.g., XGBClassifier).
- [ ] SEARCH_SPACE: Dictionary of parameter ranges.
- [ ] METRIC: Optimization Target (e.g., F1 Score).
- [ ] TRIALS: Budget (e.g., 50 trials).
## Output Artifacts
- [ ] Best Parameters (JSON)
- [ ] Optimization History (Plot)
- [ ] Tuned Model Artifact
## Execution Steps
1. **Setup**
- Define Optuna Objective function. Implement Cross-Validation inside objective. Define Search Space.
2. **Optimize**
- Run Optuna study. Use Pruning (Median) to stop bad trials early.
3. **Finalize**
- Retrain model on full dataset using best params. Evaluate on separate Test Set.
## Quality Gates
- [ ] Optimization study completed.
- [ ] Valid Improvement over baseline.
- [ ] No leakage (CV usage).
- [ ] Best params persisted.
## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.
## Constraints
- **Resource**: Respect compute budget (Time/Cost).
- **Technical**: Avoid overfitting the Validation set (use final Test set).
## Command
Now execute this workflow step-by-step.
Appendix: Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |