
⚙️ Step 7: Hyperparameter Optimization

Fine-Tune Parameters to Improve Performance

Machine Learning Pipeline – Optimization Stage

🎯 What is Hyperparameter Optimization?

Hyperparameters are configuration settings that control the learning process of a model. Unlike model parameters (learned from data), hyperparameters are set before training and significantly impact model performance.

Key Difference:
Parameters: Learned during training (e.g., weights in neural networks)
Hyperparameters: Set before training (e.g., learning rate, number of trees, depth)
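To make the distinction concrete, here is a minimal sketch (assuming scikit-learn and an existing X_train/y_train, which are not defined at this point in the guide): the constructor arguments are hyperparameters you choose up front, while attributes such as feature_importances_ exist only after fitting because they are learned from the data.

from sklearn.ensemble import RandomForestClassifier

# Hyperparameters: chosen before training
model = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)

# Parameters: learned from the data during training
model.fit(X_train, y_train)
print(model.feature_importances_)   # available only after fitting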

Why Optimize Hyperparameters?

  • Can improve model performance by 5-20% or more
  • Reduces overfitting and underfitting
  • Helps models generalize better to unseen data
  • Balances bias-variance tradeoff
  • Optimizes training speed and efficiency

📋 Common Hyperparameters by Algorithm

Random Forest

| Hyperparameter | Description | Typical Range / Values |
|---|---|---|
| n_estimators | Number of trees in the forest | 50-500 |
| max_depth | Maximum depth of each tree | 10-100 or None |
| min_samples_split | Minimum samples required to split a node | 2-20 |
| min_samples_leaf | Minimum samples required in a leaf node | 1-10 |
| max_features | Features considered at each split | 'sqrt', 'log2', None |

Gradient Boosting (XGBoost/LightGBM)

| Hyperparameter | Description | Typical Range / Values |
|---|---|---|
| learning_rate | Shrinkage applied to each tree's contribution | 0.001-0.3 |
| n_estimators | Number of boosting rounds | 100-1000 |
| max_depth | Maximum tree depth | 3-10 |
| subsample | Fraction of samples per tree | 0.5-1.0 |
| colsample_bytree | Fraction of features per tree | 0.5-1.0 |
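As a quick illustration (not a tuned configuration), these names map directly onto the XGBoost scikit-learn wrapper; the mid-range values below are assumptions, as are X_train/y_train.

from xgboost import XGBClassifier

# Mid-range starting values for the hyperparameters listed above
model = XGBClassifier(
    learning_rate=0.1,
    n_estimators=300,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42
)
model.fit(X_train, y_train)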

Neural Networks

| Hyperparameter | Description | Typical Range / Values |
|---|---|---|
| learning_rate | Optimizer step size | 1e-5 to 1e-2 |
| batch_size | Samples per gradient update | 16, 32, 64, 128 |
| hidden_layers | Number and size of layers | e.g., [128, 64, 32] |
| dropout_rate | Dropout regularization strength | 0.1-0.5 |
| optimizer | Optimization algorithm | Adam, SGD, RMSprop |
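A minimal Keras sketch wiring these hyperparameters together (hidden layers [128, 64, 32], dropout 0.3, Adam with learning rate 1e-3, batch size 32); the architecture, the binary output layer, and X_train/y_train are illustrative assumptions.

from tensorflow import keras

model = keras.Sequential()
# hidden_layers: number and size of layers
for units in [128, 64, 32]:
    model.add(keras.layers.Dense(units, activation='relu'))
    model.add(keras.layers.Dropout(0.3))          # dropout_rate
model.add(keras.layers.Dense(1, activation='sigmoid'))

# optimizer and learning_rate
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy', metrics=['accuracy'])

# batch_size
model.fit(X_train, y_train, batch_size=32, epochs=10, validation_split=0.2)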

Support Vector Machines (SVM)

| Hyperparameter | Description | Typical Range / Values |
|---|---|---|
| C | Regularization parameter | 0.1-100 |
| kernel | Kernel type | 'linear', 'rbf', 'poly' |
| gamma | Kernel coefficient | 0.001-1.0 |
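For SVMs, C and gamma are usually searched on a logarithmic grid. A small sketch with scikit-learn (X_train/y_train assumed; the grid values are illustrative):

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# C and gamma are typically spaced logarithmically
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1.0],
    'kernel': ['rbf']
}
svm_search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
svm_search.fit(X_train, y_train)
print(svm_search.best_params_)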

🔍 Hyperparameter Tuning Methods

1. Grid Search

Exhaustively searches through a manually specified subset of hyperparameter space.

✅ Pros
  • Guaranteed to find best combination in search space
  • Easy to implement and understand
  • Parallelizable
  • Reproducible results
❌ Cons
  • Computationally expensive
  • Time grows exponentially with parameters
  • Wastes resources on poor regions
  • Not suitable for large search spaces

Example: Grid Search with Scikit-learn

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Define parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2']
}

# Initialize model
rf = RandomForestClassifier(random_state=42)

# Grid search with cross-validation
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring='accuracy',
    n_jobs=-1,            # Use all CPU cores
    verbose=2
)

# Fit grid search
grid_search.fit(X_train, y_train)

# Best parameters and score
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.3f}")

# Use best model
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print(f"Test score: {test_score:.3f}")

2. Random Search

Randomly samples from the hyperparameter space for a fixed number of iterations.

✅ Pros
  • More efficient than grid search
  • Better for high-dimensional spaces
  • Can find good parameters quickly
  • Explores diverse combinations
❌ Cons
  • No guarantee of finding optimal
  • May miss important regions
  • Results vary between runs
  • Need to set number of iterations

Example: Random Search with Scikit-learn

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

# Define parameter distributions
param_distributions = {
    'n_estimators': randint(100, 500),
    'max_depth': randint(10, 50),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False]
}

# Random search
random_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_distributions,
    n_iter=100,           # Number of parameter settings sampled
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=2
)

# Fit random search
random_search.fit(X_train, y_train)

print(f"Best parameters: {random_search.best_params_}")
print(f"Best score: {random_search.best_score_:.3f}")

3. Bayesian Optimization

Uses probabilistic models to intelligently select the next hyperparameters to evaluate based on past results.

✅ Pros
  • Most efficient for expensive evaluations
  • Learns from previous iterations
  • Balances exploration vs exploitation
  • Requires fewer iterations
❌ Cons
  • More complex to implement
  • Sequential (harder to parallelize)
  • Requires additional libraries
  • Can get stuck in local optima

Example: Bayesian Optimization with Optuna

import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Define objective function
def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 10, 50),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2']),
        'random_state': 42
    }

    # Create model and evaluate
    model = RandomForestClassifier(**params)
    score = cross_val_score(model, X_train, y_train, cv=5).mean()
    return score

# Create study and optimize
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100, show_progress_bar=True)

# Best parameters
print(f"Best parameters: {study.best_params}")
print(f"Best score: {study.best_value:.3f}")

# Train final model with best parameters
# (best_params contains only the suggested values, so re-add the fixed random_state)
best_model = RandomForestClassifier(**study.best_params, random_state=42)
best_model.fit(X_train, y_train)

4. Successive Halving (HalvingGridSearchCV / HalvingRandomSearchCV)

Starts with many configurations on a small budget (e.g., a fraction of the training data), progressively eliminates poor performers, and allocates more resources to the promising candidates.

✅ Pros
  • Much faster than standard grid/random search
  • Efficient resource allocation
  • Good for large datasets
  • Can handle more configurations
❌ Cons
  • May eliminate good configs early
  • Experimental in scikit-learn
  • Requires more setup
  • Less interpretable

Example: Successive Halving

# Successive halving is still experimental, so the enabling import is required
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV

# Define parameter distributions
param_distributions = {
    'n_estimators': [100, 200, 300, 400, 500],
    'max_depth': [10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10, 15],
    'min_samples_leaf': [1, 2, 4, 8]
}

# Halving random search
halving_search = HalvingRandomSearchCV(
    estimator=rf,
    param_distributions=param_distributions,
    factor=3,             # Reduce candidates by a factor of 3 each iteration
    cv=5,
    random_state=42,
    n_jobs=-1
)

halving_search.fit(X_train, y_train)
print(f"Best parameters: {halving_search.best_params_}")
print(f"Best score: {halving_search.best_score_:.3f}")

🚀 Advanced Optimization Techniques

Hyperband (via Optuna or Ray Tune)

Hyperband: an extension of successive halving that runs several halving brackets with different starting budgets, hedging against eliminating good configurations too early. In Optuna it is available as a pruner that can be combined with any sampler:

import optuna

# Combine the TPE sampler with the Hyperband pruner
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(),      # Tree-structured Parzen Estimator
    pruner=optuna.pruners.HyperbandPruner()    # Hyperband resource allocation
)
study.optimize(objective, n_trials=100)
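Hyperband only helps if each trial reports intermediate scores so weak trials can be stopped early. A minimal sketch of a pruning-aware objective, assuming an SGDClassifier trained incrementally and an X_val/y_val validation split that is not defined above:

import numpy as np
import optuna
from sklearn.linear_model import SGDClassifier

def objective(trial):
    alpha = trial.suggest_float('alpha', 1e-6, 1e-2, log=True)
    model = SGDClassifier(alpha=alpha, random_state=42)
    classes = np.unique(y_train)

    for epoch in range(20):
        model.partial_fit(X_train, y_train, classes=classes)
        score = model.score(X_val, y_val)

        # Report the intermediate score so the Hyperband pruner can act on it
        trial.report(score, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    return score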

Neural Architecture Search (NAS) with Keras Tuner

from tensorflow import keras
from keras_tuner import RandomSearch

def build_model(hp):
    model = keras.Sequential()

    # Tune number of layers
    for i in range(hp.Int('num_layers', 1, 5)):
        model.add(keras.layers.Dense(
            units=hp.Int(f'units_{i}', 32, 512, step=32),
            activation='relu'
        ))
        model.add(keras.layers.Dropout(
            hp.Float(f'dropout_{i}', 0, 0.5, step=0.1)
        ))
    model.add(keras.layers.Dense(1, activation='sigmoid'))

    # Tune learning rate
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
        ),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model

# Create tuner
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=50,
    directory='tuner_results',
    project_name='neural_net_optimization'
)

# Search for best hyperparameters
tuner.search(X_train, y_train, epochs=10, validation_split=0.2)

# Get best model
best_model = tuner.get_best_models(num_models=1)[0]

Genetic Algorithms with TPOT (AutoML)

from tpot import TPOTClassifier

# TPOT uses genetic algorithms to optimize pipelines
tpot = TPOTClassifier(
    generations=5,        # Number of iterations
    population_size=50,   # Number of models per generation
    cv=5,
    random_state=42,
    verbosity=2,
    n_jobs=-1
)

# Fit TPOT (finds best pipeline + hyperparameters)
tpot.fit(X_train, y_train)

# Evaluate
print(tpot.score(X_test, y_test))

# Export best pipeline
tpot.export('best_pipeline.py')

📊 Method Comparison

| Method | Speed | Efficiency | Best For | Complexity |
|---|---|---|---|---|
| Grid Search | ⭐ | ⭐ | Small search spaces, few parameters | Low |
| Random Search | ⭐⭐⭐ | ⭐⭐⭐ | Medium-large spaces, quick exploration | Low |
| Bayesian Optimization | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Expensive models, limited budget | Medium |
| Successive Halving | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Large datasets, many configurations | Medium |
| Hyperband | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Neural networks, adaptive budgets | High |
| Genetic Algorithms | ⭐⭐⭐ | ⭐⭐⭐⭐ | Pipeline optimization, AutoML | High |

🔄 Complete Optimization Workflow

1. Establish Baseline: Train the model with default parameters and record its performance.
2. Define Search Space: Identify the important hyperparameters and reasonable ranges.
3. Choose Optimization Method: Select based on time budget and search space size.
4. Run Initial Search: Start with random search or Bayesian optimization.
5. Analyze Results: Examine parameter importance and interactions.
6. Refine Search: Narrow the range around the best parameters and run a focused search.
7. Validate on Test Set: Evaluate the final model on held-out test data.
8. Document & Deploy: Record the final hyperparameters and performance metrics.

Complete Example: End-to-End Optimization

import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
import optuna

# 1. Prepare data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 2. Baseline model
baseline_model = RandomForestClassifier(random_state=42)
baseline_score = cross_val_score(baseline_model, X_train, y_train, cv=5).mean()
print(f"Baseline CV Score: {baseline_score:.3f}")

# 3. Define optimization objective
def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 10, 50),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2']),
        'random_state': 42
    }
    model = RandomForestClassifier(**params)
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score

# 4. Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100, show_progress_bar=True)

# 5. Results
print(f"\n{'='*50}")
print(f"Best CV Score: {study.best_value:.3f}")
print(f"Improvement: {(study.best_value - baseline_score)*100:.2f} percentage points")
print("Best Parameters:")
for key, value in study.best_params.items():
    print(f"  {key}: {value}")

# 6. Train final model with best parameters
# (best_params contains only the suggested values, so re-add the fixed random_state)
final_model = RandomForestClassifier(**study.best_params, random_state=42)
final_model.fit(X_train, y_train)

# 7. Evaluate on test set
test_score = final_model.score(X_test, y_test)
print(f"\nTest Set Score: {test_score:.3f}")
print(f"{'='*50}")

# 8. Visualize optimization history
import matplotlib.pyplot as plt
fig = optuna.visualization.matplotlib.plot_optimization_history(study)
plt.show()
fig = optuna.visualization.matplotlib.plot_param_importances(study)
plt.show()

Best Practices & Tips

1. Start Simple, Then Expand

Begin with a small number of key hyperparameters. Once you understand their impact, expand your search.

💡 Tip: Focus first on learning_rate, regularization, and model complexity parameters.

2. Use Logarithmic Scales for Learning Rates

Learning rates are best searched on a logarithmic scale (e.g., 0.001, 0.01, 0.1 rather than 0.001, 0.002, 0.003).

💡 Code: trial.suggest_float('lr', 1e-5, 1e-1, log=True)

3. Always Use Cross-Validation

Evaluate hyperparameters with CV to get robust estimates and avoid overfitting to validation set.

💡 Recommended: Use 5-fold or 10-fold cross-validation during tuning.

4. Set a Time Budget

Hyperparameter optimization has diminishing returns. Set a reasonable time limit.

💡 Example: study.optimize(objective, timeout=3600) # 1 hour

5. Monitor Overfitting During Tuning

Track both training and validation performance. Large gaps indicate overfitting.

💡 Warning: If CV score >> test score, you may be overfitting the validation set.
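One way to watch this during tuning is to ask scikit-learn for training scores alongside validation scores; a minimal sketch, assuming a model such as the RandomForestClassifier from the earlier examples:

from sklearn.model_selection import cross_validate

cv_results = cross_validate(model, X_train, y_train, cv=5,
                            return_train_score=True)
train_mean = cv_results['train_score'].mean()
val_mean = cv_results['test_score'].mean()

# A large gap between training and validation scores suggests overfitting
print(f"Train: {train_mean:.3f}  Validation: {val_mean:.3f}  Gap: {train_mean - val_mean:.3f}")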

6. Document Everything

Keep detailed records of all experiments, parameters, and results for reproducibility.

💡 Tools: Use MLflow, Weights & Biases, or Optuna dashboard for tracking.
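As one example of lightweight tracking, MLflow can log the winning parameters and score from the Optuna study used above (the experiment name here is made up):

import mlflow

mlflow.set_experiment("rf-hyperparameter-tuning")   # hypothetical experiment name
with mlflow.start_run():
    mlflow.log_params(study.best_params)            # best hyperparameters found
    mlflow.log_metric("best_cv_accuracy", study.best_value)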

7. Use Early Stopping for Iterative Models

Stop training when validation performance stops improving to save time and prevent overfitting.

💡 Example: XGBoost early_stopping_rounds parameter
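A hedged sketch of early stopping with the XGBoost scikit-learn API (recent versions take early_stopping_rounds in the constructor; older versions passed it to fit). The X_val/y_val validation split is an assumption:

from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=1000,            # upper bound; early stopping picks the real number
    learning_rate=0.05,
    early_stopping_rounds=50,     # stop if no improvement for 50 rounds
    eval_metric='logloss'
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print(model.best_iteration)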

8. Consider Computational Cost

Balance performance gains against training time and computational resources.

💡 Rule: If 10% more trials only improve by <1%, stop tuning.

⚠️ Common Pitfalls to Avoid

❌ Using Test Set for Hyperparameter Tuning

Problem: Leads to overly optimistic performance estimates

Solution: Only use train + validation for tuning; test set for final evaluation only
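A simple way to respect this split, sketched below: carve a validation set out of the training data (or rely on cross-validation), and touch the test set exactly once at the end.

from sklearn.model_selection import train_test_split

# First split off the test set, then a validation set from what remains
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)

# Tune on (X_train, y_train) + (X_val, y_val); evaluate on (X_test, y_test) only once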

❌ Too Many Hyperparameters at Once

Problem: Exponential growth in search space, wasted computation

Solution: Start with 3-5 most important parameters, then expand gradually

❌ Ignoring Domain Knowledge

Problem: Searching unrealistic parameter ranges

Solution: Use literature and documentation to set sensible ranges

❌ Not Fixing Random Seeds

Problem: Results not reproducible, hard to compare runs

Solution: Always set random_state/seed in models and search algorithms

❌ Optimizing Wrong Metric

Problem: High accuracy doesn’t mean good business outcomes

Solution: Choose metrics aligned with your actual objectives (F1, precision, recall, etc.)
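All of the search tools above accept a scoring argument, so aligning the metric is usually a one-line change. For example, reusing the rf estimator and param_grid from the grid-search example (the choice of F1 is illustrative):

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, f1_score

# Optimize F1 instead of accuracy; any sklearn scorer string also works (e.g. 'recall')
search = GridSearchCV(rf, param_grid, cv=5, scoring=make_scorer(f1_score), n_jobs=-1)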

🛠️ Popular Tools & Libraries

Scikit-learn

GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV

Best for: Traditional ML models

Optuna

Advanced Bayesian optimization framework

Best for: Any ML framework, highly customizable

Hyperopt

Bayesian optimization with TPE algorithm

Best for: Complex search spaces
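Since Hyperopt is the only library listed here without an example elsewhere on the page, here is a minimal sketch of its TPE interface tuning the same random forest (X_train/y_train assumed). Note that Hyperopt minimizes, so the score is negated:

from hyperopt import fmin, tpe, hp, Trials
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

space = {
    'n_estimators': hp.choice('n_estimators', [100, 200, 300, 400, 500]),
    'max_depth': hp.choice('max_depth', [10, 20, 30, 40, 50]),
}

def objective(params):
    model = RandomForestClassifier(random_state=42, **params)
    # Hyperopt minimizes the objective, so return the negative accuracy
    return -cross_val_score(model, X_train, y_train, cv=5).mean()

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)   # for hp.choice, fmin reports the index of the chosen value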

Ray Tune

Scalable hyperparameter tuning for distributed systems

Best for: Large-scale experiments, deep learning

Keras Tuner

Hyperparameter tuning for Keras/TensorFlow

Best for: Neural architecture search

Weights & Biases

Experiment tracking with hyperparameter sweeps

Best for: Tracking and visualization

Quick Installation

# Install core libraries
pip install scikit-learn optuna hyperopt "ray[tune]" keras-tuner wandb

# For AutoML
pip install tpot auto-sklearn


© 2026 ML Learning Resources | Complete Machine Learning Pipeline Series
