How to Perform Grid Search in Python

Key Insights

  • Grid search exhaustively tests every combination of hyperparameters you specify, making it ideal for small parameter spaces where thoroughness matters more than speed—but computational cost grows exponentially with each added parameter.
  • Scikit-learn’s GridSearchCV automates the entire process with built-in cross-validation, parallel processing, and comprehensive result tracking, eliminating the need for manual loops and validation splits.
  • Start with a coarse grid using wide parameter ranges, identify promising regions, then perform refined searches. This two-stage approach can cut computation time dramatically, often by an order of magnitude, while preserving model quality.

Introduction to Hyperparameter Tuning

Hyperparameters are the configuration settings you choose before training begins—learning rate, tree depth, regularization strength. Unlike model parameters (weights and biases learned during training), hyperparameters control the learning process itself. Getting them right is the difference between a model that barely beats random guessing and one that achieves production-grade performance.
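The distinction is easy to see in code. In this minimal sketch (the estimator choice is just for illustration), max_depth is a hyperparameter we fix up front, while the tree structure is a learned parameter that only exists after fitting:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameter: fixed before training ever starts
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# Parameters: the tree structure and split thresholds learned by fit()
model.fit(X, y)

print(model.get_params()["max_depth"])  # the setting we chose: 3
print(model.tree_.node_count)           # learned structure, discovered from data
```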

Grid search solves this problem through brute force: test every combination of hyperparameters you specify, evaluate each with cross-validation, and select the winner. Compared to manual tuning (guessing and checking), grid search is systematic and reproducible. Compared to random search (sampling combinations at random), grid search guarantees that the best combination among the values you specify gets evaluated, though random search often finds “good enough” solutions much faster when many hyperparameters are involved.

Understanding Grid Search Basics

Grid search creates a multidimensional grid where each axis represents one hyperparameter. For a support vector machine with two hyperparameters—C (regularization) and gamma (kernel coefficient)—testing 5 values of C and 4 values of gamma creates 20 combinations. Add a third hyperparameter with 3 values, and you’re now testing 60 combinations. This exponential growth is grid search’s fundamental limitation.
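The arithmetic above is easy to verify directly: each new hyperparameter multiplies the number of combinations. A quick sketch using itertools (the value lists are illustrative):

```python
from itertools import product

C_values = [0.1, 1, 10, 100, 1000]    # 5 values
gamma_values = [0.001, 0.01, 0.1, 1]  # 4 values
degree_values = [2, 3, 4]             # 3 values

# Every (C, gamma) pair: 5 * 4 = 20 combinations
two_param_grid = list(product(C_values, gamma_values))
print(len(two_param_grid))  # 20

# A third axis multiplies again: 5 * 4 * 3 = 60 combinations
three_param_grid = list(product(C_values, gamma_values, degree_values))
print(len(three_param_grid))  # 60
```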

Here’s what grid search looks like under the hood:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Define parameter grid manually
C_values = [0.1, 1, 10]
gamma_values = [0.001, 0.01, 0.1]

best_score = 0
best_params = {}

# Manual grid search
for C in C_values:
    for gamma in gamma_values:
        model = SVC(C=C, gamma=gamma)
        scores = cross_val_score(model, X, y, cv=5)
        mean_score = np.mean(scores)
        
        if mean_score > best_score:
            best_score = mean_score
            best_params = {'C': C, 'gamma': gamma}
        
        print(f"C={C}, gamma={gamma}: {mean_score:.3f}")

print(f"\nBest parameters: {best_params}")
print(f"Best score: {best_score:.3f}")

This manual approach works but requires explicit loops, manual tracking of results, and careful bookkeeping. Scikit-learn’s GridSearchCV handles all of this automatically.

Grid Search with Scikit-learn

GridSearchCV wraps any scikit-learn estimator and automates the entire grid search process. You define the parameter grid as a dictionary, specify cross-validation strategy, and let it run.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load and split data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf', 'linear']
}

# Initialize GridSearchCV
grid_search = GridSearchCV(
    estimator=SVC(),
    param_grid=param_grid,
    cv=5,                    # 5-fold cross-validation
    scoring='accuracy',      # Metric to optimize
    n_jobs=-1,              # Use all CPU cores
    verbose=2               # Print progress
)

# Perform grid search
grid_search.fit(X_train, y_train)

# Results
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.3f}")
print(f"Test set score: {grid_search.score(X_test, y_test):.3f}")

Key parameters to understand:

  • param_grid: Dictionary where keys are parameter names and values are lists to try
  • cv: Number of cross-validation folds (5 or 10 are standard)
  • scoring: Metric for evaluation ('accuracy', 'f1', 'roc_auc', etc.)
  • n_jobs=-1: Parallelize across all CPU cores for significant speedup
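One detail worth knowing: with the default refit=True, GridSearchCV retrains the winning configuration on the full training set after the search, so the fitted object can predict directly. A minimal sketch with a deliberately tiny grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

grid_search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)
grid_search.fit(X_train, y_train)

# best_estimator_ is the winning model, refit on all of X_train
best_model = grid_search.best_estimator_
preds = best_model.predict(X_test)

# grid_search.predict() delegates to best_estimator_, so both agree
assert (preds == grid_search.predict(X_test)).all()
```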

Analyzing Grid Search Results

The cv_results_ attribute contains detailed information about every combination tested. Extract and visualize this data to understand your parameter space:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Convert results to DataFrame
results_df = pd.DataFrame(grid_search.cv_results_)

# Display key columns
print(results_df[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']].head(10))

# For 2D parameter visualization (RBF kernel only)
rbf_results = results_df[results_df['param_kernel'] == 'rbf']

# Create pivot table for heatmap
pivot_table = rbf_results.pivot_table(
    values='mean_test_score',
    index='param_C',
    columns='param_gamma'
)

# Visualize
plt.figure(figsize=(10, 6))
sns.heatmap(pivot_table, annot=True, fmt='.3f', cmap='viridis')
plt.title('Grid Search Results: Accuracy by C and Gamma (RBF Kernel)')
plt.xlabel('Gamma')
plt.ylabel('C')
plt.tight_layout()
plt.savefig('grid_search_heatmap.png', dpi=300)
plt.show()

This heatmap reveals patterns: perhaps accuracy plateaus at high C values, or gamma shows a clear optimum. These insights guide refined searches.

Advanced Grid Search Techniques

Custom Scoring Functions

Built-in metrics don’t always match business requirements. Create custom scorers for domain-specific objectives:

from sklearn.metrics import make_scorer, fbeta_score

# Custom scorer: F2 score (weights recall over precision)
f2_scorer = make_scorer(fbeta_score, beta=2, average='weighted')

# Grid search with custom scoring
grid_search = GridSearchCV(
    estimator=SVC(),
    param_grid=param_grid,
    cv=5,
    scoring=f2_scorer,  # Use custom metric
    n_jobs=-1
)

grid_search.fit(X_train, y_train)
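The scoring argument also accepts a dictionary of named metrics, letting you track several at once; refit must then name the metric that selects best_estimator_. A hedged sketch (the keys 'acc' and 'f1w' are arbitrary labels of our choosing):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Track two metrics per combination; 'acc' decides the winner
grid_search = GridSearchCV(
    SVC(),
    {'C': [0.1, 1, 10]},
    cv=3,
    scoring={'acc': 'accuracy', 'f1w': 'f1_weighted'},
    refit='acc',
)
grid_search.fit(X, y)

# cv_results_ gains one score column per named metric
print('mean_test_acc' in grid_search.cv_results_)
print('mean_test_f1w' in grid_search.cv_results_)
```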

Pipeline Integration

Preprocessing steps (scaling, feature selection) have their own hyperparameters. Use pipelines to tune the entire workflow:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_selection', SelectKBest(f_classif)),
    ('classifier', SVC())
])

# Define parameter grid for entire pipeline
param_grid = {
    'feature_selection__k': [2, 3, 4],  # Number of features
    'classifier__C': [0.1, 1, 10],
    'classifier__gamma': [0.001, 0.01, 0.1],
    'classifier__kernel': ['rbf']
}

# Grid search on pipeline
grid_search = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)
print(f"Best pipeline parameters: {grid_search.best_params_}")

This approach prevents data leakage by ensuring preprocessing happens inside each cross-validation fold.
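A bonus of tuning the whole pipeline: the winning preprocessing settings can be inspected after the search. This sketch, pared down from the pipeline above, retrieves which features the best SelectKBest kept:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_selection', SelectKBest(f_classif)),
    ('classifier', SVC()),
])

search = GridSearchCV(pipeline, {'feature_selection__k': [2, 3]}, cv=3)
search.fit(X, y)

# Which input features did the best pipeline keep?
selector = search.best_estimator_.named_steps['feature_selection']
mask = selector.get_support()
print(mask)  # boolean mask over the original feature columns
```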

Best Practices and Optimization Tips

Start coarse, then refine. Don’t immediately test 50 values per parameter. Begin with wide ranges and few values:

# Stage 1: Coarse grid
coarse_grid = {
    'C': [0.001, 0.1, 10, 1000],
    'gamma': [0.0001, 0.01, 1, 100]
}

# Suppose best is C=10, gamma=0.01

# Stage 2: Refined grid around best region
refined_grid = {
    'C': [5, 7.5, 10, 12.5, 15],
    'gamma': [0.005, 0.0075, 0.01, 0.0125, 0.015]
}

Use logarithmic scales for parameters that span orders of magnitude (learning rates, regularization). Instead of [1, 2, 3, 4, 5], use [0.001, 0.01, 0.1, 1, 10].
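NumPy can generate log-spaced candidates directly rather than typing them out, for example:

```python
import numpy as np

# Five candidates, one per decade: 0.001, 0.01, 0.1, 1, 10
C_values = np.logspace(-3, 1, num=5)
print(C_values)
```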

Monitor computational cost. Calculate expected runtime before starting:

import time

n_combinations = 1
for values in param_grid.values():
    n_combinations *= len(values)

# Rough estimate: assume ~0.5 seconds per single fit,
# times combinations, times 5 CV folds
estimated_minutes = (0.5 * n_combinations * 5) / 60
print(f"Estimated runtime: {estimated_minutes:.1f} minutes")
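The 0.5-second figure above is a guess. A more reliable approach is to time one full cross-validation and scale up. A self-contained sketch (the dataset and grid are illustrative stand-ins):

```python
import time

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}

# Time one full 5-fold cross-validation; all folds are included,
# so no extra factor of 5 is needed below
start = time.perf_counter()
cross_val_score(SVC(), X, y, cv=5)
seconds_per_combination = time.perf_counter() - start

n_combinations = 1
for values in param_grid.values():
    n_combinations *= len(values)

estimated_minutes = seconds_per_combination * n_combinations / 60
print(f"Estimated runtime: {estimated_minutes:.2f} minutes")
```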

When parameter spaces grow large (6+ hyperparameters or 1000+ combinations), switch to RandomizedSearchCV:

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform

# Define distributions instead of discrete values
param_distributions = {
    'C': loguniform(0.001, 1000),
    'gamma': loguniform(0.0001, 100),
    'kernel': ['rbf', 'linear']
}

# Sample 50 random combinations
random_search = RandomizedSearchCV(
    estimator=SVC(),
    param_distributions=param_distributions,
    n_iter=50,  # Number of combinations to try
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

random_search.fit(X_train, y_train)
print(f"Best parameters: {random_search.best_params_}")

Random search often finds near-optimal solutions in 10-20% of the time grid search requires. For even better efficiency, consider Bayesian optimization libraries like Optuna or Hyperopt, which learn from previous trials to intelligently select the next combination to test.

Grid search remains the gold standard for small, well-defined parameter spaces where you need guaranteed coverage and interpretable results. Master it first, then graduate to more sophisticated methods as your problems scale.
