How to Perform Polynomial Regression in Python


Key Insights

  • Polynomial regression extends linear regression to capture curved relationships by adding squared, cubed, and higher-degree terms—it’s still “linear” in the coefficients, which means you can use the same LinearRegression class after transforming your features.
  • Choosing the right polynomial degree is critical: too low and you underfit, too high and you overfit. Cross-validation is your best tool for finding the sweet spot.
  • Feature scaling becomes essential for higher-degree polynomials because terms like x⁵ can explode to enormous values, causing numerical instability and poor model performance.

Introduction to Polynomial Regression

Linear regression works beautifully when your data follows a straight line. But real-world relationships are often curved—think diminishing returns, exponential growth, or seasonal patterns. When you plot your residuals and see a clear pattern rather than random scatter, linear regression is failing you.

Polynomial regression solves this by extending the familiar y = mx + b to include higher powers of x:

y = β₀ + β₁x + β₂x² + β₃x³ + ... + βₙxⁿ

Here’s the key insight: despite the curved output, polynomial regression is still a form of linear regression. The model is linear in its coefficients (the β values). We’re just creating new features (x², x³, etc.) from our original feature. This means we can use the exact same fitting algorithms—ordinary least squares still works perfectly.

This distinction matters because it means polynomial regression inherits all the nice properties of linear regression: closed-form solutions, well-understood statistics, and fast computation. The tradeoff is that we need to choose the polynomial degree carefully.
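To make the "linear in the coefficients" point concrete, here is a minimal sketch that builds the polynomial columns by hand and solves for the coefficients with plain ordinary least squares via NumPy's lstsq. The quadratic data here is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.5 * x**2 - x + 2 + rng.normal(0, 0.1, x.size)  # noisy quadratic

# Design matrix [1, x, x^2]: the model is a linear function of beta,
# so ordinary least squares applies unchanged
A = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # close to the true coefficients [2, -1, 1.5]
```

The curvature lives entirely in the columns of the design matrix; the solver never knows the features are powers of the same variable.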

Setting Up Your Environment

You need three libraries for polynomial regression in Python: NumPy for numerical operations, scikit-learn for the modeling pipeline, and Matplotlib for visualization.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, r2_score

# Set random seed for reproducibility
np.random.seed(42)

# Generate synthetic data with a cubic relationship
n_samples = 100
X = np.linspace(-3, 3, n_samples).reshape(-1, 1)
y_true = 0.5 * X.ravel()**3 - 2 * X.ravel()**2 + X.ravel() + 3
y = y_true + np.random.normal(0, 2, n_samples)  # Add noise

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")

This generates 100 data points following a cubic relationship with added Gaussian noise. The noise simulates real-world measurement error and makes the problem more realistic. Without noise, any sufficiently high-degree polynomial would fit perfectly—not a useful test case.

Implementing Polynomial Regression with scikit-learn

The scikit-learn approach uses PolynomialFeatures to transform your input features, then fits a standard LinearRegression model on the transformed data. The cleanest way to combine these steps is with a pipeline.

# Method 1: Step by step (for understanding)
poly_features = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly_features.fit_transform(X)

print(f"Original features: {X.shape[1]}")
print(f"Polynomial features: {X_poly.shape[1]}")
print(f"Feature names: {poly_features.get_feature_names_out()}")

# Fit linear regression on polynomial features
model = LinearRegression()
model.fit(X_poly, y)

# Make predictions
y_pred = model.predict(X_poly)

print(f"\nCoefficients: {model.coef_}")
print(f"Intercept: {model.intercept_:.4f}")
print(f"R² score: {r2_score(y, y_pred):.4f}")

# Method 2: Using a pipeline (recommended for production)
poly_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression()
)

poly_model.fit(X, y)
y_pred_pipeline = poly_model.predict(X)

print(f"R² score (pipeline): {r2_score(y, y_pred_pipeline):.4f}")

The pipeline approach is cleaner and prevents a common bug: forgetting to transform your test data the same way you transformed your training data. With a pipeline, calling predict() automatically applies all transformations.
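To see this in action, here is a sketch with a held-out test set (regenerating the synthetic cubic data from above, with a slightly different random generator). The pipeline re-applies the polynomial transform inside predict(), so there is no separate transform step to forget:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel()**3 - 2 * X.ravel()**2 + X.ravel() + 3 + rng.normal(0, 2, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression()
)
model.fit(X_train, y_train)                   # fit transform + regression on train only
test_r2 = r2_score(y_test, model.predict(X_test))  # transform reapplied automatically
print(f"Test R²: {test_r2:.3f}")
```

Doing this manually, you would have to call fit_transform on the training set and transform (not fit_transform) on the test set; the pipeline makes that mistake impossible.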

Note the include_bias=False parameter. PolynomialFeatures can add a column of ones (the bias term), but LinearRegression already handles this with its fit_intercept parameter. Including both creates redundancy.
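A quick check makes the redundancy visible. With the default include_bias=True, the transformed matrix carries a leading column of ones:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])

# Default include_bias=True: columns are [1, x, x^2]
print(PolynomialFeatures(degree=2).fit_transform(X))
# [[1. 2. 4.]
#  [1. 3. 9.]]

# include_bias=False: only [x, x^2]; LinearRegression fits the intercept itself
print(PolynomialFeatures(degree=2, include_bias=False).fit_transform(X))
# [[2. 4.]
#  [3. 9.]]
```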

Choosing the Right Polynomial Degree

This is where most people go wrong. A degree-1 polynomial is just linear regression—it might underfit curved data. A degree-20 polynomial will fit your training data almost perfectly but will perform terribly on new data. This is the bias-variance tradeoff in action.

Cross-validation gives you an honest estimate of how well each degree will generalize:

degrees = range(1, 11)
train_scores = []
cv_scores = []
cv_stds = []

for degree in degrees:
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression()
    )
    
    # Training score
    model.fit(X, y)
    train_pred = model.predict(X)
    train_scores.append(r2_score(y, train_pred))
    
    # Cross-validation score (5-fold)
    cv_score = cross_val_score(model, X, y, cv=5, scoring='r2')
    cv_scores.append(cv_score.mean())
    cv_stds.append(cv_score.std())

# Find best degree
best_degree = degrees[np.argmax(cv_scores)]
print(f"Best polynomial degree: {best_degree}")
print(f"Best CV R² score: {max(cv_scores):.4f}")

# Display results table
print("\nDegree | Train R² | CV R² (mean ± std)")
print("-" * 45)
for i, degree in enumerate(degrees):
    print(f"  {degree:2d}   |  {train_scores[i]:.4f}  | {cv_scores[i]:.4f} ± {cv_stds[i]:.4f}")

Watch for the classic overfitting pattern: training scores keep improving as degree increases, but cross-validation scores peak and then decline. The gap between training and CV scores is your overfitting indicator.

Visualizing Results

A picture is worth a thousand R² scores. Plotting your data with multiple polynomial fits makes the underfitting/overfitting tradeoff immediately obvious:

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

degrees_to_plot = [1, 3, 9]
X_plot = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)

for ax, degree in zip(axes, degrees_to_plot):
    # Fit model
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression()
    )
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    
    # Calculate scores
    y_pred = model.predict(X)
    r2 = r2_score(y, y_pred)
    cv_score = cross_val_score(model, X, y, cv=5, scoring='r2').mean()
    
    # Plot
    ax.scatter(X, y, alpha=0.6, label='Data', color='steelblue')
    ax.plot(X_plot, y_plot, color='red', linewidth=2, label=f'Degree {degree}')
    ax.set_xlabel('X')
    ax.set_ylabel('y')
    ax.set_title(f'Degree {degree}\nTrain R²={r2:.3f}, CV R²={cv_score:.3f}')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('polynomial_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

# Comprehensive comparison plot
plt.figure(figsize=(10, 6))

plt.scatter(X, y, alpha=0.6, label='Data', color='steelblue', s=30)

colors = plt.cm.viridis(np.linspace(0, 1, len([1, 2, 3, 5, 9])))
for degree, color in zip([1, 2, 3, 5, 9], colors):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression()
    )
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    plt.plot(X_plot, y_plot, color=color, linewidth=2, label=f'Degree {degree}')

plt.xlabel('X', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Polynomial Regression: Comparing Different Degrees', fontsize=14)
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Practical Considerations and Pitfalls

Feature scaling is non-negotiable for high-degree polynomials. Consider a feature value of 10. Its square is 100, its cube is 1,000, and its fifth power is 100,000. These wildly different scales produce an ill-conditioned design matrix, causing numerical instability and unreliable coefficient estimates.

from sklearn.preprocessing import StandardScaler

# Proper pipeline with scaling
robust_poly_model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=5, include_bias=False),
    LinearRegression()
)

robust_poly_model.fit(X, y)
print(f"R² with scaling: {r2_score(y, robust_poly_model.predict(X)):.4f}")

Extrapolation is dangerous. Polynomial models can produce absurd predictions outside the training data range. A well-fitted cubic might predict negative values or runaway growth just beyond your data boundaries, because the highest-degree term dominates as soon as you leave the fitted range. Always be skeptical of polynomial predictions at the edges.

# Demonstrate extrapolation danger
X_extrapolate = np.linspace(-5, 5, 200).reshape(-1, 1)
model = make_pipeline(
    PolynomialFeatures(degree=5, include_bias=False),
    LinearRegression()
)
model.fit(X, y)

plt.figure(figsize=(10, 5))
plt.scatter(X, y, alpha=0.6, label='Training data')
plt.plot(X_extrapolate, model.predict(X_extrapolate), 'r-', label='Degree 5 fit')
plt.axvline(X.min(), color='gray', linestyle='--', alpha=0.5)
plt.axvline(X.max(), color='gray', linestyle='--', alpha=0.5)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Extrapolation Danger: Model Goes Wild Outside Training Range')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Know when to use alternatives. If you need a high-degree polynomial (say, degree 7+), consider these options instead:

  • Ridge or Lasso regression with polynomial features adds regularization to prevent overfitting
  • Splines provide local flexibility without the global instability of high-degree polynomials
  • Gaussian Process regression handles complex patterns with built-in uncertainty quantification
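As a sketch of the first option: Ridge drops into the pipeline in place of LinearRegression, letting you use a deliberately high degree while L2 regularization shrinks the higher-order coefficients. The alpha value here is illustrative, not tuned, and the data is the same synthetic cubic regenerated with a different random generator:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel()**3 - 2 * X.ravel()**2 + X.ravel() + 3 + rng.normal(0, 2, 100)

# Degree 10 would normally overfit; the L2 penalty keeps the
# higher-order coefficients small
ridge_model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=10, include_bias=False),
    Ridge(alpha=1.0),
)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
score = cross_val_score(ridge_model, X, y, cv=cv, scoring='r2').mean()
print(f"CV R² (degree 10 + Ridge): {score:.3f}")
```

In practice you would tune alpha (and the degree) with cross-validation rather than fixing them by hand.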

Conclusion

Polynomial regression is a powerful technique for capturing non-linear relationships while staying within the familiar linear regression framework. The workflow is straightforward:

  1. Transform features using PolynomialFeatures
  2. Fit a standard LinearRegression model
  3. Use cross-validation to select the optimal degree
  4. Scale your features for numerical stability
  5. Never trust extrapolations

For more complex scenarios, explore Ridge regression (sklearn.linear_model.Ridge) or Lasso (sklearn.linear_model.Lasso) combined with polynomial features. These add regularization that shrinks coefficients toward zero, reducing overfitting without manually limiting the polynomial degree.

The key is treating polynomial degree as a hyperparameter to be tuned, not a fixed choice. Let cross-validation guide you, visualize your results, and always question whether that high-degree polynomial is capturing real patterns or just memorizing noise.
