NumPy - np.interp() - Linear Interpolation
Key Insights
- np.interp() performs one-dimensional linear interpolation, finding intermediate values between known data points using straight-line segments
- The function handles extrapolation automatically by returning boundary values for out-of-range inputs, though you can customize this behavior
- Understanding interpolation is critical for resampling time series, filling missing data, and mapping values between different scales in data pipelines
Understanding Linear Interpolation
Linear interpolation estimates unknown values that fall between known data points by drawing straight lines between consecutive points. Given two points (x₀, y₀) and (x₁, y₁), the interpolated value at any x between x₀ and x₁ follows the formula: y = y₀ + (x - x₀) × (y₁ - y₀) / (x₁ - x₀).
NumPy’s np.interp() automates this process across entire datasets. The function signature is:
np.interp(x, xp, fp, left=None, right=None, period=None)
Where x contains the points to interpolate, xp are known x-coordinates, and fp are corresponding y-values.
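To see that the formula and the function agree, here is a quick sketch with two arbitrary points, computing the same query by hand and through np.interp():

```python
import numpy as np

# Two known points (x0, y0) and (x1, y1), chosen arbitrarily
x0, y0 = 2.0, 10.0
x1, y1 = 6.0, 30.0

# Manual linear interpolation at x = 3
x = 3.0
manual = y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Same query through np.interp
auto = np.interp(x, [x0, x1], [y0, y1])

print(manual, auto)  # 15.0 15.0
```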
Basic Interpolation
Start with a simple example interpolating between known temperature readings:
import numpy as np
# Known data points: hours and temperatures
hours = np.array([0, 6, 12, 18, 24])
temps = np.array([15.0, 12.5, 22.0, 19.5, 16.0])
# Interpolate temperature at specific times
query_hours = np.array([3, 9, 15, 21])
interpolated_temps = np.interp(query_hours, hours, temps)
print("Query hours:", query_hours)
print("Interpolated temps:", interpolated_temps)
# Output: [13.75 17.25 20.75 17.75]
The function finds where each query point falls within the known data and calculates the intermediate value. At hour 3 (between 0 and 6), the temperature is 13.75°C, exactly halfway between 15.0 and 12.5.
Handling Extrapolation
By default, np.interp() returns boundary values for queries outside the known range:
hours = np.array([6, 12, 18])
temps = np.array([12.5, 22.0, 19.5])
# Query points outside the range
query_hours = np.array([0, 9, 24])
result = np.interp(query_hours, hours, temps)
print(result)
# Output: [12.5 17.25 19.5]
# 0 -> 12.5 (left boundary)
# 9 -> 17.25 (interpolated)
# 24 -> 19.5 (right boundary)
Customize extrapolation behavior using left and right parameters:
# Use specific values for out-of-range queries
result = np.interp(query_hours, hours, temps, left=-999, right=-999)
print(result)
# Output: [-999. 17.25 -999.]
# Use NaN for clarity
result = np.interp(query_hours, hours, temps, left=np.nan, right=np.nan)
print(result)
# Output: [nan 17.25 nan]
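The NaN sentinels pair naturally with a boolean mask when you need to drop or flag out-of-range queries afterward; a small sketch continuing the example above:

```python
import numpy as np

hours = np.array([6, 12, 18])
temps = np.array([12.5, 22.0, 19.5])
query_hours = np.array([0, 9, 24])

result = np.interp(query_hours, hours, temps, left=np.nan, right=np.nan)

# Boolean mask: True where the query fell inside the known range
in_range = ~np.isnan(result)

print(query_hours[in_range])  # [9]
print(result[in_range])       # [17.25]
```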
Resampling Time Series Data
A common use case is resampling irregularly-spaced data to regular intervals:
# Irregular sensor readings (timestamp, value)
timestamps = np.array([0.0, 1.3, 3.7, 5.2, 8.9, 10.0])
sensor_values = np.array([100, 105, 98, 102, 110, 108])
# Create regular 1-second intervals
regular_timestamps = np.arange(0, 10.1, 1.0)
resampled_values = np.interp(regular_timestamps, timestamps, sensor_values)
print("Original timestamps:", timestamps)
print("Regular timestamps:", regular_timestamps)
print("Resampled values:", resampled_values)
# Output: [100. 103.84615385 102.95833333 100.04166667 98.8 101.46666667
#  103.72972973 105.89189189 108.05405405 109.81818182 108.]
This technique is essential for synchronizing data from multiple sensors or preparing data for algorithms that require uniform sampling.
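As a sketch of that sensor-fusion use case, two hypothetical sensors sampled on mismatched clocks can be aligned by interpolating both onto one shared grid (all timestamps and readings here are invented for illustration):

```python
import numpy as np

# Two hypothetical sensors with mismatched sample times
temp_times = np.array([0.0, 2.1, 4.3, 6.0])
temp_vals = np.array([20.0, 21.0, 23.0, 22.0])

humidity_times = np.array([0.5, 3.0, 5.5])
humidity_vals = np.array([40.0, 45.0, 43.0])

# Shared 1-second grid covering both series
common_times = np.arange(0.0, 6.1, 1.0)
temp_on_grid = np.interp(common_times, temp_times, temp_vals)
humidity_on_grid = np.interp(common_times, humidity_times, humidity_vals)

# The two series are now row-aligned and can be stacked or compared
aligned = np.column_stack([common_times, temp_on_grid, humidity_on_grid])
print(aligned.shape)  # (7, 3)
```

Note that humidity at t = 0 falls before that sensor's first reading, so it is clamped to the boundary value, as described in the extrapolation section.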
Mapping Between Scales
Use interpolation to map values from one scale to another, useful for color mapping, normalization, or custom transformations:
# Map test scores (0-100) to letter grades with custom boundaries
score_boundaries = np.array([0, 60, 70, 80, 90, 100])
grade_values = np.array([0, 1, 2, 3, 4, 5]) # F, D, C, B, A, A+
# Student scores
scores = np.array([45, 65, 75, 85, 95, 100])
grade_indices = np.interp(scores, score_boundaries, grade_values)
grade_names = ['F', 'D', 'C', 'B', 'A', 'A+']
for score, idx in zip(scores, grade_indices):
    # Floor to the grade band the score falls in
    print(f"Score {score}: {grade_names[int(idx)]}")
# Output:
# Score 45: F
# Score 65: D
# Score 75: C
# Score 85: B
# Score 95: A
# Score 100: A+
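When the goal is pure bucketing rather than smooth in-between values, np.digitize expresses the same mapping more directly (same scores and grade names as above):

```python
import numpy as np

scores = np.array([45, 65, 75, 85, 95, 100])
grade_names = ['F', 'D', 'C', 'B', 'A', 'A+']

# Bin edges: below 60 lands in bin 0 ('F'), 60-69 in bin 1 ('D'), and so on
bins = np.array([60, 70, 80, 90, 100])
grade_bins = np.digitize(scores, bins)

for score, idx in zip(scores, grade_bins):
    print(f"Score {score}: {grade_names[idx]}")
```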
Periodic Interpolation
The period parameter enables interpolation on circular data like angles or time-of-day:
# Sun elevation angles throughout the day (in degrees)
hours_of_day = np.array([0, 6, 12, 18, 23])
sun_elevation = np.array([0, 30, 90, 30, 5])
# Without period: treats 23 and 0 as disconnected
query = np.array([23.5])
result_no_period = np.interp(query, hours_of_day, sun_elevation)
print(f"Without period: {result_no_period}") # [5.]
# With period=24: understands 23.5 is near 0
result_with_period = np.interp(query, hours_of_day, sun_elevation, period=24)
print(f"With period: {result_with_period}") # [2.5]
This is critical for handling compass bearings, seasonal data, or any cyclical measurements where the end connects to the beginning.
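A compass example makes the wraparound concrete; the bearings and signal strengths below are invented for illustration:

```python
import numpy as np

# Hypothetical signal strength measured at four compass bearings (degrees)
bearings = np.array([0, 90, 180, 270])
strength = np.array([10.0, 20.0, 30.0, 40.0])

# 315 deg sits halfway between 270 (40.0) and 360, which wraps to 0 (10.0)
result = np.interp(315, bearings, strength, period=360)
print(result)  # approximately 25.0
```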
Filling Missing Data
Replace NaN values in datasets using interpolation from valid neighbors:
# Dataset with missing values
data_points = np.array([0, 1, 2, 3, 4, 5, 6])
measurements = np.array([10.0, 12.0, np.nan, np.nan, 18.0, np.nan, 22.0])
# Find valid (non-NaN) indices
valid_mask = ~np.isnan(measurements)
valid_indices = data_points[valid_mask]
valid_values = measurements[valid_mask]
# Interpolate missing values
filled_measurements = np.interp(data_points, valid_indices, valid_values)
print("Original:", measurements)
print("Filled:", filled_measurements)
# Output:
# Original: [10. 12. nan nan 18. nan 22.]
# Filled: [10. 12. 14. 16. 18. 20. 22.]
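One caveat: this recipe fills leading and trailing NaNs with the nearest valid value, because np.interp clamps out-of-range queries to the boundaries. Pass left and right if you would rather keep those edges marked as missing:

```python
import numpy as np

measurements = np.array([np.nan, 12.0, np.nan, 18.0, np.nan])
data_points = np.arange(len(measurements))

valid_mask = ~np.isnan(measurements)

# Default: edge NaNs are clamped to the nearest valid reading
clamped = np.interp(data_points, data_points[valid_mask],
                    measurements[valid_mask])
print(clamped)  # [12. 12. 15. 18. 18.]

# Keep edge gaps as NaN; only interior gaps are filled
interior_only = np.interp(data_points, data_points[valid_mask],
                          measurements[valid_mask], left=np.nan, right=np.nan)
print(interior_only)  # [nan 12. 15. 18. nan]
```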
Performance Considerations
np.interp() locates each query point with a binary search, giving O(log n) lookup per query. For large-scale interpolation, pass all queries in a single vectorized call rather than looping:
import time
# Large dataset
xp = np.linspace(0, 1000, 10000)
fp = np.sin(xp) * np.exp(-xp/1000)
# Many query points
x = np.random.uniform(0, 1000, 100000)
# Single vectorized call (fast)
start = time.time()
result = np.interp(x, xp, fp)
vectorized_time = time.time() - start
# Loop-based approach (slow)
start = time.time()
result_loop = np.array([np.interp(xi, xp, fp) for xi in x])
loop_time = time.time() - start
print(f"Vectorized: {vectorized_time:.4f}s")
print(f"Loop-based: {loop_time:.4f}s")
print(f"Speedup: {loop_time/vectorized_time:.1f}x")
# Typical output: Vectorized is 50-100x faster
Limitations and Alternatives
np.interp() only handles one-dimensional data and assumes x-coordinates are monotonically increasing. For more complex scenarios:
Unsorted x-coordinates: Sort before interpolating:
xp = np.array([3, 1, 4, 2, 5])
fp = np.array([30, 10, 40, 20, 50])
# Sort by x-coordinates
sort_indices = np.argsort(xp)
xp_sorted = xp[sort_indices]
fp_sorted = fp[sort_indices]
result = np.interp([2.5], xp_sorted, fp_sorted)
print(result) # [25.]
Multi-dimensional interpolation: Use scipy.interpolate.RegularGridInterpolator or griddata:
from scipy.interpolate import RegularGridInterpolator
# For 2D interpolation (interp2d was removed in SciPy 1.14)
x = np.array([0, 1, 2])
y = np.array([0, 1, 2])
z = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])  # z[i, j] = x[i] + y[j]
f = RegularGridInterpolator((x, y), z, method='linear')
result = f([[0.5, 0.5]])
print(result)  # [1.]
Non-linear interpolation: Use scipy.interpolate.CubicSpline (or the legacy scipy.interpolate.interp1d with kind='cubic') or other spline methods for smoother curves between points.
Practical Application: Signal Processing
Combine interpolation with other NumPy operations for real-world signal processing:
# Simulate ADC sampling at irregular intervals due to jitter
true_time = np.linspace(0, 1, 1000)
true_signal = np.sin(2 * np.pi * 5 * true_time)
# Irregular sampling with timing jitter
sample_times = np.sort(np.random.uniform(0, 1, 100))
sampled_signal = np.interp(sample_times, true_time, true_signal)
# Reconstruct on regular grid
regular_time = np.linspace(0, 1, 500)
reconstructed = np.interp(regular_time, sample_times, sampled_signal)
# Calculate reconstruction error
true_at_regular = np.interp(regular_time, true_time, true_signal)
rmse = np.sqrt(np.mean((reconstructed - true_at_regular)**2))
print(f"Reconstruction RMSE: {rmse:.6f}")
Linear interpolation provides a fast, reliable method for estimating intermediate values in numerical datasets. While limited to one dimension and straight-line segments, its simplicity and performance make it the default choice for resampling, gap-filling, and scale mapping operations in data processing pipelines.