NumPy - np.vectorize() Function
Key Insights
• np.vectorize() creates a vectorized function that operates element-wise on arrays, but it’s primarily a convenience wrapper—not a performance optimization tool
• The function automatically handles broadcasting and type conversion, making it ideal for applying custom Python functions to NumPy arrays without explicit loops
• For performance-critical code, prefer native NumPy operations, universal functions (ufuncs), or Numba’s @vectorize decorator instead
Understanding np.vectorize()
The np.vectorize() function transforms a scalar function into a vectorized function that accepts and operates on NumPy arrays. Despite the name, it doesn’t provide true vectorization benefits—it’s essentially a for loop wrapper that simplifies syntax.
import numpy as np

# Scalar function
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

# Vectorize it
vectorized_convert = np.vectorize(celsius_to_fahrenheit)
temperatures_c = np.array([0, 10, 20, 30, 40])
temperatures_f = vectorized_convert(temperatures_c)
print(temperatures_f)  # [ 32.  50.  68.  86. 104.]
The function iterates over input elements internally, applying the scalar function to each. This is functionally equivalent to a list comprehension but integrates seamlessly with NumPy’s array ecosystem.
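The equivalence with a hand-written loop can be checked directly. The following is a minimal sketch, reusing the conversion function from above with a hypothetical 2D input (`grid_c`), that compares the vectorized call with a list comprehension over the flattened array:

```python
import numpy as np

def celsius_to_fahrenheit(celsius):
    return (celsius * 9 / 5) + 32

vectorized_convert = np.vectorize(celsius_to_fahrenheit)

# 2D input: the vectorized call keeps the input shape
grid_c = np.array([[0, 10], [20, 30]])
via_vectorize = vectorized_convert(grid_c)

# Hand-written equivalent: flatten, loop in Python, then restore the shape
via_loop = np.array(
    [celsius_to_fahrenheit(c) for c in grid_c.ravel()]
).reshape(grid_c.shape)

print(np.array_equal(via_vectorize, via_loop))  # True
```

The results match; what the wrapper saves is the flatten-and-reshape bookkeeping, not the per-element Python call.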
Basic Usage Patterns
np.vectorize() accepts various callable objects and configuration parameters. The otypes parameter explicitly defines output data types, while excluded specifies parameters that shouldn’t be vectorized.
# Multiple input arrays
def calculate_bmi(weight_kg, height_m):
    return weight_kg / (height_m ** 2)

bmi_calculator = np.vectorize(calculate_bmi)
weights = np.array([70, 85, 60, 95])
heights = np.array([1.75, 1.80, 1.65, 1.90])
bmis = bmi_calculator(weights, heights)
print(bmis)  # approximately [22.86 26.23 22.04 26.32]

# Specifying output type
def is_even(n):
    return n % 2 == 0

check_even = np.vectorize(is_even, otypes=[bool])
numbers = np.array([1, 2, 3, 4, 5, 6])
print(check_even(numbers))  # [False  True False  True False  True]
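One reason to set otypes explicitly: per the NumPy documentation, when it is omitted the output type is determined by evaluating the function on the first element of the input. The sketch below uses a hypothetical function, `halve_odd`, that returns an int for even inputs and a float for odd ones, to show how the inferred type can silently truncate later results:

```python
import numpy as np

def halve_odd(n):
    # Hypothetical example: int for even inputs, float for odd inputs
    if n % 2 == 0:
        return n
    return n / 2

data = np.array([2, 3, 5])

# First element (2) yields an int, so the whole output is cast to an integer dtype
inferred = np.vectorize(halve_odd)(data)

# Declaring the output type keeps the fractional parts
explicit = np.vectorize(halve_odd, otypes=[float])(data)

print(inferred)  # [2 1 2]
print(explicit)  # [2.  1.5 2.5]
```

When a function can return mixed numeric types, declaring otypes is the safer default.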
Working with Complex Return Types
np.vectorize() handles functions returning tuples, strings, or other complex types. A function that returns a tuple produces a tuple of arrays, one per returned value. The signature parameter enables operations on array slices rather than individual elements.
# Returning multiple values
def stats(value):
    return (value, value**2, value**3)

vectorized_stats = np.vectorize(stats)
inputs = np.array([2, 3, 4])
result = vectorized_stats(inputs)
print("Original:", result[0])  # [2 3 4]
print("Squared:", result[1])   # [ 4  9 16]
print("Cubed:", result[2])     # [ 8 27 64]

# String operations
def categorize_age(age):
    if age < 18:
        return "minor"
    elif age < 65:
        return "adult"
    else:
        return "senior"

categorize = np.vectorize(categorize_age, otypes=[str])
ages = np.array([15, 25, 45, 70, 80])
print(categorize(ages))  # ['minor' 'adult' 'adult' 'senior' 'senior']
Excluding Parameters from Vectorization
The excluded parameter names arguments that are passed to the function unchanged on every call instead of being iterated over. This is useful for configuration parameters, thresholds, or constants.
def threshold_filter(value, threshold, mode='above'):
    if mode == 'above':
        return value if value > threshold else 0
    else:
        return value if value < threshold else 0

# Vectorize only the 'value' parameter
filter_func = np.vectorize(threshold_filter, excluded=['threshold', 'mode'])
data = np.array([10, 25, 30, 45, 60, 15])
filtered_above = filter_func(data, threshold=30, mode='above')
filtered_below = filter_func(data, threshold=30, mode='below')
print("Above 30:", filtered_above)  # [ 0  0  0 45 60  0]
# 30 is not strictly below the threshold, so it is zeroed as well
print("Below 30:", filtered_below)  # [10 25  0  0  0 15]
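The NumPy documentation describes excluded as a set of strings or integers, so positional arguments can be excluded by index as well as by name. The sketch below assumes a hypothetical polynomial evaluator, `poly_eval`, whose coefficient list should reach the function whole rather than being iterated over:

```python
import numpy as np

def poly_eval(coeffs, x):
    # Horner's rule: coeffs are ordered from highest degree to constant term
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# Exclude positional argument 0 (coeffs); only x is vectorized
vpoly = np.vectorize(poly_eval, excluded={0})

# Evaluate x**2 + 2*x + 3 at several points
values = vpoly([1, 2, 3], np.array([0, 1, 2]))
print(values)  # [ 3  6 11]
```

Without the exclusion, NumPy would try to broadcast the three coefficients against the three x values and call the function with one scalar coefficient at a time.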
Signature Parameter for Array Processing
The signature parameter allows vectorization over higher-dimensional subarrays, similar to generalized universal functions.
# Process rows of a 2D array
def normalize_vector(vec):
    norm = np.sqrt(np.sum(vec**2))
    return vec / norm if norm > 0 else vec

# Signature: each 1D array (n) maps to another 1D array (n)
normalize = np.vectorize(normalize_vector, signature='(n)->(n)')
vectors = np.array([[3, 4],
                    [5, 12],
                    [8, 15]])
normalized = normalize(vectors)
print(normalized)
# Rounded to two decimals:
# [[0.6  0.8 ]
#  [0.38 0.92]
#  [0.47 0.88]]

# Verify normalization
norms = np.linalg.norm(normalized, axis=1)
print(norms)  # [1. 1. 1.]
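Signatures can also take several core arguments and reduce them to a scalar. As a minimal sketch, assuming a hypothetical `cosine_similarity` helper, the signature `(n),(n)->()` maps two 1D vectors to one number, and the loop dimensions broadcast as usual:

```python
import numpy as np

def cosine_similarity(a, b):
    # Operates on two 1D vectors and returns a scalar
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two 1D inputs of equal length (n) map to a scalar ()
row_cosine = np.vectorize(cosine_similarity, signature='(n),(n)->()')

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0])

# Each row of A is compared with b; b is broadcast across the rows
sims = row_cosine(A, b)
print(sims.shape)  # (3,)
```

Here the similarities come out as 1, about 0.707, and 0 for the three rows.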
Performance Considerations
np.vectorize() provides no performance advantage over Python loops. For computational efficiency, use native NumPy operations or compile functions with Numba.
import time

# Test function
def custom_operation(x):
    return np.sin(x) ** 2 + np.cos(x) ** 2

data = np.random.rand(1000000)

# Method 1: np.vectorize
vectorized = np.vectorize(custom_operation)
start = time.perf_counter()  # perf_counter is the clock intended for benchmarking
result1 = vectorized(data)
time_vectorize = time.perf_counter() - start

# Method 2: Native NumPy
start = time.perf_counter()
result2 = np.sin(data) ** 2 + np.cos(data) ** 2
time_numpy = time.perf_counter() - start

# Method 3: List comprehension
start = time.perf_counter()
result3 = np.array([custom_operation(x) for x in data])
time_loop = time.perf_counter() - start

# Sample timings from one run; actual values vary by machine
print(f"np.vectorize: {time_vectorize:.4f}s")  # ~0.85s
print(f"Native NumPy: {time_numpy:.4f}s")      # ~0.02s
print(f"List comp: {time_loop:.4f}s")          # ~0.90s
Native NumPy operations are roughly 40x faster in this example (about 0.85s versus 0.02s); exact timings depend on hardware and NumPy version. Use np.vectorize() when:
- Prototyping or working with small datasets
- The function contains complex logic unsuitable for NumPy operations
- Code readability outweighs performance requirements
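Single-shot timings like those above can be noisy. One way to get steadier numbers, sketched here with the standard-library timeit module, a smaller array so it runs quickly, and a hypothetical `best_time` helper, is to repeat each measurement and keep the fastest run:

```python
import timeit
import numpy as np

def custom_operation(x):
    return np.sin(x) ** 2 + np.cos(x) ** 2

# Smaller than the 1,000,000-element array above so the sketch finishes quickly
data = np.random.rand(10_000)
vectorized = np.vectorize(custom_operation)

def best_time(fn, repeats=3):
    """Return the fastest of several single-run timings, in seconds."""
    return min(timeit.repeat(fn, number=1, repeat=repeats))

timings = {
    "np.vectorize": best_time(lambda: vectorized(data)),
    "native NumPy": best_time(lambda: np.sin(data) ** 2 + np.cos(data) ** 2),
    "list comprehension": best_time(
        lambda: np.array([custom_operation(x) for x in data])
    ),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds:.5f}s")
```

No expected output is shown because the numbers depend on the machine; the ordering of the three methods is the part to compare.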
Practical Applications
np.vectorize() excels when applying business logic, conditional transformations, or external function calls across arrays.
# Applying business rules
def calculate_discount(price, quantity):
    if quantity >= 100:
        return price * 0.80  # 20% discount
    elif quantity >= 50:
        return price * 0.90  # 10% discount
    elif quantity >= 10:
        return price * 0.95  # 5% discount
    return price

# otypes=[float] is required here: the first element hits the no-discount
# branch and returns an int, so an inferred output type would be integer
discount_calc = np.vectorize(calculate_discount, otypes=[float])
prices = np.array([100, 200, 150, 80])
quantities = np.array([5, 60, 120, 25])
final_prices = discount_calc(prices, quantities)
print(final_prices)  # [100. 180. 120.  76.]

# Working with standard-library functions
from datetime import datetime, timedelta

def add_business_days(date_str, days):
    date = datetime.strptime(date_str, '%Y-%m-%d')
    current = date
    added = 0
    while added < days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday-Friday
            added += 1
    return current.strftime('%Y-%m-%d')

add_days = np.vectorize(add_business_days, excluded=['days'])
dates = np.array(['2024-01-15', '2024-01-16', '2024-01-17'])
result = add_days(dates, days=5)
print(result)  # ['2024-01-22' '2024-01-23' '2024-01-24']
Alternatives and Best Practices
For production code, consider these alternatives:
# Numba for numerical performance (requires the numba package)
from numba import vectorize

@vectorize(['float64(float64, float64)'])
def numba_bmi(weight, height):
    return weight / (height ** 2)
# Compiled ufunc; typically far faster than np.vectorize for large arrays,
# but benchmark on your own workload

# NumPy's where for conditional logic
prices = np.array([100, 200, 150, 80])
quantities = np.array([5, 60, 120, 25])
discounts = np.where(quantities >= 100, 0.80,
                     np.where(quantities >= 50, 0.90,
                              np.where(quantities >= 10, 0.95, 1.0)))
final_prices = prices * discounts

# Array broadcasting for element-wise operations
def complex_calc(x, y, z):
    return (x[:, None, None] + y[None, :, None]) * z[None, None, :]

x, y, z = np.arange(3), np.arange(4), np.arange(5)
result = complex_calc(x, y, z)  # Shape: (3, 4, 5)
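Nested np.where calls become hard to read as the number of tiers grows. A flatter option, sketched here with the same prices and quantities, is np.select, which takes the first matching condition for each element:

```python
import numpy as np

prices = np.array([100, 200, 150, 80])
quantities = np.array([5, 60, 120, 25])

# Conditions are checked in order; the first match wins for each element
conditions = [quantities >= 100, quantities >= 50, quantities >= 10]
multipliers = [0.80, 0.90, 0.95]

# Elements matching no condition fall back to the default (no discount)
discounts = np.select(conditions, multipliers, default=1.0)
final_prices = prices * discounts
print(final_prices)  # [100. 180. 120.  76.]
```

Because the conditions are ordered from the largest threshold down, each quantity lands in its highest applicable tier, matching the nested np.where version.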
Use np.vectorize() as a development tool and readability enhancer, not a performance optimizer. Profile your code and migrate performance-critical sections to native NumPy operations or compiled alternatives.