NumPy - np.vectorize() Function | Application Architect

Key Insights

• np.vectorize() creates a vectorized function that operates element-wise on arrays, but it’s primarily a convenience wrapper, not a performance optimization tool
• The function automatically handles broadcasting and type conversion, making it ideal for applying custom Python functions to NumPy arrays without explicit loops
• For performance-critical code, prefer native NumPy operations, universal functions (ufuncs), or Numba’s @vectorize decorator instead

Understanding np.vectorize()

The np.vectorize() function transforms a scalar function into a vectorized function that accepts and operates on NumPy arrays. Despite the name, it doesn’t provide true vectorization benefits—it’s essentially a for loop wrapper that simplifies syntax.

import numpy as np

# Scalar function
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

# Vectorize it
vectorized_convert = np.vectorize(celsius_to_fahrenheit)

temperatures_c = np.array([0, 10, 20, 30, 40])
temperatures_f = vectorized_convert(temperatures_c)

print(temperatures_f)  # [32. 50. 68. 86. 104.]

The function iterates over input elements internally, applying the scalar function to each. This is functionally equivalent to a list comprehension but integrates seamlessly with NumPy’s array ecosystem.
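
To make that equivalence concrete, the sketch below compares the wrapper with an explicit list comprehension and with the plain array expression. It reuses celsius_to_fahrenheit from above; the names via_wrapper, via_loop, and via_numpy are illustrative only.

```python
import numpy as np

def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

temperatures_c = np.array([0, 10, 20, 30, 40])

# 1. np.vectorize: the scalar function is applied element by element in Python
via_wrapper = np.vectorize(celsius_to_fahrenheit)(temperatures_c)

# 2. The explicit loop it stands in for
via_loop = np.array([celsius_to_fahrenheit(c) for c in temperatures_c])

# 3. Arithmetic-only functions already accept whole arrays, no wrapper needed
via_numpy = celsius_to_fahrenheit(temperatures_c)

print(np.array_equal(via_wrapper, via_loop))   # True
print(np.array_equal(via_wrapper, via_numpy))  # True
```

When the function body is pure arithmetic, as here, the third form is usually the one to keep; the wrapper earns its place only once the body contains branching or calls that do not accept arrays.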

Basic Usage Patterns

np.vectorize() accepts various callable objects and configuration parameters. The otypes parameter explicitly defines output data types, while excluded specifies parameters that shouldn’t be vectorized.

# Multiple input arrays
def calculate_bmi(weight_kg, height_m):
    return weight_kg / (height_m ** 2)

bmi_calculator = np.vectorize(calculate_bmi)

weights = np.array([70, 85, 60, 95])
heights = np.array([1.75, 1.80, 1.65, 1.90])

bmis = bmi_calculator(weights, heights)
print(np.round(bmis, 2))  # [22.86 26.23 22.04 26.32]

# Specifying output type
def is_even(n):
    return n % 2 == 0

check_even = np.vectorize(is_even, otypes=[bool])
numbers = np.array([1, 2, 3, 4, 5, 6])
print(check_even(numbers))  # [False  True False  True False  True]
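
Setting otypes is more than documentation. When it is omitted, np.vectorize() calls the function on the first element and uses that result to choose the output dtype, so a function that returns an int first and floats later can lose the fractional part. A minimal sketch of the pitfall, using a hypothetical halve_if_odd function:

```python
import numpy as np

def halve_if_odd(n):
    # Returns the input unchanged for even n, a float for odd n
    return n if n % 2 == 0 else n / 2

values = np.array([2, 3, 5])

# dtype inferred from the first result (an integer), so 1.5 and 2.5 are cast to int
inferred = np.vectorize(halve_if_odd)(values)

# dtype stated up front, so every result is stored as a float
explicit = np.vectorize(halve_if_odd, otypes=[float])(values)

print(inferred)  # [2 1 2]
print(explicit)  # [2.  1.5 2.5]
```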

Working with Complex Return Types

np.vectorize() handles functions returning tuples, strings, or other complex types. The signature parameter enables operations on array slices rather than individual elements.

# Returning multiple values
def stats(value):
    return (value, value**2, value**3)

vectorized_stats = np.vectorize(stats)
inputs = np.array([2, 3, 4])
result = vectorized_stats(inputs)

print("Original:", result[0])  # [2 3 4]
print("Squared:", result[1])   # [4 9 16]
print("Cubed:", result[2])     # [8 27 64]

# String operations
def categorize_age(age):
    if age < 18:
        return "minor"
    elif age < 65:
        return "adult"
    else:
        return "senior"

categorize = np.vectorize(categorize_age, otypes=[str])
ages = np.array([15, 25, 45, 70, 80])
print(categorize(ages))  # ['minor' 'adult' 'adult' 'senior' 'senior']
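
For the tuple-returning case, note that the result is a tuple of arrays (one per returned value), not a single 2D array. The sketch below shows two common ways to consume it; the names original, squared, cubed, and table are illustrative.

```python
import numpy as np

def stats(value):
    return (value, value**2, value**3)

vectorized_stats = np.vectorize(stats)
result = vectorized_stats(np.array([2, 3, 4]))

print(type(result).__name__, len(result))  # tuple 3

# Option 1: unpack the per-output arrays
original, squared, cubed = result

# Option 2: stack them into one table with a row per input element
table = np.column_stack(result)
print(table.shape)  # (3, 3)
print(table[1])     # [ 3  9 27]
```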

Excluding Parameters from Vectorization

The excluded parameter passes specific arguments to the function unchanged on every call instead of broadcasting over them. This is useful for configuration parameters, thresholds, or constants.

def threshold_filter(value, threshold, mode='above'):
    if mode == 'above':
        return value if value > threshold else 0
    else:
        return value if value < threshold else 0

# Vectorize only the 'value' parameter
filter_func = np.vectorize(threshold_filter, excluded=['threshold', 'mode'])

data = np.array([10, 25, 30, 45, 60, 15])
filtered_above = filter_func(data, threshold=30, mode='above')
filtered_below = filter_func(data, threshold=30, mode='below')

print("Above 30:", filtered_above)  # [ 0  0  0 45 60  0]
print("Below 30:", filtered_below)  # [10 25  0  0  0 15]
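
excluded also accepts integer positions, which helps when the pass-through argument is itself a sequence that must not be iterated over. The following sketch assumes a hypothetical poly_eval helper that takes a whole coefficient list as its first positional argument.

```python
import numpy as np

def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs are highest degree first."""
    total = 0
    for c in coeffs:
        total = total * x + c
    return total

# excluded={0}: positional argument 0 (coeffs) is handed over whole on every call,
# while x is still iterated element by element
vpoly = np.vectorize(poly_eval, excluded={0})

values = vpoly([1, 2, 3], np.array([0, 1, 2]))  # x**2 + 2*x + 3
print(values)  # [ 3  6 11]
```

Without the exclusion, the coefficient list would be broadcast against x like any other array argument, and poly_eval would receive single numbers where it expects a list.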

Signature Parameter for Array Processing

The signature parameter allows vectorization over higher-dimensional subarrays, similar to generalized universal functions.

# Process rows of a 2D array
def normalize_vector(vec):
    norm = np.sqrt(np.sum(vec**2))
    return vec / norm if norm > 0 else vec

# Signature: each 1D array (n) maps to another 1D array (n)
normalize = np.vectorize(normalize_vector, signature='(n)->(n)')

vectors = np.array([[3, 4],
                    [5, 12],
                    [8, 15]])

normalized = normalize(vectors)
print(np.round(normalized, 2))
# [[0.6  0.8 ]
#  [0.38 0.92]
#  [0.47 0.88]]

# Verify normalization
norms = np.linalg.norm(normalized, axis=1)
print(norms)  # [1. 1. 1.]
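
Signatures can also take several core inputs and reduce them to a scalar. As a sketch, assuming a hypothetical weighted_mean helper, the signature '(n),(n)->()' maps each pair of length-n vectors to one number, and a single weights vector is broadcast against every row of scores:

```python
import numpy as np

def weighted_mean(values, weights):
    # Operates on one 1D vector of values and one 1D vector of weights
    return np.sum(values * weights) / np.sum(weights)

wmean = np.vectorize(weighted_mean, signature='(n),(n)->()')

scores = np.array([[80.0, 90.0, 100.0],
                   [60.0, 70.0, 80.0]])
weights = np.array([1.0, 1.0, 2.0])

per_row = wmean(scores, weights)
print(per_row)  # [92.5 72.5]

# The same result with native operations, for comparison
native = (scores * weights).sum(axis=1) / weights.sum()
print(np.allclose(per_row, native))  # True
```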

Performance Considerations

np.vectorize() provides no performance advantage over Python loops. For computational efficiency, use native NumPy operations or compile functions with Numba.

import time

# Test function
def custom_operation(x):
    return np.sin(x) ** 2 + np.cos(x) ** 2

data = np.random.rand(1000000)

# Method 1: np.vectorize
vectorized = np.vectorize(custom_operation)
start = time.perf_counter()
result1 = vectorized(data)
time_vectorize = time.perf_counter() - start

# Method 2: Native NumPy
start = time.perf_counter()
result2 = np.sin(data) ** 2 + np.cos(data) ** 2
time_numpy = time.perf_counter() - start

# Method 3: List comprehension
start = time.perf_counter()
result3 = np.array([custom_operation(x) for x in data])
time_loop = time.perf_counter() - start

# Example timings from one run; yours will differ
print(f"np.vectorize: {time_vectorize:.4f}s")  # ~0.85s
print(f"Native NumPy: {time_numpy:.4f}s")      # ~0.02s
print(f"List comp:    {time_loop:.4f}s")       # ~0.90s

Native NumPy operations are roughly 40x faster in this run (about 0.02s versus 0.85s); the exact ratio depends on hardware, array size, and NumPy version. Use np.vectorize() when:

• Prototyping or working with small datasets
• The function contains complex logic unsuitable for NumPy operations
• Code readability outweighs performance requirements
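
Single wall-clock measurements are noisy, so before acting on a comparison like the one above it may be worth repeating it with timeit. The sketch below is one way to do that on a smaller array; the array size and repeat counts are arbitrary choices, and no timings are quoted because they vary by machine.

```python
import timeit
import numpy as np

def custom_operation(x):
    return np.sin(x) ** 2 + np.cos(x) ** 2

small = np.random.default_rng(0).random(10_000)
vectorized = np.vectorize(custom_operation)

# Best of 3 repeats, 5 calls each, reported per call
t_vec = min(timeit.repeat(lambda: vectorized(small), number=5, repeat=3)) / 5
t_np = min(timeit.repeat(lambda: custom_operation(small), number=5, repeat=3)) / 5

print(f"np.vectorize: {t_vec:.6f}s per call")
print(f"Native NumPy: {t_np:.6f}s per call")
```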

Practical Applications

np.vectorize() excels when applying business logic, conditional transformations, or external function calls across arrays.

# Applying business rules
def calculate_discount(price, quantity):
    if quantity >= 100:
        return price * 0.80  # 20% discount
    elif quantity >= 50:
        return price * 0.90  # 10% discount
    elif quantity >= 10:
        return price * 0.95  # 5% discount
    return price

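# otypes=[float] matters here: without it the dtype is inferred from the first
# result (an unchanged integer price), and later float results are cast to int
discount_calc = np.vectorize(calculate_discount, otypes=[float])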

prices = np.array([100, 200, 150, 80])
quantities = np.array([5, 60, 120, 25])

final_prices = discount_calc(prices, quantities)
print(final_prices)  # [100. 180. 120.  76.]

# Wrapping pure-Python logic (here, the standard-library datetime module)
from datetime import datetime, timedelta

def add_business_days(date_str, days):
    date = datetime.strptime(date_str, '%Y-%m-%d')
    current = date
    added = 0
    while added < days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday-Friday
            added += 1
    return current.strftime('%Y-%m-%d')

add_days = np.vectorize(add_business_days, excluded=['days'])

dates = np.array(['2024-01-15', '2024-01-16', '2024-01-17'])
result = add_days(dates, days=5)
print(result)  # ['2024-01-22' '2024-01-23' '2024-01-24']
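
One edge case to keep in mind when such functions run inside a pipeline: an empty batch. Without otypes, np.vectorize() has no first element from which to infer the output dtype and raises ValueError on size-0 input. A minimal sketch, using a hypothetical apply_fee rule:

```python
import numpy as np

def apply_fee(amount):
    # Hypothetical rule: 2% fee on amounts above 100
    return amount * 1.02 if amount > 100 else amount

empty = np.array([], dtype=float)

# No otypes: the output dtype cannot be inferred from an empty input
try:
    np.vectorize(apply_fee)(empty)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# With otypes the call succeeds and returns an empty float array
safe = np.vectorize(apply_fee, otypes=[float])
out = safe(empty)
print(out.shape, out.dtype)  # (0,) float64
```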

Alternatives and Best Practices

For production code, consider these alternatives:

# Numba for numerical performance
from numba import vectorize

@vectorize(['float64(float64, float64)'])
def numba_bmi(weight, height):
    return weight / (height ** 2)

# Often far faster than np.vectorize on large arrays; benchmark on your own data

# NumPy's where for conditional logic
prices = np.array([100, 200, 150, 80])
quantities = np.array([5, 60, 120, 25])

discounts = np.where(quantities >= 100, 0.80,
            np.where(quantities >= 50, 0.90,
            np.where(quantities >= 10, 0.95, 1.0)))

final_prices = prices * discounts

# Array broadcasting for element-wise operations
def complex_calc(x, y, z):
    return (x[:, None, None] + y[None, :, None]) * z[None, None, :]

x, y, z = np.arange(3), np.arange(4), np.arange(5)
result = complex_calc(x, y, z)  # Shape: (3, 4, 5)
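
When the tiers grow beyond two or three, nested np.where calls become hard to read. np.select expresses the same first-match-wins logic as a flat list of conditions; the sketch below reproduces the discount example that way and checks it against the nested form.

```python
import numpy as np

prices = np.array([100, 200, 150, 80])
quantities = np.array([5, 60, 120, 25])

# Conditions are checked in order; the first one that is True picks the value
conditions = [quantities >= 100, quantities >= 50, quantities >= 10]
multipliers = [0.80, 0.90, 0.95]
discounts = np.select(conditions, multipliers, default=1.0)

final_prices = prices * discounts
print(final_prices)  # [100. 180. 120.  76.]

# Same result as the nested np.where version
nested = np.where(quantities >= 100, 0.80,
         np.where(quantities >= 50, 0.90,
         np.where(quantities >= 10, 0.95, 1.0)))
print(np.allclose(discounts, nested))  # True
```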

Use np.vectorize() as a development tool and readability enhancer, not a performance optimizer. Profile your code and migrate performance-critical sections to native NumPy operations or compiled alternatives.