Pandas - Apply Lambda Function to Column

Key Insights

• Lambda functions with apply() provide a concise way to transform DataFrame columns without writing separate function definitions, ideal for simple operations like string manipulation, mathematical transformations, and conditional logic.
• Use Series.apply() or Series.map() for element-wise transformations of a single column, DataFrame.apply() with axis=1 for row-wise operations, and vectorized operations when performance matters. Lambda with apply() is slower than vectorized alternatives but can be more readable for complex logic.
• Lambda functions can access multiple columns simultaneously using axis=1, enabling sophisticated transformations that depend on relationships between different fields in your dataset.

Basic Lambda Application on Single Column

The apply() method combined with lambda functions offers a straightforward approach to transforming DataFrame columns. Here’s the fundamental pattern:

import pandas as pd

df = pd.DataFrame({
    'price': [100, 250, 175, 300, 425],
    'quantity': [2, 1, 3, 2, 1]
})

# Apply lambda to calculate 20% discount
df['discounted_price'] = df['price'].apply(lambda x: x * 0.8)

print(df)

Output:

   price  quantity  discounted_price
0    100         2              80.0
1    250         1             200.0
2    175         3             140.0
3    300         2             240.0
4    425         1             340.0

The lambda function receives each element from the price column as x and returns the transformed value. Pandas automatically creates a new Series with the results.
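For a single column, the same result can also come from Series.map() or from plain column arithmetic. The minimal sketch below reuses the price data from the example and checks that all three approaches agree; the variable names are illustrative only.

```python
import pandas as pd

# Same price data as the example above
df = pd.DataFrame({'price': [100, 250, 175, 300, 425]})

# Three ways to compute the same 20% discount
via_apply = df['price'].apply(lambda x: x * 0.8)  # calls the lambda once per element
via_map = df['price'].map(lambda x: x * 0.8)      # Series.map, also element-wise
vectorized = df['price'] * 0.8                    # one operation on the whole column

print(via_apply.equals(via_map), via_apply.equals(vectorized))
# True True
```

For simple arithmetic like this, the vectorized form is usually the one to prefer; the lambda pays off once the per-element logic gets more involved.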

String Transformations

Lambda functions excel at string manipulation tasks that would otherwise require verbose code:

df = pd.DataFrame({
    'name': ['john doe', 'jane smith', 'bob wilson'],
    'email': ['john@example.com', 'jane@example.com', 'bob@example.com']
})

# Capitalize names
df['name_formatted'] = df['name'].apply(lambda x: x.title())

# Extract email domain
df['domain'] = df['email'].apply(lambda x: x.split('@')[1])

# Create username from email
df['username'] = df['email'].apply(lambda x: x.split('@')[0].upper())

print(df)

Output:

         name             email name_formatted       domain username
0    john doe  john@example.com       John Doe  example.com     JOHN
1  jane smith  jane@example.com     Jane Smith  example.com     JANE
2  bob wilson   bob@example.com     Bob Wilson  example.com      BOB
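Pandas also exposes most of these string methods through the .str accessor, which avoids writing a lambda at all. The sketch below reproduces the same three columns that way, assuming the same sample data. For ordinary object columns the .str methods still loop over elements internally, so the main gain is readability and built-in handling of missing values rather than a guaranteed speedup.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john doe', 'jane smith', 'bob wilson'],
    'email': ['john@example.com', 'jane@example.com', 'bob@example.com']
})

# Same transformations as the lambda versions, via the .str accessor
df['name_formatted'] = df['name'].str.title()
parts = df['email'].str.split('@')           # each element becomes a list
df['domain'] = parts.str[1]                  # second list item
df['username'] = parts.str[0].str.upper()    # first list item, upper-cased

# Compare with the lambda version of one column
lambda_domain = df['email'].apply(lambda x: x.split('@')[1])
print(df['domain'].tolist() == lambda_domain.tolist())
# True
```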

Conditional Logic with Lambda

Lambda functions support inline conditional expressions using Python’s ternary operator:

df = pd.DataFrame({
    'temperature': [15, 25, 30, 10, 35],
    'humidity': [60, 70, 80, 50, 85]
})

# Categorize temperature
df['temp_category'] = df['temperature'].apply(
    lambda x: 'Hot' if x > 28 else ('Warm' if x > 20 else 'Cold')
)

# Flag high humidity (the comparison already returns True or False)
df['high_humidity'] = df['humidity'].apply(lambda x: x > 75)

print(df)

Output:

   temperature  humidity temp_category  high_humidity
0           15        60          Cold          False
1           25        70          Warm          False
2           30        80           Hot           True
3           10        50          Cold          False
4           35        85           Hot           True
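The nested ternary can also be expressed without apply(). As a sketch on the same sample data, np.select picks the first matching condition in order, and pd.cut encodes the same thresholds as bins; the extra temp_bin column exists only for comparison and is not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'temperature': [15, 25, 30, 10, 35],
    'humidity': [60, 70, 80, 50, 85]
})

# np.select checks conditions in order and takes the first match,
# mirroring the ternary: > 28 -> Hot, > 20 -> Warm, otherwise Cold
conditions = [df['temperature'] > 28, df['temperature'] > 20]
df['temp_category'] = np.select(conditions, ['Hot', 'Warm'], default='Cold')

# pd.cut encodes the same thresholds as right-closed bins:
# (-inf, 20] -> Cold, (20, 28] -> Warm, (28, inf] -> Hot
df['temp_bin'] = pd.cut(
    df['temperature'],
    bins=[-np.inf, 20, 28, np.inf],
    labels=['Cold', 'Warm', 'Hot']
)

# A plain comparison already yields a boolean column
df['high_humidity'] = df['humidity'] > 75

print(df)
```

np.select reads most naturally when the branches are arbitrary conditions; pd.cut fits when the branches are simply numeric ranges, and it returns a categorical column.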

Multi-Column Operations with axis=1

Access multiple columns simultaneously by setting axis=1, which passes each row to the lambda function as a Series indexed by column name:

df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'cost': [50, 75, 100, 125],
    'selling_price': [80, 100, 140, 150],
    'units_sold': [100, 150, 80, 120]
})

# Calculate profit margin percentage
df['profit_margin'] = df.apply(
    lambda row: ((row['selling_price'] - row['cost']) / row['selling_price']) * 100,
    axis=1
)

# Calculate total revenue
df['revenue'] = df.apply(
    lambda row: row['selling_price'] * row['units_sold'],
    axis=1
)

# Create status based on multiple conditions
df['status'] = df.apply(
    lambda row: 'High Performer'
    if row['profit_margin'] >= 25 and row['units_sold'] >= 100
    else 'Standard',
    axis=1
)

print(df.round(2))

Output:

  product  cost  selling_price  units_sold  profit_margin  revenue         status
0       A    50             80         100          37.50     8000  High Performer
1       B    75            100         150          25.00    15000  High Performer
2       C   100            140          80          28.57    11200        Standard
3       D   125            150         120          16.67    18000        Standard
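When each output depends only on arithmetic and comparisons between columns, the row-wise lambdas can be replaced by whole-column expressions. The sketch below assumes illustrative thresholds (MIN_MARGIN and MIN_UNITS are names introduced here, not part of the original example) and uses np.where for the if/else. Combined conditions on Series need & instead of and, with each comparison wrapped in parentheses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'cost': [50, 75, 100, 125],
    'selling_price': [80, 100, 140, 150],
    'units_sold': [100, 150, 80, 120]
})

# Illustrative thresholds, assumed for this sketch
MIN_MARGIN = 25
MIN_UNITS = 100

# Row-wise apply: one Python call per row
margin_apply = df.apply(
    lambda row: ((row['selling_price'] - row['cost']) / row['selling_price']) * 100,
    axis=1
)

# Vectorized: whole-column arithmetic, no per-row function calls
df['profit_margin'] = ((df['selling_price'] - df['cost']) / df['selling_price']) * 100
df['revenue'] = df['selling_price'] * df['units_sold']

# np.where replaces the if/else; & combines the boolean Series element-wise
df['status'] = np.where(
    (df['profit_margin'] >= MIN_MARGIN) & (df['units_sold'] >= MIN_UNITS),
    'High Performer',
    'Standard'
)

print(np.allclose(margin_apply.astype(float), df['profit_margin']))
print(df.round(2))
```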

Lambda with External Functions

Combine lambda functions with external libraries or custom functions for complex transformations:

import numpy as np
from datetime import datetime

df = pd.DataFrame({
    'date_string': ['2024-01-15', '2024-02-20', '2024-03-10'],
    'values': [10, -5, 15],
    'scores': [85, 92, 78]
})

# Parse dates
df['parsed_date'] = df['date_string'].apply(
    lambda x: datetime.strptime(x, '%Y-%m-%d')
)

# Apply a NumPy function (the built-in df['values'].abs() does the same)
df['abs_values'] = df['values'].apply(lambda x: np.abs(x))

# Custom function with lambda wrapper
def calculate_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    else:
        return 'C'

# The wrapper is optional here: .apply(calculate_grade) is equivalent
df['grade'] = df['scores'].apply(lambda x: calculate_grade(x))

print(df)

Output:

  date_string  values  scores parsed_date  abs_values grade
0  2024-01-15      10      85  2024-01-15          10     B
1  2024-02-20      -5      92  2024-02-20           5     A
2  2024-03-10      15      78  2024-03-10          15     C
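A lambda that only forwards its argument, like lambda x: calculate_grade(x), can be dropped in favor of passing the function itself. The sketch below uses a hypothetical version of calculate_grade with adjustable cutoff parameters (a_cutoff and b_cutoff are not in the original) to show when a lambda does add something, how Series.apply can forward keyword arguments instead, and pd.to_datetime as a column-wide alternative to calling datetime.strptime per element.

```python
import pandas as pd

# Hypothetical variant of calculate_grade with adjustable cutoffs
def calculate_grade(score, a_cutoff=90, b_cutoff=80):
    if score >= a_cutoff:
        return 'A'
    elif score >= b_cutoff:
        return 'B'
    return 'C'

scores = pd.Series([85, 92, 78])

# A lambda that only forwards its argument is optional
wrapped = scores.apply(lambda x: calculate_grade(x))
direct = scores.apply(calculate_grade)

# A lambda earns its place when extra arguments are needed...
strict_lambda = scores.apply(lambda x: calculate_grade(x, a_cutoff=95))
# ...though Series.apply also forwards extra keyword arguments to the function
strict_kwargs = scores.apply(calculate_grade, a_cutoff=95)

# For date parsing, pd.to_datetime converts the whole column in one call
dates = pd.to_datetime(
    pd.Series(['2024-01-15', '2024-02-20', '2024-03-10']),
    format='%Y-%m-%d'
)

print(direct.tolist(), strict_kwargs.tolist(), dates.dt.month.tolist())
# ['B', 'A', 'C'] ['B', 'B', 'C'] [1, 2, 3]
```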

Handling None and NaN Values

With apply(), missing values (None and NaN) reach the lambda like any other value, so the lambda needs explicit handling for missing data:

df = pd.DataFrame({
    'values': [10, None, 25, np.nan, 30],
    'names': ['Alice', None, 'Bob', 'Charlie', None]
})

# Safe mathematical operation
df['doubled'] = df['values'].apply(
    lambda x: x * 2 if pd.notna(x) else 0
)

# Safe string operation
df['name_length'] = df['names'].apply(
    lambda x: len(x) if pd.notna(x) else 0
)

print(df)

Output:

   values     names  doubled  name_length
0    10.0     Alice     20.0            5
1     NaN      None      0.0            0
2    25.0       Bob     50.0            3
3     NaN   Charlie      0.0            7
4    30.0      None     60.0            0
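Vectorized methods often handle missing data without any explicit checks. The sketch below, on the same sample data, reproduces both columns with fillna() and the .str accessor, and shows Series.map with na_action='ignore', which skips missing entries instead of passing them to the function. Filling with 0 simply mirrors the default chosen in the lambdas above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'values': [10, None, 25, np.nan, 30],
    'names': ['Alice', None, 'Bob', 'Charlie', None]
})

# Arithmetic propagates NaN; fillna(0) then mirrors the lambda's default
doubled = (df['values'] * 2).fillna(0)

# .str.len() returns NaN for missing entries instead of raising TypeError
name_length = df['names'].str.len().fillna(0).astype(int)

# na_action='ignore' leaves missing entries missing, so len() never sees None
mapped = df['names'].map(len, na_action='ignore')

print(doubled.tolist(), name_length.tolist())
```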

Performance Considerations

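Lambda with apply() is convenient but slower than vectorized operations: it makes one Python function call per element, while a vectorized expression runs in optimized NumPy code over the whole column. Compare approaches: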

import time

df = pd.DataFrame({
    'values': range(100000)
})

# Lambda approach: one Python call per element
# (perf_counter is a high-resolution clock suited to short intervals)
start = time.perf_counter()
df['lambda_result'] = df['values'].apply(lambda x: x * 2 + 10)
lambda_time = time.perf_counter() - start

# Vectorized approach: a single operation on the whole column
start = time.perf_counter()
df['vectorized_result'] = df['values'] * 2 + 10
vectorized_time = time.perf_counter() - start

print(f"Lambda time: {lambda_time:.4f}s")
print(f"Vectorized time: {vectorized_time:.4f}s")
print(f"Vectorized speedup: {lambda_time/vectorized_time:.1f}x")

Use vectorized operations when possible. Reserve lambda functions for:

  • Complex logic that can’t be vectorized
  • String operations requiring method chaining
  • Conditional transformations with multiple branches
  • Operations requiring external function calls

Common Patterns and Best Practices

df = pd.DataFrame({
    'text': ['hello world', 'PYTHON pandas', 'Data Science'],
    'numbers': [1, 2, 3]
})

# Chain multiple string methods
df['cleaned'] = df['text'].apply(
    lambda x: x.lower().strip().replace(' ', '_')
)

# Type conversion with error handling (pd.notna guards against NaN,
# which is a float that int() cannot convert)
df['safe_int'] = df['numbers'].apply(
    lambda x: int(x) if isinstance(x, (int, float)) and pd.notna(x) else 0
)

# Complex extraction
df['word_count'] = df['text'].apply(lambda x: len(x.split()))

print(df)
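Several of these patterns also have lambda-free equivalents. In the sketch below, the string cleanup and word count use chained .str methods, and pd.to_numeric with errors='coerce' offers a more forgiving integer conversion; the mixed Series is hypothetical sample data. Unlike the isinstance() check, to_numeric parses numeric strings such as '2' instead of mapping them to 0, so pick whichever behavior the data calls for.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['hello world', 'PYTHON pandas', 'Data Science'],
    'numbers': [1, 2, 3]
})

# .str chain equivalent to x.lower().strip().replace(' ', '_')
cleaned = df['text'].str.lower().str.strip().str.replace(' ', '_', regex=False)

# Word count without a lambda: split on whitespace, then count list items
word_count = df['text'].str.split().str.len()

# Hypothetical mixed-type data: non-numeric entries become NaN, then 0
mixed = pd.Series([1, '2', 'three', None, 4.0])
safe_int = pd.to_numeric(mixed, errors='coerce').fillna(0).astype(int)

print(cleaned.tolist(), word_count.tolist(), safe_int.tolist())
```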

For better readability with complex logic, define named functions instead of cramming everything into lambda:

def process_text(text):
    text = text.lower()
    text = text.strip()
    return text.replace(' ', '_')

df['processed'] = df['text'].apply(process_text)

Lambda functions with apply() strike a balance between code brevity and functionality. Use them judiciously, understanding the performance trade-offs, and switch to vectorized operations or named functions when lambda expressions become unwieldy.
