How to Use Agg with Multiple Functions in Pandas

Key Insights

  • The agg() method lets you apply multiple aggregation functions in a single operation, eliminating the repetitive code of chained single-function calls.
  • Using dictionaries with agg() gives you precise control over which functions apply to which columns, while named aggregations produce clean, readable output without post-processing.
  • Always flatten multi-level column indices after complex aggregations to maintain compatibility with downstream operations and avoid cryptic column access patterns.

Introduction to the Agg Method

Pandas provides convenient single-function aggregation methods like sum(), mean(), and max(). They work fine when you need one statistic. But real-world data analysis rarely stops at a single number. You typically want the sum and the mean and the standard deviation—all at once.

This is where agg() shines. Short for “aggregate,” this method accepts multiple functions and applies them in a single operation. Instead of writing three separate statements and stitching the results together yourself, you declare everything upfront in one call.

The method works on Series, DataFrames, and GroupBy objects, making it one of the most versatile tools in the Pandas arsenal. Once you internalize its patterns, you’ll find yourself reaching for it constantly.
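As a quick sketch of that versatility, the same list of functions works on all three object types (using a tiny hypothetical frame):

```python
import pandas as pd

# Minimal hypothetical data to show agg() across object types
df = pd.DataFrame({'group': ['a', 'a', 'b'], 'value': [1, 2, 3]})

s = df['value'].agg(['min', 'max'])                    # Series -> Series of results
f = df[['value']].agg(['min', 'max'])                  # DataFrame -> DataFrame of results
g = df.groupby('group')['value'].agg(['min', 'max'])   # GroupBy -> one row per group

print(s)
print(f)
print(g)
```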

Basic Syntax: Applying Multiple Functions to a Single Column

The simplest use case involves applying several aggregation functions to a single column. Pass a list of function names as strings, and Pandas returns a Series with each result labeled by its function name.

import pandas as pd
import numpy as np

# Sample sales data
df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget', 'Widget'],
    'sales': [150, 200, 175, 225, 160],
    'quantity': [10, 15, 12, 18, 11]
})

# Apply multiple functions to a single column
sales_stats = df['sales'].agg(['sum', 'mean', 'max', 'min', 'std'])
print(sales_stats)

Output:

sum     910.000000
mean    182.000000
max     225.000000
min     150.000000
std      30.536863
Name: sales, dtype: float64

You can also apply these functions to an entire DataFrame. Each column gets each function applied:

# Apply to all numeric columns
full_stats = df[['sales', 'quantity']].agg(['sum', 'mean', 'std'])
print(full_stats)

Output:

        sales   quantity
sum    910.00      66.00
mean   182.00      13.20
std     30.54       3.27

This produces a DataFrame where rows represent functions and columns represent your original columns. It’s a compact summary that would require multiple method calls otherwise.
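For contrast, here is a sketch of what the same summary looks like without agg(), assembled by hand from single-function calls (same sample data as above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'sales': [150, 200, 175, 225, 160],
    'quantity': [10, 15, 12, 18, 11],
})

# One call per statistic, stitched into a frame manually
manual = pd.DataFrame({
    'sales': [df['sales'].sum(), df['sales'].mean(), df['sales'].std()],
    'quantity': [df['quantity'].sum(), df['quantity'].mean(), df['quantity'].std()],
}, index=['sum', 'mean', 'std'])

# The same table in one declarative call
compact = df[['sales', 'quantity']].agg(['sum', 'mean', 'std'])

# Both tables hold the same numbers; agg() just says it in one line
print(np.allclose(manual, compact))
```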

Applying Different Functions to Different Columns

Real datasets have columns with different meanings. Summing quantities makes sense; summing product IDs doesn’t. The dictionary syntax lets you specify exactly which functions apply to which columns.

# Different functions for different columns
summary = df.agg({
    'sales': 'sum',
    'quantity': ['mean', 'max', 'min']
})
print(summary)

Output:

      sales  quantity
sum   910.0       NaN
mean    NaN      13.2
max     NaN      18.0
min     NaN      11.0

Notice the NaN values: they appear because ‘sum’ was requested only for ‘sales’, while ‘mean’, ‘max’, and ‘min’ were requested only for ‘quantity’. This sparse output can be awkward, but it accurately reflects your request.

For cleaner results when columns have different function sets, consider separate aggregations or use the named aggregation syntax covered later:

# More practical: aggregate with consistent functions per column
result = df.agg({
    'sales': ['sum', 'mean', 'std'],
    'quantity': ['sum', 'mean', 'std']
})
print(result)

This produces a complete matrix without NaN values, since both columns receive the same functions.
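A quick check (using the sample data from earlier) confirms the matrix is fully populated:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget', 'Widget'],
    'sales': [150, 200, 175, 225, 160],
    'quantity': [10, 15, 12, 18, 11],
})

result = df.agg({
    'sales': ['sum', 'mean', 'std'],
    'quantity': ['sum', 'mean', 'std'],
})

# Every cell is filled because both columns request the same functions
print(result.isna().any().any())  # -> False
```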

Using Custom Functions with Agg

Built-in functions cover common cases, but you’ll often need custom calculations. The agg() method accepts lambda functions, user-defined functions, and NumPy functions interchangeably.

# Custom function for range calculation
def value_range(x):
    return x.max() - x.min()

# Lambda for 90th percentile
percentile_90 = lambda x: np.percentile(x, 90)

# Mix built-in and custom functions
custom_stats = df['sales'].agg([
    'mean',
    'std',
    value_range,
    percentile_90
])
print(custom_stats)

Output:

mean           182.0
std             30.5
value_range     75.0
<lambda>       215.0
Name: sales, dtype: float64

Pandas uses the function’s __name__ attribute for labeling. Lambda functions get labeled as <lambda>, which becomes problematic with multiple lambdas. Name your lambdas explicitly or use regular functions:

# Named lambda for clarity
percentile_90.__name__ = 'p90'

# Or define as a proper function
def p90(x):
    """90th percentile"""
    return np.percentile(x, 90)

def p10(x):
    """10th percentile"""
    return np.percentile(x, 10)

# Now the output is readable
percentile_stats = df['sales'].agg([p10, 'median', p90])
print(percentile_stats)

Custom functions receive the entire column (or group) as a Series, giving you full flexibility. Just ensure your function returns a scalar value for aggregation contexts.

Combining Agg with GroupBy

The real power of agg() emerges when combined with groupby(). This combination lets you compute multiple statistics for each group in a single, readable operation.

# Expanded dataset
df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'North', 'East', 'East'],
    'category': ['Electronics', 'Electronics', 'Clothing', 'Clothing', 'Electronics', 'Clothing', 'Electronics'],
    'revenue': [1200, 1500, 800, 950, 1100, 700, 1300],
    'orders': [10, 12, 8, 9, 11, 7, 13]
})

# Group by region and aggregate
regional_summary = df.groupby('region').agg({
    'revenue': ['sum', 'mean', 'std'],
    'orders': ['sum', 'count']
})
print(regional_summary)

Output:

        revenue                    orders      
            sum         mean         std    sum count
region                                              
East       2000  1000.000000  424.264069     20     2
North      3100  1033.333333  208.166600     29     3
South      2450  1225.000000  388.908730     21     2

This produces a hierarchical column index (MultiIndex), with the original column name at the top level and function names below. It’s comprehensive but can be unwieldy for downstream processing.

You can also group by multiple columns:

# Multi-level grouping
detailed = df.groupby(['region', 'category']).agg({
    'revenue': ['sum', 'mean'],
    'orders': 'sum'
})
print(detailed)

This creates both a row MultiIndex (for groups) and a column MultiIndex (for aggregations), giving you a complete cross-tabulation of your data.
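Looking up a single cell in that cross-tabulation takes a tuple on each axis, as sketched below with the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'North', 'East', 'East'],
    'category': ['Electronics', 'Electronics', 'Clothing', 'Clothing',
                 'Electronics', 'Clothing', 'Electronics'],
    'revenue': [1200, 1500, 800, 950, 1100, 700, 1300],
    'orders': [10, 12, 8, 9, 11, 7, 13],
})

detailed = df.groupby(['region', 'category']).agg({
    'revenue': ['sum', 'mean'],
    'orders': 'sum',
})

# Row MultiIndex and column MultiIndex both take tuple keys
north_elec_revenue = detailed.loc[('North', 'Electronics'), ('revenue', 'sum')]
print(north_elec_revenue)  # 1200 + 1100 = 2300
```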

Renaming Aggregated Columns

The MultiIndex columns from grouped aggregations are technically correct but annoying to work with. You access them with tuples like df[('revenue', 'sum')], which is verbose and error-prone.

Named aggregations solve this elegantly. Use keyword arguments where each key becomes the output column name:

# Named aggregations for clean output
clean_summary = df.groupby('region').agg(
    total_revenue=('revenue', 'sum'),
    avg_revenue=('revenue', 'mean'),
    revenue_std=('revenue', 'std'),
    total_orders=('orders', 'sum'),
    order_count=('orders', 'count')
)
print(clean_summary)

Output:

        total_revenue  avg_revenue  revenue_std  total_orders  order_count
region                                                                     
East             2000  1000.000000   424.264069            20            2
North            3100  1033.333333   208.166600            29            3
South            2450  1225.000000   388.908730            21            2

The output has flat, descriptive column names. No tuple indexing required. This syntax uses tuples of (column, function) as values, with your desired output name as the keyword.
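The (column, function) tuple is shorthand for pd.NamedAgg; the two spellings are interchangeable, as this sketch on a reduced version of the data shows:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'East'],
    'revenue': [1200, 1500, 800, 700],
})

# pd.NamedAgg spells out what the (column, function) tuple abbreviates
summary = df.groupby('region').agg(
    total_revenue=pd.NamedAgg(column='revenue', aggfunc='sum'),
    avg_revenue=pd.NamedAgg(column='revenue', aggfunc='mean'),
)
print(summary)
```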

For custom functions with named aggregations:

def coefficient_of_variation(x):
    return x.std() / x.mean() * 100

cv_summary = df.groupby('region').agg(
    total_revenue=('revenue', 'sum'),
    revenue_cv=('revenue', coefficient_of_variation)
)
print(cv_summary)

This approach produces production-ready DataFrames that integrate cleanly with visualization libraries, export functions, and downstream analysis.

Practical Tips and Common Pitfalls

Handling NaN Values

By default, aggregation functions skip NaN values. This is usually what you want, but be aware of it:

df_with_nan = df.copy()
df_with_nan.loc[0, 'revenue'] = np.nan

# NaN is excluded from calculations
print(df_with_nan.groupby('region')['revenue'].agg(['sum', 'mean', 'count']))

The count reflects only non-null values. If you need to count all rows regardless of nulls, use 'size' instead of 'count'.
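A minimal sketch of that difference, using a small hypothetical frame with one missing revenue value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'North', 'South'],
    'revenue': [1200.0, np.nan, 950.0],
})

per_region = df.groupby('region')['revenue'].agg(['count', 'size'])
# 'count' ignores the NaN row; 'size' counts every row in the group
print(per_region)
```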

Flattening Multi-Level Column Indices

When you can’t use named aggregations (perhaps you’re working with legacy code or dynamic function lists), flatten the MultiIndex manually:

# Create aggregation with MultiIndex columns
result = df.groupby('region').agg({
    'revenue': ['sum', 'mean'],
    'orders': ['sum', 'count']
})

# Flatten column names
result.columns = ['_'.join(col).strip() for col in result.columns.values]
print(result.columns)

Output:

Index(['revenue_sum', 'revenue_mean', 'orders_sum', 'orders_count'], dtype='object')

Now you can access columns normally: result['revenue_sum'].

Performance Considerations

Using agg() with multiple functions keeps the computation in one declarative call, though each requested function still runs over the data, so don't expect dramatic speedups over separate calls. The real cost driver is custom Python functions, which are much slower than built-in aggregations. When performance matters:

# Prefer string names over function references
df[['revenue', 'orders']].agg(['sum', 'mean'])  # Dispatches to optimized implementations

# Avoid when possible
df[['revenue', 'orders']].agg([np.sum, np.mean])  # Recent Pandas warns and maps these to 'sum'/'mean'

For large datasets, consider whether you truly need all requested statistics. Each additional function adds computation time, and some (like std) are more expensive than others (like sum).

Resetting the Index

Grouped aggregations produce the grouping columns as an index. For most downstream work, you’ll want them as regular columns:

final_result = df.groupby('region').agg(
    total_revenue=('revenue', 'sum')
).reset_index()

print(final_result)

This gives you a clean DataFrame ready for merging, exporting, or further analysis.
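Alternatively, passing as_index=False to groupby() keeps the grouping column as a regular column from the start, skipping the reset (sketched here on a minimal frame):

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North'],
    'revenue': [1200, 1500, 800],
})

# as_index=False leaves 'region' as an ordinary column in the result
final_result = df.groupby('region', as_index=False).agg(
    total_revenue=('revenue', 'sum')
)
print(final_result)
```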

The agg() method transforms verbose, repetitive aggregation code into declarative specifications of what you want. Master its patterns, and you’ll write cleaner Pandas code that’s easier to read, maintain, and extend.
