Pandas GroupBy Patterns for Real-World Analysis

GroupBy is the workhorse of pandas analysis. These patterns handle the cases that basic tutorials skip.

Key Insights

  • Use .agg() with named aggregations for readable multi-column summaries
  • transform() returns output with the same index and length as its input, which makes it a natural fit for group-level normalization
  • apply() is a last resort; agg(), transform(), and filter() cover most cases
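
To make the last point concrete, here is a minimal sketch comparing apply() with built-in reductions for the same per-group statistic. The sample data and the "spread" variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "department": ["eng", "eng", "ops", "ops"],
    "salary": [100.0, 140.0, 80.0, 90.0],
})

salaries = df.groupby("department")["salary"]

# apply() calls the Python function once per group
spread_apply = salaries.apply(lambda s: s.max() - s.min())

# The same statistic from two built-in reductions, no per-group Python call
spread_builtin = salaries.max() - salaries.min()

print(spread_builtin)
```

Both give the salary range per department. The built-in version is the one to reach for first, with apply() kept for logic that has no agg/transform/filter equivalent.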

Named Aggregations

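import pandas as pd

# One row per department; each keyword becomes an output column
result = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    headcount=("employee_id", "count"),  # counts non-null employee_id values
    # Days since the earliest start_date in the group (start_date must be datetime64)
    max_tenure=("start_date", lambda x: (pd.Timestamp.now() - x.min()).days),
)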
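
A self-contained version of the same pattern, as a sketch on made-up data. The sample rows, the fixed as_of date (used instead of pd.Timestamp.now() so the output is reproducible), and the max_tenure_days name are assumptions, not part of the original snippet.

```python
import pandas as pd

# Hypothetical sample data; column names follow the snippet above
df = pd.DataFrame({
    "department": ["eng", "eng", "eng", "ops", "ops"],
    "employee_id": [1, 2, 3, 4, 5],
    "salary": [100.0, 120.0, 140.0, 80.0, 90.0],
    "start_date": pd.to_datetime(
        ["2020-01-01", "2021-06-01", "2019-03-15", "2022-01-01", "2018-07-01"]
    ),
})

# Assumed fixed reference date so the tenure figures do not change between runs
as_of = pd.Timestamp("2024-01-01")

summary = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    headcount=("employee_id", "count"),
    max_tenure_days=("start_date", lambda s: (as_of - s.min()).days),
)

print(summary)
```

The result is indexed by department with exactly the three columns named in the call, so no column renaming or MultiIndex flattening is needed afterwards.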

Transform for Group Normalization

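# Z-score within each group; the result aligns row-for-row with df
# x.std() is the sample standard deviation (ddof=1), so single-row groups yield NaN
df["salary_zscore"] = df.groupby("department")["salary"].transform(
    lambda x: (x - x.mean()) / x.std()
)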
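
The lambda above can also be written with the built-in "mean" and "std" transforms, which avoids calling Python once per group. The sketch below does that on made-up data and, as an assumed design choice, turns a zero standard deviation into NaN so constant groups do not divide by zero.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "ops" has identical salaries, "hr" has one row
df = pd.DataFrame({
    "department": ["eng", "eng", "eng", "ops", "ops", "hr"],
    "salary": [100.0, 120.0, 140.0, 80.0, 80.0, 95.0],
})

grouped = df.groupby("department")["salary"]

# Each transform returns a Series with the same index and length as df
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")  # sample std (ddof=1); NaN for single-row groups

# Assumed policy: treat zero spread as undefined rather than dividing by zero
df["salary_zscore"] = (df["salary"] - group_mean) / group_std.replace(0, np.nan)

print(df)
```

Rows in groups with a single member or no variation end up with NaN z-scores. Whether to keep, drop, or fill those depends on the analysis.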

Filter Groups

# Keep only departments with more than 5 employees
large_depts = df.groupby("department").filter(lambda g: len(g) > 5)
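
As a rough illustration, the filter call can be checked against an equivalent boolean mask built with transform("size"). The sample data and the group_size and large_depts_mask names are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical sample data: six "eng" rows and two "ops" rows
df = pd.DataFrame({
    "department": ["eng"] * 6 + ["ops"] * 2,
    "employee_id": range(1, 9),
})

# filter() keeps every row of each group for which the function returns True
large_depts = df.groupby("department").filter(lambda g: len(g) > 5)

# Alternative: broadcast each group's size to its rows, then mask
group_size = df.groupby("department")["employee_id"].transform("size")
large_depts_mask = df[group_size > 5]

print(large_depts)
```

Both approaches return the original rows, not one row per group. The mask form is handy when the size condition needs to be combined with other row-level conditions.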
