Pandas GroupBy Patterns for Real-World Analysis
GroupBy is the workhorse of pandas analysis. These patterns handle the cases that basic tutorials skip.
Key Insights
- Use .agg() with named aggregations for readable multi-column summaries
- transform() returns output aligned to the original rows (same index and length as the input), which makes it a good fit for group-level normalization
- apply() is a last resort; most cases are handled by agg/transform/filter
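To make the shape difference between agg and transform concrete, here is a minimal sketch. The three-row frame and its values are made up for illustration and are not from the analysis above.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "department": ["eng", "eng", "ops"],
    "salary": [100, 120, 90],
})

# agg collapses each group to one value, indexed by the group key
per_group = df.groupby("department")["salary"].agg("mean")

# transform broadcasts each group's value back onto the original rows
per_row = df.groupby("department")["salary"].transform("mean")

print(per_group.shape)  # (2,)
print(per_row.shape)    # (3,)
```

Because per_row shares its index with df, it can be assigned straight back as a new column.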
Named Aggregations
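import pandas as pd

# Assumes df["start_date"] is a datetime64 column
result = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    headcount=("employee_id", "count"),
    # Days since the earliest start date in each department
    max_tenure=("start_date", lambda x: (pd.Timestamp.now() - x.min()).days),
)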
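For a version that can be run end to end, the sketch below applies the same named aggregations to a small made-up frame. The sample rows and the fixed as_of date are assumptions added here so the tenure numbers stay reproducible; the snippet above uses pd.Timestamp.now() instead.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "department": ["eng", "eng", "ops"],
    "employee_id": [1, 2, 3],
    "salary": [100.0, 120.0, 90.0],
    "start_date": pd.to_datetime(["2020-01-01", "2021-06-01", "2019-03-15"]),
})

# Assumed fixed reference date (stands in for pd.Timestamp.now())
as_of = pd.Timestamp("2024-01-01")

result = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    headcount=("employee_id", "count"),
    # Days between the reference date and the earliest start date per group
    max_tenure=("start_date", lambda x: (as_of - x.min()).days),
)

print(result)
```

Each keyword becomes an output column name, so the result has flat, readable columns rather than a MultiIndex.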
Transform for Group Normalization
# Z-score within each group (Series.std() is the sample standard deviation, ddof=1,
# so single-row groups produce NaN)
df["salary_zscore"] = df.groupby("department")["salary"].transform(
    lambda x: (x - x.mean()) / x.std()
)
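One possible variation, sketched here on made-up data, replaces the lambda with two built-in transforms. It should give the same z-scores while letting pandas use its optimized group mean and std. The sample frame is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "department": ["eng", "eng", "eng", "ops"],
    "salary": [100.0, 110.0, 120.0, 90.0],
})

grouped = df.groupby("department")["salary"]
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")  # sample std (ddof=1); NaN for one-row groups

# Plain vectorized arithmetic on row-aligned Series
df["salary_zscore"] = (df["salary"] - group_mean) / group_std

print(df)
```

The single "ops" row ends up with NaN because a sample standard deviation is undefined for one observation. Whether to fill, drop, or keep such rows depends on the analysis.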
Filter Groups
# Keep only departments with more than 5 employees (assumes one row per employee)
large_depts = df.groupby("department").filter(lambda g: len(g) > 5)
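As a runnable illustration, the sketch below applies the same filter to a made-up frame and compares it with a boolean mask built from transform("size"). The mask form is an alternative that is not part of the pattern above; it is shown because it avoids calling a Python function once per group.

```python
import pandas as pd

# Hypothetical sample data: six "eng" rows and two "ops" rows
df = pd.DataFrame({
    "department": ["eng"] * 6 + ["ops"] * 2,
    "employee_id": range(1, 9),
})

# filter keeps every row of each group for which the function returns True
large_depts = df.groupby("department").filter(lambda g: len(g) > 5)

# Alternative: broadcast each group's size to its rows, then mask
group_size = df.groupby("department")["employee_id"].transform("size")
large_depts_mask = df[group_size > 5]

print(large_depts["department"].unique())
```

Both forms keep the original row order and index, so they can be swapped without changing downstream code.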