How to Create a Stacked Bar Chart in ggplot2
Key Insights
- Stacked bar charts work best when comparing totals across categories while showing composition, but fail when precise comparison of individual segments matters; use grouped bars or facets instead
- Data must be in long format with one row per category-subcategory combination; use pivot_longer() to reshape wide data before plotting
- Choose position = "stack" for absolute values or position = "fill" for proportions, but never mix more than 5-6 segments or the chart becomes unreadable
Understanding Stacked Bar Charts
Stacked bar charts display categorical data where each bar represents a total divided into segments. They answer two questions simultaneously: “What’s the total for each category?” and “How is that total composed?” This makes them ideal for budget breakdowns showing department spending across quarters, survey responses comparing agreement levels across demographic groups, or sales data revealing product mix by region.
The key limitation: stacked bars only allow easy comparison of the bottom segment and totals. Middle segments become difficult to compare because they don’t share a common baseline. If your primary goal is comparing individual segments across categories, use grouped bars instead.
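As a rough illustration of the grouped-bar alternative, the sketch below dodges the segments so that every bar starts at zero. It uses the sample sales figures introduced in the next section; the object name grouped_plot is only illustrative.

```r
library(ggplot2)

# Sample sales figures (the same values used in the rest of this article)
sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region   = rep(c("North", "South", "West"), times = 4),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000,
               18000, 22000, 19000, 25000, 31000, 27000)
)

# position = "dodge" places regions side by side, so all bars share the zero baseline
grouped_plot <- ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "dodge") +
  labs(x = "Product Category", y = "Sales ($)", fill = "Region") +
  theme_minimal()

grouped_plot
```

The trade-off is that category totals are no longer visible at a glance, which is the main thing stacking provides.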
Building a Basic Stacked Bar Chart
Let’s start with sales data across product categories and regions. First, load the necessary packages:
library(ggplot2)
library(dplyr)
library(tidyr)

# Create sample sales data
sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region = rep(c("North", "South", "West"), times = 4),
  sales = c(45000, 38000, 52000,  # Electronics
            32000, 28000, 35000,  # Clothing
            18000, 22000, 19000,  # Home Goods
            25000, 31000, 27000)  # Sports
)
The basic stacked bar chart uses geom_col() with the fill aesthetic mapped to your subcategory. (geom_bar() counts rows by default, so it needs stat = "identity" to plot precomputed values like these.)
ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "stack") +
  labs(title = "Sales by Category and Region",
       x = "Product Category",
       y = "Sales ($)",
       fill = "Region") +
  theme_minimal()
This creates bars where each category shows total sales with colored segments representing regional contributions. The position = "stack" is actually the default for geom_col(), but I specify it explicitly for clarity.
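It can help to confirm what the bar heights should be before trusting the picture. The following sketch computes the per-category totals that the stacked bars represent; category_totals is an illustrative name, not something the chart itself requires.

```r
library(dplyr)

sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region   = rep(c("North", "South", "West"), times = 4),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000,
               18000, 22000, 19000, 25000, 31000, 27000)
)

# Each stacked bar's full height equals the sum of its segments
category_totals <- sales_data %>%
  group_by(category) %>%
  summarise(total = sum(sales), .groups = "drop") %>%
  arrange(desc(total))

print(category_totals)
```

With this sample data, Electronics has the tallest bar (135,000) and Home Goods the shortest (59,000).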
Getting Your Data Structure Right
ggplot2 requires data in long format: one row per observation with separate columns for category, subcategory, and value. Many datasets start in wide format where subcategories are columns.
Here’s how to transform wide data:
# Wide format data (common in spreadsheets)
sales_wide <- data.frame(
  category = c("Electronics", "Clothing", "Home Goods", "Sports"),
  North = c(45000, 32000, 18000, 25000),
  South = c(38000, 28000, 22000, 31000),
  West = c(52000, 35000, 19000, 27000)
)

# Convert to long format
sales_long <- sales_wide %>%
  pivot_longer(cols = c(North, South, West),
               names_to = "region",
               values_to = "sales")

# Now ready for ggplot2
head(sales_long)
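A quick sanity check after reshaping is to confirm that each category-region pair appears exactly once. The sketch below assumes the same wide table as above and selects the value columns with cols = -category, which is one of several equivalent ways to name them.

```r
library(dplyr)
library(tidyr)

sales_wide <- data.frame(
  category = c("Electronics", "Clothing", "Home Goods", "Sports"),
  North = c(45000, 32000, 18000, 25000),
  South = c(38000, 28000, 22000, 31000),
  West  = c(52000, 35000, 19000, 27000)
)

# Every column except `category` becomes a region/sales pair
sales_long <- sales_wide %>%
  pivot_longer(cols = -category, names_to = "region", values_to = "sales")

# Count rows per combination; any n > 1 would mean duplicated segments in a bar
combo_counts <- count(sales_long, category, region)
stopifnot(nrow(sales_long) == 12, all(combo_counts$n == 1))
```

Duplicated combinations do not raise an error in geom_col(); they simply stack on top of each other, so a check like this catches them early.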
If you need to aggregate data first, use dplyr:
# Aggregate from transaction-level data
set.seed(42)  # make the simulated transactions reproducible
transaction_data <- data.frame(
  category = sample(c("Electronics", "Clothing", "Home Goods"), 1000, replace = TRUE),
  region = sample(c("North", "South", "West"), 1000, replace = TRUE),
  amount = runif(1000, 10, 500)
)

# One row per category-region pair, with the summed transaction amounts
aggregated_sales <- transaction_data %>%
  group_by(category, region) %>%
  summarise(total_sales = sum(amount), .groups = "drop")
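Once aggregated, the data plots like any other long table. As a sketch, the example below also shows an alternative that skips the aggregation step: geom_bar() can sum a column per group when it is supplied as the weight aesthetic. The seed and the object names are assumptions made for the example.

```r
library(ggplot2)
library(dplyr)

set.seed(42)  # assumed seed, only so the simulated data is reproducible
transaction_data <- data.frame(
  category = sample(c("Electronics", "Clothing", "Home Goods"), 1000, replace = TRUE),
  region   = sample(c("North", "South", "West"), 1000, replace = TRUE),
  amount   = runif(1000, 10, 500)
)

aggregated_sales <- transaction_data %>%
  group_by(category, region) %>%
  summarise(total_sales = sum(amount), .groups = "drop")

# Option 1: plot the pre-aggregated totals
p_aggregated <- ggplot(aggregated_sales,
                       aes(x = category, y = total_sales, fill = region)) +
  geom_col()

# Option 2: let geom_bar() sum `amount` within each group via the weight aesthetic
p_weighted <- ggplot(transaction_data, aes(x = category, fill = region)) +
  geom_bar(aes(weight = amount))
```

Both plots should show the same bar heights; aggregating first is usually easier to debug because the summary table can be inspected directly.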
Creating Proportional Stacked Bars
Absolute values can be misleading when totals vary dramatically. Proportional stacked bars show each segment as a percentage of the total, making composition easier to compare:
ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Regional Sales Composition by Category",
       x = "Product Category",
       y = "Percentage of Sales",
       fill = "Region") +
  theme_minimal()
The position = "fill" normalizes each bar to height 1.0, and scales::percent_format() converts the y-axis to percentages. This reveals differences that the absolute chart hides: Electronics and Clothing draw their largest share from West (about 39% and 37%), while Home Goods and Sports draw their largest share (about 37%) from South.
You can also manually calculate percentages for more control:
sales_pct <- sales_data %>%
  group_by(category) %>%
  mutate(percentage = sales / sum(sales) * 100) %>%
  ungroup()

ggplot(sales_pct, aes(x = category, y = percentage, fill = region)) +
  geom_col(position = "stack") +
  labs(y = "Percentage (%)")
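One use for manually computed shares is labeling a proportional chart. The sketch below keeps position = "fill" for the bars and places a percentage label in the middle of each segment with position_fill(vjust = 0.5); the share column and the labeled_plot name are illustrative.

```r
library(ggplot2)
library(dplyr)

sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region   = rep(c("North", "South", "West"), times = 4),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000,
               18000, 22000, 19000, 25000, 31000, 27000)
)

# Share of each region within its category (values between 0 and 1)
sales_share <- sales_data %>%
  group_by(category) %>%
  mutate(share = sales / sum(sales)) %>%
  ungroup()

labeled_plot <- ggplot(sales_share, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "fill") +
  # position_fill() must match the bar position so labels land inside their segments
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_fill(vjust = 0.5),
            color = "white") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_minimal()
```

The text layer has to use the same position adjustment as the bars; mixing position_fill() bars with position_stack() labels would place the labels on the wrong scale.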
Customizing for Clarity
Default colors and styling rarely produce publication-ready charts. Here’s how to improve readability:
# Define custom color palette
region_colors <- c("North" = "#2E86AB", "South" = "#A23B72", "West" = "#F18F01")

# Reorder categories by total sales
sales_data <- sales_data %>%
  group_by(category) %>%
  mutate(total = sum(sales)) %>%
  ungroup() %>%
  mutate(category = forcats::fct_reorder(category, total))

# Create polished chart
ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "stack", width = 0.7) +
  geom_text(aes(label = scales::comma(sales)),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold",
            size = 3.5) +
  scale_fill_manual(values = region_colors) +
  scale_y_continuous(labels = scales::comma_format(),
                     expand = expansion(mult = c(0, 0.05))) +
  labs(title = "Regional Sales Performance by Category",
       subtitle = "Total sales in USD, sorted by category total",
       x = NULL,
       y = "Sales ($)",
       fill = "Region",
       caption = "Data: 2024 Sales Report") +
  theme_minimal(base_size = 12) +
  theme(panel.grid.major.x = element_blank(),
        legend.position = "top",
        plot.title = element_text(face = "bold", size = 14))
The geom_text() with position_stack(vjust = 0.5) centers labels within each segment. Use scales::comma() for thousands separators and adjust size based on your segment heights.
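Segment labels do not show the bar total, which is often the number readers look for first. One possible approach, sketched below, is a second text layer fed from a small summary table; the totals and total_plot names are illustrative.

```r
library(ggplot2)
library(dplyr)

sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region   = rep(c("North", "South", "West"), times = 4),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000,
               18000, 22000, 19000, 25000, 31000, 27000)
)

# One row per category holding the full bar height
totals <- sales_data %>%
  group_by(category) %>%
  summarise(total = sum(sales), .groups = "drop")

total_plot <- ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(width = 0.7) +
  # inherit.aes = FALSE keeps this layer from looking for `fill` and `sales` in `totals`
  geom_text(data = totals,
            aes(x = category, y = total, label = scales::comma(total)),
            inherit.aes = FALSE,
            vjust = -0.5) +
  # extra headroom so the labels above the tallest bar are not clipped
  scale_y_continuous(labels = scales::comma_format(),
                     expand = expansion(mult = c(0, 0.1))) +
  theme_minimal()
```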
Advanced Techniques
Faceting for complex comparisons:
# Add time dimension
sales_quarterly <- sales_data %>%
  slice(rep(1:n(), 4)) %>%
  mutate(quarter = rep(paste0("Q", 1:4), each = nrow(sales_data)),
         sales = sales * runif(n(), 0.8, 1.2))

ggplot(sales_quarterly, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "fill") +
  facet_wrap(~quarter, nrow = 1) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Controlling stack order:
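ggplot2 stacks segments by factor level (alphabetically for character columns), with the first level at the top of the bar so that the stack matches the legend order. Control this with factor levels:
sales_data <- sales_data %>%
  mutate(region = factor(region, levels = c("West", "North", "South")))
# Now West appears first in the legend and at the top of each stack;
# list it last in `levels` to move it to the bottom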
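If the first factor level ends up at the top of the bar rather than the bottom (current ggplot2 releases stack in reverse level order so the bars match the legend), position_stack(reverse = TRUE) flips the stacking direction without touching the factor levels. The sketch below also reverses the legend so it stays in step with the bars; the object names are illustrative.

```r
library(ggplot2)

sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region   = rep(c("North", "South", "West"), times = 4),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000,
               18000, 22000, 19000, 25000, 31000, 27000)
)
sales_data$region <- factor(sales_data$region, levels = c("West", "North", "South"))

reversed_plot <- ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  # reverse = TRUE stacks the first level (West) first, i.e. at the bottom
  geom_col(position = position_stack(reverse = TRUE)) +
  # flip the legend too, so it reads in the same order as the bars
  guides(fill = guide_legend(reverse = TRUE)) +
  theme_minimal()

# Inspect the computed segment boundaries instead of judging by eye
stack_bounds <- layer_data(reversed_plot)[, c("x", "group", "ymin", "ymax")]
```

Looking at stack_bounds is a simple way to confirm which segment sits on the baseline (ymin of 0) in each bar.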
Handling negative values:
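Stacked bars with negative values require careful handling. ggplot2 stacks positive and negative values separately, so positive segments build upward from zero and negative segments build downward; a reference line at zero makes the split easy to read: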
profit_data <- data.frame(
  department = rep(c("Sales", "Marketing", "Operations"), each = 2),
  category = rep(c("Revenue", "Costs"), times = 3),
  amount = c(500000, -300000, 200000, -180000, 150000, -140000)
)

ggplot(profit_data, aes(x = department, y = amount, fill = category)) +
  geom_col(position = "stack") +
  geom_hline(yintercept = 0, linewidth = 0.5) +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()
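A chart like this shows revenue and costs but not the net result, which readers have to work out themselves. As a sketch, the net per department can be computed separately and overlaid as a point; net_profit and net_plot are illustrative names.

```r
library(ggplot2)
library(dplyr)

profit_data <- data.frame(
  department = rep(c("Sales", "Marketing", "Operations"), each = 2),
  category   = rep(c("Revenue", "Costs"), times = 3),
  amount     = c(500000, -300000, 200000, -180000, 150000, -140000)
)

# Net result per department: revenue plus (negative) costs
net_profit <- profit_data %>%
  group_by(department) %>%
  summarise(net = sum(amount), .groups = "drop")

net_plot <- ggplot(profit_data, aes(x = department, y = amount, fill = category)) +
  geom_col() +
  geom_hline(yintercept = 0, linewidth = 0.5) +
  # overlay the net value; inherit.aes = FALSE because `net_profit` has no fill column
  geom_point(data = net_profit, aes(x = department, y = net),
             inherit.aes = FALSE, size = 3) +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()
```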
Common Pitfalls and Best Practices
When NOT to use stacked bars:
- When there are more than 5-6 segments (colors become hard to distinguish)
- When precise comparison of middle segments matters (use grouped bars)
- When showing trends over many time periods (use line charts)
- With highly variable totals (consider proportional bars or separate charts)
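When a chart would otherwise exceed the segment limit, one common workaround is to lump the smallest segments into an "Other" group before plotting. The sketch below uses forcats::fct_lump_n() on made-up data; the segments table, the brand names, and the choice of four kept levels are all assumptions for illustration.

```r
library(dplyr)
library(forcats)

# Hypothetical data: one category split across eight segments
segments <- data.frame(
  category = "Electronics",
  brand    = paste("Brand", LETTERS[1:8]),
  sales    = c(400, 300, 250, 200, 50, 40, 30, 20)
)

lumped <- segments %>%
  # keep the 4 brands with the largest sales, collapse the rest into "Other"
  mutate(brand = fct_lump_n(brand, n = 4, w = sales)) %>%
  # re-aggregate so "Other" becomes a single row per category
  group_by(category, brand) %>%
  summarise(sales = sum(sales), .groups = "drop")

print(lumped)
```

The lumped table can then be passed to geom_col() as usual, giving five segments instead of eight.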
Accessibility considerations:
Use colorblind-friendly palettes like viridis or ensure sufficient contrast:
# scale_fill_viridis_d() is part of ggplot2, so the viridis package does not need to be loaded
ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "stack") +
  scale_fill_viridis_d(option = "plasma", end = 0.9) +
  theme_minimal()
Always include a legend and consider adding direct labels for critical values.
Ordering matters:
Place the most important segment at the bottom for easiest comparison. If showing time-series data, maintain consistent segment order across all bars.
Alternative approaches:
For complex data, consider:
- Grouped bars for easier segment comparison
- Small multiples (faceting) to reduce segments per chart
- Treemaps for hierarchical data with many categories
- Slope charts for before/after comparisons
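As a sketch of the small-multiples option, the example below gives each region its own panel, so every bar starts from a common baseline and segments can be compared directly. The single fill color and the facet_plot name are arbitrary choices for the example.

```r
library(ggplot2)

sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region   = rep(c("North", "South", "West"), times = 4),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000,
               18000, 22000, 19000, 25000, 31000, 27000)
)

# One panel per region instead of one color per region
facet_plot <- ggplot(sales_data, aes(x = category, y = sales)) +
  geom_col(fill = "#2E86AB") +
  facet_wrap(~region, nrow = 1) +
  scale_y_continuous(labels = scales::comma_format()) +
  labs(x = NULL, y = "Sales ($)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```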
Stacked bar charts excel at showing part-to-whole relationships, but they’re not universal. Choose them when totals matter and you need to show composition, but be ready to switch approaches when precise comparison of individual segments becomes the priority.