How to Create a Stacked Bar Chart in ggplot2

Key Insights

Stacked bar charts work best when comparing totals across categories while showing composition. They fail when precise comparison of individual segments matters; use grouped bars or facets instead
Data must be in long format with one row per category-subcategory combination; use pivot_longer() to reshape wide data before plotting
Choose position = "stack" for absolute values or position = "fill" for proportions, and keep each bar to 5-6 segments at most, because beyond that the colors become hard to tell apart

Understanding Stacked Bar Charts

Stacked bar charts display categorical data where each bar represents a total divided into segments. They answer two questions simultaneously: “What’s the total for each category?” and “How is that total composed?” This makes them ideal for budget breakdowns showing department spending across quarters, survey responses comparing agreement levels across demographic groups, or sales data revealing product mix by region.

The key limitation: stacked bars only allow easy comparison of the bottom segment and totals. Middle segments become difficult to compare because they don’t share a common baseline. If your primary goal is comparing individual segments across categories, use grouped bars instead.

Building a Basic Stacked Bar Chart

Let’s start with sales data across product categories and regions. First, load the necessary packages and create a small sample dataset:

library(ggplot2)
library(dplyr)
library(tidyr)

# Create sample sales data
sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing", "Home Goods", "Sports"), each = 3),
  region = rep(c("North", "South", "West"), times = 4),
  sales = c(45000, 38000, 52000,  # Electronics
            32000, 28000, 35000,  # Clothing
            18000, 22000, 19000,  # Home Goods
            25000, 31000, 27000)  # Sports
)

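Because the sales values are already totals, the basic stacked bar chart uses geom_col() with the fill aesthetic mapped to your subcategory. geom_bar() is the counterpart for raw data: it counts rows by default and only accepts a y aesthetic if you set stat = "identity":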

ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "stack") +
  labs(title = "Sales by Category and Region",
       x = "Product Category",
       y = "Sales ($)",
       fill = "Region") +
  theme_minimal()

This creates bars where each category shows total sales with colored segments representing regional contributions. The position = "stack" is actually the default for geom_col(), but I specify it explicitly for clarity.
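
To make the difference between the two geoms concrete, here is a minimal sketch. It assumes a hypothetical `orders` data frame with one row per order, which is not part of the sales data above; the names `orders`, `order_counts`, and `n_orders` are introduced only for illustration.

```r
library(ggplot2)
library(dplyr)

# Hypothetical raw data: one row per order, no precomputed totals
orders <- data.frame(
  category = c("Electronics", "Electronics", "Electronics",
               "Clothing", "Clothing", "Sports"),
  region   = c("North", "South", "North", "West", "West", "South")
)

# geom_bar() uses stat = "count" by default, so no y aesthetic is supplied
p_bar <- ggplot(orders, aes(x = category, fill = region)) +
  geom_bar()

# The same chart built from pre-counted data with geom_col()
order_counts <- count(orders, category, region, name = "n_orders")

p_col <- ggplot(order_counts, aes(x = category, y = n_orders, fill = region)) +
  geom_col()
```

Both plots should show the same stacks; which form fits depends on whether your data arrives as raw rows or as summarized values.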

Getting Your Data Structure Right

ggplot2 requires data in long format: one row per observation with separate columns for category, subcategory, and value. Many datasets start in wide format where subcategories are columns.

Here’s how to transform wide data:

# Wide format data (common in spreadsheets)
sales_wide <- data.frame(
  category = c("Electronics", "Clothing", "Home Goods", "Sports"),
  North = c(45000, 32000, 18000, 25000),
  South = c(38000, 28000, 22000, 31000),
  West = c(52000, 35000, 19000, 27000)
)

# Convert to long format
sales_long <- sales_wide %>%
  pivot_longer(cols = c(North, South, West),
               names_to = "region",
               values_to = "sales")

# Now ready for ggplot2
head(sales_long)
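
Before plotting, it can help to confirm that the reshaped data really has one row per category-region pair, since repeated pairs are stacked on top of each other without any warning. A minimal sketch, rebuilding a two-category version of the wide table so the block runs on its own (the `duplicates` name is an illustrative choice):

```r
library(dplyr)
library(tidyr)

sales_wide <- data.frame(
  category = c("Electronics", "Clothing"),
  North = c(45000, 32000),
  South = c(38000, 28000),
  West  = c(52000, 35000)
)

# Pivot every column except category into region/sales pairs
sales_long <- sales_wide %>%
  pivot_longer(cols = -category, names_to = "region", values_to = "sales")

# Any category-region pair that appears more than once
duplicates <- sales_long %>%
  count(category, region) %>%
  filter(n > 1)

nrow(sales_long)   # 2 categories x 3 regions = 6 rows
nrow(duplicates)   # 0 when the data is properly long
```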

If you need to aggregate data first, use dplyr:

# Aggregate from transaction-level data
set.seed(42)  # fixed seed so the simulated transactions are reproducible
transaction_data <- data.frame(
  category = sample(c("Electronics", "Clothing", "Home Goods"), 1000, replace = TRUE),
  region = sample(c("North", "South", "West"), 1000, replace = TRUE),
  amount = runif(1000, 10, 500)
)

aggregated_sales <- transaction_data %>%
  group_by(category, region) %>%
  summarise(total_sales = sum(amount), .groups = "drop")

Creating Proportional Stacked Bars

Absolute values can be misleading when totals vary dramatically. Proportional stacked bars show each segment as a percentage of the total, making composition easier to compare:

ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Regional Sales Composition by Category",
       x = "Product Category",
       y = "Percentage of Sales",
       fill = "Region") +
  theme_minimal()

The position = "fill" normalizes each bar to height 1.0, and scales::percent_format() converts the y-axis to percentages. This reveals differences that the absolute values hide: Electronics and Clothing draw their largest share from the West (about 39% and 37%), while Home Goods and Sports lean toward the South (about 37% each).

You can also manually calculate percentages for more control:

sales_pct <- sales_data %>%
  group_by(category) %>%
  mutate(percentage = sales / sum(sales) * 100) %>%
  ungroup()

ggplot(sales_pct, aes(x = category, y = percentage, fill = region)) +
  geom_col(position = "stack") +
  labs(y = "Percentage (%)")
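
One thing a precomputed percentage column makes easy is labelling each segment with its share. The sketch below uses the same sales figures as above, trimmed to two categories so the block is self-contained; the `label` column is a name introduced here for illustration.

```r
library(ggplot2)
library(dplyr)

sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing"), each = 3),
  region   = rep(c("North", "South", "West"), times = 2),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000)
)

# Share of each region within its category, plus a rounded text label
sales_pct <- sales_data %>%
  group_by(category) %>%
  mutate(percentage = sales / sum(sales) * 100,
         label = paste0(round(percentage), "%")) %>%
  ungroup()

p_pct <- ggplot(sales_pct, aes(x = category, y = percentage, fill = region)) +
  geom_col() +
  # position_stack(vjust = 0.5) centers each label within its segment
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  labs(y = "Percentage (%)")
```

Rounded labels may not add up to exactly 100 within a bar, so consider showing one decimal place if that matters to your audience.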

Customizing for Clarity

Default colors and styling rarely produce publication-ready charts. Here’s how to improve readability:

# Define custom color palette
region_colors <- c("North" = "#2E86AB", "South" = "#A23B72", "West" = "#F18F01")

# Reorder categories by total sales
sales_data <- sales_data %>%
  group_by(category) %>%
  mutate(total = sum(sales)) %>%
  ungroup() %>%
  mutate(category = forcats::fct_reorder(category, total))

# Create polished chart
ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "stack", width = 0.7) +
  geom_text(aes(label = scales::comma(sales)),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold",
            size = 3.5) +
  scale_fill_manual(values = region_colors) +
  scale_y_continuous(labels = scales::comma_format(),
                     expand = expansion(mult = c(0, 0.05))) +
  labs(title = "Regional Sales Performance by Category",
       subtitle = "Total sales in USD, sorted by category total",
       x = NULL,
       y = "Sales ($)",
       fill = "Region",
       caption = "Data: 2024 Sales Report") +
  theme_minimal(base_size = 12) +
  theme(panel.grid.major.x = element_blank(),
        legend.position = "top",
        plot.title = element_text(face = "bold", size = 14))

The geom_text() with position_stack(vjust = 0.5) centers labels within each segment. Use scales::comma() for thousands separators and adjust size based on your segment heights.
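
When some segments are too short to hold a label, one option is to blank out labels below a cutoff rather than shrink the text. This is a sketch under the assumption that 20,000 is a sensible threshold for this data; the `min_label` cutoff and the `label` column are illustrative choices, not ggplot2 defaults.

```r
library(ggplot2)
library(dplyr)

sales_data <- data.frame(
  category = rep(c("Electronics", "Home Goods"), each = 3),
  region   = rep(c("North", "South", "West"), times = 2),
  sales    = c(45000, 38000, 52000, 18000, 22000, 19000)
)

min_label <- 20000  # assumed cutoff; tune it to your bar heights

# Small segments get an empty label instead of being dropped
labelled <- sales_data %>%
  mutate(label = ifelse(sales >= min_label, scales::comma(sales), ""))

p_labels <- ggplot(labelled, aes(x = category, y = sales, fill = region)) +
  geom_col(width = 0.7) +
  # Empty strings keep every row in the stack, so the remaining
  # labels stay aligned with their segments
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5),
            color = "white", size = 3.5)
```

Filtering the small rows out of the label layer instead would shift the stacked label positions, which is why the rows are kept with blank text.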

Advanced Techniques

Faceting for complex comparisons:

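# Add time dimension (simulated quarterly variation of +/- 20%)
set.seed(42)  # fixed seed so the simulated values are reproducible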
sales_quarterly <- sales_data %>%
  slice(rep(1:n(), 4)) %>%
  mutate(quarter = rep(paste0("Q", 1:4), each = nrow(sales_data)),
         sales = sales * runif(n(), 0.8, 1.2))

ggplot(sales_quarterly, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "fill") +
  facet_wrap(~quarter, nrow = 1) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Controlling stack order:

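ggplot2 stacks segments in factor-level order, with the first level at the top of the bar so that the stack matches the legend. Character columns are treated as factors with alphabetically sorted levels. To change the order, set the factor levels yourself; the last level ends up at the bottom:

sales_data <- sales_data %>%
  mutate(region = factor(region, levels = c("South", "North", "West")))

# West is the last level, so it now sits at the bottom of each stack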
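
To check which level ends up where, the computed segment positions can be inspected with layer_data(). This is a minimal sketch using a tiny hypothetical data frame (`df` and its values are made up for illustration). In current ggplot2 releases the first factor level is drawn at the top of the stack, and position_stack(reverse = TRUE) flips that.

```r
library(ggplot2)

# Hypothetical single-bar data; West is the first factor level
df <- data.frame(
  x = "Total",
  region = factor(c("West", "North", "South"),
                  levels = c("West", "North", "South")),
  sales = c(10, 20, 30)
)

p_default <- ggplot(df, aes(x = x, y = sales, fill = region)) +
  geom_col()

p_reversed <- ggplot(df, aes(x = x, y = sales, fill = region)) +
  geom_col(position = position_stack(reverse = TRUE))

# layer_data() returns the computed ymin/ymax of each stacked segment
stack_default  <- layer_data(p_default)[, c("group", "ymin", "ymax")]
stack_reversed <- layer_data(p_reversed)[, c("group", "ymin", "ymax")]
```

Comparing the two tables shows the West segment (height 10) moving from the top of the bar to the bottom, which can be a lighter-touch alternative to redefining factor levels.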

Handling negative values:

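Stacked bars with negative values need no manual splitting: ggplot2 stacks positive values upward from zero and negative values downward. A reference line at zero makes the two directions easy to read: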

profit_data <- data.frame(
  department = rep(c("Sales", "Marketing", "Operations"), each = 2),
  category = rep(c("Revenue", "Costs"), times = 3),
  amount = c(500000, -300000, 200000, -180000, 150000, -140000)
)

ggplot(profit_data, aes(x = department, y = amount, fill = category)) +
  geom_col(position = "stack") +
  geom_hline(yintercept = 0, linewidth = 0.5) +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()
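
Because revenue and costs stack in opposite directions, the bar itself does not show the net result. One way to surface it is to compute the net per department and overlay it as a point. The `net_data` name and the point styling are illustrative choices in this sketch.

```r
library(ggplot2)
library(dplyr)

profit_data <- data.frame(
  department = rep(c("Sales", "Marketing", "Operations"), each = 2),
  category = rep(c("Revenue", "Costs"), times = 3),
  amount = c(500000, -300000, 200000, -180000, 150000, -140000)
)

# Net result per department (revenue plus the negative costs)
net_data <- profit_data %>%
  group_by(department) %>%
  summarise(net = sum(amount), .groups = "drop")

p_net <- ggplot(profit_data, aes(x = department, y = amount)) +
  geom_col(aes(fill = category)) +
  geom_hline(yintercept = 0, linewidth = 0.5) +
  # The point layer uses its own data frame and overrides y with the net value
  geom_point(data = net_data, aes(y = net), size = 3) +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()
```

The fill mapping is set on geom_col() only, so the point layer does not look for a `category` column in `net_data`.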

Common Pitfalls and Best Practices

When NOT to use stacked bars:

With more than 5-6 segments, because the colors become hard to tell apart
When precise comparison of middle segments matters (use grouped bars)
When showing trends over many time periods (use line charts)
With highly variable totals (consider proportional bars or separate charts)
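
For the segment-comparison case in the list above, a grouped (dodged) layout puts every bar on the same zero baseline. A minimal sketch with a subset of the sales figures used earlier; `bar_bases` is a name added here only to inspect the result.

```r
library(ggplot2)

sales_data <- data.frame(
  category = rep(c("Electronics", "Clothing"), each = 3),
  region   = rep(c("North", "South", "West"), times = 2),
  sales    = c(45000, 38000, 52000, 32000, 28000, 35000)
)

# position = "dodge" places regions side by side instead of stacking them
p_grouped <- ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()

# Every bar now starts at zero, so heights are directly comparable
bar_bases <- layer_data(p_grouped)$ymin
```

The trade-off is that category totals are no longer visible at a glance, so this layout suits questions about individual segments rather than overall size.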

Accessibility considerations:

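Use colorblind-friendly palettes like viridis or ensure sufficient contrast. The discrete viridis scale, scale_fill_viridis_d(), ships with ggplot2 (version 3.0.0 and later), so no extra package needs to be loaded: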

ggplot(sales_data, aes(x = category, y = sales, fill = region)) +
  geom_col(position = "stack") +
  scale_fill_viridis_d(option = "plasma", end = 0.9) +
  theme_minimal()

Always include a legend and consider adding direct labels for critical values.

Ordering matters:

Place the most important segment at the bottom for easiest comparison. If showing time-series data, maintain consistent segment order across all bars.

Alternative approaches:

For complex data, consider:

Grouped bars for easier segment comparison
Small multiples (faceting) to reduce segments per chart
Treemaps for hierarchical data with many categories
Slope charts for before/after comparisons

Stacked bar charts excel at showing part-to-whole relationships, but they’re not universal. Choose them when totals matter and you need to show composition, but be ready to switch approaches when precise comparison of individual segments becomes the priority.