How to Create a Bubble Chart in ggplot2
Key Insights
- Bubble charts extend scatter plots by adding a third dimension through point size, making them ideal for displaying relationships between three continuous variables simultaneously
- Use scale_size_area() so that bubble area is proportional to your data values and a value of zero maps to zero area. scale_size_continuous() also scales area rather than radius, but it gives the smallest value a nonzero minimum size, which distorts the ratios between bubbles
- Apply alpha transparency (roughly 0.5-0.7) to overlapping bubbles to reveal hidden data points and improve readability in dense visualizations
Introduction to Bubble Charts
Bubble charts are enhanced scatter plots that display three dimensions of data simultaneously: two variables mapped to the x and y axes, and a third variable represented by the size of each point (bubble). This makes them particularly effective for exploring relationships between three continuous variables in a single visualization.
The primary use case for bubble charts is when you need to understand how three variables interact. For example, you might plot a company’s products with revenue on the x-axis, profit margin on the y-axis, and market share as bubble size. This immediately reveals which products are both profitable and significant to your business.
You can extend bubble charts to a fourth dimension by mapping an additional variable to color. This could be a categorical variable (like product category or region) or another continuous variable. However, be cautious—adding too many dimensions can make your chart difficult to interpret.
Use bubble charts when you have a reasonably sized dataset (typically under 100 observations). Too many bubbles create an unreadable mess. For larger datasets, consider filtering to the most important observations or using alternative visualizations.
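One way to filter a large dataset down to its most important observations is to keep only the rows with the largest values of the size variable. The sketch below assumes dplyr 1.0.0 or later (for slice_max()); the data frame big_data and its columns are made up purely for illustration.

```r
library(dplyr)

# Hypothetical data: 500 observations with made-up column names
set.seed(42)
big_data <- data.frame(
  id = seq_len(500),
  x = runif(500),
  y = runif(500),
  weight = rexp(500)
)

# Keep the 50 rows with the largest `weight`;
# with_ties = FALSE guarantees exactly 50 rows
top_obs <- big_data %>%
  slice_max(weight, n = 50, with_ties = FALSE)

nrow(top_obs)
```

The filtered data frame can then be passed to ggplot() in place of the full dataset.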
Setting Up Your Environment
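Before creating bubble charts, you’ll need ggplot2. While ggplot2 is the core requirement, I recommend also loading dplyr for data manipulation. The scales package, which is installed along with ggplot2, gives better control over axis and legend labels; the examples below call it with the scales:: prefix, so it doesn’t need to be loaded separately.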
# Install packages if needed
install.packages("ggplot2")
install.packages("dplyr")
# Load libraries
library(ggplot2)
library(dplyr)
# Create sample dataset
companies <- data.frame(
  name = c("TechCorp", "DataInc", "CloudSys", "DevTools", "AIStart",
           "WebPro", "MobileCo", "SecureNet", "AnalytiX", "CodeBase"),
  revenue = c(450, 280, 620, 150, 90, 380, 520, 210, 340, 180),
  profit_margin = c(22, 18, 28, 15, 8, 25, 30, 20, 23, 12),
  employees = c(1200, 450, 2100, 180, 50, 890, 1500, 320, 670, 240),
  industry = c("Software", "Analytics", "Cloud", "Tools", "AI",
               "Web", "Mobile", "Security", "Analytics", "Tools")
)
This dataset represents fictional companies with revenue (millions), profit margin (percentage), employee count, and industry category. This gives us multiple dimensions to explore in our visualizations.
Creating a Basic Bubble Chart
The foundation of a bubble chart in ggplot2 is geom_point() with the size aesthetic mapped to your third variable. This is simpler than you might expect—ggplot2 handles the heavy lifting.
# Basic bubble chart
ggplot(companies, aes(x = revenue, y = profit_margin, size = employees)) +
  geom_point()
This creates a functional bubble chart where each company is represented by a bubble. The x-axis shows revenue, the y-axis shows profit margin, and bubble size represents employee count. You can immediately see patterns: in this sample, companies with more employees tend to have higher revenue and higher margins, though not strictly. MobileCo has the highest margin (30%) with fewer employees than CloudSys (28%).
However, this basic chart has issues. The bubbles might be too large or too small, they overlap without transparency, and there’s no context about what we’re looking at. Let’s fix these problems.
Customizing Bubble Appearance
The default size scaling often produces bubbles that are either too large (obscuring other data) or too small (hard to differentiate). You have two main options for controlling bubble size: scale_size_continuous() and scale_size_area().
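Here’s the critical distinction. Both scales map values to bubble area rather than radius (radius scaling is what scale_radius() does). The difference is the starting point: scale_size_continuous() rescales your data onto a size range (c(1, 6) by default), so the smallest value still gets a visible minimum size and bubble areas are not proportional to the values. scale_size_area() maps a value of 0 to an area of 0, so a value twice as large gets a bubble with twice the area. Since readers judge bubbles by area, scale_size_area() gives the more accurate representation when your size variable is a count or amount with a meaningful zero.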
# Improved bubble chart with proper scaling
ggplot(companies, aes(x = revenue, y = profit_margin, size = employees)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 20, name = "Employees")
The alpha = 0.6 parameter adds 60% opacity (40% transparency), which is crucial when bubbles overlap. This lets you see overlapping data points instead of having them completely hidden.
The max_size parameter in scale_size_area() sets the size of the largest bubble, in the same units as the size aesthetic (approximately millimeters). Adjust this based on your data and plot size; values between 15 and 25 typically work well.
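To see how the two size scales treat the same data, you can inspect the sizes ggplot2 computes for each point. The following is a minimal sketch using layer_data() on a made-up two-row data frame whose values differ by a factor of 4; the helper name computed_sizes is hypothetical, and the comparison assumes that a point’s drawn area grows with the square of its size.

```r
library(ggplot2)

# Two hypothetical observations whose values differ by a factor of 4
df <- data.frame(x = c(1, 2), y = c(1, 2), value = c(25, 100))

base_plot <- ggplot(df, aes(x = x, y = y, size = value)) +
  geom_point()

# Hypothetical helper: pull the sizes ggplot2 computed for the point layer
computed_sizes <- function(p) layer_data(p, 1)$size

sizes_continuous <- computed_sizes(
  base_plot + scale_size_continuous(range = c(1, 6))
)
sizes_area <- computed_sizes(
  base_plot + scale_size_area(max_size = 10)
)

# Compare squared-size ratios (a proxy for area) with the 4x ratio in the data
ratio_continuous <- (sizes_continuous[2] / sizes_continuous[1])^2
ratio_area <- (sizes_area[2] / sizes_area[1])^2

ratio_continuous  # well above 4: the small value is pinned to the range minimum
ratio_area        # 4: area is proportional to the data
```

With scale_size_area() the area ratio matches the data ratio, while scale_size_continuous() exaggerates the difference because the smaller value is mapped to the bottom of the size range.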
You can further customize appearance with colors and borders:
ggplot(companies, aes(x = revenue, y = profit_margin, size = employees)) +
  geom_point(alpha = 0.6, color = "steelblue", fill = "lightblue",
             shape = 21, stroke = 1) +
  scale_size_area(max_size = 20, name = "Employees")
Using shape = 21 (filled circles) allows separate control of border color (color) and fill (fill). The stroke parameter controls border thickness.
Adding a Fourth Dimension with Color
Mapping a fourth variable to color transforms your bubble chart into an even more powerful analytical tool. You can map either categorical or continuous variables to color.
For categorical variables (like industry):
ggplot(companies, aes(x = revenue, y = profit_margin,
                      size = employees, color = industry)) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 20, name = "Employees") +
  scale_color_brewer(palette = "Set2", name = "Industry")
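With only ten companies spread across eight industries, this sample is too small to show real industry clusters, but the color mapping makes individual cases easy to pick out: the cloud company (CloudSys) has the highest revenue, the AI startup (AIStart) has the lowest profit margin, and the two Tools companies sit close together near the low-revenue end.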
For continuous variables, use gradient scales:
# Create a metric: revenue per employee
companies <- companies %>%
  mutate(revenue_per_employee = revenue / employees * 1000)

ggplot(companies, aes(x = revenue, y = profit_margin,
                      size = employees, color = revenue_per_employee)) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 20, name = "Employees") +
  scale_color_gradient(low = "yellow", high = "red",
                       name = "Revenue per\nEmployee ($K)")
This visualization now shows four dimensions: revenue, profit margin, employee count, and efficiency (revenue per employee). The gradient color scale makes it easy to identify the most efficient companies.
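The unit conversion behind revenue_per_employee is worth checking by hand. Revenue is stored in millions of dollars, so dividing by employees gives millions per employee, and multiplying by 1000 converts that to thousands. A quick worked check using the TechCorp row of the sample data (the variable names here are illustrative only):

```r
# TechCorp: revenue of $450 million and 1,200 employees
revenue_musd <- 450
employees <- 1200

# $ millions per employee, times 1000, gives $ thousands per employee
revenue_per_employee_k <- revenue_musd / employees * 1000

revenue_per_employee_k  # 375, i.e. $375K per employee
```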
Advanced Formatting and Labels
A publication-ready bubble chart needs proper labels, titles, and theme customization. Here’s how to polish your visualization:
ggplot(companies, aes(x = revenue, y = profit_margin,
                      size = employees, color = industry)) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 20, name = "Employees") +
  scale_color_brewer(palette = "Set2", name = "Industry") +
  labs(
    title = "Company Performance Analysis",
    subtitle = "Revenue vs. Profit Margin by Employee Count",
    x = "Revenue ($ millions)",
    y = "Profit Margin (%)",
    caption = "Source: Company financial data"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12, color = "gray40"),
    legend.position = "right",
    panel.grid.minor = element_blank()
  )
For specific bubbles you want to highlight, add text labels:
# Label only the largest companies
companies_labeled <- companies %>%
  filter(employees > 1000)

ggplot(companies, aes(x = revenue, y = profit_margin,
                      size = employees, color = industry)) +
  geom_point(alpha = 0.7) +
  geom_text(data = companies_labeled, aes(label = name),
            size = 3, hjust = -0.2, show.legend = FALSE) +
  scale_size_area(max_size = 20, name = "Employees") +
  scale_color_brewer(palette = "Set2", name = "Industry") +
  labs(
    title = "Company Performance Analysis",
    x = "Revenue ($ millions)",
    y = "Profit Margin (%)"
  ) +
  theme_minimal()
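The hjust = -0.2 parameter positions labels to the right of each bubble’s center. Because hjust is measured relative to the width of the label, longer names shift further, and labels can still overlap large bubbles. Adjust this value, or use vjust for vertical positioning, based on your data distribution.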
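As a rough illustration of vertical positioning, the sketch below places labels above their points using vjust. The three-point data frame pts is invented for the example, and the specific values (vjust = -1.5, point size 8) are arbitrary starting points rather than recommendations.

```r
library(ggplot2)

# Hypothetical three-point dataset for trying out label placement
pts <- data.frame(x = c(1, 2, 3), y = c(2, 3, 1), name = c("A", "B", "C"))

label_plot <- ggplot(pts, aes(x = x, y = y)) +
  geom_point(size = 8, alpha = 0.6) +
  # A negative vjust lifts each label above its point;
  # hjust = 0.5 keeps it horizontally centered
  geom_text(aes(label = name), hjust = 0.5, vjust = -1.5, size = 3)

label_plot
```

Larger bubbles need a more negative vjust to clear the bubble edge, so expect to tune the value against your max_size setting.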
Practical Use Case: Real-World Example
Let’s create a complete analysis using the gapminder dataset, which contains country-level data on life expectancy, GDP per capita, and population. This is a classic bubble chart use case made famous by Hans Rosling.
# Install gapminder if needed
# install.packages("gapminder")
library(gapminder)
# Filter to 2007 data
gap_2007 <- gapminder %>%
  filter(year == 2007)

# Create comprehensive bubble chart
ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp,
                     size = pop, color = continent)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 20, name = "Population",
                  labels = scales::comma) +
  scale_x_log10(labels = scales::dollar) +
  scale_color_brewer(palette = "Set2", name = "Continent") +
  labs(
    title = "Global Health and Wealth in 2007",
    subtitle = "Life expectancy vs. GDP per capita by population",
    x = "GDP per Capita (log scale)",
    y = "Life Expectancy (years)",
    caption = "Data: Gapminder Foundation"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    legend.position = "right",
    panel.grid.minor = element_blank()
  )
This visualization reveals several insights: African countries cluster in the lower-left (low GDP, lower life expectancy), European countries in the upper-right (high GDP, high life expectancy), and Asian countries show the widest spread. The bubble sizes immediately highlight population giants like China and India.
The log scale on the x-axis (scale_x_log10()) is essential here because GDP per capita spans more than two orders of magnitude, from a few hundred dollars to nearly $50,000. Without it, most countries would be compressed on the left side of the chart.
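You can check how wide the spread is before deciding on a log scale. This short sketch assumes the gapminder package is installed and simply reports the range of gdpPercap for 2007 and how many powers of ten separate the minimum from the maximum; the variable names are illustrative.

```r
library(gapminder)

# 2007 slice, using base R so the snippet needs no other packages
gap_2007 <- subset(gapminder, year == 2007)

# Smallest and largest GDP per capita in that year
gdp_range <- range(gap_2007$gdpPercap)
gdp_range

# Number of orders of magnitude between the extremes
orders_of_magnitude <- log10(gdp_range[2] / gdp_range[1])
orders_of_magnitude
```

As a rule of thumb, once a variable covers more than about one order of magnitude, a linear axis starts to crowd the smaller values together.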
Bubble charts excel at revealing these multi-dimensional patterns that would require multiple separate plots to discover otherwise. The key is choosing the right variables and applying proper scaling and transparency to make your data tell its story clearly.