How to Create a Heatmap in ggplot2
Key Insights
- Use geom_tile() for heatmaps with categorical axes and geom_raster() for faster rendering with continuous data on a regular grid
- Choose color scales deliberately: sequential for magnitude data, diverging for data with a meaningful midpoint, and viridis palettes for accessibility
- Always reshape data to long format with three columns (x-axis variable, y-axis variable, fill value) before creating heatmaps in ggplot2
Understanding Heatmaps in ggplot2
Heatmaps encode quantitative data using color intensity, making them invaluable for spotting patterns in large datasets. They excel at visualizing correlation matrices, temporal patterns across categories, and any scenario where you need to compare values across two categorical dimensions simultaneously.
Unlike specialized heatmap functions, which tend to offer limited customization, ggplot2’s grammar of graphics gives you fine-grained control over almost every visual element. This matters when you’re creating publication-ready visualizations or need to match specific brand guidelines.
Building Your First Heatmap
The critical requirement for ggplot2 heatmaps is data structure. You need three columns: one for x-axis categories, one for y-axis categories, and one for the values that determine fill color. This is long format, not the wide matrix format you might start with.
library(ggplot2)
library(dplyr)

# Create sample data: product performance across regions
sales_data <- expand.grid(
  product = c("Widget A", "Widget B", "Widget C", "Widget D"),
  region = c("North", "South", "East", "West")
) %>%
  mutate(sales = c(23, 45, 67, 34, 56, 78, 90, 45,
                   67, 89, 23, 56, 78, 34, 56, 67))

# Basic heatmap
ggplot(sales_data, aes(x = region, y = product, fill = sales)) +
  geom_tile() +
  labs(title = "Sales Performance by Region",
       x = "Region", y = "Product", fill = "Sales ($K)")
Use geom_tile() as your default. It handles categorical axes and irregularly spaced positions, and it allows variable tile sizes. Switch to geom_raster() only when your data sits on a regular, evenly spaced grid and you need faster rendering for large datasets (10,000+ cells).
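As a rough illustration of the geom_raster() case, the sketch below uses faithfuld, a dataset bundled with ggplot2 that holds density estimates on an evenly spaced grid. The dataset choice and the name raster_plot are assumptions made for this example; they are not part of the sales data above.

```r
library(ggplot2)

# faithfuld ships with ggplot2: a 2D density estimate on a regular grid
# (columns: eruptions, waiting, density)
raster_plot <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +  # every cell has the same size, so raster rendering applies
  scale_fill_viridis_c() +
  labs(title = "geom_raster() on a Regular Grid",
       x = "Waiting time (min)", y = "Eruption time (min)", fill = "Density")

raster_plot
```

Swapping geom_raster() for geom_tile() should give the same picture here; the difference is mainly rendering speed as the grid grows.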
Mastering Color Scales
Color choice makes or breaks a heatmap. The wrong palette obscures patterns; the right one reveals insights instantly.
For data representing magnitude (sales, counts, percentages), use sequential scales:
# Sequential scale - light to dark
ggplot(sales_data, aes(x = region, y = product, fill = sales)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "darkblue") +
  labs(title = "Sequential Scale")
For data with a meaningful midpoint (correlations, temperature anomalies, profit/loss), use diverging scales:
# Create data with positive and negative values
performance_data <- sales_data %>%
  mutate(variance = sales - mean(sales))

# Diverging scale - highlights deviations from zero
ggplot(performance_data, aes(x = region, y = product, fill = variance)) +
  geom_tile() +
  scale_fill_gradient2(low = "darkred", mid = "white", high = "darkgreen",
                       midpoint = 0) +
  labs(title = "Diverging Scale", fill = "Variance")
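One optional refinement, offered as a sketch rather than a rule: the legend of a diverging scale covers only the observed range, so it can extend further on one side of zero than the other. Passing symmetric limits makes the legend span the same distance in both directions. The names max_abs and symmetric_plot are illustrative, and the sample data is rebuilt so the block runs on its own.

```r
library(ggplot2)

# Same sample data as above, rebuilt so the sketch is self-contained
performance_data <- expand.grid(
  product = c("Widget A", "Widget B", "Widget C", "Widget D"),
  region = c("North", "South", "East", "West")
)
performance_data$sales <- c(23, 45, 67, 34, 56, 78, 90, 45,
                            67, 89, 23, 56, 78, 34, 56, 67)
performance_data$variance <- performance_data$sales - mean(performance_data$sales)

# Largest absolute deviation, used to mirror the limits around zero
max_abs <- max(abs(performance_data$variance))

symmetric_plot <- ggplot(performance_data,
                         aes(x = region, y = product, fill = variance)) +
  geom_tile() +
  scale_fill_gradient2(low = "darkred", mid = "white", high = "darkgreen",
                       midpoint = 0, limits = c(-max_abs, max_abs)) +
  labs(title = "Diverging Scale with Symmetric Limits", fill = "Variance")

symmetric_plot
```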
For accessibility and perceptual uniformity, use viridis palettes:
# Viridis scale - colorblind-friendly and perceptually uniform
ggplot(sales_data, aes(x = region, y = product, fill = sales)) +
  geom_tile() +
  scale_fill_viridis_c(option = "plasma") +
  labs(title = "Viridis Scale")
The viridis options (viridis, magma, plasma, inferno, cividis) are designed to stay readable in grayscale and for viewers with common forms of color blindness. Use them as your default unless you have specific branding requirements.
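To compare the options side by side, one possible approach is to build the same heatmap once per palette. This is a minimal sketch; palette_options and palette_plots are hypothetical names, and the sample data is rebuilt so the block runs on its own.

```r
library(ggplot2)

# Same sample data as above, rebuilt so the sketch is self-contained
sales_data <- expand.grid(
  product = c("Widget A", "Widget B", "Widget C", "Widget D"),
  region = c("North", "South", "East", "West")
)
sales_data$sales <- c(23, 45, 67, 34, 56, 78, 90, 45,
                      67, 89, 23, 56, 78, 34, 56, 67)

palette_options <- c("viridis", "magma", "plasma", "inferno", "cividis")

# Build one heatmap per palette option
palette_plots <- lapply(palette_options, function(opt) {
  ggplot(sales_data, aes(x = region, y = product, fill = sales)) +
    geom_tile() +
    scale_fill_viridis_c(option = opt) +
    labs(title = paste("Viridis option:", opt))
})
names(palette_plots) <- palette_options  # look plots up by option name

palette_plots[["cividis"]]
```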
Adding Value Labels
Empty colored tiles force readers to constantly reference the legend. Add direct labels for faster comprehension:
# Heatmap with value labels
ggplot(sales_data, aes(x = region, y = product, fill = sales)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = paste0("$", sales, "K")),
            color = "white", size = 4) +
  scale_fill_viridis_c(option = "plasma") +
  labs(title = "Sales with Direct Labels") +
  theme_minimal()
For better readability on varying backgrounds, adjust text color based on tile value:
# Conditional text color for readability
sales_data <- sales_data %>%
  mutate(text_color = ifelse(sales > median(sales), "white", "black"))

ggplot(sales_data, aes(x = region, y = product, fill = sales)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = paste0("$", sales, "K"), color = text_color),
            size = 4) +
  scale_fill_gradient(low = "#f0f0f0", high = "#2c3e50") +
  scale_color_identity() +
  labs(title = "Adaptive Text Color") +
  theme_minimal()
Building Correlation Matrix Heatmaps
Correlation matrices are one of the most common heatmap use cases. The workflow requires reshaping the correlation matrix from wide to long format:
library(tidyr)

# Create correlation matrix from mtcars dataset
cor_matrix <- cor(mtcars[, c("mpg", "cyl", "disp", "hp", "wt")])

# Reshape to long format
cor_data <- as.data.frame(cor_matrix) %>%
  mutate(var1 = rownames(.)) %>%
  pivot_longer(cols = -var1, names_to = "var2", values_to = "correlation")

# Create correlation heatmap
ggplot(cor_data, aes(x = var1, y = var2, fill = correlation)) +
  geom_tile(color = "white") +
  geom_text(aes(label = sprintf("%.2f", correlation)),
            color = "black", size = 3) +
  scale_fill_gradient2(low = "#d73027", mid = "white", high = "#4575b4",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(title = "Correlation Matrix Heatmap",
       x = NULL, y = NULL, fill = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
For cleaner visualizations, show only the lower triangle:
# Keep only lower triangle
cor_data_lower <- cor_data %>%
  mutate(var1_num = as.numeric(factor(var1)),
         var2_num = as.numeric(factor(var2))) %>%
  filter(var1_num > var2_num)

ggplot(cor_data_lower, aes(x = var1, y = var2, fill = correlation)) +
  geom_tile(color = "white") +
  geom_text(aes(label = sprintf("%.2f", correlation)), size = 3) +
  scale_fill_gradient2(low = "#d73027", mid = "white", high = "#4575b4",
                       midpoint = 0, limits = c(-1, 1)) +
  coord_fixed() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
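The factor() calls above order the variables alphabetically, so the triangle follows alphabetical order and not the order used to build the matrix. If you would like the axes to follow the matrix order, one alternative is to blank out the upper triangle with base R's upper.tri() before reshaping. This is a sketch of that alternative, not the approach used above; vars, cor_lower, cor_long, and triangle_plot are illustrative names.

```r
library(ggplot2)

vars <- c("mpg", "cyl", "disp", "hp", "wt")
cor_matrix <- cor(mtcars[, vars])

# Blank out the upper triangle and the diagonal
cor_lower <- cor_matrix
cor_lower[upper.tri(cor_lower, diag = TRUE)] <- NA

# as.table() + as.data.frame() yields one row per cell, with factor columns
# whose levels follow the row/column order of the matrix
cor_long <- as.data.frame(as.table(cor_lower), responseName = "correlation")
names(cor_long)[1:2] <- c("var1", "var2")
cor_long <- cor_long[!is.na(cor_long$correlation), ]

triangle_plot <- ggplot(cor_long, aes(x = var1, y = var2, fill = correlation)) +
  geom_tile(color = "white") +
  geom_text(aes(label = sprintf("%.2f", correlation)), size = 3) +
  scale_fill_gradient2(low = "#d73027", mid = "white", high = "#4575b4",
                       midpoint = 0, limits = c(-1, 1)) +
  coord_fixed() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

triangle_plot
```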
Professional Styling
Production heatmaps require polish. Remove visual clutter, ensure proper aspect ratios, and guide the reader’s attention:
ggplot(sales_data, aes(x = region, y = product, fill = sales)) +
  geom_tile(color = "gray90", linewidth = 0.5) +
  geom_text(aes(label = paste0("$", sales, "K")),
            color = "white", fontface = "bold", size = 5) +
  scale_fill_viridis_c(option = "magma", begin = 0.2, end = 0.9) +
  labs(title = "Q4 2024 Sales Performance",
       subtitle = "Revenue in thousands by region and product line",
       x = NULL, y = NULL, fill = "Revenue ($K)") +
  coord_fixed(ratio = 1) +  # Square tiles
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "gray40", margin = margin(b = 15)),
    axis.text = element_text(color = "gray20"),
    panel.grid = element_blank(),
    legend.position = "right",
    plot.margin = margin(20, 20, 20, 20)
  )
The coord_fixed() function maintains square tiles, preventing distortion. Remove grid lines with panel.grid = element_blank() since tile borders provide sufficient structure.
Real-World Example: Website Traffic Analysis
Here’s a complete workflow analyzing hourly website traffic patterns across weekdays:
# Simulate website traffic data
set.seed(42)
traffic_data <- expand.grid(
  hour = 0:23,
  day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
) %>%
  mutate(
    day = factor(day, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
    # Simulate realistic traffic patterns
    visitors = case_when(
      day %in% c("Sat", "Sun") ~ rpois(n(), lambda = 150 + hour * 5),
      hour >= 9 & hour <= 17 ~ rpois(n(), lambda = 300 + hour * 10),
      TRUE ~ rpois(n(), lambda = 100 + hour * 3)
    )
  )

# Create publication-ready heatmap
ggplot(traffic_data, aes(x = hour, y = day, fill = visitors)) +
  geom_tile(color = "white", linewidth = 1) +
  scale_fill_viridis_c(option = "inferno", labels = scales::comma) +
  scale_x_continuous(breaks = seq(0, 23, 3),
                     labels = paste0(seq(0, 23, 3), ":00")) +
  labs(title = "Website Traffic Patterns",
       subtitle = "Average hourly visitors by day of week",
       x = "Hour of Day", y = NULL, fill = "Visitors") +
  coord_fixed(ratio = 1) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 18, margin = margin(b = 5)),
    plot.subtitle = element_text(color = "gray40", margin = margin(b = 15)),
    axis.text.y = element_text(size = 11),
    panel.grid = element_blank(),
    legend.position = "right"
  )
This visualization immediately reveals that weekday business hours drive peak traffic, while weekend traffic is spread more evenly across the day. These insights inform content scheduling and server capacity planning.
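To check the visual impression against numbers, you could summarize the same simulated data by day type and time window. The sketch below rebuilds the simulation so it runs on its own; the column names day_type, period, and mean_visitors, and the 9:00 to 17:00 window reused from the simulation, are assumptions made for illustration.

```r
library(dplyr)

# Rebuild the simulated traffic data from the example above
set.seed(42)
traffic_data <- expand.grid(
  hour = 0:23,
  day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
) %>%
  mutate(
    day = factor(day, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
    visitors = case_when(
      day %in% c("Sat", "Sun") ~ rpois(n(), lambda = 150 + hour * 5),
      hour >= 9 & hour <= 17 ~ rpois(n(), lambda = 300 + hour * 10),
      TRUE ~ rpois(n(), lambda = 100 + hour * 3)
    )
  )

# Average visitors per cell group, highest first
traffic_summary <- traffic_data %>%
  mutate(
    day_type = if_else(day %in% c("Sat", "Sun"), "Weekend", "Weekday"),
    period = if_else(hour >= 9 & hour <= 17, "Business hours", "Off hours")
  ) %>%
  group_by(day_type, period) %>%
  summarise(mean_visitors = mean(visitors), .groups = "drop") %>%
  arrange(desc(mean_visitors))

traffic_summary
```

With this simulation, the weekday business-hours group should come out on top, which matches what the heatmap shows.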
Heatmaps in ggplot2 transform multidimensional data into instantly comprehensible patterns. Master the fundamentals (proper data structure, appropriate color scales, and thoughtful styling) and you’ll create visualizations that drive decisions rather than just decorate reports.