R ggplot2 - Line Plot with Examples

Key Insights

  • ggplot2’s line plots excel at visualizing trends over continuous variables, with geom_line() as the core layer that connects data points in order of their x values
  • Multiple lines on the same plot require proper grouping through aesthetics (color, linetype) or faceting, with scale_* functions controlling appearance and legends
  • Performance optimization matters for large datasets: downsample before plotting or aggregate with stat_summary() so that far fewer points are rendered. ggnewscale addresses a different need, namely using more than one color or fill scale in a multi-layer plot

Basic Line Plot Structure

The fundamental structure of a ggplot2 line plot combines the ggplot() function with geom_line(). The data must include at least two continuous variables: one for the x-axis and one for the y-axis.

library(ggplot2)

# Create sample time series data (seeded so the random walk is reproducible)
set.seed(2024)
dates <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
values <- cumsum(rnorm(length(dates), mean = 0.5, sd = 10))
df <- data.frame(date = dates, value = values)

# Basic line plot
ggplot(df, aes(x = date, y = value)) +
  geom_line() +
  labs(title = "Daily Value Progression",
       x = "Date",
       y = "Cumulative Value")

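The aes() function maps data columns to visual properties. For line plots, x and y are mandatory. geom_line() connects points in order of their x values, regardless of row order in the dataset, so a time series does not need to be sorted first. Use geom_path() when the line should follow the order in which rows appear.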
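To check how point order affects the result, the sketch below shuffles the rows of a small data frame and draws it with both geom_line() and geom_path(). The names ordered_df, shuffled, p_line, and p_path are illustrative and not part of the examples above.

```r
library(ggplot2)

# Small illustrative dataset with rows deliberately out of x order
set.seed(1)
ordered_df <- data.frame(x = 1:10, y = (1:10)^2)
shuffled <- ordered_df[sample(nrow(ordered_df)), ]

# geom_line() connects observations in order of x, so shuffling has no effect
p_line <- ggplot(shuffled, aes(x = x, y = y)) +
  geom_line() +
  ggtitle("geom_line(): ordered by x")

# geom_path() connects observations in row order, producing a zigzag here
p_path <- ggplot(shuffled, aes(x = x, y = y)) +
  geom_path() +
  ggtitle("geom_path(): ordered by row")

p_line
p_path
```

The first plot is a smooth curve and the second a tangle, even though both receive the same data frame.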

Customizing Line Appearance

Control line aesthetics through parameters in geom_line() or within aes() for data-driven styling.

# Fixed aesthetics (outside aes)
ggplot(df, aes(x = date, y = value)) +
  geom_line(color = "#2E86AB", 
            linewidth = 1.2,
            linetype = "solid") +
  theme_minimal()

# Multiple line types available
linetypes <- data.frame(
  x = rep(1:10, 6),
  y = rep(1:6, each = 10),
  type = rep(c("solid", "dashed", "dotted", 
               "dotdash", "longdash", "twodash"), each = 10)
)

ggplot(linetypes, aes(x = x, y = y, linetype = type)) +
  geom_line(linewidth = 1) +
  facet_wrap(~type, ncol = 2) +
  theme_minimal()

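Use linewidth instead of size for line thickness; size has been deprecated for lines since ggplot2 3.4.0. Linetype options include solid, dashed, dotted, dotdash, longdash, and twodash. The integers 0 to 6 map to blank plus these six named types, and a custom dash pattern is given as a string of 2, 4, 6, or 8 hexadecimal digits that alternate line and gap lengths, such as "3313".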
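As a minimal sketch of the less common linetype forms, the example below uses an integer code and a custom hex-string pattern. The data frame pattern_df and the plot names are made up for illustration.

```r
library(ggplot2)

pattern_df <- data.frame(x = 1:10, y = (1:10) / 2)

# Integer code: 2 is equivalent to "dashed"
p_int <- ggplot(pattern_df, aes(x = x, y = y)) +
  geom_line(linetype = 2, linewidth = 1)

# Custom pattern as a hex string: 3 units on, 3 off, 1 on, 3 off
p_hex <- ggplot(pattern_df, aes(x = x, y = y)) +
  geom_line(linetype = "3313", linewidth = 1)

p_int
p_hex
```

The pattern units scale with linewidth, so the same string looks longer on thicker lines.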

Multiple Lines with Grouping

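Plotting multiple lines requires telling ggplot2 which rows belong to which line. Mapping a categorical variable to color or linetype groups the data automatically; use the group aesthetic when the lines should be separated without differing in appearance.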

# Generate multi-series data
set.seed(42)
months <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "month")
products <- c("Product_A", "Product_B", "Product_C")

multi_df <- expand.grid(month = months, product = products)
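# Noise around a $50,000 baseline plus a trend of $100 per day since the first month
multi_df$revenue <- rnorm(nrow(multi_df), mean = 50000, sd = 10000) +
  as.numeric(multi_df$month - min(multi_df$month)) * 100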

# Multiple lines by color
ggplot(multi_df, aes(x = month, y = revenue, color = product)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("#E63946", "#457B9D", "#2A9D8F")) +
  labs(title = "Monthly Revenue by Product",
       x = "Month",
       y = "Revenue ($)",
       color = "Product") +
  theme_minimal() +
  theme(legend.position = "bottom")

The scale_color_manual() function provides precise control over colors. For automatic color selection, use scale_color_brewer() for ColorBrewer palettes or scale_color_viridis_d() for colorblind-friendly options.
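The sketch below contrasts the group aesthetic, which separates lines without styling them, with a color mapping that uses the viridis palette. The sensor_df data is hypothetical and generated only for this illustration.

```r
library(ggplot2)

# Hypothetical sensor readings: three series sharing one time axis
set.seed(7)
sensor_df <- data.frame(
  time   = rep(1:20, times = 3),
  sensor = rep(c("s1", "s2", "s3"), each = 20),
  value  = as.vector(replicate(3, cumsum(rnorm(20))))
)

# group separates the lines without adding a legend or changing their look
p_group <- ggplot(sensor_df, aes(x = time, y = value, group = sensor)) +
  geom_line(color = "grey40", alpha = 0.7)

# Mapping color groups implicitly; viridis gives a colorblind-friendly palette
p_viridis <- ggplot(sensor_df, aes(x = time, y = value, color = sensor)) +
  geom_line(linewidth = 1) +
  scale_color_viridis_d()

p_group
p_viridis
```

Without group or a grouping aesthetic, geom_line() would join all 60 rows into a single line.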

Combining Points and Lines

Add points to line plots to emphasize individual observations, particularly useful for sparse data or highlighting specific values.

# Quarterly data with points
quarterly <- data.frame(
  quarter = as.Date(c("2024-01-01", "2024-04-01", 
                      "2024-07-01", "2024-10-01")),
  sales = c(120000, 145000, 138000, 162000)
)

ggplot(quarterly, aes(x = quarter, y = sales)) +
  geom_line(linewidth = 1, color = "#264653") +
  geom_point(size = 4, color = "#E76F51") +
  scale_y_continuous(labels = scales::dollar_format()) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  labs(title = "Quarterly Sales Performance",
       x = NULL,
       y = "Sales") +
  theme_minimal()

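Layer order matters: geom_point() after geom_line() ensures points render on top. Use scales::dollar_format() and scales::comma_format() for professional axis labels; recent versions of scales also provide these formatters under label_*() names such as label_dollar() and label_comma(), and document the *_format() names as superseded.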
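The label helpers in scales are function factories: each call returns a formatter that ggplot2 then applies to the axis breaks. A small sketch, assuming a scales version that includes the label_*() functions, shows them applied directly to numbers.

```r
library(scales)

# label_*() functions return a formatter, which is then applied to numbers
dollar_labels <- label_dollar()
comma_labels  <- label_comma()

dollar_labels(c(120000, 145000))
comma_labels(1234567)
```

Passing the factory call itself, as in scale_y_continuous(labels = label_dollar()), is what hands the formatter to the scale.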

Handling Missing Values

ggplot2 handles NA values by creating breaks in lines. Control this behavior explicitly for different use cases.

# Data with missing values
df_missing <- data.frame(
  x = 1:10,
  y = c(5, 7, NA, 8, 10, NA, NA, 15, 17, 20)
)

# Default behavior (breaks in line)
p1 <- ggplot(df_missing, aes(x = x, y = y)) +
  geom_line() +
  ggtitle("Default: Line Breaks at NA")

# Remove NA values to connect across gaps
df_complete <- na.omit(df_missing)
p2 <- ggplot(df_complete, aes(x = x, y = y)) +
  geom_line() +
  ggtitle("NA Removed: Continuous Line")

# Interpolate missing values
library(zoo)
df_interpolated <- df_missing
df_interpolated$y <- na.approx(df_interpolated$y, na.rm = FALSE)

p3 <- ggplot(df_interpolated, aes(x = x, y = y)) +
  geom_line() +
  ggtitle("Interpolated Values")

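The three plots are stored in p1, p2, and p3; print each one to compare them. Choose the approach based on your data context: interpolation often suits regularly sampled time series, while explicit breaks are more honest when a gap means the value is unknown, as in survey data.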
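When values are interpolated, it can help to keep them visually distinct from real observations. The sketch below is one possible approach, not the only one: it uses base R's approx() instead of zoo and marks the filled points with a hollow shape. The column names interpolated and y_filled are illustrative.

```r
library(ggplot2)

gap_df <- data.frame(
  x = 1:10,
  y = c(5, 7, NA, 8, 10, NA, NA, 15, 17, 20)
)

# Record which rows were missing before filling them in
gap_df$interpolated <- is.na(gap_df$y)

# stats::approx() drops NA pairs and interpolates linearly at every x
gap_df$y_filled <- approx(gap_df$x, gap_df$y, xout = gap_df$x)$y

ggplot(gap_df, aes(x = x, y = y_filled)) +
  geom_line(color = "grey50") +
  geom_point(aes(shape = interpolated), size = 3) +
  scale_shape_manual(values = c(`FALSE` = 16, `TRUE` = 1),
                     labels = c("Observed", "Interpolated")) +
  labs(y = "y", shape = NULL)
```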

Confidence Intervals and Ribbons

Visualize uncertainty with geom_ribbon() or geom_smooth() for trend lines with confidence bands.

# Simulate data with confidence intervals
set.seed(123)
time_points <- 1:50
mean_values <- 10 + 0.5 * time_points + rnorm(50, 0, 2)
se <- 2 + 0.05 * time_points

ci_df <- data.frame(
  time = time_points,
  mean = mean_values,
  lower = mean_values - 1.96 * se,
  upper = mean_values + 1.96 * se
)

ggplot(ci_df, aes(x = time, y = mean)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), 
              fill = "#457B9D", alpha = 0.3) +
  geom_line(color = "#1D3557", linewidth = 1) +
  labs(title = "Trend with 95% Confidence Interval",
       x = "Time",
       y = "Value") +
  theme_minimal()

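Set alpha between 0.2 and 0.4 for ribbons to maintain line visibility. When no interval has been precomputed, geom_smooth() fits a model to the data (loess or a GAM by default, or a linear model with method = "lm") and draws its confidence band, which is 95% unless level is changed.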
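A minimal geom_smooth() sketch, using simulated data (obs_df is a made-up name) and a linear fit, might look like this:

```r
library(ggplot2)

set.seed(123)
obs_df <- data.frame(time = 1:50)
obs_df$value <- 10 + 0.5 * obs_df$time + rnorm(50, sd = 4)

p_smooth <- ggplot(obs_df, aes(x = time, y = value)) +
  geom_point(alpha = 0.5) +
  # Fit a linear model and shade its 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, level = 0.95,
              color = "#1D3557", fill = "#457B9D", alpha = 0.3)

p_smooth
```

The band here describes uncertainty in the fitted mean, so it is narrower than a ribbon built from per-observation standard errors.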

Faceting for Multiple Series

When comparing many groups, faceting often provides clearer visualization than overlapping lines.

# Multi-category time series
set.seed(456)
dates <- seq(as.Date("2024-01-01"), as.Date("2024-06-30"), by = "day")
regions <- c("North", "South", "East", "West")

facet_df <- expand.grid(date = dates, region = regions)
facet_df$value <- rnorm(nrow(facet_df), mean = 100, sd = 15) +
  as.numeric(facet_df$date) * 0.1

ggplot(facet_df, aes(x = date, y = value)) +
  geom_line(color = "#2A9D8F", linewidth = 0.8) +
  facet_wrap(~region, ncol = 2, scales = "free_y") +
  labs(title = "Regional Performance Comparison",
       x = "Date",
       y = "Value") +
  theme_minimal() +
  theme(strip.background = element_rect(fill = "#F1FAEE"),
        strip.text = element_text(face = "bold"))

Use scales = "free_y" when y-axis ranges differ significantly between facets. For time series, keep x-axis scales fixed for temporal alignment.

Performance Optimization

Rendering time grows with the number of points drawn, so large datasets should be reduced before or during plotting.

# Large dataset simulation
large_df <- data.frame(
  x = 1:100000,
  y = cumsum(rnorm(100000))
)

# Strategy 1: Downsample for visualization
downsample <- large_df[seq(1, nrow(large_df), by = 100), ]

ggplot(downsample, aes(x = x, y = y)) +
  geom_line() +
  ggtitle("Downsampled Data")

# Strategy 2: Aggregate with stat_summary
ggplot(large_df, aes(x = cut(x, breaks = 100), y = y)) +
  stat_summary(fun = mean, geom = "line", 
               aes(group = 1), linewidth = 1) +
  labs(title = "Aggregated Means",
       x = "Binned X") +
  theme_minimal() +
  theme(axis.text.x = element_blank())
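A third option, sketched here as an assumption rather than something the strategies above prescribe, is to aggregate before plotting and keep the per-bin minimum and maximum so that spikes are not lost. The names big_df, binned, and n_bins are illustrative.

```r
library(ggplot2)

set.seed(99)
big_df <- data.frame(x = 1:100000, y = cumsum(rnorm(100000)))

# Assign each row to one of 500 equal-width bins along x
n_bins <- 500
big_df$bin <- cut(big_df$x, breaks = n_bins, labels = FALSE)

# Summarise each bin once, so only 500 rows reach ggplot2
binned <- data.frame(
  x      = as.numeric(tapply(big_df$x, big_df$bin, mean)),
  y_mean = as.numeric(tapply(big_df$y, big_df$bin, mean)),
  y_min  = as.numeric(tapply(big_df$y, big_df$bin, min)),
  y_max  = as.numeric(tapply(big_df$y, big_df$bin, max))
)

ggplot(binned, aes(x = x, y = y_mean)) +
  geom_ribbon(aes(ymin = y_min, ymax = y_max),
              fill = "#457B9D", alpha = 0.3) +
  geom_line(color = "#1D3557") +
  labs(title = "Binned Mean with Min/Max Envelope", x = "x", y = "y")
```

Because the summary is computed once, the binned data frame can be reused across plots without repeating the aggregation.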

For interactive exploration, consider plotly::ggplotly(), which converts ggplot2 objects to interactive plots with hover information and zoom capabilities. Apply it to downsampled or aggregated data, because every plotted point is passed to the browser.
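A short sketch of that conversion follows. It assumes the plotly package may or may not be installed, so the call is guarded; walk_df, walk_small, p_static, and p_interactive are illustrative names.

```r
library(ggplot2)

set.seed(5)
walk_df <- data.frame(x = 1:100000, y = cumsum(rnorm(100000)))

# Keep every 100th row so the interactive widget stays responsive
walk_small <- walk_df[seq(1, nrow(walk_df), by = 100), ]

p_static <- ggplot(walk_small, aes(x = x, y = y)) +
  geom_line()

# Convert to an interactive plot only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p_static)
}
```

Printing p_interactive in an interactive session opens the widget in the viewer or browser.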
