How to Plot the T Distribution in R
Key Insights
- R provides four core functions for working with t distributions: dt() for density, pt() for cumulative probability, qt() for quantiles, and rt() for random sampling—master these and you can visualize any aspect of the distribution.
- The t distribution approaches the normal distribution as degrees of freedom increase; plotting multiple curves together makes this relationship immediately obvious and aids statistical intuition.
- Shading critical regions with polygon() in base R or geom_ribbon() in ggplot2 transforms abstract p-values into concrete visual areas, making hypothesis testing concepts tangible.
Introduction to the T Distribution
The t distribution is the workhorse of inferential statistics when you’re dealing with small samples or unknown population variance—which is most real-world scenarios. Developed by William Sealy Gosset (publishing under the pseudonym “Student”), this distribution accounts for the additional uncertainty that comes from estimating population parameters from limited data.
Unlike the normal distribution, the t distribution has heavier tails. This means extreme values are more likely, which is exactly what you’d expect when working with less information. The shape of the t distribution depends on a single parameter: degrees of freedom (df). As df increases, the distribution converges to the standard normal.
Visualization matters here because the t distribution isn’t just one curve—it’s a family of curves. Plotting them helps you understand why a t-test with 5 observations behaves differently than one with 50. It also makes concepts like critical values and p-values concrete rather than abstract.
Understanding T Distribution Functions in R
R provides four built-in functions for the t distribution, following its standard naming convention for probability distributions:
# dt() - Density function (height of the curve at a point)
dt(x = 0, df = 10)
# [1] 0.3891084
# pt() - Cumulative distribution function (area to the left)
pt(q = 1.96, df = 10)
# [1] 0.9608059
# qt() - Quantile function (inverse of pt)
qt(p = 0.975, df = 10)
# [1] 2.228139
# rt() - Random generation
set.seed(42)
rt(n = 5, df = 10)
# [1] -0.5318628 0.2833269 -0.8274715 0.1528933 -0.1388700
Notice that qt(0.975, df = 10) returns 2.228, not 1.96. This is the critical value for a two-tailed test at α = 0.05 with 10 degrees of freedom. The heavier tails of the t distribution push critical values further out compared to the normal distribution.
These four functions are the foundation for everything that follows. dt() gives you the y-values for plotting curves, pt() calculates areas under the curve, qt() finds cutoff points, and rt() generates simulated data.
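These relationships are easy to verify interactively. A quick sketch (the particular df values are arbitrary choices):

```r
# Two-tailed 0.05 critical values shrink toward the normal's 1.96
# as degrees of freedom grow
sapply(c(5, 10, 30, 100, 1000), function(df) qt(0.975, df))

# qt() and pt() are inverses: pushing a quantile back through pt()
# recovers the original probability
pt(qt(0.975, df = 10), df = 10)
# [1] 0.975

# The normal benchmark the critical values converge to
qnorm(0.975)
# [1] 1.959964
```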
Plotting a Basic T Distribution Curve
The simplest way to plot a t distribution uses base R’s curve() function:
# Basic t distribution plot with 10 degrees of freedom
curve(dt(x, df = 10),
from = -4,
to = 4,
main = "T Distribution (df = 10)",
xlab = "t value",
ylab = "Density",
col = "steelblue",
lwd = 2)
# Add a reference line at zero
abline(v = 0, lty = 2, col = "gray50")
The curve() function is elegant because it handles the x-value generation automatically. You specify the range with from and to, and R evaluates the function across that interval.
For more control, you can build the plot manually:
# Manual approach with explicit x values
x <- seq(-4, 4, length.out = 200)
y <- dt(x, df = 10)
plot(x, y,
type = "l",
main = "T Distribution (df = 10)",
xlab = "t value",
ylab = "Density",
col = "steelblue",
lwd = 2)
This approach is useful when you need to store the coordinates for later use, such as shading regions.
Comparing T Distributions with Different Degrees of Freedom
The relationship between degrees of freedom and distribution shape is best understood visually. Let’s plot several t distributions alongside the standard normal:
# Set up the plot with the first curve
curve(dt(x, df = 1),
from = -4,
to = 4,
main = "T Distributions vs Normal Distribution",
xlab = "t value",
ylab = "Density",
col = "red",
lwd = 2,
ylim = c(0, 0.4))
# Add t distributions with increasing df
curve(dt(x, df = 5), add = TRUE, col = "orange", lwd = 2)
curve(dt(x, df = 30), add = TRUE, col = "forestgreen", lwd = 2)
# Add standard normal for comparison
curve(dnorm(x), add = TRUE, col = "black", lwd = 2, lty = 2)
# Add legend
legend("topright",
legend = c("df = 1", "df = 5", "df = 30", "Normal"),
col = c("red", "orange", "forestgreen", "black"),
lty = c(1, 1, 1, 2),
lwd = 2,
bty = "n")
The plot reveals several important patterns. With df = 1 (the Cauchy distribution), the tails are so heavy that the distribution looks almost flat compared to the normal. At df = 5, you can see the characteristic bell shape but with noticeably heavier tails. By df = 30, the t distribution is nearly indistinguishable from the normal—this is why many textbooks use 30 as the threshold for “large sample” approximations.
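That visual convergence can be quantified by measuring the largest pointwise gap between each t density and the standard normal (the grid resolution here is an arbitrary choice):

```r
# Largest absolute difference between dt(x, df) and dnorm(x) on [-4, 4]
x <- seq(-4, 4, length.out = 1000)
for (df in c(1, 5, 30)) {
  gap <- max(abs(dt(x, df) - dnorm(x)))
  cat("df =", df, " max density gap =", signif(gap, 3), "\n")
}
```

The gap shrinks sharply as df grows, matching the visual impression that the df = 30 curve is nearly indistinguishable from the normal.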
Shading Critical Regions and P-Values
Static curves become powerful teaching and communication tools when you shade regions corresponding to critical values or p-values. Here’s how to shade the rejection regions for a two-tailed test at α = 0.05:
# Define parameters
df <- 15
alpha <- 0.05
critical_value <- qt(1 - alpha/2, df)
# Create the base plot
x <- seq(-4, 4, length.out = 200)
y <- dt(x, df)
plot(x, y,
type = "l",
main = paste0("Two-Tailed Critical Regions (df = ", df, ", α = ", alpha, ")"),
xlab = "t value",
ylab = "Density",
col = "steelblue",
lwd = 2)
# Shade left tail
x_left <- seq(-4, -critical_value, length.out = 100)
y_left <- dt(x_left, df)
polygon(c(x_left, rev(x_left)),
c(y_left, rep(0, length(y_left))),
col = rgb(1, 0, 0, 0.3),
border = NA)
# Shade right tail
x_right <- seq(critical_value, 4, length.out = 100)
y_right <- dt(x_right, df)
polygon(c(x_right, rev(x_right)),
c(y_right, rep(0, length(y_right))),
col = rgb(1, 0, 0, 0.3),
border = NA)
# Add critical value lines and labels
abline(v = c(-critical_value, critical_value), lty = 2, col = "red")
text(critical_value + 0.3, 0.15,
paste("t =", round(critical_value, 3)),
col = "red", cex = 0.8)
text(-critical_value - 0.3, 0.15,
paste("t =", round(-critical_value, 3)),
col = "red", cex = 0.8)
The polygon() function takes x and y coordinates that trace the outline of the region to fill. The trick is to create a closed shape by going along the curve and then back along the x-axis (y = 0).
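Because the two tails differ only in their x range, the pattern is worth wrapping in a small helper. A sketch (shade_tail() is a hypothetical name, not a base R function):

```r
# Hypothetical helper: shade the area under a t density between 'from'
# and 'to' on an existing plot, closing the shape along the x-axis
shade_tail <- function(from, to, df, col = rgb(1, 0, 0, 0.3), n = 100) {
  xs <- seq(from, to, length.out = n)
  ys <- dt(xs, df)
  polygon(c(xs, rev(xs)), c(ys, rep(0, n)), col = col, border = NA)
}

# Usage: redraw the two-tailed rejection regions with one call per tail
cv <- qt(0.975, df = 15)
curve(dt(x, df = 15), from = -4, to = 4, lwd = 2, col = "steelblue",
      xlab = "t value", ylab = "Density")
shade_tail(-4, -cv, df = 15)
shade_tail(cv, 4, df = 15)
```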
For visualizing a specific p-value, say from an observed t-statistic of 2.5:
# Observed t-statistic
t_obs <- 2.5
df <- 15
# Calculate two-tailed p-value
p_value <- 2 * pt(-abs(t_obs), df)
# Create plot with shaded p-value region
x <- seq(-4, 4, length.out = 200)
y <- dt(x, df)
plot(x, y, type = "l", col = "steelblue", lwd = 2,
main = paste("P-value =", round(p_value, 4)),
xlab = "t value", ylab = "Density")
# Shade both tails beyond observed statistic
x_left <- seq(-4, -abs(t_obs), length.out = 100)
x_right <- seq(abs(t_obs), 4, length.out = 100)
polygon(c(x_left, rev(x_left)), c(dt(x_left, df), rep(0, 100)),
col = rgb(0.2, 0.6, 0.8, 0.4), border = NA)
polygon(c(x_right, rev(x_right)), c(dt(x_right, df), rep(0, 100)),
col = rgb(0.2, 0.6, 0.8, 0.4), border = NA)
abline(v = c(-t_obs, t_obs), lty = 2, col = "darkblue")
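Since the shaded area is the p-value, the formula can be sanity-checked by integrating the density directly over one tail and doubling:

```r
# Two-tailed p-value two ways: the pt() formula and direct integration
t_obs <- 2.5
df <- 15
p_formula  <- 2 * pt(-abs(t_obs), df)
p_integral <- 2 * integrate(dt, lower = abs(t_obs), upper = Inf, df = df)$value
c(p_formula, p_integral)  # both roughly 0.024
```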
Creating T Distribution Plots with ggplot2
For publication-quality graphics or integration with the tidyverse workflow, ggplot2 offers a more declarative approach:
library(ggplot2)
# Basic t distribution with ggplot2
ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
stat_function(fun = dt, args = list(df = 10),
color = "steelblue", linewidth = 1) +
labs(title = "T Distribution (df = 10)",
x = "t value",
y = "Density") +
theme_minimal()
Comparing multiple distributions is straightforward with multiple stat_function() layers:
ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
stat_function(fun = dt, args = list(df = 1),
aes(color = "df = 1"), linewidth = 1) +
stat_function(fun = dt, args = list(df = 5),
aes(color = "df = 5"), linewidth = 1) +
stat_function(fun = dt, args = list(df = 30),
aes(color = "df = 30"), linewidth = 1) +
stat_function(fun = dnorm,
aes(color = "Normal"), linewidth = 1, linetype = "dashed") +
scale_color_manual(values = c("df = 1" = "red", "df = 5" = "orange",
"df = 30" = "forestgreen", "Normal" = "black")) +
labs(title = "T Distributions vs Normal",
x = "t value", y = "Density", color = "Distribution") +
theme_minimal()
For shading critical regions in ggplot2, use stat_function() with geom = "area":
df <- 15
cv <- qt(0.975, df)
ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
stat_function(fun = dt, args = list(df = df),
geom = "area", xlim = c(-4, -cv),
fill = "red", alpha = 0.3) +
stat_function(fun = dt, args = list(df = df),
geom = "area", xlim = c(cv, 4),
fill = "red", alpha = 0.3) +
stat_function(fun = dt, args = list(df = df),
color = "steelblue", linewidth = 1) +
geom_vline(xintercept = c(-cv, cv), linetype = "dashed", color = "red") +
labs(title = "Critical Regions for Two-Tailed Test (α = 0.05)",
x = "t value", y = "Density") +
theme_minimal()
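The Key Insights mentioned geom_ribbon() as another shading option; it wants the coordinates precomputed in a data frame, which is convenient when the same values feed several layers. A sketch of that approach (clamping the ribbon height to zero between the tails is what keeps it from bridging the middle):

```r
library(ggplot2)

dfree <- 15
cv <- qt(0.975, dfree)
grid <- data.frame(x = seq(-4, 4, length.out = 400))
grid$y <- dt(grid$x, dfree)
# Ribbon height: the density in the tails, zero in the middle
grid$tail <- ifelse(abs(grid$x) >= cv, grid$y, 0)

ggplot(grid, aes(x, y)) +
  geom_ribbon(aes(ymin = 0, ymax = tail), fill = "red", alpha = 0.3) +
  geom_line(color = "steelblue", linewidth = 1) +
  labs(x = "t value", y = "Density") +
  theme_minimal()
```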
Practical Applications and Summary
These visualization techniques serve several practical purposes:
Hypothesis testing reports: Shaded plots communicate results more effectively than tables of numbers. A reviewer can immediately see where the test statistic falls relative to critical regions.
Teaching materials: Students grasp the relationship between degrees of freedom and distribution shape much faster when they see the curves overlaid.
Assumption checking: Plotting the theoretical t distribution against a histogram of your data helps assess whether the t-test assumptions are reasonable.
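That assumption check can be sketched as a histogram with the theoretical density overlaid (simulated draws stand in here for real standardized sample values):

```r
# Overlay the theoretical t density on a histogram of (simulated) data
set.seed(1)
sim <- rt(500, df = 10)   # stand-in for standardized observations

hist(sim, breaks = 30, freq = FALSE,
     main = "Sample vs theoretical t (df = 10)",
     xlab = "t value", col = "gray90", border = "white")
curve(dt(x, df = 10), add = TRUE, col = "steelblue", lwd = 2)
```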
Quick reference for the core functions:
| Function | Purpose | Example |
|---|---|---|
| dt(x, df) | Density at point x | dt(0, df=10) → 0.389 |
| pt(q, df) | P(T ≤ q) | pt(2, df=10) → 0.963 |
| qt(p, df) | Value where P(T ≤ x) = p | qt(0.975, df=10) → 2.228 |
| rt(n, df) | Generate n random values | rt(100, df=10) |
The t distribution is fundamental to statistical inference, and R makes it easy to visualize. Whether you prefer base R’s simplicity or ggplot2’s flexibility, the key is understanding what you’re plotting: density curves, where the area under the curve over an interval gives the probability of observing a value in that range.