T Distribution in R: Complete Guide


Key Insights

  • The t distribution is your go-to when working with small samples or unknown population variance—it accounts for the extra uncertainty that the normal distribution ignores in these situations.
  • R provides four essential functions (dt(), pt(), qt(), rt()) that handle every t distribution calculation you’ll need, from density values to random sampling.
  • Always verify normality assumptions before running t-tests; a Shapiro-Wilk test combined with Q-Q plots takes seconds and prevents misleading conclusions.

Introduction to the T Distribution

The t distribution solves a fundamental problem in statistics: what happens when you don’t know the population standard deviation and have to estimate it from your sample? William Sealy Gosset developed it in 1908 while working at Guinness Brewery, publishing under the pseudonym “Student”—hence the name “Student’s t distribution.”

Use the t distribution when your sample size is small (typically n < 30) or when you’re estimating population variance from sample data. As sample size increases, the t distribution converges to the normal distribution, but for smaller samples, it has heavier tails that account for additional uncertainty.

The key parameter is degrees of freedom (df), which equals n - 1 for a single sample. Lower degrees of freedom produce wider, flatter distributions with more probability in the tails.

# Visual comparison: t distribution vs normal distribution
library(ggplot2)

x <- seq(-4, 4, length.out = 200)

comparison_data <- data.frame(
  x = rep(x, 4),
  density = c(dnorm(x), dt(x, df = 3), dt(x, df = 10), dt(x, df = 30)),
  distribution = rep(c("Normal", "t (df=3)", "t (df=10)", "t (df=30)"), each = 200)
)

ggplot(comparison_data, aes(x = x, y = density, color = distribution)) +
  geom_line(linewidth = 1) +
  labs(title = "T Distribution vs Normal Distribution",
       x = "Value", y = "Density") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Notice how the t distribution with df = 3 has substantially heavier tails than the normal. By df = 30, the difference becomes negligible.
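The tail difference is easy to quantify with pt(). A quick check of the two-tailed probability of landing beyond ±2 under each distribution:

```r
# Two-tailed probability of a value beyond +/-2 under each distribution
2 * pt(-2, df = 3)    # ~0.139  heavy tails: nearly 14% of the mass
2 * pt(-2, df = 30)   # ~0.055  much closer to the normal
2 * pnorm(-2)         # ~0.046  normal reference
```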

Core T Distribution Functions in R

R provides four functions that follow a consistent naming convention across all probability distributions:

Function   Purpose      Returns
dt(x, df)  Density      Height of the PDF at x
pt(q, df)  Probability  Cumulative probability P(T ≤ q)
qt(p, df)  Quantile     Value where P(T ≤ value) = p
rt(n, df)  Random       n random draws from the t distribution

# dt(): Density at a specific point
dt(0, df = 10)  # Density at t = 0 with 10 degrees of freedom
# [1] 0.3891084

# pt(): Cumulative probability (area to the left)
pt(2, df = 10)  # P(T ≤ 2) with df = 10
# [1] 0.9633048

# qt(): Find the t-value for a given probability
qt(0.975, df = 10)  # 97.5th percentile with df = 10
# [1] 2.228139

# rt(): Generate random samples
set.seed(42)
rt(5, df = 10)  # 5 random values from t distribution
# (with the same seed, you will get the same five draws every time)

The ncp parameter allows for non-central t distributions, but you’ll rarely need it in practice.
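If you ever do need it, typically in power calculations, ncp shifts the distribution's center to the right. A minimal sketch:

```r
# Non-central t: ncp shifts the distribution to the right,
# so less probability mass sits below any fixed cutoff
pt(2, df = 10)           # central t: P(T <= 2), about 0.963
pt(2, df = 10, ncp = 1)  # smaller, because the distribution shifted right
```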

Calculating Probabilities and Quantiles

Most practical work involves pt() and qt(). Understanding tail probabilities is essential for hypothesis testing.

# Left-tail probability: P(T < 2.5) with df = 10
pt(2.5, df = 10)
# [1] 0.9843426

# Right-tail probability: P(T > 2.5)
pt(2.5, df = 10, lower.tail = FALSE)
# Or equivalently:
1 - pt(2.5, df = 10)
# [1] 0.01565741

# Two-tailed probability: P(|T| > 2.5)
2 * pt(-2.5, df = 10)
# [1] 0.03131483

# Probability between two values: P(-1.5 < T < 1.5)
pt(1.5, df = 10) - pt(-1.5, df = 10)
# [1] 0.8354138

For confidence intervals, you need critical values from qt():

# Critical t-value for 95% CI with df = 10
# We need the value where 2.5% is in each tail
alpha <- 0.05
df <- 10

t_critical <- qt(1 - alpha/2, df = df)
t_critical
# [1] 2.228139

# For a 99% CI
qt(0.995, df = 10)
# [1] 3.169273
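A useful sanity check: pt() and qt() are inverses, so round-tripping a critical value recovers it, and by symmetry the lower-tail critical value is just the mirror image.

```r
# qt() undoes pt(): round-trip the 97.5th percentile
t_val <- qt(0.975, df = 10)  # 2.228139
pt(t_val, df = 10)           # 0.975

# Symmetry: the lower-tail critical value is the negative
qt(0.025, df = 10)           # -2.228139
```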

One-Sample and Two-Sample T-Tests

The t.test() function handles all common t-test scenarios. Let’s work through each type with example data.

# Create sample data
group_a <- c(23.5, 25.1, 22.8, 24.9, 26.3, 21.7, 24.2, 25.8, 23.1, 24.6)
group_b <- c(21.2, 22.8, 20.5, 23.1, 21.9, 22.4, 20.8, 23.5, 21.1, 22.0)

# One-sample t-test: Is the mean of group_a different from 24?
one_sample <- t.test(group_a, mu = 24)
one_sample

#         One Sample t-test
# 
# data:  group_a
# t = 0.44305, df = 9, p-value = 0.6682
# alternative hypothesis: true mean is not equal to 24
# 95 percent confidence interval:
#  23.17882 25.22118
# sample estimates:
# mean of x 
#      24.2

# Independent two-sample t-test
two_sample <- t.test(group_a, group_b)
two_sample

#         Welch Two Sample t-test
# 
# data:  group_a and group_b
# t = 4.0917, df = 16.289, p-value = 0.0008232
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  1.095597 3.444403
# sample estimates:
# mean of x mean of y 
#     24.20     21.93

# Paired t-test (before/after measurements)
before <- c(180, 175, 190, 185, 170, 195, 188, 172, 183, 178)
after <- c(175, 170, 182, 180, 168, 188, 182, 170, 178, 173)

paired <- t.test(before, after, paired = TRUE)
paired

#         Paired t-test
# 
# data:  before and after
# t = 8.3853, df = 9, p-value = 1.517e-05
# alternative hypothesis: true mean difference is not equal to 0
# 95 percent confidence interval:
#  3.65111 6.34889
# sample estimates:
# mean difference 
#               5

The Welch t-test (default for two samples) doesn’t assume equal variances. Use var.equal = TRUE only when you’ve verified equal variances.
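For comparison, here is the pooled-variance (classic Student) version, sketched with the same two vectors from above. With two groups of 10 it uses df = n1 + n2 - 2 = 18 rather than the fractional Welch degrees of freedom:

```r
# Pooled-variance (Student) t-test: use only when variances are comparable
group_a <- c(23.5, 25.1, 22.8, 24.9, 26.3, 21.7, 24.2, 25.8, 23.1, 24.6)
group_b <- c(21.2, 22.8, 20.5, 23.1, 21.9, 22.4, 20.8, 23.5, 21.1, 22.0)

pooled <- t.test(group_a, group_b, var.equal = TRUE)
pooled$parameter  # df = 18 (n1 + n2 - 2), not the fractional Welch df
```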

Visualizing the T Distribution

ggplot2 makes it straightforward to produce publication-quality plots. Here’s how to create a t distribution plot with shaded rejection regions:

library(ggplot2)

df <- 15
alpha <- 0.05
t_crit <- qt(1 - alpha/2, df)

x <- seq(-4, 4, length.out = 300)
y <- dt(x, df)

plot_data <- data.frame(x = x, y = y)

ggplot(plot_data, aes(x = x, y = y)) +
  geom_line(linewidth = 1, color = "black") +
  geom_area(data = subset(plot_data, x <= -t_crit),
            aes(x = x, y = y), fill = "firebrick", alpha = 0.6) +
  geom_area(data = subset(plot_data, x >= t_crit),
            aes(x = x, y = y), fill = "firebrick", alpha = 0.6) +
  geom_vline(xintercept = c(-t_crit, t_crit), 
             linetype = "dashed", color = "firebrick") +
  annotate("text", x = 0, y = 0.2, 
           label = paste0("Acceptance Region\n(1 - α = ", 1 - alpha, ")"),
           size = 4) +
  annotate("text", x = -3, y = 0.05, 
           label = paste0("α/2 = ", alpha/2), color = "firebrick") +
  annotate("text", x = 3, y = 0.05, 
           label = paste0("α/2 = ", alpha/2), color = "firebrick") +
  labs(title = paste0("T Distribution (df = ", df, ") with α = ", alpha),
       subtitle = paste0("Critical values: ±", round(t_crit, 3)),
       x = "t", y = "Density") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Confidence Intervals Using the T Distribution

Understanding the manual calculation helps you grasp what t.test() does internally:

# Sample data
data <- c(12.5, 14.2, 11.8, 13.9, 15.1, 12.7, 14.5, 13.2)

# Manual calculation
n <- length(data)
sample_mean <- mean(data)
sample_sd <- sd(data)
se <- sample_sd / sqrt(n)
df <- n - 1

alpha <- 0.05
t_crit <- qt(1 - alpha/2, df)

ci_lower <- sample_mean - t_crit * se
ci_upper <- sample_mean + t_crit * se

cat("Manual 95% CI: [", round(ci_lower, 3), ", ", round(ci_upper, 3), "]\n")
# Manual 95% CI: [ 12.549 , 14.426 ]

# Extract from t.test object
test_result <- t.test(data)
test_result$conf.int
# [1] 12.5486 14.4264
# attr(,"conf.level")
# [1] 0.95

# Access specific components
test_result$estimate    # Sample mean
test_result$statistic   # t-statistic
test_result$parameter   # Degrees of freedom
test_result$p.value     # P-value

Use t-based intervals instead of z-based when: (1) sample size is small (n < 30), (2) population standard deviation is unknown, or (3) you want to be conservative. In practice, always use t-based intervals unless you genuinely know the population standard deviation.
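The cost of that conservatism is easy to see by comparing critical values, and it shrinks quickly as n grows:

```r
# t critical values approach the z critical value as n increases
n <- c(5, 10, 30, 100)
data.frame(n = n,
           t_crit = round(qt(0.975, df = n - 1), 3),
           z_crit = round(qnorm(0.975), 3))
# t_crit falls from ~2.776 at n = 5 toward the z value of ~1.96
```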

Practical Applications and Best Practices

T-tests assume your data comes from a normally distributed population. For small samples, this matters. Here’s how to check:

# Sample data
data <- c(23.1, 25.4, 22.8, 24.9, 26.1, 21.5, 24.3, 25.7, 23.4, 24.8,
          22.9, 25.2, 23.8, 24.1, 25.5)

# Shapiro-Wilk normality test
shapiro.test(data)
# 
#         Shapiro-Wilk normality test
# 
# data:  data
# W = 0.97234, p-value = 0.8869

# p > 0.05 suggests data is consistent with normality

# Q-Q plot for visual assessment
ggplot(data.frame(sample = data), aes(sample = sample)) +
  stat_qq() +
  stat_qq_line(color = "steelblue", linewidth = 1) +
  labs(title = "Q-Q Plot for Normality Assessment",
       x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal()

# Combined diagnostic function
check_normality <- function(x, alpha = 0.05) {
  sw_test <- shapiro.test(x)
  
  cat("Shapiro-Wilk Test:\n")
  cat("  W =", round(sw_test$statistic, 4), "\n")
  cat("  p-value =", round(sw_test$p.value, 4), "\n")
  cat("  Conclusion:", 
      ifelse(sw_test$p.value > alpha, 
             "Normality assumption reasonable", 
             "Normality assumption questionable"), "\n")
  
  invisible(sw_test)
}

check_normality(data)

Common pitfalls to avoid:

  1. Ignoring sample size context: Shapiro-Wilk becomes overly sensitive with large samples, rejecting normality for trivial deviations. With n > 50, rely more on Q-Q plots.

  2. Using the wrong test variant: For unequal variances, Welch’s t-test (the default) is robust. Don’t blindly set var.equal = TRUE.

  3. Multiple comparisons: Running multiple t-tests inflates Type I error. Use ANOVA or adjust p-values with p.adjust().

  4. Confusing one-tailed and two-tailed: Specify alternative = "greater" or alternative = "less" only when you have a directional hypothesis established before seeing the data.

# Checking equal variance assumption for two-sample test
var.test(group_a, group_b)

# If p > 0.05, equal variance assumption is reasonable
# But Welch's test is generally preferred regardless
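For the multiple-comparisons pitfall above, p.adjust() is the quickest fix. A sketch with made-up p-values from four hypothetical t-tests:

```r
# Adjust a set of t-test p-values for multiple comparisons
p_values <- c(0.012, 0.034, 0.041, 0.220)

p.adjust(p_values, method = "holm")        # step-down Holm: a safe default
# [1] 0.048 0.102 0.102 0.220

p.adjust(p_values, method = "bonferroni")  # simpler but more conservative
# [1] 0.048 0.136 0.164 0.880
```

Holm controls the same family-wise error rate as Bonferroni while rejecting at least as often, which is why it is usually the better default.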

The t distribution remains foundational for statistical inference. Master these functions, understand the assumptions, and you’ll handle most small-sample inference problems with confidence.
