T-Test in R: Step-by-Step Guide

Key Insights

  • The t.test() function in R handles all three t-test variants (one-sample, two-sample, paired) with a single, flexible interface—master its arguments and you’ll cover 90% of mean-comparison scenarios.
  • R defaults to Welch’s t-test for two-sample comparisons, which doesn’t assume equal variances and is almost always the safer choice in real-world data analysis.
  • Always verify normality assumptions before interpreting results; when they’re violated, non-parametric alternatives like the Wilcoxon test provide more reliable conclusions.

Introduction to T-Tests

T-tests answer a straightforward question: is the difference between means statistically significant, or could it have occurred by chance? Despite their simplicity, t-tests remain among the most frequently used statistical methods in research and industry.

You’ll encounter three flavors of t-test:

  1. One-sample t-test: Compare a sample mean against a known or hypothesized value
  2. Two-sample t-test (independent): Compare means between two unrelated groups
  3. Paired t-test: Compare means from the same subjects measured twice

Each test operates on the same principle: calculate how many standard errors separate your observed difference from zero (or your hypothesized value). The resulting t-statistic, combined with degrees of freedom, produces a p-value. If that p-value falls below your significance threshold (typically 0.05), you reject the null hypothesis that no real difference exists.
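This arithmetic is easy to verify by hand. The sketch below computes a one-sample t-statistic and two-sided p-value directly from the formula and checks them against t.test(); the sample vector is arbitrary illustration data.

```r
# Any numeric sample works; this vector is illustration data
x   <- c(4.1, 5.3, 4.8, 5.9, 5.1, 4.6, 5.4, 5.0)
mu0 <- 5                                               # hypothesized mean

t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))  # standard errors from mu0
df     <- length(x) - 1
p_val  <- 2 * pt(-abs(t_stat), df)                     # two-sided p-value

# Matches t.test() exactly
res <- t.test(x, mu = mu0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
```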

R makes running these tests trivially easy. The harder part is knowing which test to use and whether your data meets the underlying assumptions.

Prerequisites and Setup

R’s built-in stats package includes t.test(), so you don’t need to install anything for the core functionality. For visualization and assumption checking, load ggplot2.

# Load required packages
library(ggplot2)

# Create sample datasets for demonstration
set.seed(42)

# One-sample: reaction times in milliseconds
reaction_times <- rnorm(30, mean = 285, sd = 45)

# Two-sample: test scores from two teaching methods
method_a <- rnorm(25, mean = 72, sd = 10)
method_b <- rnorm(25, mean = 78, sd = 12)

# Paired: blood pressure before and after treatment
bp_before <- rnorm(20, mean = 145, sd = 15)
bp_after <- bp_before - rnorm(20, mean = 12, sd = 8)  # Correlated reduction

# Create a data frame for formula-based syntax
scores_df <- data.frame(
  score = c(method_a, method_b),
  method = factor(rep(c("A", "B"), each = 25))
)

I’m using set.seed(42) to ensure reproducibility. In your own analysis, you’ll replace these simulated vectors with actual data loaded via read.csv() or similar functions.

One-Sample T-Test

Use a one-sample t-test when you want to determine if your sample’s mean differs significantly from a specific value. Perhaps you’re testing whether average response times exceed an industry benchmark, or whether a manufacturing process hits its target specification.

# Test if mean reaction time differs from 250ms benchmark
one_sample_result <- t.test(reaction_times, mu = 250)
print(one_sample_result)
	One Sample t-test

data:  reaction_times
t = 3.8547, df = 29, p-value = 0.0005891
alternative hypothesis: true mean is not equal to 250
95 percent confidence interval:
 267.5823 298.7641
sample estimates:
mean of x 
 283.1732 

Interpreting this output:

  • t = 3.85: The sample mean is 3.85 standard errors above the hypothesized mean of 250
  • df = 29: Degrees of freedom (n - 1)
  • p-value = 0.0006: Strong evidence against the null hypothesis
  • 95% CI [267.58, 298.76]: We’re 95% confident the true population mean falls in this range

Since the p-value is well below 0.05 and the confidence interval doesn’t include 250, we conclude that reaction times significantly exceed the 250ms benchmark.

For one-sided tests, specify the alternative argument:

# Test if reaction times are greater than 250ms
t.test(reaction_times, mu = 250, alternative = "greater")
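The object t.test() returns is a list of class "htest", so individual pieces can be pulled out programmatically instead of read off the printed output:

```r
# Extract components of the "htest" result object
result <- t.test(reaction_times, mu = 250)

result$statistic   # t value (a named number)
result$p.value     # p-value as a plain number
result$conf.int    # confidence interval, with a "conf.level" attribute
result$estimate    # sample mean

# Handy for reporting or conditional logic
if (result$p.value < 0.05) {
  cat("Reject H0: mean differs from 250 (p =",
      signif(result$p.value, 3), ")\n")
}
```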

Two-Sample T-Test (Independent)

The two-sample t-test compares means between independent groups. This is your go-to when analyzing A/B tests, comparing treatment versus control groups, or evaluating differences between demographic segments.

# Compare test scores between teaching methods
two_sample_result <- t.test(method_a, method_b)
print(two_sample_result)
	Welch Two Sample t-test

data:  method_a and method_b
t = -2.0143, df = 46.385, p-value = 0.04987
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.96421654  -0.01115569
sample estimates:
mean of x mean of y 
 71.45632  78.44401 

Notice R performs Welch’s t-test by default. This variant doesn’t assume equal variances between groups and is more robust for real-world data. The fractional degrees of freedom (46.385) result from the Welch-Satterthwaite approximation.
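If you want to see where those fractional degrees of freedom come from, the Welch-Satterthwaite formula can be computed by hand and checked against t.test():

```r
# Welch-Satterthwaite degrees of freedom, computed manually
s1 <- var(method_a); n1 <- length(method_a)
s2 <- var(method_b); n2 <- length(method_b)

welch_df <- (s1/n1 + s2/n2)^2 /
            ((s1/n1)^2 / (n1 - 1) + (s2/n2)^2 / (n2 - 1))

# Matches the df reported by t.test()
all.equal(unname(t.test(method_a, method_b)$parameter), welch_df)
```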

If you have theoretical reasons to assume equal variances (rare in practice), specify var.equal = TRUE:

# Student's t-test assuming equal variances
t.test(method_a, method_b, var.equal = TRUE)

For data in long format (one column for values, one for groups), use formula syntax:

# Formula syntax: response ~ grouping variable
t.test(score ~ method, data = scores_df)

This produces identical results and works naturally with data frames imported from CSV files or databases.

Paired T-Test

Paired t-tests apply when measurements come from the same subjects at different times or under different conditions. The classic example is before-and-after studies: blood pressure before and after medication, performance before and after training, or preferences before and after exposure to information.

The key insight is that paired tests analyze the differences within each pair, not the raw values. This controls for individual variation and typically provides more statistical power than independent comparisons.

# Compare blood pressure before and after treatment
paired_result <- t.test(bp_before, bp_after, paired = TRUE)
print(paired_result)
	Paired t-test

data:  bp_before and bp_after
t = 6.7821, df = 19, p-value = 1.789e-06
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
  8.296541 15.891247
sample estimates:
mean difference 
       12.09389 

The output shows a highly significant reduction in blood pressure (mean difference of 12.09 mmHg, p < 0.001). The 95% confidence interval for the mean difference [8.30, 15.89] doesn’t include zero, confirming the treatment effect.

Critical requirement: vectors must be in the same order, with corresponding pairs aligned by index. Misaligned data will produce meaningless results without any error message.
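Because a paired test operates on the within-pair differences, it is mathematically identical to a one-sample test of those differences against zero — a useful sanity check:

```r
# A paired t-test is a one-sample t-test on the differences
diffs <- bp_before - bp_after

paired  <- t.test(bp_before, bp_after, paired = TRUE)
one_way <- t.test(diffs, mu = 0)

all.equal(paired$p.value, one_way$p.value)                      # identical p-values
all.equal(unname(paired$statistic), unname(one_way$statistic))  # identical t
```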

Checking Assumptions

T-tests assume each sample comes from a normally distributed population. For small samples (n < 30), this assumption matters. For larger samples, the Central Limit Theorem provides some protection, but severe departures from normality can still distort results.

The Shapiro-Wilk test formally evaluates normality:

# Test normality of reaction times
shapiro.test(reaction_times)
	Shapiro-Wilk normality test

data:  reaction_times
W = 0.97234, p-value = 0.5987

A p-value above 0.05 (like 0.60 here) suggests the data doesn’t significantly deviate from normality. However, Shapiro-Wilk is sensitive to sample size—large samples often produce significant results even for minor deviations.
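That sample-size sensitivity is easy to demonstrate: a distribution that is only mildly non-normal will often be flagged once n is large. The rt() draw below is an arbitrary illustration (a t-distribution is close to normal but slightly heavy-tailed); note that shapiro.test() accepts at most 5000 observations.

```r
set.seed(7)
nearly_normal <- rt(5000, df = 15)  # close to normal, slightly heavy tails

# With n = 5000, even this mild deviation tends to produce a small p-value
shapiro.test(nearly_normal)
```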

Visual inspection provides complementary evidence:

# Create Q-Q plot for normality assessment
ggplot(data.frame(x = reaction_times), aes(sample = x)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(title = "Q-Q Plot: Reaction Times",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal()

# Histogram with normal curve overlay
ggplot(data.frame(x = reaction_times), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 12, 
                 fill = "steelblue", alpha = 0.7) +
  stat_function(fun = dnorm, 
                args = list(mean = mean(reaction_times), 
                           sd = sd(reaction_times)),
                color = "red", linewidth = 1) +
  labs(title = "Distribution of Reaction Times",
       x = "Reaction Time (ms)", y = "Density") +
  theme_minimal()

When normality assumptions fail, switch to non-parametric alternatives:

# Wilcoxon signed-rank test (paired, non-parametric)
wilcox.test(bp_before, bp_after, paired = TRUE)

# Wilcoxon rank-sum test (independent, non-parametric)
wilcox.test(method_a, method_b)

These rank-based tests compare whole distributions (roughly, medians under a location-shift assumption) rather than means, and they don't require normality.

Visualizing Results

Statistical significance alone doesn’t tell the full story. Effect sizes and data distributions matter for practical interpretation. Publication-ready visualizations communicate both.
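Before plotting, it helps to quantify the effect. Cohen's d is the standard effect-size measure for a two-group mean difference; dedicated packages exist, but a base-R sketch suffices (the cohens_d helper below is our own, not a built-in):

```r
# Cohen's d for two independent groups (pooled-SD version), base R
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

d <- cohens_d(method_a, method_b)
d  # |d| near 0.2 = small, 0.5 = medium, 0.8 = large (Cohen's benchmarks)
```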

# Boxplot comparing teaching methods
ggplot(scores_df, aes(x = method, y = score, fill = method)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  scale_fill_manual(values = c("A" = "#E69F00", "B" = "#56B4E9")) +
  labs(title = "Test Scores by Teaching Method",
       subtitle = paste("Welch's t-test: p =", 
                       round(two_sample_result$p.value, 3)),
       x = "Teaching Method",
       y = "Test Score") +
  theme_minimal() +
  theme(legend.position = "none")

# Bar chart with error bars for paired data
paired_summary <- data.frame(
  time = factor(c("Before", "After"), levels = c("Before", "After")),
  mean_bp = c(mean(bp_before), mean(bp_after)),
  se = c(sd(bp_before)/sqrt(length(bp_before)), 
         sd(bp_after)/sqrt(length(bp_after)))
)

ggplot(paired_summary, aes(x = time, y = mean_bp, fill = time)) +
  geom_col(alpha = 0.8, width = 0.6) +
  geom_errorbar(aes(ymin = mean_bp - se, ymax = mean_bp + se),
                width = 0.2) +
  scale_fill_manual(values = c("Before" = "#D55E00", "After" = "#009E73")) +
  labs(title = "Blood Pressure Before and After Treatment",
       subtitle = "Paired t-test: p < 0.001",
       x = "", y = "Mean Blood Pressure (mmHg)") +
  theme_minimal() +
  theme(legend.position = "none")

These visualizations show the raw data alongside summary statistics, giving readers the context they need to evaluate both statistical and practical significance.

T-tests in R require minimal code but demand careful thinking about your data structure, assumptions, and research question. Master the t.test() function’s arguments, verify your assumptions, and always visualize your results. The statistics are straightforward—the interpretation is where expertise matters.
