R - t-test with Examples

Key Insights

• The t-test determines whether means of two groups differ significantly, with three variants: one-sample (comparing to a known value), two-sample (independent groups), and paired (dependent observations)
• R's t.test() function handles all t-test variants through a single interface; Welch's t-test, which does not assume equal variances, is the default for two-sample comparisons
• Effect size measures like Cohen's d complement p-values by quantifying practical significance, while assumption validation through normality tests and variance checks helps ensure valid results

Understanding t-test Fundamentals

The t-test evaluates whether observed differences between means are statistically significant or likely due to random chance. The test statistic follows a t-distribution, which accounts for sample size through degrees of freedom. Small samples have wider distributions, requiring larger differences to achieve significance.

Three primary variants exist: one-sample t-tests compare a sample mean against a hypothesized population value, independent two-sample t-tests compare means from unrelated groups, and paired t-tests analyze differences within matched observations.
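
The degrees-of-freedom effect can be seen directly with qt(), R's t-distribution quantile function: the critical value for a two-sided 95% test shrinks toward the normal cutoff of 1.96 as the sample grows.

```r
# Two-sided 95% critical t-values for increasing degrees of freedom;
# they approach the normal cutoff of 1.96 as df grows
round(qt(0.975, df = c(4, 9, 29, 99)), 3)
# [1] 2.776 2.262 2.045 1.984
```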

One-Sample t-test

One-sample t-tests determine if a sample mean differs significantly from a specified value. This applies when testing whether your data deviates from a known standard or theoretical expectation.

# Sample data: reaction times in milliseconds
reaction_times <- c(245, 258, 231, 267, 243, 255, 239, 261, 248, 252)

# Test if mean differs from expected 250ms
result <- t.test(reaction_times, mu = 250)
print(result)
One Sample t-test

data:  reaction_times
t = -0.029099, df = 9, p-value = 0.9774
alternative hypothesis: true mean is not equal to 250
95 percent confidence interval:
 242.126 257.674
sample estimates:
mean of x 
    249.9

The p-value of 0.98 indicates no significant difference from 250ms. The confidence interval contains 250, confirming this conclusion.

# One-sided test: is mean less than 250?
result_less <- t.test(reaction_times, mu = 250, alternative = "less")
print(result_less$p.value)
# [1] 0.4887

# One-sided test: is mean greater than 250?
result_greater <- t.test(reaction_times, mu = 250, alternative = "greater")
print(result_greater$p.value)
# [1] 0.5113
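
The object t.test() returns is a list of class "htest", so individual components can be pulled out by name for reporting. A small sketch, reusing the reaction-time data:

```r
reaction_times <- c(245, 258, 231, 267, 243, 255, 239, 261, 248, 252)
result <- t.test(reaction_times, mu = 250)

names(result)     # components: statistic, parameter, p.value, conf.int, ...
result$statistic  # the t value (named "t")
result$parameter  # degrees of freedom (named "df")
result$conf.int   # lower and upper confidence bounds
result$estimate   # the sample mean
```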

Independent Two-Sample t-test

Independent samples t-tests compare means from two unrelated groups. R defaults to Welch’s t-test, which doesn’t assume equal variances—a safer choice for real-world data.

# Drug trial data: blood pressure reduction
control <- c(5, 8, 6, 9, 7, 10, 6, 8, 7, 9)
treatment <- c(12, 15, 13, 16, 14, 17, 13, 15, 14, 16)

# Welch's t-test (default, unequal variances assumed)
welch_result <- t.test(treatment, control)
print(welch_result)
Welch Two Sample t-test

data:  treatment and control
t = 9.8995, df = 18, p-value = 1.043e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 5.514424 8.485576
sample estimates:
mean of x mean of y 
     14.5       7.5

The highly significant p-value (< 0.001) indicates the treatment group has substantially higher blood pressure reduction.

# Student's t-test (assumes equal variances)
student_result <- t.test(treatment, control, var.equal = TRUE)
print(student_result)

# Compare variances with F-test
var.test(treatment, control)
F test to compare two variances

data:  treatment and control
F = 1, num df = 9, denom df = 9, p-value = 1
alternative hypothesis: true ratio of variances is not equal to 1

The F-test finds no evidence of unequal variances (in this sample the two variances happen to be identical), but Welch's test remains the conservative default.
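
The same comparison can also be written with t.test()'s formula interface when the data sit in a long-format data frame. The frame below is a hypothetical reshaping of the two vectors above:

```r
# Long-format data frame: one row per subject (hypothetical layout)
bp <- data.frame(
  reduction = c(5, 8, 6, 9, 7, 10, 6, 8, 7, 9,
                12, 15, 13, 16, 14, 17, 13, 15, 14, 16),
  group = rep(c("control", "treatment"), each = 10)
)

# Welch's t-test via the formula interface (response ~ grouping)
t.test(reduction ~ group, data = bp)
```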

Paired t-test

Paired t-tests analyze dependent samples where observations are naturally matched—before/after measurements, twin studies, or repeated measures on the same subjects.

# Weight loss program: before and after weights (kg)
before <- c(85, 92, 78, 95, 88, 91, 83, 87, 90, 86)
after <- c(82, 89, 76, 91, 85, 88, 81, 84, 87, 83)

# Paired t-test
paired_result <- t.test(before, after, paired = TRUE)
print(paired_result)
Paired t-test

data:  before and after
t = 16.155, df = 9, p-value = 5.912e-08
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 2.49393 3.30607
sample estimates:
mean difference 
            2.9

The significant result confirms weight loss. The paired approach accounts for individual baseline differences, increasing statistical power.

# Visualize paired differences
differences <- before - after
hist(differences, main = "Weight Loss Distribution", 
     xlab = "Weight Change (kg)", col = "lightblue")
abline(v = mean(differences), col = "red", lwd = 2)
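
Because a paired t-test is mathematically a one-sample t-test on the within-pair differences, the two formulations can be checked against each other:

```r
before <- c(85, 92, 78, 95, 88, 91, 83, 87, 90, 86)
after  <- c(82, 89, 76, 91, 85, 88, 81, 84, 87, 83)

paired_test <- t.test(before, after, paired = TRUE)
diff_test   <- t.test(before - after, mu = 0)

# Both formulations give the same t statistic and p-value
all.equal(paired_test$p.value, diff_test$p.value)
# [1] TRUE
```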

Assumption Validation

T-tests assume approximately normal distributions. For small samples, check normality; for large samples (n > 30), the Central Limit Theorem provides robustness.
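
That robustness can be illustrated with a small simulation sketch (the seed, sample size, and replicate count here are arbitrary choices): sampling from a skewed exponential population whose true mean is known, the one-sample t-test's false positive rate stays near the nominal 5%.

```r
set.seed(42)

# Fraction of 2000 t-tests that falsely reject a true null (mu = 1)
# when sampling n = 50 values from a skewed exponential distribution
false_positive_rate <- mean(replicate(2000, {
  x <- rexp(50, rate = 1)           # true mean is exactly 1
  t.test(x, mu = 1)$p.value < 0.05  # reject a true null?
}))
false_positive_rate  # close to the nominal 0.05 despite the skew
```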

# Shapiro-Wilk normality test
shapiro.test(reaction_times)
Shapiro-Wilk normality test

data:  reaction_times
W = 0.95455, p-value = 0.7196

A p-value > 0.05 suggests normality. Visual checks complement formal tests:

# Q-Q plot for normality assessment
qqnorm(reaction_times)
qqline(reaction_times, col = "red")

# For paired tests, check differences
shapiro.test(differences)

When normality fails, consider the Wilcoxon test (non-parametric alternative):

# Non-parametric alternative to paired t-test
wilcox.test(before, after, paired = TRUE)
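
One practical note: with integer-valued data like the weights above, the differences contain ties, so wilcox.test() warns that it cannot compute an exact p-value for this small sample. Setting exact = FALSE requests the normal approximation explicitly and avoids the warning:

```r
before <- c(85, 92, 78, 95, 88, 91, 83, 87, 90, 86)
after  <- c(82, 89, 76, 91, 85, 88, 81, 84, 87, 83)

# Tied differences prevent an exact p-value; use the
# normal approximation explicitly
wilcox.test(before, after, paired = TRUE, exact = FALSE)
```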

Effect Size Calculation

P-values indicate statistical significance but not practical importance. Cohen’s d quantifies effect size: small (0.2), medium (0.5), or large (0.8).

# Cohen's d for independent samples
cohens_d <- function(x, y) {
  mean_diff <- mean(x) - mean(y)
  pooled_sd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) / 
                    (length(x) + length(y) - 2))
  return(mean_diff / pooled_sd)
}

d_value <- cohens_d(treatment, control)
print(paste("Cohen's d:", round(d_value, 3)))
# [1] "Cohen's d: 4.427"

This large effect size confirms substantial practical significance beyond statistical significance.

# Using effsize package for comprehensive effect size metrics
library(effsize)
cohen.d(treatment, control)

Practical Application: A/B Testing

A complete workflow for comparing conversion rates between website variants:

# Conversion times (seconds) for two landing page designs
design_a <- c(12, 15, 11, 18, 13, 16, 14, 17, 12, 15, 13, 16, 14, 11, 15)
design_b <- c(9, 11, 8, 12, 10, 13, 9, 11, 10, 12, 8, 11, 9, 10, 12)

# Check assumptions
shapiro.test(design_a)        # normality of group A
shapiro.test(design_b)        # normality of group B
var.test(design_a, design_b)  # similarity of variances

# Perform t-test
ab_result <- t.test(design_a, design_b)

# Extract key metrics
cat(sprintf("Mean A: %.2f, Mean B: %.2f\n", 
            mean(design_a), mean(design_b)))
cat(sprintf("Difference: %.2f (95%% CI: %.2f to %.2f)\n",
            mean(design_a) - mean(design_b),
            ab_result$conf.int[1],
            ab_result$conf.int[2]))
cat(sprintf("p-value: %.4f\n", ab_result$p.value))
cat(sprintf("Cohen's d: %.3f\n", cohens_d(design_a, design_b)))
Mean A: 14.13, Mean B: 10.33
Difference: 3.80 (95% CI: 2.40 to 5.20)
p-value: 0.0000
Cohen's d: 2.041

Design B significantly reduces conversion time with a large effect size, providing clear evidence for implementation.

Handling Multiple Comparisons

When performing multiple t-tests, adjust p-values to control family-wise error rate:

# Three group comparison
group1 <- rnorm(20, mean = 100, sd = 15)
group2 <- rnorm(20, mean = 110, sd = 15)
group3 <- rnorm(20, mean = 105, sd = 15)

# Pairwise comparisons
p_vals <- c(
  t.test(group1, group2)$p.value,
  t.test(group1, group3)$p.value,
  t.test(group2, group3)$p.value
)

# Bonferroni correction
p.adjust(p_vals, method = "bonferroni")

# Holm correction (less conservative)
p.adjust(p_vals, method = "holm")
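
Base R also wraps this workflow in pairwise.t.test(), which takes a response vector and a grouping factor and returns the matrix of adjusted p-values directly. A sketch (a seed is set here so the simulated data are reproducible):

```r
set.seed(123)
group1 <- rnorm(20, mean = 100, sd = 15)
group2 <- rnorm(20, mean = 110, sd = 15)
group3 <- rnorm(20, mean = 105, sd = 15)

scores <- c(group1, group2, group3)
groups <- factor(rep(c("g1", "g2", "g3"), each = 20))

# Pairwise t-tests with Holm-adjusted p-values
# (pool.sd = FALSE runs a separate Welch test per pair)
pairwise.t.test(scores, groups, p.adjust.method = "holm", pool.sd = FALSE)
```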

The t-test remains a foundational tool for statistical inference. Understanding its variants, validating assumptions, and complementing p-values with effect sizes enables robust data-driven decisions.
