R - t-test with Examples
Key Insights
• The t-test determines whether means of two groups differ significantly, with three variants: one-sample (comparing to a known value), two-sample (independent groups), and paired (dependent observations)
• R’s t.test() function handles all t-test variants through a single interface, with Welch’s t-test (which does not assume equal variances) as the default for two-sample comparisons
• Effect size measures like Cohen’s d complement p-values by quantifying practical significance, while assumption validation through normality tests and variance checks ensures valid results
Understanding t-test Fundamentals
The t-test evaluates whether observed differences between means are statistically significant or likely due to random chance. The test statistic follows a t-distribution, which accounts for sample size through degrees of freedom. Small samples have wider distributions, requiring larger differences to achieve significance.
Three primary variants exist: one-sample t-tests compare a sample mean against a hypothesized population value, independent two-sample t-tests compare means from unrelated groups, and paired t-tests analyze differences within matched observations.
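Because the t statistic is simply the standardized distance between the observed and hypothesized means, it can be reproduced by hand (a small sketch with made-up values) to see exactly what t.test() computes:

```r
# Manual one-sample t statistic: t = (x_bar - mu0) / (s / sqrt(n))
x <- c(4.1, 5.2, 3.8, 4.9, 5.0, 4.4)  # illustrative values only
mu0 <- 4.0

t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
df <- length(x) - 1
p_val <- 2 * pt(-abs(t_stat), df)  # two-sided p-value from the t-distribution

# Agrees exactly with t.test()
r <- t.test(x, mu = mu0)
all.equal(unname(r$statistic), t_stat)  # TRUE
all.equal(r$p.value, p_val)             # TRUE
```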
One-Sample t-test
One-sample t-tests determine if a sample mean differs significantly from a specified value. This applies when testing whether your data deviates from a known standard or theoretical expectation.
# Sample data: reaction times in milliseconds
reaction_times <- c(245, 258, 231, 267, 243, 255, 239, 261, 248, 252)
# Test if mean differs from expected 250ms
result <- t.test(reaction_times, mu = 250)
print(result)
One Sample t-test
data: reaction_times
t = -0.029099, df = 9, p-value = 0.9774
alternative hypothesis: true mean is not equal to 250
95 percent confidence interval:
242.1259 257.6741
sample estimates:
mean of x
249.9
The p-value of 0.98 indicates no significant difference from 250ms. The confidence interval contains 250, confirming this conclusion.
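The reported interval can be reconstructed directly from the t quantile, which makes the link between the test and the confidence interval explicit:

```r
# Rebuild the 95% CI by hand: x_bar +/- t_crit * SE
reaction_times <- c(245, 258, 231, 267, 243, 255, 239, 261, 248, 252)
n <- length(reaction_times)
se <- sd(reaction_times) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value

ci <- mean(reaction_times) + c(-1, 1) * t_crit * se
ci  # matches the interval printed by t.test()
```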
# One-sided test: is mean less than 250?
result_less <- t.test(reaction_times, mu = 250, alternative = "less")
print(result_less$p.value)
# [1] 0.4887
# One-sided test: is mean greater than 250?
result_greater <- t.test(reaction_times, mu = 250, alternative = "greater")
print(result_greater$p.value)
# [1] 0.5113
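Because the t-distribution is symmetric, the two one-sided p-values always sum to 1, and the two-sided p-value equals twice the smaller one-sided value; a quick sanity check makes the relationship concrete:

```r
reaction_times <- c(245, 258, 231, 267, 243, 255, 239, 261, 248, 252)
p_two     <- t.test(reaction_times, mu = 250)$p.value
p_less    <- t.test(reaction_times, mu = 250, alternative = "less")$p.value
p_greater <- t.test(reaction_times, mu = 250, alternative = "greater")$p.value

all.equal(p_less + p_greater, 1)              # TRUE: complementary tails
all.equal(p_two, 2 * min(p_less, p_greater))  # TRUE: two-sided doubles the smaller tail
```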
Independent Two-Sample t-test
Independent samples t-tests compare means from two unrelated groups. R defaults to Welch’s t-test, which doesn’t assume equal variances—a safer choice for real-world data.
# Drug trial data: blood pressure reduction
control <- c(5, 8, 6, 9, 7, 10, 6, 8, 7, 9)
treatment <- c(12, 15, 13, 16, 14, 17, 13, 15, 14, 16)
# Welch's t-test (default, unequal variances assumed)
welch_result <- t.test(treatment, control)
print(welch_result)
Welch Two Sample t-test
data: treatment and control
t = 9.8995, df = 18, p-value = 1.043e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
5.514424 8.485576
sample estimates:
mean of x mean of y
14.5 7.5
The highly significant p-value (< 0.001) indicates the treatment group achieves a substantially larger blood pressure reduction than the control group.
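t.test() returns an object of class "htest", a plain list whose components can be extracted programmatically rather than read off the printout:

```r
control   <- c(5, 8, 6, 9, 7, 10, 6, 8, 7, 9)
treatment <- c(12, 15, 13, 16, 14, 17, 13, 15, 14, 16)
res <- t.test(treatment, control)

res$statistic  # t value (a named numeric)
res$parameter  # degrees of freedom
res$p.value    # p-value
res$conf.int   # confidence interval (carries a "conf.level" attribute)
res$estimate   # the two group means
```

This is the same pattern used later when reporting results with cat() and sprintf().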
# Student's t-test (assumes equal variances)
student_result <- t.test(treatment, control, var.equal = TRUE)
print(student_result)
# Compare variances with F-test
var.test(treatment, control)
F test to compare two variances
data: treatment and control
F = 1, num df = 9, denom df = 9, p-value = 1
alternative hypothesis: true ratio of variances is not equal to 1
Here the two sample variances are identical, so the F-test finds no evidence of unequal variances; even so, Welch’s test remains the conservative default.
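One detail worth knowing: when the two groups have equal sizes, Welch’s and Student’s t statistics coincide, and only the degrees of freedom can differ. A quick check on the same vectors:

```r
control   <- c(5, 8, 6, 9, 7, 10, 6, 8, 7, 9)
treatment <- c(12, 15, 13, 16, 14, 17, 13, 15, 14, 16)

welch   <- t.test(treatment, control)
student <- t.test(treatment, control, var.equal = TRUE)

# Identical t statistics when n1 == n2
all.equal(unname(welch$statistic), unname(student$statistic))  # TRUE
c(welch_df = unname(welch$parameter), student_df = unname(student$parameter))
```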
Paired t-test
Paired t-tests analyze dependent samples where observations are naturally matched—before/after measurements, twin studies, or repeated measures on the same subjects.
# Weight loss program: before and after weights (kg)
before <- c(85, 92, 78, 95, 88, 91, 83, 87, 90, 86)
after <- c(82, 89, 76, 91, 85, 88, 81, 84, 87, 83)
# Paired t-test
paired_result <- t.test(before, after, paired = TRUE)
print(paired_result)
Paired t-test
data: before and after
t = 16.155, df = 9, p-value = 5.91e-08
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
2.493929 3.306071
sample estimates:
mean difference
2.9
The significant result confirms weight loss. The paired approach accounts for individual baseline differences, increasing statistical power.
# Visualize paired differences
differences <- before - after
hist(differences, main = "Weight Loss Distribution",
xlab = "Weight Change (kg)", col = "lightblue")
abline(v = mean(differences), col = "red", lwd = 2)
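A paired t-test is mathematically identical to a one-sample t-test on the differences against mu = 0, which is why it is the differences, not the raw samples, whose distribution matters:

```r
before <- c(85, 92, 78, 95, 88, 91, 83, 87, 90, 86)
after  <- c(82, 89, 76, 91, 85, 88, 81, 84, 87, 83)

paired   <- t.test(before, after, paired = TRUE)
one_samp <- t.test(before - after, mu = 0)

all.equal(unname(paired$statistic), unname(one_samp$statistic))  # TRUE
all.equal(paired$p.value, one_samp$p.value)                      # TRUE
```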
Assumption Validation
T-tests assume approximately normal distributions. For small samples, check normality; for large samples (n > 30), the Central Limit Theorem provides robustness.
# Shapiro-Wilk normality test
shapiro.test(reaction_times)
Shapiro-Wilk normality test
data: reaction_times
W = 0.95455, p-value = 0.7196
A p-value > 0.05 suggests normality. Visual checks complement formal tests:
# Q-Q plot for normality assessment
qqnorm(reaction_times)
qqline(reaction_times, col = "red")
# For paired tests, check differences
shapiro.test(differences)
When normality fails, consider the Wilcoxon test (non-parametric alternative):
# Non-parametric alternative to paired t-test
wilcox.test(before, after, paired = TRUE)
Effect Size Calculation
P-values indicate statistical significance but not practical importance. Cohen’s d quantifies effect size: small (0.2), medium (0.5), or large (0.8).
# Cohen's d for independent samples
cohens_d <- function(x, y) {
mean_diff <- mean(x) - mean(y)
pooled_sd <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
(length(x) + length(y) - 2))
return(mean_diff / pooled_sd)
}
d_value <- cohens_d(treatment, control)
print(paste("Cohen's d:", round(d_value, 3)))
# [1] "Cohen's d: 4.427"
This large effect size confirms substantial practical significance beyond statistical significance.
# Using effsize package for comprehensive effect size metrics
library(effsize)
cohen.d(treatment, control)
Practical Application: A/B Testing
A complete workflow for comparing conversion times between two landing page designs:
# Conversion times (seconds) for two landing page designs
design_a <- c(12, 15, 11, 18, 13, 16, 14, 17, 12, 15, 13, 16, 14, 11, 15)
design_b <- c(9, 11, 8, 12, 10, 13, 9, 11, 10, 12, 8, 11, 9, 10, 12)
# Check assumptions before testing
shapiro.test(design_a)        # normality of group A
shapiro.test(design_b)        # normality of group B
var.test(design_a, design_b)  # similarity of variances
# Perform t-test
ab_result <- t.test(design_a, design_b)
# Extract key metrics
cat(sprintf("Mean A: %.2f, Mean B: %.2f\n",
mean(design_a), mean(design_b)))
cat(sprintf("Difference: %.2f (95%% CI: %.2f to %.2f)\n",
mean(design_a) - mean(design_b),  # diff(ab_result$estimate) would flip the sign
ab_result$conf.int[1],
ab_result$conf.int[2]))
cat(sprintf("p-value: %.4f\n", ab_result$p.value))
cat(sprintf("Cohen's d: %.3f\n", cohens_d(design_a, design_b)))
Mean A: 14.13, Mean B: 10.33
Difference: 3.80 (95% CI: 2.40 to 5.20)
p-value: 0.0000
Cohen's d: 2.041
Design B significantly reduces conversion time with a large effect size, providing clear evidence for implementation.
Handling Multiple Comparisons
When performing multiple t-tests, adjust p-values to control family-wise error rate:
# Three-group comparison (simulated data)
set.seed(123)  # fix the seed so the simulated groups are reproducible
group1 <- rnorm(20, mean = 100, sd = 15)
group2 <- rnorm(20, mean = 110, sd = 15)
group3 <- rnorm(20, mean = 105, sd = 15)
# Pairwise comparisons
p_vals <- c(
t.test(group1, group2)$p.value,
t.test(group1, group3)$p.value,
t.test(group2, group3)$p.value
)
# Bonferroni correction
p.adjust(p_vals, method = "bonferroni")
# Holm correction (less conservative)
p.adjust(p_vals, method = "holm")
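Base R wraps this whole pattern in pairwise.t.test(), which runs every pairwise comparison and applies the chosen correction in one call (shown here on freshly simulated groups):

```r
set.seed(42)  # reproducible simulated data
values <- c(rnorm(20, 100, 15), rnorm(20, 110, 15), rnorm(20, 105, 15))
groups <- factor(rep(c("g1", "g2", "g3"), each = 20))

# Matrix of Holm-adjusted p-values for all pairs;
# pool.sd = FALSE uses Welch-style tests for each pair
pw <- pairwise.t.test(values, groups, p.adjust.method = "holm", pool.sd = FALSE)
pw
```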
The t-test remains a foundational tool for statistical inference. Understanding its variants, validating assumptions, and complementing p-values with effect sizes enables robust data-driven decisions.