R - Hypothesis Testing Basics
Key Insights
- Hypothesis testing in R provides statistical evidence to make data-driven decisions, with common tests including t-tests, chi-square tests, and ANOVA accessible through base R functions
- Understanding p-values, significance levels, and test assumptions is critical—a p-value below your alpha threshold (typically 0.05) suggests rejecting the null hypothesis
- R’s built-in functions like t.test(), chisq.test(), and aov() handle the mathematical complexity, but interpreting results correctly requires understanding the underlying statistical framework
Understanding the Hypothesis Testing Framework
Hypothesis testing follows a structured approach: formulate a null hypothesis (H0) representing no effect or difference, define an alternative hypothesis (H1), collect data, calculate a test statistic, and determine whether to reject H0 based on probability.
The p-value represents the probability of observing your data (or more extreme) if the null hypothesis were true. A small p-value (typically < 0.05) suggests your data is unlikely under H0, providing evidence to reject it.
# Set up reproducible example
set.seed(123)
# Generate two samples
group_a <- rnorm(30, mean = 100, sd = 15)
group_b <- rnorm(30, mean = 105, sd = 15)
# Basic structure of hypothesis testing
# H0: mean(group_a) = mean(group_b)
# H1: mean(group_a) ≠ mean(group_b)
# Alpha level: 0.05
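The p-value definition above can be made concrete with a quick simulation (a sketch; the replication count of 2000 is arbitrary): when H0 really is true, p-values are roughly uniform, so about 5% of tests reject at alpha = 0.05.

```r
# Simulate many experiments where H0 is true:
# both groups come from the same N(100, 15) population
set.seed(42)
p_values <- replicate(2000, {
  a <- rnorm(30, mean = 100, sd = 15)
  b <- rnorm(30, mean = 100, sd = 15)
  t.test(a, b)$p.value
})
# About 5% fall below alpha = 0.05 even though there is no
# real difference -- this is the Type I (false positive) rate
cat("Proportion significant:", mean(p_values < 0.05), "\n")
```

This is also why alpha is called the false positive rate: it is the rejection rate you accept when the null is true.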
One-Sample t-Test
Use a one-sample t-test when comparing a sample mean against a known population value. This tests whether your sample likely comes from a population with a specific mean.
# Test if group_a has mean of 100
result <- t.test(group_a, mu = 100)
print(result)
# Extract key components
cat("Test statistic:", result$statistic, "\n")
cat("P-value:", result$p.value, "\n")
cat("95% Confidence Interval:", result$conf.int, "\n")
# Interpretation function
interpret_result <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    return("Reject null hypothesis - significant difference detected")
  } else {
    return("Fail to reject null hypothesis - insufficient evidence")
  }
}
cat(interpret_result(result$p.value))
The confidence interval provides a range where the true population mean likely falls. If your hypothesized value (mu) falls outside this interval, you’ll reject H0.
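This duality between the interval and the p-value can be verified directly. A minimal standalone sketch, regenerating group_a with the same seed as above:

```r
# Regenerate the sample and the one-sample test from above
set.seed(123)
group_a <- rnorm(30, mean = 100, sd = 15)
result <- t.test(group_a, mu = 100)

# For a two-sided test, mu0 lying inside the 95% CI
# is equivalent to p-value >= 0.05
ci <- result$conf.int
mu0 <- 100
inside <- ci[1] <= mu0 && mu0 <= ci[2]
agrees <- inside == (result$p.value >= 0.05)
cat("mu0 inside CI:", inside, "\n")
```

The two decision rules always agree because the 95% interval is constructed from the same t distribution as the two-sided test at alpha = 0.05.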
Two-Sample t-Test
Two-sample t-tests compare means between two independent groups. R defaults to Welch’s t-test, which doesn’t assume equal variances.
# Independent samples t-test
independent_test <- t.test(group_a, group_b)
print(independent_test)
# Traditional Student's t-test (assumes equal variance)
student_test <- t.test(group_a, group_b, var.equal = TRUE)
# Compare results
data.frame(
  Method = c("Welch", "Student"),
  P_Value = c(independent_test$p.value, student_test$p.value),
  Statistic = c(independent_test$statistic, student_test$statistic)
)
# Check variance assumption
var.test(group_a, group_b) # F-test for equal variances
For paired data (before/after measurements, matched samples), use paired = TRUE:
# Paired samples
before <- c(120, 135, 128, 140, 132, 125, 138, 142, 130, 136)
after <- c(115, 130, 122, 135, 128, 120, 132, 138, 125, 130)
paired_test <- t.test(before, after, paired = TRUE)
print(paired_test)
# Equivalent to one-sample test on differences
differences <- before - after
t.test(differences, mu = 0)
Chi-Square Tests
Chi-square tests evaluate relationships between categorical variables. The test compares observed frequencies against expected frequencies under independence.
# Create contingency table
treatment <- factor(rep(c("A", "B"), each = 50))
outcome <- factor(c(
  rep(c("Success", "Failure"), c(35, 15)),  # Treatment A
  rep(c("Success", "Failure"), c(25, 25))   # Treatment B
))
contingency_table <- table(treatment, outcome)
print(contingency_table)
# Chi-square test for independence
chi_result <- chisq.test(contingency_table)
print(chi_result)
# View expected frequencies
print(chi_result$expected)
# Calculate effect size (Cramér's V)
# General formula: sqrt(chi2 / (n * min(rows - 1, cols - 1)));
# for a 2x2 table the min term is 1
n <- sum(contingency_table)
min_dim <- min(nrow(contingency_table), ncol(contingency_table)) - 1
cramers_v <- sqrt(chi_result$statistic / (n * min_dim))
cat("Cramér's V:", cramers_v, "\n")
Check the assumption that expected frequencies are at least 5. If violated, consider Fisher’s exact test:
# Fisher's exact test (for small samples)
fisher.test(contingency_table)
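The expected-count rule of thumb can itself be checked programmatically. A small sketch that rebuilds the table from above and flags low cells:

```r
# Rebuild the contingency table from above
treatment <- factor(rep(c("A", "B"), each = 50))
outcome <- factor(c(
  rep(c("Success", "Failure"), c(35, 15)),
  rep(c("Success", "Failure"), c(25, 25))
))
contingency_table <- table(treatment, outcome)
chi_result <- chisq.test(contingency_table)

# Count cells whose expected frequency is below 5
low_cells <- sum(chi_result$expected < 5)
if (low_cells > 0) {
  cat("Warning:", low_cells, "cell(s) below 5 -- prefer fisher.test()\n")
} else {
  cat("All expected counts are at least 5; chi-square is appropriate\n")
}
```

For this table the smallest expected count is 20, so the chi-square approximation is safe.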
Analysis of Variance (ANOVA)
ANOVA tests whether means differ across three or more groups. It’s an extension of the t-test for multiple groups.
# Generate data for three groups
set.seed(456)
values <- c(
  rnorm(20, mean = 100, sd = 10),
  rnorm(20, mean = 105, sd = 10),
  rnorm(20, mean = 110, sd = 10)
)
groups <- factor(rep(c("Control", "Treatment1", "Treatment2"), each = 20))
# Create data frame
df <- data.frame(values, groups)
# Perform ANOVA
anova_result <- aov(values ~ groups, data = df)
summary(anova_result)
# Post-hoc pairwise comparisons
TukeyHSD(anova_result)
ANOVA assumes normality and equal variances. Test these assumptions:
# Test normality of residuals
shapiro.test(residuals(anova_result))
# Test homogeneity of variance
bartlett.test(values ~ groups, data = df)
# Visual diagnostics
par(mfrow = c(2, 2))
plot(anova_result)
par(mfrow = c(1, 1))
If assumptions fail, consider the non-parametric Kruskal-Wallis test:
kruskal.test(values ~ groups, data = df)
Non-Parametric Alternatives
When data violates normality assumptions or is ordinal, use non-parametric tests.
# Wilcoxon rank-sum test (Mann-Whitney U)
# Alternative to independent samples t-test
wilcox.test(group_a, group_b)
# Wilcoxon signed-rank test
# Alternative to paired t-test
wilcox.test(before, after, paired = TRUE)
# Example with clearly non-normal data
set.seed(789)  # for reproducibility
skewed_a <- rexp(30, rate = 0.1)
skewed_b <- rexp(30, rate = 0.15)
# Compare parametric vs non-parametric
t.test(skewed_a, skewed_b)
wilcox.test(skewed_a, skewed_b)
Power Analysis and Sample Size
Determine required sample size before collecting data to ensure adequate statistical power (typically 80%).
# Install if needed: install.packages("pwr")
library(pwr)
# Power analysis for t-test
# Effect size: Cohen's d = (mean1 - mean2) / pooled_sd
# Small = 0.2, Medium = 0.5, Large = 0.8
power_result <- pwr.t.test(
  d = 0.5,          # medium effect size
  sig.level = 0.05, # alpha
  power = 0.80,     # desired power
  type = "two.sample"
)
print(power_result)
# Calculate power for existing study
pwr.t.test(
  n = 30,
  d = 0.5,
  sig.level = 0.05,
  type = "two.sample"
)
# Chi-square power analysis
pwr.chisq.test(
  w = 0.3,          # effect size
  df = 1,           # degrees of freedom
  sig.level = 0.05,
  power = 0.80
)
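Cohen's d in the comments above can also be computed from observed data. A sketch using the equal-n pooled SD on the samples generated earlier (this simple pooling assumes both groups have the same size):

```r
# Cohen's d = (mean2 - mean1) / pooled SD, for equal group sizes
set.seed(123)
group_a <- rnorm(30, mean = 100, sd = 15)
group_b <- rnorm(30, mean = 105, sd = 15)

pooled_sd <- sqrt((var(group_a) + var(group_b)) / 2)
cohens_d <- (mean(group_b) - mean(group_a)) / pooled_sd
cat("Observed Cohen's d:", round(cohens_d, 3), "\n")
```

The population value here is 5/15 ≈ 0.33, between Cohen's "small" and "medium" benchmarks; the observed d can then be passed as the d argument to pwr.t.test().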
Practical Workflow
Combine these concepts into a systematic testing workflow:
hypothesis_test_workflow <- function(data, group_var, value_var, alpha = 0.05) {
  # Check normality within each group (Shapiro-Wilk),
  # screening at the same alpha used for the main test
  groups <- unique(data[[group_var]])
  normality_ok <- all(sapply(groups, function(g) {
    subset_data <- data[data[[group_var]] == g, value_var]
    shapiro.test(subset_data)$p.value > alpha
  }))
  # Build the model formula once
  fmla <- as.formula(paste(value_var, "~", group_var))
  # Choose the appropriate test
  if (length(groups) == 2) {
    if (normality_ok) {
      test <- t.test(fmla, data = data)
      method <- "t-test"
    } else {
      test <- wilcox.test(fmla, data = data)
      method <- "Wilcoxon test"
    }
  } else {
    if (normality_ok) {
      test <- aov(fmla, data = data)
      method <- "ANOVA"
    } else {
      test <- kruskal.test(fmla, data = data)
      method <- "Kruskal-Wallis test"
    }
  }
  list(method = method, result = test, normality_ok = normality_ok)
}
# Example usage
result <- hypothesis_test_workflow(df, "groups", "values")
print(result$method)
print(result$result)  # for ANOVA, use summary(result$result) for the F table
This workflow automates test selection based on data characteristics, ensuring appropriate statistical methods for your analysis context.