R - Hypothesis Testing Basics

Key Insights

  • Hypothesis testing in R provides statistical evidence to make data-driven decisions, with common tests including t-tests, chi-square tests, and ANOVA accessible through base R functions
  • Understanding p-values, significance levels, and test assumptions is critical—a p-value below your alpha threshold (typically 0.05) suggests rejecting the null hypothesis
  • R’s built-in functions like t.test(), chisq.test(), and aov() handle the mathematical complexity, but interpreting results correctly requires understanding the underlying statistical framework

Understanding the Hypothesis Testing Framework

Hypothesis testing follows a structured approach: formulate a null hypothesis (H0) representing no effect or difference, define an alternative hypothesis (H1), collect data, calculate a test statistic, and determine whether to reject H0 based on probability.

The p-value represents the probability of observing your data (or more extreme) if the null hypothesis were true. A small p-value (typically < 0.05) suggests your data is unlikely under H0, providing evidence to reject it.
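To make that definition concrete, here is a minimal sketch that computes a two-sided p-value directly from a t statistic using the t distribution. The values 2.1 and df = 29 are illustrative assumptions, not results from the examples below:

```r
# Two-sided p-value from a hypothetical t statistic (illustrative values)
t_stat <- 2.1  # assumed test statistic
df <- 29       # degrees of freedom for a sample of n = 30
p_value <- 2 * pt(-abs(t_stat), df = df)
p_value  # probability of a result at least this extreme if H0 were true
```

This is exactly what t.test() computes internally for the two-sided case: twice the tail area beyond the observed statistic.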

# Set up reproducible example
set.seed(123)

# Generate two samples
group_a <- rnorm(30, mean = 100, sd = 15)
group_b <- rnorm(30, mean = 105, sd = 15)

# Basic structure of hypothesis testing
# H0: mean(group_a) = mean(group_b)
# H1: mean(group_a) ≠ mean(group_b)
# Alpha level: 0.05

One-Sample t-Test

Use a one-sample t-test when comparing a sample mean against a known population value. This tests whether your sample likely comes from a population with a specific mean.

# Test if group_a has mean of 100
result <- t.test(group_a, mu = 100)
print(result)

# Extract key components
cat("Test statistic:", result$statistic, "\n")
cat("P-value:", result$p.value, "\n")
cat("95% Confidence Interval:", result$conf.int, "\n")

# Interpretation function
interpret_result <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    return("Reject null hypothesis - significant difference detected")
  } else {
    return("Fail to reject null hypothesis - insufficient evidence")
  }
}

cat(interpret_result(result$p.value), "\n")

The confidence interval provides a range where the true population mean likely falls. If your hypothesized value (mu) falls outside this interval, you’ll reject H0.
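That duality between the interval and the test decision can be checked directly. A small sketch, re-seeding the earlier simulation so the block stands alone:

```r
# CI/test duality: mu outside the 95% CI <=> p-value below 0.05
set.seed(123)
group_a <- rnorm(30, mean = 100, sd = 15)
result <- t.test(group_a, mu = 100)
ci <- result$conf.int
outside <- 100 < ci[1] || 100 > ci[2]
outside == (result$p.value < 0.05)  # TRUE: the two decisions always agree
```

For a t-test, the 95% confidence interval and the two-sided test at alpha = 0.05 are constructed from the same distribution, so the two decisions can never disagree.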

Two-Sample t-Test

Two-sample t-tests compare means between two independent groups. R defaults to Welch’s t-test, which doesn’t assume equal variances.

# Independent samples t-test
independent_test <- t.test(group_a, group_b)
print(independent_test)

# Traditional Student's t-test (assumes equal variance)
student_test <- t.test(group_a, group_b, var.equal = TRUE)

# Compare results
data.frame(
  Method = c("Welch", "Student"),
  P_Value = c(independent_test$p.value, student_test$p.value),
  Statistic = c(independent_test$statistic, student_test$statistic)
)

# Check variance assumption
var.test(group_a, group_b)  # F-test for equal variances

For paired data (before/after measurements, matched samples), use paired = TRUE:

# Paired samples
before <- c(120, 135, 128, 140, 132, 125, 138, 142, 130, 136)
after <- c(115, 130, 122, 135, 128, 120, 132, 138, 125, 130)

paired_test <- t.test(before, after, paired = TRUE)
print(paired_test)

# Equivalent to one-sample test on differences
differences <- before - after
t.test(differences, mu = 0)

Chi-Square Tests

Chi-square tests evaluate relationships between categorical variables. The test compares observed frequencies against expected frequencies under independence.

# Create contingency table
treatment <- factor(rep(c("A", "B"), each = 50))
outcome <- factor(c(
  rep(c("Success", "Failure"), c(35, 15)),  # Treatment A
  rep(c("Success", "Failure"), c(25, 25))   # Treatment B
))

contingency_table <- table(treatment, outcome)
print(contingency_table)

# Chi-square test for independence
chi_result <- chisq.test(contingency_table)
print(chi_result)

# View expected frequencies
print(chi_result$expected)

# Calculate effect size (Cramér's V)
# General form: V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n_obs <- sum(contingency_table)
min_dim <- min(dim(contingency_table)) - 1
cramers_v <- sqrt(chi_result$statistic / (n_obs * min_dim))
cat("Cramér's V:", cramers_v, "\n")

Check the assumption that expected frequencies are at least 5. If violated, consider Fisher’s exact test:

# Fisher's exact test (for small samples)
fisher.test(contingency_table)
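The expected-count assumption can be checked programmatically before trusting the chi-square result. A short sketch, rebuilding the same table so the block stands alone:

```r
# Flag any cells with expected counts below 5
treatment <- factor(rep(c("A", "B"), each = 50))
outcome <- factor(c(
  rep(c("Success", "Failure"), c(35, 15)),
  rep(c("Success", "Failure"), c(25, 25))
))
contingency_table <- table(treatment, outcome)
expected <- chisq.test(contingency_table)$expected
any(expected < 5)  # FALSE here: every expected count is at least 5
```

If this returns TRUE for your data, prefer fisher.test() over the chi-square approximation.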

Analysis of Variance (ANOVA)

ANOVA tests whether means differ across three or more groups. It’s an extension of the t-test for multiple groups.

# Generate data for three groups
set.seed(456)
values <- c(
  rnorm(20, mean = 100, sd = 10),
  rnorm(20, mean = 105, sd = 10),
  rnorm(20, mean = 110, sd = 10)
)
groups <- factor(rep(c("Control", "Treatment1", "Treatment2"), each = 20))

# Create data frame
df <- data.frame(values, groups)

# Perform ANOVA
anova_result <- aov(values ~ groups, data = df)
summary(anova_result)

# Post-hoc pairwise comparisons
TukeyHSD(anova_result)

ANOVA assumes normality and equal variances. Test these assumptions:

# Test normality of residuals
shapiro.test(residuals(anova_result))

# Test homogeneity of variance
bartlett.test(values ~ groups, data = df)

# Visual diagnostics
par(mfrow = c(2, 2))
plot(anova_result)
par(mfrow = c(1, 1))

If assumptions fail, consider the non-parametric Kruskal-Wallis test:

kruskal.test(values ~ groups, data = df)

Non-Parametric Alternatives

When data violates normality assumptions or is ordinal, use non-parametric tests.

# Wilcoxon rank-sum test (Mann-Whitney U)
# Alternative to independent samples t-test
wilcox.test(group_a, group_b)

# Wilcoxon signed-rank test
# Alternative to paired t-test
wilcox.test(before, after, paired = TRUE)

# Example with clearly non-normal data (seeded for reproducibility)
set.seed(789)
skewed_a <- rexp(30, rate = 0.1)
skewed_b <- rexp(30, rate = 0.15)

# Compare parametric vs non-parametric
t.test(skewed_a, skewed_b)
wilcox.test(skewed_a, skewed_b)

Power Analysis and Sample Size

Determine required sample size before collecting data to ensure adequate statistical power (typically 80%).

# Install if needed: install.packages("pwr")
library(pwr)

# Power analysis for t-test
# Effect size: Cohen's d = (mean1 - mean2) / pooled_sd
# Small = 0.2, Medium = 0.5, Large = 0.8

power_result <- pwr.t.test(
  d = 0.5,           # medium effect size
  sig.level = 0.05,  # alpha
  power = 0.80,      # desired power
  type = "two.sample"
)
print(power_result)

# Calculate power for existing study
pwr.t.test(
  n = 30,
  d = 0.5,
  sig.level = 0.05,
  type = "two.sample"
)

# Chi-square power analysis
pwr.chisq.test(
  w = 0.3,           # effect size
  df = 1,            # degrees of freedom
  sig.level = 0.05,
  power = 0.80
)

Practical Workflow

Combine these concepts into a systematic testing workflow:

hypothesis_test_workflow <- function(data, group_var, value_var, alpha = 0.05) {
  # Check normality within each group (Shapiro-Wilk at the same alpha)
  groups <- unique(data[[group_var]])
  normality_ok <- all(sapply(groups, function(g) {
    subset_data <- data[data[[group_var]] == g, value_var]
    shapiro.test(subset_data)$p.value > alpha
  }))
  
  # Choose appropriate test
  if (length(groups) == 2) {
    if (normality_ok) {
      test <- t.test(as.formula(paste(value_var, "~", group_var)), data = data)
      method <- "t-test"
    } else {
      test <- wilcox.test(as.formula(paste(value_var, "~", group_var)), data = data)
      method <- "Wilcoxon test"
    }
  } else {
    if (normality_ok) {
      test <- aov(as.formula(paste(value_var, "~", group_var)), data = data)
      method <- "ANOVA"
    } else {
      test <- kruskal.test(as.formula(paste(value_var, "~", group_var)), data = data)
      method <- "Kruskal-Wallis test"
    }
  }
  
  list(method = method, result = test, normality_ok = normality_ok)
}

# Example usage
result <- hypothesis_test_workflow(df, "groups", "values")
print(result$method)
print(result$result)

This workflow automates test selection based on data characteristics, ensuring appropriate statistical methods for your analysis context.
