R - Chi-Square Test

Key Insights

• Chi-square tests evaluate relationships between categorical variables: the test of independence is the most common choice for analyzing contingency tables, while the goodness-of-fit test checks whether observed frequencies match an expected distribution.

• R provides chisq.test() as the primary function for chi-square analysis, automatically calculating expected frequencies, the test statistic, and the p-value, and applying Yates' correction for 2x2 tables.

• Effect size measures like Cramér's V, together with standardized residuals, are critical for interpreting practical significance beyond statistical significance, especially with large sample sizes where trivial differences become statistically significant.

Understanding Chi-Square Tests

The chi-square test determines whether there’s a significant association between categorical variables. Unlike t-tests or ANOVA that work with continuous data, chi-square tests analyze frequency counts in categories. Two primary variants exist: the test of independence (examining relationships between two categorical variables) and the goodness-of-fit test (comparing observed frequencies to expected theoretical distributions).

The test statistic follows this formula: χ² = Σ[(O - E)² / E], where O represents observed frequencies and E represents expected frequencies. Larger χ² values indicate greater divergence from independence or the expected distribution.
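As a sanity check of the formula, the statistic can be computed by hand for a small table and compared against chisq.test(). The counts below are hypothetical, chosen only to illustrate the arithmetic:

```r
# Hypothetical 2x2 table of counts, just to illustrate the formula
obs <- matrix(c(20, 30,
                40, 10), nrow = 2, byrow = TRUE)

# Expected counts under independence: (row total x column total) / grand total
exp_counts <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq <- sum((obs - exp_counts)^2 / exp_counts)
chi_sq  # about 16.67

# Agrees with chisq.test() when Yates' correction is disabled
unname(chisq.test(obs, correct = FALSE)$statistic)
```

Disabling the continuity correction is necessary here because chisq.test() applies it to 2x2 tables by default, which would shift the statistic away from the hand calculation.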

Chi-Square Test of Independence

This test examines whether two categorical variables are related. Consider analyzing whether customer purchase behavior differs across regions.

# Create sample data: Product preference by region
data <- matrix(c(45, 32, 23,
                 38, 41, 31,
                 27, 35, 48), 
               nrow = 3, byrow = TRUE,
               dimnames = list(
                 Region = c("North", "South", "West"),
                 Product = c("A", "B", "C")
               ))

print(data)
# Perform chi-square test
chi_result <- chisq.test(data)
print(chi_result)

# Access specific components
cat("Chi-square statistic:", chi_result$statistic, "\n")
cat("Degrees of freedom:", chi_result$parameter, "\n")
cat("P-value:", chi_result$p.value, "\n")

The degrees of freedom equal (rows - 1) × (columns - 1). For a 3×3 table, df = 4. A p-value below 0.05 typically indicates significant association.

# Examine expected frequencies
print(chi_result$expected)

# View observed frequencies
print(chi_result$observed)

Expected frequencies should be at least 5 in every cell for reliable results. When this assumption fails, consider Fisher's exact test or combining sparse categories.
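This assumption is easy to check programmatically. A minimal sketch, rebuilding the region-by-product table from above:

```r
# Rebuild the region-by-product table used above
data <- matrix(c(45, 32, 23,
                 38, 41, 31,
                 27, 35, 48),
               nrow = 3, byrow = TRUE)
chi_result <- chisq.test(data)

# Flag the result as unreliable if any expected count falls below 5
if (any(chi_result$expected < 5)) {
  warning("Some expected frequencies are below 5; consider fisher.test() or merging categories")
} else {
  cat("All expected frequencies are at least 5\n")
}
```

For this table the smallest expected count is well above 5, so the standard test is safe.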

Analyzing Residuals

Standardized residuals reveal which cells contribute most to the chi-square statistic, showing where observed and expected frequencies diverge significantly.

# Calculate standardized residuals
std_residuals <- chi_result$stdres
print(round(std_residuals, 2))

# Visualize residuals
library(corrplot)
corrplot(std_residuals, is.corr = FALSE,
         method = "color",
         col.lim = c(-3, 3),  # named cl.lim in corrplot versions before 0.90
         tl.col = "black",
         addCoef.col = "black")

Standardized residuals exceeding ±2 indicate cells significantly different from expected values under independence. Positive values show observed counts exceed expected; negative values indicate the opposite.
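Rather than scanning the matrix by eye, the flagged cells can be extracted directly with which(); a self-contained sketch reusing the table from above:

```r
# Recreate the contingency table and test from above
data <- matrix(c(45, 32, 23,
                 38, 41, 31,
                 27, 35, 48),
               nrow = 3, byrow = TRUE,
               dimnames = list(Region = c("North", "South", "West"),
                               Product = c("A", "B", "C")))
chi_result <- chisq.test(data)

# Row/column indices of cells whose standardized residuals exceed +/- 2
flagged <- which(abs(chi_result$stdres) > 2, arr.ind = TRUE)
flagged
```

For this table, four cells cross the ±2 threshold, including North/Product A (more purchases than expected) and West/Product C.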

Effect Size Calculation

Statistical significance doesn’t equal practical significance. With large samples, trivial associations become statistically significant. Effect size measures address this limitation.

# Cramér's V for effect size
library(rcompanion)
cramerV(data)

# Manual calculation
n <- sum(data)
min_dim <- min(dim(data)) - 1
cramers_v <- sqrt(chi_result$statistic / (n * min_dim))
cat("Cramér's V:", cramers_v, "\n")

Cramér’s V ranges from 0 (no association) to 1 (perfect association). Interpretation guidelines: 0.1 = small effect, 0.3 = medium effect, 0.5 = large effect.

Chi-Square Goodness-of-Fit Test

This variant tests whether observed frequencies match a theoretical distribution. Example: testing if dice rolls are fair.

# Observed dice rolls
observed <- c(18, 22, 15, 20, 17, 23)
categories <- 1:6

# Test for uniform distribution (fair die)
gof_test <- chisq.test(observed)
print(gof_test)

For non-uniform expected distributions, specify probabilities explicitly:

# Test against weighted distribution
# Expecting 40% category 1, 30% category 2, 30% category 3
observed_weighted <- c(85, 62, 53)
expected_probs <- c(0.4, 0.3, 0.3)

gof_weighted <- chisq.test(observed_weighted, p = expected_probs)
print(gof_weighted)
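Under the hood, the expected counts in a goodness-of-fit test are simply n × p, which can be confirmed against the test object:

```r
observed_weighted <- c(85, 62, 53)
expected_probs <- c(0.4, 0.3, 0.3)

# Expected counts are n * p
n <- sum(observed_weighted)   # 200 observations in total
n * expected_probs            # 80, 60, 60

# chisq.test() stores the same values
gof_weighted <- chisq.test(observed_weighted, p = expected_probs)
gof_weighted$expected
```

Note that the probabilities passed to p must sum to 1, or chisq.test() stops with an error.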

Working with Data Frames

Real data typically arrives in data frames rather than matrices. Convert appropriately for chi-square analysis.

# Sample data frame (seed fixed so the simulated data is reproducible)
set.seed(123)
customer_data <- data.frame(
  gender = sample(c("Male", "Female"), 200, replace = TRUE),
  subscription = sample(c("Basic", "Premium", "Enterprise"), 200, 
                       replace = TRUE, prob = c(0.5, 0.3, 0.2))
)

# Create contingency table
cont_table <- table(customer_data$gender, customer_data$subscription)
print(cont_table)

# Perform test
chi_df_test <- chisq.test(cont_table)
print(chi_df_test)

The table() function converts data frame columns into the required contingency table format.
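As an alternative, xtabs() builds the same contingency table through a formula interface, and prop.table() converts counts to proportions for reporting. A brief sketch, regenerating simulated data in the same pattern (the seed value is arbitrary):

```r
set.seed(42)  # arbitrary seed so the simulated data is reproducible
customer_data <- data.frame(
  gender = sample(c("Male", "Female"), 200, replace = TRUE),
  subscription = sample(c("Basic", "Premium", "Enterprise"), 200,
                        replace = TRUE, prob = c(0.5, 0.3, 0.2))
)

# Formula interface: same table as table(customer_data$gender, customer_data$subscription)
cont_table <- xtabs(~ gender + subscription, data = customer_data)

# Row proportions are often easier to report than raw counts
round(prop.table(cont_table, margin = 1), 2)
```

The formula interface scales more cleanly when variable names are long or when the table is built inside a function.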

Handling Small Sample Sizes

When expected frequencies fall below 5, chi-square test assumptions break down. Apply corrections or alternative tests.

# Small sample data
small_data <- matrix(c(3, 7, 8, 2), nrow = 2)

# Chi-square with Yates' continuity correction (automatic for 2x2)
chi_yates <- chisq.test(small_data)
print(chi_yates)

# Fisher's exact test (better for small samples)
fisher_result <- fisher.test(small_data)
print(fisher_result)

Fisher’s exact test calculates exact probabilities rather than approximations, making it superior for small samples or sparse tables.
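For the small table above, the expected counts actually demonstrate the violation: two cells fall below 5, which is why the exact test is preferable here.

```r
small_data <- matrix(c(3, 7, 8, 2), nrow = 2)

# Two of the four expected counts fall below 5 (4.5 each)
suppressWarnings(chisq.test(small_data))$expected

# Side-by-side p-values: the exact test avoids the large-sample approximation
c(chi_square = suppressWarnings(chisq.test(small_data))$p.value,
  fisher     = fisher.test(small_data)$p.value)
```

chisq.test() itself warns that the approximation may be incorrect in this situation; suppressWarnings() is used here only to keep the output clean.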

Simulated P-Values

For complex tables or borderline expected frequencies, simulate the p-value using Monte Carlo methods instead of relying on the asymptotic approximation.

# Chi-square with simulated p-value
chi_sim <- chisq.test(data, simulate.p.value = TRUE, B = 10000)
print(chi_sim)

The B parameter specifies simulation iterations. More iterations increase precision but require more computation.
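That precision can be quantified: a simulated p-value has a Monte Carlo standard error of roughly sqrt(p(1 - p) / B). A sketch using the table from above (the seed value is arbitrary):

```r
data <- matrix(c(45, 32, 23,
                 38, 41, 31,
                 27, 35, 48), nrow = 3, byrow = TRUE)

set.seed(1)  # simulated p-values vary from run to run without a seed
p_hat <- chisq.test(data, simulate.p.value = TRUE, B = 10000)$p.value

# Monte Carlo standard error of the simulated p-value
mc_se <- sqrt(p_hat * (1 - p_hat) / 10000)
c(p_value = p_hat, mc_se = mc_se)
```

With B = 10000 and a p-value this small, the standard error is well under 0.001, so the verdict at the 0.05 level is stable across runs.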

Multiple Testing Considerations

Testing multiple hypotheses simultaneously inflates Type I error rates. Apply corrections when conducting multiple chi-square tests.

# Multiple tests example
test1 <- chisq.test(table(customer_data$gender, customer_data$subscription))
test2 <- chisq.test(matrix(c(30, 20, 25, 35), nrow = 2))
test3 <- chisq.test(matrix(c(15, 25, 20, 30), nrow = 2))

# Collect p-values
p_values <- c(test1$p.value, test2$p.value, test3$p.value)

# Bonferroni correction
p_adjusted_bonf <- p.adjust(p_values, method = "bonferroni")

# Benjamini-Hochberg (FDR) correction
p_adjusted_bh <- p.adjust(p_values, method = "BH")

results <- data.frame(
  test = paste("Test", 1:3),
  original_p = p_values,
  bonferroni_p = p_adjusted_bonf,
  bh_p = p_adjusted_bh
)
print(results)

Bonferroni correction is conservative; Benjamini-Hochberg controls false discovery rate with less stringency.

Practical Implementation Checklist

Before applying chi-square tests, verify the assumptions: independence of observations, adequate sample size (expected frequencies ≥ 5), and categorical data.

• Check expected frequencies using chi_result$expected.
• For 2×2 tables with small samples, prefer Fisher’s exact test.
• Always report effect sizes alongside p-values to assess practical significance.
• Examine standardized residuals to identify which categories drive significant results, providing actionable insights beyond binary significance testing.
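The checklist above can be folded into a small helper that picks an appropriate test and reports effect size alongside the p-value. run_cat_test and its threshold of 5 are illustrative choices, not a standard API:

```r
# Illustrative helper: check expected counts, pick a test, report effect size
run_cat_test <- function(tab, min_expected = 5) {
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < min_expected) && all(dim(tab) == 2)) {
    test <- fisher.test(tab)            # exact test for sparse 2x2 tables
    method <- "Fisher's exact"
  } else {
    test <- suppressWarnings(chisq.test(tab))
    method <- "Chi-square"
  }
  # Cramer's V from the uncorrected statistic
  chi_stat <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  v <- sqrt(unname(chi_stat) / (sum(tab) * (min(dim(tab)) - 1)))
  list(method = method, p_value = test$p.value, cramers_v = v)
}

result <- run_cat_test(matrix(c(45, 32, 23,
                                38, 41, 31,
                                27, 35, 48), nrow = 3, byrow = TRUE))
result
```

For the region-by-product table this reports a significant chi-square result with a small-to-medium Cramér's V, which matches the earlier section-by-section analysis.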
