Chi-Square Test in R: Step-by-Step Guide
Key Insights
- Chi-square tests analyze relationships between categorical variables—use goodness-of-fit to test if observed frequencies match expected distributions, and test of independence to check if two categorical variables are associated.
- The expected frequency assumption (typically ≥5 per cell) matters more than most analysts realize—violating it inflates Type I error rates and produces unreliable p-values.
- Always report effect size (Cramér’s V) alongside p-values; statistical significance tells you nothing about practical importance, especially with large samples.
Introduction to Chi-Square Tests
Chi-square tests are workhorses for analyzing categorical data. Unlike t-tests or ANOVA that compare means of continuous variables, chi-square tests examine whether the distribution of categorical variables deviates from what we’d expect by chance.
You’ll encounter two main types:
Goodness-of-fit test: Compares observed frequencies in a single categorical variable against an expected distribution. Use this when you have a theoretical expectation—like testing if a die is fair or if customer complaints are evenly distributed across weekdays.
Test of independence: Examines whether two categorical variables are associated. Use this when you want to know if there’s a relationship—like whether gender affects product preference or if treatment group relates to recovery status.
The distinction is straightforward: one variable versus two. Get this wrong, and you’ll run the wrong test entirely.
Assumptions and Prerequisites
Chi-square tests have requirements that analysts frequently ignore. Here’s what actually matters:
Expected frequency requirement: Each cell should have an expected count of at least 5. A common relaxation (Cochran's rule) allows up to 20% of cells to fall below 5, provided none falls below 1. This isn't pedantic—low expected frequencies produce unreliable test statistics.
Independence of observations: Each observation must be independent. No repeated measures, no clustered data without adjustment. If the same person appears multiple times, your p-values are meaningless.
Sample size: There’s no strict minimum, but the expected frequency requirement effectively sets a floor. With too few observations, you won’t meet the assumption regardless of your table dimensions.
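You can check the expected-count assumption directly rather than waiting for R's warning. A minimal sketch, using a made-up 2×3 table purely for illustration:

```r
# Hypothetical 2x3 table used only to illustrate the check
tab <- matrix(c(12, 3, 8, 15, 2, 4), nrow = 2)
# Expected counts under independence; suppress the low-count warning for now
exp_counts <- suppressWarnings(chisq.test(tab))$expected
exp_counts
any(exp_counts < 5)  # TRUE here, so the chi-square approximation is shaky
```

If this returns TRUE, consider collapsing sparse categories or switching to Fisher's exact test.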
Let’s set up our environment:
# Core packages for chi-square analysis
library(stats) # Attached by default in R; provides chisq.test(), fisher.test()
library(vcd) # Visualizing categorical data, effect sizes
library(ggplot2) # Modern visualization
# Optional but useful
library(dplyr) # Data manipulation
Chi-Square Goodness-of-Fit Test
The goodness-of-fit test answers: “Do my observed frequencies match what I expected?”
Consider testing whether a six-sided die is fair. With 600 rolls, we’d expect 100 of each outcome if the die is unbiased.
# Observed frequencies from 600 dice rolls
observed <- c(89, 103, 118, 94, 108, 88)
names(observed) <- 1:6
# Expected frequencies (null hypothesis: fair die)
expected_probs <- rep(1/6, 6)
# Run the test
dice_test <- chisq.test(observed, p = expected_probs)
print(dice_test)
Output:
Chi-squared test for given probabilities
data: observed
X-squared = 6.98, df = 5, p-value = 0.2224
Interpreting the output:
- X-squared (6.98): The test statistic measuring overall deviation from expected frequencies
- df (5): Degrees of freedom = number of categories - 1
- p-value (0.2224): Probability of observing this deviation (or more extreme) if the die is fair
With p = 0.222, we fail to reject the null hypothesis. The die appears fair—or at least, we don’t have evidence it isn’t.
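The mechanics are simple enough to reproduce by hand, which makes a useful sanity check on any chi-square output:

```r
# Reproduce the statistic by hand: sum of (observed - expected)^2 / expected
observed <- c(89, 103, 118, 94, 108, 88)
expected <- rep(600 / 6, 6)   # 100 per face under a fair die
x2 <- sum((observed - expected)^2 / expected)
x2                            # matches the X-squared from chisq.test()
# The p-value is the upper tail of a chi-square distribution with k - 1 df
pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)
```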
You can also specify non-uniform expected distributions:
# Testing if website traffic matches expected seasonal pattern
observed_traffic <- c(2400, 2100, 2800, 3200) # Q1-Q4
expected_pattern <- c(0.20, 0.20, 0.25, 0.35) # Expected proportions
traffic_test <- chisq.test(observed_traffic, p = expected_pattern)
print(traffic_test)
Chi-Square Test of Independence
This is where chi-square tests shine—testing whether two categorical variables are related.
Let’s analyze whether customer segment relates to product preference:
# Create sample data
set.seed(42)
n <- 300
customer_data <- data.frame(
segment = sample(c("Budget", "Premium", "Enterprise"), n,
replace = TRUE, prob = c(0.4, 0.35, 0.25)),
preference = sample(c("Basic", "Standard", "Pro"), n,
replace = TRUE, prob = c(0.3, 0.45, 0.25))
)
# Create contingency table
contingency_table <- table(customer_data$segment, customer_data$preference)
print(contingency_table)
Output:
             Basic Pro Standard
  Budget        39  28       53
  Premium       27  25       52
  Enterprise    24  22       30
Now run the test:
# Chi-square test of independence
independence_test <- chisq.test(contingency_table)
print(independence_test)
Output:
Pearson's Chi-squared test
data: contingency_table
X-squared = 2.6361, df = 4, p-value = 0.6204
With p = 0.62, there’s no significant association between customer segment and product preference in this data.
Accessing detailed results:
# Expected frequencies under independence
independence_test$expected
# Observed frequencies
independence_test$observed
# Pearson residuals: (observed - expected) / sqrt(expected)
independence_test$residuals
# Standardized residuals (more useful)
independence_test$stdres
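Under the hood, each expected count is (row total × column total) / grand total. A quick check against the table printed above:

```r
# Rebuild the contingency table from the printed counts
tab <- matrix(c(39, 27, 24, 28, 25, 22, 53, 52, 30), nrow = 3,
              dimnames = list(c("Budget", "Premium", "Enterprise"),
                              c("Basic", "Pro", "Standard")))
# Expected counts under independence: (row total * column total) / n
manual_expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
all.equal(manual_expected, chisq.test(tab)$expected)  # TRUE
```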
Visualizing Results
Visualization makes chi-square results interpretable for stakeholders who don’t speak statistics.
Mosaic plots show both the contingency table structure and deviations from independence:
# Basic mosaic plot
mosaicplot(contingency_table,
main = "Customer Segment vs. Product Preference",
color = TRUE,
shade = TRUE) # Shading indicates residual magnitude
The shade = TRUE argument colors cells by standardized residuals—blue for higher than expected, red for lower.
ggplot2 bar charts work well for presentations:
# Prepare data for ggplot
plot_data <- as.data.frame(contingency_table)
names(plot_data) <- c("Segment", "Preference", "Count")
# Grouped bar chart
ggplot(plot_data, aes(x = Segment, y = Count, fill = Preference)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Product Preference by Customer Segment",
x = "Customer Segment",
y = "Count") +
theme_minimal()
# Proportional stacked bar (often more useful)
ggplot(plot_data, aes(x = Segment, y = Count, fill = Preference)) +
geom_bar(stat = "identity", position = "fill") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Product Preference Distribution by Segment",
x = "Customer Segment",
y = "Proportion") +
theme_minimal()
For larger tables, balloon plots from the gplots package show cell frequencies as proportionally-sized circles, making patterns visible at a glance.
Post-Hoc Analysis and Effect Size
A significant chi-square test tells you an association exists—not where it is or how strong it is. You need post-hoc analysis.
Standardized residuals identify which cells drive the significant result:
# Standardized residuals > |2| suggest significant deviation
round(independence_test$stdres, 2)
Values exceeding ±2 indicate cells where observed counts differ meaningfully from expected. Values beyond ±3 are highly significant.
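A quick way to flag the offending cells, rebuilding the segment-by-preference table from earlier:

```r
# Rebuild the earlier table and flag cells beyond the +/-2 cutoff
tab <- matrix(c(39, 27, 24, 28, 25, 22, 53, 52, 30), nrow = 3,
              dimnames = list(c("Budget", "Premium", "Enterprise"),
                              c("Basic", "Pro", "Standard")))
res <- chisq.test(tab)
which(abs(res$stdres) > 2, arr.ind = TRUE)  # empty here: no cell stands out
```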
Cramér’s V measures effect size—how strong the association is:
# Using vcd package
library(vcd)
assoc_stats <- assocstats(contingency_table)
print(assoc_stats)
Output includes:
                    X^2 df P(> X^2)
Likelihood Ratio 2.6384  4  0.62004
Pearson          2.6361  4  0.62044
Phi-Coefficient : NA
Contingency Coeff.: 0.093
Cramer's V : 0.066
Interpreting Cramér’s V:
- 0.1 = small effect
- 0.3 = medium effect
- 0.5 = large effect
Our V = 0.066 indicates virtually no association—consistent with our non-significant p-value. But here’s the key insight: even with a significant p-value, a small V means the relationship has little practical importance.
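Cramér's V is simple enough to compute without vcd: V = sqrt(X² / (n × (min(rows, cols) - 1))). A sketch using the same table:

```r
# Cramér's V from first principles
tab <- matrix(c(39, 27, 24, 28, 25, 22, 53, 52, 30), nrow = 3)
x2 <- unname(chisq.test(tab)$statistic)       # Pearson chi-square statistic
v <- sqrt(x2 / (sum(tab) * (min(dim(tab)) - 1)))
round(v, 3)
```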
Common Pitfalls and Alternatives
Problem: Small expected frequencies
When expected counts fall below 5, the chi-square approximation breaks down. R warns you:
# Small sample example
small_table <- matrix(c(8, 2, 1, 4), nrow = 2)
chisq.test(small_table)
# Warning: Chi-squared approximation may be incorrect
Solution: Fisher’s exact test
Fisher’s test calculates exact probabilities without relying on the chi-square approximation:
# Fisher's exact test - works with small samples
fisher_result <- fisher.test(small_table)
print(fisher_result)
Output:
Fisher's Exact Test for Count Data
data: small_table
p-value = 0.08891
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.6372541 314.4692498
sample estimates:
odds ratio
11.05691
For 2×2 tables, Fisher’s test also provides an odds ratio with a confidence interval, which chisq.test() doesn’t report.
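Note that the estimate fisher.test() prints is a conditional maximum-likelihood estimate, which generally differs from the simple cross-product ratio you might compute by hand:

```r
# The sample (cross-product) odds ratio for the 2x2 table
small_table <- matrix(c(8, 2, 1, 4), nrow = 2)
or_sample <- (small_table[1, 1] * small_table[2, 2]) /
             (small_table[1, 2] * small_table[2, 1])
or_sample  # 16
# fisher.test(small_table)$estimate is the conditional MLE instead,
# so it won't equal this cross-product ratio exactly
```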
When to use Fisher’s over chi-square:
- Any cell has expected frequency < 5
- Total sample size < 20
- 2×2 tables with small samples (Fisher is exact, chi-square is approximate)
Other common mistakes:
- Using percentages instead of counts: chisq.test() needs raw frequencies, not proportions.
- Including the same observation multiple times: this violates independence. Each row in your data should be one unique observation.
- Ignoring the continuity correction: for 2×2 tables, R applies Yates’ correction by default. Disable it with correct = FALSE if you want the uncorrected statistic.
# 2x2 table comparison
two_by_two <- matrix(c(45, 55, 30, 70), nrow = 2)
# With Yates' correction (default)
chisq.test(two_by_two)
# Without correction
chisq.test(two_by_two, correct = FALSE)
The corrected version is more conservative—use it unless you have specific reasons not to.
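If your data arrived as proportions, convert back to raw counts before testing. A minimal sketch with hypothetical proportions and sample size:

```r
# Hypothetical observed proportions and sample size
props <- c(0.25, 0.45, 0.30)
n <- 200
counts <- round(props * n)          # convert back to raw frequencies first
chisq.test(counts, p = rep(1/3, 3)) # test counts, never the proportions
```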
Chi-square tests are straightforward once you understand the mechanics. Choose the right type, verify assumptions, run the test, then report effect size alongside your p-value. That’s the complete workflow.