Chi-Square Test in R: Step-by-Step Guide

Key Insights

  • Chi-square tests analyze relationships between categorical variables—use goodness-of-fit to test if observed frequencies match expected distributions, and test of independence to check if two categorical variables are associated.
  • The expected frequency assumption (typically ≥5 per cell) matters more than most analysts realize—violating it inflates Type I error rates and produces unreliable p-values.
  • Always report effect size (Cramér’s V) alongside p-values; statistical significance tells you nothing about practical importance, especially with large samples.

Introduction to Chi-Square Tests

Chi-square tests are workhorses for analyzing categorical data. Unlike t-tests or ANOVA that compare means of continuous variables, chi-square tests examine whether the distribution of categorical variables deviates from what we’d expect by chance.

You’ll encounter two main types:

Goodness-of-fit test: Compares observed frequencies in a single categorical variable against an expected distribution. Use this when you have a theoretical expectation—like testing if a die is fair or if customer complaints are evenly distributed across weekdays.

Test of independence: Examines whether two categorical variables are associated. Use this when you want to know if there’s a relationship—like whether gender affects product preference or if treatment group relates to recovery status.

The distinction is straightforward: one variable versus two. Get this wrong, and you’ll run the wrong test entirely.

Assumptions and Prerequisites

Chi-square tests have requirements that analysts frequently ignore. Here’s what actually matters:

Expected frequency requirement: Each cell should have an expected count of at least 5. Cochran's widely cited relaxation permits up to 20% of cells below 5, provided none falls below 1. This isn't pedantic—low expected frequencies produce unreliable test statistics.

Independence of observations: Each observation must be independent. No repeated measures, no clustered data without adjustment. If the same person appears multiple times, your p-values are meaningless.

Sample size: There’s no strict minimum, but the expected frequency requirement effectively sets a floor. With too few observations, you won’t meet the assumption regardless of your table dimensions.
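The expected-count check is easy to run before committing to a test: under independence, each cell's expected count is (row total × column total) / n. A minimal sketch with a made-up 2×3 table:

```r
# Made-up 2x3 table, used only to illustrate the check
tab <- matrix(c(12, 8, 15, 10, 6, 9), nrow = 2)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# TRUE would signal that the chi-square approximation is suspect
any(expected < 5)
```

chisq.test() performs the same computation internally and exposes it as $expected.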

Let’s set up our environment:

# Core packages for chi-square analysis
library(stats)    # Base R - chisq.test(), fisher.test()
library(vcd)      # Visualizing categorical data, effect sizes
library(ggplot2)  # Modern visualization

# Optional but useful
library(dplyr)    # Data manipulation

Chi-Square Goodness-of-Fit Test

The goodness-of-fit test answers: “Do my observed frequencies match what I expected?”

Consider testing whether a six-sided die is fair. With 600 rolls, we’d expect 100 of each outcome if the die is unbiased.

# Observed frequencies from 600 dice rolls
observed <- c(89, 103, 118, 94, 108, 88)
names(observed) <- 1:6

# Expected frequencies (null hypothesis: fair die)
expected_probs <- rep(1/6, 6)

# Run the test
dice_test <- chisq.test(observed, p = expected_probs)
print(dice_test)

Output:

	Chi-squared test for given probabilities

data:  observed
X-squared = 6.98, df = 5, p-value = 0.2221

Interpreting the output:

  • X-squared (6.98): The test statistic measuring overall deviation from expected frequencies
  • df (5): Degrees of freedom = number of categories - 1
  • p-value (0.2221): Probability of observing this deviation (or more extreme) if the die is fair

With p = 0.22, we fail to reject the null hypothesis. The die appears fair—or at least, we don’t have evidence it isn’t.
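The statistic itself is nothing exotic: X² = Σ (O − E)² / E. Recomputing it by hand from the counts above is a quick sanity check on any chisq.test() result:

```r
observed <- c(89, 103, 118, 94, 108, 88)
expected <- rep(600 / 6, 6)   # 100 per face under the fair-die hypothesis

# Pearson statistic: squared deviations scaled by expected counts
x2 <- sum((observed - expected)^2 / expected)
x2   # 6.98

# p-value from the chi-square distribution with k - 1 = 5 df
pchisq(x2, df = 5, lower.tail = FALSE)
```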

You can also specify non-uniform expected distributions:

# Testing if website traffic matches expected seasonal pattern
observed_traffic <- c(2400, 2100, 2800, 3200)  # Q1-Q4
expected_pattern <- c(0.20, 0.20, 0.25, 0.35)  # Expected proportions

traffic_test <- chisq.test(observed_traffic, p = expected_pattern)
print(traffic_test)

Chi-Square Test of Independence

This is where chi-square tests shine—testing whether two categorical variables are related.

Let’s analyze whether customer segment relates to product preference:

# Create sample data
set.seed(42)
n <- 300

customer_data <- data.frame(
  segment = sample(c("Budget", "Premium", "Enterprise"), n, 
                   replace = TRUE, prob = c(0.4, 0.35, 0.25)),
  preference = sample(c("Basic", "Standard", "Pro"), n, 
                      replace = TRUE, prob = c(0.3, 0.45, 0.25))
)

# Create contingency table
contingency_table <- table(customer_data$segment, customer_data$preference)
print(contingency_table)

Output:

            Basic Pro Standard
  Budget       39  28       53
  Enterprise   24  22       30
  Premium      27  25       52

Now run the test:

# Chi-square test of independence
independence_test <- chisq.test(contingency_table)
print(independence_test)

Output:

	Pearson's Chi-squared test

data:  contingency_table
X-squared = 2.6361, df = 4, p-value = 0.6204

With p = 0.62, there’s no significant association between customer segment and product preference in this data.

Accessing detailed results:

# Expected frequencies under independence
independence_test$expected

# Observed frequencies
independence_test$observed

# Pearson residuals: (observed - expected) / sqrt(expected)
independence_test$residuals

# Standardized residuals (more useful)
independence_test$stdres

Visualizing Results

Visualization makes chi-square results interpretable for stakeholders who don’t speak statistics.

Mosaic plots show both the contingency table structure and deviations from independence:

# Basic mosaic plot
mosaicplot(contingency_table, 
           main = "Customer Segment vs. Product Preference",
           color = TRUE,
           shade = TRUE)  # Shading indicates residual magnitude

The shade = TRUE argument colors cells by standardized residuals—blue for higher than expected, red for lower.

ggplot2 bar charts work well for presentations:

# Prepare data for ggplot
plot_data <- as.data.frame(contingency_table)
names(plot_data) <- c("Segment", "Preference", "Count")

# Grouped bar chart
ggplot(plot_data, aes(x = Segment, y = Count, fill = Preference)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Product Preference by Customer Segment",
       x = "Customer Segment",
       y = "Count") +
  theme_minimal()

# Proportional stacked bar (often more useful)
ggplot(plot_data, aes(x = Segment, y = Count, fill = Preference)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Product Preference Distribution by Segment",
       x = "Customer Segment",
       y = "Proportion") +
  theme_minimal()

For larger tables, balloon plots from the gplots package show cell frequencies as proportionally-sized circles, making patterns visible at a glance.

Post-Hoc Analysis and Effect Size

A significant chi-square test tells you an association exists—not where it is or how strong it is. You need post-hoc analysis.

Standardized residuals identify which cells drive the significant result:

# Standardized residuals > |2| suggest significant deviation
round(independence_test$stdres, 2)

Values exceeding ±2 indicate cells where observed counts differ meaningfully from expected. Values beyond ±3 are highly significant.
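Standardized residuals are approximately standard normal, so the ±2 rule of thumb is just a two-sided z-test at roughly the 5% level:

```r
# Two-sided tail probability for a standardized residual of exactly 2
2 * pnorm(-2)   # about 0.0455, just under 0.05
```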

Cramér’s V measures effect size—how strong the association is:

# Using vcd package
library(vcd)
assoc_stats <- assocstats(contingency_table)
print(assoc_stats)

Output includes:

                    X^2 df  P(> X^2)
Likelihood Ratio 2.6384  4   0.62004
Pearson          2.6361  4   0.62044

Phi-Coefficient   : NA 
Contingency Coeff.: 0.093 
Cramer's V        : 0.066 

Interpreting Cramér’s V:

  • 0.1 = small effect
  • 0.3 = medium effect
  • 0.5 = large effect

Our V = 0.066 indicates virtually no association—consistent with our non-significant p-value. But here’s the key insight: even with a significant p-value, a small V means the relationship has little practical importance.
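If you’d rather not depend on vcd, Cramér’s V falls straight out of the test object: V = sqrt(X² / (n · (min(r, c) − 1))). A sketch, rebuilding the table from the counts shown earlier:

```r
# Contingency table from the section above
tab <- matrix(c(39, 24, 27, 28, 22, 25, 53, 30, 52), nrow = 3,
              dimnames = list(c("Budget", "Enterprise", "Premium"),
                              c("Basic", "Pro", "Standard")))

test <- chisq.test(tab)

# V = sqrt(X^2 / (n * (min(rows, cols) - 1)))
sqrt(unname(test$statistic) / (sum(tab) * (min(dim(tab)) - 1)))
```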

Common Pitfalls and Alternatives

Problem: Small expected frequencies

When expected counts fall below 5, the chi-square approximation breaks down. R warns you:

# Small sample example
small_table <- matrix(c(8, 2, 1, 4), nrow = 2)
chisq.test(small_table)
# Warning: Chi-squared approximation may be incorrect

Solution: Fisher’s exact test

Fisher’s test calculates exact probabilities without relying on the chi-square approximation:

# Fisher's exact test - works with small samples
fisher_result <- fisher.test(small_table)
print(fisher_result)

Output:

	Fisher's Exact Test for Count Data

data:  small_table
p-value = 0.08891
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.6372541 314.4692498
sample estimates:
odds ratio 
  11.05691 

Fisher’s test also provides an odds ratio, which chi-square doesn’t.

When to use Fisher’s over chi-square:

  • Any cell has expected frequency < 5
  • Total sample size < 20
  • 2×2 tables with small samples (Fisher is exact, chi-square is approximate)
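Running both tests on the small table above shows the gap between the approximation and the exact calculation:

```r
small_table <- matrix(c(8, 2, 1, 4), nrow = 2)

# Approximate p-value (Yates-corrected, with a reliability warning suppressed)
suppressWarnings(chisq.test(small_table)$p.value)

# Exact p-value from Fisher's test
fisher.test(small_table)$p.value
```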

Other common mistakes:

  1. Using percentages instead of counts: chisq.test() needs raw frequencies, not proportions.

  2. Including the same observation multiple times: Violates independence. Each row in your data should be one unique observation.

  3. Ignoring the continuity correction: For 2×2 tables, R applies Yates’ correction by default. Disable it with correct = FALSE if you want the uncorrected statistic.

# 2x2 table comparison
two_by_two <- matrix(c(45, 55, 30, 70), nrow = 2)

# With Yates' correction (default)
chisq.test(two_by_two)

# Without correction
chisq.test(two_by_two, correct = FALSE)

The corrected version is more conservative—use it unless you have specific reasons not to.
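Under the hood, Yates’ correction just shrinks each absolute deviation |O − E| by 0.5 before squaring. Recomputing both statistics for the table above makes the difference concrete:

```r
two_by_two <- matrix(c(45, 55, 30, 70), nrow = 2)
expected <- outer(rowSums(two_by_two), colSums(two_by_two)) / sum(two_by_two)

# Uncorrected Pearson statistic
sum((two_by_two - expected)^2 / expected)              # 4.8

# Yates-corrected statistic: each deviation shrunk by 0.5
sum((abs(two_by_two - expected) - 0.5)^2 / expected)   # ~4.18
```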

Chi-square tests are straightforward once you understand the mechanics. Choose the right type, verify assumptions, run the test, then report effect size alongside your p-value. That’s the complete workflow.
