How to Perform a Chi-Square Test of Independence in R

Key Insights

  • The chi-square test of independence determines whether two categorical variables are related, making it essential for survey analysis, A/B testing, and any scenario involving count data across categories.
  • Always verify that expected frequencies are at least 5 per cell before trusting chi-square results; use Fisher’s Exact Test when this assumption fails.
  • Report effect size (Cramér’s V) alongside p-values—statistical significance doesn’t tell you whether the relationship is practically meaningful.

Introduction to Chi-Square Test of Independence

The chi-square test of independence answers a simple question: are two categorical variables related, or are they independent? Unlike correlation tests for continuous data, this test works exclusively with counts and categories.

You’ll reach for this test when analyzing survey responses (Is political affiliation related to education level?), evaluating A/B test results (Does button color affect conversion?), or examining medical data (Is treatment type associated with recovery outcome?). The test compares observed frequencies against what you’d expect if the variables were truly independent.

The chi-square test differs from related tests in important ways. Use it instead of a chi-square goodness-of-fit test when you have two variables rather than comparing one variable to a known distribution. Choose it over a t-test or ANOVA when your outcome variable is categorical, not continuous. If you’re comparing proportions between exactly two groups, you could use a z-test for proportions, but the chi-square test generalizes to any number of groups.
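To make the distinction concrete, here is a minimal sketch contrasting the two chi-square variants in base R. The counts and proportions are invented for illustration:

```r
# Goodness-of-fit: ONE variable compared against a known distribution
observed_counts <- c(Never = 95, Former = 65, Current = 40)
chisq.test(observed_counts, p = c(0.5, 0.3, 0.2))

# Independence: TWO variables, expected counts derived from the data itself
two_way <- matrix(c(30, 25, 20, 65, 40, 20), nrow = 3)
chisq.test(two_way)
```

The goodness-of-fit call requires you to supply the hypothesized proportions via `p`; the independence test computes its expected frequencies from the table's own margins.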

Assumptions and Requirements

Before running the test, verify these assumptions hold:

Expected frequency requirement: Each cell in your contingency table should have an expected frequency of at least 5. A common relaxation (Cochran's rule) requires only that at least 80% of cells meet this threshold and that no cell falls below 1, but I recommend the stricter rule for reliable results.

Independence of observations: Each observation contributes to only one cell. A single person can’t appear in multiple categories. This rules out repeated measures or paired designs.
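If your data are paired, the standard alternative is McNemar's test, available in base R as `mcnemar.test()`. A minimal sketch with invented counts:

```r
# Paired design: the same 100 people surveyed before and after a campaign.
# Rows = response before, columns = response after; the off-diagonal
# cells capture people who changed their answer.
paired_table <- matrix(c(40, 10, 25, 25), nrow = 2,
                       dimnames = list(Before = c("Yes", "No"),
                                       After  = c("Yes", "No")))

# McNemar's test handles the paired structure that chisq.test() cannot
mcnemar.test(paired_table)
```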

Categorical data: Both variables must be categorical (nominal or ordinal). If you have continuous data, you’ll need to bin it first—though this loses information.

Here’s a function to check expected frequencies before proceeding:

check_expected_frequencies <- function(observed_table) {
  # Run chi-square test to get expected values (suppress the small-count
  # warning, since checking that assumption is exactly what we're doing)
  test_result <- suppressWarnings(chisq.test(observed_table))
  expected <- test_result$expected
  
  # Check the assumption
  min_expected <- min(expected)
  cells_below_5 <- sum(expected < 5)
  total_cells <- length(expected)
  
  cat("Expected Frequencies:\n")
  print(round(expected, 2))
  cat("\nMinimum expected frequency:", round(min_expected, 2))
  cat("\nCells with expected < 5:", cells_below_5, "of", total_cells)
  
  if (min_expected < 5) {
    cat("\n\nWARNING: Consider Fisher's Exact Test instead.\n")
  } else {
    cat("\n\nAssumption satisfied. Chi-square test is appropriate.\n")
  }
  
  invisible(expected)
}

Preparing Your Data

Most real-world data arrives as individual observations, not pre-counted tables. You’ll need to create a contingency table first.

Consider this survey dataset examining the relationship between smoking status and exercise frequency:

# Create sample survey data
set.seed(42)
n <- 200

survey_data <- data.frame(
  smoking_status = sample(c("Never", "Former", "Current"), n, 
                          replace = TRUE, prob = c(0.5, 0.3, 0.2)),
  exercise_freq = sample(c("Rarely", "Weekly", "Daily"), n,
                         replace = TRUE, prob = c(0.3, 0.45, 0.25))
)

# Create contingency table from raw data
smoking_exercise_table <- table(survey_data$smoking_status, 
                                 survey_data$exercise_freq)
print(smoking_exercise_table)

Output:

         Daily Rarely Weekly
  Current    10     13     20
  Former     17     15     27
  Never      25     32     41

For more complex data with additional variables or weights, use xtabs():

# xtabs with formula interface
smoking_exercise_table2 <- xtabs(~ smoking_status + exercise_freq, 
                                  data = survey_data)

# If you have pre-aggregated counts
aggregated_data <- data.frame(
  smoking = c("Current", "Current", "Former", "Former", "Never", "Never"),
  exercise = c("Daily", "Weekly", "Daily", "Weekly", "Daily", "Weekly"),
  count = c(15, 25, 20, 35, 30, 50)
)

# Create table from counts
weighted_table <- xtabs(count ~ smoking + exercise, data = aggregated_data)

Running the Chi-Square Test

With your contingency table ready, the test itself is straightforward:

# Run the chi-square test
chi_result <- chisq.test(smoking_exercise_table)
print(chi_result)

Output:

	Pearson's Chi-squared test

data:  smoking_exercise_table
X-squared = 1.5873, df = 4, p-value = 0.8108

Let’s break down this output:

  • X-squared (1.5873): The test statistic measuring how far observed frequencies deviate from expected frequencies. Larger values indicate greater deviation from independence.
  • df (4): Degrees of freedom, calculated as (rows - 1) × (columns - 1). Here: (3-1) × (3-1) = 4.
  • p-value (0.8108): The probability of observing this much deviation (or more) if the variables were truly independent. At α = 0.05, we fail to reject the null hypothesis—no significant relationship detected.

Access additional information from the result object:

# Expected frequencies under independence
chi_result$expected

# Observed frequencies (same as input)
chi_result$observed

# Pearson residuals: (observed - expected) / sqrt(expected)
chi_result$residuals

# Standardized residuals (more useful for identifying patterns)
chi_result$stdres

Standardized residuals greater than 2 or less than -2 indicate cells contributing substantially to the chi-square statistic. These reveal where the relationship is strongest.
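A quick way to list the offending cells programmatically, shown here on a small illustrative table (the same logic applies to the `stdres` component of any `chisq.test()` result):

```r
# Example table with a strong association, so some residuals exceed |2|
tbl <- as.table(matrix(c(30, 10, 15, 45), nrow = 2,
                       dimnames = list(Group = c("A", "B"),
                                       Outcome = c("Yes", "No"))))
res <- chisq.test(tbl)

# Flag cells whose standardized residuals exceed the |2| rule of thumb
std_res <- res$stdres
large <- which(abs(std_res) > 2, arr.ind = TRUE)

if (nrow(large) == 0) {
  cat("No cells deviate markedly from independence.\n")
} else {
  for (i in seq_len(nrow(large))) {
    cat(rownames(std_res)[large[i, 1]], "/",
        colnames(std_res)[large[i, 2]], ":",
        round(std_res[large[i, 1], large[i, 2]], 2), "\n")
  }
}
```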

Handling Small Sample Sizes

By default, chisq.test() applies Yates’ continuity correction for 2×2 tables. This makes the test more conservative but can be overly so:

# 2x2 table example
treatment_outcome <- matrix(c(15, 5, 8, 12), nrow = 2,
                            dimnames = list(Treatment = c("Drug", "Placebo"),
                                          Outcome = c("Improved", "No Change")))

# With Yates' correction (default)
chisq.test(treatment_outcome)

# Without correction
chisq.test(treatment_outcome, correct = FALSE)

When expected frequencies fall below 5, switch to Fisher’s Exact Test:

# Small sample scenario
small_table <- matrix(c(3, 7, 8, 2), nrow = 2,
                      dimnames = list(Group = c("A", "B"),
                                    Response = c("Yes", "No")))

# Check expected frequencies
check_expected_frequencies(small_table)

# Fisher's Exact Test
fisher_result <- fisher.test(small_table)
print(fisher_result)

Fisher’s test computes exact probabilities rather than relying on the chi-square approximation. It also provides an odds ratio with confidence interval for 2×2 tables—useful for quantifying the strength of association.
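Those components can be pulled directly from the result object. Using the same small table from above:

```r
# 2x2 table with small expected frequencies
small_table <- matrix(c(3, 7, 8, 2), nrow = 2,
                      dimnames = list(Group = c("A", "B"),
                                    Response = c("Yes", "No")))
fisher_result <- fisher.test(small_table)

# Odds ratio (conditional MLE), its 95% CI, and the exact p-value
cat("Odds ratio:", round(fisher_result$estimate, 2), "\n")
cat("95% CI:", round(fisher_result$conf.int, 2), "\n")
cat("p-value:", round(fisher_result$p.value, 4), "\n")
```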

For tables larger than 2×2 with small expected frequencies, Fisher’s test can be computationally intensive. Consider collapsing categories or using simulation-based approaches:

# Simulation-based chi-square p-value
chisq.test(smoking_exercise_table, simulate.p.value = TRUE, B = 10000)

Visualizing Results

Visualization helps communicate findings and identify patterns that raw numbers obscure.

A stacked bar chart shows proportions across categories:

library(ggplot2)

# Order smoking status levels for a sensible x-axis
survey_data$smoking_status <- factor(survey_data$smoking_status,
                                      levels = c("Never", "Former", "Current"))

ggplot(survey_data, aes(x = smoking_status, fill = exercise_freq)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_brewer(palette = "Set2", name = "Exercise Frequency") +
  labs(x = "Smoking Status", 
       y = "Proportion",
       title = "Exercise Frequency by Smoking Status") +
  theme_minimal()

Mosaic plots display both the proportions and the relative sizes of each group:

library(vcd)

# Basic mosaic plot
mosaic(smoking_exercise_table, 
       shade = TRUE,  # Color by residuals
       legend = TRUE)

# The shading indicates cells with larger-than-expected (blue) 
# or smaller-than-expected (red) frequencies

For residual analysis, create a heatmap of standardized residuals:

# Extract standardized residuals
std_res <- as.data.frame(as.table(chi_result$stdres))
names(std_res) <- c("Smoking", "Exercise", "Residual")

ggplot(std_res, aes(x = Exercise, y = Smoking, fill = Residual)) +
  geom_tile() +
  geom_text(aes(label = round(Residual, 2)), color = "black") +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue",
                       midpoint = 0, limits = c(-3, 3)) +
  labs(title = "Standardized Residuals") +
  theme_minimal()

Reporting and Interpretation

Statistical significance alone doesn’t tell you whether a relationship matters practically. Calculate Cramér’s V to measure effect size:

# Manual calculation of Cramér's V
cramers_v <- function(chi_result) {
  chi_sq <- chi_result$statistic
  n <- sum(chi_result$observed)
  min_dim <- min(nrow(chi_result$observed), ncol(chi_result$observed)) - 1
  
  v <- sqrt(chi_sq / (n * min_dim))
  return(as.numeric(v))
}

v <- cramers_v(chi_result)
cat("Cramér's V:", round(v, 3))

# Or use the rcompanion package
library(rcompanion)
cramerV(smoking_exercise_table)

Interpret Cramér’s V using these guidelines: 0.1 = small effect, 0.3 = medium effect, 0.5 = large effect. These thresholds depend on your field and the degrees of freedom, so context matters.
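If you want those labels programmatically, a hypothetical helper (not from any package; the cutoffs are simply the guidelines above, which strictly speaking were proposed for tables where the smaller dimension is 2) might look like:

```r
# Hypothetical helper mapping Cramér's V to the conventional labels
interpret_cramers_v <- function(v) {
  if (v < 0.1) "negligible"
  else if (v < 0.3) "small"
  else if (v < 0.5) "medium"
  else "large"
}

interpret_cramers_v(0.06)
interpret_cramers_v(0.35)
```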

For APA-style reporting, include the test statistic, degrees of freedom, p-value, and effect size:

A chi-square test of independence examined the relationship between smoking status and exercise frequency. The relationship was not statistically significant, χ²(4) = 1.59, p = .811, Cramér’s V = .06, indicating no association between smoking status and exercise habits in this sample.
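If you report many of these, a small helper that assembles the statistical fragment can reduce transcription errors. This is a hypothetical sketch, not a package function:

```r
# Hypothetical helper assembling the APA-style statistics string
report_chisq <- function(chi_result, cramers_v) {
  sprintf("chi-square(%d) = %.2f, p = %s, Cramer's V = %.2f",
          chi_result$parameter,
          chi_result$statistic,
          format.pval(chi_result$p.value, digits = 3),
          cramers_v)
}

# Example on a small illustrative table
tbl <- as.table(matrix(c(30, 10, 15, 45), nrow = 2))
res <- chisq.test(tbl)
v <- sqrt(unname(res$statistic) / sum(tbl))  # min dimension - 1 is 1 here
cat(report_chisq(res, v), "\n")
```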

If the test were significant, you’d add interpretation of the standardized residuals to describe the nature of the relationship—which cells deviated most from independence and in which direction.

The chi-square test of independence is a workhorse for categorical data analysis. Master the workflow—check assumptions, run the test, visualize patterns, report effect sizes—and you’ll extract meaningful insights from count data across countless applications.
