Chi-Square Distribution in R: Complete Guide
Key Insights
- The chi-square distribution is fundamental for categorical data analysis, with R providing four core functions (dchisq, pchisq, qchisq, rchisq) that handle density, probability, quantiles, and random generation respectively.
- Always verify that expected cell frequencies meet the minimum threshold (typically ≥5) before trusting chi-square test results—when they don’t, switch to Fisher’s exact test or combine categories.
- Effect size measures like Cramér’s V are essential for practical interpretation; a statistically significant chi-square result doesn’t necessarily indicate a meaningful relationship.
Introduction to the Chi-Square Distribution
The chi-square (χ²) distribution is a continuous probability distribution that arises when you sum the squares of independent standard normal random variables. It’s defined by a single parameter: degrees of freedom (df). As df increases, the distribution shifts rightward and becomes more symmetric, approaching a normal distribution.
Three properties make chi-square essential for statistical work: it’s always non-negative, it’s right-skewed (especially at low df), and its mean equals the degrees of freedom while its variance equals 2×df.
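These properties are easy to confirm by simulation; a quick sketch (the sample size, seed, and df value are arbitrary choices for illustration):

```r
# Check mean = df, variance = 2*df, and non-negativity by simulation
set.seed(123)
df <- 6
samples <- rchisq(1e5, df = df)

mean(samples)   # close to df = 6
var(samples)    # close to 2*df = 12
min(samples)    # never negative
```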
You’ll encounter chi-square in three primary contexts: testing whether observed frequencies match expected frequencies (goodness-of-fit), testing whether two categorical variables are independent, and constructing confidence intervals for population variance.
Let’s visualize how the distribution changes with degrees of freedom:
library(ggplot2)
# Create data for multiple chi-square distributions
x <- seq(0, 30, length.out = 500)
df_values <- c(2, 4, 6, 10, 15)
chi_data <- do.call(rbind, lapply(df_values, function(df) {
data.frame(x = x, density = dchisq(x, df), df = factor(df))
}))
ggplot(chi_data, aes(x = x, y = density, color = df)) +
geom_line(linewidth = 1.2) +
labs(
title = "Chi-Square Distributions by Degrees of Freedom",
x = "Value", y = "Density", color = "df"
) +
theme_minimal() +
scale_color_brewer(palette = "Set1")
Core R Functions for Chi-Square
R provides four functions following its standard distribution naming convention:
| Function | Purpose | Returns |
|---|---|---|
| dchisq(x, df) | Probability density | Height of the PDF at x |
| pchisq(q, df) | Cumulative probability | P(X ≤ q) |
| qchisq(p, df) | Quantile function | Value where P(X ≤ value) = p |
| rchisq(n, df) | Random generation | n random chi-square values |
Here’s each function in action:
# Density: probability density at x = 5 with df = 3
dchisq(5, df = 3)
# [1] 0.07322
# Cumulative probability: P(X <= 7.81) with df = 3
pchisq(7.81, df = 3)
# [1] 0.9499
# Critical value: find x where P(X <= x) = 0.95 with df = 3
qchisq(0.95, df = 3)
# [1] 7.815
# Random generation: 5 random values with df = 3
set.seed(42)
rchisq(5, df = 3)
# [1] 2.024 5.127 1.893 3.445 2.891
# Calculate critical values for common significance levels
alpha_levels <- c(0.10, 0.05, 0.01)
df <- 5
critical_values <- sapply(alpha_levels, function(a) qchisq(1 - a, df))
names(critical_values) <- paste0("α = ", alpha_levels)
print(critical_values)
# α = 0.1 α = 0.05 α = 0.01
# 9.24 11.07 15.09
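The introduction also mentioned a third use of chi-square: confidence intervals for a population variance. If the data are approximately normal, (n − 1)s²/σ² follows a chi-square distribution with n − 1 degrees of freedom, which inverts to the interval below. The data vector here is simulated purely for illustration:

```r
# 95% CI for a population variance using chi-square quantiles
set.seed(1)
x <- rnorm(30, mean = 50, sd = 8)   # illustrative sample, assumed ~normal
n <- length(x)
s2 <- var(x)
alpha <- 0.05

# (n-1)*s2 over chi-square quantiles; the UPPER quantile gives the LOWER bound
lower <- (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1)
upper <- (n - 1) * s2 / qchisq(alpha / 2, df = n - 1)
c(lower = lower, estimate = s2, upper = upper)
```

Because the chi-square distribution is right-skewed, the resulting interval is not symmetric around s².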
Chi-Square Goodness-of-Fit Test
The goodness-of-fit test compares observed frequencies against expected frequencies under a hypothesized distribution. The null hypothesis states that the data follows the expected distribution.
Let’s test whether a die is fair:
# Observed counts from 120 die rolls
observed <- c(18, 22, 17, 25, 19, 19)
names(observed) <- 1:6
# Expected: equal probability for fair die
expected_prob <- rep(1/6, 6)
# Perform chi-square test
dice_test <- chisq.test(observed, p = expected_prob)
print(dice_test)
# Chi-squared test for given probabilities
#
# X-squared = 2.2, df = 5, p-value = 0.8208
# Access components
dice_test$statistic # Chi-square value
dice_test$expected # Expected frequencies
dice_test$residuals # Pearson residuals
With p = 0.82, we fail to reject the null hypothesis—the die appears fair.
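Under the hood, the statistic is just χ² = Σ(O − E)²/E with df = k − 1; recomputing it by hand for the die data makes the mechanics explicit:

```r
# Manual goodness-of-fit statistic for the die data
observed <- c(18, 22, 17, 25, 19, 19)
expected <- rep(sum(observed) / 6, 6)   # 20 per face for a fair die

chi_sq <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

chi_sq    # 2.2
p_value   # about 0.82
```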
Now consider a survey where you expect responses to follow a specific distribution:
# Customer satisfaction survey (1-5 scale)
# Expected: 5% very dissatisfied, 10% dissatisfied, 20% neutral,
# 35% satisfied, 30% very satisfied
observed_responses <- c(12, 28, 45, 85, 80)
expected_probs <- c(0.05, 0.10, 0.20, 0.35, 0.30)
survey_test <- chisq.test(observed_responses, p = expected_probs)
print(survey_test)
# X-squared = 1.2848, df = 4, p-value = 0.864
# Examine where deviations occur
data.frame(
Category = c("Very Dissatisfied", "Dissatisfied", "Neutral",
"Satisfied", "Very Satisfied"),
Observed = observed_responses,
Expected = round(survey_test$expected, 1),
Residual = round(survey_test$residuals, 2)
)
Chi-Square Test of Independence
The independence test determines whether two categorical variables are related. You work with contingency tables where rows represent one variable and columns represent another.
# Create contingency table: Gender vs Product Preference
# Data from 200 customers
product_data <- matrix(
c(45, 30, 25, # Male: Product A, B, C
35, 40, 25), # Female: Product A, B, C
nrow = 2, byrow = TRUE,
dimnames = list(
Gender = c("Male", "Female"),
Product = c("A", "B", "C")
)
)
print(product_data)
# Product
# Gender A B C
# Male 45 30 25
# Female 35 40 25
# Test independence
independence_test <- chisq.test(product_data)
print(independence_test)
# Pearson's Chi-squared test
#
# X-squared = 2.6786, df = 2, p-value = 0.262
# View expected frequencies
round(independence_test$expected, 1)
# Product
# Gender A B C
# Male 40.0 35.0 25.0
# Female 40.0 35.0 25.0
The p-value of 0.26 suggests no significant relationship between gender and product preference.
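The expected counts come from the independence formula E = (row total × column total) / n, which outer() reproduces in one line:

```r
# Expected frequencies under independence, computed by hand
product_data <- matrix(
  c(45, 30, 25,
    35, 40, 25),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Male", "Female"),
                  Product = c("A", "B", "C"))
)

# Outer product of margins, divided by the grand total
expected <- outer(rowSums(product_data), colSums(product_data)) / sum(product_data)
expected   # each row: 40 35 25, matching chisq.test()$expected
```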
Assumptions and Diagnostics
Chi-square tests require:
- Independence: Observations must be independent
- Expected frequency: All expected cell counts should be ≥5 (some sources accept ≥1 if 80% are ≥5)
- Mutually exclusive categories: Each observation belongs to exactly one cell
When expected frequencies are too low, use Fisher’s exact test or apply Yates’ continuity correction for 2×2 tables:
# Small sample 2x2 table
small_table <- matrix(c(8, 2,   # Drug: Improved, No Change
                        3, 7),  # Placebo: Improved, No Change
                      nrow = 2, byrow = TRUE,
                      dimnames = list(Treatment = c("Drug", "Placebo"),
                                      Outcome = c("Improved", "No Change")))
# Check expected frequencies
chisq.test(small_table)$expected
# Outcome
# Treatment Improved No Change
# Drug 5.5 4.5
# Placebo 5.5 4.5
# Standard chi-square (may be unreliable)
chisq.test(small_table, correct = FALSE)
# X-squared = 5.051, df = 1, p-value = 0.0246
# With Yates' correction (more conservative)
chisq.test(small_table, correct = TRUE)
# X-squared = 3.232, df = 1, p-value = 0.0722
# Fisher's exact test (preferred for small samples)
fisher.test(small_table)
# p-value = 0.06978
Notice how Yates’ correction and Fisher’s test give more conservative p-values—this matters when expected frequencies are marginal.
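Yates' correction simply shrinks each |O − E| by 0.5 before squaring; recomputing the corrected statistic by hand shows where the more conservative value comes from (R clamps the shrunken deviation at zero when |O − E| < 0.5, which never triggers here since every deviation is 2.5):

```r
# Manual Yates-corrected statistic for the 2x2 drug example
small_table <- matrix(c(8, 2, 3, 7), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(small_table), colSums(small_table)) / sum(small_table)

# Subtract 0.5 from each absolute deviation, then square and scale
yates <- sum((abs(small_table - expected) - 0.5)^2 / expected)
yates   # 3.232, matching chisq.test(small_table, correct = TRUE)
```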
Visualizing Chi-Square Results
Mosaic plots display contingency table structure with tile areas proportional to cell frequencies:
library(vcd)
# Larger dataset for visualization
survey_data <- matrix(
c(120, 80, 50,
90, 110, 60,
40, 60, 90),
nrow = 3, byrow = TRUE,
dimnames = list(
Age = c("18-30", "31-50", "51+"),
Preference = c("Online", "In-Store", "Both")
)
)
# Mosaic plot with shading based on residuals
mosaic(survey_data, shade = TRUE, legend = TRUE,
main = "Shopping Preference by Age Group")
# Residual heatmap
test_result <- chisq.test(survey_data)
residuals_df <- as.data.frame(as.table(test_result$residuals))
names(residuals_df) <- c("Age", "Preference", "Residual")
ggplot(residuals_df, aes(x = Preference, y = Age, fill = Residual)) +
geom_tile() +
geom_text(aes(label = round(Residual, 2)), color = "white", size = 5) +
scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
labs(title = "Standardized Residuals Heatmap") +
theme_minimal()
Cells with absolute residuals greater than 2 contribute substantially to the chi-square statistic.
Practical Applications and Effect Size
Statistical significance doesn’t equal practical importance. Effect size measures quantify the strength of association:
- Phi (φ): For 2×2 tables, ranges from 0 to 1
- Cramér’s V: For larger tables, also 0 to 1
# Complete A/B testing workflow
ab_data <- matrix(
c(145, 355, # Control: converted, not converted
180, 320), # Treatment: converted, not converted
nrow = 2, byrow = TRUE,
dimnames = list(
Group = c("Control", "Treatment"),
Outcome = c("Converted", "Not Converted")
)
)
# Run chi-square test
ab_test <- chisq.test(ab_data, correct = FALSE)
print(ab_test)
# X-squared = 5.584, df = 1, p-value = 0.0181
# Calculate effect size (Cramér's V / Phi for 2x2)
n <- sum(ab_data)
k <- min(nrow(ab_data), ncol(ab_data))
cramers_v <- sqrt(ab_test$statistic / (n * (k - 1)))
names(cramers_v) <- "Cramér's V"
print(cramers_v)
# Cramér's V: 0.0747
# Interpretation function
interpret_cramers_v <- function(v) {
if (v < 0.1) return("Negligible")
if (v < 0.3) return("Small")
if (v < 0.5) return("Medium")
return("Large")
}
# Full report
cat("\n=== A/B Test Results ===\n")
cat("Chi-square:", round(ab_test$statistic, 3), "\n")
cat("p-value:", round(ab_test$p.value, 4), "\n")
cat("Effect size (Cramér's V):", round(cramers_v, 4), "\n")
cat("Interpretation:", interpret_cramers_v(cramers_v), "effect\n")
# Conversion rates
control_rate <- ab_data[1,1] / sum(ab_data[1,])
treatment_rate <- ab_data[2,1] / sum(ab_data[2,])
cat("\nControl conversion:", scales::percent(control_rate, 0.1), "\n")
cat("Treatment conversion:", scales::percent(treatment_rate, 0.1), "\n")
cat("Lift:", scales::percent((treatment_rate - control_rate) / control_rate, 0.1), "\n")
The test shows statistical significance (p = 0.018), but Cramér’s V of 0.07 indicates a negligible practical effect. The treatment improved conversion from 29% to 36%, a 24% relative lift, but the effect size is small relative to the benchmarks above. Whether this justifies implementation depends on business context, not just the p-value.
The chi-square distribution and its associated tests form the backbone of categorical data analysis. Master these techniques, always check assumptions, and never report significance without effect size. Your stakeholders care about practical impact, not just whether you can reject a null hypothesis.