How to Perform a Chi-Square Goodness of Fit Test in R
Key Insights
- The chi-square goodness of fit test compares observed categorical frequencies against expected frequencies to determine if your data follows a hypothesized distribution—use it when you have a single categorical variable and want to test distributional assumptions.
- R’s built-in chisq.test() function handles everything from equal probability distributions to custom expected proportions, but you must ensure expected frequencies are at least 5 per category for valid results.
- When assumptions aren’t met, use simulate.p.value = TRUE to generate Monte Carlo p-values, or combine sparse categories—ignoring these issues leads to unreliable conclusions.
Introduction to Chi-Square Goodness of Fit
The chi-square goodness of fit test answers a simple question: does my observed data match what I expected to see? You’re comparing the frequency distribution of a single categorical variable against a theoretical or hypothesized distribution.
Use this test when you need to verify assumptions about categorical data. Common scenarios include testing whether a die is fair, checking if survey responses match population demographics, or validating that manufacturing defects are uniformly distributed across production lines.
The test calculates a chi-square statistic using this formula:
χ² = Σ[(Observed - Expected)² / Expected]
Larger values indicate greater deviation from expected frequencies, suggesting your data doesn’t fit the hypothesized distribution.
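To make the formula concrete, here is the statistic computed by hand in R, using the die-roll counts that appear later in this article:

```r
# Compute the chi-square statistic manually
# (120 die rolls; expected count = 120 * 1/6 = 20 per face)
observed <- c(18, 22, 17, 25, 19, 19)
expected <- sum(observed) * rep(1/6, 6)
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq
# [1] 2.2
```

This matches the X-squared value that chisq.test() reports for the same data.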
Three assumptions must hold for valid results:
- Independence: Each observation must be independent of others
- Categorical data: Your variable must have discrete categories
- Expected frequency rule: Each category should have an expected count of at least 5
Violating the third assumption inflates your Type I error rate. We’ll address workarounds later.
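A quick way to check the third assumption before testing, sketched here with the example counts used in the next section:

```r
# Expected counts are n * p; all should be at least 5
observed <- c(45, 32, 28, 35, 40)
expected_prob <- c(0.25, 0.20, 0.15, 0.20, 0.20)
expected_counts <- sum(observed) * expected_prob
expected_counts
# [1] 45 36 27 36 36
any(expected_counts < 5) # TRUE would signal a violated assumption
# [1] FALSE
```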
Preparing Your Data in R
Chi-square tests in R work with frequency vectors, not raw data frames. You need two pieces: observed counts and expected proportions.
# Observed frequencies: counts in each category
observed <- c(45, 32, 28, 35, 40)
# Expected probabilities (must sum to 1)
expected_prob <- c(0.25, 0.20, 0.15, 0.20, 0.20)
# Verify probabilities sum to 1
sum(expected_prob)
# [1] 1
If you’re starting with raw categorical data, use table() to generate frequencies:
# Raw survey responses
responses <- c("A", "A", "B", "C", "A", "B", "A", "C", "C", "B",
"A", "B", "C", "A", "A", "B", "C", "C", "A", "B")
# Convert to frequency table
observed <- table(responses)
observed
# responses
# A B C
# 8 6 6
For equal expected proportions across k categories, you don’t need to specify probabilities—R assumes uniformity by default.
Running the Test with chisq.test()
The chisq.test() function is your workhorse. Its key parameters:
- x: Vector of observed frequencies
- p: Vector of expected probabilities (optional; defaults to equal)
- rescale.p: If TRUE, rescales probabilities to sum to 1
Testing against equal proportions:
# Six-sided die rolled 120 times
die_rolls <- c(18, 22, 17, 25, 19, 19)
# Test for fairness (equal probability = 1/6 each)
chisq.test(die_rolls)
Output:
Chi-squared test for given probabilities
data: die_rolls
X-squared = 2.2, df = 5, p-value = 0.8208
Testing against custom proportions:
# Customer preference survey: 200 responses across 4 products
observed <- c(70, 55, 45, 30)
# Expected market share from previous year
expected_share <- c(0.40, 0.30, 0.20, 0.10)
chisq.test(observed, p = expected_share)
Output:
Chi-squared test for given probabilities
data: observed
X-squared = 7.2917, df = 3, p-value = 0.06316
The rescale.p parameter helps when your probabilities don’t quite sum to 1 due to rounding:
# Rounded proportions that don't sum to exactly 1
messy_probs <- c(0.33, 0.33, 0.33) # Sums to 0.99
chisq.test(c(40, 35, 45), p = messy_probs, rescale.p = TRUE)
Without rescale.p = TRUE, chisq.test() stops with an error because the probabilities don't sum to 1.
Interpreting the Output
The test returns three critical values:
- X-squared (χ²): The test statistic measuring deviation from expected
- df: Degrees of freedom (number of categories - 1)
- p-value: Probability of observing this deviation if H₀ is true
Your decision rule is straightforward:
- p-value < α (typically 0.05): Reject H₀. Evidence suggests data doesn’t fit expected distribution.
- p-value ≥ α: Fail to reject H₀. No evidence against the expected distribution.
Extract individual components from the test object for programmatic use:
# Run test and store result
result <- chisq.test(c(70, 55, 45, 30), p = c(0.40, 0.30, 0.20, 0.10))
# Extract components
result$statistic # Chi-square value
# X-squared
# 7.291667
result$parameter # Degrees of freedom
# df
# 3
result$p.value # P-value
# [1] 0.06316
result$observed # Observed frequencies
# [1] 70 55 45 30
result$expected # Expected frequencies (calculated from n * p)
# [1] 80 60 40 20
result$residuals # Pearson residuals
# [1] -1.118034 -0.645497 0.790569 2.236068
Pearson residuals help identify which categories deviate most. Values beyond ±2 indicate substantial departure from expected frequencies.
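You can also pull out the deviant categories programmatically; a small sketch reusing the product-preference result from above:

```r
result <- chisq.test(c(70, 55, 45, 30), p = c(0.40, 0.30, 0.20, 0.10))
# Indices of categories whose Pearson residuals exceed the +/-2 rule of thumb
which(abs(result$residuals) > 2)
# [1] 4
```

Here category 4 (observed 30 versus expected 20) is the main source of the deviation.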
Worked Example: Website Traffic Distribution
Let’s work through a complete example. A marketing team believes website traffic should be distributed across five channels according to their ad spend allocation:
- Organic Search: 35%
- Paid Search: 25%
- Social Media: 20%
- Email: 15%
- Direct: 5%
Last month, they observed 500 visits distributed as follows:
# Set up the analysis
channels <- c("Organic", "Paid", "Social", "Email", "Direct")
observed_visits <- c(160, 140, 95, 80, 25)
expected_allocation <- c(0.35, 0.25, 0.20, 0.15, 0.05)
# Name the vector for clearer output
names(observed_visits) <- channels
# Verify total visits
sum(observed_visits)
# [1] 500
# Run the chi-square test
traffic_test <- chisq.test(observed_visits, p = expected_allocation)
print(traffic_test)
Output:
Chi-squared test for given probabilities
data: observed_visits
X-squared = 3.669, df = 4, p-value = 0.4526
Interpretation: With χ² = 3.67, df = 4, and p-value = 0.453, we fail to reject the null hypothesis at α = 0.05. The observed traffic distribution is consistent with the expected allocation based on ad spend.
Let’s examine the details:
# Compare observed vs expected
comparison <- data.frame(
Channel = channels,
Observed = as.numeric(traffic_test$observed),
Expected = as.numeric(traffic_test$expected),
Residual = as.numeric(traffic_test$residuals)
)
print(comparison)
Output:
Channel Observed Expected Residual
1 Organic 160 175 -1.1338934
2 Paid 140 125 1.3416408
3 Social 95 100 -0.5000000
4 Email 80 75 0.5773503
5 Direct 25 25 0.0000000
No residuals exceed ±2, confirming no single channel dramatically deviates from expectations.
Visualizing Results
Visualization makes chi-square results accessible to stakeholders. Here’s a grouped bar chart using ggplot2:
library(ggplot2)
library(tidyr)
# Create comparison data frame
viz_data <- data.frame(
Channel = factor(channels, levels = channels),
Observed = as.numeric(traffic_test$observed),
Expected = as.numeric(traffic_test$expected)
)
# Reshape for ggplot
viz_long <- pivot_longer(viz_data,
cols = c(Observed, Expected),
names_to = "Type",
values_to = "Visits")
# Create grouped bar chart
ggplot(viz_long, aes(x = Channel, y = Visits, fill = Type)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
scale_fill_manual(values = c("Expected" = "#2C3E50", "Observed" = "#E74C3C")) +
labs(
title = "Website Traffic: Observed vs Expected Distribution",
subtitle = sprintf("χ² = %.2f, p = %.3f",
traffic_test$statistic,
traffic_test$p.value),
x = "Traffic Channel",
y = "Number of Visits"
) +
theme_minimal() +
theme(legend.position = "top")
For base R users, here’s a simpler alternative:
# Base R grouped barplot
bar_matrix <- rbind(traffic_test$observed, traffic_test$expected)
rownames(bar_matrix) <- c("Observed", "Expected")
barplot(bar_matrix,
beside = TRUE,
col = c("#E74C3C", "#2C3E50"),
legend = TRUE,
main = "Traffic Distribution: Observed vs Expected",
xlab = "Channel",
ylab = "Visits")
Common Pitfalls and Best Practices
Problem 1: Expected frequencies below 5
When expected counts fall below 5, the chi-square approximation becomes unreliable. You have two options:
Option A: Combine categories
# Original data with sparse category
observed_sparse <- c(45, 38, 8, 6, 3) # Last category too small
expected_sparse <- c(0.40, 0.35, 0.10, 0.10, 0.05)
# Check expected counts
sum(observed_sparse) * expected_sparse
# [1] 40.0 35.0 10.0 10.0 5.0 # Last one is borderline
# Combine last two categories
observed_combined <- c(45, 38, 8, 9) # 6 + 3 = 9
expected_combined <- c(0.40, 0.35, 0.10, 0.15) # 0.10 + 0.05 = 0.15
chisq.test(observed_combined, p = expected_combined)
Option B: Use Monte Carlo simulation
# Simulated p-value for small samples
chisq.test(observed_sparse,
p = expected_sparse,
simulate.p.value = TRUE,
B = 10000)
Output (the X-squared statistic is exact, but the simulated p-value varies slightly between runs):
Chi-squared test for given probabilities with simulated p-value
(based on 10000 replicates)
data: observed_sparse
X-squared = 3.6821, df = NA, p-value ≈ 0.45
The B parameter controls the number of Monte Carlo replicates. Use at least 2000 for stable results; 10000 is better for publication.
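Because the p-value comes from random replicates, it shifts slightly on every run; fixing the random seed makes the analysis reproducible. A minimal sketch:

```r
# Fix the RNG seed so the simulated p-value is reproducible
set.seed(42) # any fixed seed works
observed_sparse <- c(45, 38, 8, 6, 3)
expected_sparse <- c(0.40, 0.35, 0.10, 0.10, 0.05)
result <- chisq.test(observed_sparse, p = expected_sparse,
                     simulate.p.value = TRUE, B = 10000)
result$p.value # same value every time this script is run
```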
Problem 2: Confusing goodness of fit with independence tests
The goodness of fit test examines one variable against a theoretical distribution. If you’re comparing two categorical variables against each other, you need the chi-square test of independence—same function, different input structure (a contingency table instead of a vector).
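For contrast, here is a minimal sketch of the independence version; the 2x2 counts below are made up for illustration:

```r
# Two categorical variables cross-tabulated into a contingency table
tbl <- matrix(c(30, 20,
                10, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Response = c("Yes", "No")))
# Same function; a matrix input triggers the test of independence
chisq.test(tbl)
```

With a matrix input, chisq.test() computes expected counts from the row and column totals instead of from a p vector.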
Problem 3: Treating percentages as frequencies
# WRONG: Using percentages
wrong_data <- c(35, 25, 20, 15, 5) # These are percentages, not counts
chisq.test(wrong_data) # Runs but results are meaningless
# RIGHT: Use actual counts
correct_data <- c(175, 125, 100, 75, 25) # Actual frequencies
chisq.test(correct_data)
Best practice checklist:
- Verify expected frequencies ≥ 5 before interpreting results
- Always examine residuals to identify problematic categories
- Report the test statistic, degrees of freedom, and p-value together
- Use simulation for small samples or sparse categories
- Visualize results—stakeholders understand bar charts better than p-values
The chi-square goodness of fit test remains one of the most practical tools for validating distributional assumptions. Master these fundamentals in R, and you’ll have a reliable method for testing whether your categorical data behaves as expected.