How to Perform a Chi-Square Goodness of Fit Test in R
Key Insights
- The chi-square goodness of fit test compares observed categorical frequencies against expected frequencies to determine if your data follows a hypothesized distribution—use it when you have a single categorical variable and want to test distributional assumptions.
- R’s built-in chisq.test() function handles everything from equal probability distributions to custom expected proportions, but you must ensure expected frequencies are at least 5 per category for valid results.
- When assumptions aren’t met, use simulate.p.value = TRUE to generate Monte Carlo p-values, or combine sparse categories—ignoring these issues leads to unreliable conclusions.
Introduction to Chi-Square Goodness of Fit
The chi-square goodness of fit test answers a simple question: does my observed data match what I expected to see? You’re comparing the frequency distribution of a single categorical variable against a theoretical or hypothesized distribution.
Use this test when you need to verify assumptions about categorical data. Common scenarios include testing whether a die is fair, checking if survey responses match population demographics, or validating that manufacturing defects are uniformly distributed across production lines.
The test calculates a chi-square statistic using this formula:
χ² = Σ[(Observed - Expected)² / Expected]
Larger values indicate greater deviation from expected frequencies, suggesting your data doesn’t fit the hypothesized distribution.
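To make the formula concrete, here is the statistic computed by hand in R, using the die-roll counts that appear later in this article:

```r
# Compute the chi-square statistic manually
# (120 die rolls; expected count = 120 * 1/6 = 20 per face)
observed <- c(18, 22, 17, 25, 19, 19)
expected <- sum(observed) * rep(1/6, 6)
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq
# [1] 2.2
```

This matches the X-squared value that chisq.test() reports for the same data.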
Three assumptions must hold for valid results:
- Independence: Each observation must be independent of others
- Categorical data: Your variable must have discrete categories
- Expected frequency rule: Each category should have an expected count of at least 5
Violating the third assumption inflates your Type I error rate. We’ll address workarounds later.
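A quick way to check the third assumption before testing, sketched here with the example counts used in the next section:

```r
# Expected counts are n * p; all should be at least 5
observed <- c(45, 32, 28, 35, 40)
expected_prob <- c(0.25, 0.20, 0.15, 0.20, 0.20)
expected_counts <- sum(observed) * expected_prob
expected_counts
# [1] 45 36 27 36 36
any(expected_counts < 5) # TRUE would signal a violated assumption
# [1] FALSE
```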
Preparing Your Data in R
Chi-square tests in R work with frequency vectors, not raw data frames. You need two pieces: observed counts and expected proportions.
# Observed frequencies: counts in each category
observed <- c(45, 32, 28, 35, 40)
# Expected probabilities (must sum to 1)
expected_prob <- c(0.25, 0.20, 0.15, 0.20, 0.20)
# Verify probabilities sum to 1
sum(expected_prob)
# [1] 1
If you’re starting with raw categorical data, use table() to generate frequencies:
# Raw survey responses
responses <- c("A", "A", "B", "C", "A", "B", "A", "C", "C", "B",
"A", "B", "C", "A", "A", "B", "C", "C", "A", "B")
# Convert to frequency table
observed <- table(responses)
observed
# responses
# A B C
# 8 6 6
For equal expected proportions across k categories, you don’t need to specify probabilities—R assumes uniformity by default.
Running the Test with chisq.test()
The chisq.test() function is your workhorse. Its key parameters:
- x: Vector of observed frequencies
- p: Vector of expected probabilities (optional; defaults to equal)
- rescale.p: If TRUE, rescales probabilities to sum to 1
Testing against equal proportions:
# Six-sided die rolled 120 times
die_rolls <- c(18, 22, 17, 25, 19, 19)
# Test for fairness (equal probability = 1/6 each)
chisq.test(die_rolls)
Output:
Chi-squared test for given probabilities
data: die_rolls
X-squared = 2.2, df = 5, p-value = 0.8208
Testing against custom proportions:
# Customer preference survey: 200 responses across 4 products
observed <- c(70, 55, 45, 30)
# Expected market share from previous year
expected_share <- c(0.40, 0.30, 0.20, 0.10)
chisq.test(observed, p = expected_share)
Output:
Chi-squared test for given probabilities
data: observed
X-squared = 7.2917, df = 3, p-value = 0.06316
The rescale.p parameter helps when your probabilities don’t quite sum to 1 due to rounding:
# Rounded proportions that don't sum to exactly 1
messy_probs <- c(0.33, 0.33, 0.33) # Sums to 0.99
chisq.test(c(40, 35, 45), p = messy_probs, rescale.p = TRUE)
Without rescale.p = TRUE, chisq.test() stops with an error because the probabilities don't sum to 1.
Interpreting the Output
The test returns three critical values:
- X-squared (χ²): The test statistic measuring deviation from expected
- df: Degrees of freedom (number of categories - 1)
- p-value: Probability of observing this deviation if H₀ is true
Your decision rule is straightforward:
- p-value < α (typically 0.05): Reject H₀. Evidence suggests data doesn’t fit expected distribution.
- p-value ≥ α: Fail to reject H₀. No evidence against the expected distribution.
Extract individual components from the test object for programmatic use:
# Run test and store result
result <- chisq.test(c(70, 55, 45, 30), p = c(0.40, 0.30, 0.20, 0.10))
# Extract components
result$statistic # Chi-square value
# X-squared
# 7.291667
result$parameter # Degrees of freedom
# df
# 3
result$p.value # P-value
# [1] 0.06316
result$observed # Observed frequencies
# [1] 70 55 45 30
result$expected # Expected frequencies (calculated from n * p)
# [1] 80 60 40 20
result$residuals # Pearson residuals
# [1] -1.118034 -0.645497 0.790569 2.236068
Pearson residuals help identify which categories deviate most. Values beyond ±2 indicate substantial departure from expected frequencies.
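You can also pull out the deviant categories programmatically; a small sketch reusing the product-preference result from above:

```r
result <- chisq.test(c(70, 55, 45, 30), p = c(0.40, 0.30, 0.20, 0.10))
# Indices of categories whose Pearson residuals exceed the +/-2 rule of thumb
which(abs(result$residuals) > 2)
# [1] 4
```

Here category 4 (observed 30 versus expected 20) is the main source of the deviation.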
Worked Example: Website Traffic Distribution
Let’s work through a complete example. A marketing team believes website traffic should be distributed across five channels according to their ad spend allocation:
- Organic Search: 35%
- Paid Search: 25%
- Social Media: 20%
- Email: 15%
- Direct: 5%
Last month, they observed 500 visits distributed as follows:
# Set up the analysis
channels <- c("Organic", "Paid", "Social", "Email", "Direct")
observed_visits <- c(160, 140, 95, 80, 25)
expected_allocation <- c(0.35, 0.25, 0.20, 0.15, 0.05)
# Name the vector for clearer output
names(observed_visits) <- channels
# Verify total visits
sum(observed_visits)
# [1] 500
# Run the chi-square test
traffic_test <- chisq.test(observed_visits, p = expected_allocation)
print(traffic_test)
Output:
Chi-squared test for given probabilities
data: observed_visits
X-squared = 3.669, df = 4, p-value = 0.4526
Interpretation: With χ² = 3.67, df = 4, and p-value = 0.453, we fail to reject the null hypothesis at α = 0.05. The observed traffic distribution is consistent with the expected allocation based on ad spend.
Let’s examine the details:
# Compare observed vs expected
comparison <- data.frame(
Channel = channels,
Observed = as.numeric(traffic_test$observed),
Expected = as.numeric(traffic_test$expected),
Residual = as.numeric(traffic_test$residuals)
)
print(comparison)
Output:
Channel Observed Expected Residual
1 Organic 160 175 -1.1338934
2 Paid 140 125 1.3416408
3 Social 95 100 -0.5000000
4 Email 80 75 0.5773503
5 Direct 25 25 0.0000000
No residuals exceed ±2, confirming no single channel dramatically deviates from expectations.
Visualizing Results
Visualization makes chi-square results accessible to stakeholders. Here’s a grouped bar chart using ggplot2:
library(ggplot2)
library(tidyr)
# Create comparison data frame
viz_data <- data.frame(
Channel = factor(channels, levels = channels),
Observed = as.numeric(traffic_test$observed),
Expected = as.numeric(traffic_test$expected)
)
# Reshape for ggplot
viz_long <- pivot_longer(viz_data,
cols = c(Observed, Expected),
names_to = "Type",
values_to = "Visits")
# Create grouped bar chart
ggplot(viz_long, aes(x = Channel, y = Visits, fill = Type)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
scale_fill_manual(values = c("Expected" = "#2C3E50", "Observed" = "#E74C3C")) +
labs(
title = "Website Traffic: Observed vs Expected Distribution",
subtitle = sprintf("χ² = %.2f, p = %.3f",
traffic_test$statistic,
traffic_test$p.value),
x = "Traffic Channel",
y = "Number of Visits"
) +
theme_minimal() +
theme(legend.position = "top")
For base R users, here’s a simpler alternative:
# Base R grouped barplot
bar_matrix <- rbind(traffic_test$observed, traffic_test$expected)
rownames(bar_matrix) <- c("Observed", "Expected")
barplot(bar_matrix,
beside = TRUE,
col = c("#E74C3C", "#2C3E50"),
legend = TRUE,
main = "Traffic Distribution: Observed vs Expected",
xlab = "Channel",
ylab = "Visits")
Common Pitfalls and Best Practices
Problem 1: Expected frequencies below 5
When expected counts fall below 5, the chi-square approximation becomes unreliable. You have two options:
Option A: Combine categories
# Original data with sparse category
observed_sparse <- c(45, 38, 8, 6, 3) # Last category too small
expected_sparse <- c(0.40, 0.35, 0.10, 0.10, 0.05)
# Check expected counts
sum(observed_sparse) * expected_sparse
# [1] 40.0 35.0 10.0 10.0 5.0 # Last one is borderline
# Combine last two categories
observed_combined <- c(45, 38, 8, 9) # 6 + 3 = 9
expected_combined <- c(0.40, 0.35, 0.10, 0.15) # 0.10 + 0.05 = 0.15
chisq.test(observed_combined, p = expected_combined)
Option B: Use Monte Carlo simulation
# Simulated p-value for small samples
chisq.test(observed_sparse,
p = expected_sparse,
simulate.p.value = TRUE,
B = 10000)
Output (the X-squared statistic is exact, but the simulated p-value varies slightly between runs):
Chi-squared test for given probabilities with simulated p-value
(based on 10000 replicates)
data: observed_sparse
X-squared = 3.6821, df = NA, p-value ≈ 0.45
The B parameter controls the number of Monte Carlo replicates. Use at least 2000 for stable results; 10000 is better for publication.
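Because the p-value comes from random replicates, it shifts slightly on every run; fixing the random seed makes the analysis reproducible. A minimal sketch:

```r
# Fix the RNG seed so the simulated p-value is reproducible
set.seed(42) # any fixed seed works
observed_sparse <- c(45, 38, 8, 6, 3)
expected_sparse <- c(0.40, 0.35, 0.10, 0.10, 0.05)
result <- chisq.test(observed_sparse, p = expected_sparse,
                     simulate.p.value = TRUE, B = 10000)
result$p.value # same value every time this script is run
```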
Problem 2: Confusing goodness of fit with independence tests
The goodness of fit test examines one variable against a theoretical distribution. If you’re comparing two categorical variables against each other, you need the chi-square test of independence—same function, different input structure (a contingency table instead of a vector).
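For contrast, here is a minimal sketch of the independence version; the 2x2 counts below are made up for illustration:

```r
# Two categorical variables cross-tabulated into a contingency table
tbl <- matrix(c(30, 20,
                10, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Response = c("Yes", "No")))
# Same function; a matrix input triggers the test of independence
chisq.test(tbl)
```

With a matrix input, chisq.test() computes expected counts from the row and column totals instead of from a p vector.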
Problem 3: Treating percentages as frequencies
# WRONG: Using percentages
wrong_data <- c(35, 25, 20, 15, 5) # These are percentages, not counts
chisq.test(wrong_data) # Runs but results are meaningless
# RIGHT: Use actual counts
correct_data <- c(175, 125, 100, 75, 25) # Actual frequencies
chisq.test(correct_data)
Best practice checklist:
- Verify expected frequencies ≥ 5 before interpreting results
- Always examine residuals to identify problematic categories
- Report the test statistic, degrees of freedom, and p-value together
- Use simulation for small samples or sparse categories
- Visualize results—stakeholders understand bar charts better than p-values
The chi-square goodness of fit test remains one of the most practical tools for validating distributional assumptions. Master these fundamentals in R, and you’ll have a reliable method for testing whether your categorical data behaves as expected.