How to Plot the Chi-Square Distribution in R
Key Insights
- The chi-square distribution is defined by a single parameter, degrees of freedom, which controls both its shape and spread, so it pays to visualize how different df values affect your statistical tests.
- R provides four core functions (dchisq(), pchisq(), qchisq(), rchisq()) that handle every chi-square calculation you’ll need, from plotting densities to finding critical values.
- Shading rejection regions on chi-square plots transforms abstract p-values into intuitive visual representations, making your statistical results far easier to communicate and interpret.
Introduction to the Chi-Square Distribution
The chi-square (χ²) distribution is one of the workhorses of statistical inference. You’ll encounter it when running goodness-of-fit tests, testing independence in contingency tables, and constructing confidence intervals for variance. Understanding how to visualize this distribution isn’t just academic—it’s a practical skill that helps you interpret test results and communicate findings.
The chi-square distribution has several distinctive properties. It’s defined by a single parameter: degrees of freedom (df). It takes only non-negative values, so its support starts at zero. It’s also right-skewed, though the skewness decreases as degrees of freedom increase, and at high df values it is well approximated by a normal distribution.
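These properties can be checked numerically. The sketch below is not part of the original walkthrough; it assumes the standard closed-form results for the chi-square distribution (mean = df, variance = 2 * df, skewness = sqrt(8 / df)), and the variable names are illustrative.

```r
# Closed-form moments of the chi-square distribution (standard results)
df_values <- c(1, 2, 5, 10, 30, 100)

moments <- data.frame(
  df       = df_values,
  mean     = df_values,            # mean equals df
  variance = 2 * df_values,        # variance equals 2 * df
  skewness = sqrt(8 / df_values)   # shrinks toward 0 as df grows
)
print(moments)

# Rough check of the normal approximation at a high df:
# compare the chi-square CDF with a normal CDF of matching mean and sd
big_df <- 100
x <- big_df + 10
approx_gap <- abs(pchisq(x, df = big_df) -
                  pnorm(x, mean = big_df, sd = sqrt(2 * big_df)))
print(approx_gap)
```

The skewness column falls steadily as df rises, and the gap between the two CDFs at df = 100 should be small, which is the behavior the plots later in this article show visually.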
When you plot chi-square distributions with different degrees of freedom, you’ll see how the shape transforms from highly skewed (low df) to nearly symmetric (high df). This visual intuition matters when you’re interpreting test statistics and deciding whether observed values fall in rejection regions.
Understanding Key R Functions for Chi-Square
R provides four functions for working with the chi-square distribution, following its standard naming convention for probability distributions:
# dchisq(): Density function (PDF) - height of curve at point x
dchisq(x = 5, df = 3)
# [1] 0.07322491
# pchisq(): Distribution function (CDF) - P(X <= x)
pchisq(q = 5, df = 3)
# [1] 0.8282029
# qchisq(): Quantile function - inverse of CDF
qchisq(p = 0.95, df = 3)
# [1] 7.814728
# rchisq(): Random generation - simulate chi-square values
set.seed(42)
rchisq(n = 5, df = 3)
# [1] 1.5119616 4.4552640 1.5895625 0.3315563 5.9200227
Here’s when to use each:
- dchisq(): Use this for plotting density curves. It gives you the y-value (probability density) for any x-value.
- pchisq(): Use this to calculate p-values or cumulative probabilities. Essential for hypothesis testing.
- qchisq(): Use this to find critical values. Pass in 1 minus your significance level (for example, 0.95 for α = 0.05) to get the threshold for rejection.
- rchisq(): Use this for simulations and Monte Carlo methods.
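To see how the functions relate to one another, here is a small sketch (the variable names and the test statistic of 5 are illustrative, not from the original). It relies on the lower.tail argument that pchisq() and qchisq() accept, which asks for the upper tail directly instead of subtracting from 1.

```r
df_example <- 3
alpha_example <- 0.05

# Critical value: the point with 5% of the probability to its right
crit <- qchisq(1 - alpha_example, df = df_example)

# Same value, requesting the upper tail directly
crit_upper <- qchisq(alpha_example, df = df_example, lower.tail = FALSE)

# Round trip: the CDF evaluated at the critical value returns 1 - alpha
round_trip <- pchisq(crit, df = df_example)

# Upper-tail p-value for a hypothetical test statistic of 5
p_upper <- pchisq(5, df = df_example, lower.tail = FALSE)

print(c(crit = crit, crit_upper = crit_upper,
        round_trip = round_trip, p_upper = p_upper))
```

Using lower.tail = FALSE is usually preferable to 1 - pchisq(...) for very small p-values, because it avoids losing precision in the subtraction.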
Plotting the Probability Density Function (PDF)
Let’s start with base R plotting. The curve() function pairs naturally with dchisq() for quick visualizations:
# Basic chi-square density plot with df = 5
curve(dchisq(x, df = 5),
from = 0,
to = 20,
main = "Chi-Square Distribution (df = 5)",
xlab = "x",
ylab = "Density",
col = "steelblue",
lwd = 2)
This gives you a clean density curve. But the real insight comes from comparing multiple degrees of freedom on one plot:
# Compare multiple degrees of freedom
df_values <- c(2, 4, 6, 10, 15)
colors <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
# Set up the plot with the first curve
curve(dchisq(x, df = df_values[1]),
from = 0,
to = 30,
ylim = c(0, 0.5),  # the df = 2 curve peaks at 0.5 when x = 0
main = "Chi-Square Distributions by Degrees of Freedom",
xlab = "x",
ylab = "Density",
col = colors[1],
lwd = 2)
# Add remaining curves
for (i in 2:length(df_values)) {
curve(dchisq(x, df = df_values[i]),
add = TRUE,
col = colors[i],
lwd = 2)
}
# Add legend
legend("topright",
legend = paste("df =", df_values),
col = colors,
lwd = 2,
bty = "n")
Notice how the distribution shifts right and becomes more symmetric as df increases. The peak (mode) of a chi-square distribution occurs at df - 2 for df ≥ 2, which explains this rightward shift.
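The df - 2 rule for the mode can be confirmed numerically. The following is a minimal sketch, assuming base R's optimize() is an acceptable way to locate the peak of each density; the search interval of 0 to 50 is an arbitrary choice that comfortably covers these df values.

```r
# Locate the peak of each density numerically and compare with df - 2
df_values <- c(2, 4, 6, 10, 15)

numeric_mode <- sapply(df_values, function(k) {
  optimize(function(x) dchisq(x, df = k),
           interval = c(0, 50),
           maximum = TRUE)$maximum
})

mode_check <- data.frame(
  df           = df_values,
  formula_mode = df_values - 2,
  numeric_mode = round(numeric_mode, 3)
)
print(mode_check)
```

The two mode columns should agree to within the optimizer's tolerance, including the df = 2 case, where the density is highest at the boundary x = 0.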
Plotting the Cumulative Distribution Function (CDF)
The CDF shows the probability that a chi-square random variable takes a value less than or equal to x. This is directly useful for understanding p-values:
# CDF plot for chi-square with df = 5
curve(pchisq(x, df = 5),
from = 0,
to = 20,
main = "Chi-Square CDF (df = 5)",
xlab = "x",
ylab = "Cumulative Probability",
col = "darkred",
lwd = 2)
# Add reference lines for common significance levels
abline(h = 0.95, lty = 2, col = "gray50")
abline(v = qchisq(0.95, df = 5), lty = 2, col = "gray50")
# Annotate the critical value
text(qchisq(0.95, df = 5) + 1, 0.5,
paste("Critical value =", round(qchisq(0.95, df = 5), 2)),
pos = 4)
The horizontal line at 0.95 intersects the CDF at the critical value for a 5% significance level. Any test statistic beyond this point falls in the rejection region.
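The same reading of the CDF works for any significance level. As a hedged illustration (the variable names and the choice of significance levels are assumptions for this example), the sketch below finds the critical values for three common levels at df = 5 and confirms that the CDF evaluated at each one returns 1 - alpha.

```r
df_cdf <- 5
alphas <- c(0.10, 0.05, 0.01)

# Critical values: where the CDF reaches 1 - alpha
critical_values <- qchisq(1 - alphas, df = df_cdf)

# Evaluating the CDF at those points recovers 1 - alpha
cdf_at_critical <- pchisq(critical_values, df = df_cdf)

print(data.frame(alpha          = alphas,
                 critical_value = round(critical_values, 2),
                 cdf            = cdf_at_critical))
```

Smaller significance levels push the critical value further right along the x-axis, so the rejection region starts later and a larger test statistic is needed to reject.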
Enhanced Visualizations with ggplot2
For publication-quality graphics, ggplot2 offers more control and cleaner aesthetics. The stat_function() layer is your tool for plotting mathematical functions:
library(ggplot2)
# Basic ggplot2 chi-square density
ggplot(data.frame(x = c(0, 25)), aes(x = x)) +
stat_function(fun = dchisq, args = list(df = 5),
color = "steelblue", linewidth = 1.2) +
labs(title = "Chi-Square Distribution (df = 5)",
x = "x",
y = "Density") +
theme_minimal()
Now let’s create something more useful: a density plot with the rejection region shaded. This visualization makes p-values tangible:
library(ggplot2)
df <- 5
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = df)
# Create the plot with shaded rejection region
ggplot(data.frame(x = c(0, 20)), aes(x = x)) +
# Main density curve
stat_function(fun = dchisq, args = list(df = df),
color = "steelblue", linewidth = 1.2) +
# Shaded rejection region
stat_function(fun = dchisq, args = list(df = df),
xlim = c(critical_value, 20),
geom = "area",
fill = "tomato",
alpha = 0.4) +
# Mark critical value
geom_vline(xintercept = critical_value,
linetype = "dashed",
color = "darkred") +
# Annotation
annotate("text", x = critical_value + 2, y = 0.12,
label = paste("Critical value =", round(critical_value, 2)),
hjust = 0) +
annotate("text", x = critical_value + 3, y = 0.02,
label = paste("α =", alpha),
color = "darkred") +
labs(title = "Chi-Square Distribution with Rejection Region",
subtitle = paste("df =", df, "| α =", alpha),
x = "Chi-Square Statistic",
y = "Density") +
theme_minimal()
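One way to sanity-check the shaded region is to integrate the density over it and confirm the area equals the significance level. This is a small sketch using base R's integrate(); the variable names are illustrative and chosen so they do not clash with the plotting code above.

```r
df_shade <- 5
alpha_shade <- 0.05
crit_shade <- qchisq(1 - alpha_shade, df = df_shade)

# Numerically integrate the density over the region [critical value, Inf)
shaded_area <- integrate(dchisq, lower = crit_shade, upper = Inf,
                         df = df_shade)$value
print(shaded_area)
```

The result should be approximately 0.05, up to numerical integration error, which is exactly the share of the curve that the red area represents.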
Practical Application: Visualizing Test Results
Here’s a complete example that ties everything together. Suppose you’ve run a chi-square test and want to visualize where your test statistic falls:
library(ggplot2)
# Simulate a chi-square test scenario
# Example: Testing independence in a contingency table
observed_statistic <- 9.8
df <- 4
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = df)
p_value <- 1 - pchisq(observed_statistic, df = df)
# Build the visualization
ggplot(data.frame(x = c(0, 18)), aes(x = x)) +
# Density curve
stat_function(fun = dchisq, args = list(df = df),
color = "gray30", linewidth = 1) +
# Rejection region
stat_function(fun = dchisq, args = list(df = df),
xlim = c(critical_value, 18),
geom = "area",
fill = "firebrick",
alpha = 0.3) +
# Area beyond observed statistic (p-value region)
stat_function(fun = dchisq, args = list(df = df),
xlim = c(observed_statistic, 18),
geom = "area",
fill = "steelblue",
alpha = 0.5) +
# Critical value line
geom_vline(xintercept = critical_value,
linetype = "dashed",
color = "firebrick",
linewidth = 0.8) +
# Observed statistic line
geom_vline(xintercept = observed_statistic,
linetype = "solid",
color = "steelblue",
linewidth = 1.2) +
# Annotations
annotate("text", x = critical_value, y = 0.18,
label = paste("Critical value\n", round(critical_value, 2)),
hjust = 1.1, size = 3.5, color = "firebrick") +
annotate("text", x = observed_statistic, y = 0.15,
label = paste("Observed\nχ² =", observed_statistic),
hjust = -0.1, size = 3.5, color = "steelblue") +
annotate("text", x = 14, y = 0.08,
label = paste("p-value =", round(p_value, 4)),
size = 4, fontface = "bold") +
labs(title = "Chi-Square Test Result Visualization",
subtitle = paste("df =", df, "| α =", alpha,
"| Decision:", ifelse(observed_statistic > critical_value,
"Reject H₀", "Fail to reject H₀")),
x = "Chi-Square Statistic",
y = "Density") +
theme_minimal() +
theme(plot.subtitle = element_text(face = "italic"))
This visualization immediately shows that the observed statistic (9.8) exceeds the critical value (9.49), placing it in the rejection region. The blue shaded area represents the p-value—the probability of observing a test statistic this extreme or more extreme under the null hypothesis.
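In practice, the statistic and degrees of freedom would come from an actual test and not from hard-coded numbers. The sketch below shows one way to obtain them with chisq.test() from base R; the contingency table is made up purely for illustration and does not reproduce the 9.8 used above.

```r
# Hypothetical 3 x 3 contingency table (made-up counts for illustration)
counts <- matrix(c(30, 20, 25,
                   15, 25, 20,
                   10, 30, 25),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(group    = c("A", "B", "C"),
                                 response = c("low", "mid", "high")))

test_result <- chisq.test(counts)

observed_statistic <- unname(test_result$statistic)  # Pearson chi-square statistic
df_test <- unname(test_result$parameter)             # (rows - 1) * (cols - 1)
p_value <- test_result$p.value

# The reported p-value is the upper-tail area beyond the statistic
p_manual <- pchisq(observed_statistic, df = df_test, lower.tail = FALSE)

print(c(statistic = observed_statistic, df = df_test,
        p_value = p_value, p_manual = p_manual))
```

These values can then be passed to the plotting code in place of the hard-coded statistic and degrees of freedom.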
Summary
Here’s a quick reference for chi-square plotting in R:
| Task | Function | Example |
|---|---|---|
| Plot density (PDF) | dchisq() | curve(dchisq(x, df=5), 0, 20) |
| Plot CDF | pchisq() | curve(pchisq(x, df=5), 0, 20) |
| Find critical value | qchisq() | qchisq(0.95, df=5) |
| Calculate p-value | pchisq() | 1 - pchisq(stat, df=5) |
| ggplot2 density | stat_function() | stat_function(fun=dchisq, args=list(df=5)) |
The chi-square distribution appears throughout statistical practice. By mastering these visualization techniques, you transform abstract test statistics into interpretable graphics. Start with base R for quick exploration, then move to ggplot2 when you need polished output for reports or publications. The shaded rejection region technique is particularly valuable—it makes the connection between significance levels, critical values, and p-values visually explicit.