How to Perform the Shapiro-Wilk Test in R
Key Insights
- The Shapiro-Wilk test is the most powerful normality test for small to medium samples (n < 5000), but should always be paired with visual inspection since statistical significance doesn’t equal practical significance.
- A non-significant p-value (> 0.05) means you fail to reject normality—it doesn’t prove your data is normal, only that there’s insufficient evidence against normality.
- For large samples, the test becomes overly sensitive and will reject normality for trivial deviations; rely more heavily on Q-Q plots and consider whether your analysis is robust to minor departures from normality.
Introduction to the Shapiro-Wilk Test
Many statistical methods—t-tests, ANOVA, linear regression—assume your data follows a normal distribution. Violate this assumption badly enough, and your p-values become unreliable. The Shapiro-Wilk test gives you a formal way to check.
Developed in 1965 by Samuel Shapiro and Martin Wilk, this test compares your sample’s distribution against a theoretical normal distribution. It’s particularly good at detecting departures from normality in small samples, making it the go-to choice when you have fewer than 5,000 observations.
Use the Shapiro-Wilk test when:
- You need to verify normality assumptions before running parametric tests
- Your sample size is between 3 and 5,000 observations
- You want a formal statistical test to complement visual inspection
Don’t use it as your only diagnostic. A histogram that looks like a camel’s back tells you more than a p-value ever will.
Understanding the Hypothesis Framework
The Shapiro-Wilk test uses a straightforward hypothesis structure:
- Null hypothesis (H₀): The data comes from a normally distributed population
- Alternative hypothesis (H₁): The data does not come from a normally distributed population
The test produces two key outputs:
The W statistic ranges from 0 to 1. Values close to 1 indicate the data closely matches a normal distribution; lower values signal departures from normality. Roughly speaking, W behaves like the squared correlation between your ordered data and the values you would expect from a normal distribution.
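That correlation intuition can be made concrete with a short sketch. This is an approximation, not the exact Shapiro-Wilk computation (which uses specially derived weights), but it shows why normal-looking data yields W near 1:

```r
# Rough sketch of the idea behind W (the real test uses optimal weights,
# so this is an approximation, not the exact statistic)
set.seed(1)
x <- rnorm(50)
theoretical <- qnorm(ppoints(length(x)))  # expected standard-normal quantiles
approx_W <- cor(sort(x), theoretical)^2
approx_W                   # near 1 for normal-looking data
shapiro.test(x)$statistic  # the exact W, for comparison
```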
The p-value is the probability of obtaining a W statistic at least as extreme as yours if the data truly came from a normal distribution. The interpretation follows standard hypothesis testing logic:
- p > 0.05: Fail to reject H₀. No significant evidence against normality.
- p ≤ 0.05: Reject H₀. Significant evidence that data is not normal.
The 0.05 threshold is conventional, not sacred. In exploratory analysis, you might accept more risk. For critical applications, you might demand stricter evidence.
Basic Implementation with shapiro.test()
R’s built-in shapiro.test() function makes normality testing trivial. Here’s the simplest case:
# Generate some normally distributed data
set.seed(42)
normal_data <- rnorm(100, mean = 50, sd = 10)
# Run the Shapiro-Wilk test
shapiro.test(normal_data)
Output:
Shapiro-Wilk normality test
data: normal_data
W = 0.99404, p-value = 0.9408
With W = 0.994 and p = 0.94, we have no evidence against normality. Exactly what we’d expect from rnorm().
Now let’s test non-normal data:
# Exponentially distributed data (definitely not normal)
skewed_data <- rexp(100, rate = 0.5)
shapiro.test(skewed_data)
Output:
Shapiro-Wilk normality test
data: skewed_data
W = 0.87234, p-value = 1.234e-07
The low W (0.872) and tiny p-value confirm what we already know: exponential data isn’t normal.
When working with dataframes, extract the column you need:
# Testing a column from a dataframe
data(mtcars)
shapiro.test(mtcars$mpg)
The output is an object of class htest. You can extract components programmatically:
result <- shapiro.test(mtcars$mpg)
result$statistic # W value
result$p.value # p-value
result$method # Test name
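Those components make it easy to assemble compact summaries. For example (the report string here is just illustrative):

```r
# Build a one-line report from the htest components
result <- shapiro.test(mtcars$mpg)
report <- sprintf("Shapiro-Wilk: W = %.4f, p = %.4f -> %s",
                  result$statistic, result$p.value,
                  ifelse(result$p.value > 0.05,
                         "no evidence against normality",
                         "normality rejected"))
report
```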
Visualizing Normality Alongside the Test
Never trust a single diagnostic. Combine the Shapiro-Wilk test with visual inspection for reliable conclusions.
Histogram with normal curve overlay:
# Create histogram with normal curve
hist_with_normal <- function(x, main = "Histogram with Normal Curve") {
  hist(x, breaks = "Sturges", col = "lightblue",
       main = main, xlab = "Value", freq = FALSE)
  # Overlay normal curve
  x_range <- seq(min(x), max(x), length.out = 100)
  y_normal <- dnorm(x_range, mean = mean(x), sd = sd(x))
  lines(x_range, y_normal, col = "red", lwd = 2)
  # Add Shapiro-Wilk result to plot
  sw_test <- shapiro.test(x)
  legend("topright",
         legend = paste("W =", round(sw_test$statistic, 4),
                        "\np =", round(sw_test$p.value, 4)),
         bty = "n")
}
hist_with_normal(mtcars$mpg)
Q-Q plot for normality assessment:
The Q-Q plot is your most powerful visual tool. Points falling along the diagonal line indicate normality; systematic deviations reveal the type of non-normality.
# Q-Q plot with reference line
qq_plot_with_test <- function(x, main = "Normal Q-Q Plot") {
  qqnorm(x, main = main, pch = 19, col = "steelblue")
  qqline(x, col = "red", lwd = 2)
  # Add test results
  sw_test <- shapiro.test(x)
  mtext(paste("Shapiro-Wilk: W =", round(sw_test$statistic, 4),
              ", p =", round(sw_test$p.value, 4)),
        side = 3, line = 0.5, cex = 0.8)
}
par(mfrow = c(1, 2))
qq_plot_with_test(normal_data, "Normal Data")
qq_plot_with_test(skewed_data, "Skewed Data")
par(mfrow = c(1, 1))
Interpretation patterns:
- Left end below the line, right end above it (S-shape): heavy tails (leptokurtic)
- Left end above the line, right end below it (inverted S): light tails (platykurtic)
- Points bowing above the line at both ends (concave-up curve): right skew
- Points bowing below the line at both ends (concave-down curve): left skew
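You can generate each pattern yourself to calibrate your eye (the distributions below are illustrative choices, not the only ones that produce these shapes):

```r
# Simulate the four classic Q-Q patterns
set.seed(7)
examples <- list(
  "Heavy tails (t, df = 3)"           = rt(200, df = 3),
  "Light tails (uniform)"             = runif(200),
  "Right skew (exponential)"          = rexp(200),
  "Left skew (reflected exponential)" = -rexp(200)
)
par(mfrow = c(2, 2))
for (nm in names(examples)) {
  qqnorm(examples[[nm]], main = nm, pch = 19, col = "steelblue")
  qqline(examples[[nm]], col = "red", lwd = 2)
}
par(mfrow = c(1, 1))
```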
Testing Multiple Groups or Variables
Real-world analysis often requires testing normality across groups or multiple variables simultaneously.
Using tapply() for grouped data:
# Test normality of mpg for each cylinder group
tapply(mtcars$mpg, mtcars$cyl, function(x) {
  test <- shapiro.test(x)
  c(W = unname(test$statistic), p.value = test$p.value)  # unname() avoids a "W.W" label
})
Using dplyr for a cleaner workflow:
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    W = shapiro.test(mpg)$statistic,
    p_value = shapiro.test(mpg)$p.value,
    normal = ifelse(p_value > 0.05, "Yes", "No")
  )
Testing multiple columns:
# Test all numeric columns in a dataframe
test_all_columns <- function(df) {
  numeric_cols <- sapply(df, is.numeric)
  results <- lapply(df[, numeric_cols], function(col) {
    col <- col[!is.na(col)]  # count only complete cases against the size limits
    if (length(col) >= 3 && length(col) <= 5000) {
      test <- shapiro.test(col)
      c(W = unname(test$statistic), p.value = test$p.value)
    } else {
      c(W = NA, p.value = NA)
    }
  })
  do.call(rbind, results)
}
test_all_columns(mtcars)
Common Pitfalls and Best Practices
The large sample problem: With large samples, the Shapiro-Wilk test detects trivial departures from normality that have no practical significance. A dataset of 10,000 observations might produce p < 0.001 despite being “normal enough” for any reasonable analysis.
# Demonstrating oversensitivity with large samples
set.seed(123)
# Truly normal data is rejected only about 5% of the time at any n.
# The real problem is data with trivial imperfections, such as rounding:
large_sample <- round(rnorm(5000), 1)  # normal values rounded to 1 decimal
shapiro.test(large_sample)  # Ties from rounding alone can push p below 0.05
For n > 1000, weight visual inspection more heavily than the p-value.
Handling missing values:
shapiro.test() has no na.rm argument. It silently drops NA values internally (only complete cases are used), which can shrink your effective sample size without warning. Removing NAs explicitly keeps that step visible:
# Remove NAs before testing
data_with_na <- c(1, 2, NA, 4, 5, 6, 7, 8, 9, 10)
shapiro.test(na.omit(data_with_na))
# Or use complete.cases for dataframes
clean_data <- mtcars[complete.cases(mtcars$mpg), ]
shapiro.test(clean_data$mpg)
When normality fails—your options:
- Transform the data: Log, square root, or Box-Cox transformations often normalize skewed data
- Use non-parametric alternatives: Wilcoxon test instead of t-test, Kruskal-Wallis instead of ANOVA
- Proceed anyway: Many tests are robust to moderate violations, especially with larger samples
- Bootstrap: Resampling methods don’t require normality assumptions
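As a quick sketch of the first option: lognormal data fails the test badly, but its logarithm is exactly normal, so the transformed data should pass (the data is simulated here, so exact p-values depend on the seed):

```r
# Sketch: a log transform rescuing right-skewed data
set.seed(99)
skewed <- rlnorm(100, meanlog = 0, sdlog = 1)  # lognormal: strongly right-skewed
shapiro.test(skewed)$p.value       # tiny: normality clearly rejected
shapiro.test(log(skewed))$p.value  # log of lognormal data is exactly normal
```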
Practical Example: Real-World Workflow
Here’s a complete workflow for checking regression assumptions:
# Fit a linear model
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
# Extract residuals
residuals <- resid(model)
# Step 1: Visual inspection
par(mfrow = c(1, 2))
# Histogram
hist(residuals, breaks = 10, col = "lightblue",
     main = "Residual Distribution", xlab = "Residuals", freq = FALSE)
curve(dnorm(x, mean = mean(residuals), sd = sd(residuals)),
      add = TRUE, col = "red", lwd = 2)
# Q-Q plot
qqnorm(residuals, pch = 19, col = "steelblue")
qqline(residuals, col = "red", lwd = 2)
par(mfrow = c(1, 1))
# Step 2: Formal test
sw_result <- shapiro.test(residuals)
print(sw_result)
# Step 3: Decision logic
if (sw_result$p.value > 0.05) {
  message("✓ Residuals appear normally distributed (p = ",
          round(sw_result$p.value, 4), ")")
  message("  Proceed with standard inference")
} else {
  message("✗ Evidence against normality (p = ",
          round(sw_result$p.value, 4), ")")
  message("  Consider: robust standard errors, transformation, or bootstrapping")
}
Decision tree based on results:
- p > 0.10 and Q-Q plot looks good: Proceed confidently with parametric methods
- 0.05 < p < 0.10: Borderline—check Q-Q plot carefully, consider sample size
- p < 0.05 but Q-Q plot shows minor deviations: For robust methods (like regression with n > 30), often acceptable
- p < 0.05 with obvious visual non-normality: Transform data or switch to non-parametric methods
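One way to operationalize this tree is a small helper function. Note that assess_normality is a hypothetical convenience wrapper, and its thresholds mirror the conventions above, not universal rules:

```r
# Hypothetical helper encoding the decision tree above
assess_normality <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p > 0.10) {
    "Proceed with parametric methods (still glance at the Q-Q plot)"
  } else if (p > alpha) {
    "Borderline: inspect the Q-Q plot carefully and consider sample size"
  } else {
    "Evidence against normality: transform, go non-parametric, or bootstrap"
  }
}

set.seed(42)
assess_normality(rnorm(100))  # normal draw: the 'proceed' branch
assess_normality(rexp(100))   # exponential draw: the 'evidence against' branch
```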
The Shapiro-Wilk test is a tool, not an oracle. Use it alongside visual diagnostics, consider your sample size, and remember that practical significance matters more than statistical significance. A slightly non-normal distribution rarely invalidates an otherwise sound analysis.