How to Perform a One-Way ANOVA in R
Key Insights
- One-way ANOVA tests whether the means of three or more groups differ significantly, but you must verify normality and homogeneity of variance assumptions before trusting the results.
- A significant ANOVA result only tells you that at least one group differs—you need post-hoc tests like Tukey’s HSD to identify which specific groups differ from each other.
- When assumptions are violated, use Welch’s ANOVA for unequal variances or the Kruskal-Wallis test for non-normal data instead of forcing a standard ANOVA.
Introduction to One-Way ANOVA
One-way ANOVA (Analysis of Variance) answers a simple question: do the means of three or more independent groups differ significantly? You could run multiple t-tests, but that inflates your Type I error rate. ANOVA handles this by testing all groups simultaneously.
Use one-way ANOVA when you have:
- One continuous dependent variable (e.g., crop yield, test scores, reaction time)
- One categorical independent variable with three or more levels (e.g., fertilizer type, teaching method, drug dosage)
- Independent observations across groups
Consider a practical example: an agricultural researcher tests three fertilizer types on wheat yield. They apply Fertilizer A, B, or C to different plots and measure yield in bushels per acre. One-way ANOVA determines whether fertilizer type significantly affects yield.
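This scenario can be sketched with simulated data (the yield numbers below are made up purely for illustration, not real agricultural measurements):

```r
# Simulated fertilizer example: 10 plots per fertilizer, made-up yields
set.seed(42)
yield <- c(rnorm(10, mean = 55, sd = 5),  # Fertilizer A
           rnorm(10, mean = 60, sd = 5),  # Fertilizer B
           rnorm(10, mean = 62, sd = 5))  # Fertilizer C
fertilizer <- factor(rep(c("A", "B", "C"), each = 10))

# One-way ANOVA: does fertilizer type affect mean yield?
summary(aov(yield ~ fertilizer))
```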
Before running ANOVA, understand its assumptions:
- Independence: Observations within and between groups are independent
- Normality: The dependent variable is approximately normally distributed within each group
- Homogeneity of variance: Variance is roughly equal across groups
Violating these assumptions doesn’t automatically invalidate your analysis, but severe violations require alternative approaches.
Preparing Your Data
ANOVA requires data in long format: one column for the response variable and one column for the grouping variable. Each row represents a single observation.
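If your data arrive in wide format instead (one column per group), a quick conversion might look like this; the data frame and column names here are hypothetical:

```r
# Hypothetical wide data: one column per fertilizer type
wide <- data.frame(A = c(55, 60, 58),
                   B = c(62, 65, 63),
                   C = c(70, 68, 72))

# stack() reshapes to long format: one row per observation
long <- stack(wide)
names(long) <- c("yield", "fertilizer")  # rename the default "values"/"ind"
str(long)  # fertilizer is already a factor, as ANOVA requires
```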
R includes the PlantGrowth dataset, which is perfect for demonstrating one-way ANOVA. It contains plant weights under three treatment conditions.
# Load the dataset
data("PlantGrowth")
# Examine the structure
str(PlantGrowth)
'data.frame': 30 obs. of 2 variables:
$ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
$ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
# Summary statistics
summary(PlantGrowth)
weight group
Min. :3.590 ctrl:10
1st Qu.:4.550 trt1:10
Median :5.155 trt2:10
Mean :5.073
3rd Qu.:5.530
Max. :6.310
The dataset has 30 observations: 10 plants in each of three groups (control, treatment 1, treatment 2). This balanced design simplifies interpretation.
For your own data, ensure the grouping variable is a factor:
# If loading from CSV
# df <- read.csv("your_data.csv")
# df$group <- as.factor(df$group)
# Check group means
aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
group weight
1 ctrl 5.032
2 trt1 4.661
3 trt2 5.526
Treatment 2 shows the highest mean weight, but is this difference statistically significant? That’s what ANOVA will tell us.
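Alongside the means, it can help to glance at the per-group spread as an informal preview of the variance check performed formally later:

```r
# Per-group standard deviations: an informal preview of the
# homogeneity-of-variance assumption tested later with Levene's test
group_sds <- aggregate(weight ~ group, data = PlantGrowth, FUN = sd)
group_sds
```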
Checking ANOVA Assumptions
Never skip assumption checking. Violated assumptions can lead to incorrect conclusions.
Testing Normality
The Shapiro-Wilk test checks whether data deviates significantly from a normal distribution. Apply it to each group separately:
# Shapiro-Wilk test by group
by(PlantGrowth$weight, PlantGrowth$group, shapiro.test)
PlantGrowth$group: ctrl
Shapiro-Wilk normality test
data: dd[x, ]
W = 0.95668, p-value = 0.7475
PlantGrowth$group: trt1
Shapiro-Wilk normality test
data: dd[x, ]
W = 0.93041, p-value = 0.4519
PlantGrowth$group: trt2
Shapiro-Wilk normality test
data: dd[x, ]
W = 0.94101, p-value = 0.5643
All p-values exceed 0.05, so we fail to reject the null hypothesis of normality. The data appears normally distributed within each group.
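An alternative, often preferred with small per-group samples, is a single Shapiro-Wilk test on the residuals of the fitted model rather than three separate per-group tests. A sketch of that variant:

```r
# Fit the same model used in the ANOVA section, then test its residuals
fit <- aov(weight ~ group, data = PlantGrowth)
shapiro.test(residuals(fit))  # one pooled normality test on all 30 residuals
```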
Visualize normality with Q-Q plots:
library(ggplot2)
ggplot(PlantGrowth, aes(sample = weight)) +
stat_qq() +
stat_qq_line(color = "red") +
facet_wrap(~group) +
labs(title = "Q-Q Plots by Treatment Group",
x = "Theoretical Quantiles",
y = "Sample Quantiles") +
theme_minimal()
Points should fall approximately along the red line. Deviations at the tails are common with small samples and usually acceptable.
Testing Homogeneity of Variance
Levene’s test checks whether group variances are equal:
library(car)
leveneTest(weight ~ group, data = PlantGrowth)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 1.1192 0.3412
27
The p-value of 0.34 exceeds 0.05, indicating no significant difference in variances. The homogeneity assumption holds.
Alternatively, use Bartlett’s test for normally distributed data:
bartlett.test(weight ~ group, data = PlantGrowth)
Both assumptions are satisfied. We can proceed with standard one-way ANOVA.
Running the One-Way ANOVA
Use the aov() function to fit the ANOVA model:
# Fit the ANOVA model
anova_model <- aov(weight ~ group, data = PlantGrowth)
# View the results
summary(anova_model)
Df Sum Sq Mean Sq F value Pr(>F)
group 2 3.766 1.8832 4.846 0.0159 *
Residuals 27 10.492 0.3886
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpreting the ANOVA Table
- Df (Degrees of Freedom): Group has 2 (number of groups minus 1), Residuals has 27 (total observations minus number of groups)
- Sum Sq: Variability explained by group differences (3.766) versus unexplained variability (10.492)
- Mean Sq: Sum of squares divided by degrees of freedom
- F value: Ratio of between-group variance to within-group variance (4.846)
- Pr(>F): The p-value (0.0159)
The p-value of 0.0159 is less than 0.05, so we reject the null hypothesis. At least one group mean differs significantly from the others.
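The p-value says nothing about how large the effect is. One common way to quantify it is eta-squared, computed directly from the sums of squares in the same table:

```r
# Eta-squared = SS_group / SS_total: the proportion of variance
# in weight explained by group membership
anova_model <- aov(weight ~ group, data = PlantGrowth)  # same model as above
ss <- summary(anova_model)[[1]][["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)
round(eta_sq, 3)  # about 0.264: group explains roughly 26% of the variance
```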
But which groups differ? ANOVA doesn’t tell us. We need post-hoc analysis.
Post-Hoc Analysis
A significant ANOVA result means “something differs somewhere.” Tukey’s Honest Significant Difference (HSD) test performs pairwise comparisons while controlling for multiple testing.
# Tukey's HSD test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = weight ~ group, data = PlantGrowth)
$group
diff lwr upr p adj
trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960
trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
Interpreting Tukey Results
Each row compares two groups:
- diff: Difference in means
- lwr/upr: 95% confidence interval bounds
- p adj: Adjusted p-value
The results reveal:
- trt1 vs ctrl: No significant difference (p = 0.39)
- trt2 vs ctrl: No significant difference (p = 0.20)
- trt2 vs trt1: Significant difference (p = 0.012)
Treatment 2 produces significantly higher plant weights than Treatment 1, but neither treatment differs significantly from the control. This nuanced finding wouldn’t emerge from the overall ANOVA alone.
Visualize the Tukey results:
# Plot confidence intervals
plot(tukey_results, las = 1)
Intervals that don’t cross zero indicate significant differences.
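The significant pairs can also be extracted programmatically, which scales better than reading the table by eye when there are many groups:

```r
# Convert the Tukey output to a data frame and keep adjusted p < 0.05
tukey_tab <- as.data.frame(TukeyHSD(aov(weight ~ group, data = PlantGrowth))$group)
tukey_tab[tukey_tab$`p adj` < 0.05, ]  # only the trt2-trt1 comparison remains
```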
Visualizing Results
Clear visualizations communicate findings better than tables alone.
Boxplot with Individual Points
ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
geom_boxplot(alpha = 0.7, outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.5) +
labs(title = "Plant Weight by Treatment Group",
x = "Treatment",
y = "Weight (g)") +
theme_minimal() +
theme(legend.position = "none") +
scale_fill_brewer(palette = "Set2")
Mean Plot with Error Bars
ggplot(PlantGrowth, aes(x = group, y = weight)) +
stat_summary(fun = mean, geom = "point", size = 3) +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
labs(title = "Mean Plant Weight by Treatment (±SE)",
x = "Treatment",
y = "Weight (g)") +
theme_minimal()
Adding Significance Annotations
The ggpubr package simplifies adding statistical annotations:
library(ggpubr)
ggboxplot(PlantGrowth, x = "group", y = "weight",
fill = "group", palette = "Set2") +
stat_compare_means(method = "anova", label.y = 6.5) +
stat_compare_means(comparisons = list(c("trt1", "trt2")),
method = "t.test", label = "p.signif")
Conclusion and Alternatives
The one-way ANOVA workflow follows these steps:
- Prepare data in long format with a factor grouping variable
- Check normality (Shapiro-Wilk) and homogeneity of variance (Levene’s test)
- Run ANOVA with aov() and examine the F-statistic and p-value
- If significant, perform Tukey's HSD for pairwise comparisons
- Visualize results with boxplots or mean plots
When assumptions fail, use alternatives:
Welch’s ANOVA handles unequal variances:
oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)
Kruskal-Wallis test is the non-parametric alternative for non-normal data:
kruskal.test(weight ~ group, data = PlantGrowth)
Follow significant Kruskal-Wallis results with Dunn’s test for pairwise comparisons:
library(dunn.test)
dunn.test(PlantGrowth$weight, PlantGrowth$group, method = "bonferroni")
One-way ANOVA remains a foundational statistical tool. Master it, understand its limitations, and know when to reach for alternatives. Your analyses will be more robust and your conclusions more defensible.