Fisher's Exact Test in R: Step-by-Step Guide

Key Insights

  • Fisher’s exact test is the gold standard for analyzing 2x2 contingency tables when sample sizes are small or when any expected cell count falls below 5, making it essential for early-stage A/B tests and medical studies.
  • Unlike chi-square’s approximation, Fisher’s test calculates exact p-values using the hypergeometric distribution, giving you reliable results even with sparse data.
  • R’s built-in fisher.test() function handles everything from basic 2x2 tables to larger contingency tables, returning p-values, odds ratios, and confidence intervals in a single call.

Introduction to Fisher’s Exact Test

Fisher’s exact test solves a specific problem: determining whether two categorical variables are associated when your sample size is too small for chi-square approximations to be reliable. Developed by Ronald Fisher in the 1930s, it calculates the exact probability of observing your data (or more extreme data) under the null hypothesis of independence.

Use Fisher’s exact test when:

  • Your contingency table has expected cell counts below 5
  • Your total sample size is small (as a rough guide, a few hundred observations or fewer)
  • You need exact p-values rather than approximations
  • You’re analyzing 2x2 tables (though R extends this to larger tables)

The chi-square test approximates the sampling distribution, which breaks down with small samples. Fisher’s test doesn’t approximate—it enumerates all possible tables with the same marginal totals and calculates exact probabilities.

In practice, you’ll encounter this test constantly in A/B testing (especially early experiments with limited traffic), clinical trials with rare outcomes, and quality control scenarios where defects are infrequent.
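To see this difference in practice, here is a quick sketch comparing both tests on the same sparse table (the counts are chosen arbitrarily for illustration):

```r
# A sparse 2x2 table where chi-square's approximation is shaky
sparse <- matrix(c(8, 3, 2, 7), nrow = 2)

# chisq.test() warns here that the approximation may be incorrect
# (an expected count falls below 5)
chi <- suppressWarnings(chisq.test(sparse))
fis <- fisher.test(sparse)

round(c(chi_square = chi$p.value, fisher_exact = fis$p.value), 4)
# chi_square fisher_exact
#     0.0722       0.0698
```

The p-values differ because chi-square only approximates the sampling distribution that Fisher's test computes exactly.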

Understanding the Mathematics

Fisher’s exact test relies on the hypergeometric distribution. Given a 2x2 table with fixed row and column totals, the probability of observing exactly the values in your table follows:

$$P = \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{n!a!b!c!d!}$$

Where a, b, c, d are the four cell counts and n is the total sample size.

The p-value sums probabilities of all tables as extreme or more extreme than the observed table. For a two-tailed test, “extreme” means tables with probabilities less than or equal to your observed table’s probability.

Here’s how to calculate this manually in R:

# Manual Fisher's exact probability calculation
# Note: factorial() overflows for arguments much above 170; for larger
# tables, use lchoose() or dhyper() instead of raw factorials
fisher_probability <- function(a, b, c, d) {
  n <- a + b + c + d
  numerator <- factorial(a + b) * factorial(c + d) * 
               factorial(a + c) * factorial(b + d)
  denominator <- factorial(n) * factorial(a) * factorial(b) * 
                 factorial(c) * factorial(d)
  return(numerator / denominator)
}

# Example: Treatment vs. Control outcome table
# Treatment: 8 success, 2 failure
# Control: 3 success, 7 failure
observed_prob <- fisher_probability(8, 2, 3, 7)
print(paste("Probability of observed table:", round(observed_prob, 6)))
# [1] "Probability of observed table: 0.032151"

For one-tailed tests, you specify a direction: is the association positive or negative? Two-tailed tests (the default) test for any association regardless of direction.
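Both tail definitions can be verified with R's built-in hypergeometric functions. The sketch below reproduces the p-values for the treatment/control table above using dhyper(), where x is the number of successes in the Treatment group, m the total successes (11), n the total failures (9), and k the Treatment group size (10):

```r
# All possible Treatment-success counts given the fixed margins
a_values <- 1:10
probs <- dhyper(a_values, m = 11, n = 9, k = 10)

# Probability of the observed table (a = 8)
p_obs <- dhyper(8, 11, 9, 10)

# One-tailed (greater): tables at least as extreme in one direction
p_one <- sum(dhyper(8:10, 11, 9, 10))

# Two-tailed: all tables with probability <= the observed table's,
# using a small relative tolerance as fisher.test() does internally
p_two <- sum(probs[probs <= p_obs * (1 + 1e-7)])

round(c(observed = p_obs, one_tailed = p_one, two_tailed = p_two), 4)
#  observed one_tailed two_tailed
#    0.0322     0.0349     0.0698
```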

Preparing Your Data in R

R expects contingency tables as matrices. You can create them from summary counts or raw data.

From summary counts:

# Method 1: Direct matrix creation
# Rows: Treatment group (Treatment, Control)
# Columns: Outcome (Success, Failure)
contingency_table <- matrix(
  c(8, 3, 2, 7),  # Fill by column: (8,3) then (2,7)
  nrow = 2,
  dimnames = list(
    Group = c("Treatment", "Control"),
    Outcome = c("Success", "Failure")
  )
)

print(contingency_table)
#           Outcome
# Group      Success Failure
#   Treatment       8       2
#   Control         3       7

From raw data:

# Method 2: From raw observations
raw_data <- data.frame(
  group = c(rep("Treatment", 10), rep("Control", 10)),
  outcome = c(
    rep("Success", 8), rep("Failure", 2),  # Treatment outcomes
    rep("Success", 3), rep("Failure", 7)   # Control outcomes
  )
)

# Create contingency table using table()
contingency_from_raw <- table(raw_data$group, raw_data$outcome)

# Reorder if needed (table() sorts alphabetically)
contingency_from_raw <- contingency_from_raw[c("Treatment", "Control"), 
                                              c("Success", "Failure")]
print(contingency_from_raw)

Critical requirement: your table must contain counts, not proportions or percentages. Each cell represents the number of observations falling into that combination of categories.
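A convenience worth knowing: fisher.test() also accepts two factor vectors directly and cross-tabulates them internally, so the explicit matrix step is optional. A minimal sketch, using hypothetical raw data with the same counts as above:

```r
# fisher.test() builds the contingency table itself from two factors
raw <- data.frame(
  group   = rep(c("Treatment", "Control"), each = 10),
  outcome = c(rep("Success", 8), rep("Failure", 2),   # Treatment
              rep("Success", 3), rep("Failure", 7))   # Control
)

result_raw <- fisher.test(factor(raw$group), factor(raw$outcome))
result_raw$p.value  # same p-value as the explicit-matrix version
```

Row and column order affect the direction of the reported odds ratio but not the two-sided p-value.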

Running Fisher’s Exact Test

The fisher.test() function is straightforward:

# Basic Fisher's exact test
result <- fisher.test(contingency_table)
print(result)

# Output:
#   Fisher's Exact Test for Count Data
# 
# data:  contingency_table
# p-value = 0.06978
# alternative hypothesis: true odds ratio is not equal to 1
# 95 percent confidence interval:
#   0.8641831 113.2113943
# sample estimates:
# odds ratio 
#   8.369477

Key parameters you should know:

# One-tailed test: Treatment is BETTER than Control
result_greater <- fisher.test(
  contingency_table, 
  alternative = "greater"  # Tests if odds ratio > 1
)
print(paste("One-tailed p-value:", round(result_greater$p.value, 4)))
# [1] "One-tailed p-value: 0.0349"

# Custom confidence level
result_99 <- fisher.test(
  contingency_table,
  conf.level = 0.99  # 99% confidence interval
)
print(result_99$conf.int)
# [1]   0.5765174 188.7688851
# attr(,"conf.level")
# [1] 0.99

# Disable confidence interval calculation (slightly faster)
result_no_ci <- fisher.test(contingency_table, conf.int = FALSE)

Interpreting the output:

  • p-value: Probability of observing this association (or stronger) if variables were independent
  • odds ratio: The ratio of the odds of success in Treatment to the odds in Control (8.37 here); note this compares odds, not probabilities
  • confidence interval: Range of plausible odds ratios (wide intervals indicate uncertainty)

Practical Example: A/B Testing Scenario

Let’s walk through a complete analysis. You’re testing a new checkout button design. After one week, you have limited data:

# A/B test data: New button design vs. Original
# Conversions tracked over 7 days with limited traffic

ab_test_data <- matrix(
  c(23, 12, 77, 88),  # New: 23 convert, 77 don't; Original: 12 convert, 88 don't
  nrow = 2,
  dimnames = list(
    Variant = c("New_Button", "Original"),
    Action = c("Converted", "Not_Converted")
  )
)

print(ab_test_data)
#             Action
# Variant      Converted Not_Converted
#   New_Button        23            77
#   Original          12            88

# Check expected counts (chi-square assumption check)
expected <- chisq.test(ab_test_data)$expected
print(expected)
#             Action
# Variant      Converted Not_Converted
#   New_Button      17.5          82.5
#   Original        17.5          82.5

# All expected counts exceed 5, so chi-square would also be valid here,
# but with only 200 observations the exact test costs nothing extra

# Run Fisher's exact test
ab_result <- fisher.test(ab_test_data, alternative = "greater")

# Comprehensive output
cat("A/B Test Results: New Button vs. Original\n")
cat("==========================================\n")
cat(sprintf("New Button Conversion Rate: %.1f%%\n", 100 * 23/100))
cat(sprintf("Original Conversion Rate: %.1f%%\n", 100 * 12/100))
cat(sprintf("Odds Ratio: %.2f\n", ab_result$estimate))
cat(sprintf("95%% CI: [%.2f, %.2f]\n", 
            ab_result$conf.int[1], ab_result$conf.int[2]))
cat(sprintf("P-value (one-tailed): %.4f\n", ab_result$p.value))
cat(sprintf("Significant at α=0.05: %s\n", 
            ifelse(ab_result$p.value < 0.05, "YES", "NO")))

Output:

A/B Test Results: New Button vs. Original
==========================================
New Button Conversion Rate: 23.0%
Original Conversion Rate: 12.0%
Odds Ratio: 2.19
95% CI: [0.99, Inf]
P-value (one-tailed): 0.0427
Significant at α=0.05: YES

Business conclusion: The new button shows a statistically significant improvement (p = 0.043). The odds of converting are roughly 2.2 times higher with the new design. However, the lower confidence bound (0.99) sits essentially at 1, so the effect size remains uncertain; continue the test to narrow the interval.
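How much more data would that take? A rough sketch using the built-in power.prop.test() (a normal-approximation power calculation, not an exact one, and it optimistically treats the observed rates as the true rates):

```r
# Rough sample size to detect 12% vs. 23% conversion with 80% power
pp <- power.prop.test(p1 = 0.12, p2 = 0.23,
                      power = 0.80, sig.level = 0.05)
ceiling(pp$n)  # users needed PER VARIANT - well above the 100 collected
```

Treat the result as a planning estimate: if the true effect is smaller than observed, the required sample grows quickly.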

Extensions and Alternatives

Fisher’s test extends to larger tables:

# 2x3 table: Three treatment groups
treatment_comparison <- matrix(
  c(15, 8, 12, 5, 12, 8),
  nrow = 2,
  dimnames = list(
    Outcome = c("Success", "Failure"),
    Treatment = c("Drug_A", "Drug_B", "Placebo")
  )
)

# Fisher's test works on r x c tables
result_2x3 <- fisher.test(treatment_comparison)
print(result_2x3)
# Note: No single odds ratio for tables larger than 2x2
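If the overall r x c test is significant, a common follow-up (sketched here; not something fisher.test() does for you) is pairwise 2x2 tests against the control column, with a multiplicity correction:

```r
# Pairwise 2x2 follow-ups: each drug vs. Placebo, Holm-corrected
treatment_comparison <- matrix(
  c(15, 8, 12, 5, 12, 8), nrow = 2,
  dimnames = list(
    Outcome = c("Success", "Failure"),
    Treatment = c("Drug_A", "Drug_B", "Placebo")
  )
)

# Slice out each drug's column alongside Placebo and test the 2x2 table
pairwise_p <- sapply(c("Drug_A", "Drug_B"), function(drug) {
  fisher.test(treatment_comparison[, c(drug, "Placebo")])$p.value
})

p.adjust(pairwise_p, method = "holm")  # adjusted for the two comparisons
```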

For 2x2 tables with very small samples, Barnard’s test can be more powerful:

# Install if needed: install.packages("Barnard")
library(Barnard)

# Barnard's test - unconditional exact test
barnard_result <- barnard.test(8, 2, 3, 7)
print(barnard_result)
# Barnard's test doesn't condition on margins, sometimes giving more power

When running multiple comparisons, apply corrections:

# Multiple A/B tests - adjust p-values
p_values <- c(0.043, 0.12, 0.008, 0.51)
adjusted_p <- p.adjust(p_values, method = "BH")  # Benjamini-Hochberg
print(data.frame(original = p_values, adjusted = adjusted_p))

Summary and Best Practices

Decision checklist for Fisher’s exact test:

  1. Is your outcome binary (or categorical with few levels)?
  2. Is your predictor binary (or categorical with few levels)?
  3. Do you have small samples OR expected cell counts below 5?
  4. Do you need exact (not approximate) p-values?

If you answered yes to these, use Fisher’s exact test.

Common pitfalls:

  • Using proportions instead of counts in your table
  • Forgetting to specify alternative for directional hypotheses
  • Ignoring the confidence interval width when interpreting significance
  • Not applying multiple testing corrections when running many tests

Quick reference:

# Standard Fisher's exact test workflow (a, b, c, d are cell counts)
# Note: matrix() fills column by column, so c(a, b, c, d) puts a and b
# in the first COLUMN; add byrow = TRUE to enter the table row by row
data_matrix <- matrix(c(a, b, c, d), nrow = 2)
result <- fisher.test(data_matrix, alternative = "two.sided")
print(result$p.value)      # Statistical significance
print(result$estimate)     # Effect size (odds ratio)
print(result$conf.int)     # Uncertainty range

Fisher’s exact test remains the reliable choice when chi-square assumptions fail. In R, it’s a single function call that gives you everything needed for sound statistical inference.
