How to Perform Fisher's Exact Test in Python

Key Insights

  • Fisher’s exact test calculates exact p-values using the hypergeometric distribution, making it the gold standard for analyzing 2x2 contingency tables with small sample sizes where chi-square approximations break down.
  • SciPy’s fisher_exact() function returns both an odds ratio and p-value, but you must understand the difference between one-tailed and two-tailed tests to interpret results correctly.
  • While Fisher’s test is mathematically elegant, it’s limited to 2x2 tables—for larger contingency tables, consider the chi-square test or the Freeman–Halton extension; when you need more power on a 2x2 table, Barnard’s exact test is an option.

Introduction to Fisher’s Exact Test

Fisher’s exact test is a statistical significance test used to determine whether there’s a non-random association between two categorical variables in a 2x2 contingency table. Unlike the chi-square test, which relies on large-sample approximations, Fisher’s test computes exact probabilities directly from the data.

The test was developed by Ronald Fisher in the 1930s and remains essential when dealing with small sample sizes—typically when any cell in your contingency table has an expected frequency below 5. In these situations, the chi-square test’s asymptotic approximation becomes unreliable, and you need the precision that only an exact test can provide.

You’ll encounter Fisher’s exact test in several practical scenarios:

  • A/B testing: When you’re running experiments with limited traffic and need to determine if a conversion rate difference is statistically significant
  • Medical studies: Analyzing treatment outcomes in clinical trials with small patient groups
  • Quality control: Comparing defect rates between manufacturing processes or suppliers
  • Bioinformatics: Gene enrichment analysis and association studies

The key advantage is reliability. When your sample is small, Fisher’s exact test gives you a p-value you can trust.

Understanding the Mathematics

Fisher’s exact test is built on the hypergeometric distribution, which models the probability of drawing a specific number of successes from a finite population without replacement.

Consider a 2x2 contingency table:

            Outcome A   Outcome B   Row Total
Group 1     a           b           a + b
Group 2     c           d           c + d
Col Total   a + c       b + d       n

Given fixed marginal totals (row and column sums), the probability of observing exactly this table configuration is:

$$P = \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{n!a!b!c!d!}$$

The p-value is calculated by summing the probabilities of all tables as extreme or more extreme than the observed table, while keeping the marginals fixed.

For one-tailed tests, you sum probabilities in one direction only (testing if one group is specifically greater or less than the other). For two-tailed tests, you sum probabilities in both directions, which is appropriate when you’re testing for any difference without a directional hypothesis.
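
That table probability is exactly the hypergeometric pmf, so you can sanity-check the formula with scipy.stats.hypergeom before computing anything by hand (the counts a=8, b=2, c=3, d=7 match the clinical-trial example used below):

```python
from scipy.stats import hypergeom

# For a table [[a, b], [c, d]] with n = a+b+c+d, the top-left cell a
# follows hypergeom(M=n, n=a+b, N=a+c): draw the a+c "Outcome A"
# observations from n subjects, a+b of whom are in Group 1.
a, b, c, d = 8, 2, 3, 7
p = hypergeom.pmf(a, M=a + b + c + d, n=a + b, N=a + c)
print(f"P(exactly this table) = {p:.6f}")  # 0.032151
```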

Here’s how you’d manually calculate the probability for a simple table:

from math import factorial
from functools import reduce

def fisher_exact_probability(a, b, c, d):
    """
    Calculate the exact probability of observing this specific 2x2 table
    given the marginal totals, using the hypergeometric distribution.
    """
    numerator = (
        factorial(a + b) * 
        factorial(c + d) * 
        factorial(a + c) * 
        factorial(b + d)
    )
    denominator = (
        factorial(a + b + c + d) * 
        factorial(a) * 
        factorial(b) * 
        factorial(c) * 
        factorial(d)
    )
    return numerator / denominator

# Example: A clinical trial table
# Treatment group: 8 recovered, 2 did not
# Control group: 3 recovered, 7 did not
a, b, c, d = 8, 2, 3, 7
prob = fisher_exact_probability(a, b, c, d)
print(f"Probability of this exact table: {prob:.6f}")
# Output: Probability of this exact table: 0.032151

This gives you the probability of observing exactly this configuration—but for a p-value, you need to sum all equally or more extreme tables.
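
To turn table probabilities into a p-value, you enumerate every table consistent with the fixed margins and sum the probabilities no larger than the observed one. A minimal sketch using only the standard library (the helper name is mine):

```python
from math import comb

def fisher_two_tailed_p(a, b, c, d):
    """Two-tailed p-value: sum the probabilities of every table with
    the same margins whose probability is <= the observed table's."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def table_prob(x):
        # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    observed = table_prob(a)
    lo = max(0, col1 - (n - row1))   # smallest feasible top-left cell
    hi = min(row1, col1)             # largest feasible top-left cell
    # small relative tolerance guards against floating-point ties
    return sum(p for x in range(lo, hi + 1)
               if (p := table_prob(x)) <= observed * (1 + 1e-9))

print(f"{fisher_two_tailed_p(8, 2, 3, 7):.4f}")  # 0.0698
```

For a one-tailed test you would instead sum only the tables in one direction (x >= a, or x <= a).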

Implementing Fisher’s Exact Test with SciPy

In practice, you should use SciPy’s optimized implementation rather than rolling your own. The scipy.stats.fisher_exact() function handles all the complexity for you:

import numpy as np
from scipy import stats

# Create the contingency table as a 2D array
# Rows: Treatment vs Control
# Columns: Recovered vs Not Recovered
table = np.array([
    [8, 2],   # Treatment: 8 recovered, 2 didn't
    [3, 7]    # Control: 3 recovered, 7 didn't
])

# Run Fisher's exact test (two-tailed by default)
odds_ratio, p_value = stats.fisher_exact(table)

print(f"Odds Ratio: {odds_ratio:.3f}")
print(f"P-value: {p_value:.4f}")
# Output:
# Odds Ratio: 9.333
# P-value: 0.0698

Interpreting the results:

  • Odds ratio: Measures the strength of association. An odds ratio of 9.333 means the odds of recovery in the treatment group are about 9.3 times higher than in the control group. An odds ratio of 1 indicates no association.
  • P-value: The probability of observing data as extreme as this (or more extreme) if there were no true association. At p ≈ 0.070, we’d fail to reject the null hypothesis at the α = 0.05 significance level, even though the odds ratio looks large.

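The odds ratio that fisher_exact() reports is the sample odds ratio ad/bc. If you also want a confidence interval, scipy.stats.contingency.odds_ratio (SciPy 1.10+) computes a conditional maximum-likelihood estimate together with an exact interval—note that its point estimate differs slightly from the sample odds ratio:

```python
import numpy as np
from scipy.stats.contingency import odds_ratio

table = np.array([[8, 2], [3, 7]])
a, b, c, d = table.ravel()
print(f"Sample odds ratio: {(a * d) / (b * c):.3f}")  # 9.333

# Conditional MLE estimate with an exact 95% confidence interval
res = odds_ratio(table)
ci = res.confidence_interval(confidence_level=0.95)
print(f"Conditional MLE: {res.statistic:.3f}")
print(f"95% CI: ({ci.low:.3f}, {ci.high:.3f})")
```
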
You can also specify the alternative hypothesis for one-tailed tests:

# One-tailed: Is treatment GREATER than control?
_, p_greater = stats.fisher_exact(table, alternative='greater')

# One-tailed: Is treatment LESS than control?
_, p_less = stats.fisher_exact(table, alternative='less')

print(f"P-value (greater): {p_greater:.4f}")
print(f"P-value (less): {p_less:.4f}")
# Output:
# P-value (greater): 0.0349
# P-value (less): 0.9973
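
Because the test is discrete, both one-tailed p-values include the observed table itself, so they overlap by exactly its probability: p_greater + p_less = 1 + P(observed table). This identity makes a handy sanity check:

```python
import numpy as np
from scipy import stats
from scipy.stats import hypergeom

table = np.array([[8, 2], [3, 7]])
_, p_greater = stats.fisher_exact(table, alternative='greater')
_, p_less = stats.fisher_exact(table, alternative='less')
p_observed = hypergeom.pmf(8, 20, 10, 11)  # probability of the table itself

# Both tails count the observed table, so they overlap by p_observed
print(np.isclose(p_greater + p_less, 1 + p_observed))  # True
```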

Practical Example: A/B Testing Analysis

Let’s work through a realistic A/B testing scenario. You’re testing a new checkout button design and have collected conversion data:

import pandas as pd
from scipy import stats

# Simulated A/B test data
data = {
    'variant': ['control'] * 150 + ['treatment'] * 150,
    'converted': (
        [1] * 12 + [0] * 138 +  # Control: 12 conversions out of 150
        [1] * 22 + [0] * 128     # Treatment: 22 conversions out of 150
    )
}
df = pd.DataFrame(data)

# Create contingency table using pandas crosstab
contingency_table = pd.crosstab(
    df['variant'], 
    df['converted'],
    margins=True
)
print("Contingency Table:")
print(contingency_table)
print()

# Extract the 2x2 table (excluding margins)
table_array = contingency_table.iloc[0:2, 0:2].values

# Run Fisher's exact test
odds_ratio, p_value = stats.fisher_exact(table_array)

# Calculate conversion rates for context
control_rate = df[df['variant'] == 'control']['converted'].mean()
treatment_rate = df[df['variant'] == 'treatment']['converted'].mean()
relative_lift = (treatment_rate - control_rate) / control_rate * 100

print(f"Control conversion rate: {control_rate:.2%}")
print(f"Treatment conversion rate: {treatment_rate:.2%}")
print(f"Relative lift: {relative_lift:.1f}%")
print(f"\nFisher's Exact Test Results:")
print(f"  Odds Ratio: {odds_ratio:.3f}")
print(f"  P-value: {p_value:.4f}")
print(f"  Significant at α=0.05: {p_value < 0.05}")

Output:

Contingency Table:
converted    0   1  All
variant                 
control    138  12  150
treatment  128  22  150
All        266  34  300

Control conversion rate: 8.00%
Treatment conversion rate: 14.67%
Relative lift: 83.3%

Fisher's Exact Test Results:
  Odds Ratio: 1.977
  P-value: 0.0726
  Significant at α=0.05: False

Despite an 83% relative lift, the result isn’t statistically significant at α = 0.05. This is the value of Fisher’s exact test—it prevents you from drawing false conclusions from small samples.
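
The flip side of this caution is statistical power: with only 150 users per group, even a real effect of this size will often fail to reach significance. A rough normal-approximation estimate (the standard two-proportion sample-size formula; the helper name is mine) suggests how much traffic you’d actually need:

```python
import math
from scipy import stats

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate users per group needed to detect p1 vs p2 with a
    two-sided test (normal approximation to the two-proportion test)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

print(n_per_group(0.08, 0.1467))  # roughly 350+ users per group
```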

Handling Edge Cases and Limitations

Fisher’s exact test has important limitations you should understand before applying it.

Tables larger than 2x2: Fisher’s test in SciPy only handles 2x2 tables. For larger tables, consider the chi-square test or the Freeman–Halton extension of Fisher’s test. Barnard’s exact test is also limited to 2x2 tables, but offers more power:

import numpy as np
from scipy.stats import barnard_exact

# Barnard's test is more powerful than Fisher's for 2x2 tables
# but is computationally more expensive
table = np.array([[8, 2], [3, 7]])
result = barnard_exact(table)
print(f"Barnard's exact test p-value: {result.pvalue:.4f}")

Input validation: Always validate your data before running the test:

def validate_contingency_table(table):
    """
    Validate a 2x2 contingency table before running Fisher's exact test.
    Returns (is_valid, message).
    """
    table = np.asarray(table)
    
    # Check dimensions
    if table.shape != (2, 2):
        return False, f"Table must be 2x2, got {table.shape}"
    
    # Check for negative values
    if np.any(table < 0):
        return False, "Table cannot contain negative values"
    
    # Check for non-integer values
    if not np.allclose(table, table.astype(int)):
        return False, "Table should contain integer counts"
    
    # Warn about zero cells
    if np.any(table == 0):
        return True, "Warning: Table contains zero cells (test still valid)"
    
    # Check if sample is large enough that chi-square might be preferred
    n = table.sum()
    if n > 1000:
        return True, "Note: Large sample size—chi-square test may be faster"
    
    return True, "Table is valid for Fisher's exact test"

# Test the validator
test_table = np.array([[8, 2], [3, 7]])
is_valid, message = validate_contingency_table(test_table)
print(f"Valid: {is_valid}, Message: {message}")

Visualizing and Reporting Results

Clear visualization helps communicate your findings to stakeholders:

import matplotlib.pyplot as plt
import seaborn as sns

def visualize_fisher_test(table, group_labels, outcome_labels, 
                          odds_ratio, p_value, alpha=0.05):
    """
    Create a visualization of Fisher's exact test results.
    """
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Heatmap of the contingency table
    ax1 = axes[0]
    sns.heatmap(
        table, 
        annot=True, 
        fmt='d', 
        cmap='Blues',
        xticklabels=outcome_labels,
        yticklabels=group_labels,
        ax=ax1,
        cbar_kws={'label': 'Count'}
    )
    ax1.set_title('Contingency Table', fontsize=12, fontweight='bold')
    ax1.set_xlabel('Outcome')
    ax1.set_ylabel('Group')
    
    # Results summary panel
    ax2 = axes[1]
    ax2.axis('off')
    
    significance = "Significant" if p_value < alpha else "Not Significant"
    sig_color = "green" if p_value < alpha else "red"
    
    results_text = (
        f"Fisher's Exact Test Results\n"
        f"{'=' * 30}\n\n"
        f"Odds Ratio: {odds_ratio:.3f}\n\n"
        f"P-value: {p_value:.4f}\n\n"
        f"Result at α={alpha}: {significance}"
    )
    
    ax2.text(0.1, 0.5, results_text, fontsize=14, 
             verticalalignment='center', fontfamily='monospace',
             bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
    
    plt.tight_layout()
    plt.savefig('fisher_test_results.png', dpi=150, bbox_inches='tight')
    plt.show()

# Generate visualization
table = np.array([[8, 2], [3, 7]])
odds_ratio, p_value = stats.fisher_exact(table)

visualize_fisher_test(
    table,
    group_labels=['Treatment', 'Control'],
    outcome_labels=['Recovered', 'Not Recovered'],
    odds_ratio=odds_ratio,
    p_value=p_value
)

Conclusion

Fisher’s exact test is an essential tool when you need reliable statistical inference from small samples. Here’s your decision checklist:

Use Fisher’s exact test when:

  • You have a 2x2 contingency table
  • Any expected cell frequency is below 5
  • Sample size is small (typically n < 1000)
  • You need exact p-values rather than approximations

Consider alternatives when:

  • Your table is larger than 2x2 (use chi-square or Freeman-Halton)
  • You need more statistical power (consider Barnard’s exact test)
  • Sample size is very large (chi-square is faster and equally accurate)
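
One way to operationalize this checklist is to inspect the expected cell frequencies and dispatch accordingly (a sketch: the “expected frequency below 5” cutoff is a rule of thumb, and the function name is mine):

```python
import numpy as np
from scipy import stats

def choose_test(table):
    """Use Fisher's exact test for a 2x2 table when any expected cell
    frequency falls below 5; otherwise fall back to chi-square."""
    table = np.asarray(table)
    expected = stats.contingency.expected_freq(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = stats.fisher_exact(table)
        return "fisher", p
    _, p, _, _ = stats.chi2_contingency(table)
    return "chi-square", p

print(choose_test([[8, 2], [3, 7]]))        # small counts -> fisher
print(choose_test([[138, 12], [128, 22]]))  # larger counts -> chi-square
```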

The SciPy implementation makes the test trivial to run—the harder part is knowing when to apply it and how to interpret the results correctly. Focus on understanding your hypothesis (one-tailed vs. two-tailed), validate your input data, and always report both the odds ratio and p-value to give your audience the complete picture.
