How to Calculate P-Values in R


Key Insights

  • P-values in R are consistently extracted from test objects using the $p.value accessor, making it straightforward to programmatically work with hypothesis test results across different statistical functions.
  • Manual p-value calculation using distribution functions like pt(), pnorm(), and pchisq() gives you precise control and deeper understanding of what these values actually represent.
  • Multiple testing correction is non-negotiable when running many tests—R’s p.adjust() function handles Bonferroni, FDR, and other corrections with a single line of code.

Introduction to P-Values

A p-value answers a specific question: if the null hypothesis were true, what’s the probability of observing data at least as extreme as what we actually observed? It’s not the probability that the null hypothesis is true, and it’s not the probability that your results occurred by chance. This distinction matters.

In practice, you compare your p-value against a predetermined significance level, typically α = 0.05. If p < α, you reject the null hypothesis. This threshold is arbitrary but conventional—there’s nothing magical about 0.05. Some fields use 0.01 or 0.001 for stricter standards.
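The comparison itself is mechanical once you have the p-value. A minimal sketch, using a made-up p-value for illustration:

```r
# Decision rule: reject the null when p < alpha
alpha <- 0.05
p <- 0.032  # hypothetical p-value, just for illustration

if (p < alpha) {
  cat("Reject the null hypothesis at alpha =", alpha, "\n")
} else {
  cat("Fail to reject the null hypothesis at alpha =", alpha, "\n")
}
```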

R makes p-value calculation straightforward. Every statistical test function returns an object containing the p-value, and you can extract it programmatically. Let’s work through the major scenarios.

P-Values from T-Tests

The t-test is your workhorse for comparing means. R’s t.test() function handles one-sample, two-sample, and paired designs.

# Generate sample data
set.seed(42)
group_a <- rnorm(30, mean = 100, sd = 15)
group_b <- rnorm(30, mean = 110, sd = 15)

# Two-sample t-test (independent groups)
result <- t.test(group_a, group_b)
print(result)

# Extract just the p-value
p_value <- result$p.value
cat("P-value:", p_value, "\n")

# One-sample t-test (testing against a known mean)
one_sample_result <- t.test(group_a, mu = 95)
cat("One-sample p-value:", one_sample_result$p.value, "\n")

# Paired t-test (before/after measurements)
before <- c(200, 190, 210, 180, 195, 205, 215, 185, 200, 190)
after <- c(185, 175, 195, 170, 180, 190, 200, 170, 185, 175)
paired_result <- t.test(before, after, paired = TRUE)
cat("Paired t-test p-value:", paired_result$p.value, "\n")

The $p.value accessor works consistently across R’s statistical functions. Store the test result in a variable, then pull out what you need. This pattern scales well when you’re running multiple tests programmatically.
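Here is one way that pattern might look in practice. The sketch below runs the same one-sample test over several simulated datasets (the datasets and the mu value are invented for this example) and collects every p-value in one pass:

```r
set.seed(42)

# Five simulated datasets, stored in a list
datasets <- replicate(5, rnorm(30, mean = 100, sd = 15), simplify = FALSE)

# Run t.test() on each element and extract $p.value from each result
p_values <- sapply(datasets, function(x) t.test(x, mu = 95)$p.value)
print(round(p_values, 4))
```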

For the two-sample test, you can also use the formula interface when your data is in a data frame:

# Formula interface with data frame
df <- data.frame(
  value = c(group_a, group_b),
  group = rep(c("A", "B"), each = 30)
)
result_formula <- t.test(value ~ group, data = df)
cat("P-value (formula):", result_formula$p.value, "\n")

P-Values from Chi-Square Tests

Chi-square tests work with categorical data. Use them for testing independence between two categorical variables or for goodness-of-fit against expected proportions.

# Create a contingency table
# Example: Treatment vs. Outcome
observed <- matrix(c(45, 15, 25, 35), nrow = 2, byrow = TRUE)
rownames(observed) <- c("Treatment", "Control")
colnames(observed) <- c("Improved", "No Change")
print(observed)

# Chi-square test of independence
chi_result <- chisq.test(observed)
print(chi_result)
cat("Chi-square p-value:", chi_result$p.value, "\n")

# Goodness-of-fit test
# Testing if observed frequencies match expected proportions
dice_rolls <- c(18, 22, 16, 20, 24, 20)  # Observed counts for each face
expected_prob <- rep(1/6, 6)  # Fair die expectation

gof_result <- chisq.test(dice_rolls, p = expected_prob)
cat("Goodness-of-fit p-value:", gof_result$p.value, "\n")

Watch out for small expected cell counts. When any expected frequency falls below 5, the chi-square approximation becomes unreliable. Use Fisher’s exact test instead:

# Fisher's exact test for small samples
small_table <- matrix(c(3, 1, 1, 4), nrow = 2)
fisher_result <- fisher.test(small_table)
cat("Fisher's exact p-value:", fisher_result$p.value, "\n")

P-Values from Correlation Tests

Testing whether a correlation is statistically significant requires cor.test(), not just cor(). The basic cor() function only computes the correlation coefficient without any inference.

# Sample data
set.seed(123)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50, sd = 0.5)  # Correlated with x

# Pearson correlation test (default)
pearson_result <- cor.test(x, y)
cat("Pearson r:", pearson_result$estimate, "\n")
cat("P-value:", pearson_result$p.value, "\n")

# Spearman rank correlation (for monotonic relationships)
spearman_result <- cor.test(x, y, method = "spearman")
cat("Spearman rho:", spearman_result$estimate, "\n")
cat("P-value:", spearman_result$p.value, "\n")

# Kendall's tau (robust to outliers)
kendall_result <- cor.test(x, y, method = "kendall")
cat("Kendall tau:", kendall_result$estimate, "\n")
cat("P-value:", kendall_result$p.value, "\n")

Choose Pearson for linear relationships between normally distributed variables. Use Spearman or Kendall when you have ordinal data, non-linear monotonic relationships, or outliers that would distort Pearson’s coefficient.
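A quick way to see why this choice matters: the sketch below (with invented data) plants a single extreme point and compares the two coefficients.

```r
set.seed(1)
x <- 1:20
y <- x + rnorm(20, sd = 1)   # near-perfect linear relationship

# Add one extreme point that breaks the pattern
x_out <- c(x, 100)
y_out <- c(y, -50)

cat("Pearson r, clean data:      ", round(cor(x, y), 3), "\n")
cat("Pearson r, with outlier:    ", round(cor(x_out, y_out), 3), "\n")
cat("Spearman rho, with outlier: ", round(cor(x_out, y_out, method = "spearman"), 3), "\n")
```

Pearson flips sign because the outlier dominates the covariance; Spearman only sees one misplaced rank and stays positive.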

P-Values from Regression Models

Regression models in R don’t give you p-values directly from the model object. You need summary() to access the coefficient table with standard errors, t-values, and p-values.

# Linear regression
set.seed(456)
n <- 100
predictor1 <- rnorm(n)
predictor2 <- rnorm(n)
outcome <- 2 + 1.5 * predictor1 - 0.8 * predictor2 + rnorm(n)

model <- lm(outcome ~ predictor1 + predictor2)
model_summary <- summary(model)

# View full coefficient table
print(model_summary$coefficients)

# Extract p-values for all coefficients
p_values <- model_summary$coefficients[, "Pr(>|t|)"]
print(p_values)

# Get p-value for a specific predictor
p_predictor1 <- model_summary$coefficients["predictor1", "Pr(>|t|)"]
cat("P-value for predictor1:", p_predictor1, "\n")

For logistic regression, the process is similar but the output uses z-values instead of t-values:

# Logistic regression
binary_outcome <- ifelse(outcome > median(outcome), 1, 0)
logit_model <- glm(binary_outcome ~ predictor1 + predictor2, 
                   family = binomial)
logit_summary <- summary(logit_model)

# Extract p-values
logit_p_values <- logit_summary$coefficients[, "Pr(>|z|)"]
print(logit_p_values)

Manual P-Value Calculation from Test Statistics

Sometimes you need to calculate p-values from raw test statistics. This happens when you’re implementing custom tests, working with published statistics, or want to understand the mechanics.

R provides distribution functions for this purpose. The pattern is consistent: use the cumulative distribution function (CDF) and work with the tails.

# Calculate p-value from a z-score (standard normal)
z_score <- 2.3

# Two-tailed p-value
p_two_tailed <- 2 * pnorm(-abs(z_score))
cat("Two-tailed p-value for z =", z_score, ":", p_two_tailed, "\n")

# One-tailed p-value (upper tail)
p_upper <- 1 - pnorm(z_score)
cat("Upper-tailed p-value:", p_upper, "\n")

# Calculate p-value from a t-statistic
t_stat <- 2.5
df <- 25

# Two-tailed
p_t_two <- 2 * pt(-abs(t_stat), df = df)
cat("Two-tailed p-value for t =", t_stat, "with df =", df, ":", p_t_two, "\n")

# Calculate p-value from chi-square statistic
chi_sq_stat <- 8.5
df_chi <- 3

# Chi-square is always one-tailed (upper)
p_chi <- 1 - pchisq(chi_sq_stat, df = df_chi)
cat("P-value for chi-square =", chi_sq_stat, ":", p_chi, "\n")

# F-statistic (ANOVA)
f_stat <- 4.2
df1 <- 2
df2 <- 45

p_f <- 1 - pf(f_stat, df1 = df1, df2 = df2)
cat("P-value for F =", f_stat, ":", p_f, "\n")

The key insight: pnorm(), pt(), pchisq(), and pf() give you the probability of values less than or equal to your statistic. For upper-tail tests, subtract from 1. For two-tailed tests, double the smaller tail probability.
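One refinement worth knowing: for statistics far out in the tail, subtracting from 1 can underflow to exactly 0, because the CDF rounds to 1 in floating point. The lower.tail = FALSE argument computes the upper-tail probability directly and keeps full precision:

```r
z <- 10

# Subtraction loses all precision this far out: pnorm(10) rounds to 1
naive <- 1 - pnorm(z)

# Computing the upper tail directly keeps full precision
direct <- pnorm(z, lower.tail = FALSE)

cat("1 - pnorm(10):                ", naive, "\n")
cat("pnorm(10, lower.tail = FALSE):", direct, "\n")
```

The same argument works for pt(), pchisq(), and pf().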

Best Practices and Common Pitfalls

Running multiple statistical tests inflates your false positive rate. If you test 20 hypotheses at α = 0.05 and every null is true, you expect one false positive on average. Correction methods adjust p-values to control this inflation.
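That inflation follows directly from the math: with m independent tests at level α, the chance of at least one false positive is 1 − (1 − α)^m. A quick check for 20 tests:

```r
alpha <- 0.05
m <- 20

# Probability of at least one false positive across m independent tests
fwer <- 1 - (1 - alpha)^m
cat("Family-wise error rate:", round(fwer, 3), "\n")  # about 0.64

# Expected number of false positives when every null is true
cat("Expected false positives:", m * alpha, "\n")  # 1
```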

# Simulate multiple test p-values
set.seed(789)
raw_p_values <- c(0.001, 0.015, 0.030, 0.045, 0.060, 0.120, 0.250)

# Bonferroni correction (most conservative)
bonferroni_adjusted <- p.adjust(raw_p_values, method = "bonferroni")

# Benjamini-Hochberg FDR correction (less conservative)
fdr_adjusted <- p.adjust(raw_p_values, method = "BH")

# Holm's method (step-down Bonferroni)
holm_adjusted <- p.adjust(raw_p_values, method = "holm")

# Compare results
comparison <- data.frame(
  raw = raw_p_values,
  bonferroni = bonferroni_adjusted,
  holm = holm_adjusted,
  fdr = fdr_adjusted
)
print(comparison)

# Which remain significant at 0.05?
cat("\nSignificant after Bonferroni:", sum(bonferroni_adjusted < 0.05), "\n")
cat("Significant after FDR:", sum(fdr_adjusted < 0.05), "\n")

Use Bonferroni when false positives are costly. Use FDR (Benjamini-Hochberg) when you’re doing exploratory analysis and can tolerate some false discoveries. Holm’s method offers a middle ground with more power than Bonferroni while still controlling the family-wise error rate.

Common mistakes to avoid:

  1. Treating p > 0.05 as “no effect.” Absence of evidence isn’t evidence of absence. A non-significant result might just mean insufficient sample size.

  2. Ignoring effect sizes. A tiny, meaningless effect can be statistically significant with enough data. Always report effect sizes alongside p-values.

  3. P-hacking. Don’t run tests until you find significance. Preregister your analysis plan or be transparent about exploratory analyses.

  4. Misinterpreting the value. P = 0.03 doesn’t mean there’s a 3% chance the null is true. It means if the null were true, you’d see data this extreme 3% of the time.

Report p-values with appropriate precision. Use exact values (p = 0.023) rather than inequalities (p < 0.05) when possible. For very small p-values, scientific notation works (p = 2.3e-06) or report as p < 0.001.
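Base R's format.pval() handles this formatting for you; its eps argument sets the floor below which values print as an inequality instead of an exact number (the p-values here are invented):

```r
p_vals <- c(0.023, 0.0000023, 0.4512)

# Values below eps are printed as "<0.001" instead of an exact number
formatted <- format.pval(p_vals, digits = 2, eps = 0.001)
print(formatted)
```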
