How to Calculate AIC and BIC in R

Key Insights

  • AIC and BIC both balance model fit against complexity, but AIC favors predictive accuracy while BIC tends to select simpler, more parsimonious models—especially with larger sample sizes.
  • R provides built-in AIC() and BIC() functions that work with most fitted model objects, making comparison straightforward once you understand what the numbers mean.
  • Never compare AIC or BIC values across models fitted to different datasets; the criteria are only meaningful for ranking models estimated on identical observations.

Introduction to Model Selection Criteria

Every statistical model involves a fundamental trade-off: more parameters improve fit to your training data but risk overfitting. Add enough predictors to a regression, and you can perfectly interpolate your observations while learning nothing generalizable.

AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) formalize this trade-off. Both penalize model complexity, but they do so differently and answer subtly different questions. AIC asks: “Which model will predict new data best?” BIC asks: “Which model is most likely to be the true data-generating process?”

Understanding when to use each—and how to calculate them correctly in R—separates rigorous model selection from arbitrary choices.

The Math Behind AIC and BIC

Both criteria start with the log-likelihood of the fitted model, then subtract a penalty based on the number of parameters.

AIC formula: $$AIC = -2 \cdot \ln(L) + 2k$$

BIC formula: $$BIC = -2 \cdot \ln(L) + k \cdot \ln(n)$$

Where:

  • $L$ is the maximized likelihood of the model
  • $k$ is the number of estimated parameters
  • $n$ is the sample size

The key difference is the penalty term. AIC charges a fixed 2 per parameter, while BIC's per-parameter penalty is $\ln(n)$. That penalty exceeds AIC's once $n \geq 8$ (since $\ln 8 \approx 2.08 > 2$), and the gap widens as data accumulate.
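You can see the crossover directly in R by tabulating the per-parameter penalty of each criterion:

```r
# Compare the per-parameter penalties: AIC is a constant 2, BIC is log(n)
n <- c(5, 7, 8, 20, 100, 1000)
penalties <- data.frame(n = n, aic = 2, bic = round(log(n), 2))
penalties
# BIC's penalty first exceeds AIC's at n = 8, since log(8) is about 2.08
```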

When to prefer each:

  • Use AIC when your goal is prediction. It’s asymptotically equivalent to leave-one-out cross-validation.
  • Use BIC when you believe a “true” model exists and want to identify it. BIC is consistent—given enough data, it selects the correct model with probability approaching 1.

Here’s how to calculate both manually:

# Manual AIC/BIC calculation
manual_ic <- function(model) {
  log_lik <- logLik(model)
  k <- attr(log_lik, "df")  # number of parameters
  n <- nobs(model)          # sample size
  
  aic <- -2 * as.numeric(log_lik) + 2 * k
  bic <- -2 * as.numeric(log_lik) + k * log(n)
  
  list(
    log_likelihood = as.numeric(log_lik),
    parameters = k,
    n = n,
    AIC = aic,
    BIC = bic
  )
}

# Example with simulated data
set.seed(42)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
model <- lm(y ~ x)

manual_ic(model)

This produces values identical to R’s built-in functions (within floating-point precision), confirming your understanding of the underlying math.
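As a sanity check, here is the same comparison in code, refitting the simulated model so the snippet stands alone (all.equal guards against floating-point noise):

```r
# Confirm the manual formulas match stats::AIC and stats::BIC
set.seed(42)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
model <- lm(y ~ x)

ll <- logLik(model)
k  <- attr(ll, "df")   # 3 parameters: intercept, slope, residual variance
n  <- nobs(model)

all.equal(-2 * as.numeric(ll) + 2 * k,      AIC(model))  # TRUE
all.equal(-2 * as.numeric(ll) + k * log(n), BIC(model))  # TRUE
```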

Using Built-in R Functions

R’s stats package provides AIC() and BIC() functions that work with any model object implementing a logLik() method. This covers lm, glm, arima, nls, and models from most major packages.

# Fit a linear model
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Extract AIC and BIC
AIC(model)
BIC(model)

# Get both with log-likelihood details
logLik(model)

For a single model, these functions return scalar values. Lower is better for both metrics—you’re minimizing information loss.

You can also pass multiple models to compare them directly:

# Fit competing models
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
model3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Compare all at once
AIC(model1, model2, model3)
BIC(model1, model2, model3)

This returns a data frame with degrees of freedom and criterion values, sorted by model order (not by criterion value—you’ll need to sort manually for ranking).
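To get a ranked table, capture the returned data frame and order it by the criterion (the three models are refit here so the snippet stands alone):

```r
# Rank models by AIC rather than by the order they were passed in
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
model3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

tab <- AIC(model1, model2, model3)
tab[order(tab$AIC), ]  # best (lowest AIC) model first
```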

Comparing Multiple Models

When comparing models, absolute AIC/BIC values are meaningless. Only differences matter. A common approach is to compute delta values relative to the best model:

# Systematic model comparison
compare_models <- function(...) {
  models <- list(...)
  names(models) <- paste0("Model_", seq_along(models))
  
  # Extract criteria
  aic_vals <- sapply(models, AIC)
  bic_vals <- sapply(models, BIC)
  
  # Calculate deltas
  delta_aic <- aic_vals - min(aic_vals)
  delta_bic <- bic_vals - min(bic_vals)
  
  # Akaike weights (probability interpretation)
  aic_weights <- exp(-0.5 * delta_aic)
  aic_weights <- aic_weights / sum(aic_weights)
  
  data.frame(
    AIC = round(aic_vals, 2),
    Delta_AIC = round(delta_aic, 2),
    AIC_Weight = round(aic_weights, 3),
    BIC = round(bic_vals, 2),
    Delta_BIC = round(delta_bic, 2)
  )
}

# Apply to our models
compare_models(model1, model2, model3)

Interpreting delta values:

  • Delta < 2: Substantial support; models are essentially equivalent
  • Delta 2-7: Considerably less support
  • Delta > 10: Essentially no support; safely exclude from consideration

Akaike weights provide a probability interpretation—the weight represents the probability that a given model is the best approximating model among those considered.
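Akaike weights also give evidence ratios: dividing one model's weight by another's says how much more support the first has. A minimal sketch, using hypothetical AIC values for three models:

```r
# Evidence ratio of the best model over the runner-up, from Akaike weights
# (the AIC values below are hypothetical, for illustration only)
aic_vals <- c(166.0, 156.7, 157.1)
delta    <- aic_vals - min(aic_vals)
weights  <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

ranked <- order(weights, decreasing = TRUE)
weights[ranked[1]] / weights[ranked[2]]  # ratio > 1 favors the best model
```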

# Full example with mtcars
models <- list(
  "wt only" = lm(mpg ~ wt, data = mtcars),
  "wt + hp" = lm(mpg ~ wt + hp, data = mtcars),
  "wt + hp + qsec" = lm(mpg ~ wt + hp + qsec, data = mtcars),
  "wt + hp + qsec + am" = lm(mpg ~ wt + hp + qsec + am, data = mtcars),
  "full model" = lm(mpg ~ ., data = mtcars)
)

# Create comparison table
comparison <- data.frame(
  Model = names(models),
  AIC = sapply(models, AIC),
  BIC = sapply(models, BIC),
  R_squared = sapply(models, function(m) summary(m)$r.squared)
)

comparison$Delta_AIC <- comparison$AIC - min(comparison$AIC)
comparison$Delta_BIC <- comparison$BIC - min(comparison$BIC)

# Sort by AIC
comparison[order(comparison$AIC), ]

Notice how AIC and BIC can disagree. AIC might favor a more complex model while BIC prefers parsimony. This isn’t a bug—it reflects their different objectives.

AIC/BIC with Different Model Types

The same functions work across model types, making comparison straightforward.

Logistic Regression (GLM)

# Binary outcome example
mtcars$am_binary <- as.factor(mtcars$am)

glm1 <- glm(am_binary ~ wt, data = mtcars, family = binomial)
glm2 <- glm(am_binary ~ wt + hp, data = mtcars, family = binomial)
glm3 <- glm(am_binary ~ wt + hp + qsec, data = mtcars, family = binomial)

AIC(glm1, glm2, glm3)
BIC(glm1, glm2, glm3)

Time Series (ARIMA)

# ARIMA model comparison
data(AirPassengers)
log_passengers <- log(AirPassengers)

arima1 <- arima(log_passengers, order = c(1, 1, 1), 
                seasonal = list(order = c(1, 1, 1), period = 12))
arima2 <- arima(log_passengers, order = c(2, 1, 1), 
                seasonal = list(order = c(1, 1, 1), period = 12))
arima3 <- arima(log_passengers, order = c(1, 1, 2), 
                seasonal = list(order = c(1, 1, 1), period = 12))

AIC(arima1, arima2, arima3)
BIC(arima1, arima2, arima3)

Mixed Effects Models

library(lme4)

# Using sleepstudy data
data(sleepstudy)

mixed1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
mixed2 <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)

AIC(mixed1, mixed2)
BIC(mixed1, mixed2)

For mixed models, be aware that lmer() uses REML estimation by default, and REML likelihoods are not comparable between models with different fixed effects. Refit with REML = FALSE (maximum likelihood) before comparing such models.
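A sketch of such a comparison, refitting by maximum likelihood (assumes lme4 is installed; the intercept-only baseline is ours for illustration):

```r
library(lme4)
data(sleepstudy)

# ML fits (REML = FALSE) are required to compare different fixed effects
ml1 <- lmer(Reaction ~ 1 + (1 | Subject),    data = sleepstudy, REML = FALSE)
ml2 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

AIC(ml1, ml2)
BIC(ml1, ml2)
```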

Common Pitfalls and Best Practices

Pitfall 1: Comparing models on different data

This is the most common error. If you subset data differently or handle missing values inconsistently, your comparisons are invalid.

# WRONG: Different sample sizes
model_a <- lm(mpg ~ wt, data = mtcars)
model_b <- lm(mpg ~ wt + qsec, data = na.omit(mtcars[, c("mpg", "wt", "qsec")]))

# RIGHT: Ensure identical observations
complete_data <- na.omit(mtcars[, c("mpg", "wt", "qsec")])
model_a <- lm(mpg ~ wt, data = complete_data)
model_b <- lm(mpg ~ wt + qsec, data = complete_data)

Pitfall 2: Ignoring sample size for AIC vs BIC choice

With small samples, the two criteria often rank models similarly. As n grows, BIC becomes increasingly conservative; for very large datasets it may select overly simple models that sacrifice predictive power.
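For small samples, the corrected criterion AICc adds an extra penalty that vanishes as n grows. Base R has no built-in for it, but it follows directly from the AIC formula (the aicc helper below is our sketch, not a stats function):

```r
# Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc <- function(model) {
  k <- attr(logLik(model), "df")
  n <- nobs(model)
  AIC(model) + (2 * k * (k + 1)) / (n - k - 1)
}

m <- lm(mpg ~ wt + hp, data = mtcars)
c(AIC = AIC(m), AICc = aicc(m))  # AICc is slightly larger at n = 32
```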

Pitfall 3: Treating information criteria as hypothesis tests

AIC and BIC don’t provide p-values or confidence intervals. A model with lower AIC isn’t “significantly better”—it’s simply preferred given the trade-off encoded in the criterion.

Pitfall 4: Comparing non-nested models carelessly

While AIC/BIC can compare non-nested models (unlike likelihood ratio tests), ensure the models are answering the same question. Comparing a linear regression to a logistic regression is meaningless.
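The same trap arises within linear regression when the response is transformed: the likelihoods live on different scales, so the AIC values are not comparable:

```r
# WRONG: mpg and log(mpg) are different response variables, so the
# likelihoods (and hence the AICs) are on different scales
m_raw <- lm(mpg ~ wt, data = mtcars)
m_log <- lm(log(mpg) ~ wt, data = mtcars)

AIC(m_raw)
AIC(m_log)  # far smaller, but NOT evidence that the log model is better
```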

Best practices:

  • Always report both AIC and BIC when they disagree
  • Use delta values and weights, not raw scores
  • Consider domain knowledge alongside statistical criteria
  • For prediction tasks, validate with held-out data regardless of AIC results

Conclusion

AIC and BIC provide principled approaches to model selection that balance fit and complexity. Use AIC when prediction is your goal; use BIC when you’re trying to identify a true underlying model structure.

| Task | Function | Notes |
|------|----------|-------|
| Extract AIC | AIC(model) | Works with most fitted objects |
| Extract BIC | BIC(model) | Equivalent to AIC(model, k = log(nobs(model))) |
| Compare models | AIC(m1, m2, m3) | Returns a data frame |
| Get log-likelihood | logLik(model) | Foundation for manual calculation |
| Count parameters | attr(logLik(model), "df") | Includes intercept and variance |

The functions are simple. The hard part is ensuring your comparisons are valid and interpreting the results appropriately. Start with identical datasets, compute delta values, and let domain expertise guide your final choice when the numbers are close.
