How to Calculate AUC-ROC in R

Key Insights

  • AUC-ROC measures a binary classifier’s ability to distinguish between classes across all possible thresholds, where 0.5 indicates random guessing and 1.0 perfect separation
  • The pROC package provides the most straightforward implementation in R with roc() and auc() functions, while ROCR and caret offer alternative approaches with different visualization capabilities
  • AUC-ROC works best for balanced datasets; for imbalanced data, consider precision-recall curves instead, and always validate optimal thresholds using coords() rather than defaulting to 0.5

Introduction to AUC-ROC

The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) is one of the most widely used metrics for evaluating binary classification models. Unlike accuracy, which depends on a single threshold, AUC-ROC evaluates model performance across all possible classification thresholds, making it particularly valuable when you need a threshold-independent metric.

The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. TPR, also called sensitivity or recall, measures the proportion of actual positives correctly identified: TPR = TP / (TP + FN). FPR measures the proportion of actual negatives incorrectly classified as positive: FPR = FP / (FP + TN).
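
To make those formulas concrete, here is a minimal base-R sketch that computes both rates from illustrative confusion-matrix counts (the numbers are made up):

```r
# Illustrative confusion-matrix counts (not from any real model)
TP <- 80; FN <- 20   # actual positives
FP <- 30; TN <- 70   # actual negatives

tpr <- TP / (TP + FN)  # sensitivity / recall
fpr <- FP / (FP + TN)  # 1 - specificity

tpr  # 0.8
fpr  # 0.3
```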

The AUC represents the probability that your model ranks a random positive example higher than a random negative example. An AUC of 0.5 indicates random guessing, while 1.0 represents perfect classification. In practice, AUC values above 0.8 are considered good, and above 0.9 are excellent, though this varies by domain.
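
That ranking interpretation can be checked by brute force: over all positive-negative pairs, count how often the positive example receives the higher score, with ties counting as half. A small sketch with made-up scores:

```r
# Made-up predicted scores and true labels for illustration
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,   0)

pos <- scores[labels == 1]
neg <- scores[labels == 0]

# Fraction of positive-negative pairs ranked correctly (ties count 0.5)
pairs <- outer(pos, neg, function(p, q) (p > q) + 0.5 * (p == q))
auc_manual <- mean(pairs)
auc_manual  # 8/9, about 0.889
```

This pairwise count is the Mann-Whitney U statistic rescaled to [0, 1], which is exactly why AUC is threshold-independent.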

Setting Up Your Environment

Start by installing the necessary packages. The pROC package is specifically designed for ROC analysis, while ROCR and caret offer broader functionality with ROC capabilities built in.

# Install required packages
install.packages(c("pROC", "ROCR", "caret", "randomForest"))

# Load libraries
library(pROC)
library(ROCR)
library(caret)
library(randomForest)

For demonstration purposes, let’s create a medical diagnosis dataset simulating a scenario where we predict disease presence based on biomarkers:

# Set seed for reproducibility
set.seed(123)

# Create sample dataset
n <- 1000
data <- data.frame(
  biomarker1 = rnorm(n, mean = 100, sd = 15),
  biomarker2 = rnorm(n, mean = 50, sd = 10),
  age = sample(25:75, n, replace = TRUE),
  disease = sample(0:1, n, replace = TRUE, prob = c(0.7, 0.3))
)

# Add signal to make prediction possible
data$biomarker1 <- data$biomarker1 + data$disease * 20
data$biomarker2 <- data$biomarker2 + data$disease * 15

# Split into training and test sets
train_idx <- sample(1:n, 0.7 * n)
train_data <- data[train_idx, ]
test_data <- data[-train_idx, ]

Building a Simple Classification Model

Before calculating AUC-ROC, you need a model that produces predicted probabilities. Logistic regression is a natural choice for binary classification and typically produces reasonably well-calibrated probabilities.

# Train logistic regression model
log_model <- glm(disease ~ biomarker1 + biomarker2 + age, 
                 data = train_data, 
                 family = binomial(link = "logit"))

# Generate predicted probabilities on test set
test_probs <- predict(log_model, newdata = test_data, type = "response")

# View summary
summary(log_model)

The type = "response" argument is critical—it returns probabilities rather than log-odds. These probabilities range from 0 to 1 and represent the model’s confidence that each observation belongs to the positive class.
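
To see the difference between the two scales, here is a quick check on a built-in dataset (mtcars is used purely as a stand-in): predict() on the default "link" scale returns log-odds, and plogis(), the inverse-logit, maps them back to the probabilities returned by type = "response".

```r
# A tiny logistic model on built-in data, just to compare prediction scales
fit <- glm(am ~ mpg, data = mtcars, family = binomial)

log_odds <- predict(fit, type = "link")      # default scale: log-odds
probs    <- predict(fit, type = "response")  # probability scale

# plogis() is the inverse-logit, so the two scales must agree
all.equal(plogis(log_odds), probs)  # TRUE
```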

Let’s also train a random forest for comparison:

# Train random forest model
rf_model <- randomForest(as.factor(disease) ~ biomarker1 + biomarker2 + age,
                         data = train_data,
                         ntree = 500)

# Generate predicted probabilities (second column is positive class)
rf_probs <- predict(rf_model, newdata = test_data, type = "prob")[, 2]

Calculating AUC-ROC with pROC Package

The pROC package offers the most intuitive interface for ROC analysis in R. The roc() function computes the ROC curve, and auc() extracts the area under it.

# Calculate ROC curve for logistic regression
roc_log <- roc(test_data$disease, test_probs)

# Extract AUC
auc_log <- auc(roc_log)
print(paste("Logistic Regression AUC:", round(auc_log, 3)))

# Plot ROC curve
plot(roc_log, 
     main = "ROC Curve - Logistic Regression",
     col = "blue", 
     lwd = 2,
     print.auc = TRUE,
     auc.polygon = TRUE,
     auc.polygon.col = "lightblue")

# Add diagonal reference line
abline(a = 0, b = 1, lty = 2, col = "gray")

The print.auc = TRUE argument automatically displays the AUC value on the plot, while auc.polygon = TRUE shades the area under the curve for visual emphasis. The diagonal line represents random guessing (AUC = 0.5).

You can also calculate confidence intervals for the AUC:

# Calculate 95% confidence interval
ci_auc <- ci.auc(roc_log)
print(ci_auc)

Alternative Methods (ROCR and caret)

The ROCR package provides more granular control over performance metrics and visualization options:

# Create prediction object
pred_obj <- prediction(test_probs, test_data$disease)

# Calculate performance metrics
perf <- performance(pred_obj, measure = "tpr", x.measure = "fpr")

# Extract AUC
auc_rocr <- performance(pred_obj, measure = "auc")
auc_value <- auc_rocr@y.values[[1]]
print(paste("ROCR AUC:", round(auc_value, 3)))

# Plot ROC curve
plot(perf, 
     main = "ROC Curve - ROCR Package",
     col = "darkgreen",
     lwd = 2)
abline(a = 0, b = 1, lty = 2, col = "gray")
text(0.6, 0.2, paste("AUC =", round(auc_value, 3)), cex = 1.2)

The caret package integrates AUC calculation into its model training workflow:

# Set up cross-validation with AUC as metric
train_control <- trainControl(
  method = "cv",
  number = 5,
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  savePredictions = TRUE
)

# Prepare data with factor levels (required by caret)
train_caret <- train_data
train_caret$disease <- factor(train_caret$disease, 
                               levels = c(0, 1), 
                               labels = c("No", "Yes"))

# Train model with caret
caret_model <- train(disease ~ biomarker1 + biomarker2 + age,
                     data = train_caret,
                     method = "glm",
                     family = binomial,
                     trControl = train_control,
                     metric = "ROC")

print(caret_model$results)

Interpreting AUC Values

Understanding AUC values requires context. Here’s a practical interpretation framework:

  • 0.5: Random guessing—your model has no discriminative ability
  • 0.5-0.7: Poor performance—barely better than random
  • 0.7-0.8: Acceptable performance—useful for some applications
  • 0.8-0.9: Good performance—suitable for most practical applications
  • 0.9-1.0: Excellent performance—exceptional discrimination

Compare multiple models systematically:

# Calculate ROC for random forest
roc_rf <- roc(test_data$disease, rf_probs)
auc_rf <- auc(roc_rf)

# Compare models
comparison <- data.frame(
  Model = c("Logistic Regression", "Random Forest"),
  AUC = c(auc_log, auc_rf)
)
print(comparison)

# Statistical test for difference
roc_test <- roc.test(roc_log, roc_rf)
print(roc_test)

Finding the optimal classification threshold is crucial for deployment:

# Find optimal threshold using Youden's J statistic
# (recent pROC versions return a data frame, so extract the threshold column)
best <- coords(roc_log, "best", ret = "threshold", best.method = "youden")
optimal_threshold <- best$threshold
print(paste("Optimal threshold:", round(optimal_threshold, 3)))

# Get sensitivity and specificity at the optimal threshold
optimal_coords <- coords(roc_log, optimal_threshold, input = "threshold",
                         ret = c("threshold", "sensitivity", "specificity"))
print(optimal_coords)

# You can also optimize for specific sensitivity
high_sens <- coords(roc_log, x = 0.9, input = "sensitivity", 
                    ret = c("threshold", "specificity"))
print(high_sens)

Youden’s J statistic maximizes (sensitivity + specificity - 1), providing a balanced threshold. In practice, however, you should choose thresholds based on domain requirements: medical diagnosis often prioritizes sensitivity to avoid missing cases, while fraud detection might prioritize specificity to reduce false alarms.
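
For intuition, the selection that best.method = "youden" performs can be reimplemented in a few lines of base R. This hypothetical youden_threshold() helper (not part of pROC) scans every observed score as a candidate cutoff and keeps the one maximizing J:

```r
# Illustrative base-R reimplementation of the Youden threshold search
youden_threshold <- function(labels, probs) {
  thresholds <- sort(unique(probs))
  j <- vapply(thresholds, function(t) {
    pred <- as.integer(probs >= t)
    sens <- sum(pred == 1 & labels == 1) / sum(labels == 1)
    spec <- sum(pred == 0 & labels == 0) / sum(labels == 0)
    sens + spec - 1                      # Youden's J at this cutoff
  }, numeric(1))
  thresholds[which.max(j)]
}

# Tiny made-up example
labels <- c(0, 0, 0, 1, 1, 1)
probs  <- c(0.1, 0.3, 0.55, 0.4, 0.7, 0.9)
youden_threshold(labels, probs)  # 0.4
```

Note that pROC interpolates between observed scores, so its "best" threshold can differ slightly from this discrete scan.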

Conclusion

AUC-ROC is an essential metric for binary classification, but it’s not without limitations. It treats false positives and false negatives equally, which may not reflect real-world costs. For highly imbalanced datasets where the positive class is rare, precision-recall curves and average precision scores often provide better insights.

When calculating AUC-ROC in R, pROC is your best starting point for its simplicity and comprehensive functionality. Use ROCR when you need detailed performance analysis across multiple metrics, and leverage caret when integrating ROC analysis into a broader machine learning pipeline with cross-validation.

Always validate your models on held-out test data, never on training data. Report confidence intervals alongside AUC values, and remember that a high AUC doesn’t guarantee a useful model: consider calibration, interpretability, and real-world deployment constraints. Finally, choose classification thresholds based on your specific use case rather than defaulting to 0.5, since that default cutoff rarely aligns with business objectives.
