How to Implement LDA in R

Key Insights

  • LDA simultaneously reduces dimensionality and performs classification by finding linear combinations of features that best separate classes, making it more efficient than PCA for supervised learning tasks
  • The algorithm assumes normally distributed features with equal covariance matrices across classes—violations of these assumptions should prompt you to consider Quadratic Discriminant Analysis (QDA) instead
  • Cross-validation is essential for LDA model evaluation since the technique can overfit on small datasets, and proper preprocessing like feature scaling significantly impacts classification accuracy

Introduction to Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) serves dual purposes: dimensionality reduction and classification. Unlike Principal Component Analysis (PCA), which maximizes variance without considering class labels, LDA maximizes the separation between multiple classes. This makes LDA particularly valuable when you need both interpretable features and accurate predictions.

Use LDA when you have labeled data and want to reduce dimensions while preserving class separability. Common applications include customer segmentation based on purchasing behavior, medical diagnosis from patient metrics, and face recognition systems. LDA works best with 2-10 classes and when your features follow approximately normal distributions within each class.

Choose PCA over LDA when you lack class labels or need unsupervised dimensionality reduction. Opt for LDA when classification accuracy matters more than pure variance preservation.
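To see the difference concretely, a minimal sketch below projects iris onto the first two components of each method, using base R's prcomp() alongside MASS::lda(). The supervised LDA projection typically shows tighter, better-separated species clusters than the unsupervised PCA projection:

```r
library(MASS)

# Unsupervised: PCA ignores Species and maximizes total variance
pca_fit <- prcomp(iris[, 1:4], scale. = TRUE)
pca_scores <- pca_fit$x[, 1:2]

# Supervised: LDA uses Species and maximizes between-class separation
lda_fit <- lda(Species ~ ., data = iris)
lda_scores <- predict(lda_fit)$x  # LD1 and LD2

# Side-by-side base-graphics comparison
par(mfrow = c(1, 2))
plot(pca_scores, col = iris$Species, pch = 19, main = "PCA projection")
plot(lda_scores, col = iris$Species, pch = 19, main = "LDA projection")
```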

Prerequisites and Dataset Preparation

Start by installing the necessary packages. The MASS package contains the core LDA implementation, caret provides evaluation tools, and ggplot2 handles visualization.

# Install packages if needed
install.packages(c("MASS", "caret", "ggplot2"))

# Load libraries
library(MASS)
library(caret)
library(ggplot2)

# Load the iris dataset
data(iris)

# Explore the structure
str(iris)
summary(iris)

# Check for missing values
sum(is.na(iris))

# Create training and testing sets (70/30 split)
set.seed(123)
train_index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

# Verify class distribution
table(train_data$Species)
table(test_data$Species)

The iris dataset contains 150 observations across three species with four features: sepal length, sepal width, petal length, and petal width. This balanced dataset with clear class separation makes it ideal for demonstrating LDA.

Implementing Basic LDA

The lda() function from MASS handles the core implementation. The syntax is straightforward: specify your formula and data frame.

# Fit LDA model
lda_model <- lda(Species ~ ., data = train_data)

# Display model summary
print(lda_model)

# View prior probabilities
lda_model$prior

# View group means
lda_model$means

# View coefficients of linear discriminants
lda_model$scaling

# Make predictions on test data
lda_predictions <- predict(lda_model, test_data)

# Extract predicted classes
predicted_classes <- lda_predictions$class

# Extract posterior probabilities
posterior_probs <- lda_predictions$posterior

# Extract linear discriminant values
ld_values <- lda_predictions$x

# View first few predictions
head(data.frame(
  Actual = test_data$Species,
  Predicted = predicted_classes,
  LD1 = ld_values[, 1],
  LD2 = ld_values[, 2]
))

The model output shows prior probabilities (proportion of each class in training data), group means (average feature values per class), and scaling coefficients (weights for each feature in the discriminant functions). For three classes, you get two linear discriminants (LD1 and LD2) since the maximum number of discriminants is min(p, k-1), where p is the number of features and k is the number of classes.
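You can confirm the discriminant count, and how much between-class separation each discriminant captures, directly from the fitted object (a quick check reusing lda_model from above; the svd component holds the singular values of the discriminant analysis):

```r
# Number of discriminants = min(p, k - 1) = min(4, 3 - 1) = 2
ncol(lda_model$scaling)

# Proportion of between-class variance captured by each discriminant
prop_trace <- lda_model$svd^2 / sum(lda_model$svd^2)
round(prop_trace, 4)  # LD1 usually dominates on iris
```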

Visualizing LDA Results

Visualization helps you understand how LDA separates classes in the reduced dimensional space.

# Create a data frame for plotting
plot_data <- data.frame(
  LD1 = lda_predictions$x[, 1],
  LD2 = lda_predictions$x[, 2],
  Species = test_data$Species,
  Predicted = predicted_classes
)

# Plot LD1 vs LD2
ggplot(plot_data, aes(x = LD1, y = LD2, color = Species, shape = Predicted)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "LDA: Test Data Projection",
       x = "First Linear Discriminant (LD1)",
       y = "Second Linear Discriminant (LD2)") +
  theme_minimal() +
  theme(legend.position = "right")

# Create confusion matrix
conf_matrix <- confusionMatrix(predicted_classes, test_data$Species)
print(conf_matrix)

# Visualize confusion matrix
conf_matrix_table <- as.data.frame(conf_matrix$table)

ggplot(conf_matrix_table, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile(color = "white") +
  geom_text(aes(label = Freq), size = 6) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(title = "Confusion Matrix Heatmap",
       x = "Actual Species",
       y = "Predicted Species") +
  theme_minimal()

The scatter plot shows how well LDA separates the three species in two-dimensional space. Misclassifications appear as points where shape (predicted class) doesn’t match color (actual class).
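Rather than eyeballing shape/color mismatches, you can flag the misclassified points explicitly and overlay them on the same projection (a sketch reusing plot_data from above):

```r
# Flag rows where the predicted class disagrees with the true class
plot_data$Misclassified <- plot_data$Species != plot_data$Predicted

ggplot(plot_data, aes(x = LD1, y = LD2, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  # Draw a black ring around each misclassified observation
  geom_point(data = subset(plot_data, Misclassified),
             shape = 21, size = 6, stroke = 1.2, color = "black") +
  labs(title = "LDA Projection with Misclassifications Circled") +
  theme_minimal()
```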

Evaluating Model Performance

Beyond simple accuracy, calculate comprehensive performance metrics.

# Overall accuracy
accuracy <- conf_matrix$overall['Accuracy']
print(paste("Accuracy:", round(accuracy, 4)))

# Per-class metrics
print(conf_matrix$byClass)

# Calculate metrics manually for better understanding
precision <- conf_matrix$byClass[, 'Pos Pred Value']
recall <- conf_matrix$byClass[, 'Sensitivity']
f1_score <- 2 * (precision * recall) / (precision + recall)

metrics_df <- data.frame(
  Class = rownames(conf_matrix$byClass),
  Precision = precision,
  Recall = recall,
  F1_Score = f1_score
)
print(metrics_df)

# For multi-class ROC, use one-vs-rest approach
library(pROC)

# Create ROC for each class
roc_setosa <- roc(
  response = ifelse(test_data$Species == "setosa", 1, 0),
  predictor = lda_predictions$posterior[, "setosa"]
)

roc_versicolor <- roc(
  response = ifelse(test_data$Species == "versicolor", 1, 0),
  predictor = lda_predictions$posterior[, "versicolor"]
)

roc_virginica <- roc(
  response = ifelse(test_data$Species == "virginica", 1, 0),
  predictor = lda_predictions$posterior[, "virginica"]
)

# Plot ROC curves
plot(roc_setosa, col = "red", main = "ROC Curves for Each Class")
plot(roc_versicolor, col = "blue", add = TRUE)
plot(roc_virginica, col = "green", add = TRUE)
legend("bottomright", 
       legend = c("Setosa", "Versicolor", "Virginica"),
       col = c("red", "blue", "green"), lwd = 2)

# Print AUC values
cat("AUC - Setosa:", auc(roc_setosa), "\n")
cat("AUC - Versicolor:", auc(roc_versicolor), "\n")
cat("AUC - Virginica:", auc(roc_virginica), "\n")

High AUC values (close to 1.0) indicate excellent class separation. The iris dataset typically achieves near-perfect classification.
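Instead of maintaining three one-vs-rest curves by hand, recent versions of pROC also provide multiclass.roc(), which accepts the full posterior probability matrix and reports a single multiclass AUC (the Hand and Till generalization); a brief sketch:

```r
library(pROC)

# Single multiclass AUC from the posterior probability matrix
mc_roc <- multiclass.roc(
  response = test_data$Species,
  predictor = lda_predictions$posterior
)
auc(mc_roc)
```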

Advanced Techniques and Tuning

Implement cross-validation to get robust performance estimates and test LDA assumptions.

# K-fold cross-validation
set.seed(123)
train_control <- trainControl(
  method = "cv",
  number = 10,
  savePredictions = TRUE
)

# Train with cross-validation
cv_model <- train(
  Species ~ .,
  data = iris,
  method = "lda",
  trControl = train_control
)

print(cv_model)
print(cv_model$results)

# Test normality assumption (Shapiro-Wilk test)
# Test for each feature within each class
normality_results <- list()

for (species in unique(iris$Species)) {
  species_data <- iris[iris$Species == species, 1:4]
  
  for (feature in names(species_data)) {
    test_result <- shapiro.test(species_data[[feature]])
    normality_results[[paste(species, feature, sep = "_")]] <- test_result$p.value
  }
}

# Display normality test results
print("Shapiro-Wilk p-values (p > 0.05: no evidence against normality):")

print(normality_results)

# Compare LDA vs QDA
qda_model <- qda(Species ~ ., data = train_data)
qda_predictions <- predict(qda_model, test_data)
qda_conf_matrix <- confusionMatrix(qda_predictions$class, test_data$Species)

# Compare accuracies
comparison <- data.frame(
  Model = c("LDA", "QDA"),
  Accuracy = c(
    conf_matrix$overall['Accuracy'],
    qda_conf_matrix$overall['Accuracy']
  )
)
print(comparison)

QDA relaxes the equal covariance assumption by estimating separate covariance matrices for each class. Use QDA when classes have genuinely different covariances, but remember it requires more training data.
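Before choosing between LDA and QDA, you can test the equal-covariance assumption formally with Box's M test. One option is the boxM() function from the heplots package (an assumption here; biotools provides an equivalent):

```r
# install.packages("heplots")  # if needed
library(heplots)

# Box's M test: H0 = covariance matrices are equal across classes
box_result <- boxM(train_data[, 1:4], train_data$Species)
print(box_result)
# A small p-value argues against equal covariances, favoring QDA,
# but the test is sensitive to non-normality, so interpret cautiously
```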

Practical Tips and Common Pitfalls

Feature Scaling: LDA's class predictions are unchanged by linear rescaling of the features, but standardizing puts the discriminant coefficients on a common scale so you can compare each feature's contribution. Standardize when features have different units or ranges and you want interpretable coefficients.

# Standardize features
iris_scaled <- iris
iris_scaled[, 1:4] <- scale(iris[, 1:4])

# Refit model with scaled data
lda_scaled <- lda(Species ~ ., data = iris_scaled)

Handling Imbalanced Classes: Use stratified sampling and consider adjusting prior probabilities.

# Custom prior probabilities (equal priors)
lda_equal_priors <- lda(Species ~ ., 
                        data = train_data,
                        prior = c(1/3, 1/3, 1/3))

When LDA Fails: Don’t use LDA when classes have non-linear boundaries, severely violate normality assumptions, or have vastly different covariance structures. Consider alternatives like QDA, logistic regression, or tree-based methods.
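As one of those alternatives, a multinomial logistic regression baseline is quick to fit with the nnet package and compare on the same train/test split (a sketch, assuming nnet is installed; it ships with base R's recommended packages):

```r
library(nnet)

# Multinomial logistic regression makes no normality or
# equal-covariance assumptions about the features
multinom_model <- multinom(Species ~ ., data = train_data, trace = FALSE)
multinom_pred <- predict(multinom_model, test_data)

# Held-out accuracy, directly comparable to the LDA result
mean(multinom_pred == test_data$Species)
```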

Small Sample Sizes: LDA needs sufficient samples per class (at least 5-10 times the number of features). With fewer samples, the covariance matrix estimates become unstable.

Multicollinearity: Highly correlated features can cause numerical instability. Remove or combine correlated predictors before fitting LDA.
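caret's findCorrelation() suggests which predictors to drop; a quick sketch on the iris features, where petal length and petal width are strongly correlated:

```r
library(caret)

# Pairwise correlations among the four numeric features
cor_matrix <- cor(iris[, 1:4])
print(round(cor_matrix, 2))

# Column indices recommended for removal at a 0.9 absolute-correlation cutoff
high_corr <- findCorrelation(cor_matrix, cutoff = 0.9)
names(iris[, 1:4])[high_corr]
```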

LDA remains a powerful tool when its assumptions hold. Test those assumptions, validate with cross-validation, and compare against alternative methods to ensure you’re using the right tool for your classification problem.
