How to Implement Naive Bayes in R

Key Insights

  • Naive Bayes classifiers are exceptionally fast and perform surprisingly well despite assuming feature independence, making them ideal for text classification, spam filtering, and real-time prediction systems.
  • R offers multiple implementations through the e1071 and naivebayes packages, with the former being more established and the latter providing better performance for large datasets.
  • The algorithm’s probabilistic foundation means it naturally handles multi-class problems and provides probability estimates, not just classifications, giving you confidence scores for predictions.

Introduction to Naive Bayes

Naive Bayes is a probabilistic machine learning algorithm based on Bayes’ theorem with a “naive” assumption that all features are independent of each other. Despite this oversimplification—which rarely holds true in real-world data—Naive Bayes performs remarkably well across various classification tasks.

The algorithm calculates the probability of each class given the input features and assigns the class with the highest probability. It’s particularly effective for text classification, spam detection, sentiment analysis, and medical diagnosis. The main advantages are speed, simplicity, and effectiveness with high-dimensional data.
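The decision rule itself is just an argmax over the class posteriors. A tiny sketch with made-up numbers (the posterior values below are illustrative, not from any fitted model):

```r
# Pick the class with the highest posterior probability
# (posterior values are made up for illustration)
posteriors <- c(setosa = 0.02, versicolor = 0.85, virginica = 0.13)
predicted <- names(posteriors)[which.max(posteriors)]
predicted
```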

Understanding the Mathematics

Bayes’ theorem forms the foundation of Naive Bayes classification:

P(Class|Features) = [P(Features|Class) × P(Class)] / P(Features)

Where:

  • P(Class|Features) is the posterior probability
  • P(Features|Class) is the likelihood
  • P(Class) is the prior probability
  • P(Features) is the evidence

The “naive” assumption means we treat features as independent, so:

P(x₁, x₂, …, xₙ|Class) = P(x₁|Class) × P(x₂|Class) × … × P(xₙ|Class)

There are three main variants:

  1. Gaussian Naive Bayes: Assumes continuous features follow a normal distribution
  2. Multinomial Naive Bayes: For discrete counts (ideal for text classification)
  3. Bernoulli Naive Bayes: For binary/boolean features
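For the Gaussian variant, each per-feature likelihood P(xᵢ|Class) is a normal density evaluated at the observed value, using that class's mean and standard deviation. A minimal sketch with made-up statistics (not taken from any fitted model):

```r
# Gaussian class-conditional likelihood for a single feature value
x <- 5.1           # observed feature value
class_mean <- 5.0  # illustrative per-class mean
class_sd <- 0.35   # illustrative per-class standard deviation

likelihood <- dnorm(x, mean = class_mean, sd = class_sd)
likelihood
```

Under the naive assumption, the full likelihood for an observation is simply the product of these per-feature densities.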

Let’s demonstrate Bayes’ theorem with a simple calculation:

# Simple Bayes' theorem demonstration
# Probability of having disease given positive test

# Prior probability of disease
p_disease <- 0.01

# Probability of positive test given disease (sensitivity)
p_pos_given_disease <- 0.95

# Probability of positive test given no disease (false positive)
p_pos_given_no_disease <- 0.05

# Probability of positive test (evidence)
p_positive <- (p_pos_given_disease * p_disease) + 
              (p_pos_given_no_disease * (1 - p_disease))

# Posterior probability: disease given positive test
p_disease_given_pos <- (p_pos_given_disease * p_disease) / p_positive

cat("Probability of disease given positive test:", 
    round(p_disease_given_pos, 4))
# Output: 0.161

Dataset Preparation

Proper data preparation is critical for any machine learning model. We’ll use the iris dataset to demonstrate classification of flower species.

# Load required libraries
library(e1071)
library(caret)

# Load and explore the iris dataset
data(iris)

# Examine structure and summary
str(iris)
summary(iris)

# Check for missing values
sum(is.na(iris))

# View first few rows
head(iris)

# Check class distribution
table(iris$Species)

Now split the data into training and testing sets:

# Set seed for reproducibility
set.seed(123)

# Create 70-30 train-test split
train_index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

# Verify split proportions
nrow(train_data)  # 105 observations
nrow(test_data)   # 45 observations

# Check class distribution in both sets
prop.table(table(train_data$Species))
prop.table(table(test_data$Species))

Building the Naive Bayes Model

The e1071 package provides a straightforward implementation of Naive Bayes in R.

# Install and load e1071 if not already installed
if(!require(e1071)) install.packages("e1071")
library(e1071)

# Train the Naive Bayes model
nb_model <- naiveBayes(Species ~ ., data = train_data)

# Display model summary
print(nb_model)

# View prior probabilities
nb_model$apriori

# View conditional probabilities for each feature
nb_model$tables

The model output shows:

  • Prior probabilities for each class
  • Mean and standard deviation for each feature per class (for Gaussian NB)
  • Conditional probability tables
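These stored parameters are easy to verify by hand. The sketch below fits a model on the full iris data and compares the Sepal.Length table against per-class means and standard deviations computed directly (assumes the e1071 package is installed):

```r
library(e1071)  # requires the e1071 package
data(iris)

# Fit on the full dataset just to inspect the stored Gaussian parameters
m <- naiveBayes(Species ~ ., data = iris)

# Column 1 of each table is the per-class mean, column 2 the sd
m$tables$Sepal.Length

# The same statistics computed directly from the data
manual_means <- tapply(iris$Sepal.Length, iris$Species, mean)
manual_sds   <- tapply(iris$Sepal.Length, iris$Species, sd)
cbind(mean = manual_means, sd = manual_sds)
```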

You can also specify the Laplace smoothing parameter to handle zero probabilities. Note that smoothing applies only to categorical predictors; it has no effect on the all-numeric iris features, which are modeled with Gaussian densities, but it matters whenever factor features are present:

# Train with Laplace smoothing
nb_model_laplace <- naiveBayes(Species ~ ., 
                               data = train_data, 
                               laplace = 1)

Making Predictions and Evaluation

Now we’ll generate predictions and evaluate model performance:

# Make predictions on test data
predictions <- predict(nb_model, test_data)

# View first few predictions
head(predictions)

# Create confusion matrix
conf_matrix <- table(Predicted = predictions, Actual = test_data$Species)
print(conf_matrix)

# Calculate accuracy
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
cat("Accuracy:", round(accuracy, 4))

# Using caret for detailed metrics
confusionMatrix(predictions, test_data$Species)

For probability predictions instead of class labels:

# Get probability predictions
prob_predictions <- predict(nb_model, test_data, type = "raw")

# View probabilities for first few observations
head(prob_predictions)

# Show predictions with highest probability
predicted_class <- colnames(prob_predictions)[apply(prob_predictions, 1, which.max)]

Calculate additional metrics manually:

# Function to calculate precision, recall, F1
calculate_metrics <- function(conf_matrix, class_name) {
  tp <- conf_matrix[class_name, class_name]
  fp <- sum(conf_matrix[class_name, ]) - tp
  fn <- sum(conf_matrix[, class_name]) - tp
  
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  f1 <- 2 * (precision * recall) / (precision + recall)
  
  return(c(Precision = precision, Recall = recall, F1 = f1))
}

# Calculate for each class
for(class in levels(test_data$Species)) {
  metrics <- calculate_metrics(conf_matrix, class)
  cat("\nMetrics for", class, ":\n")
  print(round(metrics, 4))
}

Practical Application: Text Classification

Naive Bayes excels at text classification. Here’s a spam detection example:

# Install required packages
if(!require(tm)) install.packages("tm")
if(!require(SnowballC)) install.packages("SnowballC")

library(tm)
library(SnowballC)

# Sample email data
emails <- c(
  "Get rich quick! Buy now!",
  "Meeting scheduled for tomorrow",
  "Claim your prize now!!!",
  "Project update attached",
  "Free money waiting for you",
  "Please review the quarterly report"
)

labels <- factor(c("spam", "ham", "spam", "ham", "spam", "ham"))

# Create corpus
corpus <- Corpus(VectorSource(emails))

# Text preprocessing
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

# Create document-term matrix
dtm <- DocumentTermMatrix(corpus)
dtm_matrix <- as.matrix(dtm)

# naiveBayes() treats numeric columns as Gaussian, which fits word counts
# poorly (zero-variance columns break predictions); convert counts to
# "No"/"Yes" presence factors instead
to_presence <- function(x) factor(ifelse(x > 0, "Yes", "No"),
                                  levels = c("No", "Yes"))

email_data <- as.data.frame(lapply(as.data.frame(dtm_matrix), to_presence))
email_data$label <- labels

# Train Naive Bayes on text data, with Laplace smoothing for unseen words
text_model <- naiveBayes(label ~ ., data = email_data, laplace = 1)

# Predict on a new email, applying the same preprocessing used in training
new_email <- "Free prize money click now"
new_corpus <- Corpus(VectorSource(new_email))
new_corpus <- tm_map(new_corpus, content_transformer(tolower))
new_corpus <- tm_map(new_corpus, removePunctuation)
new_corpus <- tm_map(new_corpus, removeNumbers)
new_corpus <- tm_map(new_corpus, removeWords, stopwords("english"))
new_corpus <- tm_map(new_corpus, stemDocument)
new_corpus <- tm_map(new_corpus, stripWhitespace)

new_dtm <- DocumentTermMatrix(new_corpus, 
                              control = list(dictionary = Terms(dtm)))
new_matrix <- as.data.frame(as.matrix(new_dtm))

# Add any training columns missing from the new document
missing_cols <- setdiff(names(email_data), names(new_matrix))
for(col in missing_cols[missing_cols != "label"]) {
  new_matrix[[col]] <- 0
}

# Apply the same presence/absence conversion
new_matrix[] <- lapply(new_matrix, to_presence)

prediction <- predict(text_model, new_matrix)
print(paste("Predicted class:", prediction))

Conclusion and Best Practices

Naive Bayes remains relevant because it’s fast, requires minimal training data, and handles high-dimensional data effectively. Use it when:

  • You need quick baseline models
  • Working with text classification or categorical data
  • Training data is limited
  • Real-time predictions are required
  • You need probability estimates, not just classifications

Limitations to consider:

  • The independence assumption rarely holds in practice
  • Performs poorly with correlated features
  • Zero-frequency problem (solved with Laplace smoothing)
  • Continuous features require distribution assumptions

Performance tips:

  1. Apply Laplace smoothing to handle unseen feature combinations
  2. Feature engineering matters—remove irrelevant features
  3. For text data, consider TF-IDF weighting instead of raw counts
  4. Compare against other algorithms; Naive Bayes is a baseline, not always the best choice
  5. Use cross-validation for robust performance estimates
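The cross-validation tip is easy to implement without extra tooling. A minimal manual k-fold sketch using e1071 and the iris data (fold assignment and k = 5 are illustrative choices):

```r
library(e1071)  # requires the e1071 package
data(iris)
set.seed(123)

# Manual 5-fold cross-validation for a Naive Bayes classifier
k <- 5
folds <- sample(rep(1:k, length.out = nrow(iris)))

accuracies <- sapply(1:k, function(i) {
  train <- iris[folds != i, ]
  test  <- iris[folds == i, ]
  model <- naiveBayes(Species ~ ., data = train)
  preds <- predict(model, test)
  mean(preds == test$Species)
})

round(mean(accuracies), 4)  # average accuracy across folds
```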

The naivebayes package offers an alternative implementation with better performance for large datasets and more flexibility with kernel density estimation for continuous variables. Experiment with both packages to find what works best for your specific use case.
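A minimal sketch with the naivebayes package (assuming it is installed); setting usekernel = TRUE replaces the Gaussian assumption with kernel density estimates for numeric features:

```r
if(!require(naivebayes)) install.packages("naivebayes")
library(naivebayes)
data(iris)

# Kernel density estimation instead of the Gaussian assumption
nb_kde <- naive_bayes(Species ~ ., data = iris, usekernel = TRUE)

# Drop the response column before predicting
preds <- predict(nb_kde, iris[, -5])
mean(preds == iris$Species)  # training accuracy
```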
