How to Implement SVM in R
Key Insights
- Support Vector Machines excel at binary classification tasks with clear margins of separation, making them ideal for text classification, image recognition, and bioinformatics applications where you need robust decision boundaries.
- The choice of kernel function fundamentally changes your model’s behavior—use linear kernels for linearly separable data, RBF kernels when you suspect non-linear relationships, and always validate with cross-validation before committing to production.
- Proper hyperparameter tuning through grid search often delivers meaningful accuracy gains over default settings, but budget for the computational cost: SVM training scales poorly once datasets grow beyond roughly 100,000 samples.
Introduction to Support Vector Machines
Support Vector Machines (SVMs) are supervised learning algorithms that find the optimal hyperplane to separate classes in your feature space. Unlike logistic regression that maximizes likelihood, SVMs maximize the margin between classes—the distance between the decision boundary and the nearest data points (support vectors).
The algorithm works by mapping your input features into a higher-dimensional space where a linear separator can be found. This mapping happens through kernel functions, which compute similarities without explicitly performing the transformation. SVMs handle both classification (SVC) and regression (SVR) tasks, though classification is their primary strength.
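To make the kernel idea concrete, here is a minimal hand-rolled RBF kernel. It scores the similarity of two points directly, without ever constructing the transformed feature vectors; `gamma` (a width parameter chosen here purely for illustration) controls how fast similarity decays with distance.

```r
# Sketch: an RBF kernel computes similarity in the implicit feature space
# without performing the transformation explicitly.
rbf_kernel <- function(x1, x2, gamma = 0.5) {
  exp(-gamma * sum((x1 - x2)^2))
}

rbf_kernel(c(1, 2), c(2, 3))    # nearby points: exp(-1), about 0.37
rbf_kernel(c(1, 2), c(10, 10))  # distant points: effectively 0
```

This is exactly the quantity `kernel = "radial"` computes inside `svm()`; the SVM only ever needs these pairwise similarities, which is why the high-dimensional mapping never has to be materialized.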
Use SVMs when you have clear class separation, medium-sized datasets (1,000-100,000 samples), and high-dimensional feature spaces. They’re particularly effective for text classification, image recognition, and genomic data analysis. Avoid them for very large datasets or when you need probability estimates (though these can be approximated).
Setting Up Your R Environment
The e1071 package provides the core SVM implementation, while caret offers enhanced tuning capabilities and kernlab gives you more kernel options. Start by installing these packages and loading a dataset.
```r
# Install required packages
install.packages(c("e1071", "caret", "kernlab"))

# Load libraries
library(e1071)
library(caret)
library(kernlab)

# Load the iris dataset
data(iris)

# Examine the structure
str(iris)
head(iris)

# For binary classification, filter to two species
iris_binary <- iris[iris$Species %in% c("versicolor", "virginica"), ]
iris_binary$Species <- droplevels(iris_binary$Species)
```
The iris dataset contains 150 observations of flower measurements across three species. We’ll start with binary classification using versicolor and virginica, which provides a cleaner learning example than multiclass problems.
Building a Basic SVM Classifier
A proper SVM implementation requires splitting your data into training and testing sets. Never evaluate on your training data—that’s a cardinal sin in machine learning.
```r
# Set seed for reproducibility
set.seed(123)

# Create train/test split (70/30)
train_index <- createDataPartition(iris_binary$Species, p = 0.7, list = FALSE)
train_data <- iris_binary[train_index, ]
test_data <- iris_binary[-train_index, ]

# Train a linear SVM
svm_model <- svm(Species ~ .,
                 data = train_data,
                 kernel = "linear",
                 cost = 1,
                 scale = TRUE)

# View model summary
summary(svm_model)

# Make predictions
predictions <- predict(svm_model, test_data)

# Calculate accuracy
accuracy <- sum(predictions == test_data$Species) / nrow(test_data)
cat("Accuracy:", round(accuracy * 100, 2), "%\n")
```
The scale = TRUE parameter standardizes features to zero mean and unit variance—critical for SVMs since they’re distance-based algorithms. The cost parameter controls the trade-off between smooth decision boundaries and classifying training points correctly. Higher cost means less tolerance for misclassification.
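The effect of `cost` shows up directly in the size of the support vector set. The standalone sketch below (assuming the `e1071` package, and rebuilding the binary iris problem so it runs on its own) counts support vectors at three cost values; the exact counts depend on the data, but the downward trend as cost rises is typical.

```r
# Higher cost penalizes margin violations more heavily, which typically
# shrinks the set of support vectors the model keeps.
library(e1071)
data(iris)
iris2 <- droplevels(iris[iris$Species != "setosa", ])  # binary problem

sv_counts <- sapply(c(0.01, 1, 100), function(c_val) {
  svm(Species ~ ., data = iris2, kernel = "linear", cost = c_val)$tot.nSV
})
print(sv_counts)  # support vector count tends to fall as cost rises
```

Fewer support vectors generally means a simpler boundary and faster prediction, at the risk of underfitting if cost is pushed too low.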
Understanding Kernel Functions
Kernel functions determine how your SVM transforms the feature space. The choice dramatically affects performance and should be based on your data’s characteristics.
- Linear kernel: use when data is (near-)linearly separable. Computationally efficient and easy to interpret.
- Polynomial kernel: captures polynomial feature interactions. Less common in practice; high degrees can be numerically unstable and slow to tune.
- RBF (Radial Basis Function): the default choice for non-linear data. Works well when you have no prior knowledge about the data's structure.
- Sigmoid kernel: behaves somewhat like a two-layer neural network. Less common, but useful in specific domains.
```r
# Compare different kernels
kernels <- c("linear", "polynomial", "radial", "sigmoid")
results <- data.frame(Kernel = character(),
                      Accuracy = numeric(),
                      stringsAsFactors = FALSE)

for (k in kernels) {
  # Train model
  model <- svm(Species ~ .,
               data = train_data,
               kernel = k,
               cost = 1)
  # Predict and evaluate
  pred <- predict(model, test_data)
  acc <- sum(pred == test_data$Species) / nrow(test_data)
  results <- rbind(results, data.frame(Kernel = k, Accuracy = acc))
}

# Display results
print(results)

# Best performing kernel
best_kernel <- results[which.max(results$Accuracy), ]
cat("\nBest kernel:", best_kernel$Kernel,
    "with accuracy:", round(best_kernel$Accuracy * 100, 2), "%\n")
```
In practice, start with RBF and linear kernels. RBF handles most non-linear cases effectively, while linear provides a baseline and works well for high-dimensional sparse data like text.
Hyperparameter Tuning
Default parameters rarely give optimal performance. The tune() function performs grid search with cross-validation to find the best hyperparameter combination.
```r
# Define parameter grid for RBF kernel
tune_result <- tune(svm, Species ~ .,
                    data = train_data,
                    kernel = "radial",
                    ranges = list(
                      cost = c(0.1, 1, 10, 100),
                      gamma = c(0.01, 0.1, 1, 10)
                    ),
                    tunecontrol = tune.control(cross = 5))

# View tuning results
print(tune_result)

# Best parameters
best_params <- tune_result$best.parameters
cat("Best cost:", best_params$cost, "\n")
cat("Best gamma:", best_params$gamma, "\n")

# Train final model with best parameters
final_model <- svm(Species ~ .,
                   data = train_data,
                   kernel = "radial",
                   cost = best_params$cost,
                   gamma = best_params$gamma)

# Evaluate on test set
final_predictions <- predict(final_model, test_data)
final_accuracy <- sum(final_predictions == test_data$Species) / nrow(test_data)
cat("Tuned model accuracy:", round(final_accuracy * 100, 2), "%\n")
```
The cost parameter controls regularization—low values create smoother boundaries, high values fit training data more tightly. The gamma parameter (for RBF) defines how far the influence of a single training example reaches. Low gamma means far reach (smoother boundaries), high gamma means close reach (more complex boundaries).
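The gamma trade-off can be seen numerically. The standalone sketch below (assuming `e1071`, and rebuilding the binary iris problem so it runs on its own) measures accuracy on the training data at three gamma values; the climb toward perfect training accuracy at high gamma is exactly the memorization the paragraph describes, and it is why high-gamma models need a test set to be judged honestly.

```r
# As gamma grows, each training point's influence shrinks to a tight
# neighborhood, letting the model effectively memorize the training set.
library(e1071)
data(iris)
iris2 <- droplevels(iris[iris$Species != "setosa", ])  # binary problem

accs <- sapply(c(0.01, 1, 100), function(g) {
  m <- svm(Species ~ ., data = iris2, kernel = "radial", cost = 1, gamma = g)
  mean(predict(m, iris2) == iris2$Species)  # accuracy on the TRAINING data
})
print(round(accs, 3))  # training accuracy tends to rise with gamma
```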
Model Evaluation and Visualization
Accuracy alone doesn’t tell the full story. Use confusion matrices to understand misclassification patterns and visualize decision boundaries to build intuition.
```r
# Confusion matrix
conf_matrix <- confusionMatrix(final_predictions, test_data$Species)
print(conf_matrix)

# Extract metrics
precision <- conf_matrix$byClass["Precision"]
recall <- conf_matrix$byClass["Recall"]
f1 <- conf_matrix$byClass["F1"]
cat("Precision:", round(precision, 3), "\n")
cat("Recall:", round(recall, 3), "\n")
cat("F1 Score:", round(f1, 3), "\n")

# Visualize decision boundary (using 2 features for a 2D plot)
library(ggplot2)

# Train model with just 2 features for visualization
viz_model <- svm(Species ~ Petal.Length + Petal.Width,
                 data = train_data,
                 kernel = "radial",
                 cost = best_params$cost,
                 gamma = best_params$gamma)

# Create grid for decision boundary
grid_x <- seq(min(iris_binary$Petal.Length),
              max(iris_binary$Petal.Length),
              length.out = 100)
grid_y <- seq(min(iris_binary$Petal.Width),
              max(iris_binary$Petal.Width),
              length.out = 100)
grid <- expand.grid(Petal.Length = grid_x, Petal.Width = grid_y)

# Predict on grid
grid$Species <- predict(viz_model, grid)

# Plot predicted regions with the observed points on top
ggplot() +
  geom_tile(data = grid, aes(x = Petal.Length, y = Petal.Width, fill = Species),
            alpha = 0.3) +
  geom_point(data = iris_binary, aes(x = Petal.Length, y = Petal.Width,
                                     color = Species), size = 3) +
  labs(title = "SVM Decision Boundary",
       x = "Petal Length", y = "Petal Width") +
  theme_minimal()
```
The decision boundary visualization reveals how your SVM separates classes in feature space. Look for overfitting (overly complex boundaries) or underfitting (classes overlap significantly).
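Alongside the visual check, the simplest numeric overfitting signal is the gap between training and test accuracy. A self-contained sketch on the binary iris problem (the split size and seed are arbitrary choices for illustration):

```r
# A large train/test accuracy gap suggests the boundary is too complex.
library(e1071)
set.seed(123)
data(iris)
iris2 <- droplevels(iris[iris$Species != "setosa", ])  # binary problem

idx <- sample(nrow(iris2), 70)
train <- iris2[idx, ]
test <- iris2[-idx, ]

fit <- svm(Species ~ Petal.Length + Petal.Width, data = train, kernel = "radial")
train_acc <- mean(predict(fit, train) == train$Species)
test_acc <- mean(predict(fit, test) == test$Species)
cat("train:", round(train_acc, 3), " test:", round(test_acc, 3), "\n")
```

A small gap with reasonable accuracy on both sets is what a well-regularized model should show; a near-perfect training score paired with a much lower test score points back to the cost/gamma settings.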
Real-World Application
Let’s apply SVM to a customer churn prediction problem—a common business use case where you predict whether customers will leave your service.
```r
# Simulate customer churn data. Churn probability is tied to the features
# (short tenure, high charges, many support calls, monthly contracts churn
# more), so the model has real signal to learn.
set.seed(456)
n <- 1000
tenure_months <- pmax(rnorm(n, 24, 12), 0)
monthly_charges <- rnorm(n, 65, 20)
support_calls <- rpois(n, 2)
contract_type <- sample(c("Monthly", "Annual"), n, replace = TRUE)
churn_prob <- plogis(-0.8 - 0.04 * tenure_months + 0.01 * monthly_charges +
                       0.35 * support_calls - 1.2 * (contract_type == "Annual"))
customer_data <- data.frame(
  tenure_months = tenure_months,
  monthly_charges = monthly_charges,
  support_calls = support_calls,
  contract_type = contract_type,
  churn = factor(ifelse(runif(n) < churn_prob, "Yes", "No"))
)

# Convert categorical to numeric
customer_data$contract_numeric <- ifelse(customer_data$contract_type == "Annual", 1, 0)

# Create train/test split
train_idx <- createDataPartition(customer_data$churn, p = 0.7, list = FALSE)
train_churn <- customer_data[train_idx, ]
test_churn <- customer_data[-train_idx, ]

# Tune and train model
churn_tune <- tune(svm, churn ~ tenure_months + monthly_charges +
                     support_calls + contract_numeric,
                   data = train_churn,
                   kernel = "radial",
                   ranges = list(cost = c(1, 10, 100),
                                 gamma = c(0.01, 0.1, 1)))

# Final model
churn_model <- churn_tune$best.model

# Predictions on test set
churn_pred <- predict(churn_model, test_churn)

# Evaluation
churn_conf <- confusionMatrix(churn_pred, test_churn$churn)
print(churn_conf)

# Predict for new customers
new_customers <- data.frame(
  tenure_months = c(6, 36, 12),
  monthly_charges = c(85, 45, 70),
  support_calls = c(5, 0, 2),
  contract_numeric = c(0, 1, 0)
)
new_predictions <- predict(churn_model, new_customers)
cat("Churn predictions for new customers:\n")
print(new_predictions)
```
This workflow demonstrates the full SVM pipeline: data preparation, feature engineering, hyperparameter tuning, evaluation, and inference on new data. Before applying it to real data, also scale features consistently between training and inference, handle missing values, and adopt a more rigorous validation strategy such as repeated cross-validation.
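As a sketch of those preprocessing steps, caret's `preProcess()` learns an imputation and scaling recipe from training data and then applies the same transform to any new data (the NA positions below are artificial, inserted only to demonstrate imputation):

```r
# Learn median imputation + centering + scaling from one dataset,
# then apply the identical transform via predict().
library(caret)
data(iris)
iris_na <- iris[, 1:4]
iris_na$Sepal.Length[c(3, 7)] <- NA  # simulate missing values

pp <- preProcess(iris_na, method = c("medianImpute", "center", "scale"))
iris_clean <- predict(pp, iris_na)
sum(is.na(iris_clean))  # 0: missing values imputed before scaling
```

Fitting `pp` on the training set only, then applying it to the test set and to incoming production records, avoids leaking test-set statistics into the model.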
SVMs remain powerful tools for classification tasks despite the rise of deep learning. They require less data, train faster on small to medium datasets, and provide more interpretable results. Master these fundamentals, and you’ll have a reliable algorithm for your machine learning toolkit.