How to Calculate Feature Importance in R

Key Insights

  • Tree-based models provide built-in feature importance metrics, but permutation importance offers a more reliable, model-agnostic alternative that works across any algorithm.
  • SHAP values deliver the most theoretically sound approach to feature importance by providing both global feature rankings and local explanations for individual predictions.
  • Different importance methods can rank features differently—always validate findings with multiple approaches and consider correlation structures in your data before making feature selection decisions.

Feature importance is one of the most practical tools in a data scientist’s arsenal. It answers fundamental questions: Which variables actually drive your model’s predictions? Where should you focus data collection efforts? Which features can you safely remove to simplify your model?

Understanding feature importance goes beyond academic curiosity. It directly impacts model debugging, stakeholder communication, and feature engineering decisions. A feature that shows high importance might warrant additional engineering effort, while low-importance features consuming computational resources become candidates for removal.

R offers multiple approaches to calculating feature importance, each with distinct trade-offs. Model-specific methods leverage internal model structures for fast computation. Model-agnostic methods like permutation importance and SHAP values work across any algorithm but require more computation. Let’s explore each approach with practical examples.

Tree-Based Model Feature Importance

Random forests and gradient boosting machines provide built-in importance metrics. These methods are computationally efficient since they extract importance during model training.

library(randomForest)
library(xgboost)
library(ggplot2)

# Load example data
data(iris)
set.seed(123)

# Train random forest
rf_model <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 500)

# Extract importance
importance_df <- as.data.frame(importance(rf_model))
importance_df$feature <- rownames(importance_df)

# Visualize Gini importance
ggplot(importance_df, aes(x = reorder(feature, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Random Forest Feature Importance (Gini)",
       x = "Feature", y = "Mean Decrease in Gini") +
  theme_minimal()

Random forests calculate two importance metrics: Mean Decrease in Gini (MDG) and Mean Decrease in Accuracy (MDA). Gini importance measures how much each feature contributes to node purity across all trees. Higher values indicate features that create cleaner splits.
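Both metrics come from the same fitted model; a quick sketch, reusing the `rf_model` trained above, extracts each one separately via the `type` argument of `importance()`:

```r
# type = 1 -> Mean Decrease in Accuracy (permutation-based)
# type = 2 -> Mean Decrease in Gini (impurity-based)
mda <- importance(rf_model, type = 1)
mdg <- importance(rf_model, type = 2)

# Rank features by each metric
sort(mda[, "MeanDecreaseAccuracy"], decreasing = TRUE)
sort(mdg[, "MeanDecreaseGini"], decreasing = TRUE)
```

MDA is often preferred for feature selection because it measures impact on held-out accuracy rather than split purity, making it less biased toward features with many possible split points.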

For XGBoost, the process is similar but offers additional importance types:

# Prepare data for XGBoost
iris_encoded <- iris
iris_encoded$Species <- as.numeric(iris$Species) - 1

train_matrix <- xgb.DMatrix(
  data = as.matrix(iris_encoded[, 1:4]),
  label = iris_encoded$Species
)

# Train XGBoost model
xgb_model <- xgb.train(
  params = list(objective = "multi:softmax", num_class = 3),
  data = train_matrix,
  nrounds = 100,
  verbose = 0
)

# Extract importance
xgb_importance <- xgb.importance(model = xgb_model)
xgb.plot.importance(xgb_importance, main = "XGBoost Feature Importance")

XGBoost provides gain (the average reduction in the training loss from splits that use the feature), cover (the relative number of observations affected by those splits), and frequency (the percentage of splits in which the feature appears). Gain is typically the most interpretable metric for feature importance.
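All three metrics come back as columns of the `xgb.importance()` result, so you can inspect them directly or plot by a measure other than the default gain (reusing `xgb_importance` from above):

```r
# One column per metric: Feature, Gain, Cover, Frequency
print(xgb_importance)

# Plot by cover instead of the default (gain)
xgb.plot.importance(xgb_importance, measure = "Cover",
                    main = "XGBoost Feature Importance (Cover)")
```

Comparing the three columns side by side is a quick sanity check: a feature with high frequency but low gain is being used often for splits that barely improve the model.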

The limitation? These metrics can be biased toward high-cardinality features and don’t account for feature interactions in a theoretically rigorous way.

Permutation Feature Importance

Permutation importance offers a model-agnostic alternative with stronger theoretical foundations. The concept is elegant: randomly shuffle a feature’s values and measure how much model performance degrades. Important features cause significant performance drops when permuted.
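Before reaching for a package, the idea can be sketched by hand in a few lines: shuffle one column, re-predict, and record the accuracy drop. A minimal sketch, reusing the random forest fitted earlier:

```r
set.seed(123)
baseline_acc <- mean(predict(rf_model, iris) == iris$Species)

perm_drop <- sapply(names(iris)[1:4], function(feat) {
  shuffled <- iris
  shuffled[[feat]] <- sample(shuffled[[feat]])  # break the feature-target link
  permuted_acc <- mean(predict(rf_model, shuffled) == iris$Species)
  baseline_acc - permuted_acc  # importance = accuracy lost to shuffling
})

sort(perm_drop, decreasing = TRUE)
```

In practice you would repeat each shuffle several times and average, which is exactly the bookkeeping the packages below handle for you.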

library(vip)
library(caret)

# Train a model (works with any model type)
set.seed(123)
train_control <- trainControl(method = "cv", number = 5)
# Species has three classes, so fit a multinomial model
multinom_model <- train(Species ~ ., data = iris, method = "multinom",
                        trControl = train_control, trace = FALSE)

# Calculate permutation importance
perm_importance <- vi(multinom_model, method = "permute",
                      target = "Species", metric = "accuracy",
                      pred_wrapper = predict, train = iris)

# Visualize
ggplot(perm_importance, aes(x = reorder(Variable, Importance), y = Importance)) +
  geom_col(fill = "coral") +
  coord_flip() +
  labs(title = "Permutation Feature Importance", x = "Feature") +
  theme_minimal()

The vip package simplifies permutation importance across different model types. It handles the shuffling, prediction, and metric calculation automatically.

For more control, the iml package offers detailed permutation importance with confidence intervals:

library(iml)

# Create predictor object
predictor <- Predictor$new(rf_model, data = iris[, -5], y = iris$Species, type = "prob")

# Calculate permutation importance
importance_iml <- FeatureImp$new(predictor, loss = "ce", n.repetitions = 50)

# Plot with error bars
plot(importance_iml) +
  labs(title = "Permutation Importance with Confidence Intervals") +
  theme_minimal()

Permutation importance works with any model—linear models, neural networks, ensemble methods—making it invaluable for comparing feature importance across different algorithms. The trade-off is computational cost, especially with large datasets and many permutation rounds.

SHAP Values for Feature Importance

SHAP (SHapley Additive exPlanations) values represent the gold standard for feature importance, grounded in game theory. SHAP assigns each feature a contribution value for every prediction, then aggregates these for global importance.

library(fastshap)
library(shapviz)

# Prediction wrapper returning the probability of the first class
# ("setosa"); for multiclass problems, explain one class at a time
pfun <- function(object, newdata) {
  predict(object, newdata, type = "prob")[, 1]
}

# Calculate SHAP values (this may take a moment)
set.seed(123)
shap_values <- explain(rf_model, X = iris[, -5], 
                       pred_wrapper = pfun, 
                       nsim = 50)

# Convert to shapviz object for better plotting
shap_viz <- shapviz(shap_values, X = iris[, -5])

# Summary plot showing global importance
sv_importance(shap_viz, kind = "beeswarm") +
  labs(title = "SHAP Feature Importance") +
  theme_minimal()

SHAP values provide both global and local interpretability. The summary plot above shows global importance, but you can examine individual predictions:

# SHAP values for a single prediction
sv_waterfall(shap_viz, row_id = 1) +
  labs(title = "SHAP Explanation for Single Prediction")

# Dependence plot showing feature interactions
sv_dependence(shap_viz, v = "Petal.Length") +
  labs(title = "SHAP Dependence Plot: Petal Length")

SHAP’s theoretical soundness comes at a computational cost. For large datasets or complex models, calculating SHAP values can be prohibitively slow. The fastshap package uses Monte Carlo sampling to approximate SHAP values more quickly, but even this requires careful consideration with big data.
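The cost scales roughly linearly with `nsim`, the number of Monte Carlo replicates per observation. A rough timing sketch, reusing `rf_model` and `pfun` from above:

```r
# Monte Carlo cost grows with the number of simulations per observation
t10 <- system.time(
  explain(rf_model, X = iris[, -5], pred_wrapper = pfun, nsim = 10)
)["elapsed"]
t50 <- system.time(
  explain(rf_model, X = iris[, -5], pred_wrapper = pfun, nsim = 50)
)["elapsed"]
c(nsim_10 = t10, nsim_50 = t50)  # roughly linear scaling in nsim
```

A common workflow is to prototype plots with a small `nsim`, then rerun with a larger value once the analysis is settled, since larger `nsim` reduces the Monte Carlo noise in the estimates.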

Comparing Feature Importance Methods

Different methods can produce different rankings. Understanding why helps you choose the right approach:

library(dplyr)
library(tidyr)

# Compile importance from different methods, matching rows by
# feature name so each method's scores line up
comparison_df <- data.frame(
  Feature = importance_df$feature,
  Gini = importance_df$MeanDecreaseGini,
  Permutation = perm_importance$Importance[
    match(importance_df$feature, perm_importance$Variable)
  ],
  SHAP = colMeans(abs(shap_values))[importance_df$feature]
)

# Normalize to 0-1 scale for comparison
comparison_normalized <- comparison_df %>%
  mutate(across(Gini:SHAP, ~ (. - min(.)) / (max(.) - min(.))))

# Reshape for plotting
comparison_long <- comparison_normalized %>%
  pivot_longer(cols = Gini:SHAP, names_to = "Method", values_to = "Importance")

# Compare methods side-by-side
ggplot(comparison_long, aes(x = reorder(Feature, Importance), 
                            y = Importance, fill = Method)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Feature Importance Comparison Across Methods",
       x = "Feature", y = "Normalized Importance") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")

This comparison reveals how methods disagree. Gini importance might favor features that create many splits, while SHAP values account for feature interactions more rigorously.

Practical Considerations and Best Practices

Handling Correlated Features: When features are highly correlated, importance scores become unstable. Permutation importance and SHAP can split importance between correlated features unpredictably. Consider removing redundant features before calculating importance, or use domain knowledge to group correlated features.
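One pragmatic screen, assuming the `caret` package, is to flag one member of each highly correlated pair before computing importance:

```r
library(caret)

# Pairwise correlations among the numeric predictors
cor_matrix <- cor(iris[, 1:4])

# Indices of features exceeding the correlation cutoff
high_cor <- findCorrelation(cor_matrix, cutoff = 0.9)
names(iris)[high_cor]  # candidates for removal before importance analysis
```

In iris, Petal.Length and Petal.Width correlate above 0.9, so one of them is flagged; dropping it before the importance analysis keeps the remaining scores from being split unpredictably between the pair.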

Computational Trade-offs: Gini importance is nearly free—it’s calculated during training. Permutation importance requires retraining or re-predicting multiple times. SHAP values are most expensive, potentially requiring thousands of predictions per observation. For initial exploration, start with built-in metrics. Use permutation importance for model-agnostic comparison. Reserve SHAP for final model interpretation and stakeholder presentations.

Method Selection Guide:

  • Use Gini/Gain importance for quick exploration with tree-based models
  • Use permutation importance when comparing across model types or when you need more reliable rankings
  • Use SHAP values when you need both global importance and local explanations, or when presenting to non-technical stakeholders

Validation Strategy: Never rely on a single importance metric. Calculate importance using multiple methods and look for consensus. Features that rank highly across all methods are robust candidates for retention. Features with inconsistent rankings deserve deeper investigation.
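A quick consensus check is to correlate the rankings each method produces. The sketch below uses illustrative, made-up scores; in practice you would substitute a table like the `comparison_df` assembled earlier:

```r
# Hypothetical importance scores from three methods (illustrative numbers only)
scores <- data.frame(
  Gini        = c(10.2, 2.3, 44.1, 41.7),
  Permutation = c(0.05, 0.01, 0.31, 0.28),
  SHAP        = c(0.04, 0.02, 0.25, 0.30),
  row.names   = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
)

# Spearman correlation compares rankings, ignoring differences in scale
round(cor(scores, method = "spearman"), 2)

# Features every method places in its top half: robust candidates
ranks <- apply(-as.matrix(scores), 2, rank)
rownames(ranks)[rowSums(ranks <= 2) == ncol(ranks)]
```

High Spearman correlations mean the methods broadly agree on ordering; features that land in every method's top ranks are the safest to retain, while the rest deserve the deeper investigation described above.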

The best approach combines multiple methods. Start with fast, model-specific metrics for initial screening. Validate important features using permutation importance. For final model interpretation and communication, invest in SHAP values for their theoretical rigor and intuitive visualizations.

Feature importance isn’t just about numbers—it’s about understanding your model’s behavior and making informed decisions about your machine learning pipeline. Choose your methods based on your specific needs: speed for exploration, reliability for feature selection, or interpretability for stakeholder communication.
