R - Install and Load Packages


Key Insights

  • CRAN hosts over 19,000 packages; install.packages() installs them, and library() or require() loads them into the active session
  • Package management strategies such as version pinning with renv, automated dependency resolution, and careful namespace handling help prevent failures in production environments
  • Understanding the difference between attaching a package (via library()) and calling into its namespace without attaching it (via ::) makes dependencies explicit and reduces naming conflicts

Package Installation Fundamentals

R packages extend base functionality through collections of functions, data, and documentation. The primary installation source is CRAN (Comprehensive R Archive Network), accessed through install.packages().

# Basic installation from CRAN
install.packages("dplyr")

# Install multiple packages
install.packages(c("ggplot2", "tidyr", "readr"))

# Specify repository explicitly
install.packages("data.table", 
                 repos = "https://cloud.r-project.org")

# Install without dependencies (not recommended)
install.packages("shiny", dependencies = FALSE)

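The dependencies parameter controls what gets installed alongside your target package. The default, dependencies = NA, installs the packages listed under Depends, Imports, and LinkingTo. Setting it to TRUE additionally installs the Suggests packages of the target package, while the dependencies themselves still pull in only their own Depends, Imports, and LinkingTo packages.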

# Install with all dependencies including Suggests
install.packages("caret", dependencies = TRUE)

# Check where packages are installed
.libPaths()

# Install to specific library location
install.packages("forecast", 
                 lib = "~/R/custom_library")
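
As a rough illustration of how the dependencies argument maps to DESCRIPTION fields, the sketch below mirrors the behavior described in ?install.packages for the target package. The helper name dependency_fields() is hypothetical and not part of base R; it only returns field names and installs nothing.

```r
# Hypothetical helper (not part of base R): report which DESCRIPTION fields
# install.packages() would follow for the target package.
dependency_fields <- function(dependencies = NA) {
  hard <- c("Depends", "Imports", "LinkingTo")
  if (is.character(dependencies)) return(dependencies)   # explicit field names
  if (isTRUE(dependencies)) return(c(hard, "Suggests"))   # TRUE adds Suggests
  if (isFALSE(dependencies)) return(character(0))         # FALSE follows nothing
  hard                                                    # NA, the default
}

default_fields <- dependency_fields()      # Depends, Imports, LinkingTo
all_fields     <- dependency_fields(TRUE)  # the same three plus Suggests
```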

Loading Packages Into Your Session

Installing a package downloads it to disk; loading makes its functions available in your current R session.

# Standard loading method
library(dplyr)

# Load and check success with require()
if (!require(ggplot2)) {
  install.packages("ggplot2")
  library(ggplot2)
}

# Load from specific library location
library(forecast, lib.loc = "~/R/custom_library")

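# Suppress startup messages
# (quietly = TRUE only silences the attach confirmation, not a package's own startup messages)
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(sf))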

The library() function attaches the package to the search path, making all exported functions directly accessible. The require() function returns FALSE if loading fails instead of throwing an error, making it useful in conditional logic.
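
The difference in failure behavior can be seen with a package name that does not exist. This is a minimal sketch: the name notARealPackage12345 is a made-up placeholder assumed not to be installed, and requireNamespace() is shown as a way to test availability without attaching anything.

```r
missing_pkg <- "notARealPackage12345"  # hypothetical name, assumed not installed

# require() signals failure through its return value
ok <- require(missing_pkg, character.only = TRUE, quietly = TRUE)

# library() signals failure by throwing an error, captured here as a string
err <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    NULL
  },
  error = function(e) conditionMessage(e)
)

# requireNamespace() checks availability without attaching the package
has_stats <- requireNamespace("stats", quietly = TRUE)
```

In scripts that must not continue without a package, library() is usually the safer choice because a failure stops execution immediately.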

Namespace Management and the Double Colon Operator

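Using the :: operator calls a package function without attaching the package to the search path. R still loads the package's namespace on first use, so the benefit is not lower memory use but explicit provenance for each call and no masking of same-named functions from other packages.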

# Call function without loading package
dplyr::filter(mtcars, mpg > 20)

# Useful for one-off function calls
result <- purrr::map(1:10, sqrt)

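# Access non-exported (internal) functions
# ("package" and "hidden_function" are placeholders; internals can change without notice)
internal_result <- package:::hidden_function()

# Attaching vs. qualifying
library(data.table)  # Loads the namespace and attaches all exports to the search path
# vs
data.table::fread("file.csv")  # Loads the namespace but attaches nothing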

This approach is particularly valuable in package development and production scripts where you need explicit dependency tracking.
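
To check what :: actually does to the session, the sketch below calls a function from tools, a package that ships with R and is not attached in a default session, and then inspects the loaded namespaces and the search path.

```r
# Call a function through :: without library(tools)
ext <- tools::file_ext("report.csv")

# The namespace is now loaded ...
namespace_loaded <- "tools" %in% loadedNamespaces()

# ... but the package is not on the search path, so its exports mask nothing
attached <- "package:tools" %in% search()
```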

Managing Package Versions and Dependencies

Production environments require reproducible package states. The renv package creates project-specific package libraries with version locking.

# Initialize renv for project
install.packages("renv")
renv::init()

# Snapshot current package state
renv::snapshot()

# Restore packages from lockfile
renv::restore()

# Update specific package
renv::update("dplyr")

# Check package status
renv::status()

The renv.lock file stores exact package versions, ensuring consistent behavior across development, testing, and production environments.

# Install specific package version (without renv)
# install_version() is provided by remotes and re-exported by devtools
devtools::install_version("ggplot2", version = "3.3.5",
                          repos = "https://cloud.r-project.org")

# Check installed package version
packageVersion("dplyr")

# List all installed packages with versions
installed.packages()[, c("Package", "Version")]
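
When checking versions in code, it helps to know that packageVersion() returns a version object that is compared component by component, not as text. A small sketch of the difference, using only base R:

```r
v_old <- package_version("1.9.0")
v_new <- package_version("1.10.0")

# Version objects compare numerically per component: 10 > 9
numeric_cmp <- v_new > v_old

# Plain strings compare as text, which gets this case wrong
string_cmp <- "1.10.0" > "1.9.0"

# A string on the right-hand side is converted to a version before comparing
stats_ok <- packageVersion("stats") >= "3.0.0"
```

This is why a check such as packageVersion("dplyr") >= "1.0.0" is safe, while comparing as.character() results is not.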

Installing from Alternative Sources

Beyond CRAN, packages come from GitHub, Bioconductor, and local sources.

# Install from GitHub
install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

# Install specific branch or commit
devtools::install_github("user/repo@dev-branch")
devtools::install_github("user/repo@a1b2c3d")

# Install from Bioconductor
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GenomicRanges")

# Install from local source
install.packages("~/packages/mypackage_1.0.tar.gz", 
                 repos = NULL, type = "source")

# Install from local directory (development)
devtools::install_local("~/dev/mypackage")

Checking, Updating, and Removing Packages

Regular maintenance helps prevent compatibility issues and keeps security fixes current.

# Check for package updates
old.packages()

# Update all packages
update.packages(ask = FALSE)

# Update specific packages
update.packages(oldPkgs = c("dplyr", "ggplot2"))

# Remove package
remove.packages("unused_package")

# Check if package is installed
if (system.file(package = "dplyr") != "") {
  print("dplyr is installed")
}

# List attached packages
(.packages())

# Show the full search path (global environment, attached packages, base)
search()

# Detach package from session
detach("package:dplyr", unload = TRUE)
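
The effect of attaching and detaching can be verified against the search path. The sketch below uses splines only because it ships with R and is not attached by default; any installed package would behave the same way.

```r
library(splines)                                    # attach a package that ships with R
attached_before <- "package:splines" %in% search()

detach("package:splines", unload = TRUE)            # remove it from the search path
attached_after <- "package:splines" %in% search()

# The namespace can remain loaded if another loaded package imports it
still_loaded <- "splines" %in% loadedNamespaces()
```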

Automated Package Management in Scripts

Production scripts should handle package dependencies programmatically.

# Function to ensure packages are installed and loaded
ensure_packages <- function(packages) {
  new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
  if(length(new_packages)) {
    install.packages(new_packages, dependencies = TRUE)
  }
  invisible(lapply(packages, library, character.only = TRUE))
}

# Usage
required_packages <- c("dplyr", "ggplot2", "readr", "tidyr")
ensure_packages(required_packages)

# Alternative with pacman package
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, readr, tidyr)

# Load packages with error handling
safe_load <- function(pkg) {
  tryCatch({
    library(pkg, character.only = TRUE)
    return(TRUE)
  }, error = function(e) {
    message(sprintf("Failed to load %s: %s", pkg, e$message))
    return(FALSE)
  })
}

sapply(c("dplyr", "ggplot2"), safe_load)
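
Some deployments should never install packages at run time. For that case, a fail-fast check is one possible alternative to ensure_packages(). The helper name assert_packages() below is hypothetical, and note that requireNamespace() loads each namespace as a side effect of checking it.

```r
# Hypothetical helper: stop early with one clear message if any required
# package is unavailable, without installing or attaching anything.
assert_packages <- function(packages) {
  available <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  absent <- packages[!available]
  if (length(absent) > 0) {
    stop("Missing packages: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage with packages that ship with R
assert_packages(c("stats", "utils"))
```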

Package Installation in Docker and CI/CD

Containerized environments require deterministic package installation.

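# Dockerfile approach with renv (renv itself must be installed before restore)
# COPY renv.lock .
# RUN R -e "install.packages('renv', repos = 'https://cloud.r-project.org')"
# RUN R -e "renv::restore()"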

# Install packages with specific CRAN snapshot
options(repos = c(CRAN = "https://packagemanager.rstudio.com/cran/2023-09-01"))
install.packages(c("dplyr", "ggplot2"))

# Parallel installation for faster builds
install.packages(c("tidyverse", "data.table"), 
                 Ncpus = parallel::detectCores())

# Verify installation in CI
stopifnot(require(dplyr))
stopifnot(packageVersion("dplyr") >= "1.0.0")

Troubleshooting Common Issues

Package installation failures typically stem from missing system dependencies, permission issues, or network problems.

# Check compilation tools availability
pkgbuild::check_build_tools()

# Install with verbose output
install.packages("sf", verbose = TRUE)

# Clear downloaded package archives from the session's temp directory
unlink(list.files(tempdir(), pattern = "^downloaded_packages", 
                  full.names = TRUE), recursive = TRUE)

# Check package load errors
tryCatch(
  library(problematic_package),
  error = function(e) print(e)
)

# Reinstall package from source
install.packages("package_name", type = "source")

# Check for conflicts between packages
conflicts(detail = TRUE)
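
When a conflict is suspected for one particular name, it can be quicker to ask where that name is defined and which definition wins. A minimal sketch using filter, which stats exports and which dplyr masks when attached:

```r
# Every attached location that defines the name "filter"
locations <- find("filter")

# The namespace the bare name currently resolves to
# ("stats" in a default session; "dplyr" if dplyr has been attached since)
resolved_in <- environmentName(environment(filter))
```

If the bare name resolves to the wrong package, qualifying the call with :: removes the ambiguity.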

Understanding R’s package management system enables reliable, reproducible analytics workflows. Use renv for project isolation, prefer :: notation for clarity in production code, and implement automated dependency checks in deployment pipelines.
