R - S3 and S4 Classes (OOP)
Key Insights
- S3 classes use generic function dispatch based on class attributes, offering simple, flexible object-oriented programming with minimal overhead but no formal structure enforcement
- S4 classes provide formal class definitions with strict slot typing, method signatures, and validation, making them ideal for robust packages requiring type safety
- Choosing between S3 and S4 depends on your needs: use S3 for quick prototyping and internal tools, S4 for production packages requiring maintainability and type guarantees
Understanding R’s Object Systems
R implements object-oriented programming differently than languages like Java or Python. Instead of methods belonging to objects, R uses generic functions that dispatch to appropriate methods based on object class. This functional OOP approach comes in two primary flavors: S3 (simple) and S4 (formal).
S3 takes its name from version 3 of the S language, where it was introduced, and it remains R’s most common OOP system because of its simplicity. S4 arrived later, with version 4 of S, to address S3’s lack of formal structure by providing rigorous class definitions with type checking.
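To make the generic-function model concrete with nothing but base R: summary() is a generic whose body is just UseMethod("summary"), and the class of its argument decides which method runs. A minimal sketch; the variable names are illustrative, while summary.default() and summary.factor() are base R’s own methods.

```r
# summary() is an S3 generic: its body is just UseMethod("summary")
x_num <- c(2.5, 4, 7.5)
x_fac <- factor(c("low", "high", "low"))

class(x_num)  # "numeric"
class(x_fac)  # "factor"

# Same call, different method, chosen by the class of the argument
s_num <- summary(x_num)  # dispatched to summary.default(): quantiles and mean
s_fac <- summary(x_fac)  # dispatched to summary.factor(): counts per level

print(s_num)
print(s_fac)
```

The object never "owns" summary(); the generic looks up a function named after its class.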
S3 Classes: Simplicity and Flexibility
An S3 object is just a base R object, most often a list, with a class attribute. Creating one requires no formal class definition: set the class attribute and you’re done.
# Create a simple S3 class
create_person <- function(name, age) {
  person <- list(
    name = name,
    age = age
  )
  class(person) <- "Person"
  return(person)
}
john <- create_person("John Doe", 30)
print(john)
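Nothing requires the underlying object to be a list. Base R’s factor, for example, is an integer vector carrying the class attribute "factor". The sketch below attaches a hypothetical "temperature" class (plus an illustrative "unit" attribute, not part of any package) to a plain numeric vector:

```r
# A hypothetical S3 class layered on a plain numeric vector
temps <- structure(c(21.5, 19.0, 23.2), class = "temperature", unit = "C")

class(temps)                    # "temperature"
typeof(temps)                   # "double": the base type is unchanged
inherits(temps, "temperature")  # TRUE

# unclass() strips only the class attribute; "unit" is kept
plain <- unclass(temps)

# factor follows the same idea in base R: integer codes plus a class
f <- factor(c("a", "b", "a"))
typeof(f)  # "integer"
class(f)   # "factor"
```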
Define methods by creating functions with the pattern generic.class:
# Define a print method for Person class
print.Person <- function(x, ...) {
  cat("Person:", x$name, "\n")
  cat("Age:", x$age, "years old\n")
  invisible(x)  # print methods conventionally return their input invisibly
}
# Define a summary method
summary.Person <- function(object, ...) {
  cat("Summary for", object$name, "\n")
  cat("Age category:",
      if (object$age < 18) "Minor" else "Adult", "\n")
}
print(john)
summary(john)
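The summary method above prints directly. Base R’s own summary methods, such as summary.lm(), instead return an object of class "summary.<class>" and leave the display to a matching print method, so the result can also be stored and used in code. A sketch of that convention for Person; these summary.Person and print.summary.Person bodies are an alternative to the version above, not the original author’s code.

```r
# Same constructor as above, written compactly
create_person <- function(name, age) {
  structure(list(name = name, age = age), class = "Person")
}

# summary() computes and returns a result; it does not print
summary.Person <- function(object, ...) {
  structure(
    list(name = object$name,
         category = if (object$age < 18) "Minor" else "Adult"),
    class = "summary.Person"
  )
}

# print() for the summary object handles the display
print.summary.Person <- function(x, ...) {
  cat("Summary for", x$name, "\n")
  cat("Age category:", x$category, "\n")
  invisible(x)
}

john <- create_person("John Doe", 30)
s <- summary(john)  # nothing printed yet
s                   # auto-printing dispatches to print.summary.Person()
s$category          # the result is also usable programmatically
```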
Create your own generic functions when needed:
# Create a custom generic
celebrate_birthday <- function(x, ...) {
  UseMethod("celebrate_birthday")
}
# Implement method for Person class
celebrate_birthday.Person <- function(x, ...) {
  x$age <- x$age + 1
  cat(x$name, "is now", x$age, "years old!\n")
  return(x)
}
# Default method for other classes
celebrate_birthday.default <- function(x, ...) {
  cat("Don't know how to celebrate a birthday for this object\n")
  invisible(x)  # return the input unchanged so reassigning the result never yields NULL
}
john <- celebrate_birthday(john)
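Why reassign with john <- celebrate_birthday(john)? S3 objects are ordinary R values with copy-on-modify semantics, so the method changes a local copy and the caller’s object is untouched unless the result is assigned back. A minimal sketch; the constructor and generic are redefined compactly (without the message) so the block runs on its own.

```r
create_person <- function(name, age) {
  structure(list(name = name, age = age), class = "Person")
}

celebrate_birthday <- function(x, ...) UseMethod("celebrate_birthday")

celebrate_birthday.Person <- function(x, ...) {
  x$age <- x$age + 1  # modifies the function's local copy only
  x
}

john <- create_person("John Doe", 30)

celebrate_birthday(john)          # returns an updated copy, printed then discarded
age_without_assign <- john$age    # still 30

john <- celebrate_birthday(john)  # keep the modified copy
age_with_assign <- john$age       # now 31
```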
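S3 inheritance works through the class attribute itself: give it a character vector of class names, ordered from most to least specific. Dispatch tries each class in turn, and NextMethod() passes control to the next one: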
create_employee <- function(name, age, employee_id, department) {
  emp <- create_person(name, age)
  emp$employee_id <- employee_id
  emp$department <- department
  class(emp) <- c("Employee", "Person")
  return(emp)
}
print.Employee <- function(x, ...) {
  NextMethod()  # Call print.Person
  cat("Employee ID:", x$employee_id, "\n")
  cat("Department:", x$department, "\n")
  invisible(x)
}
jane <- create_employee("Jane Smith", 28, "EMP001", "Engineering")
print(jane)
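Because dispatch walks the class vector in order, any Person method that Employee does not override is reused automatically, and inherits() tests membership anywhere in that vector. A short sketch with a hypothetical greet() generic that is not part of the original example; the constructors are repeated so the block runs on its own.

```r
create_person <- function(name, age) {
  structure(list(name = name, age = age), class = "Person")
}
create_employee <- function(name, age, employee_id, department) {
  emp <- create_person(name, age)
  emp$employee_id <- employee_id
  emp$department <- department
  class(emp) <- c("Employee", "Person")
  emp
}

# Hypothetical generic with a method defined only for Person
greet <- function(x, ...) UseMethod("greet")
greet.Person <- function(x, ...) paste("Hello,", x$name)

jane <- create_employee("Jane Smith", 28, "EMP001", "Engineering")

greet(jane)               # no greet.Employee, so greet.Person is used
inherits(jane, "Person")  # TRUE
inherits(jane, c("Person", "Employee"), which = TRUE)  # 2 1: positions in class(jane)
```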
S4 Classes: Formal and Type-Safe
S4 requires explicit class definitions using setClass(), with named slots that have specified types:
library(methods)
# Define S4 class with formal structure
setClass("PersonS4",
  slots = c(
    name = "character",
    age = "numeric",
    email = "character"
  ),
  prototype = list(
    name = NA_character_,
    age = NA_real_,
    email = NA_character_
  )
)
# Create instance using new()
person_s4 <- new("PersonS4",
  name = "Alice Johnson",
  age = 35,
  email = "alice@example.com")
# Access slots with @ operator
person_s4@name
person_s4@age
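Slot types are enforced whenever a slot is set, not only in new(). In the minimal sketch below, assigning a character string to the numeric age slot through @<- raises an error, and slotNames() lists the declared slots. The class is redefined without the prototype so the block runs standalone.

```r
library(methods)

setClass("PersonS4",
  slots = c(name = "character", age = "numeric", email = "character")
)

p <- new("PersonS4", name = "Alice Johnson", age = 35, email = "alice@example.com")

slotNames("PersonS4")  # "name" "age" "email"

# A numeric value is accepted for the numeric slot
p@age <- 36

# A character value is rejected: @<- checks the slot's declared class
res <- tryCatch({
  p@age <- "thirty-six"
  "assigned"
}, error = function(e) "type error")
res
```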
Add validation to ensure data integrity:
setClass("PersonS4",
  slots = c(
    name = "character",
    age = "numeric",
    email = "character"
  ),
  validity = function(object) {
    errors <- character()
    # Check length and NA first so the later comparisons never see NA
    if (length(object@name) != 1 || is.na(object@name) || !nzchar(object@name)) {
      errors <- c(errors, "Name must be a single non-empty string")
    }
    if (length(object@age) != 1 || is.na(object@age) ||
        object@age < 0 || object@age > 150) {
      errors <- c(errors, "Age must be a single number between 0 and 150")
    }
    if (length(object@email) != 1 || is.na(object@email) || !grepl("@", object@email)) {
      errors <- c(errors, "Email must be a single string containing @")
    }
    if (length(errors) == 0) TRUE else errors
  }
)
# This will fail validation
tryCatch(
  new("PersonS4", name = "Bob", age = -5, email = "invalid"),
  error = function(e) cat("Validation error:", e$message, "\n")
)
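One caveat about validity functions: they run in new() (through the default initialize() method) and in explicit validObject() calls, but not on @<-, which only checks the slot’s type. A minimal sketch with a cut-down validity function:

```r
library(methods)

setClass("PersonS4",
  slots = c(name = "character", age = "numeric"),
  validity = function(object) {
    if (length(object@age) != 1 || is.na(object@age) || object@age < 0) {
      return("Age must be a single non-negative number")
    }
    TRUE
  }
)

p <- new("PersonS4", name = "Bob", age = 40)  # passes validity

p@age <- -5  # type is fine (numeric), so no error: validity is NOT re-run

# An explicit validObject() call catches the invalid state
check <- tryCatch(validObject(p), error = function(e) conditionMessage(e))
check
```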
S4 methods use setMethod() with formal generic functions:
# Define methods using setMethod
setMethod("show", "PersonS4", function(object) {
  cat("PersonS4 object\n")
  cat("Name:", object@name, "\n")
  cat("Age:", object@age, "\n")
  cat("Email:", object@email, "\n")
})
# Create custom generic and method
setGeneric("getAgeCategory", function(x) standardGeneric("getAgeCategory"))
setMethod("getAgeCategory", "PersonS4", function(x) {
  age <- x@age
  if (age < 18) return("Minor")
  if (age < 65) return("Adult")
  return("Senior")
})
person_s4 <- new("PersonS4",
  name = "Alice Johnson",
  age = 35,
  email = "alice@example.com")
show(person_s4)
getAgeCategory(person_s4)
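Since @<- skips validity checks, S4 code often exposes getter and replacement generics instead of direct slot access, with the setter calling validObject(). A sketch assuming hypothetical age() and age<- generics (names chosen for illustration); setReplaceMethod("age", ...) is shorthand for setMethod("age<-", ...).

```r
library(methods)

setClass("PersonS4",
  slots = c(name = "character", age = "numeric"),
  validity = function(object) {
    if (length(object@age) != 1 || is.na(object@age) || object@age < 0) {
      return("Age must be a single non-negative number")
    }
    TRUE
  }
)

# Getter generic and method
setGeneric("age", function(x) standardGeneric("age"))
setMethod("age", "PersonS4", function(x) x@age)

# Replacement generic: its last argument must be named `value`
setGeneric("age<-", function(x, value) standardGeneric("age<-"))
setReplaceMethod("age", "PersonS4", function(x, value) {
  x@age <- value
  validObject(x)  # re-run the class's validity checks
  x
})

p <- new("PersonS4", name = "Alice Johnson", age = 35)
age(p) <- 36  # goes through the setter
age(p)

bad <- tryCatch({ age(p) <- -1; "accepted" },
                error = function(e) "rejected")
bad
```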
S4 inheritance uses the contains parameter:
setClass("EmployeeS4",
  contains = "PersonS4",
  slots = c(
    employee_id = "character",
    department = "character",
    salary = "numeric"
  ),
  validity = function(object) {
    errors <- character()
    # any(..., na.rm = TRUE) also copes with an unset (length-zero) or NA salary
    if (any(object@salary < 0, na.rm = TRUE)) {
      errors <- c(errors, "Salary must be non-negative")
    }
    if (length(errors) == 0) TRUE else errors
  }
)
setMethod("show", "EmployeeS4", function(object) {
  callNextMethod()  # Call PersonS4 show method
  cat("Employee ID:", object@employee_id, "\n")
  cat("Department:", object@department, "\n")
  cat("Salary: $", format(object@salary, big.mark = ","), "\n", sep = "")
})
emp_s4 <- new("EmployeeS4",
  name = "Bob Wilson",
  age = 42,
  email = "bob@company.com",
  employee_id = "EMP002",
  department = "Sales",
  salary = 75000)
show(emp_s4)
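With contains, the subclass inherits the parent’s slots, its methods, and its validity checks: validObject() also runs the parent class’s validity function (superclass checks come first). A minimal sketch with cut-down versions of both classes and the getAgeCategory generic, so it runs on its own:

```r
library(methods)

setClass("PersonS4",
  slots = c(name = "character", age = "numeric"),
  validity = function(object) {
    if (length(object@age) == 1 && !is.na(object@age) && object@age < 0) {
      return("Age must be non-negative")
    }
    TRUE
  }
)
setClass("EmployeeS4", contains = "PersonS4",
  slots = c(salary = "numeric"))

setGeneric("getAgeCategory", function(x) standardGeneric("getAgeCategory"))
setMethod("getAgeCategory", "PersonS4", function(x) {
  if (x@age < 18) return("Minor")
  if (x@age < 65) return("Adult")
  "Senior"
})

emp <- new("EmployeeS4", name = "Bob Wilson", age = 42, salary = 75000)

is(emp, "PersonS4")  # TRUE: an EmployeeS4 is a PersonS4
getAgeCategory(emp)  # "Adult", via the inherited PersonS4 method

# The parent's validity function also guards the subclass
bad <- tryCatch(new("EmployeeS4", name = "X", age = -1, salary = 1),
                error = function(e) "rejected")
bad
```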
Practical Comparison: Building a Data Container
Here’s a small data-container class for analysis work, written both ways:
# S3 approach
create_dataset_s3 <- function(data, name, description = "") {
  obj <- list(
    data = as.data.frame(data),
    name = name,
    description = description,
    created = Sys.time()
  )
  class(obj) <- "DatasetS3"
  obj
}
summary.DatasetS3 <- function(object, ...) {
  cat("Dataset:", object$name, "\n")
  cat("Rows:", nrow(object$data), "Columns:", ncol(object$data), "\n")
  cat("Created:", format(object$created), "\n")
}
# S4 approach
setClass("DatasetS4",
  slots = c(
    data = "data.frame",
    name = "character",
    description = "character",
    created = "POSIXct"
  ),
  prototype = list(
    description = "",
    created = Sys.time()  # evaluated once, when setClass() runs, not per object
  ),
  validity = function(object) {
    if (nrow(object@data) == 0) {
      return("Dataset cannot be empty")
    }
    TRUE
  }
)
setMethod("summary", "DatasetS4", function(object, ...) {
  cat("Dataset:", object@name, "\n")
  cat("Rows:", nrow(object@data), "Columns:", ncol(object@data), "\n")
  cat("Created:", format(object@created), "\n")
})
# Usage comparison
ds_s3 <- create_dataset_s3(mtcars, "Motor Trends", "Car data")
# Pass created explicitly; otherwise every object shares the prototype's timestamp
ds_s4 <- new("DatasetS4", data = mtcars, name = "Motor Trends",
  description = "Car data", created = Sys.time())
summary(ds_s3)
summary(ds_s4)
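The prototype’s created = Sys.time() is evaluated once, when setClass() runs, so every DatasetS4 built without an explicit created value shares that single timestamp. A common remedy is a user-facing constructor that wraps new(), mirroring create_dataset_s3(). A sketch that keeps the class definition above and adds a hypothetical dataset_s4() constructor:

```r
library(methods)

setClass("DatasetS4",
  slots = c(
    data = "data.frame",
    name = "character",
    description = "character",
    created = "POSIXct"
  ),
  prototype = list(description = "", created = Sys.time()),
  validity = function(object) {
    if (nrow(object@data) == 0) return("Dataset cannot be empty")
    TRUE
  }
)

# Hypothetical constructor: coercion and defaults live here, and the
# timestamp is taken at call time rather than at class-definition time
dataset_s4 <- function(data, name, description = "") {
  new("DatasetS4",
      data = as.data.frame(data),
      name = name,
      description = description,
      created = Sys.time())
}

t0 <- Sys.time()
ds <- dataset_s4(mtcars, "Motor Trends", "Car data")

# Validity still applies: a zero-row data frame is rejected
empty <- tryCatch(dataset_s4(mtcars[0, ], "Empty"),
                  error = function(e) "rejected")
```

The constructor also gives users a friendlier entry point than new(), and it is the natural home for coercions such as as.data.frame(), just as in create_dataset_s3().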
When to Use Each System
Use S3 when you need quick prototyping, internal tools, or simple data structures. S3’s minimal overhead makes it well suited to exploratory work and to situations where flexibility matters more than strict contracts. Base R itself is built on S3: data frames, factors, lm model objects, and generics such as print() and summary() all use it.
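Use S4 when building production packages, working with complex class hierarchies, or needing enforced slot types. S4’s formal structure catches many errors when an object is created (a slot of the wrong type, a failed validity check) rather than later at the point of use, and explicit class definitions make code easier to maintain. Bioconductor packages predominantly use S4.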
For modern R development, also consider R6 classes (mutable objects with reference semantics) or the newer S7 system. Even so, S3 and S4 remain the foundation of R’s object-oriented programming, and understanding them is essential for working with the R ecosystem.