R - Arrays with Examples

Key Insights

  • Arrays in R extend vectors and matrices to multiple dimensions, enabling efficient storage and manipulation of multi-dimensional data structures with consistent data types
  • R provides comprehensive indexing capabilities including numeric, logical, and named indexing across all dimensions, plus powerful subsetting operations for complex data extraction
  • Array operations combine R’s vectorized arithmetic with the apply family of functions, so computations across dimensions rarely need explicit loops

Understanding Arrays in R

Arrays are homogeneous data structures that generalize vectors and matrices to any number of dimensions. A vector has one dimension, a matrix has two, and an array can have three or more. All elements must be of the same data type.

# Creating a basic 3D array
arr <- array(1:24, dim = c(4, 3, 2))
print(arr)

# Check structure
str(arr)
# int [1:4, 1:3, 1:2] 1 2 3 4 5 6 7 8 9 10 ...

# Dimensions
dim(arr)
# [1] 4 3 2

The array fills by column within each matrix slice. The first dimension varies fastest, followed by the second, then the third.
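To make the fill order concrete, here is a minimal sketch on a small array. The name `a` and the index arithmetic are illustrative additions, not part of the original example.

```r
# A small 2 x 3 x 2 array makes the fill order easy to trace
a <- array(1:12, dim = c(2, 3, 2))

# Values 1 and 2 fill the first column of the first slice,
# 3 and 4 fill the second column, and so on
a[, 1, 1]   # 1 2
a[, 2, 1]   # 3 4
a[1, 1, 2]  # 7, the first element of the second slice

# Position [i, j, k] corresponds to linear index
# i + (j - 1) * nrow + (k - 1) * nrow * ncol
i <- 2; j <- 3; k <- 2
linear_idx <- i + (j - 1) * 2 + (k - 1) * 2 * 3
a[i, j, k] == a[linear_idx]  # TRUE
```

Because the first dimension varies fastest, a single linear index can address any element, which is what the last line checks.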

Creating Arrays with Different Data Types

# Numeric array (two dimensions, so R also treats it as a matrix)
numeric_arr <- array(seq(1, 100, by = 5), dim = c(5, 4))

# Character array
char_arr <- array(LETTERS[1:12], dim = c(2, 3, 2))
print(char_arr)

# Logical array (the two values are recycled to fill all 18 positions)
logical_arr <- array(c(TRUE, FALSE), dim = c(3, 3, 2))

# Initialize with specific value
zero_arr <- array(0, dim = c(3, 4, 2))

# From existing vectors
vec1 <- 1:12
vec2 <- 13:24
combined_arr <- array(c(vec1, vec2), dim = c(3, 4, 2))
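Because an array holds a single data type, mixing types in the input does not produce an error. R coerces everything to the most general type instead. The sketch below illustrates this; the names `mixed_arr` and `num_arr` are hypothetical.

```r
# Mixing numbers, strings, and logicals: everything becomes character
mixed_arr <- array(c(1, "a", TRUE, 2.5), dim = c(2, 2))
typeof(mixed_arr)  # "character"

# Mixing logical and numeric values: logicals become 1 and 0
num_arr <- array(c(TRUE, FALSE, 3, 4), dim = c(2, 2))
typeof(num_arr)    # "double"
num_arr[1, 1]      # 1
```

Checking `typeof()` after construction is a cheap way to catch an accidental coercion to character.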

Naming Dimensions and Elements

Dimension names improve code readability and enable name-based indexing.

# Create array with dimension names
sales <- array(
  data = c(100, 150, 200, 120, 180, 220, 
           110, 160, 210, 130, 190, 230),
  dim = c(2, 3, 2),
  dimnames = list(
    Region = c("North", "South"),
    Quarter = c("Q1", "Q2", "Q3"),
    Year = c("2023", "2024")
  )
)

print(sales)

# Access dimension names
dimnames(sales)

# Get specific dimension names
dimnames(sales)[[1]]  # Region names
names(dimnames(sales))  # Dimension labels

Indexing and Subsetting Arrays

Array indexing follows the pattern [dim1, dim2, dim3, ...]. Empty positions return all elements in that dimension.

# Create sample array
arr <- array(1:24, dim = c(4, 3, 2))

# Single element
arr[2, 3, 1]  # 10

# Entire row from first matrix
arr[1, , 1]  # Returns vector

# Entire column from second matrix
arr[, 2, 2]  # Returns vector

# Specific matrix slice
arr[, , 1]  # First 4x3 matrix

# Multiple elements
arr[1:2, 1:2, 1]  # 2x2 subarray

# Using negative indices (exclude)
arr[-1, , 1]  # All but first row

# Logical indexing
arr[arr > 15]  # Returns vector of matching elements

# Named indexing
sales["North", "Q1", "2023"]
sales["North", , "2023"]  # All quarters for North in 2023
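Several of the subsets above come back as plain vectors because R drops dimensions of length one by default. If later code expects an array, the `drop = FALSE` argument keeps them. A brief sketch, using a separate illustrative array named `demo_arr`:

```r
demo_arr <- array(1:24, dim = c(4, 3, 2))

# Default behaviour: dimensions of length one are dropped
dim(demo_arr[1, , 1])                # NULL, a plain vector of length 3

# drop = FALSE keeps all three dimensions
dim(demo_arr[1, , 1, drop = FALSE])  # 1 3 1

# A single slice is a matrix by default and a 3D array with drop = FALSE
dim(demo_arr[, , 1])                 # 4 3
dim(demo_arr[, , 1, drop = FALSE])   # 4 3 1
```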

Modifying Array Elements

arr <- array(1:24, dim = c(4, 3, 2))

# Modify single element
arr[1, 1, 1] <- 100

# Modify entire slice
arr[, , 1] <- arr[, , 1] * 2

# Modify using logical condition
arr[arr < 10] <- 0

# Replace specific row across all slices
arr[1, , ] <- 999

# Conditional replacement
arr <- array(1:24, dim = c(4, 3, 2))
arr[arr %% 2 == 0] <- arr[arr %% 2 == 0] * 10  # Multiply even numbers by 10

Array Operations and Arithmetic

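R’s vectorization applies to arrays, enabling element-wise operations. When both operands are arrays, they must have the same dimensions; otherwise R stops with a "non-conformable arrays" error.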

arr1 <- array(1:12, dim = c(3, 2, 2))
arr2 <- array(13:24, dim = c(3, 2, 2))

# Element-wise operations
sum_arr <- arr1 + arr2
diff_arr <- arr1 - arr2
prod_arr <- arr1 * arr2
quot_arr <- arr2 / arr1

# Scalar operations
scaled_arr <- arr1 * 10
shifted_arr <- arr1 + 100

# Mathematical functions
sqrt_arr <- sqrt(arr1)
log_arr <- log(arr1)
exp_arr <- exp(arr1)

# Comparison operations
comparison <- arr1 > 5  # Returns logical array

Apply Functions Across Dimensions

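The apply() function is central to array operations. Its MARGIN argument names the dimensions to keep; the function is applied over all remaining dimensions, so the result has one value (or one vector) per combination of the kept dimensions.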

arr <- array(1:24, dim = c(4, 3, 2))

# Sum across first dimension (rows)
apply(arr, c(2, 3), sum)  # Returns 3x2 matrix

# Mean across second dimension (columns)
apply(arr, c(1, 3), mean)  # Returns 4x2 matrix

# Max across third dimension
apply(arr, c(1, 2), max)  # Returns 4x3 matrix

# Custom function
apply(arr, 3, function(x) sum(x^2))

# Multiple statistics
apply(arr, 3, summary)
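The shape of an apply() result follows from the dimensions listed in MARGIN. The sketch below checks this on a small array; `demo_arr`, `slice_sums`, and `col_sums` are illustrative names.

```r
demo_arr <- array(1:24, dim = c(4, 3, 2))

# MARGIN = 3: one value per slice
slice_sums <- apply(demo_arr, 3, sum)
slice_sums      # 78 222

# MARGIN = c(2, 3): one value per (column, slice) pair, a 3 x 2 matrix
col_sums <- apply(demo_arr, c(2, 3), sum)
dim(col_sums)   # 3 2
col_sums[1, 1]  # 10, the sum of demo_arr[, 1, 1] (1 + 2 + 3 + 4)
```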

Practical Example: Multi-Year Sales Analysis

# Create sales data: Products x Regions x Years
products <- c("Product_A", "Product_B", "Product_C")
regions <- c("East", "West", "North", "South")
years <- c("2022", "2023", "2024")

set.seed(123)
sales_data <- array(
  data = sample(1000:5000, 36),
  dim = c(3, 4, 3),
  dimnames = list(
    Product = products,
    Region = regions,
    Year = years
  )
)

# Total sales by year
yearly_totals <- apply(sales_data, 3, sum)
print(yearly_totals)

# Average sales by product across all regions and years
product_avg <- apply(sales_data, 1, mean)
print(product_avg)

# Best performing region per year
best_region_per_year <- apply(sales_data, 3, function(year_data) {
  region_totals <- apply(year_data, 2, sum)
  names(which.max(region_totals))
})
print(best_region_per_year)

# Year-over-year growth by product
yoy_growth <- function(product_idx) {
  product_sales <- sales_data[product_idx, , ]
  yearly_sales <- apply(product_sales, 2, sum)
  growth <- diff(yearly_sales) / yearly_sales[-length(yearly_sales)] * 100
  names(growth) <- paste(years[-length(years)], "to", years[-1])
  growth
}

sapply(1:3, yoy_growth)

Reshaping and Transforming Arrays

arr <- array(1:24, dim = c(4, 3, 2))

# Convert to vector
as.vector(arr)

# Convert to a 4 x 6 matrix (the two slices are placed side by side)
matrix(arr, nrow = 4)

# Transpose dimensions
aperm(arr, c(2, 1, 3))  # Swap first two dimensions

# Reshape with different dimensions
dim(arr) <- c(2, 6, 2)  # Must preserve total elements
print(arr)

# Convert to data frame for analysis
df <- as.data.frame.table(arr)
names(df) <- c("Dim1", "Dim2", "Dim3", "Value")
head(df)
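Reassigning dim() and calling aperm() differ in one important way: the first only relabels the shape, while the second actually moves elements. A small sketch of the difference, with `demo_arr`, `perm`, and `reshaped` as illustrative names:

```r
demo_arr <- array(1:24, dim = c(4, 3, 2))

# aperm() swaps the first two dimensions, so [i, j, k] moves to [j, i, k]
perm <- aperm(demo_arr, c(2, 1, 3))
dim(perm)                            # 3 4 2
perm[3, 2, 1] == demo_arr[2, 3, 1]   # TRUE

# Assigning dim() changes the shape but keeps the underlying order of values
reshaped <- demo_arr
dim(reshaped) <- c(2, 6, 2)
identical(as.vector(reshaped), as.vector(demo_arr))  # TRUE

# aperm() reorders the underlying values
identical(as.vector(perm), as.vector(demo_arr))      # FALSE
```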

Performance Considerations

# Pre-allocate arrays for better performance
n <- 1000
result <- array(0, dim = c(n, n, 10))

# Vectorized operations are faster than loops
arr1 <- array(rnorm(1000000), dim = c(100, 100, 100))  # Create the data outside the timer
system.time({
  arr2 <- arr1 * 2 + 5  # Vectorized: one expression covers every element
})

# apply() replaces nested loops with one call (it still loops internally, so the gain is mainly readability)
arr <- array(1:1000000, dim = c(100, 100, 100))

system.time({
  result <- apply(arr, c(1, 2), mean)
})
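For plain sums and means, base R also offers rowSums(), rowMeans(), colSums(), and colMeans(), which accept a `dims` argument and are usually faster than apply() for this kind of reduction. The sketch below only checks that the two approaches agree; `demo_arr`, `via_apply`, and `via_rowmeans` are illustrative names, and actual timings depend on the machine.

```r
demo_arr <- array(1:1000000, dim = c(100, 100, 100))

# Mean over the third dimension for every (row, column) pair
via_apply <- apply(demo_arr, c(1, 2), mean)

# dims = 2 treats the first two dimensions as "rows" and averages over the rest
via_rowmeans <- rowMeans(demo_arr, dims = 2)

dim(via_rowmeans)                   # 100 100
all.equal(via_apply, via_rowmeans)  # TRUE
```

Wrapping each call in system.time() is a simple way to compare the two on your own data.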

# Subset only the slices you need (subsetting copies the selected data)
large_arr <- array(rnorm(10000000), dim = c(100, 100, 1000))
first_slices <- large_arr[, , 1:10]  # 100 x 100 x 10 copy; the name avoids masking base::subset()

Arrays provide essential functionality for multi-dimensional data analysis in R. Their integration with vectorization and the apply family makes them powerful tools for statistical computing, scientific simulations, and data transformations requiring more than two dimensions.
