R - Arrays with Examples
Key Insights
- Arrays in R extend vectors and matrices to any number of dimensions while storing elements of a single data type
- Every dimension can be indexed by position, by logical condition, or by name, and leaving a position empty selects that whole dimension
- Arithmetic on arrays is vectorized, and apply() summarizes across chosen dimensions without hand-written nested loops
Understanding Arrays in R
Arrays are homogeneous data structures that generalize vectors and matrices to any number of dimensions. A vector is one-dimensional, a matrix is a two-dimensional array, and an array can have three or more dimensions. All elements must be of the same data type.
# Creating a basic 3D array
arr <- array(1:24, dim = c(4, 3, 2))
print(arr)
# Check structure
str(arr)
# int [1:4, 1:3, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
# Dimensions
dim(arr)
# [1] 4 3 2
The array fills by column within each matrix slice. The first dimension varies fastest, followed by the second, then the third.
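The fill order described above can be checked by converting a three-part subscript into a position in the underlying vector. This is a minimal sketch; `linear_index()` is a hypothetical helper written for illustration, not a base R function.

```r
# Hypothetical helper: position of [i, j, k] in the underlying vector,
# given that the first dimension varies fastest.
linear_index <- function(i, j, k, dims) {
  i + (j - 1) * dims[1] + (k - 1) * dims[1] * dims[2]
}

arr <- array(1:24, dim = c(4, 3, 2))

pos <- linear_index(2, 3, 1, dim(arr))
pos                          # 10
arr[2, 3, 1] == arr[pos]     # TRUE: both refer to the same element
```

Because the array was filled with 1:24, each element's value equals its position, which makes the mapping easy to verify by eye.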
Creating Arrays with Different Data Types
# Numeric array
numeric_arr <- array(seq(1, 100, by = 5), dim = c(5, 4))
# Character array
char_arr <- array(LETTERS[1:12], dim = c(2, 3, 2))
print(char_arr)
# Logical array (the two values are recycled to fill all 18 cells)
logical_arr <- array(c(TRUE, FALSE), dim = c(3, 3, 2))
# Initialize with specific value
zero_arr <- array(0, dim = c(3, 4, 2))
# From existing vectors
vec1 <- 1:12
vec2 <- 13:24
combined_arr <- array(c(vec1, vec2), dim = c(3, 4, 2))
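Since an array holds a single data type, mixing types in the input does not fail; R coerces everything to the most general type present. The following sketch illustrates this with two small arrays whose names (`mixed_arr`, `num_lgl`) are chosen here for illustration only.

```r
# Numbers, a string, and a logical: everything becomes character
mixed_arr <- array(c(1, "a", TRUE, 2.5), dim = c(2, 2))
typeof(mixed_arr)    # "character"

# Integers and logicals: logicals become 1L / 0L
num_lgl <- array(c(1L, TRUE, FALSE, 4L), dim = c(2, 2))
typeof(num_lgl)      # "integer"
num_lgl
```

Checking typeof() after building an array from mixed sources is a cheap way to catch numbers that were silently turned into strings.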
Naming Dimensions and Elements
Dimension names improve code readability and enable name-based indexing.
# Create array with dimension names
sales <- array(
  data = c(100, 150, 200, 120, 180, 220,
           110, 160, 210, 130, 190, 230),
  dim = c(2, 3, 2),
  dimnames = list(
    Region = c("North", "South"),
    Quarter = c("Q1", "Q2", "Q3"),
    Year = c("2023", "2024")
  )
)
print(sales)
# Access dimension names
dimnames(sales)
# Get specific dimension names
dimnames(sales)[[1]] # Region names
names(dimnames(sales)) # Dimension labels
Indexing and Subsetting Arrays
Array indexing follows the pattern [dim1, dim2, dim3, ...]. Empty positions return all elements in that dimension.
# Create sample array
arr <- array(1:24, dim = c(4, 3, 2))
# Single element
arr[2, 3, 1] # 10
# Entire row from first matrix
arr[1, , 1] # Returns vector
# Entire column from second matrix
arr[, 2, 2] # Returns vector
# Specific matrix slice
arr[, , 1] # First 4x3 matrix
# Multiple elements
arr[1:2, 1:2, 1] # 2x2 subarray
# Using negative indices (exclude)
arr[-1, , 1] # All but first row
# Logical indexing
arr[arr > 15] # Returns vector of matching elements
# Named indexing (uses the sales array created in the previous section)
sales["North", "Q1", "2023"] # 100
sales["North", , "2023"] # All quarters for North in 2023
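The results marked "Returns vector" above arise because R drops dimensions of length one by default. When later code expects an array, `drop = FALSE` keeps the shape, and `which(..., arr.ind = TRUE)` recovers the positions that a logical index would otherwise discard. A short sketch, with variable names chosen for illustration:

```r
arr <- array(1:24, dim = c(4, 3, 2))

# Default behaviour: length-one dimensions are dropped
row_vec <- arr[1, , 1]
dim(row_vec)                       # NULL

# drop = FALSE keeps all three dimensions
row_arr <- arr[1, , 1, drop = FALSE]
dim(row_arr)                       # 1 3 1

# Positions of the matching elements, one row per match
hits <- which(arr > 22, arr.ind = TRUE)
hits

# A matrix of subscripts can itself be used as an index
arr[hits]                          # 23 24
```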
Modifying Array Elements
arr <- array(1:24, dim = c(4, 3, 2))
# Modify single element
arr[1, 1, 1] <- 100
# Modify entire slice
arr[, , 1] <- arr[, , 1] * 2
# Modify using logical condition
arr[arr < 10] <- 0
# Replace specific row across all slices
arr[1, , ] <- 999
# Conditional replacement
arr <- array(1:24, dim = c(4, 3, 2))
arr[arr %% 2 == 0] <- arr[arr %% 2 == 0] * 10 # Multiply even numbers by 10
Array Operations and Arithmetic
R’s vectorization applies to arrays: arithmetic, comparisons, and math functions operate element by element on arrays with matching dimensions.
arr1 <- array(1:12, dim = c(3, 2, 2))
arr2 <- array(13:24, dim = c(3, 2, 2))
# Element-wise operations
sum_arr <- arr1 + arr2
diff_arr <- arr1 - arr2
prod_arr <- arr1 * arr2
quot_arr <- arr2 / arr1
# Scalar operations
scaled_arr <- arr1 * 10
shifted_arr <- arr1 + 100
# Mathematical functions
sqrt_arr <- sqrt(arr1)
log_arr <- log(arr1)
exp_arr <- exp(arr1)
# Comparison operations
comparison <- arr1 > 5 # Returns logical array
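Element-wise arithmetic between two arrays requires identical dimensions, not just the same number of elements. The sketch below shows what happens when the shapes differ, and how a plain vector is recycled instead; `arr3` and `result` are names introduced here for illustration.

```r
arr1 <- array(1:12, dim = c(3, 2, 2))
arr3 <- array(1:12, dim = c(2, 3, 2))   # same length, different shape

# Combining arrays of different shape raises an error; capture its message
result <- tryCatch(arr1 + arr3, error = function(e) conditionMessage(e))
result    # an error message about non-conformable arrays

# A plain vector has no dim attribute, so it is recycled in fill order
shifted <- arr1 + 1:3                   # adds 1, 2, 3 down each column
shifted[, 1, 1]                         # 2 4 6
```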
Apply Functions Across Dimensions
The apply() function computes summaries over chosen dimensions of an array. Its second argument, MARGIN, names the dimensions to keep in the result; the function is applied over all the others.
arr <- array(1:24, dim = c(4, 3, 2))
# Sum across first dimension (rows)
apply(arr, c(2, 3), sum) # Returns 3x2 matrix
# Mean across second dimension (columns)
apply(arr, c(1, 3), mean) # Returns 4x2 matrix
# Max across third dimension
apply(arr, c(1, 2), max) # Returns 4x3 matrix
# Custom function
apply(arr, 3, function(x) sum(x^2))
# Multiple statistics
apply(arr, 3, summary)
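To make the role of the margin argument concrete, the following sketch compares apply() with an explicit loop over the kept dimensions and with colSums(), which collapses the leading dimension of an array. The names `by_apply` and `by_loop` are illustrative.

```r
arr <- array(1:24, dim = c(4, 3, 2))

# MARGIN = c(2, 3): keep dimensions 2 and 3, collapse dimension 1
by_apply <- apply(arr, c(2, 3), sum)
dim(by_apply)                          # 3 2

# The same computation as an explicit loop over the kept dimensions
by_loop <- matrix(0, nrow = 3, ncol = 2)
for (j in 1:3) {
  for (k in 1:2) {
    by_loop[j, k] <- sum(arr[, j, k])
  }
}
all.equal(by_apply, by_loop)           # TRUE

# colSums() sums over the first dimension of an array
all.equal(by_apply, colSums(arr))      # TRUE
```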
Practical Example: Multi-Year Sales Analysis
# Create sales data: Products x Regions x Years
products <- c("Product_A", "Product_B", "Product_C")
regions <- c("East", "West", "North", "South")
years <- c("2022", "2023", "2024")
set.seed(123)
sales_data <- array(
  data = sample(1000:5000, 36),
  dim = c(3, 4, 3),
  dimnames = list(
    Product = products,
    Region = regions,
    Year = years
  )
)
# Total sales by year
yearly_totals <- apply(sales_data, 3, sum)
print(yearly_totals)
# Average sales by product across all regions and years
product_avg <- apply(sales_data, 1, mean)
print(product_avg)
# Best performing region per year
best_region_per_year <- apply(sales_data, 3, function(year_data) {
  region_totals <- apply(year_data, 2, sum)
  names(which.max(region_totals))
})
print(best_region_per_year)
# Year-over-year growth by product
yoy_growth <- function(product) {
  product_sales <- sales_data[product, , ] # Region x Year matrix
  yearly_sales <- apply(product_sales, 2, sum)
  growth <- diff(yearly_sales) / yearly_sales[-length(yearly_sales)] * 100
  names(growth) <- paste(years[-length(years)], "to", years[-1])
  growth
}
# Passing product names rather than 1:3 labels the result columns
sapply(products, yoy_growth)
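The same array also supports share-style questions. As one possible extension, assuming the simulated sales data from this example, the sketch below collapses the product dimension and expresses each region's total as a share of its year. The names `region_by_year` and `region_share` are introduced here for illustration.

```r
# Rebuild the simulated data so this block runs on its own
set.seed(123)
sales_data <- array(
  data = sample(1000:5000, 36),
  dim = c(3, 4, 3),
  dimnames = list(
    Product = c("Product_A", "Product_B", "Product_C"),
    Region = c("East", "West", "North", "South"),
    Year = c("2022", "2023", "2024")
  )
)

# Keep Region and Year, sum over Product: a 4 x 3 matrix
region_by_year <- apply(sales_data, c(2, 3), sum)

# Divide each column (year) by its total
region_share <- prop.table(region_by_year, margin = 2)
round(region_share * 100, 1)

# Each year's shares sum to 1
colSums(region_share)
```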
Reshaping and Transforming Arrays
arr <- array(1:24, dim = c(4, 3, 2))
# Convert to vector
as.vector(arr)
# Convert to matrix (4 x 6: the two slices end up side by side)
matrix(arr, nrow = 4)
# Permute dimensions
aperm(arr, c(2, 1, 3)) # Swap first two dimensions
# Reshape with different dimensions
dim(arr) <- c(2, 6, 2) # Must preserve total elements
print(arr)
# Convert to data frame for analysis
df <- as.data.frame.table(arr)
names(df) <- c("Dim1", "Dim2", "Dim3", "Value")
head(df)
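The two reshaping tools above behave differently: aperm() moves elements to new positions, while assigning to dim() only changes how the same underlying vector is interpreted. A small sketch of the difference, with `perm` and `reshaped` as illustrative names:

```r
arr <- array(1:24, dim = c(4, 3, 2))

# aperm(): element [i, j, k] moves to [j, i, k]
perm <- aperm(arr, c(2, 1, 3))
dim(perm)                                        # 3 4 2
perm[3, 2, 1] == arr[2, 3, 1]                    # TRUE

# dim<-: the shape changes, the underlying vector does not
reshaped <- arr
dim(reshaped) <- c(2, 6, 2)
identical(as.vector(reshaped), as.vector(arr))   # TRUE
identical(as.vector(perm), as.vector(arr))       # FALSE: data were reordered
```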
Performance Considerations
# Pre-allocate arrays for better performance
n <- 1000
result <- array(0, dim = c(n, n, 10))
# Vectorized operations are faster than loops
system.time({
  arr1 <- array(rnorm(1000000), dim = c(100, 100, 100))
  arr2 <- arr1 * 2 + 5 # Vectorized
})
# apply() is more concise than nested loops, but it still loops internally,
# so the main gain is readability rather than speed
arr <- array(1:1000000, dim = c(100, 100, 100))
system.time({
  result <- apply(arr, c(1, 2), mean)
})
# Subsetting copies only the selected slices
large_arr <- array(rnorm(10000000), dim = c(100, 100, 1000))
first_slices <- large_arr[, , 1:10] # 100 x 100 x 10; name avoids masking base::subset()
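For sums and means specifically, base R offers rowMeans(), colMeans(), rowSums(), and colSums(), whose `dims` argument lets them work on higher-dimensional arrays without an R-level loop. The sketch below, on a smaller array chosen here to keep the run short, checks that rowMeans() with `dims = 2` matches the apply() call and then times both. Timings depend on the machine, so no expected values are shown.

```r
set.seed(1)
arr <- array(rnorm(100 * 100 * 50), dim = c(100, 100, 50))

# Mean over the third dimension, two ways
via_apply <- apply(arr, c(1, 2), mean)
via_rowmeans <- rowMeans(arr, dims = 2)

all.equal(via_apply, via_rowmeans)   # TRUE
dim(via_rowmeans)                    # 100 100

# Compare elapsed time on your own machine
system.time(apply(arr, c(1, 2), mean))
system.time(rowMeans(arr, dims = 2))
```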
Arrays provide essential functionality for multi-dimensional data analysis in R. Their integration with vectorization and the apply family makes them powerful tools for statistical computing, scientific simulations, and data transformations requiring more than two dimensions.