Array Data Structure: Complete Guide with Implementations

Key Insights

  • Arrays provide O(1) random access through contiguous memory allocation, making them the foundation for nearly every other data structure and algorithm in computer science.
  • Understanding memory layout and pointer arithmetic transforms arrays from a black box into a predictable, optimizable tool—especially critical for performance-sensitive applications.
  • Most array problems in interviews and production code reduce to a handful of patterns: two-pointer, sliding window, and prefix sums. Master these, and you’ve covered the large majority of array algorithms you’ll meet in practice.

Introduction to Arrays

An array is a contiguous block of memory storing elements of the same type. That’s it. This simplicity is precisely what makes arrays powerful.

When you declare an array, the system allocates a sequential chunk of memory. Each element sits directly next to its neighbors, which means accessing any element takes constant time—you calculate its address with simple arithmetic.

Here’s the time complexity breakdown you need to memorize:

Operation                       Time Complexity
---------                       ---------------
Access by index                 O(1)
Search (unsorted)               O(n)
Insert at end                   O(1) amortized
Insert at arbitrary position    O(n)
Delete at arbitrary position    O(n)
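These costs show up directly in measurements. Here's a rough sketch comparing amortized O(1) appends with O(n) front insertions; the absolute numbers are machine-dependent, but the gap is robust:

```python
import timeit

def append_n(n):
    lst = []
    for _ in range(n):
        lst.append(0)        # amortized O(1)

def insert_front_n(n):
    lst = []
    for _ in range(n):
        lst.insert(0, 0)     # O(n): shifts every existing element right

# Front insertion does roughly n^2/2 element moves in total; appending does ~n.
t_append = timeit.timeit(lambda: append_n(20_000), number=1)
t_front = timeit.timeit(lambda: insert_front_n(20_000), number=1)
assert t_front > t_append
```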

Let’s see basic declarations across languages:

# Python - lists are dynamic arrays
numbers = [1, 2, 3, 4, 5]
fixed_size = [0] * 10  # Pre-allocate 10 elements
// Java - static array
int[] numbers = {1, 2, 3, 4, 5};
int[] fixedSize = new int[10];

// Dynamic array
ArrayList<Integer> dynamic = new ArrayList<>();
// C++ - static array
int numbers[] = {1, 2, 3, 4, 5};
int fixedSize[10];

// Dynamic array
std::vector<int> dynamic = {1, 2, 3, 4, 5};

Memory Layout and How Arrays Work

Understanding memory layout separates competent programmers from those who just use arrays without thinking.

Stack vs Heap Allocation

Static arrays typically live on the stack (fast allocation, automatic cleanup, limited size). Dynamic arrays allocate on the heap (slower allocation, manual or garbage-collected cleanup, virtually unlimited size).

// C: Stack allocation - size must be known at compile time
int stack_array[100];

// C: Heap allocation - size can be determined at runtime
int* heap_array = (int*)malloc(n * sizeof(int));  // remember free(heap_array) when done

// Address calculation: base_address + (index * element_size)
// For heap_array[5] with 4-byte ints: heap_array + (5 * 4) = heap_array + 20 bytes

How Indexing Works

When you write arr[i], the compiler generates: base_address + (i * sizeof(element)). This is why array access is O(1)—it’s just arithmetic.

#include <stdio.h>

int main() {
    int arr[] = {10, 20, 30, 40, 50};
    int* ptr = arr;
    
    // These are equivalent
    printf("%d\n", arr[2]);      // 30
    printf("%d\n", *(ptr + 2));  // 30
    printf("%d\n", *(arr + 2));  // 30
    
    // Memory addresses
    printf("arr[0] at: %p\n", (void*)&arr[0]);
    printf("arr[1] at: %p\n", (void*)&arr[1]);  // sizeof(int) later, typically 4 bytes
    
    return 0;
}

Common Array Operations

Let’s implement core operations from scratch. Understanding these builds intuition for more complex algorithms.

class DynamicArray:
    def __init__(self):
        self._capacity = 4
        self._size = 0
        self._data = [None] * self._capacity
    
    def _resize(self, new_capacity):
        """Double capacity when full - amortized O(1) insertion"""
        new_data = [None] * new_capacity
        for i in range(self._size):
            new_data[i] = self._data[i]
        self._data = new_data
        self._capacity = new_capacity
    
    def append(self, value):
        """O(1) amortized"""
        if self._size == self._capacity:
            self._resize(self._capacity * 2)
        self._data[self._size] = value
        self._size += 1
    
    def insert(self, index, value):
        """O(n) - must shift elements"""
        if not 0 <= index <= self._size:
            raise IndexError("Index out of bounds")
        if self._size == self._capacity:
            self._resize(self._capacity * 2)
        # Shift elements right
        for i in range(self._size, index, -1):
            self._data[i] = self._data[i - 1]
        self._data[index] = value
        self._size += 1
    
    def delete(self, index):
        """O(n) - must shift elements"""
        if not 0 <= index < self._size:
            raise IndexError("Index out of bounds")
        # Shift elements left
        for i in range(index, self._size - 1):
            self._data[i] = self._data[i + 1]
        self._size -= 1
        self._data[self._size] = None  # Drop the stale reference
        # Shrink if too empty (optional optimization)
        if self._size < self._capacity // 4:
            self._resize(self._capacity // 2)
    
    def get(self, index):
        """O(1)"""
        if 0 <= index < self._size:
            return self._data[index]
        raise IndexError("Index out of bounds")

The resizing strategy matters. Doubling capacity gives amortized O(1) insertion: across n appends, the resizes copy at most 1 + 2 + 4 + ... + n < 2n elements in total, so the average cost per append stays constant.
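You can verify the amortized bound by counting copies directly. A quick sketch of the doubling schedule:

```python
# Count how many element copies n appends trigger under capacity doubling.
n = 1_000_000
capacity, size, copies = 1, 0, 0
for _ in range(n):
    if size == capacity:
        copies += size       # a resize copies every existing element
        capacity *= 2
    size += 1

# Total copies are 1 + 2 + 4 + ... < 2n, so O(1) amortized per append.
assert copies < 2 * n
```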

Array Variants and Specialized Types

Multi-dimensional Arrays

2D arrays are arrays of arrays. Memory layout varies: row-major (C, NumPy's default) stores rows contiguously; column-major (Fortran, MATLAB) stores columns contiguously. Python lists of lists are arrays of pointers to separately allocated row lists, so neither layout applies to them directly.
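In row-major order, element (i, j) of a rows x cols matrix lives at flat offset i * cols + j. A quick sketch of that mapping:

```python
# Row-major flattening: (i, j) maps to flat index i * cols + j.
rows, cols = 3, 4
matrix = [[r * cols + c for c in range(cols)] for r in range(rows)]
flat = [x for row in matrix for x in row]   # row-major order

i, j = 2, 3
assert flat[i * cols + j] == matrix[i][j]
```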

# 2D array (matrix) operations
def matrix_multiply(A, B):
    rows_a, cols_a = len(A), len(A[0])
    rows_b, cols_b = len(B), len(B[0])
    
    if cols_a != rows_b:
        raise ValueError("Incompatible dimensions")
    
    result = [[0] * cols_b for _ in range(rows_a)]
    
    for i in range(rows_a):
        for j in range(cols_b):
            for k in range(cols_a):
                result[i][j] += A[i][k] * B[k][j]
    
    return result
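A hand-checked 2x2 case confirms the triple loop, here written as an equivalent comprehension:

```python
# 2x2 sanity check: result[0][0] = 1*5 + 2*7 = 19, and so on.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
result = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
assert result == [[19, 22], [43, 50]]
```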

Circular Buffer

Circular buffers are essential for streaming data, producer-consumer patterns, and fixed-memory queues.

class CircularBuffer:
    def __init__(self, capacity):
        self._buffer = [None] * capacity
        self._capacity = capacity
        self._head = 0  # Read position
        self._tail = 0  # Write position
        self._size = 0
    
    def enqueue(self, value):
        if self._size == self._capacity:
            raise OverflowError("Buffer full")
        self._buffer[self._tail] = value
        self._tail = (self._tail + 1) % self._capacity
        self._size += 1
    
    def dequeue(self):
        if self._size == 0:
            raise IndexError("Buffer empty")
        value = self._buffer[self._head]
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return value
    
    def peek(self):
        if self._size == 0:
            raise IndexError("Buffer empty")
        return self._buffer[self._head]
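The essential trick is the modulo wrap: an unbounded stream of writes maps onto a fixed set of slots. A stripped-down sketch, ignoring the full/empty bookkeeping the class enforces:

```python
# Six writes land in a 4-slot array; the fifth and sixth wrap around
# and reuse the earliest slots.
capacity = 4
slots = [None] * capacity
for i, value in enumerate(["a", "b", "c", "d", "e", "f"]):
    slots[i % capacity] = value

assert slots == ["e", "f", "c", "d"]
```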

Classic Array Algorithms

These patterns solve the majority of array problems you’ll encounter.

Two-Pointer Technique

def two_sum_sorted(arr, target):
    """Find two numbers that sum to target in sorted array. O(n) time, O(1) space."""
    left, right = 0, len(arr) - 1
    
    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target:
            return [left, right]
        elif current_sum < target:
            left += 1
        else:
            right -= 1
    
    return [-1, -1]
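Tracing the pointers on a small input makes the invariant visible: each step discards one element that can no longer be part of the answer.

```python
# Pointer movement on [2, 7, 11, 15] with target 9.
arr, target = [2, 7, 11, 15], 9
left, right = 0, len(arr) - 1
steps = []
while left < right:
    s = arr[left] + arr[right]
    steps.append((left, right, s))
    if s == target:
        break
    if s < target:
        left += 1            # sum too small: need a larger left value
    else:
        right -= 1           # sum too large: need a smaller right value

assert steps == [(0, 3, 17), (0, 2, 13), (0, 1, 9)]
```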

Sliding Window

def max_subarray_sum_k(arr, k):
    """Maximum sum of subarray with exactly k elements. O(n) time."""
    if len(arr) < k:
        return None
    
    window_sum = sum(arr[:k])
    max_sum = window_sum
    
    for i in range(k, len(arr)):
        window_sum += arr[i] - arr[i - k]  # Slide window
        max_sum = max(max_sum, window_sum)
    
    return max_sum
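On [2, 1, 5, 1, 3, 2] with k = 3 the windows sum to 8, 7, 9, 6. The naive re-summing below (O(n*k)) reproduces what the O(n) version computes incrementally:

```python
# Naive check: re-sum every window and take the maximum.
arr, k = [2, 1, 5, 1, 3, 2], 3
sums = [sum(arr[i:i + k]) for i in range(len(arr) - k + 1)]
assert sums == [8, 7, 9, 6]
assert max(sums) == 9
```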

Kadane’s Algorithm (Maximum Subarray)

def max_subarray(arr):
    """Find contiguous subarray with largest sum. O(n) time, O(1) space."""
    max_ending_here = max_so_far = arr[0]
    
    for num in arr[1:]:
        max_ending_here = max(num, max_ending_here + num)
        max_so_far = max(max_so_far, max_ending_here)
    
    return max_so_far
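A worked trace on the classic example shows the "extend or restart" decision at each step:

```python
# Tracing max_ending_here across [-2, 1, -3, 4, -1, 2, 1, -5, 4].
arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
best_here = best = arr[0]
trace = [best_here]
for x in arr[1:]:
    best_here = max(x, best_here + x)   # restart at x, or extend the run
    best = max(best, best_here)
    trace.append(best_here)

assert trace == [-2, 1, -2, 4, 3, 5, 6, 1, 5]
assert best == 6                         # subarray [4, -1, 2, 1]
```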

Array Rotation

def rotate_array(arr, k):
    """Rotate array right by k positions. O(n) time, O(1) space."""
    n = len(arr)
    if n == 0:
        return  # Nothing to rotate; also avoids k % 0
    k = k % n  # Handle k > n
    
    def reverse(start, end):
        while start < end:
            arr[start], arr[end] = arr[end], arr[start]
            start += 1
            end -= 1
    
    # Reverse entire array, then reverse each part
    reverse(0, n - 1)
    reverse(0, k - 1)
    reverse(k, n - 1)
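The three reversals on [1, 2, 3, 4, 5] with k = 2, written with slices for brevity:

```python
# Step through the reversal trick: rotate right by k = 2.
arr = [1, 2, 3, 4, 5]
arr.reverse()            # [5, 4, 3, 2, 1]  reverse everything
arr[:2] = arr[1::-1]     # [4, 5, 3, 2, 1]  reverse the first k
arr[2:] = arr[:1:-1]     # [4, 5, 1, 2, 3]  reverse the rest

assert arr == [4, 5, 1, 2, 3]
```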

Real-World Applications and Trade-offs

When to Use Arrays

Use arrays when you need fast random access and know the approximate size. They excel in:

  • Image processing (pixels as 2D arrays)
  • Lookup tables (constant-time retrieval)
  • Buffers (network packets, audio samples)
  • Any algorithm requiring index-based access

Cache Performance

Arrays crush linked lists in practice because of cache locality. Modern CPUs fetch memory in cache lines (typically 64 bytes). Contiguous array elements often share cache lines; linked list nodes scatter across memory, causing cache misses. Here's the image-processing use case from the list above, with pixels as a 2D array:

def grayscale_image(pixels):
    """Convert RGB image to grayscale. pixels is height x width x 3."""
    height = len(pixels)
    width = len(pixels[0])
    gray = [[0] * width for _ in range(height)]
    
    for y in range(height):
        for x in range(width):
            r, g, b = pixels[y][x]
            # Luminosity method
            gray[y][x] = int(0.299 * r + 0.587 * g + 0.114 * b)
    
    return gray
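The cache effect is visible even from Python. The two loops below do identical arithmetic, but the column-major one jumps between rows on every access. With NumPy's contiguous storage the gap is dramatic; with nested lists it is smaller but usually measurable, so treat this as a sketch rather than a benchmark:

```python
# Same sum, two traversal orders over a 2D grid of nested lists.
n = 500
grid = [[1] * n for _ in range(n)]

def row_major():
    total = 0
    for row in grid:          # walk each row in memory order
        for x in row:
            total += x
    return total

def col_major():
    total = 0
    for j in range(n):        # jump between rows on every access
        for i in range(n):
            total += grid[i][j]
    return total

assert row_major() == col_major() == n * n
```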

Best Practices and Common Pitfalls

Bounds Checking

Off-by-one errors are the most common array bug. Always verify your loop bounds.

# WRONG: IndexError on last iteration
for i in range(len(arr) + 1):
    print(arr[i])

# CORRECT
for i in range(len(arr)):
    print(arr[i])

# DEFENSIVE: Check bounds explicitly for user input
def safe_access(arr, index, default=None):
    if 0 <= index < len(arr):
        return arr[index]
    return default

Python Lists vs NumPy Arrays

Python lists are flexible but slow for numerical work: each element is a boxed object accessed through a pointer. NumPy arrays are typed and contiguous, and their operations run in optimized C loops, often SIMD-vectorized.

import numpy as np
import time

size = 1_000_000

# Python list
py_list = list(range(size))
start = time.time()
result = [x * 2 for x in py_list]
print(f"Python list: {time.time() - start:.4f}s")

# NumPy array
np_array = np.arange(size)
start = time.time()
result = np_array * 2
print(f"NumPy array: {time.time() - start:.4f}s")

# NumPy is typically 10-100x faster for numerical operations

Choose Static When Possible

If you know the size upfront, prefer static arrays. They avoid reallocation overhead and make memory usage predictable—critical in embedded systems and real-time applications.
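In Python, the standard-library array module is the closest analogue to a C array: typed, contiguous, and far more compact than a list of boxed ints. A small sketch:

```python
from array import array

# 'i' stores signed C ints in one contiguous buffer, like int[1024] in C.
samples = array('i', [0] * 1024)
samples[0] = 42

address, length = samples.buffer_info()   # (buffer start, element count)
assert length == 1024
assert samples[0] == 42
assert samples.itemsize in (2, 4, 8)      # machine-dependent C int size
```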

Arrays are deceptively simple. Master their memory model, internalize the core patterns, and you’ll have a foundation that supports everything else you build.
