Circular Queue: Ring Buffer Queue Implementation
Key Insights
- Circular queues solve the wasted space problem of linear queues by reusing array slots through modulo arithmetic, providing O(1) operations with zero memory allocation overhead during use.
- The “full vs empty” ambiguity when front equals rear is a fundamental design decision—choose between wasting one slot, tracking a count, or using a flag based on your memory and performance constraints.
- Ring buffers shine in real-time systems, streaming applications, and producer-consumer patterns where predictable memory usage and cache-friendly access patterns matter more than dynamic sizing.
Introduction to Circular Queues
A circular queue, often called a ring buffer, is a fixed-size queue implementation that treats the underlying array as if the end connects back to the beginning. The “ring” metaphor is apt: imagine wrapping a linear array into a circle where position 0 follows the last position.
Standard linear queues have an annoying problem. After several enqueue and dequeue operations, the front pointer advances through the array, leaving “dead” slots behind that can never be reused. Eventually, the queue reports itself as full even when most slots are empty. You’re forced to either shift all elements back to the start (expensive) or waste memory.
Circular queues eliminate this entirely. When a pointer reaches the end of the array, it wraps around to the beginning. Every slot gets reused. No shifting, no wasted space, no dynamic memory allocation during normal operation.
How Circular Queues Work
The magic happens through modulo arithmetic. Instead of incrementing pointers linearly, you increment and wrap:
next_position = (current_position + 1) % capacity
Two pointers track the queue state: front points to the oldest element (next to dequeue), and rear points to where the next element will be inserted. As elements flow through, both pointers chase each other around the ring.
Here’s the basic structure:
typedef struct {
    int *buffer;
    int front;
    int rear;
    int capacity;
} CircularQueue;

CircularQueue* createQueue(int capacity) {
    CircularQueue *queue = malloc(sizeof(CircularQueue));
    queue->buffer = malloc(sizeof(int) * capacity);
    queue->front = 0;
    queue->rear = 0;
    queue->capacity = capacity;
    return queue;
}
This structure uses the “waste one slot” approach (covered below), where we allocate capacity slots but only use capacity - 1 for data. The alternative is tracking a separate count.
Core Operations Implementation
Every operation runs in O(1) time—no loops, no memory allocation, just pointer arithmetic and array access.
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int *buffer;
    int front;
    int rear;
    int capacity;
    int count; // Using count approach for clarity
} CircularQueue;

CircularQueue* createQueue(int capacity) {
    CircularQueue *queue = malloc(sizeof(CircularQueue));
    if (!queue) return NULL;
    queue->buffer = malloc(sizeof(int) * capacity);
    if (!queue->buffer) {
        free(queue);
        return NULL;
    }
    queue->front = 0;
    queue->rear = 0;
    queue->capacity = capacity;
    queue->count = 0;
    return queue;
}
bool isEmpty(CircularQueue *queue) {
    return queue->count == 0;
}

bool isFull(CircularQueue *queue) {
    return queue->count == queue->capacity;
}

bool enqueue(CircularQueue *queue, int value) {
    if (isFull(queue)) {
        return false; // Queue overflow
    }
    queue->buffer[queue->rear] = value;
    queue->rear = (queue->rear + 1) % queue->capacity;
    queue->count++;
    return true;
}

bool dequeue(CircularQueue *queue, int *value) {
    if (isEmpty(queue)) {
        return false; // Queue underflow
    }
    *value = queue->buffer[queue->front];
    queue->front = (queue->front + 1) % queue->capacity;
    queue->count--;
    return true;
}

bool peek(CircularQueue *queue, int *value) {
    if (isEmpty(queue)) {
        return false;
    }
    *value = queue->buffer[queue->front];
    return true;
}

void destroyQueue(CircularQueue *queue) {
    free(queue->buffer);
    free(queue);
}
The modulo operation (pointer + 1) % capacity is the key. When rear is at position 7 in an 8-slot buffer, (7 + 1) % 8 gives 0, wrapping back to the start.
Handling Full vs Empty States
Here’s the classic circular queue puzzle: if front == rear, is the queue empty or full? Both states look identical when you only track two pointers.
Three solutions exist, each with trade-offs:
Approach 1: Waste One Slot
Keep one slot permanently empty. The queue is full when (rear + 1) % capacity == front. Empty when front == rear.
// Waste-one-slot approach
bool isFull_WasteSlot(CircularQueue *queue) {
    return (queue->rear + 1) % queue->capacity == queue->front;
}

bool isEmpty_WasteSlot(CircularQueue *queue) {
    return queue->front == queue->rear;
}
// Note: actual capacity is (allocated_size - 1)
Approach 2: Track Count
Maintain a separate counter. Cleaner logic, uses full capacity, but requires updating count on every operation.
// Count-based approach (shown in main implementation above)
bool isFull_Count(CircularQueue *queue) {
    return queue->count == queue->capacity;
}

bool isEmpty_Count(CircularQueue *queue) {
    return queue->count == 0;
}
Approach 3: Boolean Flag
Use a flag that’s set on enqueue and cleared on dequeue. Check it when front == rear.
// Flag-based approach
typedef struct {
    int *buffer;
    int front;
    int rear;
    int capacity;
    bool lastOpWasEnqueue;
} CircularQueueFlag;

bool isFull_Flag(CircularQueueFlag *queue) {
    return queue->front == queue->rear && queue->lastOpWasEnqueue;
}

bool isEmpty_Flag(CircularQueueFlag *queue) {
    return queue->front == queue->rear && !queue->lastOpWasEnqueue;
}
My recommendation: use the count approach. The extra integer is negligible, the logic is clearest, and you get O(1) size queries for free. The waste-one-slot approach only makes sense when you’re extremely memory-constrained and the slot size is large.
Real-World Applications
Ring buffers appear everywhere in systems programming:
Audio/Video Streaming: Producers write frames at capture rate while consumers read at playback rate. The buffer absorbs timing jitter without allocating memory during playback.
Keyboard Input Buffers: Your OS uses a ring buffer to store keystrokes between hardware interrupts and application reads.
Network Packet Handling: NICs often use ring buffers for DMA transfers. The kernel and hardware share a circular buffer, avoiding copies.
Log Rotation: Keep the last N log entries in memory without growing indefinitely.
Here’s a simple producer-consumer simulation:
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

CircularQueue *sharedQueue;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t notEmpty = PTHREAD_COND_INITIALIZER;
pthread_cond_t notFull = PTHREAD_COND_INITIALIZER;

void* producer(void *arg) {
    (void)arg; // unused
    for (int i = 0; i < 20; i++) {
        pthread_mutex_lock(&mutex);
        while (isFull(sharedQueue)) {
            pthread_cond_wait(&notFull, &mutex);
        }
        enqueue(sharedQueue, i);
        printf("Produced: %d\n", i);
        pthread_cond_signal(&notEmpty);
        pthread_mutex_unlock(&mutex);
        usleep(50000); // Simulate work
    }
    return NULL;
}

void* consumer(void *arg) {
    (void)arg; // unused
    for (int i = 0; i < 20; i++) {
        pthread_mutex_lock(&mutex);
        while (isEmpty(sharedQueue)) {
            pthread_cond_wait(&notEmpty, &mutex);
        }
        int value;
        dequeue(sharedQueue, &value);
        printf("Consumed: %d\n", value);
        pthread_cond_signal(&notFull);
        pthread_mutex_unlock(&mutex);
        usleep(100000); // Slower consumer
    }
    return NULL;
}

int main(void) {
    sharedQueue = createQueue(5);
    pthread_t prod, cons;
    pthread_create(&prod, NULL, producer, NULL);
    pthread_create(&cons, NULL, consumer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    destroyQueue(sharedQueue);
    return 0;
}
The producer runs faster than the consumer. The ring buffer absorbs bursts up to its capacity, then the producer blocks. No memory allocation happens during the steady state.
Thread-Safe Implementation Considerations
The mutex-based approach above works but creates contention. For high-throughput scenarios, lock-free ring buffers using atomic operations are preferred.
A basic thread-safe wrapper:
#include <pthread.h>

typedef struct {
    CircularQueue *queue;
    pthread_mutex_t lock;
} ThreadSafeQueue;

ThreadSafeQueue* createThreadSafeQueue(int capacity) {
    ThreadSafeQueue *tsq = malloc(sizeof(ThreadSafeQueue));
    tsq->queue = createQueue(capacity);
    pthread_mutex_init(&tsq->lock, NULL);
    return tsq;
}

bool tsEnqueue(ThreadSafeQueue *tsq, int value) {
    pthread_mutex_lock(&tsq->lock);
    bool result = enqueue(tsq->queue, value);
    pthread_mutex_unlock(&tsq->lock);
    return result;
}

bool tsDequeue(ThreadSafeQueue *tsq, int *value) {
    pthread_mutex_lock(&tsq->lock);
    bool result = dequeue(tsq->queue, value);
    pthread_mutex_unlock(&tsq->lock);
    return result;
}
True lock-free implementations use atomic compare-and-swap operations on the front and rear pointers. They’re significantly more complex—handling the ABA problem, memory ordering, and platform-specific atomics. Libraries like liblfds or Boost.Lockfree provide battle-tested implementations.
For single-producer, single-consumer scenarios (SPSC), lock-free is straightforward: the producer only modifies rear, the consumer only modifies front, and proper memory barriers ensure visibility.
Performance and Trade-offs
Benefits:
- O(1) everything: Enqueue, dequeue, peek, isEmpty, isFull—all constant time.
- Zero allocation: After initialization, no memory operations occur. Predictable latency.
- Cache-friendly: Sequential array access patterns. Data stays in cache lines.
- Bounded memory: You know exactly how much memory you’re using.
Limitations:
- Fixed size: You must know the maximum capacity upfront. Resizing requires allocating a new buffer and copying.
- Resize is expensive: O(n) copy operation, and you need to handle the wrap-around correctly.
- Wastes space if underutilized: If your queue rarely fills, you’re paying for unused capacity.
When to use alternatives:
Use a linked-list queue when size is unpredictable and you can tolerate allocation overhead. Use a dynamic array-based queue (like C++’s std::deque) when you need growth but want better cache performance than linked lists.
Use circular queues when you need predictable memory usage, real-time guarantees, or are implementing low-level system components. They’re the right choice for embedded systems, audio processing, network drivers, and any scenario where allocation during operation is unacceptable.
The ring buffer is a fundamental data structure that every systems programmer should understand deeply. Its simplicity—just an array with two pointers and modulo arithmetic—belies its power and ubiquity in production systems.