Semaphore: Counting and Binary Semaphores

Key Insights

  • Binary semaphores provide mutex-like behavior but lack ownership semantics, meaning any thread can release them—a feature that’s both powerful and dangerous depending on your use case.
  • Counting semaphores excel at resource pooling scenarios where you need to limit concurrent access to N identical resources, such as database connections or API rate limits.
  • The atomic wait (P) and signal (V) operations are the foundation of semaphore correctness; understanding their blocking behavior prevents the majority of concurrency bugs.

Introduction to Semaphores

Edsger Dijkstra introduced semaphores in 1965 as one of the first synchronization primitives for concurrent programming. The concept is elegantly simple: a semaphore is an integer counter that controls access to shared resources through two atomic operations.

Think of a semaphore as a bouncer at a club with a clicker counter. The bouncer tracks how many people are inside. When someone wants to enter, they check the count—if there’s room, they decrement the counter and enter. When someone leaves, they increment the counter. If the club is full, newcomers wait in line.

This mental model scales from simple mutual exclusion (one person allowed) to complex resource management (N concurrent users). Despite being nearly 60 years old, semaphores remain fundamental to operating systems, database connection pools, and rate limiters.

Binary Semaphores

A binary semaphore restricts its counter to values 0 or 1. When the value is 1, the resource is available. When it’s 0, the resource is occupied and threads must wait.

This behavior looks identical to a mutex, but there’s a critical difference: ownership. A mutex must be released by the same thread that acquired it. A binary semaphore has no such restriction—any thread can signal it.

This distinction matters in signaling scenarios. One thread might wait for an event that another thread signals. You can’t do this with a mutex without violating its semantics.

import threading
import time

# Binary semaphore protecting a critical section
resource_semaphore = threading.Semaphore(1)
shared_counter = 0

def increment_counter(thread_id: int, iterations: int) -> None:
    global shared_counter
    for _ in range(iterations):
        resource_semaphore.acquire()  # Wait (P operation)
        try:
            # Critical section - only one thread at a time
            current = shared_counter
            time.sleep(0.0001)  # Simulate some work
            shared_counter = current + 1
        finally:
            resource_semaphore.release()  # Signal (V operation)

# Without the semaphore, this would produce incorrect results
threads = [
    threading.Thread(target=increment_counter, args=(i, 100))
    for i in range(5)
]

for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final counter: {shared_counter}")  # Always 500 with semaphore

When should you use a binary semaphore over a mutex? Use a mutex when you need ownership semantics and potentially recursive locking. Use a binary semaphore when you need cross-thread signaling or when the “releaser” differs from the “acquirer.”
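This cross-thread signaling pattern is easy to demonstrate: a semaphore initialized to 0 acts as a one-shot signal, where the waiting thread blocks until a different thread releases. A minimal sketch:

```python
import threading

result = []
ready = threading.Semaphore(0)  # starts at 0, so acquire() blocks immediately

def producer_thread():
    result.append(42)  # prepare the data
    ready.release()    # signal: the releaser is a different thread than the acquirer

t = threading.Thread(target=producer_thread)
t.start()

ready.acquire()    # blocks until producer_thread signals
print(result[0])   # safe: the release happens after the append
t.join()
```

A mutex can't express this, because the thread calling `release()` never called `acquire()`.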

Counting Semaphores

Counting semaphores extend the concept by allowing values greater than 1. The counter represents the number of available resource units. This is where semaphores truly shine.

Common use cases include:

  • Connection pools: Limit database connections to N
  • Rate limiting: Allow N requests per time window
  • Bounded buffers: Control producer/consumer with fixed capacity
  • Thread pools: Limit concurrent worker threads

import threading
import time
import random
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, max_connections: int):
        self._semaphore = threading.Semaphore(max_connections)
        self._max = max_connections
        self._active = 0
        self._lock = threading.Lock()
    
    @contextmanager
    def acquire_connection(self):
        self._semaphore.acquire()
        with self._lock:
            self._active += 1
            conn_id = self._active
            active = self._active  # snapshot under the lock to avoid a racy read
        
        try:
            print(f"Connection {conn_id} acquired (active: {active}/{self._max})")
            yield f"connection_{conn_id}"
        finally:
            with self._lock:
                self._active -= 1
                active = self._active
            self._semaphore.release()
            print(f"Connection {conn_id} released (active: {active}/{self._max})")

# Simulate 10 workers competing for 3 connections
pool = ConnectionPool(max_connections=3)

def worker(worker_id: int) -> None:
    with pool.acquire_connection() as conn:
        # Simulate database work
        work_time = random.uniform(0.1, 0.5)
        time.sleep(work_time)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

The semaphore guarantees that no more than 3 connections are active simultaneously. Threads 4-10 block on acquire() until earlier threads release their connections.
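The same idea caps in-flight work of any kind, not just connections. Here's a sketch that bounds concurrent calls and records the observed peak to verify the limit holds (the `limited_call` function and `MAX_IN_FLIGHT` constant are illustrative names, not from the original):

```python
import threading
import time

MAX_IN_FLIGHT = 2
limiter = threading.Semaphore(MAX_IN_FLIGHT)
peak = 0
in_flight = 0
stats_lock = threading.Lock()

def limited_call(i: int) -> None:
    global peak, in_flight
    with limiter:  # at most MAX_IN_FLIGHT threads inside this block
        with stats_lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.05)  # simulate the request
        with stats_lock:
            in_flight -= 1

threads = [threading.Thread(target=limited_call, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds MAX_IN_FLIGHT
```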

Core Operations: Wait and Signal

Dijkstra named the operations P (from Dutch “proberen,” to test) and V (“verhogen,” to increment). Modern APIs use clearer names: wait/acquire and signal/release.

Here’s the precise semantics:

Wait (P / acquire):
    atomically:
        while counter == 0:
            block current thread
        counter = counter - 1

Signal (V / release):
    atomically:
        counter = counter + 1
        if any threads are waiting:
            wake one thread

The atomicity is crucial. Without it, two threads could both see counter == 1, both decrement it, and both enter the critical section. The operating system or runtime guarantees these operations are indivisible.

# Pseudocode demonstrating the logical behavior
class Semaphore:
    def __init__(self, initial: int):
        self.counter = initial
        self.waiting_queue = []
        self.internal_lock = Lock()  # For atomic operations
    
    def acquire(self):
        with self.internal_lock:
            while self.counter == 0:
                # Add current thread to wait queue
                # Release internal lock and sleep
                # Re-acquire internal lock when woken
                self._wait()
            self.counter -= 1
    
    def release(self):
        with self.internal_lock:
            self.counter += 1
            if self.waiting_queue:
                self._wake_one()

One subtle point: release() doesn’t check if the calling thread previously called acquire(). This is by design but can cause bugs if you release without acquiring—the counter grows beyond its intended maximum.
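The effect is easy to see in Python: one stray `release()` turns a "binary" semaphore into one that admits two holders at once.

```python
import threading

sem = threading.Semaphore(1)
sem.release()  # bug: release without a matching acquire; counter is now 2

# Both non-blocking acquires succeed -- mutual exclusion is silently gone
first = sem.acquire(blocking=False)   # True
second = sem.acquire(blocking=False)  # True
third = sem.acquire(blocking=False)   # False
print(first, second, third)  # True True False
```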

Classic Problems Solved with Semaphores

The producer-consumer problem is the canonical semaphore example. Producers create items and add them to a bounded buffer. Consumers remove items. We need to prevent buffer overflow, underflow, and race conditions.

import threading
import queue
import time
import random

class BoundedBuffer:
    def __init__(self, capacity: int):
        self._buffer = queue.Queue(maxsize=capacity)
        self._empty_slots = threading.Semaphore(capacity)  # Available slots
        self._filled_slots = threading.Semaphore(0)        # Items to consume
        self._mutex = threading.Semaphore(1)               # Buffer access
    
    def produce(self, item) -> None:
        self._empty_slots.acquire()  # Wait for empty slot
        
        self._mutex.acquire()
        try:
            self._buffer.put_nowait(item)
            print(f"Produced: {item} (size: {self._buffer.qsize()})")
        finally:
            self._mutex.release()
        
        self._filled_slots.release()  # Signal item available
    
    def consume(self):
        self._filled_slots.acquire()  # Wait for item
        
        self._mutex.acquire()
        try:
            item = self._buffer.get_nowait()
            print(f"Consumed: {item} (size: {self._buffer.qsize()})")
        finally:
            self._mutex.release()
        
        self._empty_slots.release()  # Signal slot available
        return item

buffer = BoundedBuffer(capacity=5)
stop_flag = threading.Event()

def producer(producer_id: int) -> None:
    count = 0
    while not stop_flag.is_set():
        item = f"P{producer_id}-{count}"
        buffer.produce(item)
        count += 1
        time.sleep(random.uniform(0.1, 0.3))

def consumer(consumer_id: int) -> None:
    while not stop_flag.is_set():
        try:
            buffer.consume()
            time.sleep(random.uniform(0.2, 0.4))
        except Exception:
            break

# Start 2 producers and 3 consumers as daemon threads, so any thread still
# blocked on a semaphore after stop_flag is set doesn't prevent program exit
producers = [threading.Thread(target=producer, args=(i,), daemon=True) for i in range(2)]
consumers = [threading.Thread(target=consumer, args=(i,), daemon=True) for i in range(3)]

for t in producers + consumers:
    t.start()

time.sleep(2)
stop_flag.set()

This uses three semaphores: empty_slots tracks available buffer space, filled_slots tracks items ready for consumption, and mutex protects the buffer itself. The elegance is that producers automatically block when the buffer is full, and consumers block when it’s empty.
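In Python specifically, `queue.Queue(maxsize=N)` already provides this bounded, blocking behavior out of the box, so the three-semaphore version is mainly instructive:

```python
import queue
import threading

buf = queue.Queue(maxsize=5)  # put() blocks when full, get() blocks when empty

def producer() -> None:
    for i in range(10):
        buf.put(i)  # blocks automatically once 5 items are queued

t = threading.Thread(target=producer)
t.start()

consumed = [buf.get() for _ in range(10)]  # each get() unblocks a waiting put()
t.join()
print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Reaching for the standard library first is usually the right call; build the semaphore version when you need the pattern in a language without a bounded queue, or to understand what the queue does internally.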

Common Pitfalls and Best Practices

Forgetting to release: The most common bug. Always use try/finally or context managers.

# Bad
semaphore.acquire()
do_work()  # If this throws, semaphore is never released
semaphore.release()

# Good
semaphore.acquire()
try:
    do_work()
finally:
    semaphore.release()

# Best (if your language supports it)
with semaphore:
    do_work()

Deadlock from acquire order: When using multiple semaphores, always acquire them in a consistent global order.

# Deadlock risk: Thread 1 holds A, waits for B
#                Thread 2 holds B, waits for A

# Solution: Always acquire in same order (e.g., alphabetically)
sem_a.acquire()
sem_b.acquire()
try:
    work()
finally:
    sem_b.release()
    sem_a.release()

Over-releasing: Releasing more times than acquiring inflates the counter beyond its intended maximum. This silently corrupts your concurrency guarantees.
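Python offers a safeguard for exactly this: threading.BoundedSemaphore raises ValueError if a release would push the counter past its initial value, turning the silent corruption into a loud failure.

```python
import threading

sem = threading.BoundedSemaphore(2)
sem.acquire()
sem.release()  # fine: counter is back at its initial value of 2

try:
    sem.release()  # counter would become 3 -> ValueError
except ValueError:
    print("over-release detected")
```

Prefer `BoundedSemaphore` whenever the counter has a fixed intended maximum.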

Priority inversion: A low-priority thread holding a semaphore can block high-priority threads. Some real-time systems use priority inheritance to mitigate this, but standard semaphores don’t.

Language-Specific Implementations

Different languages offer varying semaphore APIs. Here’s a quick reference:

# Python: threading.Semaphore
import threading

sem = threading.Semaphore(3)
sem.acquire()
sem.release()

# With timeout
acquired = sem.acquire(timeout=5.0)

# Context manager
with sem:
    do_work()

// Java: java.util.concurrent.Semaphore
import java.util.concurrent.Semaphore;

Semaphore sem = new Semaphore(3);

sem.acquire();          // Blocking
sem.tryAcquire();       // Non-blocking, returns boolean
sem.tryAcquire(5, TimeUnit.SECONDS);  // With timeout
sem.release();

// Fair semaphore (FIFO ordering)
Semaphore fairSem = new Semaphore(3, true);

// Go: golang.org/x/sync/semaphore (weighted semaphore)
import "golang.org/x/sync/semaphore"

sem := semaphore.NewWeighted(3)

// Acquire with context for cancellation
ctx := context.Background()
sem.Acquire(ctx, 1)  // Acquire 1 unit
sem.Release(1)

// Try acquire (non-blocking)
if sem.TryAcquire(1) {
    defer sem.Release(1)
    doWork()
}

// POSIX C: sem_t
#include <semaphore.h>

sem_t sem;
sem_init(&sem, 0, 3);  // pshared=0: shared between threads, not processes; 3 = initial value

sem_wait(&sem);        // Blocking acquire
sem_trywait(&sem);     // Non-blocking
sem_post(&sem);        // Release

sem_destroy(&sem);

Go’s weighted semaphore is particularly interesting—you can acquire and release multiple units atomically, which is useful for resources with varying “costs.”

Semaphores are one of the oldest synchronization primitives, yet they remain relevant because they solve a fundamental problem elegantly: controlling concurrent access to limited resources. Master the wait/signal semantics, respect the acquire/release pairing, and you’ll find semaphores are a reliable tool for building correct concurrent systems.
