Python Threading: Concurrent Execution
Key Insights
- Python threading excels at I/O-bound tasks like network requests or file operations, but the Global Interpreter Lock (GIL) prevents true parallel execution of CPU-bound code across multiple threads.
- Race conditions arise whenever multiple threads mutate shared state without synchronization—always protect mutable state with locks or use thread-safe data structures like Queue.
- ThreadPoolExecutor provides a cleaner, more maintainable interface than raw threads for most concurrent workloads, with built-in resource management and future-based result handling.
Introduction to Threading Basics
Threading enables concurrent execution within a single process, allowing your Python programs to handle multiple operations simultaneously. Understanding when to use threading requires distinguishing between concurrency and parallelism.
Concurrency means managing multiple tasks that make progress during overlapping time periods. Parallelism means executing multiple tasks simultaneously on different CPU cores. Python’s Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, which means threading provides concurrency but not true parallelism for CPU-bound work.
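To see the GIL's effect on CPU-bound work, here is a minimal sketch (timings are illustrative and assume the standard CPython interpreter; the countdown loop is just stand-in CPU work):

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU work: no I/O, so threads gain nothing under the GIL
    while n > 0:
        n -= 1

N = 2_000_000

# Sequential: two calls, one after the other
start = time.time()
count_down(N)
count_down(N)
sequential = time.time() - start

# Threaded: the same work split across two threads
start = time.time()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - start

print(f"Sequential: {sequential:.2f}s")
print(f"Threaded:   {threaded:.2f}s")  # Typically no faster -- the GIL serializes the bytecode
```

The threaded version is usually no faster (and sometimes slower, due to switching overhead), which is exactly the opposite of the I/O-bound result shown next.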
Threading shines for I/O-bound tasks—operations that spend time waiting for external resources like network responses, disk reads, or database queries. While one thread waits for I/O, other threads can execute, dramatically improving throughput.
Here’s a concrete comparison:
import time
import threading
import requests

def fetch_url(url):
    response = requests.get(url)
    return len(response.content)

urls = [
    'https://www.python.org',
    'https://www.github.com',
    'https://www.stackoverflow.com',
    'https://www.reddit.com'
]

# Single-threaded approach
start = time.time()
for url in urls:
    fetch_url(url)
single_threaded_time = time.time() - start

# Multi-threaded approach
start = time.time()
threads = []
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
multi_threaded_time = time.time() - start

print(f"Single-threaded: {single_threaded_time:.2f}s")
print(f"Multi-threaded: {multi_threaded_time:.2f}s")
The multi-threaded version typically completes 3-4x faster because threads can wait for network I/O concurrently.
Creating and Starting Threads
Python’s threading module provides the Thread class for creating threads. You can create threads in two ways: by passing a target function or by subclassing Thread.
The target function approach is straightforward and preferred for simple cases:
import threading
import time

def worker(name, delay):
    print(f"Thread {name} starting")
    time.sleep(delay)
    print(f"Thread {name} finished after {delay}s")

# Create threads with target function
t1 = threading.Thread(target=worker, args=("A", 2))
t2 = threading.Thread(target=worker, args=("B", 1))

# Start threads
t1.start()
t2.start()

# Wait for completion
t1.join()
t2.join()

print("All threads completed")
For more complex behavior, subclass Thread and override the run() method:
import threading
import time

class WorkerThread(threading.Thread):
    def __init__(self, name, delay):
        super().__init__()
        self.worker_name = name
        self.delay = delay
        self.result = None

    def run(self):
        print(f"Worker {self.worker_name} starting")
        time.sleep(self.delay)
        self.result = f"Processed by {self.worker_name}"
        print(f"Worker {self.worker_name} finished")

# Create and start threads
threads = [WorkerThread("Worker-1", 1), WorkerThread("Worker-2", 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
    print(f"Result: {t.result}")
The start() method initiates the thread, which calls run() in a separate execution context. The join() method blocks until the thread completes, ensuring proper cleanup.
Thread Synchronization with Locks
When multiple threads access shared mutable state, race conditions occur. Consider this broken counter:
import threading

counter = 0

def increment():
    global counter
    for _ in range(100000):
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Counter: {counter}")  # Expected: 500000, Actual: varies!
The counter can end up less than 500,000 because counter += 1 isn’t atomic—it compiles to separate read, increment, and write steps. If a thread switch lands between the read and the write, two threads read the same value and one update is lost. (How often this actually happens varies with the CPython version and timing, but the race is real.)
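The non-atomicity is visible in the bytecode. A quick way to inspect it (the exact opcode names vary across Python versions, but there are always separate load, add, and store instructions):

```python
import dis

counter = 0

def increment_once():
    global counter
    counter += 1

# Show the individual bytecode steps behind `counter += 1`
dis.dis(increment_once)
```

A thread switch between any two of those instructions is what loses an update.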
Fix this with a Lock:
import threading

counter = 0
counter_lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Counter: {counter}")  # Always 500000
The with counter_lock statement acquires the lock before entering the block and releases it afterward, ensuring only one thread modifies the counter at a time.
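The with statement is shorthand for an explicit acquire/release pair wrapped in try/finally; a minimal sketch of the equivalent:

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment():
    global counter
    # Equivalent to `with counter_lock:` -- the context manager
    # is shorthand for this acquire/try/finally/release pattern
    counter_lock.acquire()
    try:
        counter += 1
    finally:
        counter_lock.release()

increment()
print(counter)  # 1
```

The try/finally guarantees the lock is released even if the critical section raises, which is why the with form is preferred.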
Use RLock (reentrant lock) when a thread needs to acquire the same lock multiple times:
import threading

class BankAccount:
    def __init__(self):
        self.balance = 0
        self.lock = threading.RLock()

    def deposit(self, amount):
        with self.lock:
            self.balance += amount

    def transfer_from(self, other_account, amount):
        with self.lock:                   # Acquires this account's lock
            with other_account.lock:      # Acquires the source account's lock
                other_account.balance -= amount
                self.deposit(amount)      # deposit() re-acquires self.lock -- needs RLock
Thread Communication and Coordination
Threads often need to coordinate actions or exchange data. Python provides several synchronization primitives.
The Queue class provides thread-safe FIFO data exchange, perfect for producer-consumer patterns:
import threading
import queue
import time
import random

def producer(q, producer_id):
    for i in range(5):
        item = f"Item-{producer_id}-{i}"
        time.sleep(random.uniform(0.1, 0.5))
        q.put(item)
        print(f"Producer {producer_id} produced {item}")

def consumer(q, consumer_id):
    while True:
        item = q.get()
        if item is None:  # Sentinel: no more work
            break
        time.sleep(random.uniform(0.1, 0.3))
        print(f"Consumer {consumer_id} consumed {item}")

# Create queue
work_queue = queue.Queue()

# Start producers and consumers
producers = [threading.Thread(target=producer, args=(work_queue, i))
             for i in range(2)]
consumers = [threading.Thread(target=consumer, args=(work_queue, i))
             for i in range(3)]

for t in producers + consumers:
    t.start()

# Wait for all producers to finish, then send one sentinel per consumer.
# (Sending sentinels from inside a producer is fragile: a consumer could
# shut down while another producer is still adding items.)
for t in producers:
    t.join()
for _ in consumers:
    work_queue.put(None)
for t in consumers:
    t.join()
Event objects allow threads to signal each other:
import threading
import time

event = threading.Event()

def waiter():
    print("Waiting for event...")
    event.wait()  # Blocks until event is set
    print("Event received, proceeding!")

def setter():
    time.sleep(2)
    print("Setting event")
    event.set()

threading.Thread(target=waiter).start()
threading.Thread(target=setter).start()
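Beyond Queue and Event, the module also provides Semaphore for capping how many threads run a section at once. A small sketch (the peak-tracking counters here are just instrumentation to make the limit visible):

```python
import threading
import time

# Allow at most 2 threads inside the guarded section at a time
semaphore = threading.Semaphore(2)
active = 0
peak = 0
active_lock = threading.Lock()

def limited_worker(worker_id):
    global active, peak
    with semaphore:  # Blocks if 2 workers are already inside
        with active_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.2)  # Simulate work
        with active_lock:
            active -= 1

threads = [threading.Thread(target=limited_worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Peak concurrent workers: {peak}")  # Never more than 2
```

This pattern is handy for rate-limiting, e.g. capping simultaneous connections to a server.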
Thread Pools with concurrent.futures
Managing individual threads becomes unwieldy for large workloads. ThreadPoolExecutor provides a high-level interface for thread pools:
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests
import time

def fetch_url(url):
    start = time.time()
    response = requests.get(url)
    duration = time.time() - start
    return {
        'url': url,
        'status': response.status_code,
        'size': len(response.content),
        'duration': duration
    }

urls = [
    'https://www.python.org',
    'https://www.github.com',
    'https://www.stackoverflow.com',
    'https://www.reddit.com',
    'https://www.wikipedia.org'
]

# Using map() for ordered results
with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(fetch_url, urls)
    for result in results:
        print(f"{result['url']}: {result['status']} ({result['size']} bytes)")

# Using submit() for more control
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(fetch_url, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            result = future.result()
            print(f"Completed {url}: {result['duration']:.2f}s")
        except Exception as e:
            print(f"Failed {url}: {e}")
The context manager ensures proper cleanup. map() maintains input order while submit() with as_completed() processes results as they finish.
Best Practices and Common Pitfalls
Always use context managers with ThreadPoolExecutor to ensure threads are properly cleaned up:
from concurrent.futures import ThreadPoolExecutor

# Good: Context manager handles cleanup
with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(process_item, items)

# Avoid: Manual cleanup required
executor = ThreadPoolExecutor(max_workers=4)
results = executor.map(process_item, items)
executor.shutdown(wait=True)  # Easy to forget
Handle exceptions inside thread targets explicitly—an unhandled exception kills only its own thread. CPython reports it via threading.excepthook (by default printing a traceback to stderr), but it never propagates to the main thread, so your program carries on as if nothing failed:
import threading
import traceback

def risky_operation():
    try:
        # Your code here
        raise ValueError("Something went wrong")
    except Exception:
        traceback.print_exc()  # Print the exception

thread = threading.Thread(target=risky_operation)
thread.start()
thread.join()
Understand daemon threads—they terminate when the main program exits:
import threading
import time

def background_task():
    while True:
        print("Working...")
        time.sleep(1)

# Daemon thread won't prevent program exit
t = threading.Thread(target=background_task, daemon=True)
t.start()
time.sleep(3)  # Program exits after 3 seconds
Choose the right concurrency model. Use threading for I/O-bound tasks, multiprocessing for CPU-bound work, and asyncio for high-concurrency I/O with less overhead. Threading adds memory overhead (each thread needs its own stack) and introduces complexity through shared state.
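For the CPU-bound case, swapping ThreadPoolExecutor for ProcessPoolExecutor gives each worker its own interpreter and GIL. A sketch (the worker must be a picklable, top-level function, and the pool must be created under the __main__ guard on platforms that spawn processes):

```python
from concurrent.futures import ProcessPoolExecutor
import math

def cpu_heavy(n):
    # CPU-bound work: benefits from real parallelism across processes
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(cpu_heavy, [200_000] * 4))
    print(f"Computed {len(results)} results in parallel")
```

The API mirrors ThreadPoolExecutor, so switching between the two is often a one-line change once the picklability constraint is met.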
Avoid deadlocks by always acquiring locks in the same order and using timeouts:
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()

def safe_operation():
    # Always acquire locks in the same order
    with lock1:
        with lock2:
            # Critical section
            pass
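For the timeout half of that advice, Lock.acquire accepts a timeout argument, so a thread can back off instead of blocking forever; a minimal sketch:

```python
import threading

lock = threading.Lock()

def cautious_operation():
    # acquire() with a timeout returns False instead of blocking forever
    if lock.acquire(timeout=1.0):
        try:
            pass  # Critical section
        finally:
            lock.release()
        return True
    return False  # Could not get the lock; back off, retry, or report

print(cautious_operation())  # True -- the lock is uncontended here
```

A False return is a signal to release any locks already held and retry, which breaks the hold-and-wait condition that deadlocks require.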
Threading is powerful for I/O-bound concurrency, but requires careful attention to synchronization and resource management. Start with ThreadPoolExecutor for most use cases, and only drop down to raw threads when you need fine-grained control.