# Python - Iterators vs Iterables
## Key Insights

- Iterables are objects that implement `__iter__()` and return an iterator, while iterators implement both `__iter__()` and `__next__()` to track iteration state
- Every iterator is an iterable, but not every iterable is an iterator; this distinction matters for memory efficiency and reusability
- Understanding the iterator protocol prevents common bugs like iterator exhaustion and enables building custom iteration patterns for domain-specific data structures
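The subset relationship in the second point can be checked directly with the abstract base classes in `collections.abc`:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
it = iter(numbers)

# A list is iterable but not an iterator
print(isinstance(numbers, Iterable))  # True
print(isinstance(numbers, Iterator))  # False

# Every iterator is also an iterable
print(isinstance(it, Iterator))  # True
print(isinstance(it, Iterable))  # True
```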
## The Iterator Protocol Explained

Python's iteration mechanism relies on two magic methods: `__iter__()` and `__next__()`. An **iterable** is any object that implements `__iter__()`, which returns an iterator. An **iterator** is an object that implements both `__iter__()` (returning itself) and `__next__()` (returning the next value or raising `StopIteration`).
```python
# Iterable example: list
numbers = [1, 2, 3]
print(hasattr(numbers, '__iter__'))  # True
print(hasattr(numbers, '__next__'))  # False

# Get an iterator from the iterable
iterator = iter(numbers)
print(hasattr(iterator, '__iter__'))  # True
print(hasattr(iterator, '__next__'))  # True

# Manual iteration
print(next(iterator))  # 1
print(next(iterator))  # 2
print(next(iterator))  # 3
# next(iterator)  # Raises StopIteration
```
When you use a `for` loop, Python calls `iter()` on the object to get an iterator, then repeatedly calls `next()` until `StopIteration` is raised:

```python
# What happens behind the scenes
numbers = [1, 2, 3]
iterator = iter(numbers)
while True:
    try:
        item = next(iterator)
        print(item)
    except StopIteration:
        break
```
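A wrinkle worth knowing: `iter()` also falls back to the legacy sequence protocol, so an object that defines only `__getitem__()` (no `__iter__()`) is still iterable. A minimal sketch with a hypothetical `Squares` class:

```python
class Squares:
    """No __iter__ defined; iter() falls back to the sequence protocol."""
    def __getitem__(self, index):
        if index >= 4:
            raise IndexError  # signals the end of iteration
        return index ** 2

# iter() indexes from 0 upward until IndexError is raised
result = [x for x in Squares()]
print(result)  # [0, 1, 4, 9]
```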
## Iterator Exhaustion: A Critical Difference

Iterators maintain state and can only be traversed once. Iterables can be iterated multiple times because each call to `iter()` returns a fresh iterator.
```python
# Iterable: can iterate multiple times
numbers_list = [1, 2, 3]
print(sum(numbers_list))  # 6
print(sum(numbers_list))  # 6 - works fine

# Iterator: single-use
numbers_iter = iter([1, 2, 3])
print(sum(numbers_iter))  # 6
print(sum(numbers_iter))  # 0 - exhausted!

# File objects are iterators
with open('data.txt', 'w') as f:
    f.write('line1\nline2\nline3')

with open('data.txt') as f:
    lines1 = list(f)
    lines2 = list(f)  # Empty! Iterator exhausted

print(len(lines1))  # 3
print(len(lines2))  # 0
```
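Because file iteration tracks the underlying stream position, calling `seek(0)` rewinds the file object and makes it iterable again:

```python
with open('data.txt', 'w') as f:
    f.write('line1\nline2\nline3')

with open('data.txt') as f:
    lines1 = list(f)
    f.seek(0)         # rewind the underlying stream
    lines2 = list(f)  # iteration works again

print(len(lines1))  # 3
print(len(lines2))  # 3
```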
This behavior catches developers off-guard when passing iterators to multiple functions:
```python
def process_data(data):
    total = sum(data)
    count = len(list(data))  # Problem if data is an iterator
    return total / count

# Works with iterables
result = process_data([1, 2, 3, 4])  # 2.5

# Fails with iterators: sum() exhausts the iterator,
# so list(data) is empty and count is 0
iterator = iter([1, 2, 3, 4])
result = process_data(iterator)  # Raises ZeroDivisionError!
```
## Building Custom Iterators

Custom iterators enable lazy evaluation and memory-efficient data processing. Implement `__iter__()` and `__next__()` to create your own:
```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

counter = Countdown(3)
for num in counter:
    print(num)  # 3, 2, 1

# Iterator is exhausted
for num in counter:
    print(num)  # Nothing prints
```
For reusable iteration, separate the iterable and iterator classes:
```python
class CountdownIterator:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)

counter = Countdown(3)
print(list(counter))  # [3, 2, 1]
print(list(counter))  # [3, 2, 1] - works again!
```
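As a sketch of an alternative, `__iter__()` can itself be written with `yield`, which makes it a generator method: each call returns a fresh generator, so the separate iterator class disappears while the reusable behavior stays:

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Generator method: each call returns a fresh iterator
        current = self.start
        while current > 0:
            yield current
            current -= 1

counter = Countdown(3)
print(list(counter))  # [3, 2, 1]
print(list(counter))  # [3, 2, 1] - still reusable
```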
## Generator Functions: Iterators Made Simple

Generator functions provide syntactic sugar for creating iterators without boilerplate:
```python
def countdown(start):
    while start > 0:
        yield start
        start -= 1

counter = countdown(3)
print(next(counter))  # 3
print(next(counter))  # 2
print(list(counter))  # [1]

# Practical example: chunked file reading
def read_chunks(file_path, chunk_size=1024):
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

for chunk in read_chunks('large_file.dat'):
    process(chunk)  # Memory-efficient processing
```
Generator expressions offer even more concise syntax for simple cases:
```python
# Generator expression (iterator)
squares_gen = (x**2 for x in range(1000000))
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1

# List comprehension (iterable, loads all into memory)
squares_list = [x**2 for x in range(1000000)]
```
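Generator expressions can also be passed straight to consuming functions; when the expression is the sole argument, the extra parentheses can be dropped:

```python
# Sum of squares without materializing an intermediate list
total = sum(x**2 for x in range(10))
print(total)  # 285

# any() short-circuits, so the generator may not be fully consumed
print(any(x > 5 for x in range(1000000)))  # True - stops at x == 6
```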
## Real-World Application: Database Result Sets

Understanding iterators is crucial when working with database cursors, which return iterators to avoid loading entire result sets into memory:
```python
import sqlite3

conn = sqlite3.connect('database.db')
cursor = conn.cursor()
cursor.execute('SELECT * FROM large_table')

# Wrong: loads everything into memory
all_rows = cursor.fetchall()
for row in all_rows:
    process(row)

# Right: iterate lazily
cursor.execute('SELECT * FROM large_table')
for row in cursor:  # cursor is an iterator
    process(row)
    if should_stop():
        break  # Can stop early without fetching remaining rows
```
Custom iterator for batched database reads:
```python
class BatchedCursor:
    def __init__(self, cursor, batch_size=1000):
        self.cursor = cursor
        self.batch_size = batch_size

    def __iter__(self):
        return self

    def __next__(self):
        rows = self.cursor.fetchmany(self.batch_size)
        if not rows:
            raise StopIteration
        return rows

cursor.execute('SELECT * FROM large_table')
batched = BatchedCursor(cursor, batch_size=500)
for batch in batched:
    process_batch(batch)  # Process 500 rows at a time
```
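The standard library offers a class-free way to get the same batching: the two-argument form `iter(callable, sentinel)` calls the callable until it returns the sentinel (here, the empty list that `fetchmany()` returns when exhausted). A minimal sketch using a throwaway in-memory table `t` (hypothetical, for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE t (n INTEGER)')
cursor.executemany('INSERT INTO t VALUES (?)', [(i,) for i in range(7)])

cursor.execute('SELECT * FROM t')
# Call fetchmany(3) repeatedly until it returns the sentinel []
for batch in iter(lambda: cursor.fetchmany(3), []):
    print(len(batch))  # 3, then 3, then 1
```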
## Performance Implications

Iterators enable lazy evaluation, computing values on-demand rather than upfront:
```python
import time

def slow_squares(n):
    """Return an iterator factory: each call to the result is a fresh iterator."""
    class Iterator:
        def __init__(self):
            self.i = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self.i >= n:
                raise StopIteration
            time.sleep(0.1)  # Simulate expensive computation
            result = self.i ** 2
            self.i += 1
            return result
    return Iterator

# Only computes what's needed
gen = slow_squares(100)
first_five = []
for i, val in enumerate(gen()):
    first_five.append(val)
    if i == 4:
        break  # Only 5 computations, not 100

print(first_five)  # [0, 1, 4, 9, 16]
```
Memory comparison:
```python
import sys

# List (iterable): stores all values
numbers_list = [i for i in range(1000000)]
print(sys.getsizeof(numbers_list))  # ~8MB

# Generator (iterator): stores only state
numbers_gen = (i for i in range(1000000))
print(sys.getsizeof(numbers_gen))  # ~200 bytes
```
## Common Pitfalls and Solutions

### Pitfall 1: Passing iterators to functions expecting iterables
```python
def analyze(data):
    mean = sum(data) / len(list(data))  # Bug if data is an iterator
    return mean

# Solution: convert to list or use itertools.tee
from itertools import tee

def analyze(data):
    data1, data2 = tee(data, 2)
    total = sum(data1)
    count = sum(1 for _ in data2)
    return total / count
```
### Pitfall 2: Modifying collections during iteration
```python
# Wrong
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Skips elements!

# Right: iterate over a copy
numbers = [1, 2, 3, 4, 5]
for num in numbers[:]:
    if num % 2 == 0:
        numbers.remove(num)
```
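Often the simplest fix is to avoid in-place mutation entirely and build a new list of the elements to keep:

```python
numbers = [1, 2, 3, 4, 5]
# Keep only the odd numbers instead of removing the evens in place
numbers = [num for num in numbers if num % 2 != 0]
print(numbers)  # [1, 3, 5]
```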
### Pitfall 3: Not handling StopIteration in manual iteration
```python
# Robust manual iteration
iterator = iter([1, 2, 3])
while True:
    try:
        value = next(iterator)
        process(value)
    except StopIteration:
        break

# Or use a sentinel value with next()'s default argument
iterator = iter([1, 2, 3])
sentinel = object()
while (value := next(iterator, sentinel)) is not sentinel:
    process(value)
```
The iterator protocol forms the foundation of Python’s iteration model. Mastering the distinction between iterators and iterables enables you to write memory-efficient code, build custom iteration patterns, and avoid subtle bugs in production systems.