# Python - Iterators vs Iterables
## Key Insights

- Iterables are objects that implement `__iter__()` and return an iterator, while iterators implement both `__iter__()` and `__next__()` to track iteration state
- Every iterator is an iterable, but not every iterable is an iterator; this distinction matters for memory efficiency and reusability
- Understanding the iterator protocol prevents common bugs like iterator exhaustion and enables building custom iteration patterns for domain-specific data structures
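The subset relationship in the second point can be checked directly with the abstract base classes in `collections.abc`:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
it = iter(numbers)

# A list is iterable but not an iterator
print(isinstance(numbers, Iterable))  # True
print(isinstance(numbers, Iterator))  # False

# Every iterator is also an iterable
print(isinstance(it, Iterator))  # True
print(isinstance(it, Iterable))  # True
```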
## The Iterator Protocol Explained

Python's iteration mechanism relies on two magic methods: `__iter__()` and `__next__()`. An **iterable** is any object that implements `__iter__()`, which returns an iterator. An **iterator** is an object that implements both `__iter__()` (returning itself) and `__next__()` (returning the next value or raising `StopIteration`).
```python
# Iterable example: list
numbers = [1, 2, 3]
print(hasattr(numbers, '__iter__'))  # True
print(hasattr(numbers, '__next__'))  # False

# Get an iterator from the iterable
iterator = iter(numbers)
print(hasattr(iterator, '__iter__'))  # True
print(hasattr(iterator, '__next__'))  # True

# Manual iteration
print(next(iterator))  # 1
print(next(iterator))  # 2
print(next(iterator))  # 3
# next(iterator)  # Raises StopIteration
```
When you use a `for` loop, Python calls `iter()` on the object to get an iterator, then repeatedly calls `next()` until `StopIteration` is raised:

```python
# What happens behind the scenes
numbers = [1, 2, 3]
iterator = iter(numbers)
while True:
    try:
        item = next(iterator)
        print(item)
    except StopIteration:
        break
```
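A wrinkle worth knowing: `iter()` also falls back to the legacy sequence protocol, so an object that defines only `__getitem__()` (no `__iter__()`) is still iterable. A minimal sketch with a hypothetical `Squares` class:

```python
class Squares:
    """No __iter__ defined; iter() falls back to the sequence protocol."""
    def __getitem__(self, index):
        if index >= 4:
            raise IndexError  # signals the end of iteration
        return index ** 2

# iter() indexes from 0 upward until IndexError is raised
result = [x for x in Squares()]
print(result)  # [0, 1, 4, 9]
```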
## Iterator Exhaustion: A Critical Difference

Iterators maintain state and can only be traversed once. Iterables can be iterated multiple times because each call to `iter()` returns a fresh iterator.
```python
# Iterable: can iterate multiple times
numbers_list = [1, 2, 3]
print(sum(numbers_list))  # 6
print(sum(numbers_list))  # 6 - works fine

# Iterator: single-use
numbers_iter = iter([1, 2, 3])
print(sum(numbers_iter))  # 6
print(sum(numbers_iter))  # 0 - exhausted!

# File objects are iterators
with open('data.txt', 'w') as f:
    f.write('line1\nline2\nline3')

with open('data.txt') as f:
    lines1 = list(f)
    lines2 = list(f)  # Empty! Iterator exhausted

print(len(lines1))  # 3
print(len(lines2))  # 0
```
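Because file iteration tracks the underlying stream position, calling `seek(0)` rewinds the file object and makes it iterable again:

```python
with open('data.txt', 'w') as f:
    f.write('line1\nline2\nline3')

with open('data.txt') as f:
    lines1 = list(f)
    f.seek(0)         # rewind the underlying stream
    lines2 = list(f)  # iteration works again

print(len(lines1))  # 3
print(len(lines2))  # 3
```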
This behavior catches developers off-guard when passing iterators to multiple functions:
```python
def process_data(data):
    total = sum(data)
    count = len(list(data))  # Problem if data is an iterator
    return total / count

# Works with iterables
result = process_data([1, 2, 3, 4])  # 2.5

# Fails with iterators: sum() exhausts the iterator,
# so list(data) is empty and count is 0
iterator = iter([1, 2, 3, 4])
result = process_data(iterator)  # Raises ZeroDivisionError!
```
## Building Custom Iterators

Custom iterators enable lazy evaluation and memory-efficient data processing. Implement `__iter__()` and `__next__()` to create your own:
```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

counter = Countdown(3)
for num in counter:
    print(num)  # 3, 2, 1

# Iterator is exhausted
for num in counter:
    print(num)  # Nothing prints
```
For reusable iteration, separate the iterable and iterator classes:
```python
class CountdownIterator:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)

counter = Countdown(3)
print(list(counter))  # [3, 2, 1]
print(list(counter))  # [3, 2, 1] - works again!
```
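As a sketch of an alternative, `__iter__()` can itself be written with `yield`, which makes it a generator method: each call returns a fresh generator, so the separate iterator class disappears while the reusable behavior stays:

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Generator method: each call returns a fresh iterator
        current = self.start
        while current > 0:
            yield current
            current -= 1

counter = Countdown(3)
print(list(counter))  # [3, 2, 1]
print(list(counter))  # [3, 2, 1] - still reusable
```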
## Generator Functions: Iterators Made Simple

Generator functions provide syntactic sugar for creating iterators without boilerplate:
```python
def countdown(start):
    while start > 0:
        yield start
        start -= 1

counter = countdown(3)
print(next(counter))  # 3
print(next(counter))  # 2
print(list(counter))  # [1]

# Practical example: chunked file reading
def read_chunks(file_path, chunk_size=1024):
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

for chunk in read_chunks('large_file.dat'):
    process(chunk)  # Memory-efficient processing
```
Generator expressions offer even more concise syntax for simple cases:
```python
# Generator expression (iterator)
squares_gen = (x**2 for x in range(1000000))
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1

# List comprehension (iterable, loads all into memory)
squares_list = [x**2 for x in range(1000000)]
```
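Generator expressions can also be passed straight to consuming functions; when the expression is the sole argument, the extra parentheses can be dropped:

```python
# Sum of squares without materializing an intermediate list
total = sum(x**2 for x in range(10))
print(total)  # 285

# any() short-circuits, so the generator may not be fully consumed
print(any(x > 5 for x in range(1000000)))  # True - stops at x == 6
```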
## Real-World Application: Database Result Sets

Understanding iterators is crucial when working with database cursors, which return iterators to avoid loading entire result sets into memory:
```python
import sqlite3

conn = sqlite3.connect('database.db')
cursor = conn.cursor()
cursor.execute('SELECT * FROM large_table')

# Wrong: loads everything into memory
all_rows = cursor.fetchall()
for row in all_rows:
    process(row)

# Right: iterate lazily
cursor.execute('SELECT * FROM large_table')
for row in cursor:  # cursor is an iterator
    process(row)
    if should_stop():
        break  # Can stop early without fetching remaining rows
```
Custom iterator for batched database reads:
```python
class BatchedCursor:
    def __init__(self, cursor, batch_size=1000):
        self.cursor = cursor
        self.batch_size = batch_size

    def __iter__(self):
        return self

    def __next__(self):
        rows = self.cursor.fetchmany(self.batch_size)
        if not rows:
            raise StopIteration
        return rows

cursor.execute('SELECT * FROM large_table')
batched = BatchedCursor(cursor, batch_size=500)
for batch in batched:
    process_batch(batch)  # Process 500 rows at a time
```
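The standard library offers a class-free way to get the same batching: the two-argument form `iter(callable, sentinel)` calls the callable until it returns the sentinel (here, the empty list that `fetchmany()` returns when exhausted). A minimal sketch using a throwaway in-memory table `t` (hypothetical, for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE t (n INTEGER)')
cursor.executemany('INSERT INTO t VALUES (?)', [(i,) for i in range(7)])

cursor.execute('SELECT * FROM t')
# Call fetchmany(3) repeatedly until it returns the sentinel []
for batch in iter(lambda: cursor.fetchmany(3), []):
    print(len(batch))  # 3, then 3, then 1
```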
## Performance Implications

Iterators enable lazy evaluation, computing values on-demand rather than upfront:
```python
import time

def slow_squares(n):
    """Return an iterator factory: each call to the result is a fresh iterator."""
    class Iterator:
        def __init__(self):
            self.i = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self.i >= n:
                raise StopIteration
            time.sleep(0.1)  # Simulate expensive computation
            result = self.i ** 2
            self.i += 1
            return result
    return Iterator

# Only computes what's needed
gen = slow_squares(100)
first_five = []
for i, val in enumerate(gen()):
    first_five.append(val)
    if i == 4:
        break  # Only 5 computations, not 100

print(first_five)  # [0, 1, 4, 9, 16]
```
Memory comparison:
```python
import sys

# List (iterable): stores all values
numbers_list = [i for i in range(1000000)]
print(sys.getsizeof(numbers_list))  # ~8MB

# Generator (iterator): stores only state
numbers_gen = (i for i in range(1000000))
print(sys.getsizeof(numbers_gen))  # ~200 bytes
```
## Common Pitfalls and Solutions

### Pitfall 1: Passing iterators to functions expecting iterables
```python
def analyze(data):
    mean = sum(data) / len(list(data))  # Bug if data is an iterator
    return mean

# Solution: convert to list or use itertools.tee
from itertools import tee

def analyze(data):
    data1, data2 = tee(data, 2)
    total = sum(data1)
    count = sum(1 for _ in data2)
    return total / count
```
### Pitfall 2: Modifying collections during iteration
```python
# Wrong
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Skips elements!

# Right: iterate over a copy
numbers = [1, 2, 3, 4, 5]
for num in numbers[:]:
    if num % 2 == 0:
        numbers.remove(num)
```
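Often the simplest fix is to avoid in-place mutation entirely and build a new list of the elements to keep:

```python
numbers = [1, 2, 3, 4, 5]
# Keep only the odd numbers instead of removing the evens in place
numbers = [num for num in numbers if num % 2 != 0]
print(numbers)  # [1, 3, 5]
```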
### Pitfall 3: Not handling StopIteration in manual iteration
```python
# Robust manual iteration
iterator = iter([1, 2, 3])
while True:
    try:
        value = next(iterator)
        process(value)
    except StopIteration:
        break

# Or use a sentinel value with next()'s default argument
iterator = iter([1, 2, 3])
sentinel = object()
while (value := next(iterator, sentinel)) is not sentinel:
    process(value)
```
The iterator protocol forms the foundation of Python’s iteration model. Mastering the distinction between iterators and iterables enables you to write memory-efficient code, build custom iteration patterns, and avoid subtle bugs in production systems.