How to Iterate Over Rows in Pandas


Key Insights

  • Row iteration in Pandas is almost always the wrong approach—vectorized operations can be 100x faster and should be your default choice
  • When you must iterate, itertuples() outperforms iterrows() by 10-100x because it returns lightweight namedtuples instead of Series objects
  • The apply() method with axis=1 offers a middle ground, but it’s still just a dressed-up loop and won’t match vectorized performance

Row iteration is one of those topics where knowing how to do something is less important than knowing when to do it. Pandas is built on NumPy, which processes entire arrays in optimized C code. The moment you write a Python for loop over DataFrame rows, you’re throwing away that performance advantage. That said, sometimes iteration is genuinely necessary—complex conditional logic, API calls per row, or operations that depend on previous row state. Let’s cover all your options, from the methods you should avoid to the ones you should reach for first.

Using iterrows() - The Basic Approach

The iterrows() method is what most developers discover first. It yields pairs of (index, Series) for each row, making it intuitive to work with.

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'price': [25.99, 49.99, 12.50],
    'quantity': [100, 45, 200]
})

for index, row in df.iterrows():
    print(f"Row {index}: {row['product']} costs ${row['price']}")

Output:

Row 0: Widget costs $25.99
Row 1: Gadget costs $49.99
Row 2: Sprocket costs $12.50

The API is clean and readable. You access columns by name, and the index is right there if you need it. However, iterrows() has two significant problems.

First, it’s slow. Each row gets converted into a Pandas Series object, which involves substantial overhead. Second, it doesn’t preserve dtypes. If you have a DataFrame with integers and floats, iterrows() may convert everything to float64 to fit into a single Series. This can cause subtle bugs:

df = pd.DataFrame({'id': [1, 2, 3], 'value': [1.5, 2.5, 3.5]})

for index, row in df.iterrows():
    print(f"id type: {type(row['id'])}")  # <class 'numpy.float64'>, not int

Use iterrows() only for quick debugging, small datasets where performance doesn’t matter, or when you genuinely need the row as a Series for downstream processing.
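If the Series form is the reason you’re iterating, here’s a small illustrative sketch (the quarterly columns are made up for this example): Series methods like idxmax() work directly on each row, which a plain namedtuple can’t offer.

```python
import pandas as pd

# Hypothetical quarterly sales table, purely for illustration
df = pd.DataFrame({'q1': [10, 40], 'q2': [30, 20], 'q3': [25, 35]})

# Each row arrives as a Series, so Series methods apply directly
for idx, row in df.iterrows():
    print(f"Row {idx}: best quarter is {row.idxmax()}")
```

Here the convenience of the Series API outweighs the per-row overhead, because the DataFrame is tiny.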

Using itertuples() - The Faster Alternative

The itertuples() method returns each row as a namedtuple, which is a lightweight Python object. This eliminates the Series creation overhead and preserves dtypes.

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'price': [25.99, 49.99, 12.50],
    'quantity': [100, 45, 200]
})

for row in df.itertuples():
    print(f"Row {row.Index}: {row.product} costs ${row.price}")

Output:

Row 0: Widget costs $25.99
Row 1: Gadget costs $49.99
Row 2: Sprocket costs $12.50

Notice that you access values as attributes (row.product) rather than dictionary-style access. The index is available as row.Index (capital I).

There’s one gotcha: column names that aren’t valid Python identifiers are replaced with positional names (_1, _2, and so on). If your column is named unit price or 2024_sales, you’ll need to use the generated name or access by position:

df = pd.DataFrame({'unit price': [25.99, 49.99]})

for row in df.itertuples():
    # "row.unit price" is a syntax error; the field is renamed to _1
    print(row[1])   # access by position (position 0 is the index)
    print(row._1)   # or by the generated positional name

You can also disable the index and customize the tuple name:

for row in df.itertuples(index=False, name='Product'):
    print(type(row))  # a namedtuple class called 'Product'

When iteration is unavoidable, itertuples() should be your default choice. It’s consistently 10-100x faster than iterrows().
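When you do iterate to build a new column, the usual pattern is to collect results in a plain list and assign the whole column once at the end, rather than writing individual cells with .at inside the loop. A sketch using the same example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'price': [25.99, 49.99, 12.50],
    'quantity': [100, 45, 200]
})

# Accumulate in a Python list, then assign the column in one step --
# far cheaper than per-cell writes into the DataFrame inside the loop
totals = []
for row in df.itertuples(index=False):
    totals.append(row.price * row.quantity)

df['total'] = totals
```

List-then-assign keeps the loop working with plain Python objects and touches the DataFrame only once.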

Using apply() - Row-wise Function Application

The apply() method with axis=1 lets you run a function on each row without writing an explicit loop. It’s cleaner syntax, but don’t be fooled—it’s still iteration under the hood.

import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'price': [25.99, 49.99, 12.50],
    'quantity': [100, 45, 200]
})

def calculate_total(row):
    base = row['price'] * row['quantity']
    # Apply bulk discount for large quantities
    if row['quantity'] > 100:
        return base * 0.9
    return base

df['total'] = df.apply(calculate_total, axis=1)
print(df)

Output:

    product  price  quantity    total
0    Widget  25.99       100  2599.00
1    Gadget  49.99        45  2249.55
2  Sprocket  12.50       200  2250.00

The apply() approach shines when you have complex row-wise logic that would be awkward to vectorize. It’s more readable than a loop with manual assignment, and it integrates naturally with method chaining.

For simple operations, you can use lambda functions:

df['revenue'] = df.apply(lambda row: row['price'] * row['quantity'], axis=1)

But here’s the thing: that lambda is slower than the equivalent vectorized operation. Use apply() when the logic genuinely requires row context—like calling external functions, handling complex conditionals, or when readability trumps performance for your use case.
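As a concrete sketch of “logic that genuinely requires row context,” here’s a hypothetical labeling function (the threshold and wording are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'quantity': [100, 45, 200]
})

def describe(row):
    # Mixes string formatting with a conditional on another column --
    # awkward as pure column arithmetic, natural as a row function
    level = 'high' if row['quantity'] > 100 else 'normal'
    return f"{row['product']} ({level} stock)"

df['label'] = df.apply(describe, axis=1)
```

You could vectorize this with np.where plus string concatenation, but the row function reads more clearly when the logic grows.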

Vectorized Operations - The Preferred Approach

Vectorization means expressing operations on entire columns at once, letting Pandas and NumPy handle the iteration in optimized compiled code. This should be your first instinct for any data transformation.

Let’s rewrite the previous examples without any iteration:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'price': [25.99, 49.99, 12.50],
    'quantity': [100, 45, 200]
})

# Simple calculation - just use column operations
df['revenue'] = df['price'] * df['quantity']

# Conditional logic - use np.where or boolean indexing
df['total'] = np.where(
    df['quantity'] > 100,
    df['price'] * df['quantity'] * 0.9,
    df['price'] * df['quantity']
)

print(df)

Output:

    product  price  quantity  revenue    total
0    Widget  25.99       100  2599.00  2599.00
1    Gadget  49.99        45  2249.55  2249.55
2  Sprocket  12.50       200  2500.00  2250.00

For more complex conditions, use np.select:

conditions = [
    df['quantity'] > 150,
    df['quantity'] > 100,
    df['quantity'] > 50
]
choices = [0.85, 0.90, 0.95]  # Discount multipliers

df['discount_rate'] = np.select(conditions, choices, default=1.0)
df['final_price'] = df['price'] * df['quantity'] * df['discount_rate']

String operations are also vectorized through the .str accessor:

df['product_upper'] = df['product'].str.upper()
df['product_length'] = df['product'].str.len()

The mental shift is this: stop thinking “for each row, do X” and start thinking “take column A, transform it, store in column B.” Once you internalize this pattern, you’ll rarely need explicit iteration.
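Even “depends on the previous row” often has a vectorized answer: shift() exposes the prior row’s value as an ordinary column. A small sketch with an illustrative price series:

```python
import pandas as pd

df = pd.DataFrame({'price': [25.99, 49.99, 12.50]})

# shift(1) aligns each row with the previous row's value,
# so row-to-row comparisons need no explicit loop
df['prev_price'] = df['price'].shift(1)
df['change'] = df['price'] - df['prev_price']
```

The first row has no predecessor, so its change is NaN; true running-state logic (e.g., a recursive filter) is one of the rare cases where iteration remains necessary.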

Performance Comparison

Let’s quantify the performance differences with a realistic benchmark:

import pandas as pd
import numpy as np
import time

# Create a larger dataset
n_rows = 100_000
df = pd.DataFrame({
    'a': np.random.randn(n_rows),
    'b': np.random.randn(n_rows),
    'c': np.random.randint(0, 100, n_rows)
})

def benchmark(name, func):
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.4f} seconds")

# Method 1: iterrows()
def using_iterrows():
    result = []
    for idx, row in df.iterrows():
        result.append(row['a'] * row['b'] + row['c'])
    return result

# Method 2: itertuples()
def using_itertuples():
    result = []
    for row in df.itertuples():
        result.append(row.a * row.b + row.c)
    return result

# Method 3: apply()
def using_apply():
    return df.apply(lambda row: row['a'] * row['b'] + row['c'], axis=1)

# Method 4: Vectorized
def using_vectorized():
    return df['a'] * df['b'] + df['c']

benchmark("iterrows()", using_iterrows)
benchmark("itertuples()", using_itertuples)
benchmark("apply()", using_apply)
benchmark("vectorized", using_vectorized)

Typical output on a modern machine:

iterrows(): 4.2341 seconds
itertuples(): 0.1823 seconds
apply(): 1.0567 seconds
vectorized: 0.0012 seconds

The vectorized approach here is roughly 3,500x faster than iterrows() and 150x faster than itertuples(). The exact ratios vary by machine and operation, but the ordering is consistent, and the performance gap only widens as your data grows.

Conclusion

The decision tree is straightforward:

  1. Try vectorization first. If you can express your operation using column arithmetic, np.where, np.select, or built-in Pandas methods, do that. It’s faster and usually more readable once you’re comfortable with the syntax.

  2. Use itertuples() when iteration is unavoidable. This includes scenarios like making API calls per row, writing to external systems, or operations where each row genuinely depends on complex state. It’s the fastest iteration method and preserves dtypes.

  3. Use apply() for complex row logic that’s awkward to vectorize. It won’t be fast, but it’s cleaner than a manual loop and works well in method chains.

  4. Reserve iterrows() for debugging and exploration. It’s the slowest option, but the Series return type can be convenient when you’re poking around in a notebook.

The broader lesson: Pandas rewards you for thinking in columns rather than rows. Invest time learning np.where, np.select, and the various Pandas transform methods. That investment pays dividends every time you process data.
