How to Filter Rows in Pandas

Key Insights

  • Boolean indexing is the foundation of pandas filtering—master it first, and every other method becomes intuitive
  • Use .query() for complex conditions with multiple columns; it’s more readable and often faster on large datasets
  • Always wrap multiple conditions in parentheses when using & and | operators; this is the most common source of filtering bugs

Introduction

Row filtering is something you’ll do in virtually every pandas workflow. Whether you’re cleaning messy data, preparing subsets for analysis, or extracting records that meet specific criteria, filtering is foundational. Yet I regularly see developers struggle with the syntax, especially when combining multiple conditions.

This article covers the practical techniques you need. We’ll start with boolean indexing basics, work through multi-condition filtering, and explore specialized methods like .query(), .isin(), and .between(). By the end, you’ll know exactly which approach to use for any filtering scenario.

Let’s work with a sample dataset throughout:

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [28, 35, 42, 31, None],
    'department': ['Engineering', 'Sales', 'Engineering', 'Marketing', 'Sales'],
    'salary': [75000, 52000, 95000, 61000, 48000],
    'years_employed': [3, 7, 12, 5, 2]
})

Boolean Indexing Basics

Boolean indexing is the core mechanism behind pandas filtering. The concept is simple: you create a Series of True and False values (a boolean mask), then pass it to the DataFrame. Rows where the mask is True are kept; rows where it’s False are dropped.

When you apply a comparison operator to a pandas Series, you get a boolean Series back:

# This creates a boolean Series
mask = df['age'] > 30
print(mask)
# 0    False
# 1     True
# 2     True
# 3     True
# 4    False
# Name: age, dtype: bool

Pass this mask to the DataFrame using bracket notation to filter:

# Filter rows where age > 30
older_employees = df[df['age'] > 30]
print(older_employees)
#       name   age   department  salary  years_employed
# 1      Bob  35.0        Sales   52000               7
# 2  Charlie  42.0  Engineering   95000              12
# 3    Diana  31.0    Marketing   61000               5

String comparisons work identically:

# Filter rows where department equals 'Engineering'
engineers = df[df['department'] == 'Engineering']
print(engineers)
#       name   age   department  salary  years_employed
# 0    Alice  28.0  Engineering   75000               3
# 2  Charlie  42.0  Engineering   95000              12

You can also use other comparison operators: <, <=, >=, !=. The pattern is always the same: df[df['column'] <operator> value].
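Here’s a quick example with >= on the sample data:

# Filter rows where years_employed is at least 5
veterans = df[df['years_employed'] >= 5]
print(veterans)
#       name   age   department  salary  years_employed
# 1      Bob  35.0        Sales   52000               7
# 2  Charlie  42.0  Engineering   95000              12
# 3    Diana  31.0    Marketing   61000               5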

Filtering with Multiple Conditions

Real-world filtering usually involves multiple conditions. This is where pandas syntax gets tricky, especially if you’re used to plain Python’s and/or or SQL’s AND/OR.

Use & for AND, | for OR, and ~ for NOT. Crucially, you must wrap each condition in parentheses. This isn’t optional—it’s required because of Python’s operator precedence rules.

# Filter: age > 30 AND salary < 70000
mid_career_moderate_pay = df[(df['age'] > 30) & (df['salary'] < 70000)]
print(mid_career_moderate_pay)
#     name   age department  salary  years_employed
# 1    Bob  35.0      Sales   52000               7
# 3  Diana  31.0  Marketing   61000               5

Without parentheses, you’ll get a cryptic error about ambiguous truth values. I’ve seen experienced developers waste hours debugging this.
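Here’s what that failure looks like. Without parentheses, & binds tighter than the comparisons, so the expression chains in a way you don’t expect and pandas raises a ValueError:

# Missing parentheses: & binds tighter than >, so this fails
# df[df['age'] > 30 & df['salary'] < 70000]
# ValueError: The truth value of a Series is ambiguous. Use a.empty,
# a.bool(), a.item(), a.any() or a.all().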

OR conditions follow the same pattern:

# Filter: department is Engineering OR salary > 60000
high_value = df[(df['department'] == 'Engineering') | (df['salary'] > 60000)]
print(high_value)
#       name   age   department  salary  years_employed
# 0    Alice  28.0  Engineering   75000               3
# 2  Charlie  42.0  Engineering   95000              12
# 3    Diana  31.0    Marketing   61000               5

Negate conditions with ~:

# Filter: NOT in Sales department
not_sales = df[~(df['department'] == 'Sales')]
print(not_sales)
#       name   age   department  salary  years_employed
# 0    Alice  28.0  Engineering   75000               3
# 2  Charlie  42.0  Engineering   95000              12
# 3    Diana  31.0    Marketing   61000               5

When you have more than two or three conditions, the boolean indexing syntax becomes unwieldy. That’s when you should reach for .query().

Using the .query() Method

The .query() method lets you write filter conditions as strings using a SQL-like syntax. It’s more readable for complex conditions and often performs better on large datasets.

Here’s the boolean indexing example rewritten with .query():

# Boolean indexing version
result = df[(df['age'] > 30) & (df['salary'] < 70000)]

# Equivalent .query() version
result = df.query('age > 30 and salary < 70000')

The .query() version is cleaner—no repeated df, no parentheses around each condition, and you use and/or/not instead of &/|/~.
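The readability gap widens as conditions pile up. Here’s a three-condition filter on the sample data:

# Three conditions, still readable
result = df.query('age > 30 and salary < 70000 and years_employed >= 5')
print(result)
#     name   age department  salary  years_employed
# 1    Bob  35.0      Sales   52000               7
# 3  Diana  31.0  Marketing   61000               5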

For string comparisons, use quotes within the query string:

# Filter by string value
engineers = df.query('department == "Engineering"')

Reference external variables with the @ prefix:

min_salary = 60000
target_dept = 'Engineering'

# Reference Python variables in query
filtered = df.query('salary >= @min_salary and department == @target_dept')
print(filtered)
#       name   age   department  salary  years_employed
# 0    Alice  28.0  Engineering   75000               3
# 2  Charlie  42.0  Engineering   95000              12

This variable referencing is particularly useful in functions where filter values come from parameters.
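As a sketch, a hypothetical helper (filter_staff is just an illustration, not a pandas function) might look like this:

# A hypothetical helper: filter values arrive as function parameters
def filter_staff(frame, dept, min_sal):
    # @ resolves dept and min_sal from the local namespace
    return frame.query('department == @dept and salary >= @min_sal')

print(filter_staff(df, 'Sales', 50000))
#   name   age department  salary  years_employed
# 1  Bob  35.0      Sales   52000               7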

One caveat: column names containing spaces need backtick quoting inside .query() (supported since pandas 0.25), and names with other special characters may not work at all. For those cases, fall back to boolean indexing or rename your columns first.
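A minimal sketch of the backtick syntax, using a renamed copy of the sample frame:

# Backtick-quote a column name containing a space (pandas 0.25+)
renamed = df.rename(columns={'years_employed': 'years employed'})
print(renamed.query('`years employed` > 5'))
#       name   age   department  salary  years employed
# 1      Bob  35.0        Sales   52000               7
# 2  Charlie  42.0  Engineering   95000              12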

Filtering with .isin() and .between()

When checking if values are in a list or within a range, pandas provides specialized methods that are both cleaner and faster than chaining OR conditions.

Use .isin() to check membership in a collection:

# Filter: department is Engineering or Marketing
target_depts = ['Engineering', 'Marketing']
filtered = df[df['department'].isin(target_depts)]
print(filtered)
#       name   age   department  salary  years_employed
# 0    Alice  28.0  Engineering   75000               3
# 2  Charlie  42.0  Engineering   95000              12
# 3    Diana  31.0    Marketing   61000               5

Compare this to the OR-chain alternative:

# Equivalent but worse
filtered = df[(df['department'] == 'Engineering') | (df['department'] == 'Marketing')]

The .isin() version scales better—adding more values to the list doesn’t make the code longer or harder to read.
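The same mask also negates cleanly with ~, which is handy for exclusion filters:

# Exclude the listed departments instead
outside = df[~df['department'].isin(target_depts)]
print(outside)
#   name   age department  salary  years_employed
# 1  Bob  35.0      Sales   52000               7
# 4  Eve   NaN      Sales   48000               2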

For numeric ranges, .between() is your friend:

# Filter: salary between 50000 and 80000 (inclusive)
mid_range_salary = df[df['salary'].between(50000, 80000)]
print(mid_range_salary)
#     name   age   department  salary  years_employed
# 0  Alice  28.0  Engineering   75000               3
# 1    Bob  35.0        Sales   52000               7
# 3  Diana  31.0    Marketing   61000               5

By default, .between() is inclusive on both ends. Use the inclusive parameter to change this:

# Exclusive on both ends
df[df['salary'].between(50000, 80000, inclusive='neither')]

# Inclusive only on left
df[df['salary'].between(50000, 80000, inclusive='left')]

Handling Missing Values in Filters

Missing values (NaN/None) require special attention. Standard comparison operators don’t work intuitively with NaN—NaN == NaN returns False, and NaN > 5 also returns False.
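You can verify this directly (shown here with numpy’s nan, which is what pandas uses for missing floats):

import numpy as np

print(np.nan == np.nan)  # False -- NaN never equals anything, even itself
print(np.nan > 5)        # False -- every comparison with NaN is False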

Use .isna() and .notna() for explicit null checking:

# Filter rows where age is missing
missing_age = df[df['age'].isna()]
print(missing_age)
#   name  age department  salary  years_employed
# 4  Eve  NaN      Sales   48000               2

# Filter rows where age is NOT missing
has_age = df[df['age'].notna()]
print(has_age)
#       name   age   department  salary  years_employed
# 0    Alice  28.0  Engineering   75000               3
# 1      Bob  35.0        Sales   52000               7
# 2  Charlie  42.0  Engineering   95000              12
# 3    Diana  31.0    Marketing   61000               5

Combine null checks with other conditions:

# Filter: age > 30 OR age is missing
result = df[(df['age'] > 30) | (df['age'].isna())]

For quick removal of rows with any missing values, use dropna():

# Drop rows with any missing values
clean_df = df.dropna()

# Drop rows with missing values in specific columns only
clean_df = df.dropna(subset=['age', 'salary'])

Performance Tips and Summary

For small to medium datasets (under a million rows), all filtering methods perform similarly. For larger datasets, keep these guidelines in mind:

Boolean indexing is the most flexible and works everywhere. It’s your default choice for simple conditions.

.query() can be faster on large DataFrames because it uses numexpr under the hood for numeric operations. It’s also more readable for complex multi-column conditions.
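If you want to make the engine choice explicit, .query() accepts an engine keyword (passed through to pandas eval); 'numexpr' requires the numexpr package to be installed:

# Explicitly request the numexpr engine (requires numexpr installed)
result = df.query('salary > 60000', engine='numexpr')

# Or force pure-Python evaluation for comparison
result = df.query('salary > 60000', engine='python')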

.isin() is significantly faster than chaining OR conditions, especially with large lists of values.
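Here’s a rough timing sketch; the synthetic million-row frame is an assumption for illustration, and the exact numbers will vary by machine:

import numpy as np
import pandas as pd
from timeit import timeit

# Synthetic frame: one million rows of single-letter department codes
big = pd.DataFrame({'dept': np.random.choice(list('ABCDEFGH'), 1_000_000)})
targets = ['A', 'B', 'C', 'D']

t_isin = timeit(lambda: big[big['dept'].isin(targets)], number=10)
t_or = timeit(lambda: big[(big['dept'] == 'A') | (big['dept'] == 'B') |
                          (big['dept'] == 'C') | (big['dept'] == 'D')],
              number=10)
print(f'isin: {t_isin:.2f}s, chained OR: {t_or:.2f}s')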

Avoid applying filters in loops. If you need to filter by multiple criteria separately, consider using groupby() instead.
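For example, instead of re-filtering once per department inside a loop, group once and iterate:

# Anti-pattern: one full filter pass per department
# for dept in df['department'].unique():
#     subset = df[df['department'] == dept]

# Better: a single groupby pass yields each subset
for dept, subset in df.groupby('department'):
    print(dept, len(subset))
# Engineering 2
# Marketing 1
# Sales 2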

Here’s a quick reference for choosing your filtering approach:

Scenario                      Recommended Method
Simple single condition       Boolean indexing
Multiple conditions (3+)      .query()
Value in list                 .isin()
Numeric range                 .between()
Missing value check           .isna() / .notna()
Column names with spaces      .query() with backticks, or boolean indexing

Filtering is one of those operations you’ll use constantly. Get comfortable with these patterns, and you’ll write cleaner, faster pandas code. Start with boolean indexing to understand the fundamentals, then reach for .query() and the specialized methods when they make your code clearer.
