How to Filter by Multiple Conditions in Polars
Polars has emerged as the go-to DataFrame library for Python developers who need speed. Built in Rust with a query optimizer, it consistently outperforms pandas by 10-100x on large datasets. But...
Key Insights
- Polars requires explicit parentheses around each condition when combining with
&,|, or~operators—forgetting them is the most common source of filtering errors - Use
pl.all_horizontal()andpl.any_horizontal()for cleaner, more readable code when applying the same logical operation across multiple conditions - Lazy mode with predicate pushdown can dramatically improve performance by filtering data before it’s fully loaded into memory
Introduction
Polars has emerged as the go-to DataFrame library for Python developers who need speed. Built in Rust with a query optimizer, it consistently outperforms pandas by 10-100x on large datasets. But speed means nothing if you can’t express your filtering logic clearly.
Filtering by multiple conditions is something you’ll do constantly—finding users who are active AND subscribed, products that are in stock OR on backorder, transactions that happened this month but NOT from a specific region. Polars gives you several ways to express these conditions, each with different trade-offs for readability and flexibility.
This article covers the practical patterns you’ll actually use, from basic boolean operators to dynamic filter construction for production applications.
Basic Single-Condition Filtering Refresher
Before combining conditions, let’s establish the foundation. Polars uses the .filter() method with column expressions:
import polars as pl
df = pl.DataFrame({
"name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
"age": [28, 35, 42, 31, 25],
"department": ["Engineering", "Sales", "Engineering", "Marketing", "Sales"],
"salary": [75000, 65000, 95000, 70000, 55000],
"is_active": [True, True, False, True, True]
})
# Single condition filter
senior_employees = df.filter(pl.col("age") > 30)
print(senior_employees)
Output:
shape: (3, 5)
┌─────────┬─────┬─────────────┬────────┬───────────┐
│ name ┆ age ┆ department ┆ salary ┆ is_active │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ bool │
╞═════════╪═════╪═════════════╪════════╪═══════════╡
│ Bob ┆ 35 ┆ Sales ┆ 65000 ┆ true │
│ Charlie ┆ 42 ┆ Engineering ┆ 95000 ┆ false │
│ Diana ┆ 31 ┆ Marketing ┆ 70000 ┆ true │
└─────────┴─────┴─────────────┴────────┴───────────┘
The pl.col() function creates an expression that references a column. The comparison operator returns a boolean expression that .filter() uses to select rows.
Combining Conditions with Logical Operators
Here’s where most developers hit their first Polars gotcha: you must wrap each condition in parentheses. This isn’t optional—Python’s operator precedence will break your code without them.
AND Conditions with &
# Find active employees over 30
result = df.filter(
(pl.col("age") > 30) & (pl.col("is_active") == True)
)
print(result)
Output:
shape: (2, 5)
┌───────┬─────┬─────────────┬────────┬───────────┐
│ name ┆ age ┆ department ┆ salary ┆ is_active │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ bool │
╞═══════╪═════╪═════════════╪════════╪═══════════╡
│ Bob ┆ 35 ┆ Sales ┆ 65000 ┆ true │
│ Diana ┆ 31 ┆ Marketing ┆ 70000 ┆ true │
└───────┴─────┴─────────────┴────────┴───────────┘
Without parentheses, pl.col("age") > 30 & pl.col("is_active") would evaluate 30 & pl.col("is_active") first due to operator precedence, causing a cryptic error.
OR Conditions with |
# Find employees in Engineering OR with salary above 70000
result = df.filter(
(pl.col("department") == "Engineering") | (pl.col("salary") > 70000)
)
print(result)
Output:
shape: (3, 5)
┌─────────┬─────┬─────────────┬────────┬───────────┐
│ name ┆ age ┆ department ┆ salary ┆ is_active │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ bool │
╞═════════╪═════╪═════════════╪════════╪═══════════╡
│ Alice ┆ 28 ┆ Engineering ┆ 75000 ┆ true │
│ Charlie ┆ 42 ┆ Engineering ┆ 95000 ┆ false │
│ Diana ┆ 31 ┆ Marketing ┆ 70000 ┆ true │
└─────────┴─────┴─────────────┴────────┴───────────┘
NOT Conditions with ~
# Find employees NOT in Sales
result = df.filter(
~(pl.col("department") == "Sales")
)
print(result)
You can also use .ne() or != for simple negations, but ~ is essential for negating complex expressions.
Combining All Three
# Active employees over 30 who are NOT in Sales
result = df.filter(
(pl.col("age") > 30) &
(pl.col("is_active") == True) &
~(pl.col("department") == "Sales")
)
print(result)
Output:
shape: (1, 5)
┌───────┬─────┬───────────┬────────┬───────────┐
│ name ┆ age ┆ department┆ salary ┆ is_active │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ bool │
╞═══════╪═════╪═══════════╪════════╪═══════════╡
│ Diana ┆ 31 ┆ Marketing ┆ 70000 ┆ true │
└───────┴─────┴───────────┴────────┴───────────┘
Using pl.all_horizontal() and pl.any_horizontal()
When you have many conditions with the same logical operator, chaining & or | becomes unwieldy. Polars provides pl.all_horizontal() (all conditions must be true) and pl.any_horizontal() (at least one condition must be true) for cleaner code.
# Using all_horizontal for multiple AND conditions
result = df.filter(
pl.all_horizontal(
pl.col("age") > 25,
pl.col("salary") > 60000,
pl.col("is_active") == True
)
)
print(result)
Output:
shape: (3, 5)
┌───────┬─────┬─────────────┬────────┬───────────┐
│ name ┆ age ┆ department ┆ salary ┆ is_active │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ bool │
╞═══════╪═════╪═════════════╪════════╪═══════════╡
│ Alice ┆ 28 ┆ Engineering ┆ 75000 ┆ true │
│ Bob ┆ 35 ┆ Sales ┆ 65000 ┆ true │
│ Diana ┆ 31 ┆ Marketing ┆ 70000 ┆ true │
└───────┴─────┴─────────────┴────────┴───────────┘
# Using any_horizontal for multiple OR conditions
result = df.filter(
pl.any_horizontal(
pl.col("department") == "Engineering",
pl.col("salary") > 90000,
pl.col("age") < 26
)
)
print(result)
These functions accept any number of boolean expressions and are particularly useful when building conditions programmatically.
Filtering with .is_in() for Multiple Values
When checking if a column matches any value from a list, .is_in() is cleaner and faster than chaining OR conditions:
# Instead of this verbose approach
verbose_result = df.filter(
(pl.col("department") == "Engineering") |
(pl.col("department") == "Marketing")
)
# Use is_in()
clean_result = df.filter(
pl.col("department").is_in(["Engineering", "Marketing"])
)
print(clean_result)
Output:
shape: (3, 5)
┌─────────┬─────┬─────────────┬────────┬───────────┐
│ name ┆ age ┆ department ┆ salary ┆ is_active │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 ┆ bool │
╞═════════╪═════════════════════╪════════╪═══════════╡
│ Alice ┆ 28 ┆ Engineering ┆ 75000 ┆ true │
│ Charlie ┆ 42 ┆ Engineering ┆ 95000 ┆ false │
│ Diana ┆ 31 ┆ Marketing ┆ 70000 ┆ true │
└─────────┴─────┴─────────────┴────────┴───────────┘
Combine .is_in() with other conditions naturally:
# Active employees in specific departments with salary above threshold
result = df.filter(
(pl.col("department").is_in(["Engineering", "Marketing"])) &
(pl.col("is_active") == True) &
(pl.col("salary") >= 70000)
)
Dynamic Filtering with Expression Lists
Production code often requires building filters dynamically—from user input, configuration files, or API parameters. Here’s how to construct filters programmatically:
from functools import reduce
import operator
# Define conditions as a dictionary
filter_config = {
"age": ("gt", 25),
"salary": ("gte", 60000),
"is_active": ("eq", True)
}
def build_condition(column: str, op: str, value) -> pl.Expr:
"""Convert a filter specification to a Polars expression."""
col = pl.col(column)
operations = {
"eq": lambda c, v: c == v,
"ne": lambda c, v: c != v,
"gt": lambda c, v: c > v,
"gte": lambda c, v: c >= v,
"lt": lambda c, v: c < v,
"lte": lambda c, v: c <= v,
}
return operations[op](col, value)
# Build list of conditions
conditions = [
build_condition(col, op, val)
for col, (op, val) in filter_config.items()
]
# Combine with reduce for AND logic
combined_filter = reduce(operator.and_, conditions)
result = df.filter(combined_filter)
print(result)
For simpler cases, pass a list directly to pl.all_horizontal():
# Dynamic conditions list
conditions = [
pl.col("age") > 25,
pl.col("salary") > 60000,
]
# Add conditions based on runtime logic
if include_active_only:
conditions.append(pl.col("is_active") == True)
result = df.filter(pl.all_horizontal(*conditions))
Performance Tips and Best Practices
Use Lazy Mode for Large Datasets
Lazy evaluation enables predicate pushdown—Polars pushes your filter conditions down to the data source, filtering rows before loading them into memory:
# Eager mode: loads all data, then filters
eager_result = pl.read_csv("large_file.csv").filter(
pl.col("status") == "active"
)
# Lazy mode: filters during scan
lazy_result = (
pl.scan_csv("large_file.csv")
.filter(pl.col("status") == "active")
.collect()
)
Inspect the query plan with .explain():
query = (
pl.scan_csv("large_file.csv")
.filter((pl.col("age") > 30) & (pl.col("status") == "active"))
)
print(query.explain())
You’ll see FILTER operations pushed down to the CSV SCAN step.
Order Conditions by Selectivity
Put the most restrictive conditions first. Polars evaluates conditions left to right, and eliminating rows early reduces work for subsequent conditions:
# Better: rare condition first
df.filter(
(pl.col("premium_tier") == "platinum") & # ~1% of rows
(pl.col("is_active") == True) # ~80% of rows
)
# Worse: common condition first
df.filter(
(pl.col("is_active") == True) & # ~80% of rows
(pl.col("premium_tier") == "platinum") # ~1% of rows
)
Avoid Anti-Patterns
Don’t use Python’s and/or keywords—they don’t work with Polars expressions:
# WRONG: Python keywords
df.filter(pl.col("age") > 30 and pl.col("is_active")) # Raises error
# CORRECT: Bitwise operators with parentheses
df.filter((pl.col("age") > 30) & (pl.col("is_active")))
Don’t filter repeatedly when you can combine conditions:
# Inefficient: multiple filter calls
result = df.filter(pl.col("age") > 30)
result = result.filter(pl.col("is_active") == True)
# Efficient: single filter with combined conditions
result = df.filter(
(pl.col("age") > 30) & (pl.col("is_active") == True)
)
Polars’ filtering capabilities strike a balance between expressiveness and performance. Master these patterns, and you’ll write cleaner, faster data processing code.