How to Filter Rows in Polars

Key Insights

  • Polars uses expression-based filtering with filter() and pl.col(), which enables powerful query optimization and parallel execution that pandas can’t match
  • Always wrap multiple conditions in parentheses when using & and | operators—Python’s operator precedence will bite you otherwise
  • Lazy evaluation with lazy() and collect() lets Polars optimize your entire filter chain before execution, often yielding 10-100x performance improvements on large datasets

Why Filtering Matters in Polars

Polars has earned its reputation as the fastest DataFrame library in Python, and row filtering is where that speed becomes immediately apparent. Unlike pandas, which processes filters row-by-row in many cases, Polars compiles expressions into optimized query plans and executes them in parallel across CPU cores.

If you’re coming from pandas, you’ll need to adjust your mental model. Polars doesn’t use boolean indexing with df[df["column"] > 5]. Instead, it uses an expression-based API that’s more explicit and significantly faster. Let’s dive into the practical patterns you’ll use daily.

Basic Filtering with filter()

The filter() method is your primary tool for row selection. It takes a boolean expression built with pl.col() and returns a new DataFrame containing only matching rows.

import polars as pl

# Create a sample DataFrame
df = pl.DataFrame({
    "product": ["Widget", "Gadget", "Sprocket", "Gizmo", "Doohickey"],
    "price": [25.99, 149.99, 8.50, 299.99, 45.00],
    "quantity": [100, 25, 500, 10, 75],
    "category": ["Tools", "Electronics", "Tools", "Electronics", "Tools"]
})

# Filter rows where category equals "Tools"
tools_df = df.filter(pl.col("category") == "Tools")
print(tools_df)

Output:

shape: (3, 4)
┌───────────┬───────┬──────────┬──────────┐
│ product   ┆ price ┆ quantity ┆ category │
│ ---       ┆ ---   ┆ ---      ┆ ---      │
│ str       ┆ f64   ┆ i64      ┆ str      │
╞═══════════╪═══════╪══════════╪══════════╡
│ Widget    ┆ 25.99 ┆ 100      ┆ Tools    │
│ Sprocket  ┆ 8.5   ┆ 500      ┆ Tools    │
│ Doohickey ┆ 45.0  ┆ 75       ┆ Tools    │
└───────────┴───────┴──────────┴──────────┘

The pl.col("category") creates a column expression, and == "Tools" creates a boolean expression. Polars evaluates this expression for every row and keeps only those where it’s true.

Comparison Operators

Polars supports all standard comparison operators, and they work exactly as you’d expect on numeric and string data.

# Filter rows where price > 100 and quantity < 50
expensive_low_stock = df.filter(
    (pl.col("price") > 100) & (pl.col("quantity") < 50)
)
print(expensive_low_stock)

Output:

shape: (2, 4)
┌─────────┬────────┬──────────┬─────────────┐
│ product ┆ price  ┆ quantity ┆ category    │
│ ---     ┆ ---    ┆ ---      ┆ ---         │
│ str     ┆ f64    ┆ i64      ┆ str         │
╞═════════╪════════╪══════════╪═════════════╡
│ Gadget  ┆ 149.99 ┆ 25       ┆ Electronics │
│ Gizmo   ┆ 299.99 ┆ 10       ┆ Electronics │
└─────────┴────────┴──────────┴─────────────┘

Here’s the full list of comparison operators:

  • == equal to
  • != not equal to
  • > greater than
  • < less than
  • >= greater than or equal to
  • <= less than or equal to

String comparisons work lexicographically, which is useful for alphabetical sorting but can surprise you with mixed-case data.

Combining Conditions with Logical Operators

Complex filters require combining multiple conditions. Polars uses & for AND, | for OR, and ~ for NOT. Here’s the critical rule: always wrap individual conditions in parentheses.

# Find products that are either cheap OR (expensive AND low stock)
# Without parentheses, this would fail or produce wrong results
filtered = df.filter(
    (pl.col("price") < 30) | 
    ((pl.col("price") > 200) & (pl.col("quantity") < 20))
)
print(filtered)

Output:

shape: (3, 4)
┌──────────┬────────┬──────────┬─────────────┐
│ product  ┆ price  ┆ quantity ┆ category    │
│ ---      ┆ ---    ┆ ---      ┆ ---         │
│ str      ┆ f64    ┆ i64      ┆ str         │
╞══════════╪════════╪══════════╪═════════════╡
│ Widget   ┆ 25.99  ┆ 100      ┆ Tools       │
│ Sprocket ┆ 8.5    ┆ 500      ┆ Tools       │
│ Gizmo    ┆ 299.99 ┆ 10       ┆ Electronics │
└──────────┴────────┴──────────┴─────────────┘

The ~ operator negates a condition:

# Find all products NOT in the Tools category
non_tools = df.filter(~(pl.col("category") == "Tools"))
# Equivalent to: df.filter(pl.col("category") != "Tools")

Python’s operator precedence means & and | bind tighter than comparison operators. Without parentheses, pl.col("price") > 100 & pl.col("quantity") < 50 is parsed as the chained comparison pl.col("price") > (100 & pl.col("quantity")) < 50. A chained comparison needs a plain True/False for its implicit and, which a Polars expression can’t provide, so this raises a TypeError instead of filtering.

String Filtering Methods

Text data requires specialized filtering methods. Polars provides a rich str namespace for string operations.

# Create a DataFrame with names
people = pl.DataFrame({
    "name": ["John Smith", "Jane Doe", "Bob Smithson", "Alice SMITH", "Charlie Brown"],
    "age": [30, 25, 45, 35, 28],
    "department": ["Engineering", "Sales", "Engineering", "Marketing", "Sales"]
})

# Filter rows where name contains "Smith" (case-insensitive)
smiths = people.filter(
    pl.col("name").str.to_lowercase().str.contains("smith")
)
print(smiths)

Output:

shape: (3, 3)
┌──────────────┬─────┬─────────────┐
│ name         ┆ age ┆ department  │
│ ---          ┆ --- ┆ ---         │
│ str          ┆ i64 ┆ str         │
╞══════════════╪═════╪═════════════╡
│ John Smith   ┆ 30  ┆ Engineering │
│ Bob Smithson ┆ 45  ┆ Engineering │
│ Alice SMITH  ┆ 35  ┆ Marketing   │
└──────────────┴─────┴─────────────┘

Other useful string methods:

# Starts with a prefix
df.filter(pl.col("name").str.starts_with("John"))

# Ends with a suffix
df.filter(pl.col("name").str.ends_with("son"))

# Regex matching for complex patterns
df.filter(pl.col("name").str.contains(r"^[A-Z][a-z]+\s[A-Z][a-z]+$"))

The str.contains() method accepts regular expressions by default. If you’re matching literal strings with special regex characters, pass literal=True to avoid escaping issues.

Filtering with is_in() and Null Handling

Membership testing with is_in() replaces verbose chains of OR conditions. Null handling requires explicit methods since null comparisons don’t behave intuitively.

# Filter rows where category is in a specific list
target_categories = ["Electronics", "Tools"]
filtered = df.filter(pl.col("category").is_in(target_categories))
print(filtered)

For null handling, Polars provides is_null() and is_not_null():

# Create DataFrame with nulls
df_with_nulls = pl.DataFrame({
    "product": ["Widget", "Gadget", None, "Gizmo", "Doohickey"],
    "price": [25.99, None, 8.50, 299.99, None],
    "category": ["Tools", "Electronics", "Tools", None, "Tools"]
})

# Keep only rows where price is not null
valid_prices = df_with_nulls.filter(pl.col("price").is_not_null())

# Drop rows where ANY column has a null
complete_rows = df_with_nulls.filter(
    pl.all_horizontal(pl.all().is_not_null())
)

# Keep rows where at least one specified column has a value
has_some_data = df_with_nulls.filter(
    pl.col("price").is_not_null() | pl.col("category").is_not_null()
)

A common gotcha: pl.col("price") != None doesn’t work as expected. Null comparisons always return null in Polars (and SQL), so you must use is_null() and is_not_null() explicitly.

Performance Tips

Polars really shines when you use lazy evaluation. Instead of executing each operation immediately, lazy mode builds a query plan that Polars optimizes before execution.

# Lazy filtering chain with multiple conditions
result = (
    pl.scan_csv("large_sales_data.csv")  # Returns LazyFrame
    .filter(pl.col("year") >= 2020)
    .filter(pl.col("region").is_in(["North", "South"]))
    .filter(pl.col("revenue") > 10000)
    .filter(~pl.col("status").str.contains("cancelled"))
    .select(["date", "region", "revenue", "product"])
    .collect()  # Execute the optimized query
)

What happens under the hood:

  1. Polars combines all four filter() calls into a single pass over the data
  2. Predicate pushdown moves filters as early as possible in the query plan
  3. Projection pushdown reads only the columns you actually need from the CSV
  4. Parallel execution processes chunks of data across all CPU cores

For an eager DataFrame, convert to lazy mode with lazy():

# Convert existing DataFrame to lazy, filter, then collect
result = (
    df.lazy()
    .filter((pl.col("price") > 50) & (pl.col("quantity") > 20))
    .filter(pl.col("category") == "Electronics")
    .collect()
)

Benchmark comparisons commonly show Polars filtering several times to tens of times faster than pandas on datasets over 100K rows, with exact numbers depending on hardware and workload. On datasets with millions of rows, the gap tends to widen because Polars’ parallel execution scales with CPU cores while pandas remains largely single-threaded.

One final tip: when filtering on multiple columns, put the most selective filter first. While Polars’ optimizer handles many cases automatically, helping it by ordering filters from most to least selective can improve performance on very large datasets.

Wrapping Up

Row filtering in Polars follows a consistent pattern: use filter() with expressions built from pl.col(). Master the comparison operators, remember to parenthesize compound conditions, and leverage lazy evaluation for performance. The expression-based API might feel verbose compared to pandas’ bracket indexing, but it enables optimizations that make Polars dramatically faster on real-world data.
