Pandas - Select Rows with Multiple Conditions (AND/OR)
The most common approach uses bitwise operators: `&` (AND), `|` (OR), and `~` (NOT). Each condition must be wrapped in parentheses due to Python's operator precedence.
Key Insights
- Pandas provides multiple syntaxes for filtering rows with compound conditions: bitwise operators (
&,|,~),query()method, andloc[]with callable functions—each with distinct performance characteristics and readability trade-offs - Bitwise operators require parentheses around each condition due to operator precedence, while
query()allows natural Python-like expressions without parentheses but with potential variable scope complications - For complex multi-condition filtering, combining
isin(),between(), and boolean indexing often outperforms chained comparisons, especially on large datasets
Bitwise Operators for AND/OR Conditions
The most common approach uses bitwise operators: & (AND), | (OR), and ~ (NOT). Each condition must be wrapped in parentheses due to Python’s operator precedence.
import pandas as pd
import numpy as np
# Sample dataset
df = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
'price': [1200, 25, 75, 350, 80],
'stock': [5, 150, 45, 12, 8],
'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Accessories']
})
# AND condition: Electronics AND price > 500
electronics_expensive = df[(df['category'] == 'Electronics') & (df['price'] > 500)]
print(electronics_expensive)
Output:
product price stock category
0 Laptop 1200 5 Electronics
For OR conditions:
# OR condition: price < 50 OR stock > 100
cheap_or_abundant = df[(df['price'] < 50) | (df['stock'] > 100)]
print(cheap_or_abundant)
Output:
product price stock category
1 Mouse 25 150 Accessories
Combine multiple conditions with grouping:
# Complex: (Electronics AND price > 300) OR (Accessories AND stock < 10)
complex_filter = df[
((df['category'] == 'Electronics') & (df['price'] > 300)) |
((df['category'] == 'Accessories') & (df['stock'] < 10))
]
print(complex_filter)
Output:
product price stock category
0 Laptop 1200 5 Electronics
3 Monitor 350 12 Electronics
4 Webcam 80 8 Accessories
Using query() Method
The query() method provides cleaner syntax for complex conditions, accepting string expressions similar to SQL WHERE clauses.
# AND condition with query()
electronics_expensive = df.query('category == "Electronics" and price > 500')
print(electronics_expensive)
# OR condition
cheap_or_abundant = df.query('price < 50 or stock > 100')
print(cheap_or_abundant)
# Complex nested conditions
complex_filter = df.query(
'(category == "Electronics" and price > 300) or '
'(category == "Accessories" and stock < 10)'
)
print(complex_filter)
Using variables with query():
min_price = 300
target_category = 'Electronics'
# Use @ to reference variables
result = df.query('category == @target_category and price > @min_price')
print(result)
The query() method is particularly readable for complex conditions but has limitations with column names containing spaces or special characters.
loc[] with Boolean Indexing
The loc[] accessor combined with boolean indexing provides explicit control over row selection.
# Using loc with multiple conditions
result = df.loc[
(df['category'] == 'Electronics') & (df['price'] > 500),
['product', 'price'] # Select specific columns
]
print(result)
Output:
product price
0 Laptop 1200
Using callable functions with loc[]:
# Callable approach for reusable filters
def expensive_electronics(df):
return (df['category'] == 'Electronics') & (df['price'] > 500)
result = df.loc[expensive_electronics]
print(result)
isin() for Multiple Values
When checking if a column value matches any value in a list, isin() is more efficient than chaining OR conditions.
# Instead of: (df['product'] == 'Laptop') | (df['product'] == 'Monitor') | ...
target_products = ['Laptop', 'Monitor', 'Webcam']
result = df[df['product'].isin(target_products)]
print(result)
Output:
product price stock category
0 Laptop 1200 5 Electronics
3 Monitor 350 12 Electronics
4 Webcam 80 8 Accessories
Combining isin() with other conditions:
# Products in list AND price > 100
target_products = ['Laptop', 'Monitor', 'Webcam']
result = df[df['product'].isin(target_products) & (df['price'] > 100)]
print(result)
Use ~ for negation (NOT IN):
# Products NOT in the list
excluded_products = ['Mouse', 'Keyboard']
result = df[~df['product'].isin(excluded_products)]
print(result)
between() for Range Conditions
The between() method simplifies range checks, replacing two comparison operations.
# Price between 50 and 500 (inclusive by default)
mid_range = df[df['price'].between(50, 500)]
print(mid_range)
Output:
product price stock category
2 Keyboard 75 45 Accessories
3 Monitor 350 12 Electronics
4 Webcam 80 8 Accessories
Exclusive boundaries:
# Exclusive boundaries
result = df[df['price'].between(50, 500, inclusive='neither')]
print(result)
Combining between() with other conditions:
# Price between 50-500 AND Accessories category
result = df[
df['price'].between(50, 500) &
(df['category'] == 'Accessories')
]
print(result)
String Matching with Multiple Conditions
For string columns, combine pattern matching with other conditions.
# Create sample data with text
df_text = pd.DataFrame({
'product': ['Laptop Pro', 'Gaming Mouse', 'Mechanical Keyboard', 'LED Monitor', 'HD Webcam'],
'price': [1200, 45, 120, 350, 80],
'brand': ['Apple', 'Logitech', 'Corsair', 'Dell', 'Logitech']
})
# Products containing 'o' AND price > 100
result = df_text[
df_text['product'].str.contains('o', case=False) &
(df_text['price'] > 100)
]
print(result)
Output:
product price brand
0 Laptop Pro 1200 Apple
2 Mechanical Keyboard 120 Corsair
3 LED Monitor 350 Dell
Multiple string conditions:
# Brand is Logitech OR product starts with 'LED'
result = df_text[
(df_text['brand'] == 'Logitech') |
df_text['product'].str.startswith('LED')
]
print(result)
Performance Considerations
For large datasets, method choice impacts performance significantly.
import time
# Create large dataset
large_df = pd.DataFrame({
'A': np.random.randint(0, 100, 1000000),
'B': np.random.randint(0, 100, 1000000),
'C': np.random.choice(['X', 'Y', 'Z'], 1000000)
})
# Benchmark bitwise operators
start = time.time()
result1 = large_df[(large_df['A'] > 50) & (large_df['B'] < 30)]
time1 = time.time() - start
# Benchmark query()
start = time.time()
result2 = large_df.query('A > 50 and B < 30')
time2 = time.time() - start
print(f"Bitwise: {time1:.4f}s")
print(f"Query: {time2:.4f}s")
Generally, bitwise operators perform slightly faster for simple conditions, while query() can optimize complex expressions. For repeated filtering operations, pre-compute boolean masks:
# Pre-compute masks for reuse
mask_a = large_df['A'] > 50
mask_b = large_df['B'] < 30
mask_c = large_df['C'] == 'X'
# Combine masks efficiently
result = large_df[mask_a & mask_b & mask_c]
Handling Missing Values in Conditions
Missing values require explicit handling in multi-condition filters.
df_missing = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'price': [1200, np.nan, 75, 350],
'stock': [5, 150, np.nan, 12]
})
# Exclude NaN values explicitly
result = df_missing[
(df_missing['price'] > 100) &
(df_missing['price'].notna())
]
print(result)
For inclusive NaN handling:
# Include rows where price > 100 OR price is NaN
result = df_missing[
(df_missing['price'] > 100) |
(df_missing['price'].isna())
]
print(result)
Choose your filtering approach based on expression complexity, performance requirements, and team coding standards. Bitwise operators offer the best performance for simple conditions, query() excels at readability for complex logic, and specialized methods like isin() and between() optimize specific use cases.