Pandas - Select Rows Between Two Values
• Use boolean indexing with comparison operators to filter DataFrame rows between two values, combining conditions with the `&` operator for precise range selection
Key Insights
• Use boolean indexing with comparison operators to filter DataFrame rows between two values, combining conditions with the & operator for precise range selection
• The between() method provides cleaner syntax for inclusive range filtering and handles edge cases like missing values more gracefully than manual comparison operators
• Query-based selection using query() offers readable SQL-like syntax for complex filtering operations, especially when working with multiple conditions or column name references
Boolean Indexing with Comparison Operators
The most fundamental approach to selecting rows between two values uses boolean indexing with comparison operators. This method gives you complete control over whether endpoints are inclusive or exclusive.
import pandas as pd
import numpy as np
# Create sample DataFrame
df = pd.DataFrame({
'product': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'price': [10, 25, 30, 45, 50, 65, 80],
'quantity': [100, 150, 200, 120, 180, 90, 110]
})
# Select rows where price is between 25 and 65 (inclusive)
filtered = df[(df['price'] >= 25) & (df['price'] <= 65)]
print(filtered)
product price quantity
1 B 25 150
2 C 30 200
3 D 45 120
4 E 50 180
5 F 65 90
The key requirement is wrapping each condition in parentheses and using & for AND operations. The bitwise operator & has higher precedence than comparison operators, making parentheses mandatory.
For exclusive ranges, adjust the operators:
# Exclusive bounds (25 < price < 65)
filtered_exclusive = df[(df['price'] > 25) & (df['price'] < 65)]
print(filtered_exclusive)
product price quantity
2 C 30 200
3 D 45 120
4 E 50 180
Using the between() Method
The between() method simplifies range filtering with cleaner syntax. By default, it performs inclusive comparisons on both endpoints.
# Inclusive range using between()
filtered_between = df[df['price'].between(25, 65)]
print(filtered_between)
The between() method accepts an inclusive parameter that controls endpoint behavior:
# Exclusive on both ends
filtered_exclusive = df[df['price'].between(25, 65, inclusive='neither')]
# Inclusive on left, exclusive on right
filtered_left = df[df['price'].between(25, 65, inclusive='left')]
# Inclusive on right, exclusive on left
filtered_right = df[df['price'].between(25, 65, inclusive='right')]
print("Neither inclusive:")
print(filtered_exclusive)
print("\nLeft inclusive only:")
print(filtered_left)
Neither inclusive:
product price quantity
2 C 30 200
3 D 45 120
4 E 50 180
Left inclusive only:
product price quantity
1 B 25 150
2 C 30 200
3 D 45 120
4 E 50 180
The between() method handles NaN values intelligently by excluding them from results by default:
df_with_nan = df.copy()
df_with_nan.loc[3, 'price'] = np.nan
filtered = df_with_nan[df_with_nan['price'].between(25, 65)]
print(filtered)
# NaN row is automatically excluded
Query Method for SQL-Like Syntax
The query() method provides readable, SQL-inspired syntax particularly useful for complex conditions or when column names would require awkward bracket notation.
# Basic query syntax
filtered_query = df.query('price >= 25 and price <= 65')
print(filtered_query)
# Equivalent to between()
filtered_query_between = df.query('25 <= price <= 65')
print(filtered_query_between)
The query() method shines when combining multiple conditions:
# Multiple conditions
result = df.query('25 <= price <= 65 and quantity > 100')
print(result)
product price quantity
1 B 25 150
2 C 30 200
3 D 45 120
4 E 50 180
Use variables in queries with the @ symbol:
min_price = 25
max_price = 65
min_quantity = 100
result = df.query('price >= @min_price and price <= @max_price and quantity > @min_quantity')
print(result)
Filtering with DateTime Ranges
Selecting rows between date values is a common requirement. Pandas handles datetime comparisons seamlessly with all three methods.
# Create DataFrame with datetime index
date_range = pd.date_range('2024-01-01', periods=10, freq='D')
df_dates = pd.DataFrame({
'date': date_range,
'sales': [100, 150, 200, 180, 220, 190, 210, 230, 195, 205]
})
# Boolean indexing with datetime
start_date = pd.Timestamp('2024-01-03')
end_date = pd.Timestamp('2024-01-07')
filtered_dates = df_dates[(df_dates['date'] >= start_date) & (df_dates['date'] <= end_date)]
print(filtered_dates)
date sales
2 2024-01-03 200
3 2024-01-04 180
4 2024-01-05 220
5 2024-01-06 190
6 2024-01-07 210
Using between() with datetime objects:
filtered_between_dates = df_dates[df_dates['date'].between('2024-01-03', '2024-01-07')]
print(filtered_between_dates)
For datetime strings, Pandas automatically converts them:
filtered_string_dates = df_dates[
df_dates['date'].between('2024-01-03', '2024-01-07')
]
Multiple Column Filtering
Real-world scenarios often require filtering across multiple columns simultaneously.
# Create multi-dimensional dataset
df_multi = pd.DataFrame({
'product': ['A', 'B', 'C', 'D', 'E', 'F'],
'price': [10, 25, 30, 45, 50, 65],
'quantity': [100, 150, 200, 120, 180, 90],
'rating': [4.2, 4.5, 3.8, 4.7, 4.1, 4.9]
})
# Filter on multiple ranges
filtered_multi = df_multi[
(df_multi['price'].between(25, 65)) &
(df_multi['quantity'].between(100, 180)) &
(df_multi['rating'] >= 4.0)
]
print(filtered_multi)
product price quantity rating
1 B 25 150 4.5
3 D 45 120 4.7
4 E 50 180 4.1
Using query() for multiple ranges:
result = df_multi.query('25 <= price <= 65 and 100 <= quantity <= 180 and rating >= 4.0')
print(result)
Performance Considerations
For large DataFrames, method choice impacts performance. Boolean indexing typically performs fastest for simple conditions:
import time
# Create large DataFrame
large_df = pd.DataFrame({
'value': np.random.randint(0, 1000, 1000000)
})
# Benchmark boolean indexing
start = time.time()
result1 = large_df[(large_df['value'] >= 250) & (large_df['value'] <= 750)]
time1 = time.time() - start
# Benchmark between()
start = time.time()
result2 = large_df[large_df['value'].between(250, 750)]
time2 = time.time() - start
# Benchmark query()
start = time.time()
result3 = large_df.query('250 <= value <= 750')
time3 = time.time() - start
print(f"Boolean indexing: {time1:.4f}s")
print(f"between(): {time2:.4f}s")
print(f"query(): {time3:.4f}s")
Boolean indexing and between() typically perform similarly, while query() may be slightly slower due to string parsing overhead. However, query() can be faster with numexpr engine for complex expressions.
Handling Edge Cases
Always consider NaN values, empty results, and type mismatches:
df_edge = pd.DataFrame({
'value': [10, 25, np.nan, 45, None, 65]
})
# between() excludes NaN by default
result = df_edge[df_edge['value'].between(20, 50)]
print(f"Rows with values between 20-50: {len(result)}") # 2 rows
# Boolean indexing with NaN handling
result_bool = df_edge[(df_edge['value'] >= 20) & (df_edge['value'] <= 50)]
print(f"Boolean result: {len(result_bool)}") # 2 rows
# Check for empty results
if result.empty:
print("No matching rows found")
else:
print(f"Found {len(result)} matching rows")
Type consistency matters for accurate comparisons:
df_types = pd.DataFrame({
'value': ['10', '25', '30', '45'] # String values
})
# This won't work as expected
# filtered = df_types[df_types['value'].between(20, 40)]
# Convert to numeric first
df_types['value'] = pd.to_numeric(df_types['value'])
filtered = df_types[df_types['value'].between(20, 40)]
print(filtered)
Choose boolean indexing for maximum control and performance, between() for cleaner code with standard inclusive ranges, and query() for complex multi-condition filters requiring readable syntax.