Pandas - Select Rows Between Two Values

Key Insights

• Use boolean indexing with comparison operators to filter DataFrame rows between two values, combining conditions with the & operator for precise range selection • The between() method provides cleaner syntax for inclusive range filtering and handles edge cases like missing values more gracefully than manual comparison operators • Query-based selection using query() offers readable SQL-like syntax for complex filtering operations, especially when working with multiple conditions or column name references

Boolean Indexing with Comparison Operators

The most fundamental approach to selecting rows between two values uses boolean indexing with comparison operators. This method gives you complete control over whether endpoints are inclusive or exclusive.

import pandas as pd
import numpy as np

# Create sample DataFrame
df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'price': [10, 25, 30, 45, 50, 65, 80],
    'quantity': [100, 150, 200, 120, 180, 90, 110]
})

# Select rows where price is between 25 and 65 (inclusive)
filtered = df[(df['price'] >= 25) & (df['price'] <= 65)]
print(filtered)

  product  price  quantity
1       B     25       150
2       C     30       200
3       D     45       120
4       E     50       180
5       F     65        90

The key requirement is wrapping each condition in parentheses and using & for AND operations. The bitwise operator & has higher precedence than comparison operators, making parentheses mandatory.

For exclusive ranges, adjust the operators:

# Exclusive bounds (25 < price < 65)
filtered_exclusive = df[(df['price'] > 25) & (df['price'] < 65)]
print(filtered_exclusive)

  product  price  quantity
2       C     30       200
3       D     45       120
4       E     50       180

Using the between() Method

The between() method simplifies range filtering with cleaner syntax. By default, it performs inclusive comparisons on both endpoints.

# Inclusive range using between()
filtered_between = df[df['price'].between(25, 65)]
print(filtered_between)

The between() method accepts an inclusive parameter that controls endpoint behavior:

# Exclusive on both ends
filtered_exclusive = df[df['price'].between(25, 65, inclusive='neither')]

# Inclusive on left, exclusive on right
filtered_left = df[df['price'].between(25, 65, inclusive='left')]

# Inclusive on right, exclusive on left
filtered_right = df[df['price'].between(25, 65, inclusive='right')]

print("Neither inclusive:")
print(filtered_exclusive)
print("\nLeft inclusive only:")
print(filtered_left)

Neither inclusive:
  product  price  quantity
2       C     30       200
3       D     45       120
4       E     50       180

Left inclusive only:
  product  price  quantity
1       B     25       150
2       C     30       200
3       D     45       120
4       E     50       180

The between() method handles NaN values intelligently by excluding them from results by default:

df_with_nan = df.copy()
df_with_nan.loc[3, 'price'] = np.nan

filtered = df_with_nan[df_with_nan['price'].between(25, 65)]
print(filtered)
# NaN row is automatically excluded

Query Method for SQL-Like Syntax

The query() method provides readable, SQL-inspired syntax particularly useful for complex conditions or when column names would require awkward bracket notation.

# Basic query syntax
filtered_query = df.query('price >= 25 and price <= 65')
print(filtered_query)

# Equivalent to between()
filtered_query_between = df.query('25 <= price <= 65')
print(filtered_query_between)

The query() method shines when combining multiple conditions:

# Multiple conditions
result = df.query('25 <= price <= 65 and quantity > 100')
print(result)

  product  price  quantity
1       B     25       150
2       C     30       200
3       D     45       120
4       E     50       180

Use variables in queries with the @ symbol:

min_price = 25
max_price = 65
min_quantity = 100

result = df.query('price >= @min_price and price <= @max_price and quantity > @min_quantity')
print(result)

Filtering with DateTime Ranges

Selecting rows between date values is a common requirement. Pandas handles datetime comparisons seamlessly with all three methods.

# Create DataFrame with datetime index
date_range = pd.date_range('2024-01-01', periods=10, freq='D')
df_dates = pd.DataFrame({
    'date': date_range,
    'sales': [100, 150, 200, 180, 220, 190, 210, 230, 195, 205]
})

# Boolean indexing with datetime
start_date = pd.Timestamp('2024-01-03')
end_date = pd.Timestamp('2024-01-07')

filtered_dates = df_dates[(df_dates['date'] >= start_date) & (df_dates['date'] <= end_date)]
print(filtered_dates)

        date  sales
2 2024-01-03    200
3 2024-01-04    180
4 2024-01-05    220
5 2024-01-06    190
6 2024-01-07    210

Using between() with datetime objects:

filtered_between_dates = df_dates[df_dates['date'].between('2024-01-03', '2024-01-07')]
print(filtered_between_dates)

For datetime strings, Pandas automatically converts them:

filtered_string_dates = df_dates[
    df_dates['date'].between('2024-01-03', '2024-01-07')
]

Multiple Column Filtering

Real-world scenarios often require filtering across multiple columns simultaneously.

# Create multi-dimensional dataset
df_multi = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E', 'F'],
    'price': [10, 25, 30, 45, 50, 65],
    'quantity': [100, 150, 200, 120, 180, 90],
    'rating': [4.2, 4.5, 3.8, 4.7, 4.1, 4.9]
})

# Filter on multiple ranges
filtered_multi = df_multi[
    (df_multi['price'].between(25, 65)) & 
    (df_multi['quantity'].between(100, 180)) &
    (df_multi['rating'] >= 4.0)
]
print(filtered_multi)

  product  price  quantity  rating
1       B     25       150     4.5
3       D     45       120     4.7
4       E     50       180     4.1

Using query() for multiple ranges:

result = df_multi.query('25 <= price <= 65 and 100 <= quantity <= 180 and rating >= 4.0')
print(result)

Performance Considerations

For large DataFrames, method choice impacts performance. Boolean indexing typically performs fastest for simple conditions:

import time

# Create large DataFrame
large_df = pd.DataFrame({
    'value': np.random.randint(0, 1000, 1000000)
})

# Benchmark boolean indexing
start = time.time()
result1 = large_df[(large_df['value'] >= 250) & (large_df['value'] <= 750)]
time1 = time.time() - start

# Benchmark between()
start = time.time()
result2 = large_df[large_df['value'].between(250, 750)]
time2 = time.time() - start

# Benchmark query()
start = time.time()
result3 = large_df.query('250 <= value <= 750')
time3 = time.time() - start

print(f"Boolean indexing: {time1:.4f}s")
print(f"between(): {time2:.4f}s")
print(f"query(): {time3:.4f}s")

Boolean indexing and between() typically perform similarly, while query() may be slightly slower due to string parsing overhead. However, query() can be faster with numexpr engine for complex expressions.

Handling Edge Cases

Always consider NaN values, empty results, and type mismatches:

df_edge = pd.DataFrame({
    'value': [10, 25, np.nan, 45, None, 65]
})

# between() excludes NaN by default
result = df_edge[df_edge['value'].between(20, 50)]
print(f"Rows with values between 20-50: {len(result)}")  # 2 rows

# Boolean indexing with NaN handling
result_bool = df_edge[(df_edge['value'] >= 20) & (df_edge['value'] <= 50)]
print(f"Boolean result: {len(result_bool)}")  # 2 rows

# Check for empty results
if result.empty:
    print("No matching rows found")
else:
    print(f"Found {len(result)} matching rows")

Type consistency matters for accurate comparisons:

df_types = pd.DataFrame({
    'value': ['10', '25', '30', '45']  # String values
})

# This won't work as expected
# filtered = df_types[df_types['value'].between(20, 40)]

# Convert to numeric first
df_types['value'] = pd.to_numeric(df_types['value'])
filtered = df_types[df_types['value'].between(20, 40)]
print(filtered)

Choose boolean indexing for maximum control and performance, between() for cleaner code with standard inclusive ranges, and query() for complex multi-condition filters requiring readable syntax.