Pandas - Select Rows by Label (loc)
• The `loc` indexer selects rows and columns by label-based indexing, making it essential for working with labeled data in pandas DataFrames where you need explicit, readable selections based on...
Key Insights
• The loc indexer selects rows and columns by label-based indexing, making it essential for working with labeled data in pandas DataFrames where you need explicit, readable selections based on index names or column labels.
• Unlike iloc which uses integer positions, loc is inclusive on both endpoints when slicing, meaning df.loc[1:3] includes rows with labels 1, 2, AND 3, which differs from standard Python slicing behavior.
• loc accepts boolean arrays, callable functions, and various combinations of row/column specifications, enabling complex filtering operations while maintaining code clarity and avoiding chained indexing warnings.
Basic Row Selection with loc
The loc indexer retrieves rows based on their index labels. For DataFrames with default integer indices, this appears similar to positional indexing, but the behavior differs significantly.
import pandas as pd
df = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
'price': [1200, 25, 75, 350, 80],
'stock': [15, 150, 45, 30, 60]
})
# Select single row by label (returns Series)
row = df.loc[2]
print(row)
# product Keyboard
# price 75
# stock 45
# Select multiple rows by label list (returns DataFrame)
rows = df.loc[[0, 2, 4]]
print(rows)
# product price stock
# 0 Laptop 1200 15
# 2 Keyboard 75 45
# 4 Webcam 80 60
Working with Custom Index Labels
The power of loc becomes evident when working with non-integer indices where label-based selection provides semantic meaning.
df_indexed = pd.DataFrame({
'sales': [45000, 52000, 48000, 61000, 58000],
'expenses': [32000, 35000, 33000, 40000, 38000],
'profit': [13000, 17000, 15000, 21000, 20000]
}, index=['Q1-2023', 'Q2-2023', 'Q3-2023', 'Q4-2023', 'Q1-2024'])
# Select specific quarter
q4_data = df_indexed.loc['Q4-2023']
print(q4_data)
# sales 61000
# expenses 40000
# profit 21000
# Select multiple quarters
quarters = df_indexed.loc[['Q1-2023', 'Q4-2023', 'Q1-2024']]
print(quarters)
# sales expenses profit
# Q1-2023 45000 32000 13000
# Q4-2023 61000 40000 21000
# Q1-2024 58000 38000 20000
Slicing Rows with loc
Slicing with loc is inclusive on both ends, unlike standard Python slicing. This is critical to understand when working with labeled data.
# Inclusive slicing with loc
slice_result = df_indexed.loc['Q2-2023':'Q4-2023']
print(slice_result)
# sales expenses profit
# Q2-2023 52000 35000 17000
# Q3-2023 48000 33000 15000
# Q4-2023 61000 40000 21000 # Q4-2023 IS included
# With integer index, still inclusive
df_int = pd.DataFrame({'value': [10, 20, 30, 40, 50]})
print(df_int.loc[1:3])
# value
# 1 20
# 2 30
# 3 40 # Index 3 IS included
Boolean Indexing with loc
loc excels at filtering rows based on conditions. This approach is more explicit and avoids SettingWithCopyWarning issues.
inventory = pd.DataFrame({
'item': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headset'],
'price': [1200, 25, 75, 350, 80, 120],
'stock': [5, 150, 45, 3, 60, 25],
'category': ['Electronics', 'Accessories', 'Accessories',
'Electronics', 'Electronics', 'Accessories']
})
# Single condition
low_stock = inventory.loc[inventory['stock'] < 10]
print(low_stock)
# item price stock category
# 0 Laptop 1200 5 Electronics
# 3 Monitor 350 3 Electronics
# Multiple conditions with & (and) or | (or)
expensive_electronics = inventory.loc[
(inventory['price'] > 100) & (inventory['category'] == 'Electronics')
]
print(expensive_electronics)
# item price stock category
# 0 Laptop 1200 5 Electronics
# 3 Monitor 350 3 Electronics
# Using isin() for multiple values
accessories = inventory.loc[inventory['item'].isin(['Mouse', 'Keyboard', 'Headset'])]
print(accessories)
# item price stock category
# 1 Mouse 25 150 Accessories
# 2 Keyboard 75 45 Accessories
# 5 Headset 120 25 Accessories
Selecting Rows and Columns Simultaneously
loc accepts both row and column specifications, separated by a comma. This enables precise data extraction.
sales_data = pd.DataFrame({
'region': ['North', 'South', 'East', 'West', 'Central'],
'q1': [45000, 52000, 48000, 61000, 58000],
'q2': [47000, 54000, 51000, 63000, 60000],
'q3': [50000, 56000, 49000, 65000, 62000],
'q4': [53000, 59000, 52000, 68000, 65000]
}, index=['North', 'South', 'East', 'West', 'Central'])
# Specific rows and columns
result = sales_data.loc[['North', 'West'], ['q1', 'q4']]
print(result)
# q1 q4
# North 45000 53000
# West 61000 68000
# Row slice with specific columns
result = sales_data.loc['South':'West', ['region', 'q2', 'q3']]
print(result)
# region q2 q3
# South South 54000 56000
# East East 51000 49000
# West West 63000 65000
# All rows, specific columns
q1_q2 = sales_data.loc[:, ['q1', 'q2']]
print(q1_q2)
# q1 q2
# North 45000 47000
# South 52000 54000
# East 48000 51000
# West 61000 63000
# Central 58000 60000
Using Callable Functions with loc
loc accepts callable functions that receive the DataFrame and return valid indexing objects, enabling dynamic selection logic.
products = pd.DataFrame({
'name': ['Widget A', 'Widget B', 'Gadget X', 'Gadget Y', 'Tool Z'],
'price': [15.99, 25.50, 45.00, 38.75, 52.00],
'rating': [4.2, 4.8, 3.9, 4.5, 4.7],
'reviews': [120, 340, 89, 210, 156]
})
# Callable returning boolean array
highly_rated = products.loc[lambda df: df['rating'] >= 4.5]
print(highly_rated)
# name price rating reviews
# 1 Widget B 25.5 4.8 340
# 3 Gadget Y 38.75 4.5 210
# 4 Tool Z 52.00 4.7 156
# Callable with complex logic
popular_affordable = products.loc[
lambda df: (df['reviews'] > 150) & (df['price'] < 50)
]
print(popular_affordable)
# name price rating reviews
# 1 Widget B 25.5 4.8 340
# 3 Gadget Y 38.75 4.5 210
Modifying Data with loc
Use loc to modify values in-place, avoiding chained assignment issues that trigger SettingWithCopyWarning.
inventory = pd.DataFrame({
'product': ['A', 'B', 'C', 'D', 'E'],
'stock': [10, 25, 5, 30, 15],
'price': [100, 150, 200, 120, 180]
})
# Modify single value
inventory.loc[2, 'stock'] = 20
print(inventory.loc[2])
# product C
# stock 20
# price 200
# Modify multiple values with boolean condition
inventory.loc[inventory['stock'] < 15, 'stock'] = 15
print(inventory)
# product stock price
# 0 A 15 100
# 1 B 25 150
# 2 C 20 200
# 3 D 30 120
# 4 E 15 180
# Modify entire column for specific rows
inventory.loc[inventory['price'] > 150, 'price'] *= 0.9 # 10% discount
print(inventory)
# product stock price
# 0 A 15 100.0
# 1 B 25 150.0
# 2 C 20 180.0
# 3 D 30 120.0
# 4 E 15 162.0
Working with MultiIndex
loc handles hierarchical indices using tuples or slices, providing precise access to multi-level indexed data.
arrays = [
['Store1', 'Store1', 'Store2', 'Store2'],
['Electronics', 'Furniture', 'Electronics', 'Furniture']
]
multi_idx = pd.MultiIndex.from_arrays(arrays, names=['store', 'department'])
sales = pd.DataFrame({
'revenue': [50000, 30000, 45000, 28000],
'units': [250, 80, 220, 75]
}, index=multi_idx)
# Select by tuple
store1_electronics = sales.loc[('Store1', 'Electronics')]
print(store1_electronics)
# revenue 50000
# units 250
# Select all departments in Store2
store2_all = sales.loc['Store2']
print(store2_all)
# revenue units
# department
# Electronics 45000 220
# Furniture 28000 75
# Cross-section with xs() or tuple slicing
electronics_all = sales.loc[(slice(None), 'Electronics'), :]
print(electronics_all)
# revenue units
# store department
# Store1 Electronics 50000 250
# Store2 Electronics 45000 220
The loc indexer provides explicit, label-based data access that makes code more readable and maintainable. Its inclusive slicing behavior, boolean indexing capabilities, and support for simultaneous row-column selection make it the preferred method for most DataFrame selection operations where you’re working with meaningful labels rather than positional indices.