Pandas - Select Rows by Label (loc)

• The `loc` indexer selects rows and columns by label-based indexing, making it essential for working with labeled data in pandas DataFrames where you need explicit, readable selections based on...

Key Insights

• The loc indexer selects rows and columns by label-based indexing, making it essential for working with labeled data in pandas DataFrames where you need explicit, readable selections based on index names or column labels. • Unlike iloc which uses integer positions, loc is inclusive on both endpoints when slicing, meaning df.loc[1:3] includes rows with labels 1, 2, AND 3, which differs from standard Python slicing behavior. • loc accepts boolean arrays, callable functions, and various combinations of row/column specifications, enabling complex filtering operations while maintaining code clarity and avoiding chained indexing warnings.

Basic Row Selection with loc

The loc indexer retrieves rows based on their index labels. For DataFrames with default integer indices, this appears similar to positional indexing, but the behavior differs significantly.

import pandas as pd

df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'price': [1200, 25, 75, 350, 80],
    'stock': [15, 150, 45, 30, 60]
})

# Select single row by label (returns Series)
row = df.loc[2]
print(row)
# product     Keyboard
# price             75
# stock             45

# Select multiple rows by label list (returns DataFrame)
rows = df.loc[[0, 2, 4]]
print(rows)
#     product  price  stock
# 0    Laptop   1200     15
# 2  Keyboard     75     45
# 4    Webcam     80     60

Working with Custom Index Labels

The power of loc becomes evident when working with non-integer indices where label-based selection provides semantic meaning.

df_indexed = pd.DataFrame({
    'sales': [45000, 52000, 48000, 61000, 58000],
    'expenses': [32000, 35000, 33000, 40000, 38000],
    'profit': [13000, 17000, 15000, 21000, 20000]
}, index=['Q1-2023', 'Q2-2023', 'Q3-2023', 'Q4-2023', 'Q1-2024'])

# Select specific quarter
q4_data = df_indexed.loc['Q4-2023']
print(q4_data)
# sales       61000
# expenses    40000
# profit      21000

# Select multiple quarters
quarters = df_indexed.loc[['Q1-2023', 'Q4-2023', 'Q1-2024']]
print(quarters)
#          sales  expenses  profit
# Q1-2023  45000     32000   13000
# Q4-2023  61000     40000   21000
# Q1-2024  58000     38000   20000

Slicing Rows with loc

Slicing with loc is inclusive on both ends, unlike standard Python slicing. This is critical to understand when working with labeled data.

# Inclusive slicing with loc
slice_result = df_indexed.loc['Q2-2023':'Q4-2023']
print(slice_result)
#          sales  expenses  profit
# Q2-2023  52000     35000   17000
# Q3-2023  48000     33000   15000
# Q4-2023  61000     40000   21000  # Q4-2023 IS included

# With integer index, still inclusive
df_int = pd.DataFrame({'value': [10, 20, 30, 40, 50]})
print(df_int.loc[1:3])
#    value
# 1     20
# 2     30
# 3     40  # Index 3 IS included

Boolean Indexing with loc

loc excels at filtering rows based on conditions. This approach is more explicit and avoids SettingWithCopyWarning issues.

inventory = pd.DataFrame({
    'item': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headset'],
    'price': [1200, 25, 75, 350, 80, 120],
    'stock': [5, 150, 45, 3, 60, 25],
    'category': ['Electronics', 'Accessories', 'Accessories', 
                 'Electronics', 'Electronics', 'Accessories']
})

# Single condition
low_stock = inventory.loc[inventory['stock'] < 10]
print(low_stock)
#       item  price  stock     category
# 0   Laptop   1200      5  Electronics
# 3  Monitor    350      3  Electronics

# Multiple conditions with & (and) or | (or)
expensive_electronics = inventory.loc[
    (inventory['price'] > 100) & (inventory['category'] == 'Electronics')
]
print(expensive_electronics)
#       item  price  stock     category
# 0   Laptop   1200      5  Electronics
# 3  Monitor    350      3  Electronics

# Using isin() for multiple values
accessories = inventory.loc[inventory['item'].isin(['Mouse', 'Keyboard', 'Headset'])]
print(accessories)
#        item  price  stock      category
# 1     Mouse     25    150  Accessories
# 2  Keyboard     75     45  Accessories
# 5   Headset    120     25  Accessories

Selecting Rows and Columns Simultaneously

loc accepts both row and column specifications, separated by a comma. This enables precise data extraction.

sales_data = pd.DataFrame({
    'region': ['North', 'South', 'East', 'West', 'Central'],
    'q1': [45000, 52000, 48000, 61000, 58000],
    'q2': [47000, 54000, 51000, 63000, 60000],
    'q3': [50000, 56000, 49000, 65000, 62000],
    'q4': [53000, 59000, 52000, 68000, 65000]
}, index=['North', 'South', 'East', 'West', 'Central'])

# Specific rows and columns
result = sales_data.loc[['North', 'West'], ['q1', 'q4']]
print(result)
#           q1     q4
# North  45000  53000
# West   61000  68000

# Row slice with specific columns
result = sales_data.loc['South':'West', ['region', 'q2', 'q3']]
print(result)
#        region     q2     q3
# South   South  54000  56000
# East     East  51000  49000
# West     West  63000  65000

# All rows, specific columns
q1_q2 = sales_data.loc[:, ['q1', 'q2']]
print(q1_q2)
#             q1     q2
# North    45000  47000
# South    52000  54000
# East     48000  51000
# West     61000  63000
# Central  58000  60000

Using Callable Functions with loc

loc accepts callable functions that receive the DataFrame and return valid indexing objects, enabling dynamic selection logic.

products = pd.DataFrame({
    'name': ['Widget A', 'Widget B', 'Gadget X', 'Gadget Y', 'Tool Z'],
    'price': [15.99, 25.50, 45.00, 38.75, 52.00],
    'rating': [4.2, 4.8, 3.9, 4.5, 4.7],
    'reviews': [120, 340, 89, 210, 156]
})

# Callable returning boolean array
highly_rated = products.loc[lambda df: df['rating'] >= 4.5]
print(highly_rated)
#        name  price  rating  reviews
# 1  Widget B   25.5     4.8      340
# 3  Gadget Y  38.75     4.5      210
# 4    Tool Z  52.00     4.7      156

# Callable with complex logic
popular_affordable = products.loc[
    lambda df: (df['reviews'] > 150) & (df['price'] < 50)
]
print(popular_affordable)
#        name  price  rating  reviews
# 1  Widget B   25.5     4.8      340
# 3  Gadget Y  38.75     4.5      210

Modifying Data with loc

Use loc to modify values in-place, avoiding chained assignment issues that trigger SettingWithCopyWarning.

inventory = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E'],
    'stock': [10, 25, 5, 30, 15],
    'price': [100, 150, 200, 120, 180]
})

# Modify single value
inventory.loc[2, 'stock'] = 20
print(inventory.loc[2])
# product      C
# stock       20
# price      200

# Modify multiple values with boolean condition
inventory.loc[inventory['stock'] < 15, 'stock'] = 15
print(inventory)
#   product  stock  price
# 0       A     15    100
# 1       B     25    150
# 2       C     20    200
# 3       D     30    120
# 4       E     15    180

# Modify entire column for specific rows
inventory.loc[inventory['price'] > 150, 'price'] *= 0.9  # 10% discount
print(inventory)
#   product  stock  price
# 0       A     15  100.0
# 1       B     25  150.0
# 2       C     20  180.0
# 3       D     30  120.0
# 4       E     15  162.0

Working with MultiIndex

loc handles hierarchical indices using tuples or slices, providing precise access to multi-level indexed data.

arrays = [
    ['Store1', 'Store1', 'Store2', 'Store2'],
    ['Electronics', 'Furniture', 'Electronics', 'Furniture']
]
multi_idx = pd.MultiIndex.from_arrays(arrays, names=['store', 'department'])

sales = pd.DataFrame({
    'revenue': [50000, 30000, 45000, 28000],
    'units': [250, 80, 220, 75]
}, index=multi_idx)

# Select by tuple
store1_electronics = sales.loc[('Store1', 'Electronics')]
print(store1_electronics)
# revenue    50000
# units        250

# Select all departments in Store2
store2_all = sales.loc['Store2']
print(store2_all)
#                    revenue  units
# department                       
# Electronics          45000    220
# Furniture            28000     75

# Cross-section with xs() or tuple slicing
electronics_all = sales.loc[(slice(None), 'Electronics'), :]
print(electronics_all)
#                           revenue  units
# store   department                      
# Store1  Electronics         50000    250
# Store2  Electronics         45000    220

The loc indexer provides explicit, label-based data access that makes code more readable and maintainable. Its inclusive slicing behavior, boolean indexing capabilities, and support for simultaneous row-column selection make it the preferred method for most DataFrame selection operations where you’re working with meaningful labels rather than positional indices.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.