How to Use loc in Pandas

Pandas provides two primary indexers for accessing data: `loc` and `iloc`. Understanding the difference between them is fundamental to writing clean, bug-free data manipulation code.

Key Insights

  • loc uses label-based indexing, meaning you reference rows and columns by their names, not their integer positions—this makes your code more readable and less prone to errors when data order changes.
  • The slicing behavior in loc is inclusive on both ends (df.loc['a':'c'] includes ‘c’), which differs from standard Python slicing and catches many developers off guard.
  • Making an assignment in a single loc call (df.loc[rows, cols] = value) avoids the chained indexing that triggers the dreaded SettingWithCopyWarning, and ensures you're modifying the original DataFrame, not a copy.

Introduction to loc

loc is a label-based indexer. You use it when you want to access rows and columns by their names (index labels and column names). iloc, on the other hand, is position-based—you use integer indices like you would with a Python list.

Here’s the quick mental model:

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
}, index=['a', 'b', 'c'])

# loc uses labels
df.loc['a']  # Gets row with index label 'a'

# iloc uses positions
df.iloc[0]   # Gets first row (position 0)

Use loc when your index has meaningful labels (dates, IDs, names) and you want to reference them explicitly. Use iloc when you need positional access regardless of labels.
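The difference shows up most clearly when the index holds integers that don't match row positions. Here is a minimal sketch using a small made-up scores table:

```python
import pandas as pd

# Integer labels that do NOT match row positions
scores = pd.DataFrame({'score': [88, 92, 75]}, index=[10, 20, 30])

# loc looks up the label 10; iloc looks up position 0
print(scores.loc[10, 'score'])   # 88
print(scores.iloc[0]['score'])   # 88

# There is no row labeled 0, so loc raises KeyError
try:
    scores.loc[0]
except KeyError:
    print('no row labeled 0')

# After sorting, position 0 is a different row, but label 10 is unchanged
ranked = scores.sort_values('score', ascending=False)
print(ranked.iloc[0]['score'])   # 92 (the highest score is now first)
print(ranked.loc[10, 'score'])   # 88 (label 10 still points to the same row)
```

Because loc follows the label, a lookup like ranked.loc[10] keeps pointing at the same record after sorting or filtering.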

Basic Row Selection

The simplest use of loc is selecting rows by their index labels.

Selecting a Single Row

When you pass a single label to loc, you get a Series representing that row:

import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])

# Select single row - returns a Series
alice = employees.loc['E001']
print(alice)
# name                Alice
# department    Engineering
# salary              85000
# Name: E001, dtype: object

Selecting Multiple Rows

Pass a list of labels to select multiple rows. This returns a DataFrame:

# Select multiple rows - returns a DataFrame
subset = employees.loc[['E001', 'E003']]
print(subset)
#          name   department  salary
# E001    Alice  Engineering   85000
# E003  Charlie  Engineering   91000

If you want a DataFrame even when selecting a single row, wrap the label in a list:

# Single row as DataFrame (not Series)
single_df = employees.loc[['E001']]
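A quick way to see the difference is to check the type and shape of each result. This sketch rebuilds the employees table from above so it runs on its own:

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])

as_series = employees.loc['E001']    # scalar label -> Series
as_frame = employees.loc[['E001']]   # one-element list -> DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (1, 3)

# The row Series mixes strings and integers, so its dtype falls back to object;
# the one-row DataFrame keeps a separate dtype for each column.
print(as_series.dtype)                                   # object
print(pd.api.types.is_integer_dtype(as_frame['salary']))  # True
```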

Column Selection with loc

The general form is df.loc[row_indexer, column_indexer]. The second indexer lets you specify which columns to return.

Selecting All Rows, Specific Columns

Use : as the row indexer to select all rows:

# All rows, single column - returns Series
names = employees.loc[:, 'name']

# All rows, multiple columns - returns DataFrame
contact_info = employees.loc[:, ['name', 'department']]
print(contact_info)
#          name   department
# E001    Alice  Engineering
# E002      Bob    Marketing
# E003  Charlie  Engineering
# E004    Diana        Sales

Selecting Specific Rows and Columns

Combine row and column selection for precise data extraction:

# Specific rows and columns
result = employees.loc[['E001', 'E002'], ['name', 'salary']]
print(result)
#        name  salary
# E001  Alice   85000
# E002    Bob   72000

# Single cell value
alice_salary = employees.loc['E001', 'salary']
print(alice_salary)  # 85000
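The shape of the result follows a simple rule: a single label in either position drops that dimension, while a list keeps it. A small sketch covering all four combinations, again on a self-contained copy of the employees table:

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])

cell = employees.loc['E001', 'salary']                       # label, label -> scalar
row_part = employees.loc['E001', ['name', 'salary']]         # label, list  -> Series
col_part = employees.loc[['E001', 'E002'], 'salary']         # list,  label -> Series
block = employees.loc[['E001', 'E002'], ['name', 'salary']]  # list,  list  -> DataFrame

print(pd.api.types.is_scalar(cell))  # True
print(type(row_part).__name__, type(col_part).__name__, type(block).__name__)
# Series Series DataFrame
```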

Slicing with loc

Label-based slicing with loc has one critical difference from standard Python slicing: both endpoints are inclusive.

dates = pd.DataFrame({
    'revenue': [1000, 1200, 1100, 1400, 1300]
}, index=pd.date_range('2024-01-01', periods=5))

# Slice rows - BOTH endpoints included
subset = dates.loc['2024-01-02':'2024-01-04']
print(subset)
#             revenue
# 2024-01-02     1200
# 2024-01-03     1100
# 2024-01-04     1400

This inclusive behavior makes sense for labels. If you ask for data from January 2nd to January 4th, you expect January 4th to be included.
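Putting loc and iloc side by side on the same data makes the endpoint rule easy to check. A minimal sketch with a made-up four-row frame:

```python
import pandas as pd

letters = pd.DataFrame({'value': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

by_label = letters.loc['a':'c']    # labels: both endpoints included
by_position = letters.iloc[0:2]    # positions: stop excluded, like a Python list

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_position.index))  # ['a', 'b']
```

To get the same three rows positionally, you would write letters.iloc[0:3].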

Slicing Rows and Columns Together

You can slice both dimensions simultaneously:

data = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, 16]
}, index=['w', 'x', 'y', 'z'])

# Slice both rows and columns
result = data.loc['x':'z', 'B':'D']
print(result)
#    B   C   D
# x  6  10  14
# y  7  11  15
# z  8  12  16
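Label slices follow the order in which labels are stored, not alphabetical or numeric order. A sketch with a hypothetical frame whose columns are deliberately shuffled:

```python
import pandas as pd

# Columns stored in the order D, A, C, B
shuffled = pd.DataFrame({
    'D': [13, 14],
    'A': [1, 2],
    'C': [9, 10],
    'B': [5, 6],
}, index=['w', 'x'])

# 'A':'B' runs from A's position to B's position, picking up C on the way
print(list(shuffled.loc[:, 'A':'B'].columns))  # ['A', 'C', 'B']

# On an unsorted axis both endpoints must exist; 'E' is not a column
try:
    shuffled.loc[:, 'A':'E']
except KeyError:
    print("'E' is not a column label")
```

On a sorted (monotonic) axis, by contrast, pandas can place a missing endpoint by its sort position, so the slice still works.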

Conditional Selection (Boolean Indexing)

This is where loc becomes genuinely powerful. You can pass a boolean condition, which produces one True or False per row, and loc keeps only the rows marked True.

Single Condition

# Employees earning more than 75000
high_earners = employees.loc[employees['salary'] > 75000]
print(high_earners)
#          name   department  salary
# E001    Alice  Engineering   85000
# E003  Charlie  Engineering   91000
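Under the hood, the condition is a boolean Series indexed like the DataFrame, and loc matches it to rows by label rather than by position. A small sketch to check that, rebuilding employees so it runs standalone:

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])

mask = employees['salary'] > 75000
print(mask.tolist())  # [True, False, True, False]

# Reverse the mask's order: its labels still line up with the same rows
reversed_mask = mask.iloc[::-1]
print(list(employees.loc[reversed_mask].index))  # ['E001', 'E003']
```

If a mask is missing labels that the DataFrame has, loc raises an error instead of guessing which rows you meant.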

Multiple Conditions

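Combine conditions using & (and), | (or), and ~ (not); Python's and, or, and not keywords don't work element-wise on a Series. Each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as == and >: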

# Engineering employees earning more than 80000
senior_engineers = employees.loc[
    (employees['department'] == 'Engineering') & 
    (employees['salary'] > 80000)
]

# Employees in Engineering OR Sales
eng_or_sales = employees.loc[
    (employees['department'] == 'Engineering') | 
    (employees['department'] == 'Sales')
]

# Using isin() for multiple values
target_depts = employees.loc[
    employees['department'].isin(['Engineering', 'Sales'])
]
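Negation with ~ works the same way, and the | and isin() forms above are interchangeable. A short sketch that checks both, with employees rebuilt so the block runs on its own:

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])

# ~ flips every True/False: keep everyone outside Engineering
not_engineering = employees.loc[~(employees['department'] == 'Engineering')]
print(list(not_engineering.index))  # ['E002', 'E004']

# The | form and the isin() form select the same rows
eng_or_sales = employees.loc[
    (employees['department'] == 'Engineering') |
    (employees['department'] == 'Sales')
]
target_depts = employees.loc[employees['department'].isin(['Engineering', 'Sales'])]
print(eng_or_sales.equals(target_depts))  # True

# ~ also combines with isin() to exclude a set of values
others = employees.loc[~employees['department'].isin(['Engineering', 'Sales'])]
print(list(others.index))  # ['E002']
```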

Combining Conditions with Column Selection

Filter rows and select specific columns in one operation:

# Names of high earners only
high_earner_names = employees.loc[
    employees['salary'] > 75000, 
    'name'
]
print(high_earner_names)
# E001      Alice
# E003    Charlie
# Name: name, dtype: object

Modifying Data with loc

loc isn’t just for reading data—it’s the preferred way to modify DataFrame values.

Updating Single Values

# Give Alice a raise
employees.loc['E001', 'salary'] = 90000

Updating Based on Conditions

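# 10% raise for all Engineering employees
mask = employees['department'] == 'Engineering'
# Round and cast back to int64 so the integer salary column keeps its dtype
# (assigning non-integer floats into it triggers a FutureWarning in pandas 2.1+)
employees.loc[mask, 'salary'] = (employees.loc[mask, 'salary'] * 1.10).round().astype('int64')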

# Set a flag column based on condition
employees['high_earner'] = False
employees.loc[employees['salary'] > 80000, 'high_earner'] = True

Updating Multiple Columns

# Update multiple columns for a specific row
employees.loc['E002', ['department', 'salary']] = ['Engineering', 78000]

# Update multiple columns based on condition
employees.loc[
    employees['department'] == 'Sales', 
    ['salary', 'high_earner']
] = [70000, False]
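When a condition matches several rows, a list with one value per selected column is applied to every matching row. A sketch with a hypothetical team that has three Sales employees:

```python
import pandas as pd

# Hypothetical data: three Sales rows, one Engineering row
team = pd.DataFrame({
    'department': ['Sales', 'Engineering', 'Sales', 'Sales'],
    'salary': [68000, 91000, 64000, 66000],
    'high_earner': [False, True, True, False],
}, index=['E004', 'E003', 'E005', 'E006'])

# Two columns selected, two values given: salary=70000, high_earner=False
team.loc[team['department'] == 'Sales', ['salary', 'high_earner']] = [70000, False]

print(team['salary'].tolist())       # [70000, 91000, 70000, 70000]
print(team['high_earner'].tolist())  # [False, True, False, False]
```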

Common Pitfalls and Best Practices

Avoid Chained Indexing

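Chained indexing is when you use multiple bracket operations in sequence, such as df[mask]['salary']. Reading data this way usually works, but assigning through a chain is unreliable and triggers warnings:

# WRONG - chained assignment (leaves df unchanged here)
df[df['age'] > 30]['salary'] = 100000  # SettingWithCopyWarning, or ChainedAssignmentError under Copy-on-Write

# RIGHT - a single loc call selects rows and column together
df.loc[df['age'] > 30, 'salary'] = 100000

The problem with chained indexing is that the first operation might return a copy of the data rather than a view. A boolean filter such as df[df['age'] > 30] always returns a copy, so the assignment changes that temporary copy and the original DataFrame is never touched.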
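You can see the difference directly by running both versions on a small throwaway frame. This sketch silences the warning pandas emits for the chained form (its exact class depends on the pandas version) so only the effect on df is visible:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 35], 'salary': [50000, 60000, 70000]})

# Chained assignment: df[mask] builds a new DataFrame, and ['salary'] = ...
# writes into that temporary object, not into df
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[df['age'] > 30]['salary'] = 100000
after_chained = df['salary'].tolist()
print(after_chained)  # [50000, 60000, 70000]  (df unchanged)

# Single loc call: df itself is updated
df.loc[df['age'] > 30, 'salary'] = 100000
print(df['salary'].tolist())  # [50000, 60000, 100000]
```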

Be Explicit About Copies

When you do want a copy, be explicit:

# Create an explicit copy to work with
subset = df.loc[df['status'] == 'active'].copy()
subset['processed'] = True  # Safe - we know it's a copy
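A quick check that the explicit copy is independent of its source, using a hypothetical df with a status column:

```python
import pandas as pd

# Hypothetical data with a 'status' column
df = pd.DataFrame({'status': ['active', 'inactive', 'active'], 'score': [1, 2, 3]})

subset = df.loc[df['status'] == 'active'].copy()
subset['processed'] = True  # changes only the copy

print(list(subset.index))         # [0, 2]
print('processed' in df.columns)  # False
```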

Handle Missing Labels Gracefully

loc raises KeyError if a label doesn’t exist. Handle this when working with dynamic labels:

# Check if label exists first
if 'E005' in employees.index:
    data = employees.loc['E005']

# Or use reindex, which returns a row of NaN for each missing label instead of raising
safe_data = employees.reindex(['E001', 'E005'])
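Here are these options side by side on a trimmed, self-contained employees table. The last one, filtering the wanted labels through index.intersection, is an extra approach beyond the two shown above:

```python
import pandas as pd

employees = pd.DataFrame(
    {'name': ['Alice', 'Bob'], 'salary': [85000, 72000]},
    index=['E001', 'E002'],
)

# A missing scalar label raises KeyError
try:
    employees.loc['E005']
except KeyError:
    print('E005 not found')

# Since pandas 1.0, a list containing any missing label also raises KeyError
try:
    employees.loc[['E001', 'E005']]
except KeyError:
    print('list lookup failed')

# reindex fills missing labels with NaN instead of raising
safe = employees.reindex(['E001', 'E005'])
print(safe.loc['E005'].isna().all())  # True

# Or keep only the labels that actually exist
wanted = ['E001', 'E005']
present = employees.loc[employees.index.intersection(wanted)]
print(list(present.index))  # ['E001']
```

Note that reindex upcasts the integer salary column to float64 to make room for NaN.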

Performance Considerations

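For large DataFrames, loc with boolean indexing is efficient because both the comparison that builds the mask and the row selection run over whole columns in compiled code, not row by row in Python. Calling loc inside a Python loop gives that advantage away: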

# indices, compute_value, and other_col are placeholders for your own labels, function, and column

# SLOW - one loc call per label
for idx in indices:
    df.loc[idx, 'value'] = compute_value(idx)

# FASTER - compute the values first, then assign with a single loc call
df.loc[indices, 'value'] = [compute_value(idx) for idx in indices]

# FASTEST - truly vectorized when possible
df.loc[indices, 'value'] = df.loc[indices, 'other_col'] * 2
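A runnable version of the loop and the vectorized assignment, plus a middle option that computes values in Python but assigns them in one loc call, with hypothetical stand-ins for indices, compute_value, and other_col, confirms that they all produce the same result. The value column is created up front as floats so every variant writes into the same dtype:

```python
import pandas as pd

# Hypothetical data; 'value' starts as an all-NaN float column
df = pd.DataFrame({'other_col': range(1, 6)}, index=list('abcde'))
df['value'] = float('nan')
indices = ['a', 'c', 'e']

def compute_value(label):
    # Hypothetical per-label computation standing in for real logic
    return df.at[label, 'other_col'] * 2

# SLOW: one loc assignment per label
looped = df.copy()
for idx in indices:
    looped.loc[idx, 'value'] = compute_value(idx)

# FASTER: compute values in Python, then assign once
batched = df.copy()
batched.loc[indices, 'value'] = [compute_value(idx) for idx in indices]

# FASTEST: one vectorized expression over the selected rows
vectorized = df.copy()
vectorized.loc[indices, 'value'] = vectorized.loc[indices, 'other_col'] * 2

print(looped.equals(batched), batched.equals(vectorized))  # True True
```

On five rows the timing difference is invisible; it grows with the number of labels, since every loc call inside the loop pays its own fixed overhead.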

Use loc Even When It Seems Optional

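Sometimes df['column'] works fine, but spelling out both indexers with loc makes your intent explicit:

# Works: adds the column (or replaces it if it already exists)
df['new_col'] = values

# Explicit: all rows, column 'new_col'
df.loc[:, 'new_col'] = values

For a new column the two lines do the same thing. For an existing column they differ: df['col'] = values replaces the whole column, so its dtype can change, while since pandas 2.0 df.loc[:, 'col'] = values first tries to write into the existing column in place, keeping its dtype.

This explicitness pays off when debugging and when others read your code.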
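A small sketch on a made-up frame checks that adding a new column gives the same result either way in current pandas, and that [] on an existing column swaps in a fresh column whose dtype follows the new values:

```python
import pandas as pd

base = pd.DataFrame({'a': [1, 2, 3]})
values = [10, 20, 30]

# New column via []
with_brackets = base.copy()
with_brackets['new_col'] = values

# New column via loc with an explicit "all rows" indexer
with_loc = base.copy()
with_loc.loc[:, 'new_col'] = values

print(with_brackets.equals(with_loc))  # True

# [] on an existing column replaces it, so the dtype follows the new values
replaced = base.copy()
replaced['a'] = [1.5, 2.5, 3.5]
print(replaced['a'].dtype)  # float64
```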

loc is one of those Pandas features that seems simple but reveals depth as you use it. Master it, and you’ll write cleaner, faster, and more maintainable data manipulation code.
