How to Use loc in Pandas
Pandas provides two primary indexers for accessing data: `loc` and `iloc`. Understanding the difference between them is fundamental to writing clean, bug-free data manipulation code.
Key Insights
- loc uses label-based indexing, meaning you reference rows and columns by their names, not their integer positions. This makes your code more readable and less prone to errors when data order changes.
- The slicing behavior in loc is inclusive on both ends (df.loc['a':'c'] includes 'c'), which differs from standard Python slicing and catches many developers off guard.
- Using loc for assignments prevents the dreaded SettingWithCopyWarning and ensures you're modifying the original DataFrame, not a copy.
Introduction to loc
loc is a label-based indexer. You use it when you want to access rows and columns by their names (index labels and column names). iloc, on the other hand, is position-based—you use integer indices like you would with a Python list.
Here’s the quick mental model:
import pandas as pd
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
}, index=['a', 'b', 'c'])
# loc uses labels
df.loc['a']   # Gets row with index label 'a'
# iloc uses positions
df.iloc[0]    # Gets first row (position 0)
Use loc when your index has meaningful labels (dates, IDs, names) and you want to reference them explicitly. Use iloc when you need positional access regardless of labels.
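The difference is easiest to see when the index holds integers that do not match row positions. The following is a minimal sketch using a hypothetical scores frame (not one of the article's datasets): after sorting, the label lookup still finds the same row, while the positional lookup returns whatever now sits first.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match row positions
scores = pd.DataFrame({'score': [88, 92, 75]}, index=[10, 20, 30])

# Reorder the rows; the labels travel with their rows
by_score = scores.sort_values('score', ascending=False)

label_lookup = by_score.loc[10, 'score']    # row labelled 10, wherever it is
position_lookup = by_score.iloc[0, 0]       # whatever row is first now

print(label_lookup)     # 88
print(position_lookup)  # 92
```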
Basic Row Selection
The simplest use of loc is selecting rows by their index labels.
Selecting a Single Row
When you pass a single label to loc, you get a Series representing that row:
import pandas as pd
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])
# Select single row - returns a Series
alice = employees.loc['E001']
print(alice)
# name                Alice
# department    Engineering
# salary              85000
# Name: E001, dtype: object
Selecting Multiple Rows
Pass a list of labels to select multiple rows. This returns a DataFrame:
# Select multiple rows - returns a DataFrame
subset = employees.loc[['E001', 'E003']]
print(subset)
#          name   department  salary
# E001    Alice  Engineering   85000
# E003  Charlie  Engineering   91000
If you want a DataFrame even when selecting a single row, wrap the label in a list:
# Single row as DataFrame (not Series)
single_df = employees.loc[['E001']]
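To make the Series-versus-DataFrame distinction concrete, here is a small self-contained sketch that rebuilds the employees frame from above and inspects what each form returns. The variable names as_series and as_frame are illustrative only.

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])

as_series = employees.loc['E001']    # scalar label -> Series
as_frame = employees.loc[['E001']]   # list with one label -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 3)
```

Keeping the result as a DataFrame is convenient when downstream code expects columns, for example when concatenating the row with other frames.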
Column Selection with loc
The general form of loc is df.loc[row_indexer, column_indexer]. The second indexer lets you specify which columns to return.
Selecting All Rows, Specific Columns
Use : as the row indexer to select all rows:
# All rows, single column - returns Series
names = employees.loc[:, 'name']
# All rows, multiple columns - returns DataFrame
contact_info = employees.loc[:, ['name', 'department']]
print(contact_info)
#          name   department
# E001    Alice  Engineering
# E002      Bob    Marketing
# E003  Charlie  Engineering
# E004    Diana        Sales
Selecting Specific Rows and Columns
Combine row and column selection for precise data extraction:
# Specific rows and columns
result = employees.loc[['E001', 'E002'], ['name', 'salary']]
print(result)
#        name  salary
# E001  Alice   85000
# E002    Bob   72000
# Single cell value
alice_salary = employees.loc['E001', 'salary']
print(alice_salary)  # 85000
Slicing with loc
Label-based slicing with loc has one critical difference from standard Python slicing: both endpoints are inclusive.
dates = pd.DataFrame({
    'revenue': [1000, 1200, 1100, 1400, 1300]
}, index=pd.date_range('2024-01-01', periods=5))
# Slice rows - BOTH endpoints included
subset = dates.loc['2024-01-02':'2024-01-04']
print(subset)
#             revenue
# 2024-01-02     1200
# 2024-01-03     1100
# 2024-01-04     1400
This inclusive behavior makes sense for labels. If you ask for data from January 2nd to January 4th, you expect January 4th to be included.
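A side-by-side comparison makes the contrast with positional slicing explicit. This sketch uses a hypothetical four-row frame; the label 'c' sits at position 2, so the two slices below look similar but return different numbers of rows.

```python
import pandas as pd

# Hypothetical frame: label 'c' is at integer position 2
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

label_slice = df.loc['a':'c']     # label-based: 'c' is included
position_slice = df.iloc[0:2]     # position-based: position 2 is excluded

print(list(label_slice.index))     # ['a', 'b', 'c']
print(list(position_slice.index))  # ['a', 'b']
```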
Slicing Rows and Columns Together
You can slice both dimensions simultaneously:
data = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, 16]
}, index=['w', 'x', 'y', 'z'])
# Slice both rows and columns
result = data.loc['x':'z', 'B':'D']
print(result)
#    B   C   D
# x  6  10  14
# y  7  11  15
# z  8  12  16
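Label slices can also be left open on either side, just like Python slices. The sketch below rebuilds the same data frame and shows two open-ended forms; the names from_y and up_to_b are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, 16]
}, index=['w', 'x', 'y', 'z'])

from_y = data.loc['y':]        # rows from 'y' through the last row
up_to_b = data.loc[:, :'B']    # all rows, columns up to and including 'B'

print(list(from_y.index))      # ['y', 'z']
print(list(up_to_b.columns))   # ['A', 'B']
```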
Conditional Selection (Boolean Indexing)
This is where loc becomes genuinely powerful. Instead of labels, you can pass a boolean Series with one True/False value per row, and loc returns only the rows where the value is True.
Single Condition
# Employees earning more than 75000
high_earners = employees.loc[employees['salary'] > 75000]
print(high_earners)
#          name   department  salary
# E001    Alice  Engineering   85000
# E003  Charlie  Engineering   91000
Multiple Conditions
Combine conditions using & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as == and >:
# Engineering employees earning more than 80000
senior_engineers = employees.loc[
    (employees['department'] == 'Engineering') &
    (employees['salary'] > 80000)
]
# Employees in Engineering OR Sales
eng_or_sales = employees.loc[
    (employees['department'] == 'Engineering') |
    (employees['department'] == 'Sales')
]
# Using isin() for multiple values
target_depts = employees.loc[
    employees['department'].isin(['Engineering', 'Sales'])
]
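The examples above cover & and | but not ~. Here is a short sketch of negation on the same employees data, storing the mask in a variable (is_engineering, an illustrative name) so it can be inverted and reused.

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 91000, 68000]
}, index=['E001', 'E002', 'E003', 'E004'])

is_engineering = employees['department'] == 'Engineering'

# ~ inverts the mask: everyone who is NOT in Engineering
not_engineering = employees.loc[~is_engineering]
print(list(not_engineering.index))   # ['E002', 'E004']

# Negation combined with a second, parenthesized condition
non_eng_over_70k = employees.loc[~is_engineering & (employees['salary'] > 70000)]
print(list(non_eng_over_70k.index))  # ['E002']
```

Naming the mask first tends to keep long filters readable, and the same variable can be reused for both selection and assignment.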
Combining Conditions with Column Selection
Filter rows and select specific columns in one operation:
# Names of high earners only
high_earner_names = employees.loc[
    employees['salary'] > 75000,
    'name'
]
print(high_earner_names)
# E001      Alice
# E003    Charlie
# Name: name, dtype: object
Modifying Data with loc
loc isn’t just for reading data—it’s the preferred way to modify DataFrame values.
Updating Single Values
# Give Alice a raise
employees.loc['E001', 'salary'] = 90000
Updating Based on Conditions
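# 10% raise for all Engineering employees
mask = employees['department'] == 'Engineering'
# Round and cast back to int so the result matches the integer salary column
# (recent pandas versions warn when floats are written into an int column)
employees.loc[mask, 'salary'] = (employees.loc[mask, 'salary'] * 1.10).round().astype(int)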
# Set a flag column based on condition
employees['high_earner'] = False
employees.loc[employees['salary'] > 80000, 'high_earner'] = True
Updating Multiple Columns
# Update multiple columns for a specific row
employees.loc['E002', ['department', 'salary']] = ['Engineering', 78000]
# Update multiple columns based on condition
employees.loc[
    employees['department'] == 'Sales',
    ['salary', 'high_earner']
] = [70000, False]
Common Pitfalls and Best Practices
Avoid Chained Indexing
Chained indexing is when you use multiple bracket operations in sequence. It’s unpredictable and triggers warnings:
# WRONG - chained indexing (may not modify original)
df[df['age'] > 30]['salary'] = 100000 # SettingWithCopyWarning
# RIGHT - use loc
df.loc[df['age'] > 30, 'salary'] = 100000
The problem with chained indexing is that the first operation, df[df['age'] > 30], returns a new object rather than a view of the original. The assignment then modifies that temporary copy, and the original DataFrame is left unchanged.
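The effect can be checked directly. The following sketch uses a small hypothetical frame and silences the chained-assignment warning so the demo output stays clean; the exact warning class differs between pandas versions, which is why the filter is not tied to a specific one.

```python
import warnings
import pandas as pd

# Hypothetical data for the demonstration
df = pd.DataFrame({'age': [25, 35, 45], 'salary': [50000, 60000, 70000]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')          # hide the chained-assignment warning
    df[df['age'] > 30]['salary'] = 100000    # writes to a temporary copy

after_chained = df['salary'].tolist()
print(after_chained)   # [50000, 60000, 70000] - original untouched

df.loc[df['age'] > 30, 'salary'] = 100000    # single indexing operation
after_loc = df['salary'].tolist()
print(after_loc)       # [50000, 100000, 100000]
```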
Be Explicit About Copies
When you do want a copy, be explicit:
# Create an explicit copy to work with
subset = df.loc[df['status'] == 'active'].copy()
subset['processed'] = True # Safe - we know it's a copy
Handle Missing Labels Gracefully
loc raises KeyError if a label doesn’t exist. Handle this when working with dynamic labels:
# Check if label exists first
if 'E005' in employees.index:
    data = employees.loc['E005']
# Or use reindex for safe access (returns NaN for missing)
safe_data = employees.reindex(['E001', 'E005'])
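Two other patterns are sometimes useful when labels arrive from user input or another system: catching the KeyError, and filtering the requested labels down to those that exist. This is a sketch on a reduced, hypothetical version of the employees frame; wanted and present are illustrative names.

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'salary': [85000, 72000]
}, index=['E001', 'E002'])

wanted = ['E001', 'E005']  # 'E005' is deliberately missing

# Option 1: catch the KeyError and fall back to a default
try:
    row = employees.loc['E005']
except KeyError:
    row = None

# Option 2: keep only the labels that actually exist, then select
present = employees.index.intersection(wanted)
existing_rows = employees.loc[present]
print(list(existing_rows.index))  # ['E001']
```

The first option suits single lookups; the second avoids an exception entirely when selecting many labels at once.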
Performance Considerations
For large DataFrames, loc with boolean indexing is efficient because the mask is evaluated and applied to the whole column in one vectorized operation rather than row by row. However, avoid calling loc in loops:
# SLOW - calling loc repeatedly
for idx in indices:
    df.loc[idx, 'value'] = compute_value(idx)
# FASTER - compute every value first, then assign once
df.loc[indices, 'value'] = [compute_value(idx) for idx in indices]
# FASTEST - truly vectorized when possible
df.loc[indices, 'value'] = df.loc[indices, 'other_col'] * 2
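As a sanity check that the loop and the vectorized form are interchangeable, the sketch below runs both on a small hypothetical frame and compares the results. It does not measure timing; the speed difference only becomes visible on much larger data.

```python
import pandas as pd

# Hypothetical frame and label list for the comparison
df = pd.DataFrame(
    {'other_col': [1, 2, 3, 4], 'value': [0, 0, 0, 0]},
    index=['a', 'b', 'c', 'd'],
)
indices = ['a', 'c']

# Row-by-row version
looped = df.copy()
for idx in indices:
    looped.loc[idx, 'value'] = looped.loc[idx, 'other_col'] * 2

# Single vectorized assignment
vectorized = df.copy()
vectorized.loc[indices, 'value'] = vectorized.loc[indices, 'other_col'] * 2

print(vectorized['value'].tolist())  # [2, 0, 6, 0]
print(looped.equals(vectorized))     # True
```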
Use loc Even When It Seems Optional
Sometimes df['column'] works fine, but using loc makes your intent explicit:
# Works, but ambiguous
df['new_col'] = values
# Clearer intent
df.loc[:, 'new_col'] = values
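This explicitness pays off when debugging and when others read your code. Be aware that for a column that already exists the two forms are not always identical: df['col'] = values replaces the column, while in recent pandas versions df.loc[:, 'col'] = values writes into the existing column and tries to keep its dtype.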
loc is one of those Pandas features that seems simple but reveals depth as you use it. Master it, and you’ll write cleaner, faster, and more maintainable data manipulation code.