Pandas - str.pad()/zfill() - Pad Strings
Key Insights
• str.pad() offers flexible string padding with configurable width, side (left/right/both), and fillchar parameters, while str.zfill() specializes in zero-padding numbers with sign-aware behavior
• Both methods operate on Series objects and handle missing values gracefully, returning NaN for null entries without raising errors
• Choose str.zfill() for numeric formatting and ID generation; use str.pad() when you need custom fill characters, alignment control, or text formatting
Understanding String Padding in Pandas
String padding adds characters to a string until it reaches a specified width. Pandas provides two primary methods: str.pad() for general-purpose padding and str.zfill() for zero-padding numeric strings. Both are reached through the .str accessor, which is available on Series and Index objects that hold string data.
import pandas as pd
import numpy as np
# Create sample data
df = pd.DataFrame({
'product_id': ['1', '42', '305', '1024'],
'amount': ['5.99', '-12.50', '100', '-0.45'],
'status': ['OK', 'FAIL', 'PENDING', 'OK']
})
print(df)
product_id amount status
0 1 5.99 OK
1 42 -12.50 FAIL
2 305 100 PENDING
3 1024 -0.45 OK
Using str.zfill() for Numeric Padding
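str.zfill() pads strings with zeros on the left. In current pandas releases it mirrors Python's built-in str.zfill(): a leading '+' or '-' stays in front and the zeros are inserted after it, which suits signed numeric strings. Older pandas releases treated the sign as an ordinary character, so verify the behavior on the version you run.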
# Pad product IDs to 6 digits
df['product_id_padded'] = df['product_id'].str.zfill(6)
print(df[['product_id', 'product_id_padded']])
product_id product_id_padded
0 1 000001
1 42 000042
2 305 000305
3 1024 001024
The method preserves signs in numeric strings:
# Zero-fill amounts with signs
df['amount_padded'] = df['amount'].str.zfill(8)
print(df[['amount', 'amount_padded']])
amount amount_padded
0 5.99 00005.99
1 -12.50 -0012.50
2 100 00000100
3 -0.45 -0000.45
Notice how negative signs remain at the front, with zeros inserted after them. For signed values such as financial amounts, the result still reads as a number, which is not the case when you left-pad with '0' using str.pad().
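The examples above do not show what happens when a value is already as long as, or longer than, the requested width. The following is a minimal sketch of that case; the series name `codes` and its values are made up for illustration, and the comparison list uses Python's built-in str.zfill rather than pandas.

```python
import pandas as pd

# Hypothetical sample: a short value, an exact-width value,
# an over-long value, and a signed value
codes = pd.Series(['7', '1234', '123456', '-9'])

# zfill only ever adds characters; it never truncates
padded = codes.str.zfill(4)

# Python's built-in str.zfill applied element by element, for comparison
expected = [s.zfill(4) for s in codes]

print(padded.tolist())
print(expected)
```

The over-long value comes back unchanged, so a later step that relies on a fixed width still needs its own length check.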
Using str.pad() for Flexible Padding
str.pad() lets you choose the padding side and the fill character. Its signature is str.pad(width, side='left', fillchar=' '), where side is 'left', 'right', or 'both' and fillchar must be a single character.
# Left padding with zeros (similar to zfill)
df['status_left'] = df['status'].str.pad(10, side='left', fillchar='0')
# Right padding with spaces
df['status_right'] = df['status'].str.pad(10, side='right', fillchar=' ')
# Center padding with dashes
df['status_center'] = df['status'].str.pad(10, side='both', fillchar='-')
print(df[['status', 'status_left', 'status_right', 'status_center']])
status status_left status_right status_center
0 OK 00000000OK OK ----OK----
1 FAIL 000000FAIL FAIL ---FAIL---
2 PENDING 000PENDING PENDING -PENDING--
3 OK 00000000OK OK ----OK----
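When side='both' and the number of fill characters is odd, the two sides cannot be equal. In the width-10 example above the extra character lands on the right ('-PENDING--'). For object-dtype strings pandas delegates to Python's str.center(), which puts the extra character on the left instead when the target width is odd.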
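The odd-padding case is easier to judge side by side. This is a minimal sketch, assuming object-dtype strings (where pandas calls Python's str.center) and a made-up `labels` series; other string dtypes may be implemented differently, so treat the comparison as a check to run rather than a guarantee.

```python
import pandas as pd

# Hypothetical labels; dtype=object is an assumption so that pandas
# uses Python's own string methods under the hood
labels = pd.Series(['PENDING', 'ab'], dtype=object)

# Even target width: 'PENDING' needs 3 fill characters
even_width = labels.str.pad(10, side='both', fillchar='-')

# Odd target width: 'ab' needs 9 fill characters
odd_width = labels.str.pad(11, side='both', fillchar='-')

# Python's built-in str.center, for comparison
center_even = [s.center(10, '-') for s in labels]
center_odd = [s.center(11, '-') for s in labels]

print(even_width.tolist(), center_even)
print(odd_width.tolist(), center_odd)
```

If the exact position of the extra character matters, padding each side explicitly with side='left' and side='right' removes the ambiguity.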
Practical Use Cases
Database ID Formatting
Many systems require fixed-width identifiers for sorting or display purposes:
# Simulate database records with varying ID lengths
records = pd.DataFrame({
'user_id': ['5', '123', '4567', '89'],
'order_id': ['A1', 'B234', 'C56789', 'D0'],
})
# Format for display in reports
records['user_id_fmt'] = records['user_id'].str.zfill(6)
records['order_id_fmt'] = records['order_id'].str.pad(8, side='left', fillchar='0')
print(records)
user_id order_id user_id_fmt order_id_fmt
0 5 A1 000005 000000A1
1 123 B234 000123 0000B234
2 4567 C56789 004567 00C56789
3 89 D0 000089 000000D0
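Identifiers often arrive as integers rather than strings, and the .str accessor raises an AttributeError on a numeric Series. The sketch below assumes a hypothetical integer column named `user_ids` and converts it with astype(str) before padding.

```python
import pandas as pd

# Hypothetical integer IDs, as they might come from a database column
user_ids = pd.Series([5, 123, 4567, 89])

# The .str accessor only works on string data, so convert first
user_ids_fmt = user_ids.astype(str).str.zfill(6)

print(user_ids_fmt.tolist())
```

If the column can contain missing values, the conversion step needs more care, because astype(str) turns a missing entry into a literal string rather than leaving it missing.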
Creating Fixed-Width File Formats
Fixed-width formats remain common in legacy systems and data interchange:
# Generate fixed-width records
transactions = pd.DataFrame({
'account': ['12345', '67890', '11111'],
'amount': ['1250.50', '75.00', '10000.99'],
'type': ['CR', 'DR', 'CR']
})
# Create fixed-width format: account(10) + amount(12) + type(4)
transactions['fixed_width'] = (
transactions['account'].str.pad(10, side='right', fillchar=' ') +
transactions['amount'].str.pad(12, side='left', fillchar='0') +
transactions['type'].str.pad(4, side='right', fillchar=' ')
)
print(transactions['fixed_width'])
0 12345     000001250.50CR
1 67890     000000075.00DR
2 11111     000010000.99CR
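Because padding never truncates, one over-long field silently shifts every column after it. The following sketch adds a length check to the layout above; the `widths` dictionary and the `too_long` name are illustrative choices, not part of any pandas API.

```python
import pandas as pd

transactions = pd.DataFrame({
    'account': ['12345', '67890', '11111'],
    'amount': ['1250.50', '75.00', '10000.99'],
    'type': ['CR', 'DR', 'CR'],
})

# Field widths from the layout above: account(10) + amount(12) + type(4)
widths = {'account': 10, 'amount': 12, 'type': 4}

record = (
    transactions['account'].str.pad(widths['account'], side='right')
    + transactions['amount'].str.pad(widths['amount'], side='left', fillchar='0')
    + transactions['type'].str.pad(widths['type'], side='right')
)

# Every record should be exactly 26 characters long; an over-long
# field would show up here as a longer record
record_length = sum(widths.values())
too_long = record[record.str.len() != record_length]

print(record.str.len().tolist())
print(too_long.empty)
```

Running a check like this before writing the file is cheaper than debugging a misaligned record on the receiving side.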
Aligning Text Output
Create aligned columnar text for logs or reports:
# Log entries with aligned components
logs = pd.DataFrame({
'level': ['INFO', 'WARNING', 'ERROR', 'DEBUG'],
'module': ['auth', 'db', 'api', 'cache'],
'message': ['User login', 'Connection slow', 'Timeout', 'Cache miss']
})
logs['formatted'] = (
logs['level'].str.pad(8, side='right', fillchar=' ') +
logs['module'].str.pad(10, side='right', fillchar=' ') +
logs['message']
)
print(logs['formatted'])
0 INFO    auth      User login
1 WARNING db        Connection slow
2 ERROR   api       Timeout
3 DEBUG   cache     Cache miss
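When a padded Series is printed directly, pandas applies its own justification for display, which can obscure the column alignment. A dependable way to see the lines as they would appear in a log file is to join them into one string, as in this sketch; the `report` name is an illustrative choice.

```python
import pandas as pd

logs = pd.DataFrame({
    'level': ['INFO', 'WARNING', 'ERROR', 'DEBUG'],
    'module': ['auth', 'db', 'api', 'cache'],
    'message': ['User login', 'Connection slow', 'Timeout', 'Cache miss'],
})

formatted = (
    logs['level'].str.pad(8, side='right')
    + logs['module'].str.pad(10, side='right')
    + logs['message']
)

# Join the rows so each line is printed exactly as it was built
report = '\n'.join(formatted)
print(report)
```

With widths of 8 and 10, every message starts at the same offset (character 18), regardless of how long the level or module name is.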
Handling Missing Values
Both methods handle NaN values without errors, preserving them in the output:
# Data with missing values
data = pd.Series(['123', None, '45', np.nan, '6'])
print("Original:", data.tolist())
print("zfill(5):", data.str.zfill(5).tolist())
print("pad(5):", data.str.pad(5, fillchar='0').tolist())
Original: ['123', None, '45', nan, '6']
zfill(5): ['00123', None, '00045', nan, '00006']
pad(5): ['00123', None, '00045', nan, '00006']
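Fixed-width output usually cannot contain a missing value, so the gaps have to be filled at some point. This sketch shows two orderings; the placeholder strings ('' and '?????') are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

data = pd.Series(['123', None, '45', np.nan, '6'])

# Option 1: substitute an empty string first, then pad it like any other value
filled_first = data.fillna('').str.zfill(5)

# Option 2: pad first, then mark whatever is still missing
filled_after = data.str.zfill(5).fillna('?????')

print(filled_first.tolist())
print(filled_after.tolist())
```

The first option makes a missing ID indistinguishable from a real all-zero value, so the second is often safer when the gap needs to remain visible downstream.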
Performance Considerations
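Both methods are applied through the .str accessor without an explicit loop in your code, although on object-dtype data pandas still calls a Python string method for each element. A rough benchmark on one million values: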
import time
# Create large dataset
large_series = pd.Series([str(i) for i in range(1000000)])
# Benchmark zfill (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
result_zfill = large_series.str.zfill(10)
print(f"zfill: {time.perf_counter() - start:.3f}s")
# Benchmark pad
start = time.perf_counter()
result_pad = large_series.str.pad(10, fillchar='0')
print(f"pad: {time.perf_counter() - start:.3f}s")
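Timings depend on the pandas version, the string dtype, and the hardware, so treat any single measurement with caution. The two methods generally complete in comparable time, which makes behavior rather than speed the better basis for choosing between them.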
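A single timed run is noisy, so a slightly more careful comparison repeats each measurement and keeps the best result. The sketch below uses timeit from the standard library on a smaller Series; the `candidates` dictionary and the list-comprehension baseline are illustrative additions, and no particular ranking is implied.

```python
import timeit

import pandas as pd

# Smaller than the one-million-row example so the sketch runs quickly
numbers = pd.Series([str(i) for i in range(100_000)])

candidates = {
    'str.zfill': lambda: numbers.str.zfill(10),
    'str.pad': lambda: numbers.str.pad(10, side='left', fillchar='0'),
    'list comprehension': lambda: pd.Series([s.zfill(10) for s in numbers]),
}

for name, func in candidates.items():
    # Best of three runs reduces noise from other processes
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{name}: {best:.4f}s")

# All three approaches agree on unsigned digit strings
results = [func().tolist() for func in candidates.values()]
same = results[0] == results[1] == results[2]
print(same)
```

The equality check at the end only holds because the inputs carry no sign; with signed strings, str.pad() would diverge from the other two.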
Key Differences Between Methods
str.zfill() differs from str.pad(width, side='left', fillchar='0') in sign handling:
# Compare sign handling
numbers = pd.Series(['-42', '+100', '-5'])
print("zfill(6):")
print(numbers.str.zfill(6))
print("\npad(6) with '0':")
print(numbers.str.pad(6, side='left', fillchar='0'))
zfill(6):
0 -00042
1 +00100
2 -00005
pad(6) with '0':
0 000-42
1 00+100
2 0000-5
Use str.zfill() when working with signed numeric strings. Use str.pad() for everything else or when you need right/center alignment.
Chaining with Other String Methods
Combine padding with other string operations for complex transformations:
# Clean and format product codes
products = pd.Series([' abc-123 ', 'DEF-45', 'ghi-6789'])
formatted = (
products
.str.strip() # Remove whitespace
.str.upper() # Uppercase
.str.replace('-', '', regex=False) # Remove hyphens (literal match)
.str.zfill(10) # Pad to 10 characters
)
print(formatted)
0 0000ABC123
1 00000DEF45
2 000GHI6789
This approach maintains readability while applying multiple transformations in a single pipeline.