Pandas - str.pad()/zfill() - Pad Strings

Key Insights

• str.pad() offers flexible string padding with configurable width, side (left/right/both), and fillchar parameters, while str.zfill() specializes in zero-padding numeric strings with sign-aware behavior
• Both methods are available through the .str accessor on Series and Index objects and handle missing values gracefully, leaving null entries missing instead of raising errors
• Choose str.zfill() for numeric formatting and ID generation; use str.pad() when you need custom fill characters, alignment control, or text formatting

Understanding String Padding in Pandas

String padding adds fill characters to a string until it reaches a specified width. Pandas provides two primary methods: str.pad() for general-purpose padding and str.zfill() for zero-padding numeric strings. Both are reached through the .str accessor, which is available on Series and Index objects holding string data.

import pandas as pd
import numpy as np

# Create sample data
df = pd.DataFrame({
    'product_id': ['1', '42', '305', '1024'],
    'amount': ['5.99', '-12.50', '100', '-0.45'],
    'status': ['OK', 'FAIL', 'PENDING', 'OK']
})

print(df)
  product_id  amount   status
0          1    5.99       OK
1         42  -12.50     FAIL
2        305     100  PENDING
3       1024   -0.45       OK
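
The .str accessor is not limited to DataFrame columns; it is also available on Index objects, so labels can be padded the same way. A minimal sketch, using a made-up index of store codes:

```python
import pandas as pd

# Hypothetical index of store codes, used only for illustration
idx = pd.Index(['7', '42', '305'])

# Index.str exposes the same padding methods as Series.str
padded_idx = idx.str.zfill(4)
print(padded_idx.tolist())  # ['0007', '0042', '0305']

starred_idx = idx.str.pad(4, side='right', fillchar='*')
print(starred_idx.tolist())  # ['7***', '42**', '305*']
```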

Using str.zfill() for Numeric Padding

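str.zfill() pads strings with zeros on the left. Like Python's built-in str.zfill(), it places the zeros after a leading + or - sign, which makes it a good fit for numeric strings. Older pandas releases padded before the sign instead, so check the behavior if you are on a legacy version.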

# Pad product IDs to 6 digits
df['product_id_padded'] = df['product_id'].str.zfill(6)

print(df[['product_id', 'product_id_padded']])
  product_id product_id_padded
0          1            000001
1         42            000042
2        305            000305
3       1024            001024

The method preserves signs in numeric strings:

# Zero-fill amounts with signs
df['amount_padded'] = df['amount'].str.zfill(8)

print(df[['amount', 'amount_padded']])
   amount amount_padded
0    5.99     00005.99
1  -12.50    -0012.50
2     100     00000100
3   -0.45    -0000.45

Notice how negative signs remain at the front, with zeros inserted after them, so each padded value still reads as a signed number. This makes str.zfill() a better fit than plain left padding for financial or scientific data.
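
A quick way to build intuition is to compare the result with Python's built-in str.zfill() applied element by element. The sketch below assumes a recent pandas release, where the two agree on signed strings. It also shows that a string already at or beyond the requested width is returned unchanged instead of being truncated.

```python
import pandas as pd

amounts = pd.Series(['5.99', '-12.50', '100', '1234567.89'])

# Vectorized pandas version
padded = amounts.str.zfill(8)

# Element-wise reference using Python's built-in str.zfill
reference = [value.zfill(8) for value in amounts]

print(padded.tolist())
print(reference)

# '1234567.89' is already 10 characters, so zfill(8) leaves it as-is
print(padded.iloc[3])  # 1234567.89
```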

Using str.pad() for Flexible Padding

str.pad() provides control over both padding direction and fill character. Its signature is str.pad(width, side='left', fillchar=' '), where side is 'left', 'right', or 'both' and fillchar must be a single character.

# Left padding with zeros (similar to zfill)
df['status_left'] = df['status'].str.pad(10, side='left', fillchar='0')

# Right padding with spaces
df['status_right'] = df['status'].str.pad(10, side='right', fillchar=' ')

# Center padding with dashes
df['status_center'] = df['status'].str.pad(10, side='both', fillchar='-')

print(df[['status', 'status_left', 'status_right', 'status_center']])
    status status_left status_right status_center
0       OK  00000000OK OK            ----OK----
1     FAIL  000000FAIL FAIL          ---FAIL---
2  PENDING  000PENDING PENDING       -PENDING--
3       OK  00000000OK OK            ----OK----

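With side='both' and an odd amount of total padding, one side receives one more fill character than the other. For the even width used here the extra character goes on the right, as in '-PENDING--'. Pandas follows Python's str.center(), which puts it on the left when the target width is odd.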
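
For object-backed strings, side='both' behaves like Python's str.center(), so the side that receives the extra fill character depends on whether the target width is even or odd. A small check, using illustrative values:

```python
import pandas as pd

# dtype=object is set explicitly so the example uses the
# Python-string code path regardless of pandas' default string dtype
words = pd.Series(['ab', 'abc'], dtype=object)

# Even width: 'abc' needs 3 fill characters, the extra one goes on the right
even = words.str.pad(6, side='both', fillchar='-')
print(even.tolist())  # ['--ab--', '-abc--']

# Odd width: 'ab' needs 3 fill characters, the extra one goes on the left
odd = words.str.pad(5, side='both', fillchar='-')
print(odd.tolist())  # ['--ab-', '-abc-']

# Same results as the built-in method
print(['ab'.center(5, '-'), 'abc'.center(6, '-')])  # ['--ab-', '-abc--']
```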

Practical Use Cases

Database ID Formatting

Many systems require fixed-width identifiers for sorting or display purposes:

# Simulate database records with varying ID lengths
records = pd.DataFrame({
    'user_id': ['5', '123', '4567', '89'],
    'order_id': ['A1', 'B234', 'C56789', 'D0'],
})

# Format for display in reports
records['user_id_fmt'] = records['user_id'].str.zfill(6)
records['order_id_fmt'] = records['order_id'].str.pad(8, side='left', fillchar='0')

print(records)
  user_id order_id user_id_fmt order_id_fmt
0       5       A1      000005     000000A1
1     123     B234      000123     0000B234
2    4567   C56789      004567     00C56789
3      89       D0      000089     000000D0
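
IDs frequently arrive as integers, not strings, and the .str accessor only works on string data. One hedged way to handle that, assuming the column has no missing values, is to cast to str before padding:

```python
import pandas as pd

# Hypothetical integer IDs, e.g. as loaded from a database
user_ids = pd.Series([5, 123, 4567, 89])

# .str is not available on an integer Series, so convert first
user_ids_fmt = user_ids.astype(str).str.zfill(6)

print(user_ids_fmt.tolist())  # ['000005', '000123', '004567', '000089']
```

With missing values present, the column is usually float and astype(str) produces strings like '5.0', so that case needs separate handling.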

Creating Fixed-Width File Formats

Fixed-width formats remain common in legacy systems and data interchange:

# Generate fixed-width records
transactions = pd.DataFrame({
    'account': ['12345', '67890', '11111'],
    'amount': ['1250.50', '75.00', '10000.99'],
    'type': ['CR', 'DR', 'CR']
})

# Create fixed-width format: account(10) + amount(12) + type(4)
transactions['fixed_width'] = (
    transactions['account'].str.pad(10, side='right', fillchar=' ') +
    transactions['amount'].str.pad(12, side='left', fillchar='0') +
    transactions['type'].str.pad(4, side='right', fillchar=' ')
)

print(transactions['fixed_width'])
0    12345     000001250.50CR  
1    67890     000000075.00DR  
2    11111     000010000.99CR  
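
Because str.pad() never truncates, a value longer than its field silently widens the record. The sketch below uses a hypothetical helper, fixed_field, that clips each value to the field width before padding, then checks that every record is exactly 26 characters long.

```python
import pandas as pd

def fixed_field(series, width, side='right', fillchar=' '):
    """Clip each string to `width`, then pad it to exactly `width`.

    Hypothetical helper for illustration; clipping keeps the leftmost
    characters, which may not suit every field.
    """
    return series.str.slice(0, width).str.pad(width, side=side, fillchar=fillchar)

transactions = pd.DataFrame({
    'account': ['12345', '67890', '123456789012'],  # last one is too long
    'amount': ['1250.50', '75.00', '10000.99'],
    'type': ['CR', 'DR', 'CR'],
})

records = (
    fixed_field(transactions['account'], 10) +
    fixed_field(transactions['amount'], 12, side='left', fillchar='0') +
    fixed_field(transactions['type'], 4)
)

# Every record should be exactly 10 + 12 + 4 = 26 characters wide
print(records.str.len().tolist())  # [26, 26, 26]
```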

Aligning Text Output

Create aligned columnar text for logs or reports:

# Log entries with aligned components
logs = pd.DataFrame({
    'level': ['INFO', 'WARNING', 'ERROR', 'DEBUG'],
    'module': ['auth', 'db', 'api', 'cache'],
    'message': ['User login', 'Connection slow', 'Timeout', 'Cache miss']
})

logs['formatted'] = (
    logs['level'].str.pad(8, side='right', fillchar=' ') +
    logs['module'].str.pad(10, side='right', fillchar=' ') +
    logs['message']
)

print(logs['formatted'])
0    INFO    auth      User login
1    WARNING db        Connection slow
2    ERROR   api       Timeout
3    DEBUG   cache     Cache miss
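
Pandas also exposes str.ljust(), str.rjust(), and str.center(), which correspond to str.pad() with side='right', side='left', and side='both' respectively. Some readers find them clearer in alignment code. A brief comparison:

```python
import pandas as pd

levels = pd.Series(['INFO', 'WARNING', 'ERROR'])

# ljust pads on the right, i.e. left-justifies the text
left_justified = levels.str.ljust(8)
same_with_pad = levels.str.pad(8, side='right')
print(left_justified.equals(same_with_pad))  # True

# rjust pads on the left, i.e. right-justifies the text
right_justified = levels.str.rjust(8, fillchar='.')
print(right_justified.tolist())  # ['....INFO', '.WARNING', '...ERROR']
```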

Handling Missing Values

Both methods skip missing values (None or NaN) without raising errors and leave them missing in the output:

# Data with missing values
data = pd.Series(['123', None, '45', np.nan, '6'])

print("Original:", data.tolist())
print("zfill(5):", data.str.zfill(5).tolist())
print("pad(5):", data.str.pad(5, fillchar='0').tolist())
Original: ['123', None, '45', nan, '6']
zfill(5): ['00123', None, '00045', nan, '00006']
pad(5): ['00123', None, '00045', nan, '00006']
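
If downstream consumers need a value in every row, one option is to substitute a placeholder before padding. A minimal sketch, where the empty-string placeholder is an assumption and not something the padding methods require:

```python
import pandas as pd
import numpy as np

data = pd.Series(['123', None, '45', np.nan, '6'])

# Missing entries stay missing after padding
padded = data.str.zfill(5)
print(padded.isna().tolist())  # [False, True, False, True, False]

# Replace missing entries with an empty string first, so they pad to all zeros
filled = data.fillna('').str.zfill(5)
print(filled.tolist())  # ['00123', '00000', '00045', '00000', '00006']
```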

Performance Considerations

For large datasets, both methods process the whole Series in a single call, with no explicit Python loop needed. A rough timing comparison:

import time

# Create large dataset
large_series = pd.Series([str(i) for i in range(1_000_000)])

# Benchmark zfill (perf_counter is a high-resolution timer meant for measuring intervals)
start = time.perf_counter()
result_zfill = large_series.str.zfill(10)
print(f"zfill: {time.perf_counter() - start:.3f}s")

# Benchmark pad
start = time.perf_counter()
result_pad = large_series.str.pad(10, fillchar='0')
print(f"pad: {time.perf_counter() - start:.3f}s")

The two methods usually finish in comparable time for most workloads. Any difference depends on the pandas version and the string dtype in use, so benchmark on your own data before optimizing around it.
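
Single-run timings are noisy, so a more reliable comparison repeats each call and keeps the best result. The sketch below uses the standard library's timeit on a smaller Series. The size and repeat counts are arbitrary choices, and the numbers will vary by machine, pandas version, and string dtype.

```python
import timeit
import pandas as pd

# Smaller dataset so the benchmark finishes quickly
series = pd.Series([str(i) for i in range(100_000)])

def best_time(func, repeat=5, number=3):
    """Return the lowest average time per call, in seconds, across repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

zfill_time = best_time(lambda: series.str.zfill(10))
pad_time = best_time(lambda: series.str.pad(10, fillchar='0'))

print(f"zfill: {zfill_time:.4f}s per call")
print(f"pad:   {pad_time:.4f}s per call")
```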

Key Differences Between Methods

str.zfill() differs from str.pad(width, side='left', fillchar='0') in sign handling:

# Compare sign handling
numbers = pd.Series(['-42', '+100', '-5'])

print("zfill(6):")
print(numbers.str.zfill(6))

print("\npad(6) with '0':")
print(numbers.str.pad(6, side='left', fillchar='0'))
zfill(6):
0    -00042
1    +00100
2    -00005

pad(6) with '0':
0    000-42
1    00+100
2    0000-5

Use str.zfill() when working with signed numeric strings. Use str.pad() for everything else or when you need right/center alignment.
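
To check whether the choice matters for a given column, you can compare the two results and see which rows differ. In this sketch the sample values are made up, and a recent pandas release with sign-aware str.zfill() is assumed. Under that assumption, only the strings that start with a sign should differ.

```python
import pandas as pd

values = pd.Series(['-42', '+100', '7', '2024'])

zfilled = values.str.zfill(6)
padded = values.str.pad(6, side='left', fillchar='0')

# Rows where the two methods disagree
differs = zfilled != padded

# Rows whose first character is a sign
has_sign = values.str[0].isin(['+', '-'])

print(pd.DataFrame({
    'value': values,
    'zfill': zfilled,
    'pad': padded,
    'differs': differs,
    'has_sign': has_sign,
}))
```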

Chaining with Other String Methods

Combine padding with other string operations for complex transformations:

# Clean and format product codes
products = pd.Series(['  abc-123  ', 'DEF-45', 'ghi-6789'])

formatted = (
    products
    .str.strip()                         # Remove surrounding whitespace
    .str.upper()                         # Uppercase
    .str.replace('-', '', regex=False)   # Remove hyphens (literal match)
    .str.zfill(10)                       # Pad to 10 characters
)

print(formatted)
0    0000ABC123
1    00000DEF45
2    000GHI6789

This approach maintains readability while applying multiple transformations in a single pipeline.
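
If the same cleanup is needed in several places, the chain can be wrapped in a function and applied with Series.pipe(). The function name and default width below are illustrative choices, not part of pandas:

```python
import pandas as pd

def normalize_code(codes, width=10):
    """Strip, uppercase, remove hyphens, and zero-pad product codes.

    Hypothetical helper for illustration only.
    """
    return (
        codes
        .str.strip()
        .str.upper()
        .str.replace('-', '', regex=False)  # literal replacement, not a regex
        .str.zfill(width)
    )

products = pd.Series(['  abc-123  ', 'DEF-45', 'ghi-6789'])
formatted = products.pipe(normalize_code, width=10)

print(formatted.tolist())  # ['0000ABC123', '00000DEF45', '000GHI6789']
```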
