How to Rename Columns in Polars
Key Insights
- Polars offers multiple column renaming strategies: rename() for explicit mappings, alias() for expression-based transformations, and direct assignment for bulk replacements
- The alias() method shines when you’re already transforming data, letting you rename columns as part of your select or aggregation pipeline without extra steps
- Programmatic renaming with lambda functions handles real-world messiness like inconsistent casing, spaces in column names, or adding standardized prefixes across dozens of columns
Introduction
Column renaming sounds trivial until you’re staring at a dataset with columns named Customer ID, customer_id, CUSTOMER ID, and cust_id that all need to become customer_id. Or you’ve inherited a CSV export where some analyst thought Q1 2024 Revenue ($) was a reasonable column name.
Polars, the Rust-powered DataFrame library that’s rapidly becoming the go-to choice for performance-critical Python data work, handles column renaming with the same philosophy it applies to everything else: give you multiple approaches optimized for different scenarios, and make them all fast.
This guide covers every practical method for renaming columns in Polars, from simple one-off changes to programmatic transformations across hundreds of columns. I’ll show you when to use each approach and how they behave differently in eager versus lazy execution modes.
Using the rename() Method
The rename() method is your workhorse for explicit column renaming. Pass it a dictionary mapping old names to new names, and Polars handles the rest.
import polars as pl
# Create a sample DataFrame
df = pl.DataFrame({
    "firstName": ["Alice", "Bob", "Charlie"],
    "lastName": ["Smith", "Jones", "Brown"],
    "age_years": [30, 25, 35]
})
# Rename a single column
df_renamed = df.rename({"firstName": "first_name"})
print(df_renamed)
Output:
shape: (3, 3)
┌────────────┬──────────┬───────────┐
│ first_name ┆ lastName ┆ age_years │
│ ---        ┆ ---      ┆ ---       │
│ str        ┆ str      ┆ i64       │
╞════════════╪══════════╪═══════════╡
│ Alice      ┆ Smith    ┆ 30        │
│ Bob        ┆ Jones    ┆ 25        │
│ Charlie    ┆ Brown    ┆ 35        │
└────────────┴──────────┴───────────┘
For multiple columns, just add more key-value pairs to the dictionary:
df_cleaned = df.rename({
    "firstName": "first_name",
    "lastName": "last_name",
    "age_years": "age"
})
print(df_cleaned)
Output:
shape: (3, 3)
┌────────────┬───────────┬─────┐
│ first_name ┆ last_name ┆ age │
│ ---        ┆ ---       ┆ --- │
│ str        ┆ str       ┆ i64 │
╞════════════╪═══════════╪═════╡
│ Alice      ┆ Smith     ┆ 30  │
│ Bob        ┆ Jones     ┆ 25  │
│ Charlie    ┆ Brown     ┆ 35  │
└────────────┴───────────┴─────┘
The rename() method returns a new DataFrame and leaves the original unchanged. If you try to rename a column that doesn’t exist, Polars raises a ColumnNotFoundError, which is helpful for catching typos early.
Using alias() in Expressions
When you’re already selecting or transforming columns, alias() lets you rename as part of the operation. This is idiomatic Polars and often cleaner than a separate rename() call.
df = pl.DataFrame({
    "product_name": ["Widget", "Gadget", "Gizmo"],
    "unit_price": [10.0, 25.0, 15.0],
    "quantity_sold": [100, 50, 75]
})
# Rename while selecting
result = df.select(
    pl.col("product_name").alias("product"),
    pl.col("unit_price").alias("price"),
    pl.col("quantity_sold").alias("units")
)
print(result)
The real power of alias() emerges when you’re computing new columns:
result = df.select(
    pl.col("product_name").alias("product"),
    (pl.col("unit_price") * pl.col("quantity_sold")).alias("total_revenue"),
    # Rank by units sold, best seller first; "ordinal" yields integer ranks
    pl.col("quantity_sold").rank(method="ordinal", descending=True).alias("sales_rank")
)
print(result)
Output:
shape: (3, 3)
┌─────────┬───────────────┬────────────┐
│ product ┆ total_revenue ┆ sales_rank │
│ ---     ┆ ---           ┆ ---        │
│ str     ┆ f64           ┆ u32        │
╞═════════╪═══════════════╪════════════╡
│ Widget  ┆ 1000.0        ┆ 1          │
│ Gadget  ┆ 1250.0        ┆ 3          │
│ Gizmo   ┆ 1125.0        ┆ 2          │
└─────────┴───────────────┴────────────┘
Use alias() when you’re already in an expression context. Use rename() when you just need to change names without other transformations.
Renaming All Columns at Once
Sometimes you need to replace all column names wholesale—maybe you’re reading a headerless CSV and assigning meaningful names, or conforming to a strict schema.
df = pl.DataFrame({
    "column_1": [1, 2, 3],
    "column_2": ["a", "b", "c"],
    "column_3": [True, False, True]
})
# Replace all column names at once
df.columns = ["id", "category", "is_active"]
print(df)
Output:
shape: (3, 3)
┌─────┬──────────┬───────────┐
│ id  ┆ category ┆ is_active │
│ --- ┆ ---      ┆ ---       │
│ i64 ┆ str      ┆ bool      │
╞═════╪══════════╪═══════════╡
│ 1   ┆ a        ┆ true      │
│ 2   ┆ b        ┆ false     │
│ 3   ┆ c        ┆ true      │
└─────┴──────────┴───────────┘
Important caveat: This is an in-place mutation, which is unusual for Polars. The list length must exactly match the number of columns, or you’ll get an error.
For a more functional approach that returns a new DataFrame, use rename() with a complete mapping:
new_names = ["id", "category", "is_active"]
old_names = df.columns
df_renamed = df.rename(dict(zip(old_names, new_names)))
Programmatic Renaming with Functions
Real datasets have messy column names. You need programmatic solutions that can handle patterns, not just explicit mappings.
Polars’ rename() method accepts a callable that receives each column name and returns the new name:
df = pl.DataFrame({
    "First Name": ["Alice", "Bob"],
    "Last Name": ["Smith", "Jones"],
    "Email Address": ["alice@example.com", "bob@example.com"]
})
# Convert to snake_case
def to_snake_case(name: str) -> str:
    return name.lower().replace(" ", "_")
df_clean = df.rename(to_snake_case)
print(df_clean)
Output:
shape: (2, 3)
┌────────────┬───────────┬───────────────────┐
│ first_name ┆ last_name ┆ email_address     │
│ ---        ┆ ---       ┆ ---               │
│ str        ┆ str       ┆ str               │
╞════════════╪═══════════╪═══════════════════╡
│ Alice      ┆ Smith     ┆ alice@example.com │
│ Bob        ┆ Jones     ┆ bob@example.com   │
└────────────┴───────────┴───────────────────┘
Lambda functions work great for simpler transformations:
# Add prefix to all columns
df_prefixed = df.rename(lambda col: f"user_{col.lower().replace(' ', '_')}")
# Remove common suffix
df = pl.DataFrame({"name_col": [1], "age_col": [2], "city_col": [3]})
df_clean = df.rename(lambda col: col.removesuffix("_col"))
For more complex patterns, use regex:
import re
df = pl.DataFrame({
    "Q1 2024 Revenue ($)": [1000],
    "Q2 2024 Revenue ($)": [1500],
    "Q3 2024 Revenue ($)": [1200]
})
def clean_column_name(name: str) -> str:
    # Extract quarter and convert to clean format
    match = re.match(r"Q(\d) (\d{4})", name)
    if match:
        return f"revenue_q{match.group(1)}_{match.group(2)}"
    return name.lower().replace(" ", "_")
df_clean = df.rename(clean_column_name)
print(df_clean.columns)
# Output: ['revenue_q1_2024', 'revenue_q2_2024', 'revenue_q3_2024']
Renaming in Lazy vs Eager Mode
Polars’ lazy execution mode builds a query plan that gets optimized before execution. Column renaming works in lazy mode, but there are nuances worth understanding.
# Create a LazyFrame
lf = pl.LazyFrame({
    "old_name": [1, 2, 3],
    "another_old": ["a", "b", "c"]
})
# Chain operations including rename
result = (
    lf
    .rename({"old_name": "id", "another_old": "category"})
    .filter(pl.col("id") > 1)
    .select(pl.col("id"), pl.col("category").str.to_uppercase().alias("category_upper"))
    .collect()  # Execute the query
)
print(result)
Output:
shape: (2, 2)
┌─────┬────────────────┐
│ id  ┆ category_upper │
│ --- ┆ ---            │
│ i64 ┆ str            │
╞═════╪════════════════╡
│ 2   ┆ B              │
│ 3   ┆ C              │
└─────┴────────────────┘
Key considerations for lazy mode:
- Rename early in your pipeline if subsequent operations reference the new column names
- The query optimizer handles it efficiently—renaming doesn’t create intermediate DataFrames
- Use alias() in expressions within lazy pipelines for cleaner code
# This is idiomatic for lazy pipelines
result = (
    pl.scan_csv("data.csv")
    .select(
        pl.col("CustomerID").alias("customer_id"),
        pl.col("OrderTotal").alias("order_total")
    )
    .group_by("customer_id")
    .agg(pl.col("order_total").sum().alias("total_spend"))
    .collect()
)
Direct column assignment (df.columns = [...]) is available only on eager DataFrames. A LazyFrame describes a query plan rather than holding data, so a rename there has to be a step in the plan: use rename() or alias() instead.
Conclusion
Polars gives you the right tool for each column renaming scenario:
| Method | Best For | Works with LazyFrame? |
|---|---|---|
| rename({"old": "new"}) | Explicit, targeted renames | Yes |
| alias() in expressions | Renaming during transformations | Yes |
| df.columns = [...] | Bulk replacement of all names | No (eager only) |
| rename(function) | Programmatic pattern-based renaming | Yes |
For most production code, I recommend alias() within expression contexts and rename() with a callable for standardization tasks. The explicit dictionary approach works fine for one-off scripts, but programmatic renaming scales better when your column naming conventions evolve.
One final tip: establish column naming conventions early in your data pipeline. Whether you prefer snake_case, camelCase, or something else, apply the transformation immediately after data ingestion. Your future self—and anyone else reading your code—will thank you.