How to Add a New Column in Polars

Key Insights

  • Polars DataFrames are immutable, so with_columns() returns a new DataFrame rather than modifying in place—this design enables powerful optimizations and predictable behavior
  • The expression API (pl.col(), pl.lit(), pl.when()) is the idiomatic way to create columns, offering both readability and performance over row-wise operations
  • Lazy mode lets Polars plan several with_columns() steps together with the surrounding filters and selections before anything executes, which can noticeably reduce the work done on large datasets

Why Adding Columns in Polars Differs from Pandas

If you’re coming from pandas, your first instinct might be to write df['new_col'] = value. That won’t work in Polars. The library takes an immutable approach to DataFrames—every transformation returns a new DataFrame rather than modifying the original. This isn’t a limitation; it’s a deliberate design choice that enables Polars to parallelize operations and optimize query plans.

The with_columns() method is your primary tool for adding columns. Once you internalize this pattern, you’ll find it more expressive and less error-prone than pandas’ assignment syntax.

Using with_columns() for Basic Column Addition

The with_columns() method accepts one or more expressions and returns a new DataFrame with those columns added. Let’s start with the simplest cases.

import polars as pl

# Create a sample DataFrame
df = pl.DataFrame({
    "product": ["Widget", "Gadget", "Gizmo"],
    "price": [10.00, 25.50, 15.75],
    "quantity": [100, 50, 75]
})

# Add a constant column
df_with_status = df.with_columns(
    pl.lit("active").alias("status")
)

print(df_with_status)

Output:

shape: (3, 4)
┌─────────┬───────┬──────────┬────────┐
│ product ┆ price ┆ quantity ┆ status │
│ ---     ┆ ---   ┆ ---      ┆ ---    │
│ str     ┆ f64   ┆ i64      ┆ str    │
╞═════════╪═══════╪══════════╪════════╡
│ Widget  ┆ 10.0  ┆ 100      ┆ active │
│ Gadget  ┆ 25.5  ┆ 50       ┆ active │
│ Gizmo   ┆ 15.75 ┆ 75       ┆ active │
└─────────┴───────┴──────────┴────────┘

The pl.lit() function creates a literal value expression, and alias() names the resulting column. For calculated columns based on existing data, reference columns with pl.col():

# Add a calculated column
df_with_total = df.with_columns(
    (pl.col("price") * pl.col("quantity")).alias("total_value")
)

print(df_with_total)

Output:

shape: (3, 4)
┌─────────┬───────┬──────────┬─────────────┐
│ product ┆ price ┆ quantity ┆ total_value │
│ ---     ┆ ---   ┆ ---      ┆ ---         │
│ str     ┆ f64   ┆ i64      ┆ f64         │
╞═════════╪═══════╪══════════╪═════════════╡
│ Widget  ┆ 10.0  ┆ 100      ┆ 1000.0      │
│ Gadget  ┆ 25.5  ┆ 50       ┆ 1275.0      │
│ Gizmo   ┆ 15.75 ┆ 75       ┆ 1181.25     │
└─────────┴───────┴──────────┴─────────────┘

Creating Columns with Expressions

Polars expressions are where the library really shines. They’re composable, optimizable, and far more powerful than simple arithmetic.

Conditional Logic with when().then().otherwise()

This is Polars’ equivalent of SQL’s CASE WHEN or numpy’s where():

df = pl.DataFrame({
    "product": ["Widget", "Gadget", "Gizmo", "Thingamajig"],
    "price": [10.00, 25.50, 15.75, 99.99],
    "quantity": [100, 50, 75, 10]
})

# Create a tier column based on price
df_with_tier = df.with_columns(
    pl.when(pl.col("price") < 15)
    .then(pl.lit("budget"))
    .when(pl.col("price") < 50)
    .then(pl.lit("standard"))
    .otherwise(pl.lit("premium"))
    .alias("price_tier")
)

print(df_with_tier)

Output:

shape: (4, 4)
┌─────────────┬───────┬──────────┬────────────┐
│ product     ┆ price ┆ quantity ┆ price_tier │
│ ---         ┆ ---   ┆ ---      ┆ ---        │
│ str         ┆ f64   ┆ i64      ┆ str        │
╞═════════════╪═══════╪══════════╪════════════╡
│ Widget      ┆ 10.0  ┆ 100      ┆ budget     │
│ Gadget      ┆ 25.5  ┆ 50       ┆ standard   │
│ Gizmo       ┆ 15.75 ┆ 75       ┆ standard   │
│ Thingamajig ┆ 99.99 ┆ 10       ┆ premium    │
└─────────────┴───────┴──────────┴────────────┘

String Manipulation

Polars provides a rich set of string operations through the .str namespace:

df = pl.DataFrame({
    "email": ["alice@example.com", "bob@company.org", "charlie@startup.io"]
})

# Extract domain from email
df_with_domain = df.with_columns(
    pl.col("email").str.split("@").list.last().alias("domain"),
    pl.col("email").str.split("@").list.first().alias("username"),
    pl.col("email").str.to_uppercase().alias("email_upper")
)

print(df_with_domain)

Output:

shape: (3, 4)
┌────────────────────┬─────────────┬──────────┬────────────────────┐
│ email              ┆ domain      ┆ username ┆ email_upper        │
│ ---                ┆ ---         ┆ ---      ┆ ---                │
│ str                ┆ str         ┆ str      ┆ str                │
╞════════════════════╪═════════════╪══════════╪════════════════════╡
│ alice@example.com  ┆ example.com ┆ alice    ┆ ALICE@EXAMPLE.COM  │
│ bob@company.org    ┆ company.org ┆ bob      ┆ BOB@COMPANY.ORG    │
│ charlie@startup.io ┆ startup.io  ┆ charlie  ┆ CHARLIE@STARTUP.IO │
└────────────────────┴─────────────┴──────────┴────────────────────┘

Adding Multiple Columns at Once

One of Polars’ strengths is efficiently handling multiple operations. Pass multiple expressions to a single with_columns() call:

df = pl.DataFrame({
    "product": ["Widget", "Gadget", "Gizmo"],
    "price": [10.00, 25.50, 15.75],
    "quantity": [100, 50, 75],
    "cost": [6.00, 15.00, 9.50]
})

# Add multiple calculated columns at once
df_enriched = df.with_columns(
    (pl.col("price") * pl.col("quantity")).alias("revenue"),
    (pl.col("cost") * pl.col("quantity")).alias("total_cost"),
    (pl.col("price") - pl.col("cost")).alias("margin_per_unit"),
    ((pl.col("price") - pl.col("cost")) / pl.col("price") * 100).alias("margin_pct")
)

print(df_enriched)

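This is generally more efficient than chaining multiple with_columns() calls: the expressions in a single call are independent of one another, so Polars can evaluate them in parallel, whereas chained eager calls run one after another.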

For dynamic column creation, use list comprehensions:

# Create percentage-of-total columns for several numeric fields
numeric_cols = ["price", "cost"]

df_with_pcts = df.with_columns([
    (pl.col(col) / pl.col(col).sum() * 100).alias(f"{col}_pct_of_total")
    for col in numeric_cols
])

print(df_with_pcts)

Adding Columns in Lazy Mode

For large datasets, lazy evaluation is usually worth the switch. Instead of executing each operation immediately, Polars builds a query plan and optimizes it before execution:

# Simulate a larger dataset
df_large = pl.DataFrame({
    "id": range(1_000_000),
    "value": [i * 0.5 for i in range(1_000_000)],
    "category": ["A", "B", "C"] * 333333 + ["A"]
})

# Lazy evaluation with multiple column additions
result = (
    df_large
    .lazy()
    .with_columns(
        (pl.col("value") * 2).alias("doubled"),
        (pl.col("value").log()).alias("log_value"),
        pl.when(pl.col("category") == "A")
        .then(pl.col("value") * 1.1)
        .otherwise(pl.col("value"))
        .alias("adjusted_value")
    )
    .filter(pl.col("doubled") > 100)
    .collect()
)

The .lazy() call converts the DataFrame to a LazyFrame, and .collect() triggers execution. Between these calls, Polars optimizes the query: it can move filters that do not depend on the new columns ahead of column creation, run independent expressions in parallel, and drop columns that are never used.

You can inspect the query plan with .explain():

query = (
    df_large
    .lazy()
    .with_columns((pl.col("value") * 2).alias("doubled"))
    .filter(pl.col("doubled") > 100)
)

print(query.explain())

Common Patterns and Use Cases

Date Component Extraction

Working with dates often requires extracting components:

df = pl.DataFrame({
    "order_date": ["2024-01-15", "2024-03-22", "2024-12-01"]
}).with_columns(pl.col("order_date").str.to_date())

df_with_date_parts = df.with_columns(
    pl.col("order_date").dt.year().alias("year"),
    pl.col("order_date").dt.month().alias("month"),
    pl.col("order_date").dt.weekday().alias("day_of_week"),  # ISO weekday: 1 = Monday, 7 = Sunday
    pl.col("order_date").dt.quarter().alias("quarter")
)

print(df_with_date_parts)

Normalization and Scaling

Min-max normalization and z-score standardization are common preprocessing steps:

df = pl.DataFrame({
    "feature": [10, 20, 30, 40, 50]
})

df_normalized = df.with_columns(
    # Min-max normalization
    ((pl.col("feature") - pl.col("feature").min()) / 
     (pl.col("feature").max() - pl.col("feature").min())).alias("normalized"),
    
    # Z-score standardization (std() defaults to the sample standard deviation, ddof=1)
    ((pl.col("feature") - pl.col("feature").mean()) / 
     pl.col("feature").std()).alias("standardized")
)

print(df_normalized)

Flag Columns from Conditions

Comparison expressions evaluate to boolean columns, which makes them a natural fit for flags used in filtering or analysis:

df = pl.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "total_purchases": [150, 500, 75, 1200, 300],
    "account_age_days": [365, 30, 180, 720, 90]
})

df_with_flags = df.with_columns(
    (pl.col("total_purchases") >= 500).alias("is_high_value"),
    (pl.col("account_age_days") >= 365).alias("is_established"),
    ((pl.col("total_purchases") >= 500) & 
     (pl.col("account_age_days") >= 365)).alias("is_vip")
)

print(df_with_flags)

Conclusion

Adding columns in Polars centers on the with_columns() method combined with the expression API. The key patterns to remember:

  • Use pl.lit() for constant values and pl.col() for referencing existing columns
  • Chain conditional logic with when().then().otherwise()
  • Add multiple columns in a single with_columns() call for better performance
  • Switch to lazy mode for large datasets to benefit from query optimization

The immutable design might feel unfamiliar at first, but it enables Polars’ impressive performance characteristics. Once you embrace expressions over imperative operations, you’ll write cleaner, faster data transformations.

For more details, the Polars user guide on expressions is an excellent resource.
