How to Delete a Column in Polars
Key Insights
- Polars uses an immutable design: drop() returns a new DataFrame rather than modifying in place, making your data transformations predictable and safe
- The select() method with pl.exclude() or column selectors offers a powerful alternative to drop(), especially when you want to remove columns by pattern or data type
- Always use strict=False when dropping columns that might not exist to prevent runtime errors in production pipelines
Deleting columns from a DataFrame is one of the most common data manipulation tasks. Whether you’re cleaning up temporary calculations, removing sensitive data before export, or trimming down a wide dataset, you’ll need to know how to efficiently remove columns in Polars.
Unlike pandas, which lets you mutate a DataFrame in place (for example with del df["col"] or drop(..., inplace=True)), Polars follows an immutable design philosophy. Methods such as drop() and select() return a new DataFrame rather than modifying the original. This might feel unfamiliar at first, but it eliminates an entire class of bugs related to unexpected mutations and makes your data pipelines more predictable.
Let’s explore all the ways to delete columns in Polars, from simple single-column drops to advanced pattern-based removal.
Using drop() for Single Column Deletion
The drop() method is the most straightforward way to remove a column. Pass the column name as a string, and you get back a new DataFrame without that column.
import polars as pl
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"temp_score": [0.5, 0.8, 0.3]
})
print("Original DataFrame:")
print(df)
# Drop a single column
result = df.drop("temp_score")
print("\nAfter dropping 'temp_score':")
print(result)
Output:
Original DataFrame:
shape: (3, 4)
┌─────┬─────────┬─────┬────────────┐
│ id ┆ name ┆ age ┆ temp_score │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ f64 │
╞═════╪═════════╪═════╪════════════╡
│ 1 ┆ Alice ┆ 25 ┆ 0.5 │
│ 2 ┆ Bob ┆ 30 ┆ 0.8 │
│ 3 ┆ Charlie ┆ 35 ┆ 0.3 │
└─────┴─────────┴─────┴────────────┘
After dropping 'temp_score':
shape: (3, 3)
┌─────┬─────────┬─────┐
│ id ┆ name ┆ age │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪═════════╪═════╡
│ 1 ┆ Alice ┆ 25 │
│ 2 ┆ Bob ┆ 30 │
│ 3 ┆ Charlie ┆ 35 │
└─────┴─────────┴─────┘
Notice that the original df remains unchanged. If you want to “update” your variable, you need to reassign it: df = df.drop("temp_score").
Dropping Multiple Columns
When you need to remove several columns at once, drop() accepts multiple arguments. You have two syntax options, and both work identically.
import polars as pl
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"temp_a": [0.1, 0.2, 0.3],
"temp_b": [0.4, 0.5, 0.6],
"debug_flag": [True, False, True]
})
# Option 1: Pass a list of column names
result1 = df.drop(["temp_a", "temp_b", "debug_flag"])
# Option 2: Pass column names as separate arguments
result2 = df.drop("temp_a", "temp_b", "debug_flag")
print("Using list syntax:")
print(result1)
print("\nUsing unpacked syntax:")
print(result2)
Both approaches produce the same result:
shape: (3, 3)
┌─────┬─────────┬─────┐
│ id ┆ name ┆ age │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪═════════╪═════╡
│ 1 ┆ Alice ┆ 25 │
│ 2 ┆ Bob ┆ 30 │
│ 3 ┆ Charlie ┆ 35 │
└─────┴─────────┴─────┘
I prefer the list syntax when the column names come from a variable or configuration, and the unpacked syntax when hardcoding a small number of columns. Choose whichever reads better in your specific context.
Using select() to Exclude Columns
Sometimes it’s easier to specify what you want to keep rather than what you want to remove. Other times, you want to exclude columns using expressions rather than explicit names. The select() method with pl.exclude() handles both scenarios elegantly.
import polars as pl
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"internal_score": [0.5, 0.8, 0.3],
"internal_rank": [3, 1, 2]
})
# Select all columns except specific ones
result = df.select(pl.exclude("internal_score"))
print("Excluding 'internal_score':")
print(result)
# Exclude multiple columns
result2 = df.select(pl.exclude(["internal_score", "internal_rank"]))
print("\nExcluding multiple columns:")
print(result2)
The pl.exclude() function is particularly powerful because it integrates with Polars’ expression system. You can combine it with other column selectors for complex selection logic.
import polars as pl
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"value_a": [10, 20, 30],
"value_b": [40, 50, 60]
})
# Select every column except 'id'
result = df.select(
pl.exclude("id")
)
print(result)
This approach shines when you’re building dynamic queries where the exact columns might vary between runs.
Dropping Columns by Pattern or Data Type
Real-world datasets often have naming conventions, such as prefixes like temp_ or debug_ and suffixes like _internal, that mark columns for removal. Polars’ column selectors module (polars.selectors) makes bulk removal based on patterns or data types straightforward.
import polars as pl
import polars.selectors as cs
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"temp_calculation": [0.1, 0.2, 0.3],
"temp_flag": [True, False, True],
"debug_info": ["x", "y", "z"]
})
# Drop all columns starting with "temp_"
result = df.select(~cs.matches("^temp_"))
print("After removing 'temp_' columns:")
print(result)
Output:
After removing 'temp_' columns:
shape: (3, 4)
┌─────┬─────────┬─────┬────────────┐
│ id ┆ name ┆ age ┆ debug_info │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞═════╪═════════╪═════╪════════════╡
│ 1 ┆ Alice ┆ 25 ┆ x │
│ 2 ┆ Bob ┆ 30 ┆ y │
│ 3 ┆ Charlie ┆ 35 ┆ z │
└─────┴─────────┴─────┴────────────┘
The ~ operator negates the selector, meaning “everything except what matches.” You can also drop columns by data type:
import polars as pl
import polars.selectors as cs
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"description": ["Engineer", "Designer", "Manager"],
"age": [25, 30, 35],
"salary": [50000.0, 60000.0, 70000.0]
})
# Drop all string columns
result = df.select(~cs.string())
print("After removing string columns:")
print(result)
# Drop all float columns
result2 = df.select(~cs.float())
print("\nAfter removing float columns:")
print(result2)
Output:
After removing string columns:
shape: (3, 3)
┌─────┬─────┬─────────┐
│ id ┆ age ┆ salary │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 │
╞═════╪═════╪═════════╡
│ 1 ┆ 25 ┆ 50000.0 │
│ 2 ┆ 30 ┆ 60000.0 │
│ 3 ┆ 35 ┆ 70000.0 │
└─────┴─────┴─────────┘
After removing float columns:
shape: (3, 4)
┌─────┬─────────┬─────────────┬─────┐
│ id ┆ name ┆ description ┆ age │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ i64 │
╞═════╪═════════╪═════════════╪═════╡
│ 1 ┆ Alice ┆ Engineer ┆ 25 │
│ 2 ┆ Bob ┆ Designer ┆ 30 │
│ 3 ┆ Charlie ┆ Manager ┆ 35 │
└─────┴─────────┴─────────────┴─────┘
Column selectors are composable. You can combine them with | (or), & (and), and ~ (not) operators to build precise selection logic.
Handling Non-Existent Columns
By default, drop() raises an error if you try to remove a column that doesn’t exist. In production pipelines where input schemas might vary, this can cause unexpected failures.
import polars as pl
df = pl.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"]
})
# This raises ColumnNotFoundError
try:
    result = df.drop("nonexistent_column")
except pl.exceptions.ColumnNotFoundError as e:
    print(f"Error: {e}")
# Use strict=False to silently ignore missing columns
result = df.drop("nonexistent_column", strict=False)
print("\nWith strict=False:")
print(result)
Output:
Error: unable to find column "nonexistent_column"; valid columns: ["id", "name"]
With strict=False:
shape: (3, 2)
┌─────┬─────────┐
│ id ┆ name │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════════╡
│ 1 ┆ Alice │
│ 2 ┆ Bob │
│ 3 ┆ Charlie │
└─────┴─────────┘
Use strict=False when processing data from external sources where the schema isn’t guaranteed. Keep the default strict=True during development to catch typos and logic errors early.
Lazy Frame Considerations
Everything we’ve covered works identically with LazyFrames. Polars’ lazy evaluation defers computation until you call collect(), allowing the query optimizer to potentially eliminate unnecessary work.
import polars as pl
# Create a LazyFrame
lf = pl.LazyFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"temp_data": [0.1, 0.2, 0.3]
})
# Drop column in lazy context
result = lf.drop("temp_data").collect()
print(result)
# Using select with exclude in lazy context
result2 = lf.select(pl.exclude("temp_data")).collect()
print(result2)
The query optimizer applies projection pushdown: when a column is dropped later in a chain and nothing else depends on it, Polars can avoid reading or computing that column at all. This can significantly improve performance when working with expensive transformations on columns you’ll eventually discard.
import polars as pl
lf = pl.scan_csv("large_file.csv")
# The optimizer knows 'expensive_column' isn't needed
result = (
lf
.with_columns(
pl.col("raw_data").str.extract_all(r"\d+").alias("expensive_column")
)
.drop("expensive_column", "raw_data")
.collect()
)
In this example, a naive executor would compute expensive_column and then throw it away. Polars’ optimizer recognizes this and skips the computation entirely.
Column deletion in Polars is straightforward once you understand the immutable design. Use drop() for explicit removal by name, select() with pl.exclude() for expression-based exclusion, and column selectors for pattern or type-based bulk removal. Remember strict=False for robust production code, and trust the lazy evaluation optimizer to handle the rest.