How to Use Struct Types in Polars

Key Insights

  • Struct types in Polars let you group multiple fields into a single column, enabling cleaner data organization and powerful aggregation patterns that return multiple values simultaneously.
  • The pl.struct() expression creates structs from existing columns, while .struct.field() and unnest() provide flexible extraction back to individual columns when needed.
  • GroupBy operations naturally produce struct columns when you aggregate multiple values, making structs essential for advanced analytical workflows.

Introduction to Struct Types

Polars struct types solve a common problem: how do you keep related data together without spreading it across multiple columns? A struct is a composite type that groups multiple named fields into a single column value. Think of it as embedding a dictionary or named tuple inside each row of your DataFrame.

If you’ve worked with JSON data, nested database records, or needed to return multiple values from a single operation, structs are your answer. They’re not just a convenience feature—they’re fundamental to how Polars handles complex aggregations and nested data structures.

Unlike Python dictionaries, Polars structs are strongly typed. Each field has a defined name and data type, enforced across all rows. This gives you the performance benefits of columnar storage while maintaining logical groupings of related data.

Creating Struct Columns

The primary way to create a struct is with pl.struct(), which combines existing columns into a single struct column.

import polars as pl

# Create a DataFrame with separate columns
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [28, 35, 42],
    "city": ["Boston", "Denver", "Seattle"]
})

# Combine columns into a struct
df_with_struct = df.with_columns(
    pl.struct(["name", "age", "city"]).alias("person")
)

print(df_with_struct)

Output:

shape: (3, 4)
┌─────────┬─────┬─────────┬─────────────────────────────────┐
│ name    ┆ age ┆ city    ┆ person                          │
│ ---     ┆ --- ┆ ---     ┆ ---                             │
│ str     ┆ i64 ┆ str     ┆ struct[3]                       │
╞═════════╪═════╪═════════╪═════════════════════════════════╡
│ Alice   ┆ 28  ┆ Boston  ┆ {"Alice",28,"Boston"}           │
│ Bob     ┆ 35  ┆ Denver  ┆ {"Bob",35,"Denver"}             │
│ Charlie ┆ 42  ┆ Seattle ┆ {"Charlie",42,"Seattle"}        │
└─────────┴─────┴─────────┴─────────────────────────────────┘

You can also create structs inline during DataFrame construction using dictionaries:

# Create DataFrame with struct column directly
df = pl.DataFrame({
    "id": [1, 2, 3],
    "metadata": [
        {"source": "web", "priority": 1},
        {"source": "api", "priority": 2},
        {"source": "web", "priority": 1},
    ]
})

print(df)
print(df.schema)

The schema shows the struct’s internal structure: {'id': Int64, 'metadata': Struct({'source': String, 'priority': Int64})}.

Accessing Struct Fields

Once you have a struct column, you’ll need to extract individual fields. The .struct.field() method handles this cleanly.

df = pl.DataFrame({
    "id": [1, 2, 3],
    "person": [
        {"name": "Alice", "age": 28, "city": "Boston"},
        {"name": "Bob", "age": 35, "city": "Denver"},
        {"name": "Charlie", "age": 42, "city": "Seattle"},
    ]
})

# Extract a single field
names = df.select(pl.col("person").struct.field("name"))
print(names)

# Extract and rename in one step
df_extracted = df.select(
    pl.col("id"),
    pl.col("person").struct.field("name").alias("person_name"),
    pl.col("person").struct.field("age").alias("person_age"),
)
print(df_extracted)

To extract several fields at once, call struct.field() once per field, composing each call with whatever transformation you need:

# Extract multiple fields with different transformations
df_transformed = df.select(
    pl.col("id"),
    pl.col("person").struct.field("name").str.to_uppercase().alias("NAME"),
    (pl.col("person").struct.field("age") * 12).alias("age_in_months"),
)
print(df_transformed)

This approach keeps your code expressive—you’re clearly stating which fields you need and what you’re doing with them.

Unnesting Structs

When you want to expand a struct column back into separate columns, use unnest(). This operation is the inverse of pl.struct().

df = pl.DataFrame({
    "id": [1, 2, 3],
    "person": [
        {"name": "Alice", "age": 28, "city": "Boston"},
        {"name": "Bob", "age": 35, "city": "Denver"},
        {"name": "Charlie", "age": 42, "city": "Seattle"},
    ]
})

# Unnest the struct column into separate columns
df_flat = df.unnest("person")
print(df_flat)

Output:

shape: (3, 4)
┌─────┬─────────┬─────┬─────────┐
│ id  ┆ name    ┆ age ┆ city    │
│ --- ┆ ---     ┆ --- ┆ ---     │
│ i64 ┆ str     ┆ i64 ┆ str     │
╞═════╪═════════╪═════╪═════════╡
│ 1   ┆ Alice   ┆ 28  ┆ Boston  │
│ 2   ┆ Bob     ┆ 35  ┆ Denver  │
│ 3   ┆ Charlie ┆ 42  ┆ Seattle │
└─────┴─────────┴─────┴─────────┘

The unnest() method works at the DataFrame level. If you need to unnest within an expression context (like inside a select() or with_columns()), extract fields individually using .struct.field() instead.

You can unnest multiple struct columns simultaneously:

# Rename the nested field first: unnesting a field named "id" would
# collide with the existing "id" column
df_multi = df.with_columns(
    pl.struct(pl.col("id").alias("record_id")).alias("ids_struct")
)
df_fully_flat = df_multi.unnest("person", "ids_struct")

Structs in GroupBy Operations

Here’s where structs become genuinely powerful. When you need multiple aggregation results per group, structs let you bundle them together elegantly.

df = pl.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "value": [10, 20, 30, 40, 50],
    "quantity": [1, 2, 3, 4, 5],
})

# Return multiple aggregations as a struct
result = df.group_by("category").agg(
    pl.struct(
        pl.col("value").min().alias("min_value"),
        pl.col("value").max().alias("max_value"),
        pl.col("value").mean().alias("mean_value"),
        pl.col("quantity").sum().alias("total_quantity"),
    ).alias("stats")
)

print(result)

Output:

shape: (2, 2)
┌──────────┬──────────────────────────────┐
│ category ┆ stats                        │
│ ---      ┆ ---                          │
│ str      ┆ struct[4]                    │
╞══════════╪══════════════════════════════╡
│ A        ┆ {10,20,15.0,3}               │
│ B        ┆ {30,50,40.0,12}              │
└──────────┴──────────────────────────────┘

You can then unnest this to get a flat result:

result_flat = result.unnest("stats")
print(result_flat)

This pattern is cleaner than creating multiple separate aggregation columns when the values are logically related.

Filtering and Transforming Structs

Filtering based on struct field values requires accessing the field first, then applying your condition:

df = pl.DataFrame({
    "id": [1, 2, 3, 4],
    "config": [
        {"enabled": True, "threshold": 0.5},
        {"enabled": False, "threshold": 0.8},
        {"enabled": True, "threshold": 0.3},
        {"enabled": True, "threshold": 0.9},
    ]
})

# Filter rows where config.enabled is True and threshold > 0.4
filtered = df.filter(
    pl.col("config").struct.field("enabled") & 
    (pl.col("config").struct.field("threshold") > 0.4)
)
print(filtered)

For transformations, you can modify individual fields while preserving the struct:

# Update a field within the struct
df_updated = df.with_columns(
    pl.struct(
        pl.col("config").struct.field("enabled"),
        (pl.col("config").struct.field("threshold") * 100).alias("threshold"),
    ).alias("config")
)
print(df_updated)

For complex transformations that don’t fit neatly into expressions, use map_elements():

def transform_config(config: dict) -> dict:
    return {
        "enabled": config["enabled"],
        "threshold": config["threshold"],
        "category": "high" if config["threshold"] > 0.5 else "low"
    }

df_transformed = df.with_columns(
    pl.col("config").map_elements(
        transform_config,
        return_dtype=pl.Struct({
            "enabled": pl.Boolean,
            "threshold": pl.Float64,
            "category": pl.String,
        })
    )
)
print(df_transformed)

Use map_elements() sparingly—it bypasses Polars’ optimization engine. Prefer native expressions when possible.

Practical Use Cases

Parsing JSON into Structs

When you read JSON data, Polars often represents nested objects as structs automatically. You can also parse JSON strings explicitly:

df = pl.DataFrame({
    "id": [1, 2, 3],
    "json_data": [
        '{"user": "alice", "score": 95}',
        '{"user": "bob", "score": 87}',
        '{"user": "charlie", "score": 92}',
    ]
})

# Parse JSON strings into structs
df_parsed = df.with_columns(
    pl.col("json_data").str.json_decode(
        dtype=pl.Struct({"user": pl.String, "score": pl.Int64})
    ).alias("parsed")
)

# Extract fields from the parsed struct
df_final = df_parsed.select(
    pl.col("id"),
    pl.col("parsed").struct.field("user"),
    pl.col("parsed").struct.field("score"),
)
print(df_final)

Returning Multiple Values from Custom Functions

When you need a function to return multiple computed values, structs keep them organized:

import statistics

def compute_stats(values) -> dict:
    # In a group_by context, map_elements passes each group's values
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "count": len(values),
    }

df = pl.DataFrame({
    "group": ["A", "A", "A", "B", "B"],
    "value": [10.0, 12.0, 11.0, 20.0, 25.0],
})

result = df.group_by("group").agg(
    pl.col("value").map_elements(
        compute_stats,
        return_dtype=pl.Struct({
            "mean": pl.Float64,
            "stdev": pl.Float64,
            "count": pl.Int64,
        }),
        returns_scalar=True,  # one struct per group, not a list of structs
    ).alias("statistics")
).unnest("statistics")

print(result)

Structs in Polars aren’t just a data type—they’re a design pattern for organizing complex data. Use them when fields belong together logically, when aggregations need to return multiple values, or when working with naturally nested data like JSON. The key is recognizing that a struct column is still a column: you can filter it, transform it, and aggregate it just like any other, while keeping related data bundled together.
