How to Convert Polars to Pandas

Key Insights

  • Use df.to_pandas() for straightforward conversion, but understand that data type mapping between Polars and Pandas can introduce subtle differences that affect downstream code.
  • Enable use_pyarrow_extension_types=True to preserve nullable types and avoid the performance penalty of converting to NumPy-backed arrays.
  • Minimize conversions by keeping your data in Polars as long as possible—only convert at integration boundaries with libraries that require Pandas.

Why Convert Between Polars and Pandas?

Polars has earned its reputation as the faster, more memory-efficient DataFrame library. But the Python data ecosystem was built on Pandas. Scikit-learn expects Pandas DataFrames. Matplotlib’s DataFrame integration assumes Pandas. Your company’s internal tools probably output Pandas objects.

You’ll need to convert between these libraries more often than you’d like. Maybe you’re migrating a legacy codebase incrementally. Maybe a third-party library only accepts Pandas input. Maybe you need a specific Pandas feature that Polars doesn’t replicate.

Whatever the reason, understanding how to convert cleanly—without losing data integrity or tanking performance—is essential knowledge for any Python developer working with DataFrames.

Basic Conversion with to_pandas()

The to_pandas() method is your primary tool. It works on both DataFrames and Series.

import polars as pl
import pandas as pd

# Create a Polars DataFrame
df_polars = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000.0, 60000.0, 75000.0],
    "active": [True, False, True]
})

# Convert to Pandas
df_pandas = df_polars.to_pandas()

print(type(df_pandas))
# <class 'pandas.core.frame.DataFrame'>

print(df_pandas)
#       name  age   salary  active
# 0    Alice   25  50000.0    True
# 1      Bob   30  60000.0   False
# 2  Charlie   35  75000.0    True

Series conversion works identically:

# Convert a single Polars Series to Pandas Series
series_polars = pl.Series("values", [1, 2, 3, 4, 5])
series_pandas = series_polars.to_pandas()

print(type(series_pandas))
# <class 'pandas.core.series.Series'>

This covers 80% of use cases. But the devil is in the data types.

Handling Data Types During Conversion

Polars and Pandas represent data differently under the hood. Polars uses the Apache Arrow memory format. Pandas traditionally uses NumPy arrays, though recent versions support Arrow-backed types.

This matters when you have nullable integers or strings with null values:

# DataFrame with null values
df_polars = pl.DataFrame({
    "id": [1, 2, None, 4],
    "name": ["Alice", None, "Charlie", "David"],
    "score": [95.5, 87.0, None, 92.0]
})

# Default conversion (NumPy-backed)
df_pandas_numpy = df_polars.to_pandas()
print(df_pandas_numpy.dtypes)
# id       float64  <- Integer became float to handle NaN!
# name      object
# score    float64

# PyArrow-backed conversion
df_pandas_arrow = df_polars.to_pandas(use_pyarrow_extension_types=True)
print(df_pandas_arrow.dtypes)
# id       int64[pyarrow]    <- Nullable integer preserved
# name     string[pyarrow]
# score    double[pyarrow]

The default NumPy-backed conversion converts nullable integers to floats because NumPy’s integer types can’t represent NaN. This is a classic Pandas gotcha that trips up developers constantly.

Using use_pyarrow_extension_types=True preserves the original semantics. The tradeoff? Some older Pandas code might not handle PyArrow-backed types correctly. Test your downstream code before committing to this approach.

Here’s a more comprehensive type mapping example:

import polars as pl
from datetime import date, datetime, timedelta

df_polars = pl.DataFrame({
    "int_col": pl.Series([1, 2, 3], dtype=pl.Int32),
    "uint_col": pl.Series([1, 2, 3], dtype=pl.UInt64),
    "float_col": pl.Series([1.0, 2.0, 3.0], dtype=pl.Float64),
    "bool_col": [True, False, True],
    "str_col": ["a", "b", "c"],
    "date_col": [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)],
    "datetime_col": [datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 2, 12, 0), datetime(2024, 1, 3, 12, 0)],
    "duration_col": [timedelta(days=1), timedelta(days=2), timedelta(days=3)],
})

df_pandas = df_polars.to_pandas()
print(df_pandas.dtypes)
# int_col                  int32
# uint_col                uint64
# float_col              float64
# bool_col                  bool
# str_col                 object
# date_col                object  <- Note: becomes Python date objects
# datetime_col    datetime64[us]
# duration_col   timedelta64[us]

Notice that date_col becomes object dtype containing Python date objects, not a proper datetime type. If you need datetime operations in Pandas, convert dates to datetimes in Polars first:

df_polars = df_polars.with_columns(
    pl.col("date_col").cast(pl.Datetime).alias("date_as_datetime")
)

Converting LazyFrames

Polars LazyFrames don’t exist in memory until you collect them. You can’t convert something that doesn’t exist yet.

# Create a LazyFrame
lf = pl.scan_csv("large_file.csv")

# This won't work
# lf.to_pandas()  # AttributeError: 'LazyFrame' object has no attribute 'to_pandas'

# Collect first, then convert
df_pandas = lf.collect().to_pandas()

# Or chain it
df_pandas = (
    pl.scan_csv("large_file.csv")
    .filter(pl.col("value") > 100)
    .select(["id", "value", "category"])
    .collect()
    .to_pandas()
)

The performance consideration here is significant. If you’re working with a LazyFrame, you’ve probably done so to leverage Polars’ query optimization. Collecting triggers execution of the entire query plan. Converting to Pandas then copies that data into a different memory format.

Do your filtering, aggregation, and transformation in Polars before collecting. Don’t collect a 10GB dataset just to filter it in Pandas.

Memory and Performance Considerations

Conversion isn’t free. Let’s measure it:

import polars as pl
import time

# Create a reasonably large DataFrame
n_rows = 5_000_000
df_polars = pl.DataFrame({
    "id": range(n_rows),
    "value": [float(i) * 1.5 for i in range(n_rows)],
    "category": ["cat_" + str(i % 100) for i in range(n_rows)],
})

# Benchmark default conversion
start = time.perf_counter()
df_pandas_default = df_polars.to_pandas()
default_time = time.perf_counter() - start
print(f"Default conversion: {default_time:.3f}s")

# Benchmark PyArrow conversion
start = time.perf_counter()
df_pandas_arrow = df_polars.to_pandas(use_pyarrow_extension_types=True)
arrow_time = time.perf_counter() - start
print(f"PyArrow conversion: {arrow_time:.3f}s")

# Typical output:
# Default conversion: 0.847s
# PyArrow conversion: 0.023s

The PyArrow-backed conversion is dramatically faster because it can often perform zero-copy operations. The data is already in Arrow format in Polars, and PyArrow-backed Pandas can use that same memory directly.

The default NumPy conversion requires copying and potentially transforming every value. For large datasets, this difference matters.

Memory usage follows the same pattern. NumPy-backed DataFrames often use more memory due to how they handle strings and nullable types.

Converting Back: Pandas to Polars

The reverse operation uses pl.from_pandas():

import pandas as pd
import polars as pl

# Create a Pandas DataFrame
df_pandas = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [95.5, 87.0, 92.0]
})

# Convert to Polars
df_polars = pl.from_pandas(df_pandas)

print(type(df_polars))
# <class 'polars.dataframe.frame.DataFrame'>

# Round-trip verification
df_original = pl.DataFrame({
    "x": [1, 2, 3],
    "y": [4.0, 5.0, 6.0]
})

df_roundtrip = pl.from_pandas(df_original.to_pandas())

print(df_original.equals(df_roundtrip))
# True (for simple types)

For Series:

series_pandas = pd.Series([1, 2, 3, 4, 5], name="values")
series_polars = pl.from_pandas(series_pandas)

Be aware that round-trip conversion might not preserve all type information perfectly, especially for complex nested types or categorical data with specific orderings.

Best Practices and When to Convert

Minimize conversion boundaries. Every conversion costs time and risks subtle data type changes. Structure your code to keep data in one format as long as possible.

# Bad: Converting back and forth
def process_data(df_polars):
    df_pandas = df_polars.to_pandas()
    df_pandas["new_col"] = df_pandas["value"] * 2  # Could do this in Polars
    df_polars = pl.from_pandas(df_pandas)
    df_pandas = df_polars.to_pandas()
    result = df_pandas.groupby("category").mean()  # Could do this in Polars
    return pl.from_pandas(result)

# Good: Stay in Polars, convert only at the boundary
def process_data(df_polars):
    return (
        df_polars
        .with_columns((pl.col("value") * 2).alias("new_col"))
        .group_by("category")
        .mean()
    )

Convert at integration points. When you need to pass data to scikit-learn, matplotlib, or other Pandas-native libraries, convert right before the call:

from sklearn.linear_model import LinearRegression

# Keep everything in Polars until sklearn needs it
df = (
    pl.scan_parquet("training_data.parquet")
    .filter(pl.col("valid"))
    .select(["feature1", "feature2", "feature3", "target"])
    .collect()
)

# Convert only for sklearn
X = df.select(["feature1", "feature2", "feature3"]).to_pandas()
y = df.select("target").to_pandas()

model = LinearRegression()
model.fit(X, y)

Use PyArrow types for better compatibility. If your Pandas code can handle them, PyArrow-backed types give you faster conversion and better null handling:

# Set this as your default pattern
df_pandas = df_polars.to_pandas(use_pyarrow_extension_types=True)

Consider whether you need Pandas at all. Many libraries are adding Polars support. Check if your dependency has native Polars integration before defaulting to conversion. Plotly, Altair, and several other visualization libraries now accept Polars DataFrames directly.

The goal isn’t to avoid Pandas entirely—it’s to convert intentionally, understanding the costs and tradeoffs involved.
