How to Write to CSV in Polars
Key Insights
- Polars provides write_csv() for eager DataFrames and sink_csv() for memory-efficient streaming writes with LazyFrames—choose based on your data size
- Customizing CSV output through parameters like separator, null_value, and datetime_format gives you precise control over the output format without post-processing
- Writing directly to cloud storage (S3, GCS) requires only the fsspec library and the appropriate cloud SDK—no code changes needed beyond the path
Why CSV Export in Polars Matters
Polars has rapidly become the go-to DataFrame library for Python developers who need speed. Built in Rust with a lazy evaluation engine, it consistently outperforms pandas by 10-100x on common operations. But raw speed means nothing if you can’t get your data out efficiently.
CSV remains the lingua franca of data exchange. Whether you’re feeding data to a legacy system, sharing results with non-technical stakeholders, or creating portable backups, you’ll need to write CSVs. Polars makes this straightforward while maintaining its performance edge.
Let’s dive into the practical details of writing CSV files with Polars.
Basic CSV Writing with write_csv()
The write_csv() method handles most CSV export needs. It’s available on any eager DataFrame and works exactly as you’d expect.
import polars as pl
# Create a sample DataFrame
df = pl.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "score": [85.5, 92.3, 78.1, 95.7, 88.4],
    "active": [True, True, False, True, False]
})
# Write to CSV - that's it
df.write_csv("users.csv")
This produces a standard CSV with headers:
user_id,name,score,active
1,Alice,85.5,true
2,Bob,92.3,true
3,Charlie,78.1,false
4,Diana,95.7,true
5,Eve,88.4,false
Notice that Polars writes boolean values as lowercase true/false. This differs from pandas’ True/False output, which can matter for downstream systems expecting specific formats.
The method returns None and writes directly to disk. For small to medium datasets (up to a few million rows), this approach works perfectly.
Customizing CSV Output
Real-world CSV requirements rarely match defaults. European systems often expect semicolons as delimiters. Legacy databases might need specific null representations. Polars handles all of this through write_csv() parameters.
import polars as pl
from datetime import datetime, date
df = pl.DataFrame({
    "product_id": [101, 102, 103, 104],
    "product_name": ["Widget A", "Gadget B", None, "Tool D"],
    "price": [29.99, None, 15.50, 42.00],
    "created_at": [
        datetime(2024, 1, 15, 10, 30),
        datetime(2024, 2, 20, 14, 45),
        datetime(2024, 3, 10, 9, 0),
        datetime(2024, 4, 5, 16, 15)
    ],
    "release_date": [
        date(2024, 2, 1),
        date(2024, 3, 15),
        date(2024, 4, 1),
        date(2024, 5, 10)
    ]
})
# European-style CSV with custom null handling
df.write_csv(
    "products_eu.csv",
    separator=";",
    null_value="N/A",
    datetime_format="%d/%m/%Y %H:%M",
    date_format="%d.%m.%Y",
    include_header=True,
    quote_style="necessary"  # Only quote when needed
)
Output:
product_id;product_name;price;created_at;release_date
101;Widget A;29.99;15/01/2024 10:30;01.02.2024
102;Gadget B;N/A;20/02/2024 14:45;15.03.2024
103;N/A;15.5;10/03/2024 09:00;01.04.2024
104;Tool D;42.0;05/04/2024 16:15;10.05.2024
Here’s a breakdown of the key parameters:
| Parameter | Purpose | Default |
|---|---|---|
| separator | Field delimiter | , |
| null_value | String representation of nulls | empty string |
| datetime_format | Format string for datetime columns | ISO 8601 |
| date_format | Format string for date columns | ISO 8601 |
| include_header | Write column names as first row | True |
| quote_style | When to quote fields | "necessary" |
| quote_char | Character for quoting | " |
For headerless output (common when appending to existing files):
df.write_csv("products_no_header.csv", include_header=False)
Writing to Different Destinations
Sometimes you don’t want a file on disk. You might need to return CSV data from an API, store it in a database blob, or write directly to cloud storage.
Writing to a Buffer
For in-memory operations, write to a StringIO or BytesIO buffer:
import polars as pl
from io import StringIO, BytesIO
df = pl.DataFrame({
    "id": [1, 2, 3],
    "value": ["a", "b", "c"]
})
# Write to StringIO for string operations
buffer = StringIO()
df.write_csv(buffer)
csv_string = buffer.getvalue()
print(csv_string)
# Write to BytesIO for binary operations (useful for HTTP responses)
byte_buffer = BytesIO()
df.write_csv(byte_buffer)
csv_bytes = byte_buffer.getvalue()
This pattern works well in web frameworks when you need to return CSV data:
from io import BytesIO

from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/export")
def export_data():
    df = get_data_as_dataframe()  # your data-loading function
    buffer = BytesIO()
    df.write_csv(buffer)
    return Response(
        content=buffer.getvalue(),
        media_type="text/csv",
        headers={"Content-Disposition": "attachment; filename=export.csv"}
    )
Writing to Cloud Storage
Polars integrates with fsspec for cloud storage. Install the required backends:
pip install s3fs # For AWS S3
pip install gcsfs # For Google Cloud Storage
pip install adlfs # For Azure Blob Storage
Then write using cloud URIs:
import polars as pl
df = pl.DataFrame({
    "transaction_id": [1001, 1002, 1003],
    "amount": [150.00, 275.50, 89.99]
})
# Write to S3
df.write_csv("s3://my-bucket/exports/transactions.csv")
# Write to GCS
df.write_csv("gs://my-bucket/exports/transactions.csv")
# Write to Azure Blob Storage
df.write_csv("abfs://container@account.dfs.core.windows.net/transactions.csv")
Authentication uses your environment’s default credentials (AWS credentials file, GOOGLE_APPLICATION_CREDENTIALS, Azure CLI login). For explicit credentials, use storage options:
df.write_csv(
    "s3://my-bucket/data.csv",
    storage_options={
        "key": "your-access-key",
        "secret": "your-secret-key",
        "endpoint_url": "https://custom-endpoint.com"  # For MinIO, etc.
    }
)
Handling Large Datasets with LazyFrames
Here’s where Polars truly shines. For datasets that push memory limits, sink_csv() streams data directly to disk without loading everything into memory.
The standard approach loads all data before writing:
# Memory-intensive: loads entire result into RAM
(
    pl.scan_parquet("huge_dataset/*.parquet")
    .filter(pl.col("status") == "active")
    .select(["id", "name", "value"])
    .collect()  # Everything loads here
    .write_csv("filtered_output.csv")
)
With sink_csv(), Polars streams results directly:
# Memory-efficient: streams data to disk
(
    pl.scan_parquet("huge_dataset/*.parquet")
    .filter(pl.col("status") == "active")
    .select(["id", "name", "value"])
    .sink_csv("filtered_output.csv")  # Streams without full materialization
)
The difference matters enormously at scale. I’ve processed 50GB datasets on machines with 8GB RAM using sink_csv(). The same operation with collect().write_csv() would crash.
sink_csv() supports most of the same parameters as write_csv():
(
    pl.scan_csv("raw_data/*.csv")
    .filter(pl.col("amount") > 100)
    .with_columns(
        pl.col("timestamp").str.to_datetime()
    )
    .sink_csv(
        "processed_output.csv",
        separator="\t",
        null_value="NULL",
        datetime_format="%Y-%m-%d %H:%M:%S"
    )
)
One caveat: sink_csv() doesn’t support all query operations. Operations requiring global state (like sort() on the full dataset or certain window functions) force Polars to materialize intermediate results. The engine handles this automatically, but peak memory usage may increase.
Common Gotchas and Best Practices
Float Precision
Polars writes floats with full precision by default. This can create unexpectedly large files:
df = pl.DataFrame({"value": [1/3, 2/3, 1/7]})
df.write_csv("precise.csv")
# Output (one row per value):
# value
# 0.3333333333333333
# 0.6666666666666666
# 0.14285714285714285
Round before export if you need cleaner output:
df.with_columns(pl.col("value").round(4)).write_csv("rounded.csv")
# Output (one row per value): 0.3333, 0.6667, 0.1429
Datetime Handling
Always specify datetime_format when interoperability matters:
from datetime import datetime
df = pl.DataFrame({
    "event_time": [datetime(2024, 6, 15, 14, 30, 45, 123456)]
})
# Default ISO format with microseconds
df.write_csv("default.csv")
# Output: 2024-06-15T14:30:45.123456
# MySQL-compatible format
df.write_csv("mysql.csv", datetime_format="%Y-%m-%d %H:%M:%S")
# Output: 2024-06-15 14:30:45
Special Characters and Encoding
Polars writes UTF-8 by default. For systems requiring different encodings, write to a buffer and encode:
buffer = BytesIO()
df.write_csv(buffer)
csv_content = buffer.getvalue().decode('utf-8').encode('latin-1')
with open("latin1_output.csv", "wb") as f:
    f.write(csv_content)
For fields containing delimiters or quotes, Polars automatically handles quoting:
df = pl.DataFrame({
    "description": ['Contains, comma', 'Has "quotes"', 'Normal text']
})
df.write_csv("special_chars.csv")
# Output quotes fields containing delimiters and doubles embedded quotes:
# description
# "Contains, comma"
# "Has ""quotes"""
# Normal text
Performance Tips
- Avoid unnecessary collects: Chain operations in lazy mode and call sink_csv() once
- Write partitioned files: For very large outputs, write multiple smaller files
- Use appropriate compression: Polars doesn’t compress CSV directly, but you can write to .csv.gz with some additional handling
Conclusion
Polars gives you two primary methods for CSV export: write_csv() for eager DataFrames and sink_csv() for streaming LazyFrame results.
Use write_csv() when your data fits comfortably in memory and you need maximum parameter flexibility. Use sink_csv() when processing large datasets where memory efficiency matters more than convenience.
Both methods support custom delimiters, null handling, datetime formatting, and cloud storage destinations. The API is clean and predictable—once you learn the parameters for one method, they transfer directly to the other.
For most workflows, start with write_csv(). When you hit memory limits or need to process files larger than RAM, switch to a lazy pipeline with sink_csv(). That’s the Polars way: start simple, scale when needed.