Parquet

Jan 23, 2026 Engineering

Spark Scala - Read Parquet File

Apache Parquet has become the de facto standard for storing analytical data in big data ecosystems. As a columnar storage format, Parquet stores data by column rather than by row, which provides…

Read more →

Nov 01, 2025 Python

PySpark - Write DataFrame to Parquet

• Parquet’s columnar storage format reduces file sizes by 75-90% compared to CSV while enabling faster analytical queries through predicate pushdown and column pruning

Read more →

Oct 25, 2025 Python

PySpark - Read Parquet File into DataFrame

Reading Parquet files in PySpark starts with initializing a SparkSession and using the DataFrame reader API. The simplest approach loads the entire file into memory as a distributed DataFrame.

Read more →

Oct 04, 2025 Pandas

Pandas - Write DataFrame to Parquet

• Parquet format reduces DataFrame storage by 80-90% compared to CSV while preserving data types and enabling faster read operations through columnar storage and built-in compression

Read more →

Sep 27, 2025 Pandas

Pandas - Read Parquet File (read_parquet)

Parquet is a columnar storage format designed for analytical workloads. Unlike row-based formats like CSV, Parquet stores data by column, enabling efficient compression and selective column reading.

Read more →

Jul 19, 2025 Python

How to Write to Parquet in Polars

Parquet has become the de facto standard for analytical data storage, and for good reason. Its columnar format enables efficient compression, predicate pushdown, and column pruning—features that…

Read more →

Jul 19, 2025 Engineering

How to Write to Parquet in PySpark

Parquet has become the de facto standard for storing analytical data in distributed systems. Its columnar storage format means queries that touch only a subset of columns skip reading irrelevant data…

Read more →

Jun 06, 2025 Pandas

How to Read Parquet Files in Pandas

Parquet is a columnar storage format that has become the de facto standard for analytical workloads. Unlike row-based formats like CSV where data is stored record by record, Parquet stores data…

Read more →

Jun 06, 2025 Python

How to Read Parquet Files in Polars

Parquet has become the de facto standard for analytical data storage. Its columnar format, efficient compression, and schema preservation make it ideal for data engineering workflows. But the tool…

Read more →

Jun 06, 2025 Engineering

How to Read Parquet Files in PySpark

Parquet has become the de facto standard for storing analytical data in big data ecosystems, and for good reason. Its columnar storage format means you only read the columns you need. Built-in…

Read more →