Spark Scala - Read CSV File
CSV files refuse to die. Despite the rise of Parquet, ORC, and Avro, you’ll still encounter CSV in nearly every data engineering project. Legacy systems export it. Business users create it in Excel….
For simple CSV files without complex quoting or escaping, Scala’s standard library provides sufficient functionality. Use scala.io.Source to read files line by line and split on delimiters.
The write.csv() function is R’s built-in solution for exporting data frames to CSV format. It’s a wrapper around write.table() with sensible defaults for comma-separated values.
• R offers multiple CSV reading methods—base R’s read.csv() provides universal compatibility while readr::read_csv() delivers 10x faster performance with better type inference
The csv module provides straightforward methods for reading CSV files. The csv.reader() function returns an iterator that yields each row as a list of strings.
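That iteration pattern can be sketched with the standard library alone; the in-memory sample below stands in for a real file on disk:

```python
import csv
import io

# A small in-memory CSV stands in for a real file opened with open().
data = io.StringIO("name,age\nAlice,30\nBob,25\n")

reader = csv.reader(data)
rows = list(reader)                  # each row arrives as a list of strings
header, records = rows[0], rows[1:]

print(header)    # ['name', 'age']
print(records)   # [['Alice', '30'], ['Bob', '25']]
```

Note that every field comes back as a string; numeric conversion is up to the caller.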
Writing a DataFrame to CSV in PySpark is straightforward using the DataFrameWriter API. The basic syntax uses the write property followed by format specification and save path.
The simplest approach to reading multiple CSV files uses wildcard patterns. PySpark’s spark.read.csv() method accepts glob patterns to match multiple files simultaneously.
PySpark’s spark.read.csv() method provides the simplest approach to load CSV files into DataFrames. The method accepts file paths from local filesystems, HDFS, S3, or other distributed storage…
• Defining custom schemas in PySpark eliminates costly schema inference and prevents data type mismatches that cause runtime failures in production pipelines
• PySpark’s inferSchema option automatically detects column data types by sampling data, but adds overhead by requiring an extra pass through the dataset—use it for exploration, disable it for…
PySpark DataFrames are the backbone of distributed data processing, but eventually you need to export results for reporting, data sharing, or integration with systems that expect CSV format. Unlike…
• The to_csv() method provides extensive control over CSV output including delimiters, encoding, column selection, and header customization with 30+ parameters for precise formatting
The read_csv() function reads comma-separated value files into DataFrame objects. The simplest invocation requires only a file path:
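For instance (sales.csv is an illustrative file, created inline here so the call is runnable end to end):

```python
import pandas as pd
from pathlib import Path

# A tiny file stands in for real data; commas and a header row are assumed.
Path("sales.csv").write_text("region,amount\nEast,100\nWest,80\n")

df = pd.read_csv("sales.csv")
print(df.shape)   # (2, 2)
```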
• Use skiprows parameter with integers, lists, or callable functions to exclude specific rows when reading CSV files, reducing memory usage and processing time for large datasets
The read_csv() function in Pandas defaults to comma separation, but real-world data files frequently use alternative delimiters. The sep parameter (or its alias delimiter) accepts any string or…
• CSV files can have various encodings (UTF-8, Latin-1, Windows-1252) that cause UnicodeDecodeError if not handled correctly—detecting and specifying the right encoding is critical for data integrity
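The three concerns above (delimiter, skipped rows, and encoding) can be combined in one hedged sketch; the file name and contents are illustrative:

```python
import pandas as pd
from pathlib import Path

# Semicolon-delimited, Latin-1 encoded sample with a junk line above the header.
raw = "# export from legacy system\nname;city\nJosé;München\n"
Path("legacy.csv").write_bytes(raw.encode("latin-1"))

df = pd.read_csv(
    "legacy.csv",
    sep=";",             # alternative delimiter
    skiprows=1,          # drop the comment line before the header
    encoding="latin-1",  # a wrong or omitted encoding here raises UnicodeDecodeError
)
print(df.loc[0, "name"])   # José
```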
While pandas dominates CSV loading in data science workflows, np.genfromtxt() offers advantages when you need direct NumPy array output without pandas overhead. For numerical computing pipelines,…
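A small sketch of genfromtxt on an in-memory numeric sample:

```python
import io
import numpy as np

# Purely numeric CSV; genfromtxt returns a plain float64 ndarray, no DataFrame.
data = io.StringIO("1.0,2.0\n3.0,4.5\n")

arr = np.genfromtxt(data, delimiter=",")
print(arr.shape)   # (2, 2)
print(arr.sum())   # 10.5
```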
Every data pipeline eventually needs to export data somewhere. CSV remains the universal interchange format—it’s human-readable, works with Excel, imports into databases, and every programming…
Polars has rapidly become the go-to DataFrame library for Python developers who need speed. Built in Rust with a lazy evaluation engine, it consistently outperforms pandas by 10-100x on common…
CSV remains the lingua franca of data exchange. Despite its limitations—no schema enforcement, no compression by default, verbose storage—it’s universally readable. When you’re processing terabytes…
CSV files remain the lingua franca of data exchange. Despite the rise of Parquet, JSON, and database connections, you’ll encounter CSVs constantly—from client exports to API downloads to legacy…
Polars has rapidly become the go-to DataFrame library for Python developers who need speed without sacrificing usability. Built in Rust with a Python API, it consistently outperforms pandas on CSV…
CSV files refuse to die. Despite better alternatives like Parquet, Avro, and ORC, you’ll encounter CSV data constantly in real-world data engineering. Vendors export it, analysts create it, legacy…