Python - Read/Write CSV Files

Key Insights

  • Python’s csv module handles CSV operations with built-in support for delimiters, quoting, and dialects, while pandas offers high-performance operations for large datasets
  • DictReader and DictWriter provide intuitive column access by name, eliminating index-based errors and improving code readability
  • Proper error handling and encoding specification (UTF-8 BOM for Excel compatibility) prevent data corruption and ensure cross-platform reliability

Basic CSV Reading with the csv Module

The csv module provides straightforward methods for reading CSV files. The csv.reader() function returns an iterator that yields each row as a list of strings.

import csv

with open('employees.csv', 'r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    
    # Skip header row
    next(csv_reader)
    
    for row in csv_reader:
        employee_id = row[0]
        name = row[1]
        department = row[2]
        salary = float(row[3])
        print(f"{name} ({department}): ${salary:,.2f}")

For CSV files with headers, DictReader provides cleaner access to columns by name:

import csv

with open('employees.csv', 'r', encoding='utf-8') as file:
    csv_reader = csv.DictReader(file)
    
    for row in csv_reader:
        print(f"{row['name']} - {row['department']}")
        print(f"Salary: ${float(row['salary']):,.2f}\n")

Writing CSV Files

The csv.writer() function writes rows to CSV files. Always open the file with newline='' when writing; otherwise Python's newline translation turns the module's \r\n row endings into \r\r\n on Windows, which shows up as blank rows.

import csv

data = [
    ['employee_id', 'name', 'department', 'salary'],
    ['E001', 'John Smith', 'Engineering', '95000'],
    ['E002', 'Sarah Johnson', 'Marketing', '78000'],
    ['E003', 'Mike Chen', 'Engineering', '102000']
]

with open('output.csv', 'w', newline='', encoding='utf-8') as file:
    csv_writer = csv.writer(file)
    csv_writer.writerows(data)

For dictionary-based writing, use DictWriter:

import csv

employees = [
    {'employee_id': 'E001', 'name': 'John Smith', 'department': 'Engineering', 'salary': 95000},
    {'employee_id': 'E002', 'name': 'Sarah Johnson', 'department': 'Marketing', 'salary': 78000},
    {'employee_id': 'E003', 'name': 'Mike Chen', 'department': 'Engineering', 'salary': 102000}
]

fieldnames = ['employee_id', 'name', 'department', 'salary']

with open('employees_dict.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(employees)

Handling Custom Delimiters and Quoting

CSV files don’t always use commas. Handle tab-delimited or pipe-delimited files by specifying the delimiter:

import csv

# Read tab-delimited file
with open('data.tsv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)

# Write pipe-delimited file
with open('output.txt', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file, delimiter='|')
    writer.writerow(['ID', 'Name', 'Status'])
    writer.writerow(['001', 'Product A', 'Active'])

Control quoting behavior for fields containing special characters:

import csv

data = [
    ['name', 'description', 'price'],
    ['Widget', 'A "premium" product', 29.99],
    ['Gadget', 'Includes, commas', 49.99]
]

with open('products.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(data)

Quoting options:

  • csv.QUOTE_MINIMAL: Quote only fields with special characters (default)
  • csv.QUOTE_ALL: Quote all fields
  • csv.QUOTE_NONNUMERIC: Quote all non-numeric fields
  • csv.QUOTE_NONE: Never quote (raises error if special characters present)
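
The effect of the first three options is easy to see by writing the same row with each setting; a small sketch using an in-memory buffer:

```python
import csv
import io

# One row with an embedded quote, a comma, and a numeric field.
row = ['Widget', 'A "premium" product', 29.99]

for name, mode in [('QUOTE_MINIMAL', csv.QUOTE_MINIMAL),
                   ('QUOTE_ALL', csv.QUOTE_ALL),
                   ('QUOTE_NONNUMERIC', csv.QUOTE_NONNUMERIC)]:
    buffer = io.StringIO()
    csv.writer(buffer, quoting=mode).writerow(row)
    print(f"{name:17} {buffer.getvalue().rstrip()}")

# QUOTE_MINIMAL     Widget,"A ""premium"" product",29.99
# QUOTE_ALL         "Widget","A ""premium"" product","29.99"
# QUOTE_NONNUMERIC  "Widget","A ""premium"" product",29.99
```

Note that QUOTE_NONNUMERIC only leaves int and float values unquoted, so the price must be a number (not the string '29.99') for the difference to show.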

Using CSV Dialects

Dialects define CSV formatting rules. Python includes predefined dialects for common formats:

import csv

# Excel dialect (default)
with open('excel_format.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file, dialect='excel')
    writer.writerow(['Name', 'Value'])
    writer.writerow(['Test', '123'])

# Unix dialect (uses \n line terminator)
with open('unix_format.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file, dialect='unix')
    writer.writerow(['Name', 'Value'])

# Custom dialect
csv.register_dialect('custom', delimiter='|', quoting=csv.QUOTE_ALL, 
                     lineterminator='\n', skipinitialspace=True)

with open('custom_format.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file, dialect='custom')
    writer.writerow(['Field1', 'Field2'])
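
When the format isn't known in advance, csv.Sniffer can deduce a dialect from a sample of the data. A minimal sketch, where the inline sample and the delimiters hint are assumptions for illustration (in practice you would sniff the first few kilobytes of the file):

```python
import csv
import io

# Inline sample standing in for the start of a real file.
sample = "id|name|status\n001|Product A|Active\n002|Product B|Inactive\n"

# Sniff the dialect; the delimiters hint narrows the candidates.
dialect = csv.Sniffer().sniff(sample, delimiters=",;|\t")
print(dialect.delimiter)  # '|'

# Reuse the sniffed dialect to parse the data.
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[1])  # ['001', 'Product A', 'Active']
```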

Error Handling and Data Validation

Production code requires robust error handling for malformed CSV files:

import csv
import sys

def read_csv_safely(filename):
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            reader = csv.DictReader(file)
            data = []
            
            for line_num, row in enumerate(reader, start=2):
                try:
                    # Validate required fields
                    if not row.get('employee_id'):
                        print(f"Warning: Missing employee_id on line {line_num}")
                        continue
                    
                    # Validate data types
                    salary = float(row['salary'])
                    if salary < 0:
                        print(f"Warning: Negative salary on line {line_num}")
                        continue
                    
                    data.append(row)
                    
                except ValueError as e:
                    print(f"Error on line {line_num}: Invalid number format - {e}")
                except KeyError as e:
                    print(f"Error on line {line_num}: Missing column - {e}")
            
            return data
            
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found")
        sys.exit(1)
    except csv.Error as e:
        print(f"CSV Error: {e}")
        sys.exit(1)

employees = read_csv_safely('employees.csv')
print(f"Successfully loaded {len(employees)} employees")

Working with Large CSV Files Using Pandas

For large datasets, pandas reads CSV files much faster than the pure-Python csv module and adds filtering, grouping, and type handling on top:

import pandas as pd

# Read CSV with pandas
df = pd.read_csv('large_dataset.csv', 
                 encoding='utf-8',
                 parse_dates=['date_column'],
                 dtype={'zip_code': str})

# Display basic info
print(df.head())
print(f"Shape: {df.shape}")
print(df.info())

# Filter and transform
high_earners = df[df['salary'] > 100000]
dept_summary = df.groupby('department')['salary'].agg(['mean', 'count'])
print(dept_summary)

# Write to CSV
df.to_csv('output.csv', index=False, encoding='utf-8')

Process large files in chunks to manage memory:

import pandas as pd

chunk_size = 10000
chunks = []

for chunk in pd.read_csv('huge_file.csv', chunksize=chunk_size):
    # Process each chunk
    filtered = chunk[chunk['status'] == 'active']
    chunks.append(filtered)

# Combine results
result = pd.concat(chunks, ignore_index=True)
result.to_csv('filtered_output.csv', index=False)
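
Chunking isn't the only lever: loading just the columns you need with usecols, and using compact dtypes, can cut memory substantially on its own. A sketch with an inline sample standing in for the file (the column names here are assumptions):

```python
import io

import pandas as pd

# Small inline sample standing in for huge_file.csv.
sample = (
    "department,salary,status,notes\n"
    "Engineering,95000,active,long free text...\n"
    "Marketing,78000,inactive,more free text...\n"
)

# Load only the columns the analysis needs; 'category' dtype stores
# repeated strings far more compactly than plain object columns.
df = pd.read_csv(io.StringIO(sample),
                 usecols=['department', 'salary', 'status'],
                 dtype={'department': 'category', 'status': 'category'})

print(df.dtypes)
```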

Excel Compatibility and Encoding

Excel often misreads plain UTF-8 CSV files as a legacy encoding; writing a UTF-8 byte-order mark (the 'utf-8-sig' codec) signals the encoding so Unicode characters display correctly:

import csv

data = [
    ['Name', 'City', 'Country'],
    ['François', 'Paris', 'France'],
    ['José', 'Madrid', 'España'],
    ['李明', 'Beijing', '中国']
]

# Write with UTF-8 BOM for Excel
with open('excel_compatible.csv', 'w', newline='', encoding='utf-8-sig') as file:
    writer = csv.writer(file)
    writer.writerows(data)
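
Reading such a file back with encoding='utf-8-sig' strips the BOM automatically; with plain 'utf-8' the marker survives as '\ufeff' stuck to the first header cell. A quick round-trip sketch (the filename is just an example):

```python
import csv

rows = [['Name', 'City'], ['François', 'Paris']]

# Write with a BOM so Excel detects UTF-8.
with open('excel_compatible.csv', 'w', newline='', encoding='utf-8-sig') as f:
    csv.writer(f).writerows(rows)

# 'utf-8-sig' removes the BOM on read; plain 'utf-8' would yield
# '\ufeffName' as the first header cell.
with open('excel_compatible.csv', 'r', newline='', encoding='utf-8-sig') as f:
    restored = list(csv.reader(f))

print(restored[0])  # ['Name', 'City']
```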

Convert between different CSV formats:

import csv

def convert_csv_format(input_file, output_file, input_delimiter=',', output_delimiter='\t'):
    with open(input_file, 'r', encoding='utf-8') as infile, \
         open(output_file, 'w', newline='', encoding='utf-8') as outfile:
        
        reader = csv.reader(infile, delimiter=input_delimiter)
        writer = csv.writer(outfile, delimiter=output_delimiter)
        
        for row in reader:
            writer.writerow(row)

convert_csv_format('comma_delimited.csv', 'tab_delimited.tsv', ',', '\t')

Performance Optimization

For maximum performance with pure Python, use csv.reader() with minimal processing:

import csv
from collections import defaultdict

def fast_csv_aggregation(filename):
    dept_salaries = defaultdict(list)
    
    with open(filename, 'r', encoding='utf-8') as file:
        reader = csv.reader(file)
        next(reader)  # Skip header
        
        for row in reader:
            dept = row[2]
            salary = float(row[3])
            dept_salaries[dept].append(salary)
    
    # Calculate averages
    return {dept: sum(salaries) / len(salaries) 
            for dept, salaries in dept_salaries.items()}

averages = fast_csv_aggregation('employees.csv')
for dept, avg in averages.items():
    print(f"{dept}: ${avg:,.2f}")

Choose the right tool for the job: use the csv module for simple operations and when dependencies must be minimal; use pandas for complex data analysis and transformations. Both approaches handle CSV files reliably when combined with explicit encoding and robust error handling.
