Python - Read File Line by Line
Key Insights
- Python provides multiple methods to read files line by line, each with different memory implications: `readline()` for manual control, `readlines()` for small files, and file iteration for memory efficiency
- Context managers (the `with` statement) automatically handle file closing and exception scenarios, preventing resource leaks that plague manual file handling
- Processing large files requires streaming approaches that avoid loading entire contents into memory, with generators and buffered reading being the most production-ready patterns
Basic File Reading with readline()
The readline() method reads a single line from a file, advancing the file pointer to the next line. This approach gives you explicit control over when and how lines are read.
file = open('data.txt', 'r')
line = file.readline()
while line:
    print(line.strip())  # strip() removes the trailing newline
    line = file.readline()
file.close()
This pattern works but has a critical flaw: if an exception occurs before close(), the file remains open. Always use context managers in production code.
Using Context Managers for Safe File Handling
The with statement ensures files are properly closed even when exceptions occur. This is the recommended approach for all file operations.
with open('data.txt', 'r') as file:
    line = file.readline()
    while line:
        print(line.strip())
        line = file.readline()
When the with block exits—whether normally or via exception—Python automatically calls file.close(). No manual cleanup required.
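To see what the `with` statement saves you, here is a rough sketch of the equivalent manual version using try/finally (a small sample file is written first so the snippet is self-contained):

```python
# Create a small sample file so the snippet is self-contained
with open('data.txt', 'w') as f:
    f.write("first line\nsecond line\n")

# Roughly what `with open(...)` does for you, written out by hand:
file = open('data.txt', 'r')
try:
    for line in file:
        print(line.strip())
finally:
    # Runs whether the loop finished or raised - the file always closes
    file.close()

print(file.closed)  # True
```

The context manager version is shorter and impossible to get wrong, which is why it is the recommended form.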
Iterating Directly Over File Objects
Python file objects are iterable. This is the most Pythonic way to read files line by line and offers excellent memory efficiency since lines are read on-demand.
with open('data.txt', 'r') as file:
    for line in file:
        print(line.strip())
This approach:
- Uses buffered reading internally for performance
- Loads one line at a time into memory
- Works with files of any size
- Provides clean, readable code
Reading All Lines with readlines()
The readlines() method returns a list containing all lines. Use this only for small files where you need random access to lines.
with open('data.txt', 'r') as file:
    lines = file.readlines()

# Now you can access lines by index
print(lines[0])
print(lines[-1])

# Or iterate
for line in lines:
    print(line.strip())
Warning: This loads the entire file into memory. A 1GB file creates a 1GB+ list in memory. For large files, use iteration instead.
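If you only need one specific line rather than true random access, you can often avoid `readlines()` entirely. As one sketch, `collections.deque` with `maxlen=1` retrieves the last line in constant memory (the sample file here is written inline for illustration):

```python
from collections import deque

# Create a sample file so the snippet is self-contained
with open('data.txt', 'w') as f:
    f.write("alpha\nbeta\ngamma\n")

with open('data.txt', 'r') as f:
    # deque with maxlen=1 discards earlier lines as it consumes the file,
    # so memory stays constant regardless of file size
    last_line = deque(f, maxlen=1)[0]

print(last_line.strip())  # gamma
```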
Processing Lines with Enumeration
When you need line numbers alongside content, use enumerate() with file iteration.
with open('data.txt', 'r') as file:
    for index, line in enumerate(file, start=1):
        print(f"Line {index}: {line.strip()}")
This pattern is useful for error reporting, logging, or processing files with structured line formats.
Handling Different Encodings
Specify encoding explicitly when opening files, especially when working with non-ASCII text. The default encoding varies by platform.
# Read a UTF-8 encoded file
with open('data.txt', 'r', encoding='utf-8') as file:
    for line in file:
        print(line.strip())

# Handle encoding errors gracefully
with open('data.txt', 'r', encoding='utf-8', errors='replace') as file:
    for line in file:
        print(line.strip())
The errors parameter controls behavior when encountering invalid characters:
- `'strict'` (default): raises `UnicodeDecodeError`
- `'replace'`: substitutes invalid characters with `�`
- `'ignore'`: skips invalid characters
- `'backslashreplace'`: replaces them with escaped sequences
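A quick way to see these modes in action is to write a file containing a byte that is invalid in UTF-8 and read it back under each policy (a minimal self-contained sketch):

```python
# Write raw bytes containing 0xFF, which is invalid in UTF-8
with open('bad.txt', 'wb') as f:
    f.write(b'caf\xff line\n')

# 'replace' substitutes U+FFFD (the replacement character) for the bad byte
with open('bad.txt', 'r', encoding='utf-8', errors='replace') as f:
    print(f.read())  # caf� line

# 'ignore' drops the bad byte entirely
with open('bad.txt', 'r', encoding='utf-8', errors='ignore') as f:
    print(f.read())  # caf line

# 'strict' (the default) raises UnicodeDecodeError
try:
    with open('bad.txt', 'r', encoding='utf-8') as f:
        f.read()
except UnicodeDecodeError as e:
    print(f"strict mode raised: {e.reason}")
```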
Memory-Efficient Processing of Large Files
For files too large to fit in memory, process lines as they’re read without accumulating results.
def process_large_file(filename):
    line_count = 0
    total_length = 0
    with open(filename, 'r') as file:
        for line in file:
            line_count += 1
            total_length += len(line)
            # Process the line here without storing it
            if 'ERROR' in line:
                print(f"Error found at line {line_count}")
    return line_count, total_length

count, length = process_large_file('large_log.txt')
print(f"Processed {count} lines, {length} total characters")
This approach maintains constant memory usage regardless of file size.
Using Generators for Filtered Reading
Create generator functions to yield only lines matching specific criteria, enabling memory-efficient filtering pipelines.
def read_matching_lines(filename, pattern):
    with open(filename, 'r') as file:
        for line in file:
            if pattern in line:
                yield line.strip()

# Use the generator
for line in read_matching_lines('app.log', 'ERROR'):
    print(line)
Generators are lazy—they only read and process lines when requested, making them ideal for chaining operations.
def read_lines(filename):
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

def filter_errors(lines):
    for line in lines:
        if 'ERROR' in line:
            yield line

def extract_timestamp(lines):
    for line in lines:
        if line.startswith('['):
            timestamp = line.split(']')[0][1:]
            yield timestamp

# Chain the operations
lines = read_lines('app.log')
errors = filter_errors(lines)
timestamps = extract_timestamp(errors)
for ts in timestamps:
    print(ts)
Reading Binary Files Line by Line
Binary files require different handling. Use 'rb' mode and work with bytes.
with open('data.bin', 'rb') as file:
    for line in file:
        # line is bytes, not str
        print(line)
        # Decode if needed
        text = line.decode('utf-8')
        print(text.strip())
Handling Files Without Trailing Newlines
Some files don’t end with a newline character. File iteration handles this correctly, but be aware when using readline().
with open('data.txt', 'r') as file:
    for line in file:
        # Works correctly even if the last line has no newline
        print(repr(line))  # repr shows whether '\n' is present
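With `readline()`, the distinction matters: a final line without a newline is returned as-is, and only at end-of-file does `readline()` return an empty string. A small self-contained demonstration:

```python
# Write a file whose last line has no trailing newline
with open('no_newline.txt', 'w') as f:
    f.write("first\nlast line without newline")

with open('no_newline.txt', 'r') as f:
    print(repr(f.readline()))  # 'first\n'
    print(repr(f.readline()))  # 'last line without newline' - no '\n'
    print(repr(f.readline()))  # '' - the empty string signals EOF
```

This is why the `while line:` loop pattern shown earlier terminates correctly: the empty string is falsy, but a bare newline `'\n'` (a blank line) is not.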
Reading Specific Number of Lines
To read only the first N lines without loading the entire file:
def read_first_n_lines(filename, n):
    with open(filename, 'r') as file:
        for i, line in enumerate(file):
            if i >= n:
                break
            print(line.strip())

read_first_n_lines('data.txt', 10)
Or using itertools.islice for cleaner code:
from itertools import islice

with open('data.txt', 'r') as file:
    for line in islice(file, 10):
        print(line.strip())
Performance Comparison
For a 100MB file with 1 million lines, here are typical performance characteristics:
import time

# Method 1: readlines() - loads the entire file
start = time.time()
with open('large.txt', 'r') as f:
    lines = f.readlines()
    count = len(lines)
print(f"readlines(): {time.time() - start:.2f}s, Memory: HIGH")

# Method 2: iteration - memory efficient
start = time.time()
with open('large.txt', 'r') as f:
    count = sum(1 for line in f)
print(f"iteration: {time.time() - start:.2f}s, Memory: LOW")
File iteration typically matches or beats readlines() performance while using a fraction of the memory.
Error Handling Best Practices
Always handle potential file errors explicitly:
try:
    with open('data.txt', 'r') as file:
        for line in file:
            print(line.strip())
except FileNotFoundError:
    print("File not found")
except PermissionError:
    print("Permission denied")
except UnicodeDecodeError as e:
    print(f"Encoding error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
For production systems, use logging instead of print statements and implement appropriate recovery strategies.
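As one sketch of that advice, the same error handling can report through the standard `logging` module instead of `print` (the function name, logger name, and filenames below are illustrative, not from the examples above):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("file_reader")

def count_lines_safely(filename):
    """Count lines in a file, logging failures instead of printing them."""
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            return sum(1 for _ in file)
    except FileNotFoundError:
        logger.error("File not found: %s", filename)
    except PermissionError:
        logger.error("Permission denied: %s", filename)
    except UnicodeDecodeError:
        # logger.exception also records the traceback
        logger.exception("Encoding error in %s", filename)
    return None

print(count_lines_safely('missing.txt'))  # logs the error, returns None
```

Returning `None` on failure is one possible recovery strategy; depending on the system, re-raising after logging or returning a default may be more appropriate.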