Python - Read File Line by Line
Key Insights
- Python provides multiple methods to read files line by line, each with different memory implications: `readline()` for manual control, `readlines()` for small files, and file iteration for memory efficiency
- Context managers (the `with` statement) automatically handle file closing and exception scenarios, preventing resource leaks that plague manual file handling
- Processing large files requires streaming approaches that avoid loading entire contents into memory, with generators and buffered reading being the most production-ready patterns
Basic File Reading with readline()
The readline() method reads a single line from a file, advancing the file pointer to the next line. This approach gives you explicit control over when and how lines are read.
file = open('data.txt', 'r')
line = file.readline()
while line:
    print(line.strip())  # strip() removes the trailing newline
    line = file.readline()
file.close()
This pattern works but has a critical flaw: if an exception occurs before close(), the file remains open. Always use context managers in production code.
Using Context Managers for Safe File Handling
The with statement ensures files are properly closed even when exceptions occur. This is the recommended approach for all file operations.
with open('data.txt', 'r') as file:
    line = file.readline()
    while line:
        print(line.strip())
        line = file.readline()
When the with block exits—whether normally or via exception—Python automatically calls file.close(). No manual cleanup required.
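To see what the `with` statement saves you, here is a rough sketch of the equivalent manual version using try/finally (a small sample file is written first so the snippet is self-contained):

```python
# Create a small sample file so the snippet is self-contained
with open('data.txt', 'w') as f:
    f.write("first line\nsecond line\n")

# Roughly what `with open(...)` does for you, written out by hand:
file = open('data.txt', 'r')
try:
    for line in file:
        print(line.strip())
finally:
    # Runs whether the loop finished or raised - the file always closes
    file.close()

print(file.closed)  # True
```

The context manager version is shorter and impossible to get wrong, which is why it is the recommended form.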
Iterating Directly Over File Objects
Python file objects are iterable. This is the most Pythonic way to read files line by line and offers excellent memory efficiency since lines are read on-demand.
with open('data.txt', 'r') as file:
    for line in file:
        print(line.strip())
This approach:
- Uses buffered reading internally for performance
- Loads one line at a time into memory
- Works with files of any size
- Provides clean, readable code
Reading All Lines with readlines()
The readlines() method returns a list containing all lines. Use this only for small files where you need random access to lines.
with open('data.txt', 'r') as file:
    lines = file.readlines()

# Now you can access lines by index
print(lines[0])
print(lines[-1])

# Or iterate
for line in lines:
    print(line.strip())
Warning: This loads the entire file into memory. A 1GB file creates a 1GB+ list in memory. For large files, use iteration instead.
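If you only need one specific line rather than true random access, you can often avoid `readlines()` entirely. As one sketch, `collections.deque` with `maxlen=1` retrieves the last line in constant memory (the sample file here is written inline for illustration):

```python
from collections import deque

# Create a sample file so the snippet is self-contained
with open('data.txt', 'w') as f:
    f.write("alpha\nbeta\ngamma\n")

with open('data.txt', 'r') as f:
    # deque with maxlen=1 discards earlier lines as it consumes the file,
    # so memory stays constant regardless of file size
    last_line = deque(f, maxlen=1)[0]

print(last_line.strip())  # gamma
```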
Processing Lines with Enumeration
When you need line numbers alongside content, use enumerate() with file iteration.
with open('data.txt', 'r') as file:
    for index, line in enumerate(file, start=1):
        print(f"Line {index}: {line.strip()}")
This pattern is useful for error reporting, logging, or processing files with structured line formats.
Handling Different Encodings
Specify encoding explicitly when opening files, especially when working with non-ASCII text. The default encoding varies by platform.
# Read a UTF-8 encoded file
with open('data.txt', 'r', encoding='utf-8') as file:
    for line in file:
        print(line.strip())

# Handle encoding errors gracefully
with open('data.txt', 'r', encoding='utf-8', errors='replace') as file:
    for line in file:
        print(line.strip())
The errors parameter controls behavior when encountering invalid characters:
- `'strict'` (default): raises `UnicodeDecodeError`
- `'replace'`: substitutes invalid characters with `�`
- `'ignore'`: skips invalid characters
- `'backslashreplace'`: replaces them with escaped sequences
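A quick way to see these modes in action is to write a file containing a byte that is invalid in UTF-8 and read it back under each policy (a minimal self-contained sketch):

```python
# Write raw bytes containing 0xFF, which is invalid in UTF-8
with open('bad.txt', 'wb') as f:
    f.write(b'caf\xff line\n')

# 'replace' substitutes U+FFFD (the replacement character) for the bad byte
with open('bad.txt', 'r', encoding='utf-8', errors='replace') as f:
    print(f.read())  # caf� line

# 'ignore' drops the bad byte entirely
with open('bad.txt', 'r', encoding='utf-8', errors='ignore') as f:
    print(f.read())  # caf line

# 'strict' (the default) raises UnicodeDecodeError
try:
    with open('bad.txt', 'r', encoding='utf-8') as f:
        f.read()
except UnicodeDecodeError as e:
    print(f"strict mode raised: {e.reason}")
```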
Memory-Efficient Processing of Large Files
For files too large to fit in memory, process lines as they’re read without accumulating results.
def process_large_file(filename):
    line_count = 0
    total_length = 0
    with open(filename, 'r') as file:
        for line in file:
            line_count += 1
            total_length += len(line)
            # Process the line here without storing it
            if 'ERROR' in line:
                print(f"Error found at line {line_count}")
    return line_count, total_length

count, length = process_large_file('large_log.txt')
print(f"Processed {count} lines, {length} total characters")
This approach maintains constant memory usage regardless of file size.
Using Generators for Filtered Reading
Create generator functions to yield only lines matching specific criteria, enabling memory-efficient filtering pipelines.
def read_matching_lines(filename, pattern):
    with open(filename, 'r') as file:
        for line in file:
            if pattern in line:
                yield line.strip()

# Use the generator
for line in read_matching_lines('app.log', 'ERROR'):
    print(line)
Generators are lazy—they only read and process lines when requested, making them ideal for chaining operations.
def read_lines(filename):
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

def filter_errors(lines):
    for line in lines:
        if 'ERROR' in line:
            yield line

def extract_timestamp(lines):
    for line in lines:
        if line.startswith('['):
            timestamp = line.split(']')[0][1:]
            yield timestamp

# Chain the operations
lines = read_lines('app.log')
errors = filter_errors(lines)
timestamps = extract_timestamp(errors)
for ts in timestamps:
    print(ts)
Reading Binary Files Line by Line
Binary files require different handling. Use 'rb' mode and work with bytes.
with open('data.bin', 'rb') as file:
    for line in file:
        # line is bytes, not str
        print(line)
        # Decode if needed
        text = line.decode('utf-8')
        print(text.strip())
Handling Files Without Trailing Newlines
Some files don’t end with a newline character. File iteration handles this correctly, but be aware when using readline().
with open('data.txt', 'r') as file:
    for line in file:
        # Works correctly even if the last line has no newline
        print(repr(line))  # repr shows whether '\n' is present
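With `readline()`, the distinction matters: a final line without a newline is returned as-is, and only at end-of-file does `readline()` return an empty string. A small self-contained demonstration:

```python
# Write a file whose last line has no trailing newline
with open('no_newline.txt', 'w') as f:
    f.write("first\nlast line without newline")

with open('no_newline.txt', 'r') as f:
    print(repr(f.readline()))  # 'first\n'
    print(repr(f.readline()))  # 'last line without newline' - no '\n'
    print(repr(f.readline()))  # '' - the empty string signals EOF
```

This is why the `while line:` loop pattern shown earlier terminates correctly: the empty string is falsy, but a bare newline `'\n'` (a blank line) is not.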
Reading Specific Number of Lines
To read only the first N lines without loading the entire file:
def read_first_n_lines(filename, n):
    with open(filename, 'r') as file:
        for i, line in enumerate(file):
            if i >= n:
                break
            print(line.strip())

read_first_n_lines('data.txt', 10)
Or using itertools.islice for cleaner code:
from itertools import islice

with open('data.txt', 'r') as file:
    for line in islice(file, 10):
        print(line.strip())
Performance Comparison
For a 100MB file with 1 million lines, here are typical performance characteristics:
import time

# Method 1: readlines() - loads the entire file
start = time.time()
with open('large.txt', 'r') as f:
    lines = f.readlines()
    count = len(lines)
print(f"readlines(): {time.time() - start:.2f}s, Memory: HIGH")

# Method 2: iteration - memory efficient
start = time.time()
with open('large.txt', 'r') as f:
    count = sum(1 for line in f)
print(f"iteration: {time.time() - start:.2f}s, Memory: LOW")
File iteration typically matches or beats readlines() performance while using a fraction of the memory.
Error Handling Best Practices
Always handle potential file errors explicitly:
try:
    with open('data.txt', 'r') as file:
        for line in file:
            print(line.strip())
except FileNotFoundError:
    print("File not found")
except PermissionError:
    print("Permission denied")
except UnicodeDecodeError as e:
    print(f"Encoding error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
For production systems, use logging instead of print statements and implement appropriate recovery strategies.
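As one sketch of that advice, the same error handling can report through the standard `logging` module instead of `print` (the function name, logger name, and filenames below are illustrative, not from the examples above):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("file_reader")

def count_lines_safely(filename):
    """Count lines in a file, logging failures instead of printing them."""
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            return sum(1 for _ in file)
    except FileNotFoundError:
        logger.error("File not found: %s", filename)
    except PermissionError:
        logger.error("Permission denied: %s", filename)
    except UnicodeDecodeError:
        # logger.exception also records the traceback
        logger.exception("Encoding error in %s", filename)
    return None

print(count_lines_safely('missing.txt'))  # logs the error, returns None
```

Returning `None` on failure is one possible recovery strategy; depending on the system, re-raising after logging or returning a default may be more appropriate.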