NumPy - Save/Load as Text File (np.savetxt, np.loadtxt)
Key Insights
• np.savetxt() and np.loadtxt() provide straightforward text-based serialization for NumPy arrays with human-readable output and broad compatibility across platforms
• Text format offers advantages for version control, debugging, and data exchange but trades file size and precision for readability compared to binary formats
• Custom delimiters, formatting options, and header metadata enable flexible data export while maintaining interoperability with spreadsheet applications and other data processing tools
Basic Save and Load Operations
NumPy’s text file I/O functions handle the most common use case with minimal configuration. np.savetxt() writes arrays to text files, while np.loadtxt() reads them back.
import numpy as np
# Create sample data
data = np.array([[1.5, 2.3, 3.7],
                 [4.1, 5.9, 6.2],
                 [7.8, 8.4, 9.6]])
# Save to text file
np.savetxt('data.txt', data)
# Load from text file
loaded_data = np.loadtxt('data.txt')
print(loaded_data)
# [[1.5 2.3 3.7]
# [4.1 5.9 6.2]
# [7.8 8.4 9.6]]
print(np.array_equal(data, loaded_data)) # True
The default behavior uses space delimiters and scientific notation for floating-point numbers. The resulting file is plain text, viewable in any text editor.
Custom Delimiters and Formatting
Control output format with delimiter and formatting parameters. CSV format requires comma delimiters, while fixed-width formats need specific formatting strings.
# Save as CSV
data = np.array([[100, 200, 300],
                 [400, 500, 600]])
np.savetxt('data.csv', data, delimiter=',')
# Custom formatting with precision control
float_data = np.array([[1.23456789, 2.34567890],
                       [3.45678901, 4.56789012]])
# Two decimal places, fixed-point notation
np.savetxt('formatted.txt', float_data, fmt='%.2f', delimiter='\t')
# Scientific notation with 4 decimal places
np.savetxt('scientific.txt', float_data, fmt='%.4e', delimiter=',')
# Mixed integer and float formatting
mixed_data = np.array([[1, 2.5], [3, 4.7]])
np.savetxt('mixed.txt', mixed_data, fmt=['%d', '%.3f'], delimiter=',')
The fmt parameter accepts printf-style format strings. For heterogeneous data, pass a list of format strings matching each column.
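Printf-style width specifiers also produce aligned, fixed-width columns, which is convenient for files meant to be read by eye. A small sketch (the filename 'aligned.txt' is illustrative):

```python
import numpy as np

vals = np.array([[1.5, 22.33, 3.0],
                 [444.1, 5.9, 66.2]])

# Width-10 fields with 3 decimal places give right-aligned columns
np.savetxt('aligned.txt', vals, fmt='%10.3f')

with open('aligned.txt') as f:
    print(f.read())
```

Each field occupies exactly ten characters, so columns line up regardless of magnitude.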
Headers and Footers
Add metadata to files using header and footer parameters. This improves file documentation and compatibility with analysis tools.
# Add descriptive header
data = np.random.randn(5, 3)
header_text = """Temperature, Pressure, Humidity
Sensor readings from 2024-01-15
Units: Celsius, kPa, Percent"""
np.savetxt('sensor_data.csv',
           data,
           delimiter=',',
           header=header_text,
           comments='# ')
# The file will contain:
# # Temperature, Pressure, Humidity
# # Sensor readings from 2024-01-15
# # Units: Celsius, kPa, Percent
# -0.5 1.2 0.8
# ...
Headers are prefixed with the comments string (default '# '). Set comments='' to write the header as a plain first line, producing CSV files compatible with strict parsers that reject comment characters.
# CSV without comment prefix
np.savetxt('clean.csv',
           data,
           delimiter=',',
           header='temp,pressure,humidity',
           comments='')
Loading with Custom Parameters
np.loadtxt() mirrors savetxt() parameters for reading custom formats. Handle different delimiters, skip rows, and select specific columns.
# Skip header rows (lines starting with '#' are also skipped
# automatically via the comments parameter, so skiprows matters
# mainly for headers written with comments='')
data = np.loadtxt('sensor_data.csv', delimiter=',', skiprows=3)
# Load specific columns (0-indexed)
subset = np.loadtxt('data.csv', delimiter=',', usecols=(0, 2))
# Unpack columns into separate arrays
col1, col2, col3 = np.loadtxt('data.csv', delimiter=',', unpack=True)
print(col1.shape) # (n,) - 1D array per column
The usecols parameter accepts a single column index or a sequence of indices; np.loadtxt() itself has no notion of column names. unpack=True transposes the result, creating one array per column.
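When you do want to address columns by name, np.genfromtxt() with names=True reads the header row as field names and returns a structured array. A minimal sketch (filename and values are illustrative):

```python
import numpy as np

# Write a small CSV with a plain header row of column names
np.savetxt('named.csv', np.array([[20.5, 101.3],
                                  [21.1, 100.8]]),
           delimiter=',', header='temp,pressure', comments='')

# genfromtxt (unlike loadtxt) can treat the first row as field names
records = np.genfromtxt('named.csv', delimiter=',', names=True)
print(records.dtype.names)  # ('temp', 'pressure')
print(records['temp'])      # [20.5 21.1]
```

Columns are then selected by name rather than by position, which survives column reordering in the source file.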
Handling Missing Data
Real-world datasets contain missing values. NumPy provides converters for custom parsing and missing data handling.
# Sample data with missing values (saved as 'incomplete.csv'):
# 1.5,2.3,N/A
# 4.1,,6.2
# 7.8,8.4,9.6
# Define converter function
def convert_missing(value):
    if value.strip() in ['', 'N/A', 'NULL']:
        return np.nan
    return float(value)
# Apply converter to specific columns
data = np.loadtxt('incomplete.csv',
                  delimiter=',',
                  converters={0: convert_missing,
                              1: convert_missing,
                              2: convert_missing})
print(data)
# [[ 1.5 2.3 nan]
# [ 4.1 nan 6.2]
# [ 7.8 8.4 9.6]]
# Check for missing values
print(np.isnan(data).sum()) # 2
Converters map column indices to functions that transform string values during loading. This enables custom parsing logic for dates, categorical data, or domain-specific formats.
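The same mechanism handles dates. A hedged sketch of parsing an ISO date column into ordinal day numbers (the filename 'log.csv' and its contents are hypothetical; note that some older NumPy versions pass bytes rather than str to converters):

```python
import numpy as np
from datetime import datetime

# Hypothetical log file with an ISO date in the first column
with open('log.csv', 'w') as f:
    f.write('2024-01-15,3.5\n2024-01-16,4.2\n')

def date_to_ordinal(value):
    # Older NumPy versions may pass bytes rather than str
    if isinstance(value, bytes):
        value = value.decode()
    return datetime.strptime(value.strip(), '%Y-%m-%d').toordinal()

data = np.loadtxt('log.csv', delimiter=',',
                  converters={0: date_to_ordinal})
print(data)  # dates become consecutive ordinal day numbers
```

Ordinal day numbers keep the array purely numeric, so arithmetic like day differences works directly.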
Data Type Control
Specify output data types explicitly to prevent type inference issues or optimize memory usage.
# Integer data
int_data = np.array([[1, 2, 3],
                     [4, 5, 6]], dtype=np.int32)
np.savetxt('integers.txt', int_data, fmt='%d')
# Load with explicit dtype
loaded_int = np.loadtxt('integers.txt', dtype=np.int32)
# Complex numbers
complex_data = np.array([1+2j, 3+4j, 5+6j])
# Save real and imaginary parts separately
combined = np.column_stack([complex_data.real, complex_data.imag])
np.savetxt('complex.txt', combined, fmt='%.4f')
# Reconstruct complex array
loaded_parts = np.loadtxt('complex.txt')
reconstructed = loaded_parts[:, 0] + 1j * loaded_parts[:, 1]
For complex numbers, saving real and imaginary components as separate columns keeps the file readable and parser-friendly. np.savetxt() can also write complex arrays directly as '(real+imagj)' pairs, but reading those back requires an explicit dtype.
Performance Considerations
Text I/O trades performance for readability. For large datasets, binary formats like .npy offer significant advantages.
import time
# Generate large dataset
large_data = np.random.randn(10000, 100)
# Text format timing
start = time.time()
np.savetxt('large.txt', large_data)
text_save_time = time.time() - start
start = time.time()
loaded_text = np.loadtxt('large.txt')
text_load_time = time.time() - start
# Binary format timing
start = time.time()
np.save('large.npy', large_data)
binary_save_time = time.time() - start
start = time.time()
loaded_binary = np.load('large.npy')
binary_load_time = time.time() - start
print(f"Text save: {text_save_time:.3f}s, load: {text_load_time:.3f}s")
print(f"Binary save: {binary_save_time:.3f}s, load: {binary_load_time:.3f}s")
# File size comparison
import os
print(f"Text size: {os.path.getsize('large.txt') / 1024:.1f} KB")
print(f"Binary size: {os.path.getsize('large.npy') / 1024:.1f} KB")
Text files are typically 2-5x larger and 10-50x slower for I/O operations. Use text format for small datasets, debugging, or when human readability matters.
Compressed Text Files
Combine text format readability with compression for reduced file sizes without switching to binary.
# Save compressed (requires gzip)
import gzip
data = np.random.randn(1000, 50)
# Manual compression
with gzip.open('data.txt.gz', 'wt') as f:
    np.savetxt(f, data, fmt='%.6f')
# Load compressed
with gzip.open('data.txt.gz', 'rt') as f:
    loaded = np.loadtxt(f)
# File size comparison (write an uncompressed copy first)
np.savetxt('uncompressed.txt', data, fmt='%.6f')
print(f"Uncompressed: {os.path.getsize('uncompressed.txt') / 1024:.1f} KB")
print(f"Compressed: {os.path.getsize('data.txt.gz') / 1024:.1f} KB")
Gzip compression typically reduces text file sizes by 60-80% while maintaining text format advantages. The file remains readable when decompressed with standard tools.
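Manual gzip handles arbitrary file objects, but both functions also compress and decompress automatically when the filename ends in .gz, which is usually all you need. A sketch (filename 'auto.txt.gz' is illustrative):

```python
import numpy as np
import gzip

data = np.random.randn(200, 10)

# A filename ending in .gz triggers automatic gzip compression
np.savetxt('auto.txt.gz', data, fmt='%.6f')

# loadtxt likewise decompresses .gz files transparently
loaded = np.loadtxt('auto.txt.gz')

# The file is still a standard gzip stream for external tools
with gzip.open('auto.txt.gz', 'rt') as f:
    first_line = f.readline()
print(loaded.shape)  # (200, 10)
```

This keeps calling code identical for compressed and plain files, apart from the extension.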
Structured Arrays
Export structured arrays with named fields by converting to regular arrays or using custom formatting.
# Structured array
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Alice', 25, 55.5),
                   ('Bob', 30, 75.2),
                   ('Charlie', 35, 68.9)], dtype=dt)
# Convert to regular array for saving
data_array = np.column_stack([people['age'], people['weight']])
np.savetxt('people.csv',
           data_array,
           delimiter=',',
           header='age,weight',
           comments='',
           fmt=['%d', '%.1f'])
# Load and reconstruct
loaded = np.loadtxt('people.csv', delimiter=',', skiprows=1)
print(loaded)
For full structured array serialization with field names preserved, use np.save() or consider formats like HDF5 or Parquet for complex data structures.
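The binary round trip is straightforward; a minimal sketch of preserving a structured array with np.save()/np.load() (filename 'people.npy' is illustrative):

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Alice', 25, 55.5),
                   ('Bob', 30, 75.2)], dtype=dt)

# Binary .npy files preserve field names and dtypes exactly
np.save('people.npy', people)
restored = np.load('people.npy')
print(restored['name'])      # ['Alice' 'Bob']
print(restored.dtype == dt)  # True
```

Unlike the text export above, nothing is lost: string fields, field names, and per-field dtypes all survive intact.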