R - Write CSV File (write.csv / readr::write_csv)
The `write.csv()` function is R's built-in solution for exporting data frames to CSV format. It's a wrapper around `write.table()` with sensible defaults for comma-separated values.
Key Insights
- Base R's `write.csv()` provides simple CSV export with automatic row names, while `readr::write_csv()` offers faster performance and cleaner defaults without row names
- Control critical parameters like delimiters, quoting behavior, NA representation, and encoding to ensure data integrity across different systems and applications
- For production workflows, implement proper error handling, validate output files, and consider append operations for incremental data exports
Base R write.csv() Fundamentals
# Basic CSV export
data <- data.frame(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  score = c(95.5, 87.3, 92.1, 88.9, 94.2),
  passed = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)
write.csv(data, "output.csv")
By default, write.csv() includes row names as the first column. Disable this behavior with row.names = FALSE:
write.csv(data, "output_no_rownames.csv", row.names = FALSE)
Handling Special Characters and Quoting
CSV files require proper quoting when fields contain delimiters, quotes, or newlines. Control this with the quote parameter:
# Data with special characters
messy_data <- data.frame(
  description = c("Product, Type A", "Item with \"quotes\"", "Normal text"),
  value = c(100, 200, 300)
)
# Quote all character fields (default for write.csv)
write.csv(messy_data, "quoted.csv", row.names = FALSE)
# Disable quoting entirely (risky here: "Product, Type A" contains a comma)
write.csv(messy_data, "minimal_quotes.csv", row.names = FALSE, quote = FALSE)
# Quote only specific columns (numeric values select column indices to quote)
write.csv(messy_data, "selective_quotes.csv",
          row.names = FALSE,
          quote = 1)  # Quote only the first column
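To see why disabling quotes is risky when fields contain the delimiter, read the raw lines back and count fields per line. This is a quick sanity check added here for illustration; the file name `unquoted_check.csv` is arbitrary:

```r
# Writing without quotes lets the embedded comma split the field
write.csv(messy_data, "unquoted_check.csv", row.names = FALSE, quote = FALSE)
fields_per_line <- sapply(strsplit(readLines("unquoted_check.csv"), ","), length)
# The header has 2 fields, but the first data row ("Product, Type A,100") has 3
print(fields_per_line)
```

A parser reading this file will misalign the first row's columns, which is exactly the corruption quoting exists to prevent.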
Managing NA Values and Missing Data
Different systems expect different representations for missing values. Control this with the na parameter:
data_with_na <- data.frame(
  id = 1:4,
  value1 = c(10, NA, 30, 40),
  value2 = c(NA, 20, 30, NA)
)
# Default: writes "NA"
write.csv(data_with_na, "na_default.csv", row.names = FALSE)
# Empty string for NA
write.csv(data_with_na, "na_empty.csv", row.names = FALSE, na = "")
# Custom NA representation
write.csv(data_with_na, "na_custom.csv", row.names = FALSE, na = "NULL")
Using readr::write_csv() for Performance
The readr package provides write_csv() with better defaults and significantly faster performance for large datasets:
library(readr)
# Basic usage - no row names by default
write_csv(data, "readr_output.csv")
# Explicitly handle NA values
write_csv(data_with_na, "readr_na.csv", na = "")
# Performance comparison
large_data <- data.frame(
  x = rnorm(1000000),
  y = sample(letters, 1000000, replace = TRUE),
  z = runif(1000000)
)
system.time(write.csv(large_data, "base_large.csv", row.names = FALSE))
# user system elapsed
# 12.34 0.45 12.81
system.time(write_csv(large_data, "readr_large.csv"))
# user system elapsed
# 2.15 0.31 2.47
Appending Data to Existing Files
For incremental exports or logging scenarios, append data without overwriting:
# Initial write
initial_data <- data.frame(timestamp = Sys.time(), value = 100)
write.csv(initial_data, "log.csv", row.names = FALSE)
# Append new data (base R)
new_data <- data.frame(timestamp = Sys.time(), value = 200)
write.table(new_data, "log.csv",
            sep = ",",
            append = TRUE,
            col.names = FALSE,
            row.names = FALSE)
# Append with readr
write_csv(new_data, "log.csv", append = TRUE)
Controlling Delimiters and Formats
While CSV implies comma separation, you can customize delimiters for different requirements:
# Semicolon delimiter with comma as decimal separator (common in European locales)
write.csv2(data, "semicolon.csv", row.names = FALSE)
# Custom delimiter using write.table
write.table(data, "pipe_delimited.txt",
            sep = "|",
            row.names = FALSE,
            quote = FALSE)
# Tab-separated values
write.table(data, "tab_delimited.tsv",
            sep = "\t",
            row.names = FALSE,
            quote = FALSE)
Handling Encoding Issues
Character encoding matters when sharing files across systems or dealing with international characters:
# Data with international characters
intl_data <- data.frame(
  name = c("José", "François", "Müller", "Søren"),
  city = c("São Paulo", "Montréal", "München", "København")
)
# UTF-8 encoding (recommended)
write.csv(intl_data, "utf8.csv", row.names = FALSE, fileEncoding = "UTF-8")
# Latin1 encoding
write.csv(intl_data, "latin1.csv", row.names = FALSE, fileEncoding = "latin1")
# readr defaults to UTF-8
write_csv(intl_data, "readr_utf8.csv")
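A quick round-trip check can confirm the encoding survived; this sketch reuses the `utf8.csv` file written above and assumes R >= 4.0, where data frame columns default to character rather than factor:

```r
# Read the UTF-8 file back, declaring the encoding so accented
# characters are interpreted correctly on any platform
check <- read.csv("utf8.csv", fileEncoding = "UTF-8")
# Should be TRUE if the round trip preserved the accented names
identical(check$name, intl_data$name)
```

If this returns FALSE on Windows, the usual culprit is reading the file back without specifying `fileEncoding`.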
Production-Ready Export Function
Implement robust CSV export with validation and error handling:
export_csv_safe <- function(data, filepath, validate = TRUE) {
  # Input validation
  if (!is.data.frame(data)) {
    stop("Input must be a data frame")
  }
  if (nrow(data) == 0) {
    warning("Exporting empty data frame")
  }
  # Create directory if needed
  dir_path <- dirname(filepath)
  if (!dir.exists(dir_path)) {
    dir.create(dir_path, recursive = TRUE)
  }
  # Export with error handling
  tryCatch({
    write_csv(data, filepath, na = "")
    # Validate output if requested
    if (validate) {
      verify <- read_csv(filepath, show_col_types = FALSE)
      if (nrow(verify) != nrow(data)) {
        stop("Row count mismatch in exported file")
      }
    }
    message(sprintf("Successfully exported %d rows to %s",
                    nrow(data), filepath))
    return(invisible(TRUE))
  }, error = function(e) {
    stop(sprintf("Export failed: %s", e$message))
  })
}
# Usage
export_csv_safe(data, "output/validated_export.csv")
Exporting Large Datasets Efficiently
For datasets that approach memory limits, consider chunked writing:
export_large_csv <- function(data, filepath, chunk_size = 100000) {
  n_rows <- nrow(data)
  n_chunks <- ceiling(n_rows / chunk_size)
  for (i in seq_len(n_chunks)) {
    start_row <- (i - 1) * chunk_size + 1
    end_row <- min(i * chunk_size, n_rows)
    chunk <- data[start_row:end_row, ]
    if (i == 1) {
      write_csv(chunk, filepath)
    } else {
      write_csv(chunk, filepath, append = TRUE)
    }
    message(sprintf("Wrote chunk %d/%d", i, n_chunks))
  }
}
# Generate large dataset
very_large_data <- data.frame(
  id = 1:500000,
  value = rnorm(500000)
)
export_large_csv(very_large_data, "large_output.csv")
Comparing write.csv() vs write_csv()
Choose the right function based on your requirements:
Use write.csv() when:
- Working in base R environments without additional dependencies
- Row names must be preserved in the output
- Compatibility with legacy code is essential
Use readr::write_csv() when:
- Performance matters for large datasets
- Cleaner defaults (no row names) are preferred
- Consistent UTF-8 encoding is required
- Working within tidyverse workflows
Both functions handle the core task effectively, but readr::write_csv() provides better performance and more sensible defaults for modern data workflows. For production systems processing large volumes of data, the performance gains from readr justify the additional dependency.
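The row-name difference between the two defaults can be seen directly by inspecting the header line each function writes. This minimal sketch reuses the `data` frame from the first example; the output file names are arbitrary:

```r
library(readr)

# Base R default: row names become an unnamed, quoted first column
write.csv(data, "base_default.csv")
# readr default: no row names, no quoting of the header
write_csv(data, "readr_default.csv")

readLines("base_default.csv", n = 1)   # "\"\",\"id\",\"name\",\"score\",\"passed\""
readLines("readr_default.csv", n = 1)  # "id,name,score,passed"
```

The extra leading column in the base R output is what downstream tools often misread as an unnamed index, which is why `row.names = FALSE` appears throughout the examples above.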