Scala - File System Operations (os-lib)
Key Insights
- os-lib provides a modern, type-safe alternative to Java’s verbose file I/O APIs, with intuitive method names like os.read(), os.write(), and os.list() that eliminate boilerplate
- Path manipulation in os-lib uses immutable Path objects with operator overloading (/ for joining paths), making filesystem navigation more readable than string concatenation
- Built-in error handling and resource management eliminate common pitfalls like unclosed streams and provide consistent exception behavior across operations
Why os-lib Over Java File APIs
Java’s file I/O APIs evolved through multiple iterations—java.io.File, java.nio.file.Files, and various stream classes—resulting in fragmented, verbose code. os-lib consolidates these into a cohesive API designed for Scala’s functional paradigm.
Add os-lib to your build.sbt:
libraryDependencies += "com.lihaoyi" %% "os-lib" % "0.9.3"
The difference in approach becomes immediately apparent:
// Java NIO approach
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets
val content = new String(
  Files.readAllBytes(Paths.get("/tmp/data.txt")),
  StandardCharsets.UTF_8
)
// os-lib approach
import os._
// os.root anchors the path at the filesystem root, matching /tmp/data.txt above
val content = os.read(os.root / "tmp" / "data.txt")
Path Construction and Navigation
os-lib’s Path type represents absolute filesystem paths, while RelPath represents relative paths. The / operator joins path segments:
import os._
// Absolute paths
val home: Path = os.home
val configDir: Path = home / ".config" / "myapp"
// Relative paths
val relPath: RelPath = RelPath("src") / "main" / "scala"
// Converting between types
val absolutePath: Path = os.pwd / relPath
// Path components
val file = os.pwd / "data" / "users.json"
println(file.last) // "users.json"
println(file.ext) // "json"
println(file.baseName) // "users"
println(file / os.up) // parent directory
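The Path/RelPath split also works in the other direction: a relative path can be derived from two absolute ones, and user-supplied text can be resolved against a base directory. A minimal sketch, assuming os-lib is on the classpath; the object name PathDemo and the /srv/app layout are hypothetical and only there for illustration:

```scala
object PathDemo {
  // Hypothetical base directory, used purely for illustration
  val base: os.Path = os.root / "srv" / "app"

  // Relative path that leads from `base` down to `target`
  def relativize(target: os.Path): os.RelPath = target.relativeTo(base)

  // Resolve text that may be absolute or relative: relative input is
  // interpreted against `base`, absolute input is kept as-is
  def resolve(input: String): os.Path = os.Path(input, base)

  def main(args: Array[String]): Unit = {
    val file = base / "data" / "users.json"
    println(relativize(file))          // the RelPath below `base`
    println(resolve("logs/app.log"))   // the absolute Path under `base`
    println(file.startsWith(base))     // whether `file` lives under `base`
  }
}
```

Keeping the result typed as RelPath or Path means the compiler rejects code that passes a relative path where an absolute one is required.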
Path operations are immutable and composable:
val baseDir = os.pwd / "output"
val reports = baseDir / "reports"
val archive = baseDir / "archive"
// Create directory structure
os.makeDir.all(reports)
os.makeDir.all(archive)
// Generate timestamped filename (this pattern avoids ':' characters, which are not allowed in Windows filenames)
val timestamp = java.time.LocalDateTime.now()
  .format(java.time.format.DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))
val reportFile = reports / s"report_$timestamp.csv"
Reading Files
os-lib provides multiple methods for reading files based on your needs:
// Read entire file as string
val text: String = os.read(os.pwd / "config.txt")
// Read as byte array
val bytes: Array[Byte] = os.read.bytes(os.pwd / "image.png")
// Read lines as IndexedSeq
val lines: IndexedSeq[String] = os.read.lines(os.pwd / "data.csv")
// Stream lines for large files
val stream: geny.Generator[String] = os.read.lines.stream(
  os.pwd / "large_log.txt"
)
stream.filter(_.contains("ERROR")).take(100).foreach(println)
// Read with specific encoding
val utf8Text = os.read(
  os.pwd / "international.txt",
  charSet = java.nio.charset.StandardCharsets.UTF_8
)
For structured data processing:
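case class User(id: Int, name: String, email: String)
def parseUsers(path: Path): Seq[User] = {
  os.read.lines(path)
    .drop(1) // skip header
    .map { line =>
      // Assumes exactly three comma-separated fields per row;
      // any other shape throws a MatchError, a non-numeric id a NumberFormatException
      val Array(id, name, email) = line.split(",")
      User(id.trim.toInt, name.trim, email.trim)
    }
}
val users = parseUsers(os.pwd / "users.csv")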
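Real CSV exports often contain blank or malformed rows, and the destructuring pattern above throws on them. One way to tolerate such rows is to parse each line into an Option and drop the failures. This is a sketch under stated assumptions rather than a full CSV parser: it does not handle quoted fields, and the SafeCsv object and its method names are hypothetical.

```scala
import scala.util.Try

case class User(id: Int, name: String, email: String)

object SafeCsv {
  // Returns None for rows that do not have exactly three fields
  // or whose id is not a valid Int
  def parseLine(line: String): Option[User] =
    line.split(",", -1).map(_.trim) match {
      case Array(id, name, email) =>
        Try(id.toInt).toOption.map(n => User(n, name, email))
      case _ => None
    }

  // Skips the header row, then keeps only the rows that parse cleanly
  def parseUsers(path: os.Path): Seq[User] =
    os.read.lines(path).drop(1).flatMap(line => parseLine(line))

  def main(args: Array[String]): Unit = {
    // os.temp creates a temporary file with the given contents
    val csv = os.temp("id,name,email\n1, Ada, ada@example.com\nnot-a-row\n")
    parseUsers(csv).foreach(println)
  }
}
```

Silently dropping rows is a design choice; returning Either[String, User] instead would let the caller report which lines were rejected.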
Writing Files
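Writing operations handle resource cleanup for you. Note that os.write fails if the target file already exists and only creates missing parent directories when you pass createFolders = true; use os.write.over to replace existing content and os.write.append to add to it:
val outputDir = os.pwd / "output"
val dataFile = outputDir / "results.txt"
// Write string to a new file (throws if the file already exists);
// createFolders = true creates output/ if it is missing
os.write(dataFile, "Initial content\n", createFolders = true)
// Append to file
os.write.append(dataFile, "Additional line\n")
// Write bytes, replacing any existing file
val imageBytes: Array[Byte] = loadImageData() // placeholder for your own data source
os.write.over(outputDir / "chart.png", imageBytes)
// Write lines
val logEntries = Seq(
  "2024-01-15 10:23:45 INFO Application started",
  "2024-01-15 10:23:46 INFO Database connected"
)
os.write(outputDir / "app.log", logEntries.mkString("\n"))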
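The difference between os.write, os.write.over, and os.write.append is easiest to see by running all three against the same file. The following is a small sketch that works in a temporary directory so it leaves nothing behind; the WriteModes object is a hypothetical name, and os-lib is assumed to be on the classpath:

```scala
import scala.util.Try

object WriteModes {
  // Returns (whether a second plain os.write failed, final file contents)
  def demo(): (Boolean, String) = {
    val dir = os.temp.dir()                 // fresh temporary directory
    val file = dir / "nested" / "notes.txt"

    // The parent directory "nested" does not exist yet, so ask for it
    os.write(file, "first\n", createFolders = true)

    // A plain os.write refuses to touch a file that already exists
    val secondWriteFailed = Try(os.write(file, "again\n")).isFailure

    os.write.over(file, "replaced\n")       // replaces the old contents
    os.write.append(file, "appended\n")     // adds to the end

    (secondWriteFailed, os.read(file))
  }

  def main(args: Array[String]): Unit = {
    val (failed, contents) = demo()
    println(s"second os.write failed: $failed")
    print(contents)
  }
}
```

Failing on an existing file by default guards against accidental overwrites; overwriting has to be requested explicitly.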
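For streaming writes, open an OutputStream so the data never has to be held in memory all at once:
import java.io.{BufferedOutputStream, OutputStream}
import java.nio.charset.StandardCharsets
// Write one line at a time instead of building the whole file as a single String
def writeStreamingData(path: Path, dataSource: Iterator[String]): Unit = {
  val out = new BufferedOutputStream(
    os.write.over.outputStream(path, createFolders = true)
  )
  try {
    dataSource.foreach { line =>
      out.write((line + "\n").getBytes(StandardCharsets.UTF_8))
    }
  } finally {
    out.close()
  }
}
// Or with direct OutputStream access
def writeBinaryStream(path: Path)(write: OutputStream => Unit): Unit = {
  val out = os.write.outputStream(path, createFolders = true)
  try {
    write(out)
  } finally {
    out.close()
  }
}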
Directory Operations
Listing and traversing directories:
// List immediate children
val entries: IndexedSeq[Path] = os.list(os.pwd / "src")
// Walk directory tree
val allScalaFiles: IndexedSeq[Path] = os.walk(os.pwd / "src")
  .filter(_.ext == "scala")
// Walk with depth control
val topLevelDirs = os.walk(os.pwd, maxDepth = 1)
  .filter(os.isDir)
// Stream large directories
os.walk.stream(os.pwd / "logs")
  .filter(_.ext == "log")
  .filter(p => os.size(p) > 1024 * 1024) // > 1MB
  .foreach { logFile =>
    println(s"Large log: $logFile (${os.size(logFile)} bytes)")
  }
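Because os.walk returns an ordinary collection of paths, standard collection operations can summarize a directory tree. As an illustration, this sketch totals file sizes per extension; DirStats and bytesByExt are hypothetical names, and the eager os.walk is used for brevity even though os.walk.stream would suit very large trees better:

```scala
object DirStats {
  // Total size in bytes of all regular files under `root`, keyed by extension.
  // Files without an extension are grouped under the empty string.
  def bytesByExt(root: os.Path): Map[String, Long] =
    os.walk(root)
      .filter(p => os.isFile(p))
      .groupBy(_.ext)
      .map { case (ext, files) => ext -> files.map(p => os.size(p)).sum }

  def main(args: Array[String]): Unit =
    bytesByExt(os.pwd).toSeq
      .sortBy { case (_, bytes) => -bytes } // largest first
      .foreach { case (ext, bytes) => println(s"$ext: $bytes bytes") }
}
```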
Directory manipulation:
val projectDir = os.pwd / "new-project"
// Create directory (fails if exists)
os.makeDir(projectDir)
// Create with parents (like mkdir -p)
os.makeDir.all(projectDir / "src" / "main" / "scala")
// Copy directory recursively
os.copy(projectDir, os.pwd / "project-backup", replaceExisting = true)
// Move/rename
os.move(projectDir / "old-name.txt", projectDir / "new-name.txt")
// Remove directory and contents
os.remove.all(os.pwd / "temp-data")
File Metadata and Permissions
Inspecting file properties:
val file = os.pwd / "data.json"
// Basic checks
println(s"Exists: ${os.exists(file)}")
println(s"Is file: ${os.isFile(file)}")
println(s"Is directory: ${os.isDir(file)}")
println(s"Size: ${os.size(file)} bytes")
// Timestamps
val mtime = os.mtime(file) // milliseconds since epoch
val lastModified = java.time.Instant.ofEpochMilli(mtime)
println(s"Last modified: $lastModified")
// Permissions (Unix-like systems)
val perms = os.perms(file)
println(f"Permissions: ${perms.toInt()}%o") // octal notation
// Set permissions
os.perms.set(file, "rwxr-xr-x")
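Modification times are a common basis for cleanup decisions. The sketch below wraps os.mtime in a small age check; the Staleness object, the olderThan name, and the injectable `now` parameter are assumptions made for illustration and testability, not part of os-lib:

```scala
import java.time.{Duration, Instant}

object Staleness {
  // True when `p` was last modified more than `maxAge` before `now`
  def olderThan(p: os.Path, maxAge: Duration, now: Instant = Instant.now()): Boolean =
    Instant.ofEpochMilli(os.mtime(p)).isBefore(now.minus(maxAge))

  def main(args: Array[String]): Unit = {
    val f = os.temp("demo")
    // Backdate the file by two days so the check has something to find
    os.mtime.set(f, Instant.now().minus(Duration.ofDays(2)).toEpochMilli)
    println(olderThan(f, Duration.ofDays(1)))
    println(olderThan(f, Duration.ofDays(3)))
  }
}
```

Working with java.time.Duration rather than raw millisecond arithmetic keeps the units explicit and sidesteps integer overflow for long retention periods.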
Working with symbolic links:
val target = os.pwd / "original.txt"
val link = os.pwd / "link.txt"
// Create symbolic link
os.symlink(link, target)
// Read link without following
println(s"Link points to: ${os.readLink(link)}")
// Check if path is a symlink
println(s"Is symlink: ${os.isLink(link)}")
Practical Example: Log Rotation
Combining os-lib operations for a real-world task:
import os._
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
object LogRotation {
  def rotateLog(
      logFile: Path,
      archiveDir: Path,
      maxSizeMB: Int = 10,
      keepDays: Int = 30
  ): Unit = {
    if (!os.exists(logFile)) return
    val sizeMB = os.size(logFile) / (1024 * 1024)
    if (sizeMB >= maxSizeMB) {
      // Create archive directory
      os.makeDir.all(archiveDir)
      // Generate archive filename
      val timestamp = LocalDateTime.now()
        .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))
      val archiveName = s"${logFile.baseName}_$timestamp.${logFile.ext}"
      val archivePath = archiveDir / archiveName
      // Move current log to archive
      os.move(logFile, archivePath)
      // Compress archive (requires the gzip binary on the PATH)
      os.proc("gzip", archivePath).call()
      // Clean old archives; the Long literal comes first so the whole
      // product is computed as a Long and cannot overflow Int
      val cutoffTime = System.currentTimeMillis() - (24L * 60 * 60 * 1000 * keepDays)
      os.list(archiveDir)
        .filter(_.ext == "gz")
        .filter(p => os.mtime(p) < cutoffTime)
        .foreach(os.remove)
      // Create new empty log
      os.write(logFile, "")
    }
  }
  def main(args: Array[String]): Unit = {
    val logFile = os.pwd / "app.log"
    val archiveDir = os.pwd / "logs" / "archive"
    rotateLog(logFile, archiveDir, maxSizeMB = 10, keepDays = 30)
  }
}
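The compression step above shells out to gzip, which may not be installed everywhere (on Windows, for example). A possible alternative is to compress in-process with the JDK's GZIPOutputStream. This is only a sketch: GzipArchive and its methods are hypothetical names, and gunzipToString exists just to verify the round trip.

```scala
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipArchive {
  // Compress `src` into a sibling file with ".gz" appended, then delete `src`.
  // Data is copied through a fixed-size buffer, so large logs are not
  // loaded into memory.
  def gzipFile(src: os.Path): os.Path = {
    val dest = src / os.up / s"${src.last}.gz"
    val in = os.read.inputStream(src)
    try {
      val out = new GZIPOutputStream(os.write.outputStream(dest))
      try {
        val buf = new Array[Byte](8192)
        var n = in.read(buf)
        while (n != -1) {
          out.write(buf, 0, n)
          n = in.read(buf)
        }
      } finally out.close()
    } finally in.close()
    os.remove(src)
    dest
  }

  // Helper for checking the result: decompress a .gz file into a String
  def gunzipToString(gz: os.Path): String = {
    val in = new GZIPInputStream(os.read.inputStream(gz))
    try scala.io.Source.fromInputStream(in, "UTF-8").mkString
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val log = os.temp.dir() / "app.log"
    os.write(log, "line one\nline two\n")
    val gz = gzipFile(log)
    println(s"Wrote $gz (${os.size(gz)} bytes)")
  }
}
```

In rotateLog, a call such as gzipFile(archivePath) could stand in for the os.proc("gzip", ...) line; the resulting file still ends in .gz, so the cleanup filter keeps working.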
os-lib transforms filesystem operations from error-prone ceremony into concise, readable code. Its type-safe paths, automatic resource management, and functional API make it the pragmatic choice for Scala applications dealing with file I/O.