Scala - File System Operations (os-lib)

Key Insights

  • os-lib provides a modern, type-safe alternative to Java’s verbose file I/O APIs with intuitive method names like os.read(), os.write(), and os.list() that eliminate boilerplate
  • Path manipulation in os-lib uses immutable Path objects and an overloaded / operator for joining segments, which makes filesystem navigation more readable than string concatenation
  • Built-in error handling and resource management eliminate common pitfalls like unclosed streams and provide consistent exception behavior across operations

Why os-lib Over Java File APIs

Java’s file I/O APIs evolved through multiple iterations—java.io.File, java.nio.file.Files, and various stream classes—resulting in fragmented, verbose code. os-lib consolidates these into a cohesive API designed for Scala’s functional paradigm.

Add os-lib to your build.sbt:

libraryDependencies += "com.lihaoyi" %% "os-lib" % "0.9.3"

The difference in approach becomes immediately apparent:

// Java NIO approach
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets

val content = new String(
  Files.readAllBytes(Paths.get("/tmp/data.txt")),
  StandardCharsets.UTF_8
)

// os-lib approach
import os._

val content = os.read(os.root / "tmp" / "data.txt")
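When a read fails, os-lib throws an exception rather than returning an empty string. The sketch below shows this by reading a file that does not exist inside a throwaway temporary directory, with the call wrapped in `Try`. It is written as a scala-cli script, where the `//> using dep` line pulls in os-lib. The file name `missing.txt` is arbitrary. The exact exception class comes from the underlying java.nio call, so the example prints its name instead of matching on it.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

import scala.util.{Failure, Success, Try}

// A throwaway directory, so the example never touches real data
val dir = os.temp.dir()
val missing = dir / "missing.txt"

// os.read throws on a missing file; Try turns that into a value we can inspect
val result: Try[String] = Try(os.read(missing))

result match {
  case Success(text) => println(s"Read ${text.length} chars")
  case Failure(e)    => println(s"Read failed: ${e.getClass.getSimpleName}")
}
```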

Path Construction and Navigation

os-lib’s Path type represents absolute filesystem paths, while RelPath represents relative paths. The / operator joins path segments:

import os._

// Absolute paths
val home: Path = os.home
val configDir: Path = home / ".config" / "myapp"

// Relative paths
val relPath: RelPath = RelPath("src") / "main" / "scala"

// Converting between types
val absolutePath: Path = os.pwd / relPath

// Path components
val file = os.pwd / "data" / "users.json"
println(file.last)           // "users.json"
println(file.ext)            // "json"
println(file.baseName)       // "users"
println(file / os.up)        // parent directory
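`Path` values are pure data until you pass them to a filesystem call, so these operations also work on paths that do not exist. Here is a small sketch as a scala-cli script. It assumes Unix-style absolute paths, and `/srv/app` and the file names are illustrative only.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

// Pure path arithmetic: none of these lines touch the filesystem.
// Assumes Unix-style absolute paths such as /srv/app.
val base: os.Path = os.Path("/srv/app")          // parse an absolute path from a string
val file: os.Path = base / "data" / "users.json"

val rel: os.RelPath = file.relativeTo(base)      // data/users.json
val parent: os.Path = file / os.up               // /srv/app/data
val backup: os.Path = parent / s"${file.baseName}.bak"

// A relative string can be resolved against an explicit base
val resolved: os.Path = os.Path("logs/app.log", base)

println(rel)
println(parent)
println(backup)
println(resolved)
```

Nothing is read or written, so this kind of path arithmetic is safe to run in unit tests.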

Path operations are immutable and composable:

val baseDir = os.pwd / "output"
val reports = baseDir / "reports"
val archive = baseDir / "archive"

// Create directory structure
os.makeDir.all(reports)
os.makeDir.all(archive)

// Generate a timestamped filename (no ':' characters, which Windows forbids in file names)
val timestamp = java.time.LocalDateTime.now()
  .format(java.time.format.DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))
val reportFile = reports / s"report_$timestamp.csv"
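The next sketch builds a similar layout in a temporary directory instead of `os.pwd`, so experiments don't litter the working tree. It is a scala-cli script. The `output`/`reports`/`archive` names mirror the snippet above. The `yyyyMMdd_HHmmss` pattern is just one colon-free timestamp format, chosen because Windows does not allow `:` in file names.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Build the output layout under a throwaway directory instead of os.pwd
val baseDir = os.temp.dir() / "output"
val reports = baseDir / "reports"
val archive = baseDir / "archive"

Seq(reports, archive).foreach(dir => os.makeDir.all(dir))

// Calling makeDir.all again is a no-op for a directory that already exists
os.makeDir.all(reports)

// yyyyMMdd_HHmmss avoids ':' characters, which Windows forbids in file names
val stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))
val reportFile = reports / s"report_$stamp.csv"

os.write(reportFile, "id,total\n")
println(os.list(reports))
```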

Reading Files

os-lib provides multiple methods for reading files based on your needs:

// Read entire file as string
val text: String = os.read(os.pwd / "config.txt")

// Read as byte array
val bytes: Array[Byte] = os.read.bytes(os.pwd / "image.png")

// Read lines as IndexedSeq
val lines: IndexedSeq[String] = os.read.lines(os.pwd / "data.csv")

// Stream lines for large files
val stream: geny.Generator[String] = os.read.lines.stream(
  os.pwd / "large_log.txt"
)
stream.filter(_.contains("ERROR")).take(100).foreach(println)

// Read with a specific encoding (os.read defaults to UTF-8)
val latin1Text = os.read(
  os.pwd / "international.txt",
  charSet = java.nio.charset.StandardCharsets.ISO_8859_1
)
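The sketch below contrasts `os.read.lines` with `os.read.lines.stream`. It writes a hypothetical 1,000-line log to a temp file and loads it eagerly once. It then streams the same file and stops after the first three `ERROR` lines. It is a scala-cli script, and the log contents are made up for the demo.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

import scala.collection.mutable.ArrayBuffer

// Hypothetical log contents, written to a temp file for the demo
val log = os.temp.dir() / "app.log"
val generated = (1 to 1000).map(i => if (i % 100 == 0) s"ERROR line $i" else s"INFO line $i")
os.write(log, generated.mkString("", "\n", "\n"))

// os.read.lines loads every line into memory at once
val allLines = os.read.lines(log)

// os.read.lines.stream returns a lazy geny.Generator; take(3) lets it
// stop once three matching lines have been found
val firstErrors = ArrayBuffer.empty[String]
os.read.lines.stream(log)
  .filter(_.startsWith("ERROR"))
  .take(3)
  .foreach(line => firstErrors += line)

println(s"${allLines.size} lines total; first errors: $firstErrors")
```

The eager version is convenient for small files. The generator version pulls lines one at a time, so memory use does not grow with file size.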

For structured data processing:

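case class User(id: Int, name: String, email: String)

// Naive CSV parsing: assumes a header row and no quoted or embedded commas
def parseUsers(path: Path): Seq[User] = {
  os.read.lines(path)
    .drop(1)                 // skip header
    .filter(_.trim.nonEmpty) // ignore blank lines
    .map { line =>
      line.split(",").map(_.trim) match {
        case Array(id, name, email) => User(id.toInt, name, email)
        case _ => throw new IllegalArgumentException(s"Malformed row: $line")
      }
    }
}

val users = parseUsers(os.pwd / "users.csv")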
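A round trip makes the parsing assumptions explicit. This scala-cli sketch writes a couple of hypothetical `User` rows to a temporary CSV with a header, then parses them back using the same naive comma splitting. `writeUsers` and `readUsers` are illustrative names. The approach assumes no quoted fields or embedded commas, so real-world CSV needs a dedicated parser.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

case class User(id: Int, name: String, email: String)

// Serialize: header row followed by one comma-separated line per user
def writeUsers(path: os.Path, users: Seq[User]): Unit = {
  val rows = "id,name,email" +: users.map(u => s"${u.id},${u.name},${u.email}")
  os.write.over(path, rows.mkString("", "\n", "\n"), createFolders = true)
}

// Parse: skip the header, ignore blank lines, fail loudly on malformed rows
def readUsers(path: os.Path): Seq[User] =
  os.read.lines(path)
    .drop(1)
    .filter(_.trim.nonEmpty)
    .map { line =>
      line.split(",").map(_.trim) match {
        case Array(id, name, email) => User(id.toInt, name, email)
        case _ => throw new IllegalArgumentException(s"Malformed row: $line")
      }
    }

val csv = os.temp.dir() / "export" / "users.csv"
val original = Seq(User(1, "Ada", "ada@example.com"), User(2, "Linus", "linus@example.com"))

writeUsers(csv, original)
val parsed = readUsers(csv)
println(parsed)
```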

Writing Files

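os-lib's write operations open and close the underlying streams for you. They do not create missing parent directories by default; pass createFolders = true when the target directory may not exist yet: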

val outputDir = os.pwd / "output"
val dataFile = outputDir / "results.txt"

// Write string: creates the file (createFolders also creates outputDir);
// throws if the file already exists
os.write(dataFile, "Initial content\n", createFolders = true)

// Append to file
os.write.append(dataFile, "Additional line\n")

// Write bytes, replacing any existing file
// (loadImageData() is a hypothetical placeholder for your own byte source)
val imageBytes: Array[Byte] = loadImageData()
os.write.over(outputDir / "chart.png", imageBytes)

// Write lines, each terminated by a newline
val logEntries = Seq(
  "2024-01-15 10:23:45 INFO Application started",
  "2024-01-15 10:23:46 INFO Database connected"
)
os.write(outputDir / "app.log", logEntries.mkString("", "\n", "\n"))
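The three write variants differ mainly in how they treat an existing file. Here is a small scala-cli sketch that exercises each one in a temporary directory. `notes.txt` is an arbitrary name.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

import scala.util.Try

val dir = os.temp.dir()
val file = dir / "nested" / "notes.txt"

// os.write creates the file; createFolders = true also creates dir/nested
os.write(file, "first\n", createFolders = true)

// A second os.write to the same path throws, because the file already exists
val secondWrite = Try(os.write(file, "ignored\n"))

// os.write.append adds to the end; os.write.over replaces the contents
os.write.append(file, "second\n")
val afterAppend = os.read(file)

os.write.over(file, "replaced\n")
val afterOver = os.read(file)

println(s"second os.write failed: ${secondWrite.isFailure}")
println(afterAppend)
println(afterOver)
```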

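For larger outputs:

import java.io.OutputStream

// Simple, but not streaming: mkString builds the entire content in memory first
def writeAllLines(path: Path, dataSource: Iterator[String]): Unit = {
  os.write.over(path, dataSource.mkString("\n"))
}

// For true streaming, write through an OutputStream and always close it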
def writeBinaryStream(path: Path)(write: OutputStream => Unit): Unit = {
  val out = os.write.outputStream(path, createFolders = true)
  try {
    write(out)
  } finally {
    out.close()
  }
}
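If the data source is too large to hold in memory, one option is to write through the `OutputStream` returned by `os.write.outputStream`, one line at a time. The sketch below assumes the target file does not exist yet and uses `scala.util.Using` to guarantee the writer is closed. `writeLines` is a hypothetical helper name, and the `Iterator.range` source stands in for a real data feed.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

import java.io.{BufferedWriter, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import scala.util.Using

// Writes each line as it is produced, so the full output never sits in memory.
// Assumes the target file does not exist yet. Returns the number of lines written.
def writeLines(path: os.Path, lines: Iterator[String]): Int =
  Using.resource(
    new BufferedWriter(
      new OutputStreamWriter(
        os.write.outputStream(path, createFolders = true),
        StandardCharsets.UTF_8
      )
    )
  ) { writer =>
    var count = 0
    lines.foreach { line =>
      writer.write(line)
      writer.newLine()
      count += 1
    }
    count
  }

val out = os.temp.dir() / "stream" / "numbers.txt"
// A lazily generated source standing in for a large data feed
val written = writeLines(out, Iterator.range(0, 10000).map(i => s"row $i"))
println(s"wrote $written lines")
```

`BufferedWriter.newLine()` emits the platform line separator. Use `writer.write("\n")` instead if you need Unix line endings on every platform.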

Directory Operations

Listing and traversing directories:

// List immediate children
val entries: IndexedSeq[Path] = os.list(os.pwd / "src")

// Walk directory tree
val allScalaFiles: IndexedSeq[Path] = os.walk(os.pwd / "src")
  .filter(_.ext == "scala")

// Walk with depth control
val topLevelDirs = os.walk(os.pwd, maxDepth = 1)
  .filter(os.isDir)

// Stream large directories
os.walk.stream(os.pwd / "logs")
  .filter(_.ext == "log")
  .filter(p => os.size(p) > 1024 * 1024) // > 1MB
  .foreach { logFile =>
    println(s"Large log: $logFile (${os.size(logFile)} bytes)")
  }
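Filtering the output of `os.walk` still visits every directory. The `skip` parameter instead prunes whole subtrees before they are traversed, which matters for large build-output folders. The sketch below uses a hypothetical project layout created in a temporary directory (scala-cli script).

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

// Hypothetical project layout, created in a temp dir for the demo
val root = os.temp.dir()
os.write(root / "src" / "main" / "scala" / "App.scala", "object App", createFolders = true)
os.write(root / "src" / "test" / "scala" / "AppTest.scala", "class AppTest", createFolders = true)
os.write(root / "target" / "scala-3" / "Generated.scala", "// build output", createFolders = true)

// skip prunes whole subtrees: os.walk never descends into "target"
val sources = os.walk(root, skip = _.last == "target")
  .filter(p => os.isFile(p))
  .filter(_.ext == "scala")
  .map(_.relativeTo(root).toString)
  .sorted

sources.foreach(println)
```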

Directory manipulation:

val projectDir = os.pwd / "new-project"

// Create directory (fails if exists)
os.makeDir(projectDir)

// Create with parents (like mkdir -p)
os.makeDir.all(projectDir / "src" / "main" / "scala")

// Copy directory recursively
os.copy(projectDir, os.pwd / "project-backup", replaceExisting = true)

// Move/rename
os.move(projectDir / "old-name.txt", projectDir / "new-name.txt")

// Remove directory and contents
os.remove.all(os.pwd / "temp-data")
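The operations above combine into a typical sequence: scaffold a project, rename a file, back up the tree, then clean up. This sketch runs that sequence inside a temporary sandbox so it can be executed repeatedly (scala-cli script). The directory and file names mirror the snippet above and are otherwise arbitrary.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

val sandbox = os.temp.dir()
val project = sandbox / "new-project"
val backup = sandbox / "project-backup"

// Scaffold: the equivalent of mkdir -p, plus a file to move around
os.makeDir.all(project / "src" / "main" / "scala")
os.write(project / "old-name.txt", "hello")

// Rename within the project, then copy the whole tree recursively
os.move(project / "old-name.txt", project / "new-name.txt")
os.copy(project, backup)

// Remove the original tree; the backup is unaffected
os.remove.all(project)

println(os.walk(backup).map(_.relativeTo(backup)).mkString("\n"))
```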

File Metadata and Permissions

Inspecting file properties:

val file = os.pwd / "data.json"

// Basic checks
println(s"Exists: ${os.exists(file)}")
println(s"Is file: ${os.isFile(file)}")
println(s"Is directory: ${os.isDir(file)}")
println(s"Size: ${os.size(file)} bytes")

// Timestamps
val mtime = os.mtime(file) // milliseconds since epoch
val lastModified = java.time.Instant.ofEpochMilli(mtime)
println(s"Last modified: $lastModified")

// Permissions (Unix-like systems)
val perms = os.perms(file)
println(f"Permissions: ${perms.toInt()}%o") // octal notation

// Set permissions: owner read/write, everyone else read-only (octal 644)
os.perms.set(file, "rw-r--r--")
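Here is a permissions round trip as a scala-cli sketch. It assumes a POSIX filesystem (Linux or macOS), because os-lib's permission calls rely on POSIX file attributes. The file name and contents are placeholders.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

// Assumes a POSIX filesystem (Linux/macOS)
val contents = """{"token": "placeholder"}"""
val file = os.temp.dir() / "secrets.json"
os.write(file, contents)

// Owner read/write, group read, others nothing (octal 640)
os.perms.set(file, "rw-r-----")

val mode = os.perms(file).toInt()
println(f"mode: $mode%o, size: ${os.size(file)} bytes")
```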

Working with symbolic links:

val target = os.pwd / "original.txt"
val link = os.pwd / "link.txt"

// Create symbolic link
os.symlink(link, target)

// Read link without following
println(s"Link points to: ${os.readLink(link)}")

// Check if path is a symlink
println(s"Is symlink: ${os.isLink(link)}")
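This is a self-contained version of the symlink snippet that runs in a temporary directory. It assumes an OS and filesystem that permit creating symbolic links; on Windows this typically requires extra privileges or developer mode. It shows that ordinary reads follow the link, while `os.isLink` and `os.readLink` inspect the link itself.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

// Assumes symbolic links can be created in the temp directory
val dir = os.temp.dir()
val target = dir / "original.txt"
val link = dir / "link.txt"

os.write(target, "payload")
os.symlink(link, target)

// Reads follow the link; isLink and readLink look at the link itself
val viaLink = os.read(link)
val pointsTo = os.readLink(link)

println(s"isLink=${os.isLink(link)} isFile=${os.isFile(link)} -> $pointsTo")

// Removing the link leaves the target in place
os.remove(link)
```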

Practical Example: Log Rotation

Combining os-lib operations for a real-world task:

import os._
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object LogRotation {
  def rotateLog(
    logFile: Path,
    archiveDir: Path,
    maxSizeMB: Int = 10,
    keepDays: Int = 30
  ): Unit = {
    if (!os.exists(logFile)) return
    
    val sizeMB = os.size(logFile) / (1024 * 1024)
    
    if (sizeMB >= maxSizeMB) {
      // Create archive directory
      os.makeDir.all(archiveDir)
      
      // Generate archive filename
      val timestamp = LocalDateTime.now()
        .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))
      val archiveName = s"${logFile.baseName}_$timestamp.${logFile.ext}"
      val archivePath = archiveDir / archiveName
      
      // Move current log to archive
      os.move(logFile, archivePath)
      
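      // Compress archive (requires the gzip binary on PATH;
      // gzip replaces the file with a copy that has a ".gz" suffix)
      os.proc("gzip", archivePath).call()
      
      // Clean old archives (toLong avoids Int overflow for large keepDays)
      val cutoffTime = System.currentTimeMillis() - keepDays.toLong * 24 * 60 * 60 * 1000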
      os.list(archiveDir)
        .filter(_.ext == "gz")
        .filter(p => os.mtime(p) < cutoffTime)
        .foreach(os.remove)
      
      // Create new empty log
      os.write(logFile, "")
    }
  }
  
  def main(args: Array[String]): Unit = {
    val logFile = os.pwd / "app.log"
    val archiveDir = os.pwd / "logs" / "archive"
    
    rotateLog(logFile, archiveDir, maxSizeMB = 10, keepDays = 30)
  }
}
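`rotateLog` mixes decisions with side effects, which makes it awkward to test. One possible refactoring moves the threshold, naming, and expiry rules into pure functions that take the clock as a parameter. The sketch below assumes you want to unit-test those rules without a real log file or the gzip binary. `RotationPlan` and its methods are hypothetical names, not part of os-lib.

```scala
//> using dep com.lihaoyi::os-lib:0.9.3

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object RotationPlan {
  private val Stamp = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss")

  // Same threshold rule as rotateLog: whole megabytes, rounded down
  def shouldRotate(sizeBytes: Long, maxSizeMB: Int): Boolean =
    sizeBytes / (1024 * 1024) >= maxSizeMB

  // Archive file name for a given log path and an injected timestamp
  def archiveName(logFile: os.Path, now: LocalDateTime): String =
    s"${logFile.baseName}_${now.format(Stamp)}.${logFile.ext}"

  // True when a file's mtime is older than keepDays relative to nowMillis
  def isExpired(mtimeMillis: Long, nowMillis: Long, keepDays: Int): Boolean =
    mtimeMillis < nowMillis - keepDays.toLong * 24 * 60 * 60 * 1000
}

val log = os.root / "var" / "log" / "app.log"
val fixed = LocalDateTime.of(2024, 1, 15, 10, 23, 45)
println(RotationPlan.archiveName(log, fixed))
```

`rotateLog` would then call these helpers with `LocalDateTime.now()` and `System.currentTimeMillis()`, and the effectful code would contain only the move, gzip, and delete calls.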

os-lib transforms filesystem operations from error-prone ceremony into concise, readable code. Its type-safe paths, automatic resource management, and functional API make it the pragmatic choice for Scala applications dealing with file I/O.
