Scala - Stream/LazyList

Key Insights

• Scala’s LazyList (which replaced the deprecated Stream in Scala 2.13) provides memory-efficient processing of potentially infinite sequences through lazy evaluation, computing elements only when accessed
• LazyList operations are composable and chainable, allowing complex transformations without creating intermediate collections until terminal operations force evaluation
• Keeping recursive calls inside the lazy tail, combined with memoization, prevents stack overflow and redundant computation, making LazyList suitable for processing large datasets and implementing algorithms like the Sieve of Eratosthenes

Understanding LazyList Fundamentals

LazyList is Scala’s implementation of lazy sequences where elements are computed on-demand rather than eagerly. Unlike regular Lists that evaluate all elements immediately, LazyList defers computation until you actually need the values.

// Eager evaluation with List
val eagerList = List(1, 2, 3).map { x =>
  println(s"Computing $x")
  x * 2
}
// Prints: Computing 1, Computing 2, Computing 3 immediately

// Lazy evaluation with LazyList
val lazyList = LazyList(1, 2, 3).map { x =>
  println(s"Computing $x")
  x * 2
}
// Prints nothing until elements are accessed

println(lazyList.head) // Now prints: Computing 1, then 2

The key difference: LazyList memoizes computed values, so accessing the same element twice doesn’t recompute it. This distinguishes LazyList from views, which recompute on every access.
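To make the memoization contrast concrete, here is a small sketch counting how often the mapped function actually runs for a LazyList versus a view:

```scala
// LazyList memoizes: the mapped function runs once per element
var lazyCount = 0
val memoized = LazyList(1, 2, 3).map { x => lazyCount += 1; x * 2 }
memoized.head
memoized.head
println(lazyCount) // 1 -- first element computed exactly once

// A view recomputes on every access
var viewCount = 0
val viewed = List(1, 2, 3).view.map { x => viewCount += 1; x * 2 }
viewed.head
viewed.head
println(viewCount) // 2 -- recomputed on each access
```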

Creating Infinite Sequences

LazyList excels at representing infinite sequences without consuming infinite memory. This is impossible with strict collections.

// Infinite sequence of natural numbers
def naturals(n: Int): LazyList[Int] = 
  n #:: naturals(n + 1)

val numbers = naturals(1)
println(numbers.take(5).toList) // List(1, 2, 3, 4, 5)

// Fibonacci sequence
val fibs: LazyList[BigInt] = {
  def loop(a: BigInt, b: BigInt): LazyList[BigInt] =
    a #:: loop(b, a + b)
  loop(0, 1)
}

println(fibs.take(10).toList)
// List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

// Random number generator
def randoms(seed: Long): LazyList[Double] = {
  val next = (seed * 0x5DEECE66DL + 0xBL) & 0xFFFFFFFFFFFFL
  val value = (next >>> 16).toDouble / (1L << 32) // scale 32 bits into [0, 1)
  value #:: randoms(next)
}

val randomSeq = randoms(System.currentTimeMillis())
println(randomSeq.take(3).toList)

The #:: operator (cons) is the lazy equivalent of :: for Lists. It constructs a LazyList without evaluating the tail.
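Beyond explicit recursion with #::, Scala 2.13 also provides LazyList.unfold and LazyList.iterate as non-recursive ways to build such sequences. As a sketch, the naturals and Fibonacci sequences above can be rewritten:

```scala
// unfold threads a state through the sequence: emit `a`, advance to (b, a + b)
val fibs: LazyList[BigInt] =
  LazyList.unfold((BigInt(0), BigInt(1))) { case (a, b) =>
    Some((a, (b, a + b)))
  }

// iterate repeatedly applies a function to the previous element
val naturals: LazyList[Int] = LazyList.iterate(1)(_ + 1)

println(fibs.take(10).toList) // List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
println(naturals.take(5).toList) // List(1, 2, 3, 4, 5)
```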

Implementing the Sieve of Eratosthenes

LazyList enables elegant implementations of algorithms that work with infinite sequences. The Sieve of Eratosthenes demonstrates this perfectly.

def sieve(nums: LazyList[Int]): LazyList[Int] = {
  nums.head #:: sieve(nums.tail.filter(_ % nums.head != 0))
}

val primes = sieve(LazyList.from(2))

println(primes.take(20).toList)
// List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71)

// Find the 10000th prime (slow: each candidate is tested through
// thousands of nested filters)
println(primes(9999)) // 104729

// Primes between 100 and 200
println(primes.dropWhile(_ < 100).takeWhile(_ < 200).toList)

This implementation filters multiples of each prime from subsequent numbers. The lazy evaluation ensures we only compute as many primes as needed.
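Because primes is an ordinary value, further lazy queries compose on top of it. For example, a sketch deriving twin primes (sieve and primes are repeated here so the snippet runs standalone):

```scala
// sieve and primes repeated from above so this snippet is self-contained
def sieve(nums: LazyList[Int]): LazyList[Int] =
  nums.head #:: sieve(nums.tail.filter(_ % nums.head != 0))

val primes = sieve(LazyList.from(2))

// Twin primes: consecutive primes that differ by 2
val twins = primes.zip(primes.tail).filter { case (p, q) => q - p == 2 }

println(twins.take(5).toList)
// List((3,5), (5,7), (11,13), (17,19), (29,31))
```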

Memory-Efficient Data Processing

When processing large datasets, LazyList prevents loading everything into memory at once.

import scala.io.Source

// Process large log file line by line
// (note: the underlying Source is never closed here; in production,
// manage its lifecycle, e.g. with scala.util.Using)
def processLogFile(filename: String): LazyList[String] = {
  val source = Source.fromFile(filename)
  source.getLines().to(LazyList)
}

// Extract error lines without loading entire file
def findErrors(filename: String): LazyList[String] = {
  processLogFile(filename)
    .filter(_.contains("ERROR"))
    .map(_.split("\\|")(2).trim)
}

// Only processes lines until condition met
def findFirstCriticalError(filename: String): Option[String] = {
  processLogFile(filename)
    .find(line => line.contains("ERROR") && line.contains("CRITICAL"))
}

// Batch processing with grouping
def processBatches[A](items: LazyList[A], batchSize: Int): LazyList[List[A]] = {
  if (items.isEmpty) LazyList.empty
  else {
    val (batch, rest) = items.splitAt(batchSize)
    batch.toList #:: processBatches(rest, batchSize)
  }
}

val data = LazyList.range(1, 1000000)
val batches = processBatches(data, 100)
println(batches.take(3).map(_.size).toList) // List(100, 100, 100)
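For this particular batching pattern, the standard library's grouped method is a built-in alternative; note that it returns an Iterator, which can only be traversed once (a sketch):

```scala
val data = LazyList.range(1, 1000001)

// grouped returns an Iterator[LazyList[Int]] -- single-pass and lazy
val batches = data.grouped(100)
val sizes = batches.take(3).map(_.size).toList

println(sizes) // List(100, 100, 100)
```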

Avoiding Common Pitfalls

LazyList has memoization characteristics that can cause memory issues if you’re not careful.

// PROBLEM: Retains reference to entire LazyList
val numbers = LazyList.from(1)
val filtered = numbers.filter(_ % 2 == 0)
println(filtered.take(1000000).last)
// Memory leak: every traversed cell stays memoized because the
// `numbers` and `filtered` vals retain references to the heads

// SOLUTION: Drop references to processed elements
def processInChunks(start: Int, chunkSize: Int): Unit = {
  val chunk = LazyList.from(start)
    .filter(_ % 2 == 0)
    .take(chunkSize)
  
  chunk.foreach(println)
  // chunk goes out of scope, allowing GC
}

// PROBLEM: Deeply nested lazy maps overflow the stack
def factorial(n: BigInt): LazyList[BigInt] = {
  if (n == 0) LazyList(1)
  else n #:: factorial(n - 1).map(_ * n)
}
// The last element of factorial(n) is n!, but forcing deep elements
// (e.g. factorial(100000).last) evaluates through ~n nested .map
// layers and overflows the stack

// SOLUTION: Keep the recursive call inside the lazy tail with an
// accumulator, so forcing elements never piles up stack frames
def factorialSeq: LazyList[BigInt] = {
  def loop(n: BigInt, acc: BigInt): LazyList[BigInt] =
    acc #:: loop(n + 1, acc * (n + 1))
  loop(0, 1)
}

println(factorialSeq(100)) // Works fine
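A related safeguard against the retention problem shown earlier: exposing a large sequence as a def rather than a val returns a fresh LazyList on each call, so its memoized cells become garbage-collectable once the call's local reference is gone. A sketch of the pattern (sumFirst is an illustrative helper, not part of any API):

```scala
// `val` pins the head: every element forced through `held` stays
// memoized for as long as this reference is reachable
val held: LazyList[Int] = LazyList.from(1)

// `def` builds a fresh LazyList per call; once the call returns,
// nothing retains the head and its cells can be garbage-collected
def fresh: LazyList[Int] = LazyList.from(1)

// hypothetical helper: sums the first n naturals via the def
def sumFirst(n: Int): Long =
  fresh.take(n).foldLeft(0L)(_ + _)

println(sumFirst(1000)) // 500500
```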

Combining LazyList with Other Collections

LazyList integrates seamlessly with Scala’s collection ecosystem.

// Convert between collections
val list = List(1, 2, 3, 4, 5)
val lazySeq = list.to(LazyList) // note: `lazy` itself is a reserved word
val backToList = lazySeq.toList

// Interleave two LazyLists
def interleave[A](a: LazyList[A], b: LazyList[A]): LazyList[A] = {
  if (a.isEmpty) b
  else if (b.isEmpty) a
  else a.head #:: b.head #:: interleave(a.tail, b.tail)
}

val evens = LazyList.from(0).filter(_ % 2 == 0)
val odds = LazyList.from(1).filter(_ % 2 != 0)
val interleaved = interleave(evens, odds)
println(interleaved.take(10).toList)
// List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

// Zip with index for infinite sequences
val indexed = LazyList.continually(scala.util.Random.nextInt(100))
  .zipWithIndex
  .take(5)
  .toList

println(indexed) // e.g. List((42,0), (17,1), (88,2), ...) -- values are random

// Sliding window over infinite sequence
val windows = LazyList.from(1)
  .sliding(3)
  .take(5)
  .map(_.toList)
  .toList

println(windows)
// List(List(1,2,3), List(2,3,4), List(3,4,5), List(4,5,6), List(5,6,7))

Performance Considerations

Understanding when to use LazyList versus other collections is critical for performance.

def benchmark[A](name: String)(f: => A): A = {
  val start = System.nanoTime()
  val result = f
  val end = System.nanoTime()
  println(s"$name: ${(end - start) / 1000000}ms")
  result
}

val size = 1000000

// LazyList wins when you need partial results
benchmark("LazyList take") {
  LazyList.range(0, size).map(_ * 2).take(10).toList
}

benchmark("List take") {
  List.range(0, size).map(_ * 2).take(10)
}

// List wins when you need all results
benchmark("LazyList full") {
  LazyList.range(0, size).map(_ * 2).toList
}

benchmark("List full") {
  List.range(0, size).map(_ * 2)
}

// LazyList for conditional processing
def findFirstMatch(n: Int): Option[Int] = {
  LazyList.from(1)
    .map(x => x * x)
    .find(_ > n)
}

benchmark("Find match") {
  findFirstMatch(1000000)
} // Stops as soon as condition met

LazyList shines when you need to work with potentially infinite sequences, process large datasets with early termination conditions, or compose multiple transformations without creating intermediate collections. For finite collections where you need all elements, stick with strict collections like List or Vector.
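When memoization is not needed at all, a plain Iterator is often lighter still, since it retains nothing it has already consumed. A sketch contrasting the two under the same early-termination pattern:

```scala
// Iterator: single pass, retains nothing that has been consumed
val itResult = Iterator.from(1).map(x => x * x).find(_ > 1000000)

// LazyList: same answer, but forced elements stay memoized for as
// long as a reference to the head is held
val llResult = LazyList.from(1).map(x => x * x).find(_ > 1000000)

println(itResult) // Some(1002001), i.e. 1001 * 1001
println(llResult) // Some(1002001)
```

Prefer Iterator for one-shot pipelines; prefer LazyList when you need to revisit elements or share the sequence between consumers.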
