Scala - XML Processing

Key Insights

• Scala's XML literals allow direct embedding of XML in code, checked for well-formedness at compile time. They still compile in Scala 3, but the Scala 3 documentation marks them for eventual removal
• The scala-xml library provides a complete API for parsing, querying, transforming, and generating XML documents with pattern matching and XPath-like operations
• XML processing in Scala benefits from functional programming patterns like immutable transformations and the RuleTransformer class for complex document modifications

Native XML Support and Basic Operations

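Scala has long supported XML literals as a language feature. The classes behind them moved out of the standard library into the separate scala-xml module in Scala 2.11, and the Scala 3 documentation lists XML literals as still supported but slated for removal. The syntax is still worth knowing for maintaining existing code, and the scala-xml API underneath it is the same whether or not you use literals.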

// XML literal syntax (needs scala-xml on the classpath)
val book = <book id="123">
  <title>Functional Programming in Scala</title>
  <author>Paul Chiusano</author>
  <author>Runar Bjarnason</author>
  <year>2014</year>
</book>

// Accessing elements
println(book \ "title")  // <title>Functional Programming in Scala</title>
println((book \ "title").text)  // Functional Programming in Scala

// Accessing attributes
println((book \ "@id").text)  // 123

// Deep search
println(book \\ "author")  // All author elements

Because the XML classes are no longer part of the standard library, add the scala-xml dependency to your build:

// build.sbt
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "2.1.0"

Parsing XML from Strings and Files

The scala-xml library provides multiple methods for loading XML from various sources:

import scala.xml._

// Parse from string
val xmlString = """<library>
  <book isbn="978-0134685991">
    <title>Effective Java</title>
    <price>45.99</price>
  </book>
</library>"""

val library = XML.loadString(xmlString)

// Parse from file
val libraryFromFile = XML.loadFile("library.xml")

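// Parse from a URL (the string is resolved as a system identifier)
val libraryFromUrl = XML.load("https://example.com/library.xml")

// Parse from an InputStream.
// Using (Scala 2.13+) closes the stream even if parsing throws.
import java.io.FileInputStream
import scala.util.Using
val libraryFromStream =
  Using.resource(new FileInputStream("library.xml"))(stream => XML.load(stream))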

Error handling during parsing:

import scala.util.{Try, Success, Failure}

def safeLoadXml(path: String): Either[String, Elem] = {
  Try(XML.loadFile(path)) match {
    case Success(xml) => Right(xml)
    case Failure(ex) => Left(s"Failed to parse XML: ${ex.getMessage}")
  }
}

safeLoadXml("invalid.xml") match {
  case Right(xml) => println(s"Loaded: ${xml.label}")
  case Left(error) => println(error)
}
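
When the input is malformed, the underlying SAX parser usually reports where the problem is. The sketch below surfaces that location in the error message; parseWithLocation is a hypothetical helper name, and it assumes the parser raises org.xml.sax.SAXParseException for malformed input, which is the behavior of the default JDK parser.

```scala
import org.xml.sax.SAXParseException
import scala.util.{Failure, Success, Try}
import scala.xml.{Elem, XML}

// Hypothetical helper: like safeLoadXml, but keeps the line and column
// reported by the SAX parser when the document is not well-formed.
def parseWithLocation(input: String): Either[String, Elem] =
  Try(XML.loadString(input)) match {
    case Success(xml) =>
      Right(xml)
    case Failure(e: SAXParseException) =>
      Left(s"Malformed XML at line ${e.getLineNumber}, column ${e.getColumnNumber}: ${e.getMessage}")
    case Failure(e) =>
      Left(s"Failed to load XML: ${e.getMessage}")
  }

// The <b> element is never closed, so this yields a Left with a location.
println(parseWithLocation("<a><b></a>"))
```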

Querying XML with Path Operators

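The scala-xml library provides two XPath-like operators for traversing XML: \ selects matching direct children, and \\ selects matching elements at any depth. A name prefixed with @ selects an attribute instead of an element: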

val catalog = <catalog>
  <book category="programming">
    <title lang="en">Scala in Depth</title>
    <author>Joshua Suereth</author>
    <price>42.50</price>
  </book>
  <book category="programming">
    <title lang="en">Programming in Scala</title>
    <author>Martin Odersky</author>
    <author>Lex Spoon</author>
    <price>54.99</price>
  </book>
  <book category="fiction">
    <title lang="en">The Pragmatic Programmer</title>
    <author>David Thomas</author>
    <price>39.99</price>
  </book>
</catalog>

// Single-level search with \
val books = catalog \ "book"
println(s"Found ${books.length} books")

// Deep search with \\
val allAuthors = catalog \\ "author"
allAuthors.foreach(a => println(a.text))

// Filter by attribute
val programmingBooks = (catalog \ "book").filter(b => 
  (b \ "@category").text == "programming"
)

// Complex queries
val expensiveBooks = (catalog \ "book").filter { book =>
  (book \ "price").text.toDouble > 40.0
}

expensiveBooks.foreach { book =>
  val title = (book \ "title").text
  val price = (book \ "price").text
  println(s"$title: $$$price")
}
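
The queries above call toDouble directly, which throws if a price is missing or not numeric. A more defensive variant is sketched below. It assumes Scala 2.13 or later for toDoubleOption; priceOf, categoryOf, and the sample document are illustrative names and data, not part of the catalog above.

```scala
import scala.xml._

// Illustrative data: the second book has no category and a non-numeric price.
val sample: Elem = XML.loadString(
  """<catalog>
    |  <book category="programming"><title>A</title><price>42.50</price></book>
    |  <book><title>B</title><price>n/a</price></book>
    |</catalog>""".stripMargin
)

// None when the <price> element is absent or its text is not a number.
def priceOf(book: Node): Option[Double] =
  (book \ "price").headOption.flatMap(_.text.trim.toDoubleOption)

// \@ returns an empty string for a missing attribute, so map that to None.
def categoryOf(book: Node): Option[String] =
  Option(book \@ "category").filter(_.nonEmpty)

val sampleBooks = sample \ "book"
sampleBooks.foreach { b =>
  println(s"${(b \ "title").text}: price=${priceOf(b)}, category=${categoryOf(b)}")
}
```

Returning Option pushes the decision about missing data to the caller instead of failing in the middle of a filter.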

Pattern matching on XML nodes:

def processBook(book: Node): String = book match {
  case <book>{_*}</book> =>  // matches on the label only; attributes are ignored
    val title = (book \ "title").text
    val authors = (book \ "author").map(_.text).mkString(", ")
    s"$title by $authors"
  case _ => "Invalid book format"
}

(catalog \ "book").foreach(b => println(processBook(b)))
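
Pattern matching also works on the node classes themselves, which avoids XML literal patterns entirely. The following sketch uses a type pattern for elements and the Text extractor for character data; describe is a hypothetical function name.

```scala
import scala.xml._

// Hypothetical helper that classifies a node by its runtime type.
def describe(n: Node): String = n match {
  case e: Elem if e.label == "book" =>
    // Count only element children, skipping whitespace text nodes.
    s"book with ${e.child.count(_.isInstanceOf[Elem])} child elements"
  case Text(t) if t.trim.isEmpty =>
    "whitespace"
  case Text(t) =>
    s"text: ${t.trim}"
  case other =>
    s"other node: ${other.label}"
}

val parsedBook = XML.loadString("<book><title>T</title><author>A</author></book>")
println(describe(parsedBook))
println(describe(Text("  hello ")))
```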

Transforming XML Documents

A RuleTransformer applies one or more RewriteRule instances to every node of a tree and returns a new document, leaving the original unchanged:

import scala.xml.transform._

// Update all prices with 10% discount
object DiscountTransformer extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case elem @ <price>{price}</price> =>
      val newPrice = price.text.toDouble * 0.9
      <price>{f"$newPrice%.2f"}</price>
    case other => other
  }
}

val transformer = new RuleTransformer(DiscountTransformer)
val discountedCatalog = transformer(catalog)

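// Add attributes
// Note: counter is mutable state on a singleton object, so numbering
// continues from the last value if the rule is applied a second time.
object AddIdTransformer extends RewriteRule {
  var counter = 0

  override def transform(n: Node): Seq[Node] = n match {
    // A type pattern gives us an Elem directly, so no cast is needed
    case elem: Elem if elem.label == "book" =>
      counter += 1
      elem % new UnprefixedAttribute("id", s"book-$counter", Null)
    case other => other
  }
}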

val catalogWithIds = new RuleTransformer(AddIdTransformer)(catalog)

// Remove elements
object RemoveFictionTransformer extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case elem @ <book>{_*}</book> 
      if (elem \ "@category").text == "fiction" => NodeSeq.Empty
    case other => other
  }
}

val programmingOnly = new RuleTransformer(RemoveFictionTransformer)(catalog)
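
RuleTransformer accepts several rules at once, so small single-purpose rules can be combined rather than merged into one large match. The sketch below is one way to do that without mutable state; the rule names and the input document are illustrative.

```scala
import scala.xml._
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Rule 1: rename <author> to <writer>, keeping attributes and children.
val renameAuthor = new RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case e: Elem if e.label == "author" => e.copy(label = "writer")
    case other                          => other
  }
}

// Rule 2: drop <price> elements entirely.
val dropPrice = new RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case e: Elem if e.label == "price" => NodeSeq.Empty
    case other                         => other
  }
}

val original = XML.loadString(
  "<book><title>T</title><author>A</author><price>1.00</price></book>"
)

// Both rules are applied to every node of the tree.
val combined = new RuleTransformer(renameAuthor, dropPrice)
val rewritten = combined(original)

println(rewritten)
```

Because each rule returns new nodes, original is untouched and can still be queried after the transformation.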

Creating XML Programmatically

Scala expressions embedded in braces let you generate XML from data:

def createBook(isbn: String, title: String, authors: Seq[String], price: Double): Elem = {
  <book isbn={isbn}>
    <title>{title}</title>
    {authors.map(author => <author>{author}</author>)}
    <price>{price}</price>
  </book>
}

val newBook = createBook(
  "978-1617290657",
  "Functional Programming in Scala",
  Seq("Paul Chiusano", "Runar Bjarnason"),
  44.99
)

// Build complex structures
case class Book(isbn: String, title: String, authors: List[String], price: Double)

def booksToXml(books: List[Book]): Elem = {
  <catalog>
    {books.map { book =>
      <book isbn={book.isbn}>
        <title>{book.title}</title>
        {book.authors.map(a => <author>{a}</author>)}
        <price>{book.price}</price>
      </book>
    }}
  </catalog>
}

val bookList = List(
  Book("123", "Book One", List("Author A"), 29.99),
  Book("456", "Book Two", List("Author B", "Author C"), 34.99)
)

val xmlCatalog = booksToXml(bookList)
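
Values spliced in with braces are escaped when the element is serialized, so user-supplied strings containing characters such as & or < do not break the document. The sketch below checks this with a round trip; risky and titleElem are illustrative names.

```scala
import scala.xml._

// A string containing characters that are special in XML.
val risky = "Tom & Jerry <uncut>"

// The value is inserted as text, not parsed as markup.
val titleElem: Elem = <title>{risky}</title>

// Serialization escapes the special characters...
val serialized = titleElem.toString
println(serialized)

// ...and parsing the serialized form gives the original string back.
val roundTripped = XML.loadString(serialized).text
println(roundTripped == risky)
```

This is the main practical advantage over assembling XML with string concatenation, where escaping has to be done by hand.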

Writing XML to Files

Persist XML documents with proper formatting:

import java.io.PrintWriter

// Basic file writing
def saveXml(elem: Elem, filename: String): Unit = {
  XML.save(filename, elem, "UTF-8", xmlDecl = true, doctype = null)
}

saveXml(catalog, "output.xml")

// Custom formatting with PrettyPrinter
val printer = new PrettyPrinter(80, 2)
val formatted = printer.format(catalog)

val writer = new PrintWriter("formatted.xml", "UTF-8")  // explicit charset instead of the platform default
writer.write(formatted)
writer.close()

// Writing with specific encoding and declaration
def saveWithOptions(elem: Elem, filename: String): Unit = {
  val writer = new PrintWriter(filename, "UTF-8")
  writer.write("""<?xml version="1.0" encoding="UTF-8"?>""")
  writer.write("\n")
  writer.write(printer.format(elem))
  writer.close()
}
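
A quick way to confirm that encoding and declaration settings behave as intended is to write a document and read it back. This is a minimal sketch, assuming a writable temporary directory; the file prefix and value names are arbitrary.

```scala
import java.nio.file.Files
import scala.xml._

// A document with a non-ASCII character to exercise the encoding.
val doc: Elem = XML.loadString("<catalog><book><title>Caf\u00e9</title></book></catalog>")

// Write to a temporary file with an XML declaration and UTF-8 encoding.
val tmp = Files.createTempFile("catalog", ".xml")
XML.save(tmp.toString, doc, "UTF-8", xmlDecl = true, doctype = null)

// Read it back and compare the extracted text.
val reloaded: Elem = XML.loadFile(tmp.toFile)
val titleAfterRoundTrip = (reloaded \\ "title").text
println(titleAfterRoundTrip)

Files.deleteIfExists(tmp)
```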

Real-World Example: REST API Response Parser

A small parser that converts an XML API response into case classes:

import scala.xml._
import scala.util.Try

case class ApiBook(id: String, title: String, authors: List[String], available: Boolean)

object BookApiParser {
  def parseResponse(xmlResponse: String): Either[String, List[ApiBook]] = {
    Try {
      val xml = XML.loadString(xmlResponse)
      (xml \\ "book").map { book =>
        ApiBook(
          id = (book \ "@id").text,
          title = (book \ "title").text,
          authors = (book \ "author").map(_.text).toList,
          available = (book \ "available").text.toBoolean
        )
      }.toList
    }.toEither.left.map(ex => s"Parse error: ${ex.getMessage}")
  }
  
  def filterAvailable(books: List[ApiBook]): List[ApiBook] = 
    books.filter(_.available)
  
  def groupByAuthor(books: List[ApiBook]): Map[String, List[ApiBook]] = 
    books.flatMap(book => book.authors.map(_ -> book))
         .groupBy(_._1)
         .view.mapValues(_.map(_._2)).toMap
}

// Usage
val apiResponse = """<?xml version="1.0"?>
<response>
  <book id="1">
    <title>Scala Cookbook</title>
    <author>Alvin Alexander</author>
    <available>true</available>
  </book>
  <book id="2">
    <title>Functional Design</title>
    <author>Robert Martin</author>
    <available>false</available>
  </book>
</response>"""

BookApiParser.parseResponse(apiResponse) match {
  case Right(books) =>
    val available = BookApiParser.filterAvailable(books)
    println(s"Available books: ${available.map(_.title).mkString(", ")}")
  case Left(error) => println(error)
}
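
The parser above relies on Try to catch a bad boolean, and a missing id or title silently becomes an empty string. An alternative is to validate each field and report which one failed. The sketch below assumes Scala 2.13 or later for toBooleanOption; ParsedBook and parseBook are hypothetical names, and ParsedBook is a simplified stand-in for ApiBook.

```scala
import scala.xml._

// Simplified stand-in for ApiBook, so that this sketch is self-contained.
final case class ParsedBook(id: String, title: String, available: Boolean)

// Validates each field in turn and stops at the first problem.
def parseBook(node: Node): Either[String, ParsedBook] =
  for {
    id <- Option(node \@ "id").filter(_.nonEmpty)
            .toRight("book is missing an id attribute")
    title <- (node \ "title").headOption.map(_.text.trim).filter(_.nonEmpty)
               .toRight(s"book $id has no title")
    available <- (node \ "available").text.trim.toBooleanOption
                   .toRight(s"book $id has an invalid <available> value")
  } yield ParsedBook(id, title, available)

val goodNode = XML.loadString("""<book id="1"><title>T</title><available>true</available></book>""")
val badNode  = XML.loadString("""<book id="2"><title>T</title><available>maybe</available></book>""")

println(parseBook(goodNode))
println(parseBook(badNode))
```

The for-comprehension short-circuits on the first Left, so the error message names the specific field that needs fixing.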

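This example combines the techniques covered so far: parsing inside Try, extracting fields with \ and \\, converting failures to Either, and post-processing the results with ordinary collection operations. Keep in mind that the path operators return an empty result rather than failing when an element or attribute is missing, so production code should validate required fields explicitly.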
