XML has long since become the lingua franca of machine-to-machine communication on the Internet. The format’s combination of human readability, standardization, and tool support has made working with XML an inevitability for programmers. Yet, writing code that deals in XML is an unpleasant chore in most programming languages. Scala improves this situation.
As with the Actor functionality we learned about in Chapter 9, Scala’s XML support is implemented partly as a library, with some built-in syntax support. It feels to the programmer like an entirely natural part of the language. Convenient operators add a spoonful of syntactic sugar to the task of diving deep into complex document structures, and pattern matching further sweetens the deal. Outputting XML is just as pleasant.
Unusual in programming languages and particularly handy, Scala allows inline XML. Most anywhere you might put a string, you can put XML. This feature makes templating and configuration a breeze, and lets us test our use of XML without so much as opening a file.
Let’s explore working with XML in Scala. First, we’ll look at reading and navigating an XML document. Then, we’ll produce XML output programmatically and demonstrate uses for inline XML.
We’ll start with the basics: how to turn a string full of XML into a data structure we can work with:
// code-examples/XML/reading/from-string-script.scala

import scala.xml._

val someXMLInAString = """
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>
"""

val someXML = XML.loadString(someXMLInAString)

assert(someXML.isInstanceOf[scala.xml.Elem])
All fine and well. We’ve transformed the string into a NodeSeq, Scala’s type for storing a sequence of XML nodes (more precisely, an Elem, which is a kind of NodeSeq). Were our XML document in a file on disk, we could have used the loadFile method of the same XML object.

Since we’re supplying the XML ourselves, we can skip the XML.loadString step and just assign a chunk of markup to a val or var:
// code-examples/XML/reading/inline-script.scala

import scala.xml._

val someXML =
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>

assert(someXML.isInstanceOf[scala.xml.Elem])
If we paste the previous example into the interpreter, we can explore our sandwich using some handy tools provided by NodeSeq:

scala> someXML \ "bread"
res2: scala.xml.NodeSeq = <bread>wheat</bread>

That backslash, which the documentation calls a projection function, says, “Find me child elements named bread.” We’ll always get a NodeSeq back when using a projection function. If we’re only interested in what’s between the tags, we can use the text method:

scala> (someXML \ "bread").text
res3: String = wheat

It’s valid syntax to say someXML \ "bread" text, without parentheses or the dot before the call to text. You’ll still get the same result, but it’s harder to read. Parentheses make your intent clear.
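As a minimal sketch of the same two tools on a different document (the lunch markup and the value names here are made up for illustration, and we assume the scala-xml library is on the classpath), note that asking for a name that matches nothing simply yields an empty NodeSeq:

```scala
import scala.xml._

// A small, hypothetical document used only for this sketch.
val lunch = <lunch><drink>water</drink><side>chips</side></lunch>

// \ returns a NodeSeq even when exactly one child element matches.
val drink: NodeSeq = lunch \ "drink"

// text drops the tags and keeps only the character data.
val drinkName: String = drink.text

// No child is named "dessert", so we get an empty NodeSeq, not an exception.
val dessert: NodeSeq = lunch \ "dessert"

println(drinkName)        // water
println(dessert.isEmpty)  // true
```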
We’ve only inspected the outermost layer of our sandwich. Let’s try to get a NodeSeq of the condiments:

scala> someXML \ "condiment"
res4: scala.xml.NodeSeq =

What went wrong? The \ function only looks at direct children; it doesn’t descend into deeper levels of an XML structure. To do that, we use its sister function, \\ (two backslashes):

scala> someXML \\ "condiment"
res5: scala.xml.NodeSeq = <condiment expired="true">mayo</condiment>
  <condiment expired="false">mustard</condiment>

Much better. (We split the single output line into two lines so it would fit on the page.) We dove into the structure and pulled out the two <condiment> elements. Looks like one of the condiments has gone bad, though. We can find out if any of the condiments has expired by extracting its expired attribute. All it takes is an @ before the attribute name:

scala> (someXML \\ "condiment")(0) \ "@expired"
res6: scala.xml.NodeSeq = true

We used the (0) to pick the first of the two condiments that were returned by (someXML \\ "condiment").
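To make the difference between the two projection functions concrete, here is a small sketch on a hypothetical order document (the element names, the sku attribute, and the value names are ours, not part of the sandwich example). It reads the attribute from one node at a time, as the (0) example above does:

```scala
import scala.xml._

// Hypothetical document: one <item> is a direct child, the other is nested.
val order =
  <order>
    <item sku="a1">tea</item>
    <extras>
      <item sku="b2">honey</item>
    </extras>
  </order>

val direct = order \ "item"    // direct children only
val all    = order \\ "item"   // descendants at any depth

// Pull the sku attribute out of each matching element in turn.
val skus = all.map(item => (item \ "@sku").text).mkString(", ")

println(direct.length)  // 1
println(all.length)     // 2
println(skus)           // a1, b2
```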
The previous bit of code extracted the value of the expired attribute (true, in this case), but it didn’t tell us which condiment is expired. If we were handed an arbitrary XML sandwich, how would we identify the expired condiments? We can loop through the XML:
// code-examples/XML/reading/for-loop-script.scala

for (condiment <- (someXML \\ "condiment")) {
  if ((condiment \ "@expired").text == "true")
    println("the " + condiment.text + " has expired!")
}
Because NodeSeq inherits the same familiar attributes that most Scala collection types carry, tools like for loops apply directly. In the example just shown, we extract the <condiment> nodes, loop over each of them, and test whether or not their expired attribute equals the string "true". We have to specify that we want the text of a given condiment; otherwise, we’d get a string representation of the entire line of XML.
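Since NodeSeq behaves like a collection, the same question can also be answered with filter and map rather than an explicit loop. The following is only a sketch on a made-up pantry document (the element and value names are assumptions for illustration):

```scala
import scala.xml._

// Hypothetical document with the same expired="..." convention.
val pantry =
  <pantry>
    <jar expired="true">jam</jar>
    <jar expired="false">honey</jar>
    <jar expired="true">salsa</jar>
  </pantry>

// Keep only the jars flagged as expired, then reduce each to its text.
val expired = (pantry \ "jar")
  .filter(jar => (jar \ "@expired").text == "true")
  .map(_.text)
  .toList

println(expired.mkString(", "))  // jam, salsa
```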
We can also use pattern matching on XML structures. Cases in pattern matches can be written in terms of XML literals; expressions between curly braces ({}) escape back to standard Scala pattern matching syntax. To match all XML nodes in the escaped portion of a pattern match, use an underscore (wildcard) followed by an asterisk (_*). To bind what you’ve matched on to a variable, prefix the match with the variable name and an @ sign.
Let’s put all that together into one example. We’ll include the original XML document again so you can follow along as we pattern match on XML:
// code-examples/XML/reading/pattern-matching-script.scala

import scala.xml._

val someXML =
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>

someXML match {
  case <sammich>{ingredients @ _*}</sammich> => {
    for (cond @ <condiments>{_*}</condiments> <- ingredients)
      println("condiments: " + cond.text)
  }
}
Here, we bind the contents of our <sammich> structure (that is, what’s inside the opening and closing tag) to a variable called ingredients. Then, as we iterate through the ingredients in a for loop, we assign the elements that are between the <condiments> tags to a temporary variable, cond. Each cond is printed.
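The same pattern syntax works in an ordinary match with several cases. As a rough sketch (the describe function is hypothetical, written only to show the three kinds of pattern side by side), a single name in braces binds one child node, while name @ _* binds any number of them:

```scala
import scala.xml._

// Hypothetical helper: summarize a node according to its shape.
def describe(node: Node): String = node match {
  // {kind} binds the single child node found between the tags.
  case <bread>{kind}</bread> => "bread: " + kind.text
  // items @ _* binds every child node, however many there are.
  case <condiments>{items @ _*}</condiments> =>
    "condiments: " + items.map(_.text).mkString(", ")
  // Anything else falls through to an ordinary variable pattern.
  case other => "something else: " + other.label
}

val breadLine = describe(<bread>rye</bread>)
val condimentLine =
  describe(<condiments><condiment>mayo</condiment><condiment>relish</condiment></condiments>)
val otherLine = describe(<meat>ham</meat>)

println(breadLine)      // bread: rye
println(condimentLine)  // condiments: mayo, relish
println(otherLine)      // something else: meat
```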
The same tools that let us easily manipulate complex data structures in Scala are readily available for XML processing. As a readable alternative to XSLT, Scala’s XML library makes reading and parsing XML a breeze. It also gives us equally powerful tools for writing XML, which we’ll explore in the next section.
While some languages construct XML through complex object serialization mechanisms, Scala’s support for XML literals makes writing XML far simpler. Essentially, when you want XML, just write XML. To interpolate variables and expressions, escape out to Scala with curly braces, as we did in the pattern matching examples earlier:
scala> var name = "Bob"
name: java.lang.String = Bob

scala> val bobXML =
     | <person>
     |   <name>{name}</name>
     | </person>
bobXML: scala.xml.Elem =
<person>
  <name>Bob</name>
</person>
As we can see, the name variable was substituted when we constructed the XML document assigned to bobXML. That evaluation only occurs once; were name subsequently redefined, the <name> element of bobXML would still contain the string “Bob”.
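A short sketch makes the one-time evaluation visible. The greeting variable and the <note> element are invented for this illustration:

```scala
import scala.xml._

var greeting = "hello"

// The braces are evaluated here, when the element is constructed.
val note = <note>{greeting}</note>

// Reassigning the var afterwards does not touch the element built above.
greeting = "goodbye"

// A literal evaluated now picks up the new value.
val laterNote = <note>{greeting}</note>

println(note.text)       // hello
println(laterNote.text)  // goodbye
```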
For a more complete example, let’s say we’re designing that favorite latter-day “hello world,” a blogging system. We’ll start with a class to represent an Atom-friendly blog post:
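// code-examples/XML/writing/post.scala

import java.text.SimpleDateFormat
import java.util.Date

class Post(val title: String, val body: String, val updated: Date) {
  // Date portion of the tag URI; the tag scheme expects a four-digit year.
  lazy val dashedDate = {
    val dashed = new SimpleDateFormat("yyyy-MM-dd")
    dashed.format(updated)
  }

  // RFC 3339 timestamp with zero-padded, 24-hour time.
  // The UTC offset is hard-coded for this example.
  lazy val atomDate = {
    val rfc3339 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'-05:00'")
    rfc3339.format(updated)
  }

  // URL-friendly title: every non-word character becomes a dash.
  lazy val slug = title.toLowerCase.replaceAll("\\W", "-")

  lazy val atomId = "tag:example.com," + dashedDate + ":/" + slug
}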
Beyond the obvious title and body attributes, we’ve defined several lazily evaluated values in our Post class. These attributes will come in handy when we transmute our posts into an Atom feed, the standard way to syndicate blogs between computers on the Web. Atom documents are a flavor of XML, and a perfect application for demonstrating the process of outputting XML with Scala.
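To see what the two building blocks behind those values produce, here is a standalone sketch that does not use the Post class itself. The sample title is made up, and we pin the formatter to UTC (an assumption of this sketch, not something the Post class does) so that the result does not depend on the machine’s time zone:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// An RFC 3339-style timestamp, fixed to UTC for a repeatable result.
val utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'")
utc.setTimeZone(TimeZone.getTimeZone("UTC"))
val epoch = utc.format(new Date(0L))

// \W matches one non-word character at a time, so runs of punctuation
// and spaces turn into runs of dashes.
val sampleSlug = "Hello, XML World!".toLowerCase.replaceAll("\\W", "-")

println(epoch)       // 1970-01-01T00:00:00Z
println(sampleSlug)  // hello--xml-world-
```

If doubled or trailing dashes are unwelcome in a real slug, a pattern such as "\\W+" would collapse each run into a single dash.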
We’ll define an AtomFeed class that takes a sequence of Post objects as its sole argument:
// code-examples/XML/writing/atom-feed.scala

import scala.xml.XML

class AtomFeed(posts: Seq[Post]) {
  val feed =
  <feed xmlns="">
    <title>My Blog</title>
    <subtitle>A fancy subtitle.</subtitle>
    <link href=""/>
    <link href="" rel="self"/>
    <updated>{posts(0).atomDate}</updated>
    <author>
      <name>John Doe</name>
      <uri></uri>
    </author>
    <id></id>
    {for (post <- posts) yield
    <entry>
      <title>{post.title}</title>
      <link href={"" + post.slug + ".html"} rel="alternate"/>
      <id>{post.atomId}</id>
      <updated>{post.atomDate}</updated>
      <content type="html">{post.body}</content>
      <author>
        <name>John Doe</name>
        <uri></uri>
      </author>
    </entry>
    }
  </feed>

  def write = XML.saveFull("/tmp/atom-example.xml", feed, "UTF-8", true, null)
}
We’re making heavy use of the ability to escape out to Scala expressions in this example. Whenever we need a piece of dynamic information (for example, the date of the first post in the sequence, formatted for the Atom standard), we simply escape out and write Scala as we normally would. In the latter half of the <feed> element, we use a for comprehension to yield successive blocks of dynamically formatted XML.
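Stripped of the Atom details, the technique looks like the following sketch. The toppings list and element names are invented for illustration; the point is that a for comprehension inside braces contributes one child element per item, and that an attribute value can be computed the same way:

```scala
import scala.xml._

val toppings = List("lettuce", "tomato", "onion")

// Each yielded <topping> becomes a child of <toppings>;
// the count attribute is a Scala expression, too.
val menu =
  <toppings count={toppings.length.toString}>
    {for (t <- toppings) yield <topping>{t}</topping>}
  </toppings>

val names = (menu \ "topping").map(_.text).mkString(", ")
val count = (menu \ "@count").text

println(names)  // lettuce, tomato, onion
println(count)  // 3
```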
The write method of AtomFeed demonstrates the use of the saveFull method, provided by the scala.xml library. saveFull writes an XML document to disk, optionally in different encoding schemes and with different document type declarations. Alternatively, the save method within the same package will make use of any java.io.Writer variant, should you need buffering, piping, etc.
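Method names in this part of the library have shifted between versions, so treat the following as a sketch rather than a reference. It assumes a recent scala-xml release, where the Writer-based entry point is XML.write, and it writes to an in-memory StringWriter so that nothing touches the disk:

```scala
import java.io.StringWriter
import scala.xml._

val doc = <person><name>Bob</name></person>

val out = new StringWriter

// Arguments: writer, node, encoding, emit an XML declaration?, DOCTYPE.
// Passing null for the DOCTYPE means none is written.
XML.write(out, doc, "UTF-8", true, null)

val serialized = out.toString
println(serialized)
```

Any other java.io.Writer, such as a BufferedWriter wrapped around a file or socket stream, can stand in for the StringWriter.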
Writing XML with Scala is straightforward: construct the document you need with inline XML, use interpolation where dynamic content is to be substituted, and make use of the handy convenience methods to write your completed documents to disk or to other output streams.
XML has become ubiquitous in software applications, yet few languages make working with XML a simple task. We learned how Scala accelerates XML development by making it easy to read and write XML.
In the next chapter, we’ll learn how Scala provides rich support for creating your own Domain-Specific Languages (DSLs).