The eXtensible Markup Language, or simply XML, is a widely used standard data format for exchanging information among computer systems. The first two recipes of this chapter show how to parse XML using Groovy. There are two parsers available in the groovy.util package, XmlParser and XmlSlurper. They expose a similar API, but each one is better suited to certain use cases. In this recipe, we look at how to read XML with XmlSlurper and at its main peculiarities.
For the examples in the rest of this recipe, we will work with an XML document (shown in the following code) containing a list of works by William Shakespeare. The document is named shakespeare.xml:
    <?xml version="1.0" ?>
    <bib:bibliography xmlns:bib="http://bibliography.org"
                      xmlns:lit="http://literature.org">
      <bib:author>William Shakespeare</bib:author>
      <lit:play>
        <lit:year>1589</lit:year>
        <lit:title>The Two Gentlemen of Verona.</lit:title>
      </lit:play>
      <lit:play>
        <lit:year>1594</lit:year>
        <lit:title>Love's Labour's Lost.</lit:title>
      </lit:play>
      <lit:play>
        <lit:year>1594</lit:year>
        <lit:title>Romeo and Juliet.</lit:title>
      </lit:play>
      <lit:play>
        <lit:year>1595</lit:year>
        <lit:title>A Midsummer-Night's Dream.</lit:title>
      </lit:play>
    </bib:bibliography>
Let's go through the process of parsing the previously mentioned XML file:
The first step in using XmlSlurper is to create an instance of the class and pass a java.io.File object, which references the file we want to read, to the parse method:
    def xmlSource = new File('shakespeare.xml')
    def bibliography = new XmlSlurper().parse(xmlSource)
The parse method returns an implementation of groovy.util.slurpersupport.GPathResult, which can be used to navigate the XML element tree. For example, the following code prints the text representation of the author element:
    println bibliography.author
Child elements are accessed with the "." operator. In addition, a set of finder and iterator methods is available to build complex search expressions:
    bibliography.play
        .findAll { it.year.toInteger() > 1592 }
        .each { println it.title }
The expressions that are used to navigate (and eventually also modify) the XML tree are referred to as GPath expressions. More examples of those expressions can be found in the Searching in XML with GPath recipe.
Running the script produces the following output:
    William Shakespeare
    Love's Labour's Lost.
    Romeo and Juliet.
    A Midsummer-Night's Dream.
The previous example selects all the plays written after 1592 and prints their titles.
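The steps above can also be tried without a file on disk. The following is a minimal sketch, assuming a Groovy version in which XmlSlurper is available without an import (on Groovy 4 and later, the commented import line is needed). It feeds the same document to parseText as an inline string and collects the matching titles into a list instead of printing them. The variable names xml and titles are illustrative only.

```groovy
// import groovy.xml.XmlSlurper   // assumption: uncomment on Groovy 4+, where groovy.util.XmlSlurper was removed

// Same content as shakespeare.xml, inlined so the script is self-contained
def xml = '''<bib:bibliography xmlns:bib="http://bibliography.org"
                  xmlns:lit="http://literature.org">
  <bib:author>William Shakespeare</bib:author>
  <lit:play>
    <lit:year>1589</lit:year>
    <lit:title>The Two Gentlemen of Verona.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1594</lit:year>
    <lit:title>Love's Labour's Lost.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1594</lit:year>
    <lit:title>Romeo and Juliet.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1595</lit:year>
    <lit:title>A Midsummer-Night's Dream.</lit:title>
  </lit:play>
</bib:bibliography>'''

// parseText accepts the document as a String and returns a GPathResult
def bibliography = new XmlSlurper().parseText(xml)

// Filter the plays by year, then turn each matching play into its title text
def titles = bibliography.play
    .findAll { it.year.toInteger() > 1592 }
    .collect { it.title.text() }

assert bibliography.author.text() == 'William Shakespeare'
assert titles == ["Love's Labour's Lost.", 'Romeo and Juliet.', "A Midsummer-Night's Dream."]
```

Note that the GPath expressions use the local element names (author, play, title) without the bib: and lit: prefixes that appear in the document.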
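Groovy's XmlSlurper resides in the groovy.util package, which is imported automatically by Groovy. That's why we do not need an import statement for that class. In Groovy 3 and later the class is also provided as groovy.xml.XmlSlurper, and from Groovy 4 onward only that variant remains, so an explicit import is required there.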
XmlSlurper is a SAX-based parser; it loads the full document in memory, but it doesn't require extra memory to process the document using GPath. GPath expressions are lazily evaluated, and no extra objects are created when evaluating the expression. XmlSlurper is also null-safe: accessing an attribute that doesn't exist yields an empty result whose text is an empty string rather than a null, and the same goes for a non-existing node.
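The null-safe behavior described above can be illustrated with a short sketch. The one-element document, the id attribute, and the names subtitle and lang are hypothetical and were chosen only for this illustration; the same assumption about the XmlSlurper import applies as elsewhere in this recipe.

```groovy
// import groovy.xml.XmlSlurper   // assumption: uncomment on Groovy 4+

// Hypothetical one-element document used only for this illustration
def play = new XmlSlurper().parseText(
    '<play id="1"><title>Romeo and Juliet.</title></play>')

// Existing element and attribute
assert play.title.text() == 'Romeo and Juliet.'
assert play.@id.text() == '1'

// Missing element: no exception and no null, just an empty result
assert play.subtitle.isEmpty()
assert play.subtitle.text() == ''

// Missing attribute: same behavior
assert play.@lang.text() == ''

// Navigating further from a missing node is safe as well
assert play.subtitle.author.text() == ''
```

A practical consequence is that a mistyped element name does not fail loudly, so checking isEmpty() or size() can be worthwhile when the presence of a node matters.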
As a rule of thumb, use XmlSlurper when you intend to process only a small part of the document, and XmlParser, which is more efficient in that case, when you have to process the whole XML.
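For comparison, here is a minimal sketch of the same kind of query with XmlParser. It assumes a simplified, namespace-free version of the document (an assumption made here only to keep element lookup short) and, as before, a Groovy version where the parser needs no import. XmlParser returns a tree of groovy.util.Node objects, so the query works on ordinary lists.

```groovy
// import groovy.xml.XmlParser   // assumption: uncomment on Groovy 4+

// Simplified, namespace-free variant of the sample document (illustration only)
def xml = '''<bibliography>
  <author>William Shakespeare</author>
  <play><year>1589</year><title>The Two Gentlemen of Verona.</title></play>
  <play><year>1594</year><title>Romeo and Juliet.</title></play>
</bibliography>'''

// XmlParser builds the whole Node tree up front
def root = new XmlParser().parseText(xml)
assert root instanceof Node

// Child lookup returns a NodeList, which is a regular java.util.List
def plays = root.play
assert plays instanceof List
assert plays.size() == 2

// Values are read through text(); there is no toInteger() shortcut on the node list
def titles = plays
    .findAll { it.year.text().toInteger() > 1592 }
    .collect { it.title.text() }

assert titles == ['Romeo and Juliet.']
```

The expressions look almost identical to the XmlSlurper version; the difference lies in what they return and when the work is done.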