In the previous recipe, Reading XML using XmlSlurper, we learned how to read an XML document using the XmlSlurper class provided by Groovy. Now it's time to look at the other parser available in Groovy, groovy.util.XmlParser. Its internal implementation differs from that of groovy.util.XmlSlurper, but it exposes a very similar API for document parsing, navigation, and modification.
In this recipe, we will cover the essential usage scenarios for the XmlParser class and its differences from XmlSlurper.
Let's use the same shakespeare.xml file we used in the Reading XML using XmlSlurper recipe.
1. The first step is the same as with XmlSlurper. You need to create an instance of XmlParser and pass a file reference to its parse method as shown:

    def xmlSource = new File('shakespeare.xml')
    def bibliography = new XmlParser().parse(xmlSource)
2. As with XmlSlurper, GPath expressions (see the Searching in XML with GPath recipe for more advanced examples) are also possible with XmlParser. For example, the code to print the author's name followed by the titles of all plays written after 1592 would be as follows:

    println bibliography.'bib:author'.text()
    bibliography.'lit:play'
      .findAll { it.'lit:year'.text().toInteger() > 1592 }
      .each { println it.'lit:title'.text() }

The code yields the following output:

    William Shakespeare
    Love's Labour's Lost.
    Romeo and Juliet.
    A Midsummer-Night's Dream.
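To try the steps above without the shakespeare.xml file at hand, the same query can be run against an inline document with parseText. The following is only a sketch: the sample XML, its namespace URIs, and the years are hypothetical stand-ins and may not match the real file, and the import line assumes Groovy 3 or later, where XmlParser also lives in the groovy.xml package.

```groovy
// Assumption: Groovy 3+. On Groovy 2.x remove this import, because
// groovy.util.XmlParser is available there without one.
import groovy.xml.XmlParser

// Hypothetical sample data; the real shakespeare.xml may differ.
def xml = '''<bib:bibliography xmlns:bib="http://example.org/bib"
                  xmlns:lit="http://example.org/lit">
  <bib:author>William Shakespeare</bib:author>
  <lit:play>
    <lit:year>1589</lit:year>
    <lit:title>The Two Gentlemen of Verona.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1595</lit:year>
    <lit:title>Romeo and Juliet.</lit:title>
  </lit:play>
  <lit:play>
    <lit:year>1600</lit:year>
    <lit:title>Hamlet.</lit:title>
  </lit:play>
</bib:bibliography>'''

// parseText takes the document as a String instead of a File
def bibliography = new XmlParser().parseText(xml)

// Same GPath navigation as in step 2, collecting results instead of printing
def author = bibliography.'bib:author'.text()
def laterTitles = bibliography.'lit:play'
    .findAll { it.'lit:year'.text().toInteger() > 1592 }
    .collect { it.'lit:title'.text() }

println author
laterTitles.each { println it }
```

Collecting the titles into a list before printing makes the result easy to reuse or check in a test.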
Navigating XML data with XmlParser is slightly different from XmlSlurper. To find an element, you need to use its fully qualified name (FQN), including the exact prefix. Since our XML example uses the bib: prefix for the author element, we need to refer to the author's data as bib:author (or as *:author to be independent of the prefix).

If the elements in our XML example were not namespace-qualified, we could refer to them much as we did with XmlSlurper, for example, bibliography.author.
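The three ways of addressing an element described above can be compared side by side. This is a minimal sketch with hypothetical one-element documents (the namespace URI is made up), and the import assumes Groovy 3 or later.

```groovy
// Assumption: Groovy 3+. On Groovy 2.x remove this import.
import groovy.xml.XmlParser

// Hypothetical document with a namespace prefix
def prefixed = new XmlParser().parseText('''
<bib:bibliography xmlns:bib="http://example.org/bib">
  <bib:author>William Shakespeare</bib:author>
</bib:bibliography>''')

def byPrefix    = prefixed.'bib:author'.text()  // exact prefix
def byWildcard  = prefixed.'*:author'.text()    // any prefix
def byLocalName = prefixed.author               // local name alone finds nothing

// Hypothetical document without namespaces: the plain name is enough
def plain = new XmlParser().parseText(
    '<bibliography><author>William Shakespeare</author></bibliography>')
def plainAuthor = plain.author.text()

println byPrefix
println byWildcard
println byLocalName.size()
println plainAuthor
```

The wildcard form is handy when the prefix is chosen by whoever produces the document and may change between files.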
In step 2, you may have noticed that we used the text method to get the textual content of the author element. That's because XmlParser returns instances of groovy.util.Node, whose toString method does not return the element's textual content. There is also the attribute method, which accepts a name and returns the value of the given attribute. If you ask for an attribute that doesn't exist, attribute returns null (unlike XmlSlurper, which returns an empty string).
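A short sketch of the text and attribute methods follows. The play element and its year attribute are hypothetical and not taken from shakespeare.xml, and the import assumes Groovy 3 or later.

```groovy
// Assumption: Groovy 3+. On Groovy 2.x remove this import.
import groovy.xml.XmlParser

// Hypothetical element with one attribute and one child
def play = new XmlParser().parseText(
    '<play year="1595"><title>Romeo and Juliet.</title></play>')

def titleNode = play.title[0]          // first match, a groovy.util.Node
def titleText = titleNode.text()       // the element's textual content
def year      = play.attribute('year') // attribute value as a String
def missing   = play.attribute('genre') // no such attribute, so null

println titleText
println year
println missing
```

Because a missing attribute comes back as null, the safe-navigation operator (for example, play.attribute('genre')?.toUpperCase()) is a natural fit when an attribute is optional.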
The main difference between XmlParser and XmlSlurper is that the former uses the groovy.util.Node type and its GPath expressions result in lists of nodes, which are easy to manipulate using our knowledge of lists and collections. Compared to XmlSlurper, XmlParser consumes more memory because it has to create an intermediate data structure to represent the node tree, but it makes XML tree queries a bit faster. It's up to developers to decide which implementation better suits their needs.
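Since a GPath expression on a parsed tree yields a list of nodes, the usual collection methods apply directly. The sketch below uses a hypothetical two-play document without namespaces, and the import assumes Groovy 3 or later.

```groovy
// Assumption: Groovy 3+. On Groovy 2.x remove this import.
import groovy.xml.XmlParser

// Hypothetical sample data
def bibliography = new XmlParser().parseText('''
<bibliography>
  <play><year>1595</year><title>Romeo and Juliet.</title></play>
  <play><year>1589</year><title>The Two Gentlemen of Verona.</title></play>
</bibliography>''')

// The result of the GPath expression is an ordinary java.util.List of nodes
def plays = bibliography.play

// sort(false) returns a sorted copy and leaves the original list untouched
def titlesByYear = plays
    .sort(false) { it.year.text().toInteger() }
    .collect { it.title.text() }

println plays.size()
titlesByYear.each { println it }
```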