XML namespaces are, in a way, similar to Java packages: they provide an additional context for grouping a set of elements. We already noted some differences in namespace handling between the XmlParser and XmlSlurper classes in the Reading XML using XmlParser and Reading XML using XmlSlurper recipes.
In this recipe, we dig a bit deeper into the details of XML namespace support in Groovy.
Let's use the same shakespeare.xml file we used for the Reading XML using XmlParser and Reading XML using XmlSlurper recipes.
XmlParser requires you to specify an element name exactly as it appears in the parsed XML, including the prefix used in the actual XML content. This makes the code fragile, because the namespace prefixes in the code have to match those in the document.
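To see why matching on prefixes is fragile, consider the following minimal sketch. It assumes Groovy 3 or later (on Groovy 2.x, drop the import, since XmlParser lives in groovy.util there). It parses two hypothetical inline documents rather than shakespeare.xml; they differ only in the prefix bound to the http://bibliography.org namespace.

```groovy
import groovy.xml.XmlParser  // assumption: Groovy 3+; remove this import on Groovy 2.x

// Hypothetical stand-ins for shakespeare.xml.
def withBib = '''<bib:bibliography xmlns:bib="http://bibliography.org">
  <bib:author>William Shakespeare</bib:author>
</bib:bibliography>'''

// Same namespace URI, but bound to the prefix "b" instead of "bib".
def withB = '''<b:bibliography xmlns:b="http://bibliography.org">
  <b:author>William Shakespeare</b:author>
</b:bibliography>'''

def first = new XmlParser().parseText(withBib)
def second = new XmlParser().parseText(withB)

// The string-based lookup is tied to the prefix used in the document.
def hitsFirst = first.'bib:author'
def hitsSecond = second.'bib:author'

println hitsFirst.text()    // text of the matched author element
println hitsSecond.size()   // number of matches in the second document
```

Both documents are equivalent as far as XML namespaces are concerned, yet the same lookup finds the author only in the first one.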
To make XmlParser more reliable with respect to namespaces, we can resort to the groovy.xml.Namespace class, as shown in the following code:

import groovy.xml.Namespace

def xmlSource = new File('shakespeare.xml')
def bibliography = new XmlParser().parse(xmlSource)

// Bind each namespace URI to a Namespace object
def bib = new Namespace('http://bibliography.org', 'bib')
def lit = new Namespace('http://literature.org', 'lit')

// Look up nodes by qualified name
println bibliography[bib.author].text()
println bibliography[lit.play].findAll {
  it[lit.year].text().toInteger() > 1592
}.size()
XmlSlurper has a similar API for declaring the prefixes and namespaces required to navigate the nodes, shown in the following code:

def xmlSource = new File('shakespeare.xml')
def bibliography = new XmlSlurper().parse(xmlSource)

// Map the prefixes used in the queries to namespace URIs
bibliography.declareNamespace(
  bib: 'http://bibliography.org',
  lit: 'http://literature.org')

println bibliography.'bib:author'
println bibliography.'lit:play'.findAll {
  it.'lit:year'.toInteger() > 1592
}.size()
Either script prints the following output:

William Shakespeare
3
Both of the previous code snippets extract the author's name and the number of plays written after 1592 from our reference bibliography data XML document.
In the case of XmlParser, we declare two instances of the groovy.xml.Namespace class. When we use a property of a Namespace object to look up a node (for example, bibliography[bib.author]), this is what really happens:

The bib.author expression returns a value of the groovy.xml.QName type.

The bibliography[bib.author] expression is translated by Groovy into a call to the getAt method of the groovy.util.Node class. This method accepts a QName as an argument and returns a NodeList holding the child nodes whose names match the QName (the list is empty if none is found), as shown next:

def ns = bib.author
def nodes = bibliography.getAt(ns)
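The two steps above can be sketched as a self-contained script. This is a minimal illustration under stated assumptions: Groovy 3 or later (on Groovy 2.x, drop the XmlParser import), and a hypothetical inline document that binds the bibliography namespace to the prefix b rather than bib, to show that the lookup does not depend on the document's prefix.

```groovy
import groovy.xml.Namespace
import groovy.xml.XmlParser  // assumption: Groovy 3+; remove this import on Groovy 2.x

// Hypothetical document: the namespace is bound to "b", not "bib".
def xml = '''<b:bibliography xmlns:b="http://bibliography.org">
  <b:author>William Shakespeare</b:author>
</b:bibliography>'''

def bibliography = new XmlParser().parseText(xml)
def bib = new Namespace('http://bibliography.org', 'bib')

// Step 1: property access on a Namespace builds a qualified name.
def name = bib.author
println name.namespaceURI   // the URI passed to the Namespace constructor
println name.localPart      // the property name that was accessed

// Step 2: the subscript operator and an explicit getAt call are equivalent.
def viaSubscript = bibliography[name]
def viaGetAt = bibliography.getAt(name)
println viaGetAt.text()
```

Because the qualified name carries the namespace URI, the author element is found even though the code uses bib and the document uses b.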
In the case of XmlSlurper, the groovy.util.slurpersupport.GPathResult class instance (returned by the parse method) has an additional method to declare namespaces called, not surprisingly, declareNamespace.
Please note that, unlike XmlParser, the XmlSlurper implementation (or, more specifically, GPathResult) does not force you to depend on namespaces or prefixes at all. You can refer to elements and attributes using their local names, and only resort to namespace prefixes when the same local name appears under different namespaces.
If you try to use a fully qualified name (for example, bib:author) before declaring the namespace within an XmlSlurper instance, you'll get no result back. Also, namespace prefixes defined by declareNamespace do not have to match the prefixes appearing in the actual XML file.
The declareNamespace method takes a map of prefixes and namespaces. When those are defined, you can use them to reference elements using their fully qualified names.
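The XmlSlurper behavior described above can be tried out with a short sketch. It assumes Groovy 3 or later (on Groovy 2.x, drop the import) and uses a hypothetical inline document in place of shakespeare.xml. The prefixes b and l are arbitrary names chosen for illustration.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; remove this import on Groovy 2.x

// Hypothetical stand-in for shakespeare.xml.
def xml = '''<bib:bibliography xmlns:bib="http://bibliography.org"
                  xmlns:lit="http://literature.org">
  <bib:author>William Shakespeare</bib:author>
  <lit:play><lit:year>1594</lit:year></lit:play>
</bib:bibliography>'''

def bibliography = new XmlSlurper().parseText(xml)

// Local names work without any namespace declaration.
def byLocalName = bibliography.author.text()

// Nothing has been declared yet, so a prefixed lookup comes back empty,
// even though "bib" is the prefix used in the document itself.
def beforeDeclare = bibliography.'bib:author'.size()

// The prefixes declared here need not match the document; only the URIs count.
bibliography.declareNamespace(
  b: 'http://bibliography.org',
  l: 'http://literature.org')

def afterDeclare = bibliography.'b:author'.text()
def plays = bibliography.'l:play'.size()

println byLocalName
println beforeDeclare
println afterDeclare
println plays
```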
If you plan to switch between the XmlParser and XmlSlurper implementations and you need to parse XML that uses namespaces, then the safest approach is to use the *: prefix for element or attribute queries. For example:
println bibliography.'*:author'.text()
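As a rough check that the wildcard prefix behaves the same way with both classes, the following sketch runs identical queries against an XmlParser tree and an XmlSlurper result. It assumes Groovy 3 or later (on Groovy 2.x, drop both imports), and the inline document and its play years are hypothetical sample data, not the contents of shakespeare.xml.

```groovy
import groovy.xml.XmlParser   // assumption: Groovy 3+; remove both imports on Groovy 2.x
import groovy.xml.XmlSlurper

// Hypothetical sample data for illustration only.
def xml = '''<bib:bibliography xmlns:bib="http://bibliography.org"
                  xmlns:lit="http://literature.org">
  <bib:author>William Shakespeare</bib:author>
  <lit:play><lit:year>1589</lit:year></lit:play>
  <lit:play><lit:year>1594</lit:year></lit:play>
  <lit:play><lit:year>1595</lit:year></lit:play>
</bib:bibliography>'''

def parsed = new XmlParser().parseText(xml)
def slurped = new XmlSlurper().parseText(xml)

// The same wildcard query, with no Namespace objects or declareNamespace call.
def fromParser = parsed.'*:author'.text()
def fromSlurper = slurped.'*:author'.text()

// Count the plays dated after 1592 with each implementation.
def laterPlays = { it.'*:year'.text().toInteger() > 1592 }
def playsParser = parsed.'*:play'.findAll(laterPlays).size()
def playsSlurper = slurped.'*:play'.findAll(laterPlays).size()

println fromParser
println fromSlurper
println playsParser
println playsSlurper
```

The trade-off is that a wildcard query ignores the namespace entirely, so it cannot distinguish elements that share a local name across namespaces.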