XPath is a W3C-standard query language for selecting nodes from an XML document. Its role for XML is loosely comparable to that of SQL for databases or regular expressions for text. XPath is a powerful language, and its more advanced capabilities are beyond the scope of this book. This recipe shows some basic queries for selecting nodes and groups of nodes.
Let's start as usual by defining an XML document that we can use for selecting nodes:
    def todos = '''<?xml version="1.0" ?>
    <todos>
      <task created="2012-09-24" owner="max">
        <title>Buy Milk</title>
        <priority>3</priority>
        <location>WalMart</location>
        <due>2012-09-25</due>
        <alarm-type>sms</alarm-type>
        <alert-before>1H</alert-before>
      </task>
      <task created="2012-09-27" owner="lana">
        <title>Pay the rent</title>
        <priority>1</priority>
        <location>Computer</location>
        <due>2012-09-30</due>
        <alarm-type>email</alarm-type>
        <alert-before>1D</alert-before>
      </task>
      <task created="2012-09-21" owner="rick">
        <title>Take out the trash</title>
        <priority>3</priority>
        <location>Home</location>
        <due>2012-09-22</due>
        <alarm-type>none</alarm-type>
        <alert-before/>
      </task>
    </todos>
    '''
The previous snippet represents the data for a personal task management application; there aren't enough of those these days! Surely no self-respecting to-do application comes without a powerful filtering feature, such as finding all the tasks that are due, or showing only the tasks that can be carried out in a specific place.
Let's go into the details of this recipe.
1. Parse the XML and build a DOM document, keeping a reference to its root element:

    import javax.xml.parsers.DocumentBuilderFactory
    import javax.xml.xpath.*

    def inputStream = new ByteArrayInputStream(todos.bytes)
    def myTodos = DocumentBuilderFactory.
                    newInstance().
                    newDocumentBuilder().
                    parse(inputStream).
                    documentElement

2. Create the XPath engine:

    def xpath = XPathFactory.
                  newInstance().
                  newXPath()

3. Query the document. The following expression selects every task node and prints its title:

    def nodes = xpath.evaluate(
      '//task',
      myTodos,
      XPathConstants.NODESET
    )
    nodes.each {
      println xpath.evaluate('title/text()', it)
    }

The output is:

    Buy Milk
    Pay the rent
    Take out the trash
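The titles can also be selected directly, without iterating over the task nodes first:

    xpath.evaluate(
      '//task/title/text()',
      myTodos,
      XPathConstants.NODESET
    ).each { println it.nodeValue }

Note that we are using the DOM API's getNodeValue() method (through the Groovy nodeValue property) to extract the content of each text node.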
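The difference between selecting an element and selecting its text node can be sketched as follows. This is a minimal illustration, assuming a hypothetical one-task document rather than the full data set of this recipe: for a DOM element node, getNodeValue() returns null, whereas for a text node it returns the characters.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Hypothetical one-task document, used only for this illustration
def xml = '<todos><task><title>Buy Milk</title></task></todos>'

def root = DocumentBuilderFactory.
    newInstance().
    newDocumentBuilder().
    parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
    documentElement

def xpath = XPathFactory.newInstance().newXPath()

// Select the <title> element itself rather than its text() child
def titleElement = xpath.evaluate('//task/title', root, XPathConstants.NODE)

// For an element node, getNodeValue() is defined to return null
def elementValue = titleElement.nodeValue

// getTextContent() concatenates the text of all descendant text nodes
def elementText = titleElement.textContent

// Select the text node: here getNodeValue() returns the characters
def textNode = xpath.evaluate('//task/title/text()', root, XPathConstants.NODE)
def textValue = textNode.nodeValue

println "element nodeValue:   ${elementValue}"
println "element textContent: ${elementText}"
println "text node nodeValue: ${textValue}"
```

When a query ends at an element rather than at text(), getTextContent() is therefore the safer accessor.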
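What about displaying only the low-priority tasks, that is, those with a priority value greater than 1?

    xpath.evaluate(
      '//task[priority>1]/title/text()',
      myTodos,
      XPathConstants.NODESET
    ).each { println it.nodeValue }

The output is:

    Buy Milk
    Take out the trash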
To select only the tasks that belong to lana, filter on the owner attribute:

    xpath.evaluate(
      "//task[@owner='lana']/title/text()",
      myTodos,
      XPathConstants.NODESET
    ).each { println it.nodeValue }

The output is:

    Pay the rent
Conditions can also be combined. The following query filters on both the location and the alarm-type tag:

    xpath.evaluate(
      '//task[location="Computer" and ' +
      'contains(alarm-type, "email")]/' +
      'title/text()',
      myTodos,
      XPathConstants.NODESET
    ).each { println it.nodeValue }

The output is:

    Pay the rent
In the previous example, we used the contains function to probe for a string match in a specific tag. We can also use the same function to search on all tags:

    xpath.evaluate(
      "//*[contains(.,'WalMart')]/title/text()",
      myTodos,
      XPathConstants.NODESET
    ).each { println it.nodeValue }

The output is:

    Buy Milk
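Since every query above repeats the same evaluate call, the pattern can be wrapped in a small helper. The following is only a sketch: titlesFor is a hypothetical closure, not part of any API, and the XML is a trimmed copy of the recipe data that keeps just the fields needed here.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Trimmed copy of the recipe data (assumption: only these fields are needed)
def xml = '''<todos>
  <task owner="max"><title>Buy Milk</title><priority>3</priority></task>
  <task owner="lana"><title>Pay the rent</title><priority>1</priority></task>
  <task owner="rick"><title>Take out the trash</title><priority>3</priority></task>
</todos>'''

def root = DocumentBuilderFactory.
    newInstance().
    newDocumentBuilder().
    parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
    documentElement

def xpath = XPathFactory.newInstance().newXPath()

// Hypothetical helper: returns the titles of the tasks matching
// an XPath predicate such as "[priority>1]"
def titlesFor = { String predicate ->
    def nodes = xpath.evaluate(
        '//task' + predicate + '/title/text()',
        root,
        XPathConstants.NODESET
    )
    // NodeList exposes getLength() and item(int), so the result
    // can be collected into a plain Groovy list by index
    (0..<nodes.length).collect { nodes.item(it).nodeValue }
}

def lowPriority = titlesFor('[priority>1]')
def lanasTasks = titlesFor("[@owner='lana']")

println lowPriority
println lanasTasks
```

Returning a list of strings instead of printing inside the closure keeps the results easy to reuse or to check in a test.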
In order to use XPath, we need to build a Java DOM parser. Neither XmlParser nor XmlSlurper offers XPath querying capabilities, so we have to resort to building a parser using the not-so-elegant Java API. In step 1, we create a new instance of DocumentBuilderFactory, from which we create a DocumentBuilder. The concrete factory class is chosen through the pluggable lookup mechanism of the Java XML API; on the Oracle JDK, the default implementation is com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl. The factory produces a builder, which in turn parses the document.
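To see which factory implementation is actually picked up on a given JVM, the class name can simply be printed. The sketch below also parses from a StringReader wrapped in an InputSource, an alternative to the byte stream used in the recipe that avoids depending on the platform default encoding. The one-task document is an assumption made for brevity, and the printed class name varies between JDKs.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource

// The factory lookup happens inside newInstance()
def factory = DocumentBuilderFactory.newInstance()

// The printed name depends on the JDK and on the libraries on the classpath
println factory.getClass().name

def builder = factory.newDocumentBuilder()

// Hypothetical minimal document, used only for this illustration
def xml = '<todos><task owner="max"><title>Buy Milk</title></task></todos>'

// Parsing from characters rather than bytes sidesteps encoding issues
def root = builder.parse(new InputSource(new StringReader(xml))).documentElement

println root.nodeName
```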
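The evaluate method used to run the XPath queries takes the following three parameters:

- The XPath expression (for example, //task to select all the nodes named task)
- The context node against which the expression is evaluated (in this recipe, the root element myTodos)
- The expected return type, given as one of the XPathConstants values

In step 3, the return type is a NodeList, which Groovy can easily iterate over. When the return type is not specified, the result is returned as a String.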
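The effect of the third parameter can be sketched with the other XPathConstants return types. The example assumes a trimmed copy of the recipe data; the variable names are illustrative only.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Trimmed copy of the recipe data (assumption: only these fields are needed)
def xml = '''<todos>
  <task owner="max"><title>Buy Milk</title><priority>3</priority></task>
  <task owner="lana"><title>Pay the rent</title><priority>1</priority></task>
  <task owner="rick"><title>Take out the trash</title><priority>3</priority></task>
</todos>'''

def root = DocumentBuilderFactory.
    newInstance().
    newDocumentBuilder().
    parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
    documentElement

def xpath = XPathFactory.newInstance().newXPath()

// NUMBER: the result is converted to a java.lang.Double
def taskCount = xpath.evaluate('count(//task)', root, XPathConstants.NUMBER)

// BOOLEAN: true if at least one task has priority 1
def hasUrgent = xpath.evaluate('boolean(//task[priority=1])', root, XPathConstants.BOOLEAN)

// STRING: the string value of the first matching node
def firstOwner = xpath.evaluate('//task[1]/@owner', root, XPathConstants.STRING)

// No return type: the two-argument overload returns a String
def firstTitle = xpath.evaluate('//task[1]/title', root)

// NODE: a single DOM node instead of a NodeList
def firstTask = xpath.evaluate('//task[1]', root, XPathConstants.NODE)

println "tasks: ${taskCount}, urgent: ${hasUrgent}"
println "first owner: ${firstOwner}, first title: ${firstTitle}"
println "first task node: ${firstTask.nodeName}"
```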
What if you need to find all the tasks due today in your task list?
XPath 1.0, the version implemented by the XPath engine bundled with JDK 6 and 7 (the specification dates back to 1999), doesn't support date functions, so you are left with rather ugly string comparison tricks.
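One such string trick can be sketched as follows, assuming that all dates use the fixed yyyy-MM-dd format of the sample data. Equality works directly on the strings, while an ordering comparison needs the dashes stripped with translate() so that the remaining digits can be compared as a number. The titles closure and the trimmed XML are illustrative assumptions.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Trimmed copy of the recipe data (assumption: only these fields are needed)
def xml = '''<todos>
  <task><title>Buy Milk</title><due>2012-09-25</due></task>
  <task><title>Pay the rent</title><due>2012-09-30</due></task>
  <task><title>Take out the trash</title><due>2012-09-22</due></task>
</todos>'''

def root = DocumentBuilderFactory.
    newInstance().
    newDocumentBuilder().
    parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
    documentElement

def xpath = XPathFactory.newInstance().newXPath()

// Hypothetical helper: runs a query and returns the matching text values
def titles = { String expression ->
    def nodes = xpath.evaluate(expression, root, XPathConstants.NODESET)
    (0..<nodes.length).collect { nodes.item(it).nodeValue }
}

// The day treated as "today" in this sketch
def today = '2012-09-25'

// Plain string equality is enough to find the tasks due exactly today
def dueToday = titles("//task[due='" + today + "']/title/text()")

// For "on or before today", remove the dashes and compare as numbers:
// '2012-09-25' becomes 20120925
def cutoff = today.replace('-', '')
def dueByToday = titles(
    "//task[number(translate(due, '-', '')) <= " + cutoff + "]/title/text()"
)

println dueToday
println dueByToday
```

The trick only holds while every date is zero-padded and ordered year, month, day; any other format breaks the numeric comparison.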
The more recent XPath 2.0 specification supports date functions along with many other powerful features. Luckily, third-party libraries supporting XPath 2.0 are available and can be readily used with Groovy. One of these libraries is Saxon 9, which supports XSLT 2.0, XQuery 1.0, and XPath 2.0 at the basic level of conformance defined by W3C.
In this example, we are going to build a query that selects the tasks due on a given day. The XML data defined at the beginning of this recipe is also used in this snippet:
    @Grab('net.sf.saxon:Saxon-HE:9.4')
    @GrabExclude('xml-apis:xml-apis')
    import javax.xml.parsers.DocumentBuilderFactory
    import javax.xml.xpath.*
    import net.sf.saxon.lib.NamespaceConstant

    // The day treated as "today" in this example
    def today = '2012-09-22'

    def todos = '''<?xml version="1.0" ?>
    <todos>
      ...
    </todos>
    '''

    def inputStream = new ByteArrayInputStream(todos.bytes)
    def myTodos = DocumentBuilderFactory.
                    newInstance().
                    newDocumentBuilder().
                    parse(inputStream).
                    documentElement

    // Set the SAXON XPath implementation
    // by setting a System property
    System.setProperty(
      'javax.xml.xpath.XPathFactory:' +
        NamespaceConstant.OBJECT_MODEL_SAXON,
      'net.sf.saxon.xpath.XPathFactoryImpl'
    )

    // Create the XPath 2.0 engine
    def xpathSaxon = XPathFactory.
                       newInstance(
                         XPathConstants.DOM_OBJECT_MODEL
                       ).newXPath()

    // Print out the titles of all tasks
    // due today (22 September 2012)
    xpathSaxon.evaluate(
      '//task[xs:date(due) = ' +
      'xs:date("' + today + '")]/title/text()',
      myTodos,
      XPathConstants.NODESET
    ).each { println it.nodeValue }

    // Bonus: print out all tasks
    // for which the due date falls in September
    xpathSaxon.evaluate(
      '//task[month-from-date(due)=9]/title/text()',
      myTodos,
      XPathConstants.NODESET
    ).each { println it.nodeValue }
More information about the XPath query language can be found at http://www.w3.org/TR/xpath. Also, the following links may be useful for further reading: