Searching in XML with XPath

XPath is a W3C-standard query language for selecting nodes from an XML document. Its role for XML is roughly comparable to that of SQL for databases or regular expressions for text. XPath is a very powerful language, and delving into its more advanced capabilities is beyond the scope of this book. This recipe shows some basic queries to select nodes and groups of nodes.

Getting ready

Let's start as usual by defining an XML document that we can use for selecting nodes:

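// The trailing backslash keeps the XML declaration at the very
// start of the string; a leading newline would make the parser
// reject the document
def todos = '''\
<?xml version="1.0" ?>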
<todos>
  <task created="2012-09-24" owner="max">
    <title>Buy Milk</title>
    <priority>3</priority>
    <location>WalMart</location>
    <due>2012-09-25</due>
    <alarm-type>sms</alarm-type>
    <alert-before>1H</alert-before>
  </task>
  <task created="2012-09-27" owner="lana">
    <title>Pay the rent</title>
    <priority>1</priority>
    <location>Computer</location>
    <due>2012-09-30</due>
    <alarm-type>email</alarm-type>
    <alert-before>1D</alert-before>
  </task>
  <task created="2012-09-21" owner="rick">
    <title>Take out the trash</title>
    <priority>3</priority>
    <location>Home</location>
    <due>2012-09-22</due>
    <alarm-type>none</alarm-type>
    <alert-before/>
  </task>
</todos>
'''

The previous snippet represents the data for a personal task management application; there aren't enough of those these days! Surely no self-respecting to-do application comes without a powerful filtering feature, such as finding all due tasks or showing only the tasks that can be carried out in a specific place.

How to do it...

Let's go into the details of this recipe.

Before we can fire our XPath queries, the document has to be parsed using the Java DOM API:

import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.*

def inputStream = new ByteArrayInputStream(todos.bytes)
def myTodos = DocumentBuilderFactory.
                newInstance().
                newDocumentBuilder().
                parse(inputStream).
                documentElement

Once the XML document is parsed, we can create an instance of the XPath engine:

def xpath = XPathFactory.
              newInstance().
              newXPath()

Now we are ready to run some queries on our task list. The simplest thing to do is to print all the task titles:

def nodes = xpath.evaluate(
              '//task',
              myTodos,
              XPathConstants.NODESET
            )

nodes.each {
  println xpath.evaluate('title/text()', it)
}

The output yielded is as follows:

Buy Milk
Pay the rent
Take out the trash

OK, now that we've got the API basics out of the way, it's time for more complex queries. The next example shows how to print the titles in a more Groovy way:
```
xpath.evaluate(
       '//task/title/text()',
       myTodos,
       XPathConstants.NODESET
     ).each { println it.nodeValue }
```
Note that it.nodeValue is Groovy property syntax for the DOM API's getNodeValue() method, which returns the content of each selected text node.
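As a side note on getNodeValue(): it returns text only for text (and attribute) nodes, while for an element node it returns null. The following minimal sketch, which uses a trimmed-down document invented for illustration, selects a title element rather than its text node and compares nodeValue with textContent.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Trimmed-down sample document, for illustration only
def xml = '<todos><task><title>Buy Milk</title></task></todos>'

def root = DocumentBuilderFactory.
             newInstance().
             newDocumentBuilder().
             parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
             documentElement
def xpath = XPathFactory.newInstance().newXPath()

// Select the <title> element itself, not its text node
def title = xpath.evaluate('//task/title', root, XPathConstants.NODE)

def elementValue = title.nodeValue             // null for element nodes
def elementText  = title.textContent           // text contained in the element
def textValue    = title.firstChild.nodeValue  // value of the child text node

println "nodeValue=${elementValue}, textContent=${elementText}, text node=${textValue}"
```

If a query selects elements rather than text() nodes, textContent is usually the property to read.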

What about fetching only tasks with a low priority (that is, a priority value of 2, 3, 4, and so on)?

xpath.evaluate(
        '//task[priority>1]/title/text()',
        myTodos,
        XPathConstants.NODESET
     ).each { println it.nodeValue }

The output yielded is as follows:
```
Buy Milk
Take out the trash
```

Naturally, it is also possible to filter by node attribute. For instance, to fetch all tasks assigned to lana:

xpath.evaluate(
        "//task[@owner='lana']/title/text()",
        myTodos,
        XPathConstants.NODESET
      ).each { println it.nodeValue }

The output yielded is as follows:
```
Pay the rent
```
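When the attribute value comes from user input, concatenating it into the query string gets fragile. One possible alternative is the standard XPathVariableResolver interface, which lets the query refer to a variable such as $owner. The sketch below is only an illustration: the bindings map and the trimmed-down document are assumptions made for this example.

```groovy
import javax.xml.namespace.QName
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory
import javax.xml.xpath.XPathVariableResolver

// Trimmed-down sample document, for illustration only
def xml = '''<todos>
  <task owner="max"><title>Buy Milk</title></task>
  <task owner="lana"><title>Pay the rent</title></task>
</todos>'''

def root = DocumentBuilderFactory.
             newInstance().
             newDocumentBuilder().
             parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
             documentElement

// Hypothetical variable bindings for this example
def bindings = [owner: 'lana']

def xpath = XPathFactory.newInstance().newXPath()
// Resolve $name references in the query from the map above
xpath.setXPathVariableResolver(
  { QName name -> bindings[name.localPart] } as XPathVariableResolver
)

// Single quotes: $owner is an XPath variable, not a Groovy GString
def titles = xpath.evaluate(
               '//task[@owner=$owner]/title/text()',
               root,
               XPathConstants.NODESET
             ).collect { it.nodeValue }

println titles
```

The query text stays constant, and only the bindings change between calls.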

Finally, we are going to retrieve nodes based on their actual content. Let's build a slightly more complex query that retrieves tasks based on the content of the location and alarm-type elements:

xpath.evaluate(
        '//task[location="Computer" and ' +
        'contains(alarm-type, "email")]/' +
        'title/text()',
        myTodos,
        XPathConstants.NODESET
      ).each { println it.nodeValue }

The output yielded is as follows:
```
Pay the rent
```
The previous snippet uses XPath's contains function to probe for a substring match in a specific element. We can also use the same function to search across all elements:
```
xpath.evaluate(
        "//*[contains(.,'WalMart')]/title/text()",
        myTodos,
        XPathConstants.NODESET
      ).each { println it.nodeValue }
```
The output yielded is as follows:
```
Buy Milk
```
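It may be surprising that a search across all elements prints a single title. The string value of an element is the concatenation of all the text below it, so every ancestor of the matching location element matches as well; only the task element among them has a title child. The sketch below, based on a trimmed-down document invented for illustration, lists the matching element names before and after the /title step.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Trimmed-down sample document, for illustration only
def xml = '''<todos>
  <task><title>Buy Milk</title><location>WalMart</location></task>
  <task><title>Pay the rent</title><location>Computer</location></task>
</todos>'''

def root = DocumentBuilderFactory.
             newInstance().
             newDocumentBuilder().
             parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
             documentElement
def xpath = XPathFactory.newInstance().newXPath()

// Every element whose string value contains 'WalMart'
def matching = xpath.evaluate(
                 "//*[contains(., 'WalMart')]",
                 root,
                 XPathConstants.NODESET
               ).collect { it.nodeName }

// Of those elements, only <task> has a <title> child
def titles = xpath.evaluate(
               "//*[contains(., 'WalMart')]/title/text()",
               root,
               XPathConstants.NODESET
             ).collect { it.nodeValue }

println "matching elements: ${matching}"
println "titles: ${titles}"
```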

How it works...

XPath queries need a DOM tree to run against. Neither XmlParser nor XmlSlurper offers XPath querying capabilities, so we have to resort to building a parser with the not-so-elegant Java API. In the first step, we create a new instance of DocumentBuilderFactory, from which we obtain a DocumentBuilder that parses the document. DocumentBuilderFactory.newInstance() locates its implementation through the JAXP lookup mechanism; the default implementation shipped with the JDK is com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl.
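To make the factory, builder, and document roles easier to see, here is the same parsing chain split into separate variables. This is only a sketch on a trimmed-down document invented for illustration; the class name it prints depends on the JDK in use and on what is on the classpath.

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Trimmed-down sample document, for illustration only
def xml = '<todos><task><title>Buy Milk</title></task></todos>'

// Look up a factory implementation
def factory = DocumentBuilderFactory.newInstance()
// Create a parser from the factory
def builder = factory.newDocumentBuilder()
// Parse the bytes into a DOM Document and take its root element
def document = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
def root = document.documentElement

// The concrete class depends on the JDK and on the classpath
println "Factory implementation: ${factory.getClass().name}"
println "Root element: ${root.nodeName}"
```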

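The evaluate method used to run the XPath queries takes the following three parameters:

The actual XPath query string (in the first query, we use //task to select all the nodes named task)
The context item against which the query is evaluated (here, the root element of the parsed document)
The desired return type, given as one of the XPathConstants values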

In that first query, the return type is XPathConstants.NODESET, so evaluate returns an org.w3c.dom.NodeList. A NodeList is not a java.util.List, but Groovy can still iterate over it with each. When the return type is not specified, evaluate returns a String.
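The effect of the return type can be seen by running a few queries against the same document. The following sketch uses a trimmed-down document invented for illustration and exercises the default String result along with the NUMBER, BOOLEAN, and NODE constants of XPathConstants.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Trimmed-down sample document, for illustration only
def xml = '''<todos>
  <task owner="max"><title>Buy Milk</title></task>
  <task owner="lana"><title>Pay the rent</title></task>
</todos>'''

def root = DocumentBuilderFactory.
             newInstance().
             newDocumentBuilder().
             parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
             documentElement
def xpath = XPathFactory.newInstance().newXPath()

// No return type: the string value of the first matching node
def firstTitle = xpath.evaluate('//task/title', root)

// NUMBER: the result is a java.lang.Double
def taskCount = xpath.evaluate('count(//task)', root, XPathConstants.NUMBER)

// BOOLEAN: true if at least one task is owned by lana
def hasLana = xpath.evaluate("//task/@owner = 'lana'", root, XPathConstants.BOOLEAN)

// NODE: only the first matching node is returned
def lanaTask = xpath.evaluate("//task[@owner='lana']", root, XPathConstants.NODE)

println "first title: ${firstTitle}"
println "task count:  ${taskCount}"
println "has lana:    ${hasLana}"
println "lana's task: ${lanaTask.getAttribute('owner')}"
```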

There's more...

What if you need to find all the tasks due today in your task list?

The XPath engine bundled with the JDK (versions 6 and 7) implements XPath 1.0, a specification dating back to 1999. XPath 1.0 doesn't support date functions, so you are left with rather ugly string comparison tricks.
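One such trick, sketched here under the assumption that all dates use the yyyy-MM-dd format, is to strip the dashes with translate() and compare the remaining digits as numbers. The today value and the trimmed-down document are made up for this illustration.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Trimmed-down sample document, for illustration only
def xml = '''<todos>
  <task><title>Buy Milk</title><due>2012-09-25</due></task>
  <task><title>Pay the rent</title><due>2012-09-30</due></task>
  <task><title>Take out the trash</title><due>2012-09-22</due></task>
</todos>'''

def root = DocumentBuilderFactory.
             newInstance().
             newDocumentBuilder().
             parse(new ByteArrayInputStream(xml.getBytes('UTF-8'))).
             documentElement
def xpath = XPathFactory.newInstance().newXPath()

// Hypothetical "today", turned into the number-like string 20120925
def today  = '2012-09-25'
def cutoff = today.replace('-', '')

// translate(due, '-', '') removes the dashes; <= then compares numbers
String query = "//task[translate(due, '-', '') <= ${cutoff}]/title/text()"

def dueTitles = xpath.evaluate(query, root, XPathConstants.NODESET).
                  collect { it.nodeValue }

println dueTitles
```

This works only because yyyy-MM-dd dates sort the same way as the numbers obtained by removing the dashes; other date formats would need a different transformation.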

The more recent XPath 2.0 specification supports date functions and a plethora of powerful new features. Luckily, third-party libraries supporting XPath 2.0 are available and can be readily used with Groovy. One of these libraries is Saxon 9, which supports XSLT 2.0, XQuery 1.0, and XPath 2.0 at the basic level of conformance defined by W3C.

In this example, we are going to build a query that selects the tasks due today. The XML data defined at the beginning of this recipe is also used in this snippet:

@Grab('net.sf.saxon:Saxon-HE:9.4')
@GrabExclude('xml-apis:xml-apis')
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.*
import net.sf.saxon.lib.NamespaceConstant

// The date we treat as "today" in this example
def today = '2012-09-22'

// The trailing backslash keeps the XML declaration
// at the very start of the string
def todos = '''\
<?xml version="1.0" ?>
<todos>
   ...
</todos>
'''

def inputStream = new ByteArrayInputStream(todos.bytes)
def myTodos     = DocumentBuilderFactory.
                    newInstance().
                    newDocumentBuilder().
                    parse(inputStream).
                    documentElement

// Set the SAXON XPath implementation
// by setting a System property
System.setProperty(
  'javax.xml.xpath.XPathFactory:' +
  NamespaceConstant.OBJECT_MODEL_SAXON,
  'net.sf.saxon.xpath.XPathFactoryImpl'
)

// Create the XPath 2.0 engine
def xpathSaxon = XPathFactory.
                   newInstance(
                     XPathConstants.DOM_OBJECT_MODEL
                   ).newXPath()

// Print out the titles of all tasks due today
// (22 September 2012 in this example)
xpathSaxon.evaluate(
             '//task[xs:date(due) = ' +
             "xs:date('${today}')]/title/text()",
             myTodos,
             XPathConstants.NODESET
           ).each { println it.nodeValue }

// Bonus: print out all tasks
// for which the due date falls in September
xpathSaxon.evaluate(
             '//task[month-from-date(due)=9]/title/text()',
             myTodos,
             XPathConstants.NODESET
           ).each { println it.nodeValue }
