At the beginning of this chapter, I looked at a link that used an XPointer to locate a specific element in a document; that example looked like this:
<MOVIE_REVIEW xmlns:xlink = "http://www.w3.org/1999/xlink" xlink:type = "simple" xlink:show = "new" xlink:href = "http://www.starpowdermovies.com/reviews.xml#xpointer(/child::*[position()=126]/child::*[position()=first()])"> Mr. Blandings Builds His Dream House </MOVIE_REVIEW>
You can see the XPointer part here:
xpointer(/child::*[position()=126]/child::*[position()=first()])
This XPointer is appended to the URI I'm using here, following a # character.
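To make the split between the resource and the XPointer concrete, here is a minimal Python sketch. It assumes only the standard library's urldefrag() function; the variable names are mine, not part of XLink or XPointer.

```python
from urllib.parse import urldefrag

# The link target from the example, written as a single URI reference.
href = ("http://www.starpowdermovies.com/reviews.xml"
        "#xpointer(/child::*[position()=126]/child::*[position()=first()])")

# urldefrag() splits a URI reference at the first '#' character.
resource, fragment = urldefrag(href)

print(resource)   # the document to retrieve
print(fragment)   # the XPointer that is applied to that document
```

Everything before the # identifies the document; everything after it is handed to whatever software understands XPointers.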
You might notice that this XPointer expression looks a lot like the XPath expressions we used in Chapter 13, and with good reason—XPointers are built on XPaths, with certain additions that I'll note here.
Because XPointers are built on XPaths, they have all the power of XPaths. Among other things, this means that you can use an XPointer made up of location steps that target an individual location in a document without having to add any markup to that document. You can also use the id() function to target specific elements if you do want to add ID attributes to those elements.
However, because XPointers extend XPaths, there are some differences. The biggest one stems from the fact that users can select parts of documents with the mouse: XPointers enable you to select points and ranges in addition to the normal XPath nodes. A point is just what it sounds like: a specific location in a document. A range is made up of all the XML between two points, which can include parts of elements and text strings.
To support points and ranges, XPointer extends the idea of nodes into locations. Every location is an XPath node, a point, or a range. Therefore, node sets become location sets in the XPointer specification.
How do you create an XPointer? Like XPaths, XPointers are made of location paths that are divided into location steps, separated by the / character. A location step is made up of an axis, a node test, and zero or more predicates, like this:
axis::node_test[predicate]
For example, in the expression
child::PLANET[position() = 5]
child is the name of the axis, PLANET is the node test, and [position() = 5] is a predicate.
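As a rough illustration of that three-part structure, the following Python sketch pulls a single unabbreviated location step apart. The parse_step() helper and its regular expression are my own simplification (no nested brackets, no abbreviated syntax), not anything defined by XPath or XPointer.

```python
import re

# Hypothetical pattern for ONE unabbreviated step: axis::node_test[predicate]...
STEP = re.compile(
    r"^(?P<axis>[\w-]+)::(?P<test>[^\[\]]+?)(?P<preds>(\[[^\[\]]*\])*)$"
)

def parse_step(step):
    """Return (axis, node_test, [predicates]) for a single location step."""
    m = STEP.match(step.strip())
    if m is None:
        raise ValueError(f"not an unabbreviated location step: {step!r}")
    # Collect the text inside each [...] pair, in order.
    predicates = re.findall(r"\[([^\[\]]*)\]", m.group("preds"))
    return m.group("axis"), m.group("test").strip(), [p.strip() for p in predicates]

print(parse_step("child::PLANET[position() = 5]"))
```

A step with no predicates, such as preceding-sibling::NAME, simply comes back with an empty predicate list.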
You can create location paths with one or more location steps, such as /descendant::PLANET/child::NAME, which selects all the <NAME> elements that have a <PLANET> parent.
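You can try a path like this with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree understands a small subset of the abbreviated XPath syntax and knows nothing about XPointer, and the tiny document below is made up for the occasion.

```python
import xml.etree.ElementTree as ET

# A made-up document with three PLANET elements, each holding a NAME.
doc = ET.fromstring(
    "<PLANETS>"
    "<PLANET><NAME>Mercury</NAME></PLANET>"
    "<PLANET><NAME>Venus</NAME></PLANET>"
    "<PLANET><NAME>Earth</NAME></PLANET>"
    "</PLANETS>"
)

# ".//PLANET/NAME" plays the role of /descendant::PLANET/child::NAME here:
# every NAME element whose parent is a PLANET element.
names = [name.text for name in doc.findall(".//PLANET/NAME")]
print(names)
```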
XPointers augment what's available with XPaths, so I'm going to take a look at these three parts—axes, node tests, and predicates—for XPointers now.
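The XPointer axes are the same as the XPath axes, and we're already familiar with them. Axes tell you which direction you should search and give you a starting position to search from. Here's the list of possible axes:
ancestor
ancestor-or-self
attribute
child
descendant
descendant-or-self
following
following-sibling
namespace
parent
preceding
preceding-sibling
self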
Although XPointers use the same axes as XPaths, there are some new node tests. We'll take a look at these next.
Here are the node tests you can use with XPointers, and what they match:
Node Test | Matches |
---|---|
* | Any element |
node() | Any node |
text() | A text node |
comment() | A comment node |
processing-instruction() | A processing instruction node |
point() | A point in a resource |
range() | A range in a resource |
Note in particular the last two—point() and range(). These correspond to the two new constructs added in XPointers, points and ranges, and I'll talk more about them at the end of this chapter.
To extend XPath to include points and ranges, the XPointer specification created the concept of a location, which can be an XPath node, a point, or a range. However, node tests are still called node tests, not location tests; when discussing node tests, the XPointer specification specifically extends the definition of node types to include points and ranges so that node tests can work with those types. For the moment, then, we're stuck with the idea that locations can be XPath nodes, points, or ranges—and that the node types in node tests can also be XPath nodes, points, or ranges. Presumably, this contradiction will be cleared up in the final XPointer recommendation.
XPointers support the same types of expressions as XPaths. As in Chapter 13, these are the possible types of expressions you can use in predicates (refer to Chapter 13 for more information):
Node sets
Booleans
Numbers
Strings
Result tree fragments
As we saw in Chapter 13, there are functions to deal with all these types in XPath. The XPointer specification supports all those functions and also adds functions to cast subexpressions to the particular types defined in XPath, such as boolean(), string(), text(), and number(). It also adds the function unique(), to enable you to test whether an XPointer locates a single location rather than multiple locations or no locations.
XPointer also makes some additions to the functions that return location sets, and I'll take a look at those functions now.
Four XPointer functions return location sets:
Function | Description |
---|---|
id() | Returns all the elements with a specific ID |
root() | Returns a location set with one location, the root node |
here() | Returns a location set with one location, the current location |
origin() | Same as here(), except that this function is used with out-of-line links |
The id() function is the one we saw in Chapter 13 when discussing XPath. You can use this function to return all locations with a given ID.
The root() function works just like the / character: it refers to the root node. (The root node is not the same as the document element; the root node corresponds to the very beginning of the prolog, while the document element is the top-level element in the document.) The root() function is not actually part of the XPath specification, but the XPointer specification refers to it as if it were. Whether or not it will be included in the final XPointer recommendation is unclear.
The here() function refers to the current element. This is useful because XPointers are usually stored in text nodes or attribute values, and you might want to refer to the current element (not just the current node). For example, you might want to refer to the second previous <NAME> sibling element of the element that contains an XPointer, and you can use an expression like this to do so:
here()/preceding-sibling::NAME[position() = 2]
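One detail worth spelling out is that preceding-sibling is a reverse axis, so position 2 means the second sibling counting backward from the starting element. Here is a small Python sketch of that idea; the preceding_siblings() helper and the <LINK> element (standing in for the element that holds the XPointer) are hypothetical.

```python
import xml.etree.ElementTree as ET

def preceding_siblings(parent, element, tag):
    """Siblings named `tag` that come before `element`, nearest first."""
    children = list(parent)
    before = children[:children.index(element)]
    # Reverse axis: position 1 is the closest preceding sibling.
    return [child for child in reversed(before) if child.tag == tag]

parent = ET.fromstring(
    "<PLANET><NAME>a</NAME><NAME>b</NAME><NAME>c</NAME><LINK/></PLANET>"
)
link = parent.find("LINK")   # stands in for the element returned by here()

# Counterpart of here()/preceding-sibling::NAME[position() = 2]
second_previous = preceding_siblings(parent, link, "NAME")[2 - 1]
print(second_previous.text)
```

Counting backward from <LINK>, the first <NAME> is c and the second is b, so this prints b.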
The origin() function is much like the here() function, but you use it with out-of-line links. It refers to the original element, which may be in another document, from which the current link was activated. This can be very helpful if the link itself is in a linkbase and needs to refer not to the element that the link is in, but the original element from which the link is activated.
You can use the abbreviated XPath syntax in XPointers as well. I'll take a look at a few examples, using planets.xml as the document we'll be navigating:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>
    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>
    <PLANET>
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DAY UNITS="days">116.75</DAY>
        <RADIUS UNITS="miles">3716</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>
    <PLANET>
        <NAME>Earth</NAME>
        <MASS UNITS="(Earth = 1)">1</MASS>
        <DAY UNITS="days">1</DAY>
        <RADIUS UNITS="miles">2107</RADIUS>
        <DENSITY UNITS="(Earth = 1)">1</DENSITY>
        <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
    </PLANET>
</PLANETS>
Here are a few XPointer examples—note that, as with XPath, you can use the [] operator; here, it extracts a particular location from a location set.
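As a hedged illustration of the [] operator at work, here are a few abbreviated paths evaluated with Python's standard xml.etree.ElementTree module. These are my own examples against a cut-down copy of planets.xml; ElementTree supports only part of the abbreviated XPath syntax and none of the XPointer extensions.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of planets.xml (only NAME and DAY kept).
planets = ET.fromstring("""
<PLANETS>
  <PLANET><NAME>Mercury</NAME><DAY UNITS="days">58.65</DAY></PLANET>
  <PLANET><NAME>Venus</NAME><DAY UNITS="days">116.75</DAY></PLANET>
  <PLANET><NAME>Earth</NAME><DAY UNITS="days">1</DAY></PLANET>
</PLANETS>
""")

# PLANET[2] picks the second PLANET child of the context node.
second = planets.find("PLANET[2]/NAME").text

# PLANET[last()] picks the last PLANET child.
last = planets.find("PLANET[last()]/NAME").text

# An attribute test inside [] filters rather than indexes.
days = [d.text for d in planets.findall("PLANET/DAY[@UNITS='days']")]

print(second, last, days)
```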
In XPath, you can locate data only at the node level. That's fine when you're working with software that handles XML data in terms of nodes, such as XSL transformations, but it's not good enough for all purposes. For example, a user working with a displayed XML document might be able to click the mouse at a particular point, or even select a range of XML content. (Note that such ranges might not start and end on node boundaries at all—they might contain parts of various trees and subtrees.) To give you finer control over XML data, you can work with points and ranges in XPointer.
How do you define a point in the XPointer specification? To do so, you must use two items—a node, and an index that can hold a positive integer or zero. The node specifies an origin for the point, and the index indicates how far the point you want is from that origin.
But what should the index be measured in terms of—characters in the document, or number of nodes? In fact, there are two different types of points, and the index value you use is measured differently for those types.
When the origin node, also called the container node, of a point can have child nodes (which means that it's an element node or the root node), then the point is called a node-point.
The index of a node-point is measured in child nodes. Here, the index of a node-point must be equal to or less than the number of child nodes in the origin node. If you use an index of zero, the point is immediately before any child nodes. An index of 5 locates a point immediately after the fifth child node.
You can use axes with node-points: A node-point's siblings are the children of the container node before or after the node-point. Points don't have any children, however.
If the origin node can't contain any child nodes, only text, then the index is measured in characters. Points like these are called character-points.
The index of a character-point must be a positive integer or zero, and less than or equal to the length of the text string in the node. If the index is zero, the point is immediately before the first character; an index of 5 locates the point immediately after the fifth character. Character-points do not have preceding or following siblings, or children.
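The two kinds of points can be modeled in a few lines of Python. This is a sketch under my own assumptions, not an XPointer implementation: the Point class is hypothetical, a plain string stands in for a text node, and an ElementTree element counts only its child elements (not text or comments) as child nodes.

```python
from dataclasses import dataclass
from typing import Union
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Point:
    """A point: a container plus a non-negative index (hypothetical model)."""
    container: Union[ET.Element, str]   # an element, or the text of a text node
    index: int

    def __post_init__(self):
        # len() is the character count for text and the number of child
        # elements for an Element (a simplification of "child nodes").
        limit = len(self.container)
        if not 0 <= self.index <= limit:
            raise ValueError(f"index must be between 0 and {limit}")

    @property
    def kind(self):
        return "character-point" if isinstance(self.container, str) else "node-point"


planet = ET.fromstring("<PLANET><NAME>Mercury</NAME><MASS>.0553</MASS></PLANET>")
print(Point(planet, 2).kind)      # index 2: just after the second child
print(Point("Mercury", 0).kind)   # index 0: just before the M
```

An index larger than the number of children or characters raises ValueError, which mirrors the upper limit described above.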
For example, you can treat <DOCUMENT> as a container node in this document:
<DOCUMENT> Hi there! </DOCUMENT>
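In this case, taking the text to be Hi there! (nine characters, ignoring the whitespace around it), there are ten character-points: one before every character, plus one after the last character. The character-point at index 0 is right before the first character, H; the character-point at index 1 is right before the i; and so on, up to index 9, which is right after the !.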
In addition, you should note that the XPointer specification collapses all consecutive whitespace into a single space, so four spaces is the same as one space when calculating an index for a character-point. Also, you cannot place points inside a start tag, end tag, processing instruction, or comment, or inside any markup.
To create a point, you use the start-point() function with a predicate, like this:
start-point()[position()=10]
Here's an example; say that I wanted to position a point just before the e in the text in Mercury's <NAME> element:
<?xml version="1.0"?> <?xml-stylesheet type="text/xml" href="planets.xsl"?> <PLANETS> <PLANET> <NAME>Mercury</NAME> <MASS UNITS="(Earth = 1)">.0553</MASS> <DAY UNITS="days">58.65</DAY> <RADIUS UNITS="miles">1516</RADIUS> <DENSITY UNITS="(Earth = 1)">.983</DENSITY> <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion--> </PLANET> . . .
In this case, I could use an expression like this to refer to the point right before the character e:
xpointer(/PLANETS/PLANET[1]/NAME/text()/start-point()[position() = 1])
Similarly, I can access the point right before the 6 in the text in Mercury's <DAY> element, 58.65 (which, of course, is text, not a number), this way:
xpointer(/PLANETS/PLANET[1]/DAY/text()/start-point()[position() = 3])
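If you want to double-check index values like these, a zero-based string offset gives the same number, because the character-point at index n sits just before the character at offset n. The point_before() helper in this Python sketch is hypothetical.

```python
def point_before(text, char):
    """Index of the character-point just before the first `char` in `text`."""
    # Python string offsets are zero-based, like character-point indexes.
    return text.index(char)

print(point_before("Mercury", "e"))   # the point before the e
print(point_before("58.65", "6"))     # the point before the 6
```

This prints 1 and then 3, matching the index values used in the two expressions.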
You can create ranges with two points, a start point and an end point, as long as they are in the same document and the start point is not after the end point. (If the start point and the end point are the same, the range is collapsed.) A range is all of the XML structure between those two points.
A range doesn't have to be a neat subsection of a document; it can extend from one subtree to another in the document, for example. All you need are a valid start point and a valid end point in the same document.
To create a range, you use two location paths, separated with the keyword to in the xpointer() function. For example, here's how to create a range that includes the whole word Mercury in planets.xml:
xpointer(/PLANETS/PLANET[1]/NAME/text()/start-point()[position() = 0] to /PLANETS/PLANET[1]/NAME/text()/start-point()[position() = 7])
Here's how to create a range that includes the entire text value in Mercury's <RADIUS> element, 1516:
xpointer(/PLANETS/PLANET[1]/RADIUS/text()/start-point()[position() = 0] to /PLANETS/PLANET[1]/RADIUS/text()/start-point()[position() = 4])
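When both points are character-points in the same text node, a range behaves much like a string slice. The following Python sketch makes that analogy explicit; text_range() is a hypothetical helper and covers only this simple single-node case, not ranges that cross element boundaries.

```python
def text_range(text, start, end):
    """Characters between two character-points in the same text node."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("start must not be after end, and both must lie in the text")
    return text[start:end]

print(text_range("Mercury", 0, 7))   # the whole word
print(text_range("1516", 0, 4))      # the whole RADIUS value
print(repr(text_range("Mercury", 3, 3)))   # a collapsed range is empty
```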
The XPointer specification adds a number of functions to those available in XPath to handle ranges, including range-to(), range(), range-inside(), start-point(), and end-point().
The XPointer specification also includes a function for basic string matching, string-range(). This function returns a location set with one range for every nonoverlapping match to the search string. The match operation is case-sensitive.
The optional index and length arguments control where the range starts, counted in characters from the beginning of the match, and how many characters the range contains. Here's how you use string-range() in general:
string-range(location_set, string, [index, [length]])
Matching an Empty String: An empty string, "", matches to the location immediately before any character, so you can use an empty string to match to the very beginning of any string.
For example, this expression returns a location set containing ranges covering all matches to the word "Saturn":
string-range(/, "Saturn")
To extract a specific match from the location set returned, you use the [] operator. For example, this expression returns a range covering the second occurrence of "Saturn" in the document:
string-range(/, "Saturn")[2]
This expression returns a range covering the third occurrence of the word "Jupiter" in the <NAME> element of the sixth <PLANET> element in a document:
string-range(//PLANET[6]/NAME, "Jupiter")[3]
You can also specify the range you want to return using the index (which starts with a value of 1) and length arguments. For example, this expression returns a range covering the letters er in the third occurrence of the word "Jupiter" in the <NAME> element of the sixth <PLANET> element:
string-range(//PLANET[6]/NAME, "Jupiter", 6, 2)[3]
If you want to locate a specific point, you can create a collapsed (zero-length) range, like this:
string-range(//PLANET[6]/NAME, "Jupiter", 6, 0)[3]
Another way to get a specific point is to use the start-point() function, which returns the start point of a range:
start-point(string-range(//PLANET[6]/NAME, "Jupiter", 6, 2)[3])
Here's an expression that locates the second @ character in any text node in the document and the five characters following it:
string-range(/, "@", 1, 6)[2]
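To see how the index and length arguments interact, here is a Python sketch that mimics string-range() on a plain string. It rests on my own assumptions: the string_range() function is hypothetical, it works on one string rather than a location set, it returns (start, end) offsets instead of ranges, and it does not clamp a range that runs past the end of the text.

```python
def string_range(text, pattern, index=1, length=None):
    """Return (start, end) offsets for each nonoverlapping match of pattern."""
    if length is None:
        length = len(pattern)          # default: cover the whole match
    ranges = []
    pos = 0
    while True:
        hit = text.find(pattern, pos)  # case-sensitive search
        if hit == -1:
            break
        start = hit + index - 1        # index is 1-based, relative to the match
        ranges.append((start, start + length))
        # Continue after this match so that matches never overlap.
        pos = hit + max(len(pattern), 1)
    return ranges

text = "Jupiter, Jupiter, Jupiter"
start, end = string_range(text, "Jupiter", 6, 2)[3 - 1]   # like ...[3]
print(text[start:end])

mail = "a@bcdefg x@12345z"
start, end = string_range(mail, "@", 1, 6)[2 - 1]         # like ...[2]
print(mail[start:end])
```

The first call picks out er from the third Jupiter; the second picks out the second @ and the five characters after it.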
Because it's so common to refer to elements by location or ID, XPointer adds a few abbreviated forms of reference. Here's an example; suppose that you wanted to locate Venus's <DAY> element in planets.xml:
<?xml version="1.0"?> <?xml-stylesheet type="text/xml" href="planets.xsl"?> <PLANETS> <PLANET> <NAME>Mercury</NAME> <MASS UNITS="(Earth = 1)">.0553</MASS> <DAY UNITS="days">58.65</DAY> <RADIUS UNITS="miles">1516</RADIUS> <DENSITY UNITS="(Earth = 1)">.983</DENSITY> <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion--> </PLANET> <PLANET> <NAME>Venus</NAME> <MASS UNITS="(Earth = 1)">.815</MASS> <DAY UNITS="days">116.75</DAY> <RADIUS UNITS="miles">3716</RADIUS> <DENSITY UNITS="(Earth = 1)">.943</DENSITY> <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion--> </PLANET> . . .
You could do so with this rather formidable expression:
http://www.starpowdermovies.com/planets.xml#xpointer(/child::*[position()=1]/child::*[position()=2]/child::*[position()=3])
As you know from Chapter 13, the child:: part is optional in XPath expressions, and the predicate [position() = x] can be abbreviated as [x]. In XPointer, you can abbreviate this still more, omitting the [ and ]. Here's the result, which is fairly compact:
http://www.starpowdermovies.com/planets.xml#1/2/3
When you see location steps made up of single numbers in this way, those location steps correspond to the location of elements.
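Here is a rough Python sketch of how such a sequence of numbers could be resolved against a cut-down copy of planets.xml. The resolve_child_sequence() function is hypothetical, and it relies on ElementTree's default parser, which exposes only child elements (comments and text are not counted).

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(root, sequence):
    """Follow a sequence such as "1/2/3" down from the top-level element."""
    steps = [int(s) for s in sequence.strip("/").split("/")]
    if steps[0] != 1:
        return None                  # there is only one top-level element
    element = root
    for n in steps[1:]:
        children = list(element)     # child elements, in document order
        if not 1 <= n <= len(children):
            return None              # the step points past the last child
        element = children[n - 1]    # steps are 1-based
    return element

planets = ET.fromstring(
    "<PLANETS>"
    "<PLANET><NAME>Mercury</NAME><MASS>.0553</MASS><DAY>58.65</DAY></PLANET>"
    "<PLANET><NAME>Venus</NAME><MASS>.815</MASS><DAY>116.75</DAY></PLANET>"
    "</PLANETS>"
)
day = resolve_child_sequence(planets, "1/2/3")
print(day.tag, day.text)
```

Step 1 is <PLANETS>, step 2 is its second child (Venus's <PLANET> element), and step 3 is that element's third child, <DAY>.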
In a similar way, you can use words as location steps, not just numbers, if those words correspond to ID values of elements in the document. For example, say that I give Venus's <PLANET> element the ID "Planet_Of_Love". (Here I'm assuming that this element's ID attribute is declared with the type ID in a DTD.)
<?xml version="1.0"?> <?xml-stylesheet type="text/xml" href="planets.xsl"?> <PLANETS> <PLANET> <NAME>Mercury</NAME> <MASS UNITS="(Earth = 1)">.0553</MASS> <DAY UNITS="days">58.65</DAY> <RADIUS UNITS="miles">1516</RADIUS> <DENSITY UNITS="(Earth = 1)">.983</DENSITY> <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion--> </PLANET> <PLANET ID = "Planet_Of_Love"> <NAME>Venus</NAME> <MASS UNITS="(Earth = 1)">.815</MASS> <DAY UNITS="days">116.75</DAY> <RADIUS UNITS="miles">3716</RADIUS> <DENSITY UNITS="(Earth = 1)">.943</DENSITY> <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion--> </PLANET> . . .
Now you could reach the <DAY> element in Venus's <PLANET> element like this:
http://www.starpowdermovies.com/planets.xml#xpointer(id("Planet_Of_Love")/child::*[position()=3])
However, there's also an abbreviated version that's much shorter. In this case, I use the fact that you can use an element's ID value as a location step, and the result looks like this:
http://www.starpowdermovies.com/planets.xml#Planet_Of_Love/3
As you can see, this form is considerably shorter.
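The same idea can be sketched in Python for the ID form. Note the assumptions: resolve_id_sequence() is a hypothetical helper, and because ElementTree does not read attribute types from a DTD, it simply looks for an attribute literally named ID rather than a true ID-typed attribute.

```python
import xml.etree.ElementTree as ET

def resolve_id_sequence(root, pointer, id_attribute="ID"):
    """Resolve a pointer such as "Planet_Of_Love/3"."""
    name, _, rest = pointer.partition("/")
    # Assumption: match on an attribute named ID, not on a DTD-declared ID type.
    target = next((el for el in root.iter() if el.get(id_attribute) == name), None)
    for step in filter(None, rest.split("/")):
        if target is None:
            break
        children = list(target)
        n = int(step)                # steps after the ID are 1-based child numbers
        target = children[n - 1] if 1 <= n <= len(children) else None
    return target

planets = ET.fromstring(
    "<PLANETS>"
    "<PLANET><NAME>Mercury</NAME><MASS>.0553</MASS><DAY>58.65</DAY></PLANET>"
    "<PLANET ID='Planet_Of_Love'><NAME>Venus</NAME><MASS>.815</MASS>"
    "<DAY>116.75</DAY></PLANET>"
    "</PLANETS>"
)
day = resolve_id_sequence(planets, "Planet_Of_Love/3")
print(day.tag, day.text)
```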
In this example, I used the id() function; to use that function, you should declare ID attributes so that they have the type ID. However, not all documents have a DTD or schema, so XPointer enables you to specify alternative patterns using multiple XPointers. Here's how that might look in this case, where I specify two XPointers in one location step:
http://www.starpowdermovies.com/planets.xml#xpointer(id("Planet_Of_Love"))xpointer(//*[@ID="Planet_Of_Love"])/3
If the first XPointer, which relies on the id() function, fails, the second XPointer is supposed to be used instead, and that one locates any element that has an attribute named ID with the required value. It remains to be seen how much of this syntax applications will actually implement.
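The fallback behavior amounts to trying each XPointer in turn until one locates something. Here is a minimal Python sketch of that logic; all three functions are hypothetical, and by_declared_id() deliberately returns nothing to imitate id() failing in a document without a DTD.

```python
import xml.etree.ElementTree as ET

def first_match(root, *resolvers):
    """Return the result of the first resolver that locates anything."""
    for resolve in resolvers:
        found = resolve(root)
        if found:                    # an empty list counts as failure
            return found
    return []

def by_declared_id(root):
    # Stand-in for id("Planet_Of_Love"): ElementTree ignores DTD attribute
    # types, so this imitates the case where no ID type is declared.
    return []

def by_id_attribute(root):
    # Counterpart of //*[@ID="Planet_Of_Love"].
    return root.findall(".//*[@ID='Planet_Of_Love']")

planets = ET.fromstring(
    "<PLANETS>"
    "<PLANET><NAME>Mercury</NAME></PLANET>"
    "<PLANET ID='Planet_Of_Love'><NAME>Venus</NAME><MASS>.815</MASS>"
    "<DAY>116.75</DAY></PLANET>"
    "</PLANETS>"
)
venus = first_match(planets, by_declared_id, by_id_attribute)[0]
day = list(venus)[3 - 1]             # the trailing /3: third child element
print(day.tag, day.text)
```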
That's it for XLinks and XPointers. As you can see, there's a lot of power here—far more than with simple HTML hyperlinks. However, the XLink and XPointer standards have been proposed for quite a few years now, and there have been practically no implementations of them. Hopefully the future will bring more concrete results.
In the next chapter, I'm going to start looking at some popular XML applications in depth, starting with the most popular one of all: XHTML.