XPointer and XPath

XPath models an XML document as a hierarchy of nodes. XPointer incorporates the concept of a node and also makes use of XPath functions, some of which were discussed in Chapter 9.

The XPath notion of a node is generalized in XPointer to include the additional notions of a point and a range. The following XML code snippet can help illustrate these concepts.

<text>Here is some text.</text> 

All the text between the start and end tags of the text element is the value of a corresponding text node. The position exactly between the initial H and the first e of Here is a point. Because it lies inside character content, that type of point is a character-point.

A range could be the text string is some, with the starting point between the space character and the initial i of is, and the ending point between the final e of some and the following space character.
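As a rough illustration, a character-point can be modeled as an index into the text, and a range as a pair of such indexes. The following Python sketch is only a model of the idea, not an XPointer implementation; the variable names are invented for this example.

```python
# Model the content of the text element as a plain string.
text = "Here is some text."

# A character-point is identified here by an index: the number of
# characters that precede it in the text node.
point_after_h = 1  # between the initial H and the first e

# A range is modeled as a (start, end) pair of character-points.
start = text.index("is some")   # just before the i of "is"
end = start + len("is some")    # just after the final e of "some"

print(text[:point_after_h])  # characters before the point
print(text[start:end])       # character content of the range
```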

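XPath operates using location paths. The output from one location step is used as the input for the following location step. Location paths inside an xpointer() pointer part work the same way, with location sets in place of node sets. Pointer parts themselves are not chained in this manner: they are evaluated from left to right, and the first pointer part that identifies a part of the document supplies the result.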

XPointer incorporates XPath functions and adds functions that manipulate or return points and ranges.

Node tests in XPath have a counterpart—a test—in XPointer. An XPointer test can be applied to a node, point, or range.

The XPointer Framework and Schemes

As this book was being written, a significant rewrite of the XPointer specification emerged after a period of several months in which, from a public viewpoint, nothing had changed. The XPointer draft specification was split into four documents: the XPointer Framework Working Draft and three draft specifications, one for each of the xpointer(), xmlns(), and element() schemes.

An XPointer processor takes as input an XML document and a URI that includes a fragment identifier. The output from an XPointer processor is either identification of a part of the XML document corresponding to the fragment identifier or an error.
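The first step, separating the document URI from the fragment identifier, can be sketched in Python with the standard library. The URI and the function name split_uri are assumptions made for this example, and the sketch stops short of evaluating the pointer.

```python
from urllib.parse import urldefrag, unquote

def split_uri(uri):
    """Return (document URI, fragment identifier) for a URI reference."""
    document_uri, fragment = urldefrag(uri)
    # Decode any percent-escapes in the fragment before it is
    # handed on for XPointer processing.
    return document_uri, unquote(fragment)

# Hypothetical URI whose fragment identifier is an XPointer.
document_uri, pointer = split_uri(
    "http://example.com/book.xml#xpointer(//Chapter)"
)
print(document_uri)
print(pointer)
```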

The XPointer Framework

The XPointer Framework is a specification that defines the context for the xpointer(), xmlns(), and element() schemes.

Scheme-Based XPointers

A scheme-based XPointer consists of one or more pointer parts, each of which is of the following general form:

schemeName(characterSequence)+ 

In other words, the pointer part begins with the scheme name followed by an opening parenthesis. A sequence of XML characters follows, and the pointer part is completed by a closing parenthesis.

So, an XPointer from the xpointer() scheme to select Chapter element nodes would look like this:

xpointer(//Chapter) 

If the sequence of XML characters contains an unbalanced parenthesis character, that unbalanced parenthesis must be escaped by preceding it with a circumflex (^). A literal circumflex is itself written as ^^.

As indicated by the + cardinality operator, an XPointer may consist of more than one pointer part, and the parts may use different schemes.
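To make the syntax concrete, here is a minimal Python sketch that splits a scheme-based XPointer into its pointer parts. It assumes the circumflex (^) escaping convention of the XPointer Framework for unbalanced parentheses; the function name split_pointer_parts is invented for this example, and the sketch does not validate scheme names.

```python
def split_pointer_parts(pointer):
    """Split a scheme-based XPointer into (scheme, data) pairs."""
    parts = []
    i, n = 0, len(pointer)
    while i < n:
        if pointer[i].isspace():        # whitespace separates pointer parts
            i += 1
            continue
        open_paren = pointer.find("(", i)
        if open_paren == -1:
            raise ValueError("expected '(' after scheme name")
        scheme = pointer[i:open_paren]
        depth, j, data = 1, open_paren + 1, []
        while j < n and depth > 0:
            ch = pointer[j]
            if ch == "^":               # escape: ^( ^) or ^^
                if j + 1 >= n or pointer[j + 1] not in "()^":
                    raise ValueError("invalid use of '^'")
                data.append(pointer[j + 1])
                j += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:          # closing parenthesis of this part
                    break
            data.append(ch)
            j += 1
        if depth != 0:
            raise ValueError("unbalanced parenthesis")
        parts.append((scheme, "".join(data)))
        i = j + 1
    return parts

print(split_pointer_parts("xpointer(//Chapter)"))
```

Balanced parentheses inside the scheme data, such as those of a function call, are kept as they are; only escaped ones are unescaped.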

You will look at each of the proposed XPointer schemes in turn. First, let’s look at the xpointer() scheme.

The xpointer() Scheme

The xpointer() scheme is the most extensive of the XPointer schemes. In fact, it was originally envisaged as being the only scheme until technical issues led to the development of the xmlns() scheme (discussed later in this chapter).

The xpointer() scheme is associated with the namespace URI http://www.w3.org/2001/05/XPointer.

Caution

The namespace URI given is associated with a Working Draft for the xpointer() scheme. It is possible that the namespace URI will change for the final version of the specification.



Points in the xpointer() Scheme

The xpointer() scheme recognizes two types of points: a node-point and a character-point. Both types of points are defined in terms of a container node (the node within whose content the point is situated) and an index.

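A node-point is a point between nodes that are children of the container node of the point. The index for a node-point lies between zero (the index of the node-point immediately before the first node in the container node) and the number of child nodes that the container node has.

A character-point is a point within the character content of a container node that cannot have child nodes, such as a text node. The index for a character-point lies between zero (the point immediately before the first character) and the number of characters in the string value of the container node.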

A node-point corresponds conceptually to a gap between nodes. Because character-points occur within nodes, they are envisaged as occurring between the node-points before and after their container node.
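The index rules can be illustrated with a short Python sketch that uses the standard xml.dom.minidom module. The sample markup is invented for this example, and the sketch only counts the valid indexes; it does not implement XPointer points.

```python
from xml.dom.minidom import parseString

# Hypothetical container element with three child nodes.
doc = parseString("<doc><a/><b/><c/></doc>")
container = doc.documentElement

# A container with n child nodes has n + 1 node-points, indexed 0..n.
node_point_indexes = list(range(len(container.childNodes) + 1))

# A text node has no children, so points inside it are character-points,
# indexed from 0 up to the number of characters.
text_node = parseString("<text>Here</text>").documentElement.firstChild
character_point_indexes = list(range(len(text_node.data) + 1))

print(node_point_indexes)
print(character_point_indexes)
```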

The self axis and the descendant-or-self axis of a point location contain the point itself. The parent axis contains the container node of the point. The ancestor axis contains the container node and its ancestors. The ancestor-or-self axis also contains the point itself. All other axes are empty.

Points do not have an expanded name, and the string value of a point is the empty string.

Ranges in the xpointer() Scheme

A range is defined by its start point and its end point. A range consists of all the XML structure and content between the start point and the end point of the range. The start point of a range need not be in the same node as the end point if the container node of the start point is of type root, element, or text. However, both points must be in the same XML document or external parsed entity. The start point must not come later in the document than the end point.

A special case arises when the start point and the end point are the same point. In that case, the range is referred to as a collapsed range.

A range does not have an expanded name. The string value of a range consists of the character content of text nodes inside the range.
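The rules for ranges can be sketched as a small data model in Python. This is a simplification under stated assumptions: both points are character-points in the same text node, the container is represented by its character content, and the class names CharacterPoint and Range are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharacterPoint:
    container: str  # stand-in for a text node: its character content
    index: int      # number of characters before the point

@dataclass(frozen=True)
class Range:
    start: CharacterPoint
    end: CharacterPoint

    def __post_init__(self):
        # Simplification: this model supports a single container only.
        if self.start.container != self.end.container:
            raise ValueError("points must share a container in this model")
        if self.start.index > self.end.index:
            raise ValueError("start point must not come after end point")

    @property
    def collapsed(self):
        # A collapsed range starts and ends at the same point.
        return self.start == self.end

    @property
    def string_value(self):
        # Character content between the two points.
        return self.start.container[self.start.index:self.end.index]

text = "Here is some text."
is_some = Range(CharacterPoint(text, 5), CharacterPoint(text, 12))
print(is_some.string_value, is_some.collapsed)
```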

The axes of a range are identical to the axes of its start point. The parent axis of the range contains the parent node of the start point.

The XPointer start-point() and end-point() functions can be used to navigate to the start point and end point, respectively, of a range.

Functions in the xpointer() Scheme

The xpointer() scheme adds functions to those available from the XPath function library.

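The string-range() function takes two required arguments (a location set and a string) and two optional arguments (numbers). For each location in the location set argument, it searches the string value of that location and returns a range for each non-overlapping occurrence of the string argument. The optional numbers give the position, relative to the start of each match, at which the returned range begins and the number of characters that the range contains.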
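The matching behavior can be approximated in Python for a single string value. The sketch below is a simplification, not the XPointer function itself: it works on a plain string rather than a location set, returns (start, end) index pairs rather than range locations, and assumes that the optional numbers are a 1-based position relative to the match and a length in characters.

```python
def string_range(text, needle, position=1, length=None):
    """Return (start, end) index pairs for non-overlapping matches."""
    if not needle:
        raise ValueError("this sketch does not handle an empty search string")
    if length is None:
        length = len(needle)            # default: cover the whole match
    ranges = []
    match = text.find(needle)
    while match != -1:
        begin = match + position - 1    # position 1 is the match start
        ranges.append((begin, begin + length))
        # Resume after the match so that matches never overlap.
        match = text.find(needle, match + len(needle))
    return ranges

text = "Here is some text."
print(string_range(text, "e"))
print(string_range(text, "some", 2, 2))
```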

The range() function takes a location set argument and returns a location set. The range() function returns a covering range for each location in the argument location set.

Note

A covering range is a range that totally encompasses a location. For a range, the covering range is the same range. For a point, the start point and end point of the covering range are the point itself. Definitions of other covering ranges are included in the XPointer specification.



The range-inside() function takes a single location set argument and returns a location set containing, for each location in the argument, a range that covers the contents of that location.

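The range-to() function takes an expression as its argument and returns a range for each location in the context. The start point of each range is the start point of the context location, as determined by the start-point() function. The end point is the end point of the location found by evaluating the argument relative to that context location, as determined by the end-point() function.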

The start-point() and end-point() functions respectively address the starting point and ending point of a range.

The here() function is meaningful only when the XPointer being evaluated itself appears in an XML document or an external parsed entity. If the expression being evaluated is in a text node inside an element node, the here() function returns the element node. Otherwise the here() function returns the node that directly contains the expression being evaluated.

The origin() function is meaningful only when it is processed in response to traversal of a link expressed in an XML document.

Some xpointer() Scheme Examples

To locate an element node that has an ID attribute of value "CRES99", you can write this:

xpointer(id("CRES99")) 

If you want to reference a range that includes the first and second chapters of a document, you could write this:

xpointer(//chapter[@number='1']/range-to(//chapter[@number='2'])) 

This assumes that the document contains chapter elements with a number attribute corresponding to the chapter number.
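Python's standard library has no XPointer support, but the effect of the chapter range can be approximated at element granularity. The sketch below assumes a hypothetical book document whose chapter elements carry number attributes; the title attributes and the function name chapters_in_range are invented for this example. A true XPointer range runs from point to point rather than from element to element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the title attributes are invented.
book = ET.fromstring(
    "<book>"
    "<chapter number='1' title='One'/>"
    "<chapter number='2' title='Two'/>"
    "<chapter number='3' title='Three'/>"
    "</book>"
)

def chapters_in_range(root, first, last):
    """Return chapter elements from number=first through number=last."""
    chapters = list(root.iter("chapter"))   # document order
    numbers = [c.get("number") for c in chapters]
    start = numbers.index(first)
    end = numbers.index(last)
    return chapters[start:end + 1]

covered = [c.get("title") for c in chapters_in_range(book, "1", "2")]
print(covered)
```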

The xmlns() Scheme

The xmlns() scheme is intended for use with the XPointer Framework to ensure correct interpretation of namespace prefixes in XPointers.

You might assume that using namespace prefixes would be possible using only the xpointer() scheme, but take a look at the following XML code snippet and think of the ambiguity it introduces:

<myPrefix:myElement 
 xmlns:myPrefix="http://www.XMML.com/Namespace"> 
 <AnElement> 
  First piece of text. 
 </AnElement> 
 <myPrefix:myElement 
  xmlns:myPrefix="http://www.XMML.com/AnotherNamespace"> 
  <AnElement> 
   Second piece of text. 
  </AnElement> 
  <!-- Some content could go here --> 
 </myPrefix:myElement> 
</myPrefix:myElement> 

If you had the XPointer that follows, which XPointer location(s) is it intended to refer to?

xpointer(//myPrefix:myElement/AnElement) 

Is it intended to refer to both AnElement elements? Or only one? If so, which? The myPrefix prefix is declared to be associated with two different namespace URIs. For an XML processor, that doesn’t cause difficulties because it uses the namespace URI, not the namespace prefix. An XPointer, however, carries no namespace declarations of its own, so you need to specify which namespace URI the prefix refers to.

To remove that ambiguity, the xmlns() scheme has been provided.

You can refer unambiguously to the AnElement child of the outer myPrefix:myElement element using the following XPointer, in which the prefix and the namespace URI are separated by an equals sign:

xmlns(myPrefix=http://www.XMML.com/Namespace) 
  xpointer(//myPrefix:myElement/AnElement) 

Or, you can refer unambiguously to the AnElement child of the inner one using this:

xmlns(myPrefix=http://www.XMML.com/AnotherNamespace) 
  xpointer(//myPrefix:myElement/AnElement) 

Remember that an XML processor uses the expanded name rather than the namespace prefix for processing. So, if you want to access both AnElement elements, you could use both xmlns() pointer parts:

xmlns(a=http://www.XMML.com/Namespace) 
  xmlns(b=http://www.XMML.com/AnotherNamespace) 

You could use a or b as the namespace prefix in further pointer parts using the xpointer() scheme.
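The same principle, that the namespace URI rather than the prefix identifies an element, can be seen in Python's xml.etree.ElementTree, which addresses elements by expanded name. The sketch below is an analogy rather than an XPointer implementation; the function name an_elements is invented for this example, and the document is the snippet shown earlier with its comment omitted.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """\
<myPrefix:myElement
 xmlns:myPrefix="http://www.XMML.com/Namespace">
 <AnElement>
  First piece of text.
 </AnElement>
 <myPrefix:myElement
  xmlns:myPrefix="http://www.XMML.com/AnotherNamespace">
  <AnElement>
   Second piece of text.
  </AnElement>
 </myPrefix:myElement>
</myPrefix:myElement>
"""

def an_elements(root, namespace_uri):
    """Text of AnElement children of myElement elements in one namespace."""
    qualified_name = "{%s}myElement" % namespace_uri
    found = []
    # iter() visits the root element itself as well as its descendants.
    for element in root.iter(qualified_name):
        for child in element.findall("AnElement"):
            found.append(child.text.strip())
    return found

root = ET.fromstring(DOCUMENT)
outer = an_elements(root, "http://www.XMML.com/Namespace")
inner = an_elements(root, "http://www.XMML.com/AnotherNamespace")
print(outer)
print(inner)
```

Supplying the namespace URI directly removes the ambiguity in the same way that an xmlns() pointer part does for an XPointer.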

The element() Scheme

The element() scheme is intended to be used with the XPointer Framework to provide basic addressing of elements in XML documents.

The element() scheme can use two forms of syntax: a name or a child sequence.

Suppose you had the following XML document:

<myDocument> 
<Introduction>Some text</Introduction> 
<MainText>Some main text</MainText> 
<Postscript>Some postscript text</Postscript> 
</myDocument> 

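The name form identifies an element by its ID, as determined by a DTD or schema; it is not an element type name or a location path. If the MainText element carried an ID-type attribute with the value main (an assumption, because the document above declares no IDs), you could select it by name using this line:

element(main) 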

Or, you could select it as a child sequence using this code:

element(/1/2) 

The syntax of the child sequence is to be understood as follows. The initial / character indicates that the root node is the initial context location. The 1 indicates that you are selecting the first element child of the root node—in this case, the myDocument element node. The next / character is a separator. The 2 indicates that you are selecting the second element child node of the myDocument node—in this case, the MainText element node.
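The walk through a child sequence can be sketched in Python with xml.etree.ElementTree. This is a minimal illustration, not an element() scheme processor: the function name resolve_child_sequence is invented for this example, and only the pure child sequence form (beginning with /) is handled.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """\
<myDocument>
<Introduction>Some text</Introduction>
<MainText>Some main text</MainText>
<Postscript>Some postscript text</Postscript>
</myDocument>
"""

def resolve_child_sequence(document_element, sequence):
    """Return the element addressed by a sequence such as '/1/2', or None."""
    steps = [int(step) for step in sequence.strip("/").split("/")]
    # The root node has exactly one element child: the document element.
    if steps[0] != 1:
        return None
    current = document_element
    for step in steps[1:]:
        children = list(current)        # element children, document order
        if step < 1 or step > len(children):
            return None
        current = children[step - 1]    # child sequences count from 1
    return current

root = ET.fromstring(DOCUMENT)
selected = resolve_child_sequence(root, "/1/2")
print(selected.tag)
```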
