Chapter 27. XQuery for XSLT Users

XQuery and XSLT have a lot in common: a data model, a set of built-in functions and operators, and the use of path expressions. This chapter delves further into the details of the similarities and differences between XQuery and XSLT. It also alerts XSLT 1.0/XPath 1.0 users to differences and potential compatibility issues when moving to XQuery.

XQuery and XPath

XPath started out as a language for selecting elements and attributes from an XML document while traversing its hierarchy and filtering out unwanted content. XPath 1.0 is a fairly simple yet useful recommendation that specifies path expressions and a limited set of functions. XPath has since become much more than that, encompassing a wide variety of expressions and functions, not just path expressions.

XQuery and XPath overlap to a very large degree. They have the same data model and the same set of built-in functions and operators. XPath is essentially a subset of XQuery. XQuery has a number of features that are not included in XPath, such as FLWORs and XML constructors. This is because these features are not relevant to selecting, but instead have to do with structuring or sorting query results. The two languages are consistent: any expression that is valid in both evaluates to the same value in both.

XQuery Versus XSLT

XQuery and XSLT are both languages designed to query and manipulate XML documents. There is an enormous amount of overlap among the features and capabilities of these two languages. In fact, the line between querying and transformation is somewhat blurred. For example, suppose someone wants a list of all the product names from the catalog, but wants to call them product_name in the results. On the one hand, this could be considered a query: “Retrieve all the name elements from the catalog, but give them the alias product_name.” On the other hand, it could be considered a transformation: “Transform all the name elements to product_name elements, and ignore everything else in the document.”
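As a rough illustration, the request above could be written as the following query. This is a minimal sketch that assumes the catalog.xml document used in the examples later in this chapter, with name elements nested inside product elements.

```xquery
(: Return every product name, renamed to product_name :)
for $name in doc("catalog.xml")//product/name
return <product_name>{data($name)}</product_name>
```

Whether you read this as a query or as a transformation, the result is the same sequence of product_name elements.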

Shared Components

The good news is that if you’ve learned one of these two languages, you’re well on your way toward learning the other. XQuery and XSLT are developed together, with compatibility between them in mind. Among the components they share are:

The data model

Both languages use the data model described in “The XQuery Data Model”. They have the same concepts of sequences, atomic values, nodes, function items, maps, and arrays. Namespaces are handled identically. In addition, they share the same type system and relationship to XML Schema.

XPath

XQuery is essentially a superset of XPath. XSLT makes use of XPath expressions in many areas, from the expressions used to match templates to the instructions that copy nodes from input documents.

Built-in functions and operators

All the built-in functions described in Appendix A can be used in both XQuery and XSLT, with the same results. All the operators, such as comparison and arithmetic operators, yield identical values in both languages. XSLT has some additional built-in functions, such as current and document, that are not part of XQuery and therefore not covered in this book.

Equivalent Components

In addition to the components they directly share, XQuery and XSLT also have some features that are highly analogous in the two languages; they just use a different syntax. XSLT instructions relating to flow control (e.g., xsl:if and xsl:for-each) have direct equivalents in XQuery (conditional and FLWOR expressions). Literal result elements in XSLT are analogous to direct XML constructors in XQuery, while the use of xsl:element and xsl:attribute in XSLT is like using computed constructors in XQuery. Some of these commonly used features are listed in Table 27-1.

Table 27-1. Comparison of XSLT and XQuery features
XSLT feature | XQuery equivalent | Chapter/Section
xsl:for-each | for clause in a FLWOR expression | “The for Clause”
XPath for expression | for clause in a FLWOR expression | “The for Clause”
xsl:variable | let clause in a FLWOR expression or global variable declaration | “The let Clause”, “Variable Declarations”
xsl:sort | order by clause in a FLWOR expression | “The order by Clause”
xsl:if, xsl:choose | Conditional expressions (if-then-else) | “Conditional (if-then-else) Expressions”
Literal result elements | Direct constructors | “Direct Element Constructors”
xsl:element | Computed constructors | “Computed Constructors”
xsl:attribute | Computed constructors | “Computed Constructors”
xsl:function | User-defined functions | “Function Declarations”
Named templates | User-defined functions | “Function Declarations”
xsl:value-of | The path or other expression that would appear in the select attribute | Chapter 4
xsl:copy-of | The path or other expression that would appear in the select attribute | Chapter 4
xsl:sequence | The path or other expression that would appear in the select attribute | Chapter 4
xsl:include | Module import | “Importing a Library Module”
xsl:template | No direct equivalent; can be simulated with user-defined functions | “Paradigm differences: push versus pull”
xsl:analyze-string | XPath analyze-string function | Appendix A, “analyze-string” section
xsl:for-each-group (with group-by attribute) | FLWOR, optionally with group by clause | “Grouping”
xsl:for-each-group (with group-adjacent, group-starting-with, or group-ending-with attribute) | FLWOR with window clause | “Windowing”
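
To illustrate one row of the table, the sketch below shows roughly how a grouping that XSLT would express with xsl:for-each-group and a group-by attribute might look as a FLWOR with a group by clause (available in XQuery 3.0 and later). The result element and attribute names (department, code, count) are invented for illustration; the dept attribute is the one used in the catalog examples in this chapter.

```xquery
(: Group products by department and count the members of each group :)
for $prod in doc("catalog.xml")//product
group by $dept := string($prod/@dept)
order by $dept
return <department code="{$dept}" count="{count($prod)}"/>
```

After the group by clause, $prod is bound to all the products in the current group, which is why count($prod) gives the group size.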

Differences

The most obvious difference between XQuery and XSLT is the syntax. A simple XQuery query might take the form:

<ul type="square">{
  for $prod in doc("catalog.xml")/catalog/product[@dept = 'ACC']
  order by $prod/name
  return <li>{data($prod/name)}</li>
}</ul>

The XSLT equivalent of this query is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <ul type="square">
      <xsl:for-each select="catalog/product[@dept = 'ACC']">
        <xsl:sort select="name"/>
        <li><xsl:value-of select="name"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>

XQuery is somewhat less verbose, and many people find it less cumbersome than using the XML syntax of XSLT. Users who know SQL find XQuery familiar and intuitive. Its terseness also makes it more convenient to embed in program code than XSLT.

On the other hand, XSLT stylesheets use XML syntax, which means that they can be easily parsed and/or created by standard XML tools. This is convenient for the dynamic generation of stylesheets.

Paradigm differences: push versus pull

The most significant difference between XQuery and XSLT lies in their ability to react to unpredictable content. To understand this difference, we must digress briefly into the two different paradigms for developing XSLT stylesheets, which are sometimes called pull and push. Pull stylesheets, also known as program-driven stylesheets, tend to be used for highly structured, predictable documents. They use xsl:for-each and xsl:value-of elements to specifically request the information that is desired. An example of a pull stylesheet is shown in Example 27-1.

Example 27-1. A pull stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="catalog">
    <ul>
      <xsl:for-each select="product">
        <li>Product #: <xsl:value-of select="number"/></li>
        <li>Product name: <xsl:value-of select="name"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>

The stylesheet is counting on the fact that the product elements appear as children of catalog and that each product element has a single name and a single number child. The template states exactly what to do with the descendants of the catalog element, and where they can be found.

By contrast, push stylesheets use multiple templates that specify what to do for each element type, and then pass processing off to other templates by using xsl:apply-templates. Which templates are used depends on the type of children of the current node. This is sometimes called a content-driven approach, because the stylesheet is simply reacting to child elements found in the input content by matching them to templates. Example 27-2 shows a push stylesheet that is equivalent to Example 27-1.

Example 27-2. A push stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="catalog">
    <ul>
      <xsl:apply-templates/>
    </ul>
  </xsl:template>
  <xsl:template match="product">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="number">
    <li>Product #: <xsl:value-of select="."/></li>
  </xsl:template>
  <xsl:template match="name">
    <li>Product name: <xsl:value-of select="."/></li>
  </xsl:template>
  <xsl:template match="node()"/>
</xsl:stylesheet>

This may not seem like a particularly useful approach for a predictable document like the product catalog. However, consider a narrative document structure, such as an HTML paragraph. The p (paragraph) element has mixed content and may contain various inline elements such as b (bold) and i (italic) to style the text in the paragraph, as in:

<p>It was a <b>dark</b> and <i>stormy</i> night.</p>

This input is less predictable because there is no predefined number or order of the b or i children in any given paragraph. A push stylesheet on this paragraph is shown in Example 27-3.

Example 27-3. A push stylesheet on narrative content
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="p">
    <para><xsl:apply-templates/></para>
  </xsl:template>
  <xsl:template match="b">
    <Strong><xsl:apply-templates/></Strong>
  </xsl:template>
  <xsl:template match="i">
    <Italics><xsl:apply-templates/></Italics>
  </xsl:template>
</xsl:stylesheet>

It would be difficult to write a good pull stylesheet on the narrative paragraph. Example 27-4 shows an attempt.

Example 27-4. An attempt at a pull stylesheet on narrative content
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="p">
    <para>
      <xsl:for-each select="node()">
        <xsl:choose>
          <xsl:when test="self::text()">
             <xsl:value-of select="."/>
          </xsl:when>
          <xsl:when test="self::b">
            <Strong><xsl:value-of select="."/></Strong>
          </xsl:when>
          <xsl:when test="self::i">
            <Italics><xsl:value-of select="."/></Italics>
          </xsl:when>
        </xsl:choose>
      </xsl:for-each>
    </para>
  </xsl:template>
</xsl:stylesheet>

However, this stylesheet is not very robust, because it does not handle the case where a b element is embedded within an i element. It is also cumbersome to maintain, because the code would have to be repeated if b and i can also appear in some other parent element besides p. If a change is made, or a new type of inline element is added, the stylesheet has to be changed in multiple places.

The distinction between push and pull XSLT stylesheets is relevant to the comparison with XQuery. XQuery can easily handle the scenarios supported by pull stylesheets. The equivalent of Example 27-1 in XQuery is:

for $catalog in doc("catalog.xml")/catalog
return <ul>{for $prod in $catalog/product
            return (<li>Product #: {data($prod/number)}</li>,
                    <li>Product name: {data($prod/name)}</li> )
       }</ul>

XQuery has a much harder time emulating the push stylesheet model, due to its lack of templates. In order to write a query that modifies the HTML paragraph, you could use a brittle pull model analogous to the one shown in Example 27-4. Alternatively, you could emulate templates using user-defined functions, as shown in Example 27-5. This is somewhat better in that it supports b elements within i elements and vice versa, and it specifies in one place what to do with each element type. However, it is still more cumbersome than its XSLT equivalent and does not support features of XSLT like modes, priorities, or imports that override templates.

Example 27-5. Emulating templates with user-defined functions
declare function local:apply-templates($nodes as node()*) as node()* {
  for $node in $nodes
  return typeswitch ($node)
        case element(p) return local:p-template($node)
        case element(b) return local:b-template($node)
        case element(i) return local:i-template($node)
        case element() return local:apply-templates($node/(@*|node()))
        default return $node
};
declare function local:p-template($node as node()) as node()* {
   <para>{local:apply-templates($node/(@*|node()))}</para>
};
declare function local:b-template($node as node()) as node()* {
   <Strong>{local:apply-templates($node/(@*|node()))}</Strong>
};
declare function local:i-template($node as node()) as node()* {
   <Italics>{local:apply-templates($node/(@*|node()))}</Italics>
};
local:apply-templates(doc("p.xml")/p)

It is very important to note that this does not mean that XQuery is not good for querying narrative content. On the contrary, XQuery is an easy and fast method of searching within large bodies of narrative content. However, it is not ideal for taking that retrieved narrative content and significantly transforming or restructuring it at a detailed level.

Optimization for particular use cases

Implementations of XSLT and XQuery tend to be optimized for particular use cases. XSLT implementations are generally built for transforming one whole document. They load the entire input document into memory and take one or more complete passes through the document. This is appropriate behavior when an entire document is being transformed, since the entire document needs to be accessed anyway. Additional input documents can be accessed using the doc or document functions, in which case they too are loaded into memory. Some XSLT processors now support streaming, which means that entire documents don’t need to be in memory at once, improving performance and reducing memory requirements.

XQuery implementations, on the other hand, are generally optimized for selecting fragments of data—possibly across many documents—for example, from a database. When content is loaded into the database, it is broken into chunks that are usually smaller than the entire documents. Those chunks are indexed so that they can be retrieved quickly. XQuery queries can access these chunks without being forced to load the entire documents that contain them. This makes selecting a subset of information from a large body of XML documents much faster using the average XQuery implementation.

Convenient features of XSLT

XSLT has several convenient features that are absent from XQuery:

  • xsl:result-document allows the creation of multiple output files directly in a stylesheet.

  • xsl:import allows you to override templates and functions in an imported stylesheet. It gives you some of the capabilities of inheritance and polymorphism from object-oriented languages, which is particularly useful when writing large application suites designed to handle a variety of related and overlapping tasks. This is harder to organize in XQuery, which has neither the polymorphism of object-oriented languages nor the function pointers of a language like C. The modules of XQuery also have significant limitations when writing large applications, such as the rule banning cyclic imports.

  • xsl:key can be used to define keys on large input data to improve processing performance.
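
XQuery has no declaration equivalent to xsl:key, but in version 3.1 a map can serve as a hand-built lookup table in a similar spirit. The following is only a sketch: it assumes that each product in catalog.xml has a number child, and the product number 557 is an assumed value. Whether it is any faster than a plain predicate depends entirely on the implementation.

```xquery
(: Build a lookup table from product number to product element :)
let $by-number := map:merge(
  for $prod in doc("catalog.xml")//product
  return map:entry(string($prod/number), $prod)
)
(: Look up one product by its number; "557" is an assumed key :)
return $by-number("557")/name
```

If two products share a number, map:merge keeps the first one by default, whereas an XSLT key would return both.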

Using XQuery and XSLT Together

XQuery and XSLT each have their strengths and it sometimes makes sense to use them together. For example, you might want to use XQuery to search an XML database, but then use XSLT to do a detailed transformation of the retrieved content to HTML. It makes sense to pipeline them together, with the XSLT transforming the output of the XQuery query. Some XQuery implementations provide some ability to invoke XSLT transformations using implementation-defined extension functions.

Starting in version 3.1, there is a standard way to invoke XSLT from XQuery, using the transform function. This allows you to pass a source item and a stylesheet to the function, and retrieve the transformed results. For example:

let $result := transform(
  map {
    "stylesheet-location": "render.xsl",
    "source-node": doc("catalog.xml")
  })
return $result?output

could be used to transform the document node of catalog.xml by using the XSLT at the specified location. The function returns a map, where the output entry contains the transformation results.
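
The options map accepts other entries as well. As a hedged sketch, stylesheet parameters can be supplied through the stylesheet-params entry, a map keyed by QName. The parameter name dept is hypothetical here, and render.xsl would have to declare a matching xsl:param for it to have any effect.

```xquery
(: Invoke the stylesheet with a parameter; "dept" is a hypothetical xsl:param :)
transform(
  map {
    "stylesheet-location": "render.xsl",
    "source-node": doc("catalog.xml"),
    "stylesheet-params": map { QName("", "dept"): "ACC" }
  })?output
```

Support for transform is optional, so it is worth checking whether your processor provides it.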

Conversely, it is also possible to invoke XQuery functions from XSLT, using the load-xquery-module function, which dynamically loads an XQuery library module and provides access to its functions and variables. Because this is an XPath function, it could be called, for example, in an XSLT select attribute. These two functions are described in detail in Appendix A, in the “transform” and “load-xquery-module” sections.
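
A call to load-xquery-module might look roughly like the following. Everything specific in this sketch is an assumption: the module namespace http://example.com/prod, the location hint prod.xqm, and a one-argument function named orders-for-customer in that module. The function returns a map whose functions entry is keyed first by function name (as a QName) and then by arity.

```xquery
let $ns := "http://example.com/prod"  (: hypothetical module namespace :)
(: The location hint is an assumption; processors may ignore it :)
let $module := load-xquery-module($ns, map { "location-hints": "prod.xqm" })
(: Pick the one-argument function named orders-for-customer :)
let $orders-for-customer :=
  $module?functions(QName($ns, "orders-for-customer"))?1
return $orders-for-customer(<customer id="C1"/>)
```

The same expression could appear in an XSLT select attribute, provided the processor supports dynamic loading of XQuery modules.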

XQuery Backward Compatibility with XPath 1.0

XPath 1.0 is essentially a subset of XQuery. If you already know XPath 1.0 from using it in XSLT 1.0, you will probably find parts of XQuery very familiar.

Backward- and cross-compatibility are mostly maintained among the three languages (XPath 1.0, XPath 2.0 and later, and XQuery), so that an expression in any of the three will usually yield the same results. However, there are a few important differences, which are described in this section. All of these differences between XPath 1.0 and XPath 2.0 are also relevant if you plan to use XSLT 2.0 or 3.0.

The few areas of backward incompatibility between XPath 1.0 and later versions are discussed in greater detail in the XPath 2.0 specification, which is at http://www.w3.org/TR/xpath20. In XSLT, you can choose to process 2.0 and 3.0 stylesheets while setting an XPath 1.0 Compatibility Mode to treat XPath expressions just like XPath 1.0 expressions. This helps to avoid unexpected changes in the behavior of stylesheets when they are upgraded from 1.0. The mode is not available in XQuery.

This section uses the term “XQuery/XPath 2.0+” to indicate all versions of XQuery, and versions 2.0 and later of XPath.

Data Model

The XPath 1.0 data model has the concept of a node-set, which is a collection of nodes without duplicates that is, in practice, processed in document order. In XQuery/XPath 2.0+, there is the similar concept of a sequence. However, sequences differ in that they are ordered (not necessarily in document order), and they can contain duplicates. Another difference is that sequences can contain atomic values and function items as well as nodes.
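
The difference can be seen in a small sketch, assuming the catalog.xml document used elsewhere in this chapter. The comma operator builds a sequence that keeps duplicates and the order given, while the union operator behaves like an XPath 1.0 node-set operation, removing duplicates and returning nodes in document order.

```xquery
let $names := doc("catalog.xml")//product/name
return (
  (: sequence construction keeps duplicates: twice as many items :)
  count(($names, $names)),
  (: union removes duplicate nodes: same count as $names alone :)
  count($names | $names),
  (: a sequence can hold nodes in something other than document order :)
  reverse($names)
)
```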

Root nodes in XPath 1.0 have been renamed document nodes in XQuery/XPath 2.0+. Namespace nodes (and the namespace axis) are now deprecated in XPath 2.0 and later, and not at all accessible in XQuery. They have been replaced by two functions that provide information about the namespaces in scope: in-scope-prefixes and namespace-uri-for-prefix.
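
For instance, the two functions can be combined to list the namespaces in scope for an element. This sketch constructs its own element so that it is self-contained; the prod prefix and its namespace URI are invented for illustration.

```xquery
(: A sample element; the namespace URI is an assumption for illustration :)
let $elem := <prod:product xmlns:prod="http://example.com/prod"/>
for $prefix in in-scope-prefixes($elem)
return concat($prefix, " = ", namespace-uri-for-prefix($prefix, $elem))
```

The result includes the built-in xml prefix as well as prod, in an implementation-dependent order.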

New Expressions

XPath 2.0 and later versions encompass a lot more than just paths. Some of the new kinds of expressions include:

  • Conditional expressions (if-then-else)

  • for expressions, which are a subset of XQuery FLWORs that have only one for or one let clause and a return clause

  • Quantified expressions (some/every-satisfies)

  • Ordered sequence constructors (($x, $y))

  • Additional operators to combine sequences (intersect, except)

  • Node comparison operators (is, <<, >>)

  • Arrays and maps (starting in version 3.1)

These new expressions are all part of XPath itself, not just XQuery.
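
As a quick tour, the sketch below uses several of these expression kinds in one sequence. It assumes the catalog.xml document from this chapter, with dept attributes and name and number children on each product; the threshold values are arbitrary.

```xquery
let $products := doc("catalog.xml")//product
return (
  (: conditional expression :)
  if (count($products) > 3) then "many" else "few",
  (: for expression :)
  for $p in $products return string($p/name),
  (: quantified expression :)
  some $p in $products satisfies $p/@dept = 'ACC',
  (: sequence constructor, deliberately not in document order :)
  ($products[2]/name, $products[1]/name),
  (: intersect and except :)
  $products[@dept = 'ACC'] intersect $products[number > 500],
  $products except $products[@dept = 'ACC'],
  (: node comparison: does the first product precede the second? :)
  $products[1] << $products[2]
)
```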

Path Expressions

If you already use XPath 1.0, the path expressions in XQuery should be familiar. The basic syntax and meaning of node tests and predicates is the same. The set of axes is almost the same, except that the namespace:: axis is not available.

There are some additional enhancements to path expressions. One is the ability to have the last step in a path return atomic values rather than nodes. So, for example:

doc("catalog.xml")//product/name/substring(., 1, 5)

will return the first five characters of each product name, resulting in a sequence of four string atomic values. This makes it really easy to do things that were tough in XPath 1.0; for example, summing over the product of price and quantity becomes:

sum(//item/(@price * @qty))

Another improvement is that it is now possible to have any expression as a step. You can take advantage of all the newly added kinds of expressions described in the previous section. It also allows you to create navigational functions that are very useful as steps, for example:

customer/prod:orders-for-customer(.)/product-code

Function Conversion Rules

In XPath 1.0, if you call a function that is expecting a single value, and pass it a sequence of multiple values, it simply takes the first value and discards the others. For example:

substring(doc("catalog.xml")//product/name, 1, 5)

In this case, there are four product nodes. XPath 1.0 just takes the name of the first one and returns Fleec. In XQuery/XPath 2.0+, type error XPTY0004 is raised.
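
To get the XPath 1.0 behavior in XQuery/XPath 2.0+, the first item has to be selected explicitly. A minimal sketch, assuming the same catalog.xml:

```xquery
let $names := doc("catalog.xml")//product/name
return (
  (: explicitly take the first name, as XPath 1.0 did implicitly :)
  substring($names[1], 1, 5),
  (: or apply the function to every name (simple map operator, 3.0 and later) :)
  $names ! substring(., 1, 5)
)
```

The first expression yields the single value that XPath 1.0 would have returned; the second yields one substring per product name.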

XQuery is strongly typed, while XPath 1.0 is not. In XPath 1.0, if you pass a value to a function that is of a different type—for example, a number to a function expecting a string, or vice versa—the value is cast automatically. In XQuery/XPath 2.0+, type error XPTY0004 is raised. For example:

substring(12345678, 1, 4)

attempts to take the substring of a number. It will return 1234 in XPath 1.0, and raise type error XPTY0004 in XQuery/XPath 2.0+. Instead, you would need to explicitly convert the value into a string, as in:

substring(string(12345678), 1, 4)

Arithmetic and Comparison Expressions

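In XPath 1.0, performing an arithmetic operation on a “missing” value results in the value NaN. In XQuery/XPath 2.0+, it returns the empty sequence. For example, if catalog has no foo child, the following expression returns NaN in XPath 1.0 and the empty sequence in XQuery/XPath 2.0+: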

catalog/foo * 2

Similar to the function conversion rules, in XPath 1.0, an arithmetic expression will take the first value of a sequence and discard the rest. In XQuery/XPath 2.0+, it raises type error XPTY0004.

In XPath 1.0, operands of all types are automatically converted to numbers for arithmetic operations. In XQuery/XPath 2.0+, the operands must be untyped or numeric.
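
If a query depends on the XPath 1.0 results, they can usually be reproduced by making the conversions explicit. The following sketch assumes a catalog.xml whose catalog element has no foo child and whose product elements have number children.

```xquery
let $catalog := doc("catalog.xml")/catalog
return (
  (: number() turns the empty sequence into NaN, so this yields NaN :)
  number($catalog/foo) * 2,
  (: select the first value explicitly instead of passing several operands :)
  ($catalog/product/number)[1] * 2
)
```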

It is possible in XQuery/XPath 2.0+ to compare non-numeric values such as strings by using the operators <, <=, >, and >=. In XPath 1.0, this was not supported. By default, XQuery/XPath 2.0+ treats untyped operands of a comparison like strings, whereas they were treated as numbers in XPath 1.0. This is a significant compatibility risk, because the results of the comparison will be different if, for example, you are comparing <price>29.99</price> to <price>100.00</price>. XPath 1.0 would say that the first price is less than the second, while XQuery/XPath 2.0+ (in the absence of a schema that says they are numeric) would say that the second price is less, because its string value is lower.
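
The price comparison described above can be tried directly. This sketch constructs the two untyped elements inline and shows two ways of forcing a numeric comparison.

```xquery
let $a := <price>29.99</price>
let $b := <price>100.00</price>
return (
  (: untyped operands are compared as strings: false :)
  $a < $b,
  (: converting explicitly gives a numeric comparison: true :)
  number($a) < number($b),
  xs:decimal($a) < xs:decimal($b)
)
```

Validating the input against a schema that declares price as a numeric type would have the same effect as the explicit conversions.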

Built-in Functions

There are almost 200 built-in functions in XQuery/XPath 3.1, as compared to 27 in XPath 1.0. All the built-in functions from XPath 1.0 are also supported in XQuery/XPath 2.0+, with a couple of minor differences:

  • Some XQuery/XPath 2.0+ function calls return the empty sequence, where in XPath 1.0 they would have returned a zero-length string. This is the case if the empty sequence is passed as the first argument to substring, substring-before, or substring-after.

  • Some XQuery/XPath 2.0+ function calls return the empty sequence, where in XPath 1.0 they would have returned the value NaN. This is the case if the empty sequence is passed to round, floor, or ceiling.
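
The second difference can be sketched as follows, again assuming that the catalog element in catalog.xml has no foo child. Wrapping the argument in number() restores the XPath 1.0 result.

```xquery
let $missing := doc("catalog.xml")/catalog/foo
return (
  (: round of the empty sequence is the empty sequence, so the count is 0 :)
  count(round($missing)),
  (: number() converts the empty sequence to NaN first, so this yields NaN :)
  round(number($missing))
)
```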
