Chapter 6. Exploiting XSLT 2.0

If the only new thing we have to offer is an improved version of the past, then today can only be inferior to yesterday. Hypnotized by images of the past, we risk losing all capacity for creative change.

Robert Hewison

Introduction

XSLT 2.0 has numerous additions and enhancements that make doing hard things in XSLT easier. This chapter will help the XSLT 1.0 veteran make the transition to 2.0 and also help the XSLT newbie understand how to better approach stylesheet design in 2.0.

XSLT 2.0 derives much of its improved functionality from XPath 2.0, so if you have skipped Chapter 1, then you should consider reading that first.

As with most progress in software technology, version 2.0 is an improved version of the old rather than a complete rethinking of stylesheet design. It also falls short in some features that could have made it a much better language (e.g., introspection and direct support for higher-order functions). However, XSLT 2.0 goes a long way toward alleviating the drudgery of developing complex stylesheet logic. The key features in 2.0 are XPath functions, grouping, enhanced modes, cleaner idioms for reusable code, a richer type system, and enhanced support for text processing. All of these features are used to improve the 2.0 recipes that appear in this edition, but this chapter provides a one-stop reference for the new features themselves.

6.1. Convert Simple Named Templates to XSLT Functions

Problem

XSLT 1.0 did not support writing XPath functions in XSLT, and named templates are an awkward substitute.

Solution

Prefer XSLT 2.0 functions over named templates when the purpose is solely to compute a result rather than to create serialized content. Below I show a potpourri of examples where functions are much more convenient than named templates:

<!-- Mathematical computations -->

<xsl:function name="ckbk:factorial" as="xs:decimal">
   <xsl:param name="n" as="xs:integer"/>
   <xsl:sequence select="if ($n eq 0) then 1 
                         else $n * ckbk:factorial($n - 1)"/> 
</xsl:function>


<!-- Simple mappings -->

<xsl:function name="ckbk:decodeColor" as="xs:string">
     <xsl:param name="colorCode" as="xs:integer"/>
     <xsl:variable name="colorLookup"
                  select="('black','red','orange','yellow',
                           'green','blue','indigo','violet','white')"/>
     <!-- Color codes start at 0, but XPath sequences are indexed from 1 -->
     <xsl:sequence select="if ($colorCode ge 0 and 
                               $colorCode lt count($colorLookup)) 
                           then $colorLookup[$colorCode + 1] 
                           else 'no color'"/>
</xsl:function>

<!-- String manipulations -->

<xsl:function name="ckbk:reverse">
    <xsl:param name="input" as="xs:string"/>
    <xsl:sequence select="codepoints-to-string(reverse(
                           string-to-codepoints($input)))"/>
</xsl:function>
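
To call functions like these, the stylesheet must bind the ckbk prefix to a namespace, because XSLT 2.0 requires the names of user-defined functions to be namespace-qualified. The following is only a minimal sketch: the namespace URI and the word input element are assumptions I made for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ckbk="http://example.com/ns/ckbk"
    exclude-result-prefixes="xs ckbk">

  <!-- The ckbk namespace URI above is hypothetical; any URI you control will do -->
  <xsl:function name="ckbk:reverse" as="xs:string">
    <xsl:param name="input" as="xs:string"/>
    <xsl:sequence select="codepoints-to-string(reverse(
                            string-to-codepoints($input)))"/>
  </xsl:function>

  <!-- Assumes input such as <word>stressed</word> -->
  <xsl:template match="word">
    <reversed>
      <!-- A function call can appear anywhere an XPath expression can -->
      <xsl:value-of select="ckbk:reverse(string(.))"/>
    </reversed>
  </xsl:template>

</xsl:stylesheet>
```

The equivalent named template would need an xsl:call-template with an xsl:with-param child, wrapped in an xsl:variable just to capture the result.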

Discussion

Recall that named templates are an alternative to templates that are invoked strictly by matching. Named templates act much like procedures in traditional languages because an XSLT programmer explicitly transfers control to a named template via xsl:call-template, rather than relying on the more declarative semantics of template matching. A nice feature of XSLT (both 1.0 and 2.0) is that you can mix these styles by giving a template both a pattern and a name.

User-defined XSLT 2.0 functions are not a substitute for named templates. The key question to ask yourself when choosing one over the other is: are you simply computing a result or are you creating a reusable content producer? The former is better expressed as a function and the latter as a template. In XSLT 2.0 Programmer’s Reference, Michael Kay recommends using functions in cases where you are simply selecting nodes and templates when you are creating new ones, even though XSLT will allow you to use functions for the latter.

This function is selecting nodes:

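<!-- A function body has no context item, so the document to search
     is passed in explicitly as $doc -->
<xsl:function name="ckbk:getParts" as="element(part)*">
  <xsl:param name="doc" as="node()"/>
  <xsl:param name="startPartId" as="xs:string"/> 
  <xsl:param name="endPartId" as="xs:string"/>
  <xsl:sequence select="$doc//Parts/part[@partId ge $startPartId 
                                         and @partId le $endPartId]"/> 
</xsl:function>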

This function is creating new nodes but perhaps a template would make more sense:

<!-- As above, the document to search is passed in explicitly as $doc -->
<xsl:function name="ckbk:getPartsElem" as="element(Parts)">
      <xsl:param name="doc" as="node()"/>
      <xsl:param name="startPartId" as="xs:string"/> 
      <xsl:param name="endPartId" as="xs:string"/>
      <Parts>
          <xsl:copy-of select="$doc//Parts/part[@partId ge $startPartId 
                                             and @partId le $endPartId]"/>
      </Parts>     
</xsl:function>

6.2. Prefer for-each-group over Muenchian Method of Grouping

Problem

XSLT 1.0 did not have explicit support for grouping so indirect and potentially confusing techniques had to be invented.

Solution

Take advantage of the powerful xsl:for-each-group instruction for all your grouping needs. This instruction has a mandatory select attribute where you provide an expression that defines the population of nodes you wish to group. You then use one of four grouping attributes to define the criteria for dividing the population into groups. These are explained next. As each group is processed, you can use the function current-group() to access all nodes in the current group. You use the function current-grouping-key() to access the value of the key that defines the group being processed when grouping by value or adjacent nodes. The current-grouping-key() function has no value when grouping by start or ending node.

You can also sort groups by inserting one or more xsl:sort instructions to define the sorting criteria, just as you do when using xsl:for-each.

Group by values (group-by="expression")

A classic grouping problem arises quite often when processing data into reports. Consider sales data. Product managers will often want data grouped by sales region, product type, or salesperson, depending on what problem they are trying to solve. You use the group-by attribute to define an expression that determines the value or values that cause nodes in the population to group together. For example, group-by="@dept" would cause nodes that have the same dept value to group together:

<xsl:template match="Employees">
  <EmployeesByDept>
    <xsl:for-each-group select="employee" group-by="@dept">
      <dept name="{current-grouping-key()}">
        <xsl:copy-of select="current-group()"/>
      </dept>
    </xsl:for-each-group>
  </EmployeesByDept>
</xsl:template>
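
Sorting combines naturally with value-based grouping. The sketch below orders the departments by their grouping key and, separately, orders the employees inside each department. The name attribute on employee is an assumption of mine; it does not appear in the example above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumes <Employees> with <employee dept="..." name="..."/> children;
       the name attribute is hypothetical -->
  <xsl:template match="Employees">
    <EmployeesByDept>
      <xsl:for-each-group select="employee" group-by="@dept">
        <!-- An xsl:sort placed here orders the groups themselves -->
        <xsl:sort select="current-grouping-key()"/>
        <dept name="{current-grouping-key()}"
              size="{count(current-group())}">
          <!-- Members of a group are sorted in a separate step -->
          <xsl:for-each select="current-group()">
            <xsl:sort select="@name"/>
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </dept>
      </xsl:for-each-group>
    </EmployeesByDept>
  </xsl:template>

</xsl:stylesheet>
```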

Group by adjacent nodes (group-adjacent="expression")

In some contexts, such as document processing, you want to group nodes that share a common value, provided they are also adjacent to each other. As with group-by, group-adjacent defines an expression used to determine the value used to perform the grouping, but two nodes that have such a value will only be in the same group if they are adjacent in the population. The value of group-adjacent must be a single item; an empty sequence or a sequence of more than one item causes an error.

Consider a document consisting of para elements interspersed with other heading elements. You would like to extract only the para elements without losing track of the fact that some sequences of para elements belong together as part of the same topic:

<xsl:template match="doc">
    <xsl:copy>
      <xsl:for-each-group select="*" group-adjacent="name()">
        <xsl:if test="self::para">
          <topic>
            <xsl:copy-of select="current-group()"/>
          </topic>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

Group by starting node (group-starting-with="pattern")

Frequently, especially in document processing, a group of related nodes is demarcated by a particular node such as a title or subtitle, or other type of heading node. Grouping by starting node makes it easy to process these loosely structured documents. The group-starting-with attribute defines a pattern that matches nodes in the population that are the starting nodes of the group. This is similar to the patterns you use with the match attribute in xsl:template instructions. When the pattern matches a node in the population, all subsequent nodes are part of the group until another match is made. The first node in the population defines a group whether it matches or not. This implies that a population will have at least one group, the entire population, even if the pattern is never matched.

A classic example involves reconstituting structure from an unstructured document. XHTML is a good example of a loosely structured markup language, especially in regard to the use of heading elements (h1, h2, etc.). The following transformation will add some structure by nesting each group, designated by a starting h1 element, in a div element:

<xsl:template match="body">
  <xsl:copy>
    <xsl:for-each-group select="*" group-starting-with="h1">
      <div>
       <xsl:apply-templates select="current-group()"/>
      </div>
   </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
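
Because the first node in the population starts a group whether or not it matches, content that precedes the first h1 ends up in a group of its own. One way to treat that leading group differently is to test the first node of each group, which is the context item inside xsl:for-each-group. This is only a sketch; the class attribute values are names I chose for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumes no-namespace, HTML-like input, as in the example above -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:for-each-group select="*" group-starting-with="h1">
        <xsl:choose>
          <!-- The context item is the first node of the current group -->
          <xsl:when test="self::h1">
            <div class="section">
              <xsl:copy-of select="current-group()"/>
            </div>
          </xsl:when>
          <xsl:otherwise>
            <!-- Leading content that appears before the first h1 -->
            <div class="preamble">
              <xsl:copy-of select="current-group()"/>
            </div>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```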

Group by ending node (group-ending-with="pattern")

This form of grouping is similar to group-starting-with but uses the group-ending-with pattern to define the last node that will be in the current group. The first node in the population starts a new group, so there is always at least one group even if the pattern does not match any nodes.

Of all the grouping methods, grouping by ending node will typically find the least application. This is because documents designed for human consumption use leading elements, such as headings, to signal new groups, rather than trailing ones. In XSLT 2.0 Programmer’s Reference, Michael Kay provides an example of a series of documents having been broken into chunks for purpose of transmission. In this example, the document boundaries are marked by the absence of an attribute continued='yes'. A slightly more probable example is one where you want to add structure to a flat document by chunking elements based on some criteria that designate the end of a chunk. For example, you can group paragraphs into sections of five paragraphs with the following code:

 <xsl:stylesheet version="2.0"
                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
 
  <xsl:template match="doc">
    <xsl:copy>
      <xsl:for-each-group select="para" 
                          group-ending-with="para[position() mod 5 eq 0]">
        <section>
          <xsl:for-each select="current-group()">
            <xsl:copy>
              <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
          </xsl:for-each>
        </section>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Discussion

The Muenchian Method, named after Steve Muench of Oracle, was an innovative way to group data in XSLT 1.0. It took advantage of XSLT’s ability to index documents using a key. The trick involves using the index to efficiently figure out the set of unique grouping keys and then using this set to process all nodes in the group:

<xsl:key name="products-by-category" match="product" use="@category"/>

<xsl:template match="/">

  <!-- Select the first product of each category -->
  <xsl:for-each select="//product[count(. | key('products-by-category', @category)[1]) = 1]">
    <xsl:variable name="current-grouping-key" 
                  select="@category"/>
    <xsl:variable name="current-group" 
                  select="key('products-by-category', 
                              $current-grouping-key)"/>
    <xsl:for-each select="$current-group">
       <!-- processing for elements in group -->
       <!-- you can use xsl:sort here also, if necessary -->
    </xsl:for-each>
  </xsl:for-each>

</xsl:template>

Although the Muenchian method will continue to work in 2.0, you should prefer for-each-group because it is likely to be as efficient and probably more so. Just as important, it will make your code more comprehensible, especially to XSLT novices. Further, you use the same basic instruction to get access to the four distinct grouping capabilities. The Muenchian method can only be used for value-based grouping. Backward compatibility to XSLT 1.0 is probably the only compelling reason to continue to use Muenchian grouping in XSLT 2.0.
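
For comparison, here is a sketch of the same value-based grouping written with xsl:for-each-group. The categories and category result elements are names I picked for illustration; the input is assumed to contain product elements carrying a category attribute, as in the Muenchian example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <categories>
      <!-- No key declaration and no count() trick are needed -->
      <xsl:for-each-group select="//product" group-by="@category">
        <category name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <!-- processing for elements in group -->
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </category>
      </xsl:for-each-group>
    </categories>
  </xsl:template>

</xsl:stylesheet>
```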

6.3. Modularizing and Modes

Problem

XSLT 1.0 limitations on the use of modes often resulted in duplication of code or extra effort.

Solution

Use the new capabilities of the mode attribute in XSLT 2.0 to eliminate code duplication. Consider a simple example of a stylesheet that uses two different modes to process a document in two passes. In each pass, you would like to ignore text nodes by default. In XSLT 1.0, you would have to write something like the following:

<xsl:template match="text()" mode="mode1"/>
<xsl:template match="text()" mode="mode2"/>

However, in 2.0, you can remove the redundancy:

<xsl:template match="text()" mode="mode1 mode2"/>

Or if the intention is to match in all modes:

<xsl:template match="text()" mode="#all"/>

Granted, this is a small improvement, but it has a large payback for stylesheets that are more complex, use a large number of modes, share a lot of code between modes, or are under frequent maintenance.

Discussion

A rule of thumb that I adhere to when using modes in 2.0 is to always use #current rather than an explicitly named mode if my intention is to continue processing in the present mode:

<xsl:template match="author" mode="index">
  <div class="author">
    <xsl:apply-templates mode="#current"/>
  </div>
</xsl:template>

This has two immediately beneficial consequences. First, if you later decide you picked a bad name for the mode and want to change it, you will not need to change any of the calls to xsl:apply-templates. Second, if you add new modes to the template, it will continue to work without further change:

<xsl:template match="author" mode="index body">
  <div class="author">
    <xsl:apply-templates mode="#current"/>
  </div>
</xsl:template>
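
Putting these pieces together, a two-pass stylesheet might look like the sketch below. The book, chapter, title, and para element names and the toc and body mode names are assumptions chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumes a <book> root whose <chapter> children hold <title> and <para> -->
  <xsl:template match="book">
    <book>
      <toc><xsl:apply-templates select="chapter" mode="toc"/></toc>
      <body><xsl:apply-templates select="chapter" mode="body"/></body>
    </book>
  </xsl:template>

  <!-- Shared by both passes; #current keeps the children in the active mode -->
  <xsl:template match="chapter" mode="toc body">
    <section>
      <xsl:apply-templates mode="#current"/>
    </section>
  </xsl:template>

  <xsl:template match="title" mode="toc body">
    <heading><xsl:value-of select="."/></heading>
  </xsl:template>

  <!-- Only this rule differs between the two passes -->
  <xsl:template match="para" mode="toc"/>
  <xsl:template match="para" mode="body">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

  <!-- Ignore stray text nodes in every mode -->
  <xsl:template match="text()" mode="#all"/>

</xsl:stylesheet>
```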

6.4. Using Types for Safety and Precision

Problem

XSLT 1.0’s limited type checking put limitations on how robust your stylesheets could be.

Solution

Use XSLT 2.0’s extended type system to create precise and type-safe functions and templates.

Use the as attribute on elements that hold or return data.

These elements include xsl:function, xsl:param, xsl:template, xsl:variable, and xsl:with-param.

Use the type attribute on elements that create data.

These elements include xsl:attribute, xsl:copy, xsl:copy-of, xsl:document, xsl:element, and xsl:result-document.

Discussion

All conforming XSLT 2.0 processors allow you to use the simple data types such as xs:integer or xs:string to describe variables or parameters. Further, these types can be used with the symbols *, + and ? to describe sequences of these simple types:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:xs="http://www.w3.org/2001/XMLSchema" >

<!-- x is a sequence of zero or more strings -->
<xsl:variable name="x" select="para" as="xs:string*"/>

<!-- y is a sequence of one or more strings. We code the select in a way
     that guarantees this although if you knew there must be at least one
     para element, you could ommit the if expression -->
<xsl:variable name="y" 
              select="if (para) then para else ''" 
              as="xs:string+"/>

<!-- z is a sequence of zero or one string.  -->
<xsl:variable name="z" select="para[1]" as="xs:string?"/>

</xsl:stylesheet>
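
The same occurrence indicators apply to function signatures. The following sketch declares a function that insists on at least one decimal argument and returns exactly one decimal. The ckbk namespace URI and the prices and price element names are assumptions for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ckbk="http://example.com/ns/ckbk"
    exclude-result-prefixes="xs ckbk">

  <!-- Takes one or more decimals and returns exactly one decimal -->
  <xsl:function name="ckbk:average" as="xs:decimal">
    <xsl:param name="values" as="xs:decimal+"/>
    <xsl:sequence select="sum($values) div count($values)"/>
  </xsl:function>

  <!-- Assumes input such as <prices><price>9.99</price>...</prices> -->
  <xsl:template match="prices">
    <!-- Zero or more decimals; a non-numeric price raises an error here -->
    <xsl:variable name="amounts" as="xs:decimal*"
                  select="for $p in price return xs:decimal($p)"/>
    <average>
      <!-- Guard against the empty case, which the function would reject -->
      <xsl:value-of select="if (exists($amounts))
                            then ckbk:average($amounts)
                            else 'n/a'"/>
    </average>
  </xsl:template>

</xsl:stylesheet>
```

Calling ckbk:average with an empty sequence fails with a type error at the call rather than producing a division by zero deep inside the function, which is the kind of early diagnosis the as attribute buys you.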

With a schema-aware processor, you can go even further and refer to both simple and complex types from a user-defined schema. A schema-aware processor must be made aware of user-defined types via the top-level instruction xsl:import-schema:

<xsl:stylesheet version="2.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:my="http://www.mydomain.com/ns/my">

<xsl:import-schema namespace="http://www.mydomain.com/ns/my"
                   schema-location="my-schema.xsd"/>

<xsl:template match="/">
 <!--Validate that the resulting element conforms to my:BookType -->
 <xsl:element name="Book" type="my:BookType">
     <xsl:apply-templates/>
 </xsl:element>
</xsl:template>

<!-- ... -->

</xsl:stylesheet>

You should not use xsl:import-schema if you do not have access to a schema-aware processor. If you use a schema-aware processor but wish to make your stylesheets compatible with non-schema-aware processors, then you should use the attribute use-when="system-property('xsl:is-schema-aware') = 'yes'" on all elements that require a schema-aware processor.
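
A sketch of this conditional approach follows, using the xsl:is-schema-aware system property defined by XSLT 2.0. It reuses the hypothetical my-schema.xsd and my:BookType from the example above and supplies an unvalidated fallback for basic processors.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://www.mydomain.com/ns/my"
    exclude-result-prefixes="my">

  <!-- Seen only by schema-aware processors -->
  <xsl:import-schema namespace="http://www.mydomain.com/ns/my"
      schema-location="my-schema.xsd"
      use-when="system-property('xsl:is-schema-aware') = 'yes'"/>

  <!-- Schema-aware version: the result element is validated -->
  <xsl:template match="/"
      use-when="system-property('xsl:is-schema-aware') = 'yes'">
    <xsl:element name="Book" type="my:BookType">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Fallback for basic processors: same output, no validation -->
  <xsl:template match="/"
      use-when="system-property('xsl:is-schema-aware') = 'no'">
    <xsl:element name="Book">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```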

6.5. Avoiding 1.0 to 2.0 Porting Pitfalls

Problem

Not every 1.0 stylesheet will work transparently in 2.0.

Solution

If you need to port 1.0 stylesheets to 2.0, you will want to watch out for several gotchas. Some of these problems can be eliminated by using XSLT 1.0 compatibility mode; that is, by using version="1.0" in the stylesheet element, <xsl:stylesheet version="1.0">. However, if you want to begin migrating old stylesheets to 2.0, there are other solutions to these incompatibilities, as explained next.

Warning

XSLT 2.0 processors are not obligated to support backward compatibility mode, although most probably will. If it is not supported, the processor will signal an error.

Sequences do not transparently convert to atomic items

Consider the fragment <xsl:value-of select="substring-before(Name, ' ')"/>. What happens if this is evaluated in a context that includes more than one Name element? In XSLT 1.0, only the first one would be used, and the rest would be ignored. However, XSLT 2.0 is stricter and signals a type error because the first argument of substring-before can only be a sequence of 0 or 1 strings.

To remedy this, you should get in the habit of writing <xsl:value-of select="substring-before(Name[1], ' ')"/>. On the other hand, you may want to know about these errors because they might signal a bug in the way the stylesheet or its input documents are written. An alternative fix, which may be applicable in some circumstances, is to combine multiple nodes into a single string before presenting the sequence to a function expecting only one. For example, <xsl:value-of select="substring-before(string-join(Name, ''), ' ')"/> would not generate an error.
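
A third option, when every Name should be processed rather than only the first, is to apply the function to each item with a for expression. The sketch below shows this alongside the Name[1] fix; the person wrapper element is an assumption for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumes <person> with one or more <Name> children such as
       <Name>Ada Lovelace</Name> -->
  <xsl:template match="person">
    <!-- Option 1: mimic XSLT 1.0 and use only the first Name -->
    <xsl:value-of select="substring-before(Name[1], ' ')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Option 2: call the function once per Name and list the results -->
    <xsl:value-of select="for $n in Name
                          return substring-before($n, ' ')"
                  separator=", "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```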

Types do not transparently convert to other types

XSLT 1.0 was very lax when it came to type conversions. If a function expected a number and you provided a string, it would do its best to convert the string to a number and vice versa. The same applied to conversions among Boolean and integer or Boolean and string. The old behavior can be preserved by using 1.0 compatibility mode. However, you can also explicitly convert values:

    <xsl:variable name="a" select=" '10' "/>
    <xsl:value-of select="sum($a, 1)"/> <!-- Error -->
    <xsl:value-of select="sum(number($a), 1)"/> <!-- OK -->

    <xsl:value-of select="string-length(10)"/> <!-- Error -->
    <xsl:value-of select="string-length(string(10))"/> <!-- OK -->
    <xsl:value-of select="string-length(string(true()))"/> 
                                                   <!-- OK, equals 4 -->

    <xsl:value-of select="1 + true()"/> <!-- Error -->
    <xsl:value-of select="1 + number(true())"/> <!-- OK, equals 2 -->
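
Besides number() and string(), you can convert with the XML Schema constructor functions and guard the conversion with castable as. Unlike number(), which yields NaN for a non-numeric string, a constructor such as xs:integer() raises an error, so the guard is worth having when the input is not trusted. A minimal sketch, with variable and element names invented for illustration:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="a" select=" '10' "/>
    <xsl:variable name="b" select=" 'ten' "/>
    <results>
      <!-- Constructor function: string to integer, then arithmetic (11) -->
      <sum><xsl:value-of select="xs:integer($a) + 1"/></sum>

      <!-- Test before converting so that bad input does not halt the run -->
      <safe>
        <xsl:value-of select="if ($b castable as xs:integer)
                              then xs:integer($b) + 1
                              else 'not a number'"/>
      </safe>
    </results>
  </xsl:template>

</xsl:stylesheet>
```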

Extra parameters are not ignored

In XSLT 1.0, if you invoked a template with xsl:call-template and passed parameters that the template did not define, the extra parameters were silently ignored. In 2.0, this is an error. You can disable this error by using 1.0 compatibility mode. There is no other workaround, except removing the extra parameters or introducing defaults into the existing template.

Stricter semantic checking will cause errors on questionable usage

Examples of this can be seen in both the xsl:number and xsl:template instructions. If you use level and value attributes together in xsl:number, the level attribute is ignored in 1.0, but this is an error in 2.0. Similarly, with xsl:template, you cannot specify priority or mode in 2.0, if there is no match attribute defined.

Discussion

Use of backward-compatibility mode to correct errors in 1.0 stylesheets has other consequences. In particular, it means that some things will behave differently. In 1.0 compatibility mode:

  • xsl:value-of will output only the first item of a sequence rather than all items separated by spaces.

  • an attribute value template (e.g. <foo values="{foo}"/>) will output only the first item of a sequence rather than all items separated by spaces.

  • the first number in a sequence will be output rather than all numbers separated by spaces when using xsl:number.

    For these reasons, it would be wise not to rely on backward-compatibility mode for new stylesheet development intended to target a 2.0-compliant processor.
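
In 2.0 mode you can make the intended behavior explicit instead. The sketch below shows the default 2.0 output for a sequence, the separator attribute, and a predicate that deliberately reproduces the 1.0 result. The element names are placeholders.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <out>
      <!-- 2.0 default: all items, separated by single spaces (1 2 3) -->
      <all><xsl:value-of select="(1, 2, 3)"/></all>

      <!-- Explicit separator (1, 2, 3) -->
      <list><xsl:value-of select="(1, 2, 3)" separator=", "/></list>

      <!-- The 1.0 result, requested explicitly (1) -->
      <first><xsl:value-of select="(1, 2, 3)[1]"/></first>
    </out>
  </xsl:template>

</xsl:stylesheet>
```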

6.6. Emulating Object-Oriented Reuse and Design Patterns

Problem

You would like to graduate from cut-and-paste XSLT reuse to creating libraries of reusable XSLT code.

Solution

Clearly XSLT 2.0 is not an object-oriented programming language. However, object orientation is as much about how to engineer generic reusable code as it is about the creation of classes, objects, inheritance, encapsulation, and the like.

There are two features of XSLT 2.0 that facilitate an object-oriented style. The first is the instruction xsl:next-match and the second is tunnel parameters. The xsl:next-match instruction is a generalization of XSLT 1.0’s xsl:apply-imports. Recall that in XSLT 1.0, you use xsl:apply-imports to invoke templates of lower import precedence. The xsl:next-match instruction generalizes this behavior by allowing you to invoke matching templates of lower priority within the same stylesheet as well as matching templates in imported stylesheets. This is akin to calling a base class method in OO programming:

<xsl:template match="author | title | subtitle | deck" priority="1">
  <a name="{generate-id()}">
    <span class="{name()}">
      <xsl:apply-templates/>
    </span>
  </a>
</xsl:template>  

<xsl:template match="author" priority="2">
  <div>
    <span class="by">By </span>
    <xsl:next-match/>
  </div>
</xsl:template>  

<xsl:template match="title" priority="2">
  <h1 class="title"><xsl:next-match/></h1>
</xsl:template>  

<xsl:template match="deck" priority="2">
  <h2 class="deck"><xsl:next-match/></h2>
</xsl:template>  

<xsl:template match="subtitle" priority="2">
  <h2 class="subtitle"><xsl:next-match/></h2>
</xsl:template>

A further enhancement in 2.0 is the ability to pass parameters to both xsl:next-match and xsl:apply-imports:

<xsl:next-match>
  <xsl:with-param name="indent" select="2"/>
</xsl:next-match>

A further capability in XSLT 2.0 templates is tunnel parameters, a form of dynamic scoping that is popular in functional programming. Tunnel parameters allow calls to xsl:apply-templates to pass parameters that are not necessarily known to the immediately matching templates. However, these parameters are transparently carried over from call to call until they arrive at a template that declares them. Note that the attribute tunnel="yes" must be used both at the point of call and at the point where the parameter is accepted:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!--Standard processing rules for doc -->
<xsl:import href="doc.xslt"/>

<!-- A custom parameter not envisioned by the author of doc.xslt -->
<xsl:param name="customParam"/>

<xsl:template match="/">
 <!--Invoke templates from doc.xslt that have no 
     knowledge of customParam -->
  <xsl:apply-templates> 
     <xsl:with-param name="customParam" 
                     select="$customParam" tunnel="yes"/>
   </xsl:apply-templates> 
</xsl:template>


<xsl:template match="heading1">
    <!-- Do something special with heading1 elements 
         based on customParam -->
     <xsl:param name="customParam" tunnel="yes"/>
     <!-- ... 
                 -->
</xsl:template>


</xsl:stylesheet>

This is an extremely important enhancement to XSLT, because it allows you to decouple application-specific templates from generic ones without introducing global parameters or variables.
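
To see the tunneling without relying on an imported file, consider the following self-contained sketch. The doc, section, and heading1 element names and the lang parameter are assumptions for illustration. The intermediate template neither declares nor forwards lang, yet the value still reaches the heading1 template.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical stylesheet parameter -->
  <xsl:param name="lang" select="'en'"/>

  <!-- Assumes <doc><section><heading1>...</heading1></section></doc> -->
  <xsl:template match="/">
    <xsl:apply-templates>
      <xsl:with-param name="lang" select="$lang" tunnel="yes"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Knows nothing about lang and does not pass it on explicitly -->
  <xsl:template match="doc | section">
    <xsl:copy>
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Picks the tunneled value up several levels further down -->
  <xsl:template match="heading1">
    <xsl:param name="lang" tunnel="yes" select="'unknown'"/>
    <heading1 lang="{$lang}">
      <xsl:value-of select="."/>
    </heading1>
  </xsl:template>

</xsl:stylesheet>
```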

Discussion

In object-oriented development, the notion of design patterns is quite popular. These are tried and true techniques that have broad application in a variety of problems. The patterns facilitate communication between developers by providing semi-standard names for the techniques described by the pattern, as well as the applicable context.

The facilities described in this recipe and Recipe 6.3 can be used to implement some of the standard patterns from an XSLT perspective. Next we adapt some standard patterns to the domain of XSLT.

Template method

Define the skeleton of a stylesheet in an operation, deferring some steps to templates implemented by importing stylesheets. Template Method lets others redefine certain steps of a transformation without changing the transformation’s structure.

Consider a stylesheet that defines the standard by which your company renders an XML document to the web:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:xs="http://www.w3.org/2001/XMLSchema" >
    
  <xsl:output method="xhtml" indent="yes"/>

  <xsl:param name="titlePrefix" select=" '' " as="xs:string"/>
    
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of 
                    select="concat($titlePrefix, /*/title)"/></title>
      </head>
      <body>
        <xsl:call-template name="standard-processing-sequence"/>
      </body>
    </html>
    </xsl:template>

  <xsl:template name="standard-processing-sequence">
    <xsl:apply-templates mode="front-matter">
      <xsl:with-param name="mode" select=" 'front-matter' " 
                      tunnel="yes" as="xs:string"/>
    </xsl:apply-templates>
        
    <xsl:apply-templates mode="toc">
      <xsl:with-param name="mode" select=" 'toc' " 
                      tunnel="yes" as="xs:string"/>
    </xsl:apply-templates>
        
    <xsl:apply-templates mode="body">
        <xsl:with-param name="mode" select=" 'body' " 
                      tunnel="yes" as="xs:string"/>
    </xsl:apply-templates>
        
    <xsl:apply-templates mode="appendicies">
      <xsl:with-param name="mode" select=" 'appendicies' " 
                      tunnel="yes" as="xs:string"/>
    </xsl:apply-templates>
  </xsl:template>
    
  <xsl:template match="/*" mode="#all">
    <xsl:param name="mode"  tunnel="yes" as="xs:string"/>
    <div class="{$mode}">
      <xsl:apply-templates mode="#current"/>
    </div>
  </xsl:template>

  <!-- Default templates for various modes go here - 
       these can be overridden in importing stylesheets -->
      
</xsl:stylesheet>

Here you use modes to identify each major stage of processing. However, you also pass the current mode’s name in a tunnel parameter. This has two benefits. First, it is useful for debugging templates that match in multiple modes. Second, it allows similar multi-mode templates whose behavior varies by a small amount to implement this variation as a function of the mode parameter, without necessarily having knowledge of the specific modes. For example, if the template attaches CSS styles to the output elements, those styles can be prefixed with the mode name or use some other general mapping (e.g., table lookup) to go from mode to CSS style.

Chain of responsibility

Avoid coupling the initiator of a transformation to the templates that handle it by giving more than one template a chance to handle the request. Rely on template matching rather than conditional logic to determine the appropriate template.

Priorities are key to making this pattern portable because some XSLT processors may not handle templates with ambiguous template precedence. This example is adapted from a dynamic web site project I worked on that also used Cocoon. This project used templatized HTML where the class attributes dictated how dynamic content from an XML database would be rendered into the templated HTML. I omit the details of each template because they are less important than the overall structure. In this example, only one template will match in any given xhtm:td node, but in the general case, you can use xsl:next-match to combine the effects of multiple matching templates:

<xsl:template match="xhtm:td[matches(@class, '^keep-(\w+)')]"
              mode="template" priority="2.1">
  <!-- details omitted -->
</xsl:template>

<xsl:template match="xhtm:td[matches(@class, '^(flow|list)-(\w+)')]"
              mode="template" priority="2.2">
  <!-- details omitted -->
</xsl:template>

<xsl:template match="xhtm:td[matches(@class, '^repeat-(\w+)')]"
              mode="template" priority="2.3">
  <!-- details omitted -->
</xsl:template>

<xsl:template match="xhtm:td[matches(@class, '^download-(\w+)')]"
              mode="template" priority="2.4">
  <!-- details omitted -->
</xsl:template>

Decorator

Add behavior to lower-priority templates by matching the same nodes with templates of higher priority or higher import precedence. Invoke the core behavior using xsl:next-match.

This is the classic use of the next match we discussed in the solution section, so we omit an example here.

6.7. Processing Unstructured Text with Regular Expressions

Problem

You need to transform XML documents that contain chunks of unstructured text that must be marked up into a proper document.

Solution

There are three XPath 2.0 functions for working with regular expressions: matches(), replace(), and tokenize(). We covered these in Chapter 1. There is also a new XSLT instruction, xsl:analyze-string, which allows you to do even more advanced text processing.

The xsl:analyze-string instruction takes a select attribute for specifying the string to be processed, a regex attribute for specifying the regular expression to apply to the string, and an optional flags attribute to modify the action of the regex engine. The standard flags are:

  • i Case-insensitive mode.

  • m Multi-line mode makes metacharacters ^ and $ match the beginning and ends of lines rather than the beginning and end of the entire string (the default).

  • s Causes the metacharacter . to match newlines (entity &#xa;). The default is not to match newlines. This mode is sometimes called single-line mode, but from its definition, it should be clear that it is not the opposite of multi-line mode. Indeed, one can use both the s and m flags together.

  • x Allows whitespace to be used in a regular expression as a separator rather than a significant character.
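
The same flags are accepted as the optional third argument of matches(), which makes for a compact way to try them out. In the sketch below, each expression evaluates to true; the result element names are placeholders.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <flags>
      <!-- i: letter case is ignored -->
      <i><xsl:value-of select="matches('XSLT', 'xslt', 'i')"/></i>

      <!-- m: ^ and $ match at line boundaries (&#10; is a newline) -->
      <m><xsl:value-of select="matches('one&#10;two', '^two$', 'm')"/></m>

      <!-- s: the dot also matches the newline -->
      <s><xsl:value-of select="matches('one&#10;two', 'one.two', 's')"/></s>

      <!-- x: whitespace in the regex is ignored -->
      <x><xsl:value-of select="matches('2024-01', '\d{4} - \d{2}', 'x')"/></x>
    </flags>
  </xsl:template>

</xsl:stylesheet>
```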

The child element xsl:matching-substring is used to process the substrings that match the regex and xsl:non-matching-substring is used to process the substrings that do not match the regex. Either may be omitted. It is also possible to refer to captured groups (parts of a regex surrounded by parentheses) using the regex-group function within xsl:matching-substring:

<xsl:template match="date">
  <xsl:copy>
    <xsl:analyze-string select="normalize-space(.)" 
        regex="(\d\d\d\d) ( / | - ) (\d\d) ( / | - ) (\d\d)" 
        flags="x">
      <xsl:matching-substring>
        <year><xsl:value-of select="regex-group(1)"/></year>
        <month><xsl:value-of select="regex-group(3)"/></month>
        <day><xsl:value-of select="regex-group(5)"/></day>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <error><xsl:value-of select="."/></error>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>
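
One detail to watch when writing more compact expressions: the regex attribute is an attribute value template, so curly braces used as regex quantifiers must be doubled. The following variation on the date example is a sketch under that rule; the iso result element is a name I chose for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumes input such as <date>2005/12/25</date> -->
  <xsl:template match="date">
    <xsl:copy>
      <!-- {{4}} in the attribute reaches the regex engine as {4} -->
      <xsl:analyze-string select="normalize-space(.)"
          regex="^(\d{{4}})[/\-](\d{{2}})[/\-](\d{{2}})$">
        <xsl:matching-substring>
          <iso>
            <xsl:value-of select="string-join((regex-group(1),
                                               regex-group(2),
                                               regex-group(3)), '-')"/>
          </iso>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <error><xsl:value-of select="."/></error>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the expression is anchored with ^ and $, a given date either matches as a whole or is reported as an error in its entirety.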

A nice complement to xsl:analyze-string is the XSLT function unparsed-text(). This function allows you to read the contents of a text file as a string. Thus, as the name suggests, the file is not parsed and therefore need not be XML. In fact, except in the most unusual of circumstances, you would not normally use unparsed-text() on XML content.

The following stylesheet will convert a simple comma delimited file (one with no quoted strings) to XML:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:xs="http://www.w3.org/2001/XMLSchema" 
xmlns:fn="http://www.w3.org/2005/02/xpath-functions" 
xmlns:xdt="http://www.w3.org/2005/02/xpath-datatypes">

 <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

 <xsl:param name="csv-file" select=" 'test.csv' "/>  
  
  <xsl:template match="/">
  
    <converted-csv filename="{$csv-file}">
      <xsl:for-each select="tokenize(unparsed-text($csv-file, 'UTF-8'),
                                    '&#xa;')">
        <xsl:if test="normalize-space(.)">
          <row>
            <xsl:analyze-string select="." regex="," flags="x">
              <xsl:non-matching-substring>
                <col><xsl:value-of select="normalize-space(.)"/></col>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </row>
        </xsl:if>
      </xsl:for-each>
    </converted-csv>
    
  </xsl:template>
  
</xsl:stylesheet>
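One limitation worth knowing: xsl:analyze-string never reports a zero-length non-matching substring, so an empty field such as the middle one in a,,c silently disappears. The following sketch is a variation of my own that splits each row with tokenize(), which returns a zero-length string for every empty field. To keep it runnable without an external file, it reads from a hypothetical csv-text parameter rather than unparsed-text(), and it uses a named initial template called main.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample data standing in for the contents of a CSV file -->
  <xsl:param name="csv-text" select="'a,,c&#xa;d,e,'"/>

  <xsl:template name="main">
    <converted-csv>
      <!-- Split into lines, tolerating both LF and CRLF line endings -->
      <xsl:for-each select="tokenize($csv-text, '\r?\n')">
        <xsl:if test="normalize-space(.)">
          <row>
            <!-- tokenize() keeps empty fields as zero-length strings -->
            <xsl:for-each select="tokenize(., ',')">
              <col><xsl:value-of select="normalize-space(.)"/></col>
            </xsl:for-each>
          </row>
        </xsl:if>
      </xsl:for-each>
    </converted-csv>
  </xsl:template>

</xsl:stylesheet>
```

With the sample data, both rows come out with three col elements, the empty ones included. Like the original, this sketch does not attempt to handle quoted fields.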

Discussion

The regex capabilities of XSLT 2.0 along with unparsed-text() open up whole new processing possibilities to XSLT that were next to impossible in XSLT 1.0. Still, XSLT would not be my first choice for non-XML processing unless I was working in a context where a multi-language solution (e.g., Java and XSLT or Perl and XSLT) was not practical. Of course, if XSLT is the only language you want to master, the new capabilities certainly open up new vistas for you to explore.

Part of my motivation for jumping the XSLT ship when entering the domain of unstructured text processing is the “missing features” of xsl:analyze-string. It would be nice if the position() and last() functions worked within xsl:matching-substring to tell you that this is match number position() of last() possible matches. I sometimes use xsl:for-each over a tokenize() instead of xsl:analyze-string, but that is also deficient because it returns only the non-matching portions. Further, you often feel compelled to use xsl:analyze-string for a complex parsing problem involving many possible matches in a regex using alternation (|). However, there is no direct way to tell which alternative matched; the obvious approach is to re-match using the matches() function, which is a tad redundant and wasteful for my taste because surely the regex engine knows what part it just matched:

<xsl:template match="text()">
  <xsl:analyze-string select="." 
                      regex='[-+]?\d\.\d+\s*[eE][-+]?\d+ |
                             [-+]?\d+\.\d+               | 
                             [-+]?\d+                    |
                             "[^"]*?"                    ' 
                      flags="x">
    <xsl:matching-substring>
      <xsl:choose>
        <!-- Re-test the matched substring to see which alternative it was.
             The tests are anchored so that, for example, a quoted "3.14"
             is not mistaken for a decimal. -->
        <xsl:when test="matches(.,'^[-+]?\d\.\d+\s*[eE][-+]?\d+$')">
          <scientific><xsl:value-of select="."/></scientific>
        </xsl:when>
        <xsl:when test="matches(.,'^[-+]?\d+\.\d+$')">
          <decimal><xsl:value-of select="."/></decimal>
        </xsl:when>
        <xsl:when test="matches(.,'^[-+]?\d+$')">
          <integer><xsl:value-of select="."/></integer>
        </xsl:when>
        <xsl:when test='matches(.," ^ "" [^""]*? "" $ ", "x")'>
          <string><xsl:value-of select="."/></string>
        </xsl:when>
      </xsl:choose>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:template>

Now, hindsight is always 20/20, and there are, of course, all sorts of implementation issues and tradeoffs that one needs to overcome when enhancing a language; so, with all due respect to the XSLT 2.0 committee, it would have been sweeter if xsl:analyze-string worked as follows:

<!-- NOT VALID XSLT 2.0 - Author's wishful thinking --> 
<xsl:template match="text()">
  <xsl:analyze-string select="." 
                      flags="x">
    <xsl:matching-substring regex="[-+]?\d\.\d+\s*[eE][-+]?\d+">
      <scientific><xsl:value-of select="."/></scientific>
    </xsl:matching-substring>          
    <xsl:matching-substring regex="[-+]?\d+\.\d+">
      <decimal><xsl:value-of select="."/></decimal>
    </xsl:matching-substring>          
    <xsl:matching-substring regex="[-+]?\d+">
      <integer><xsl:value-of select="."/></integer>
    </xsl:matching-substring>          
    <xsl:matching-substring regex=' "[^"]*?" '>
      <string><xsl:value-of select="."/></string>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <other><xsl:value-of select="."/></other>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
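Short of that wish coming true, one workaround that stays within valid XSLT 2.0 is to wrap each alternative in its own capturing group and ask regex-group() which group is non-empty. This is a sketch of my own, not something the specification recommends, and it has a maintenance cost: the group numbers are tied to the order of the alternatives and must be kept in sync by hand.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="text()">
    <xsl:analyze-string select="."
        regex='([-+]?\d\.\d+\s*[eE][-+]?\d+) |
               ([-+]?\d+\.\d+)               |
               ([-+]?\d+)                    |
               ("[^"]*?")'
        flags="x">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- regex-group(n) is a zero-length string (false) unless
               alternative n is the one that matched -->
          <xsl:when test="regex-group(1)">
            <scientific><xsl:value-of select="."/></scientific>
          </xsl:when>
          <xsl:when test="regex-group(2)">
            <decimal><xsl:value-of select="."/></decimal>
          </xsl:when>
          <xsl:when test="regex-group(3)">
            <integer><xsl:value-of select="."/></integer>
          </xsl:when>
          <xsl:when test="regex-group(4)">
            <string><xsl:value-of select="."/></string>
          </xsl:when>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <other><xsl:value-of select="."/></other>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The trick works here because none of the alternatives can match a zero-length string, so a non-empty group is a reliable signal.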

6.8. Solving Difficult Serialization Problems with Character Maps

Problem

You need precise control of the serialization of your document and XSLT 1.0’s disable-output-escaping feature is too limited for your needs.

Solution

XSLT 2.0 provides a new facility called a character map that provides precise control of serialization. A character map is designed to be used with the xsl:output instruction.

The xsl:character-map instruction takes the following attributes:

name

Defines the name of the character map.

use-character-maps

A list of other character maps to incorporate into this one.

The content of an xsl:character-map is a sequence of xsl:output-character elements. These elements define a mapping between a single Unicode character and a string to be output in place of the character when that character is serialized. The following map can be used to output various special space characters as entities:

<xsl:character-map name="spaces">
    <xsl:output-character character="&#xA0;" string="&amp;nbsp;"/>
    <xsl:output-character character="&#x2003;" string="&amp;emsp;"/>
    <xsl:output-character character="&#x2007;" string="&amp;numsp;"/>
    <xsl:output-character character="&#x2008;" string="&amp;puncsp;"/>
    <xsl:output-character character="&#x2009;" string="&amp;thinsp;"/>
    <xsl:output-character character="&#x200A;" string="&amp;hairsp;"/>
</xsl:character-map>
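A character map does nothing until an xsl:output declaration (or an xsl:result-document instruction) refers to it by name. The following minimal sketch shows the wiring for a one-entry version of the map; the p element, its content, and the initial template name main are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Activate the map for the default output format -->
  <xsl:output method="xml" use-character-maps="spaces"/>

  <xsl:character-map name="spaces">
    <xsl:output-character character="&#xA0;" string="&amp;nbsp;"/>
  </xsl:character-map>

  <xsl:template name="main">
    <!-- The no-break space is replaced at serialization time -->
    <p>10&#xA0;km</p>
  </xsl:template>

</xsl:stylesheet>
```

The serialized p element reads 10&nbsp;km. Keep in mind that the serializer emits the replacement string verbatim, so the result is well-formed XML only if the consumer has a DTD that declares the nbsp entity. The map also has no effect when the result tree is handed to the application without being serialized by the processor.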

Another more subtle application of character maps is to output non-standard documents that would be difficult to create because they violate the rules of XML or XSLT. Michael Kay gives an example of outputting elements that are commented out in his XSLT Programmer’s Reference, 3rd Edition. Here is a variation on his example. The idea is to generate a copy of the input document but with the content of certain elements commented out with XML comments:

<?xml version="1.0"?>
<!-- Define custom entities using the Unicode private use characters -->
<!DOCTYPE xsl:stylesheet [
  <!ENTITY start-comment "&#xE501;">
  <!ENTITY end-comment "&#xE502;">
]>

<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Import the identity transform -->
<xsl:import href="copy.xslt"/>

<!-- Tell the serializer to use our character map, defined below -->
<xsl:output use-character-maps="comment-delimiters"/>

<!-- Define a key that will be used to identify elements that should be 
commented out. -->
<xsl:key name="omit-key" match="omit" use="@id"/>

<!-- Map our custom entities to strings that form the syntax of XML 
start and end comments -->
<xsl:character-map name="comment-delimiters">
  <xsl:output-character character="&start-comment;" string="&lt;!--"/>
  <xsl:output-character character="&end-comment;" string="--&gt;"/>
</xsl:character-map>

<!-- Comment out those elements that have an id attribute that matches the 
id of an omit element from an external document, omit.xml. -->
<xsl:template match="*[key('omit-key',@id,doc('omit.xml'))]">
  <xsl:text>&start-comment;</xsl:text>
  <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
  <xsl:text>&end-comment;</xsl:text>
</xsl:template>

</xsl:stylesheet>

See Also

Evan Lenz developed an XML-to-string converter that provides an alternative means for dealing with tough serialization problems. See http://xmlportfolio.com/xml-to-string/.

6.9. Outputting Multiple Documents

Problem

You need a portable stylesheet that can output more than one document.

Solution

Although most XSLT 1.0 implementations had extensions for producing multiple output documents, they differed quite a bit. XSLT 2.0 provides a standard instruction, xsl:result-document.

The xsl:result-document instruction takes the following attributes:

format

Defines the name of the output format as declared by a named xsl:output instruction.

href

Determines the destination where the output document will be serialized.

validation

Determines the validation to be applied to the result tree.

type

Determines the type that should be used to validate the result tree.

Here is an example that splits an XML document into several documents based on the groups extracted by an xsl:for-each-group. Each output document is named using the grouping key as a suffix:

<xsl:template match="products">
    <xsl:for-each-group select="product" group-by="@type">
       <xsl:result-document href="prod-{current-grouping-key()}.xml">
        <xsl:copy-of select="current-group()"/>
       </xsl:result-document>
    </xsl:for-each-group>
</xsl:template>
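One caveat with the example above: each result document receives a bare sequence of product elements, so it has no single root element. If the consumers of those files expect well-formed XML documents, a variation like the following sketch wraps each group in a root element. The products wrapper and its type attribute are my own choice for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="products">
    <xsl:for-each-group select="product" group-by="@type">
      <xsl:result-document href="prod-{current-grouping-key()}.xml">
        <!-- A single root element makes each output a well-formed document -->
        <products type="{current-grouping-key()}">
          <xsl:copy-of select="current-group()"/>
        </products>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Because the grouping key ends up in a file name, this assumes the type values contain only characters that are safe in a URI.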

Sometimes you want a stylesheet that outputs more than one format. For example, the default output format may be XML, but the output you send to an alternate destination may be HTML. For this, you need to take advantage of XSLT 2.0’s ability to specify multiple output formats:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
 <!-- Default output format is XML -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  
 <!-- Another named output format for HTML -->
  <xsl:output method="html" encoding="UTF-8" indent="yes" name="html-out"/>

<xsl:template match="/">
  <xsl:apply-templates/>
  <xsl:result-document href="result.html" format="html-out"> 
    <xsl:apply-templates mode="html"/>
  </xsl:result-document>
</xsl:template>

<!-- Other templates here -->
  
</xsl:stylesheet>

Discussion

Although the use of xsl:result-document is straightforward, there is a potential for confusion with another new XSLT 2.0 instruction, xsl:document. This is because the XSLT 1.1 specification (now defunct) also had an instruction called xsl:document that had similar behavior to 2.0’s xsl:result-document.

In 2.0, xsl:document plays a more limited role. Its purpose is to construct a document node, presumably because you want to perform document-level validation without actually serializing the result. Typically, you will capture the result of xsl:document in a variable:

<xsl:variable name="tempDoc" as="document-node(element(*, my:document))">
    <xsl:document type="my:document" validation="strict">
         <xsl:apply-templates select="/*"/>
    </xsl:document>
</xsl:variable>

If in later processing you want to output the document, you can use xsl:result-document:

<xsl:result-document href="doc.xml">
  <xsl:copy-of select="$tempDoc"/>
</xsl:result-document>

See Also

See Recipe 8.6 for information on XSLT 1.0 extensions to support multiple output documents.

6.10. Handling String Literals Containing Quote Characters

Problem

String literals containing quote characters are difficult to deal with in XSLT 1.0 because there is no escape character.

Solution

This problem is alleviated by an enhancement that allows a quote character to be escaped by repeating the character. Here we are trying to match either double-quote-delimited strings or single-quote-delimited strings. We use single quotes for the test attribute, so we must use double quotes for the string literal that holds the regex. This forces us to escape all literal double quotes by repeating them in the regex. The rules of XML force us to use the entity &apos; instead of ', but that simply emphasizes that XML escaping is a separate issue that by itself does not provide a solution. In other words, if you replaced "" by &quot;, you would make the XML parser happy, but the XSLT processor would still choke:

<xsl:if test=' matches(., " "" [^""]* "" | &apos; [^&apos;]* &apos; ", "x") '>
</xsl:if>

An equivalent solution is as follows:

<xsl:if test=" matches(., ' &quot; [^&quot;]* &quot; | '' [^'']* '' ', 'x') ">
</xsl:if>

Discussion

The lack of an escape character in XSLT 1.0 was frustrating but one could always work around it by using variables and concatenation:

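 <!-- A lone quote is not a valid XPath expression, so the quote
      characters are supplied as variable content rather than via select -->
 <xsl:variable name="d-quote">"</xsl:variable>
 <xsl:variable name="s-quote">'</xsl:variable>
 <xsl:value-of select="concat('He said, ', $d-quote, 'John', $s-quote, 
 's dog turned green.', $d-quote)"/>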
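For comparison, here is a sketch of how the same sentence can be produced in XSLT 2.0 with the doubled-quote escape and no helper variables. The surrounding stylesheet and the initial template name main are scaffolding I added so that the fragment can be run on its own.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- The XPath literal is delimited by apostrophes, so the apostrophe in
         John's is doubled. The double quotes are written as &quot; only
         because the XML attribute itself is delimited by double quotes. -->
    <xsl:value-of
        select="'He said, &quot;John''s dog turned green.&quot;'"/>
  </xsl:template>

</xsl:stylesheet>
```

The output is the sentence He said, "John's dog turned green." with both kinds of quote intact.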

6.11. Understanding the New Capabilities of Old XSLT 1.0 Features

Problem

There are numerous little enhancements in XSLT 2.0, and it is difficult to get a quick handle on all of them.

Solution

Many of the capabilities in 2.0 are delivered as enhancements to existing 1.0 instructions and functions. These are not as obvious as those that are packaged as completely new instructions or functions. This section provides a one-stop overview of these enhancements.

xsl:apply-templates

  • The mode attribute can take the value #current to signify that the processing should continue with the current mode.

  • You can take advantage of support for sequences by writing a comma-separated list (e.g., <xsl:apply-templates select="title, heading, para"/> ) when you want to process title elements first, then heading elements, and then para elements. In 1.0, you had to write three separate xsl:apply-templates instructions.

    Tip

    In both 1.0 and 2.0, you can write:

    <xsl:apply-templates select="title | heading | para"/>

    but the union operator always returns nodes in document order. The child nodes will be processed as they appear in the document and not as you order them in the select.
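The difference is easy to demonstrate. The following sketch assumes a hypothetical input in which a section element contains a para followed by a title, that is, the reverse of the order named in the select. The mode name and element names are mine.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="section">
    <!-- The comma operator preserves the order you wrote -->
    <xsl:text>sequence: </xsl:text>
    <xsl:apply-templates select="title, para" mode="name"/>
    <!-- The union operator sorts into document order -->
    <xsl:text>&#xa;union: </xsl:text>
    <xsl:apply-templates select="title | para" mode="name"/>
  </xsl:template>

  <!-- Print just the element name so the processing order is visible -->
  <xsl:template match="*" mode="name">
    <xsl:value-of select="name()"/>
    <xsl:text> </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For that input, the first line lists title before para, while the second lists para before title.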

xsl:attribute

  • A big pet peeve of many XSLT 1.0 developers was the inability to write <xsl:attribute name="foo" select="10"/> since the select attribute was not supported. Now it is supported, and you should prefer it when defining simple attributes, although the sequence constructor syntax is still available.

  • The attributes type and validation were added in support of schema-aware processors. Type is used to specify native or user-defined types defined by W3C schema. Validation is used to specify how the attribute should be validated.

xsl:call-template

  • The result of the call can be an arbitrary sequence (such as a sequence of integers) rather than just a result tree fragment.

  • You will get an error if you supply a parameter (via xsl:with-param) that the called template does not define.

xsl:comment

  • As with xsl:attribute , a select attribute is now supported in addition to the sequence constructor form.

xsl:copy and xsl:copy-of

  • A new attribute, copy-namespaces="yes | no", is available to specify whether the namespace nodes of an element should be copied. The default is yes, which is consistent with 1.0 behavior.

  • The attribute type was added to specify native or user-defined types defined by W3C Schema.

  • The attribute validation was added to specify how the result should be validated or whether existing type annotations should be preserved.

xsl:element

  • The attributes type and validation were added in support of schema-aware processors. Type is used to specify native or user-defined types defined by W3C Schema. Validation is used to specify how the element should be validated.

xsl:for-each

  • Can now process arbitrary sequences in addition to sequences of nodes:

    <xsl:for-each select="(1, 2, 3, 4, 5)">
       <xsl:value-of select="."/><xsl:text>)&#xa;</xsl:text>
    </xsl:for-each>

xsl:key

  • The match and use attributes can now refer to global variables, provided there is no circularity between the value of the variables and the key:

    <xsl:variable name="state" select=" 'active' "/>
    <xsl:key name="state-key" match="employee[@state=$state]" use="@type"/>

  • The use attribute can be replaced by the value of a sequence constructor:

    <!--defining the value of a key using some sophisticated processing -->
    <xsl:key name="sick-key" match="employee">
      <xsl:apply-templates select="record[@type='sick-day']" mode="sick-key"/>
    </xsl:key>

  • A collation can be provided to specify when two key values match. The available collations are implementation defined.

xsl:message

  • The terminate attribute can now be an attribute value template. This greatly simplifies global changes to the termination behavior.

  • A select attribute is now supported in addition to the sequence constructor form:

    <xsl:param name="terminate" select=" 'no' "/>
    
    <xsl:template match="employee">
      <xsl:if test="not(@type)">
        <xsl:message terminate="{$terminate}"
                     select=" 'Missing type attribute for employee' "/>
      </xsl:if>
    </xsl:template>

xsl:number

  • A select attribute has been added to allow nodes other than the context node to be numbered.

  • Formatting options have been enhanced to allow output as a word such as “one”, “two”, “three” according to the chosen language:

    <!-- This outputs 'Ten' -->
    <xsl:number value="10" format="Ww"/>
    
    <!-- This outputs 'ten' -->
    <xsl:number value="10" format="w"/>
    
    <!-- This outputs 'TEN' -->
    <xsl:number value="10" format="W"/>

xsl:output

  • Can be given a name so that it can be used with the new xsl:result-document instruction. See Recipe 6.9 for details.

  • A new method, xhtml, is supported.

  • A new attribute, escape-uri-attributes, determines whether URI attributes in HTML and XHTML output should be escaped.

  • A new attribute, include-content-type, determines if a <meta> element should be added to the output to indicate the content type and encoding.

  • A new attribute, normalization-form (called normalize-unicode in earlier drafts of the specification), determines whether and how Unicode characters should be normalized. See http://www.w3.org/TR/2002/WD-charmod-20020430/ for further details on normalization.

  • A new attribute, undeclare-prefixes (called undeclare-namespaces in earlier drafts of the specification), determines if namespaces in XML 1.1 should be undeclared when they go out of scope. A namespace is undeclared by xmlns:pre="", where pre is some prefix.

  • A new attribute, use-character-maps, allows you to provide a list of names of character maps. See Recipe 6.8.

xsl:param

  • An as attribute can be used to specify the type of the parameter.

  • A required attribute can be used to specify whether the parameter is mandatory or optional.

  • A tunnel attribute is used to indicate whether tunneling is supported for this parameter. See Recipe 6.6.
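The three attributes combine naturally. In the following sketch, a typed, mandatory tunnel parameter travels through an intermediate template that knows nothing about it. The report, section, and item element names and the prefix parameter are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/report">
    <!-- Send $prefix on its way as a tunnel parameter -->
    <xsl:apply-templates select="section">
      <xsl:with-param name="prefix" select="'* '" tunnel="yes"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- This template neither declares nor forwards $prefix -->
  <xsl:template match="section">
    <xsl:apply-templates select="item"/>
  </xsl:template>

  <!-- A required parameter has no default; failing to supply it is an error -->
  <xsl:template match="item">
    <xsl:param name="prefix" as="xs:string" required="yes" tunnel="yes"/>
    <xsl:value-of select="concat($prefix, ., '&#xa;')"/>
  </xsl:template>

</xsl:stylesheet>
```

Each item is written on its own line with the prefix in front, even though the section template never mentions the parameter.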

xsl:processing-instruction

  • A select attribute is now supported.

xsl:strip-space

  • The elements attribute can now handle name tests of the form *:Name indicating that all Name elements, regardless of namespace, should be stripped of whitespace. See Recipe 7.1.

xsl:stylesheet

  • A new default-validation attribute determines the default validation to use when new element and attribute nodes are created and the instruction that creates them lacks a validation attribute.

  • A new xpath-default-namespace attribute determines the namespace used for unprefixed element names in XPath expressions.
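The xpath-default-namespace attribute is especially welcome when the input uses a default namespace, because in 1.0 every name in every pattern needed a prefix. Here is a minimal sketch, assuming the input is an XHTML document; it prints the document title.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:output method="text"/>

  <!-- The unprefixed name matches title in the XHTML namespace -->
  <xsl:template match="title">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Suppress all other text -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Note that the attribute affects only element names in XPath expressions and patterns. It does not place literal result elements in that namespace; for that you still need an ordinary xmlns declaration.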

xsl:template

  • Multiple modes are supported via #all or a list of modes in the mode attribute.

  • An as attribute allows the result type to be specified.

  • The match attribute supports matching on types.

  • The match attribute can reference global variables or parameters.

xsl:value-of

  • A new separator attribute allows sequences to be delimited. See Recipe 7.2.

  • The sequence constructor syntax is now supported in addition to the select attribute.
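A short sketch shows both enhancements at work. The values and the initial template name main are mine, chosen only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- select form: the items are joined with the separator -->
    <xsl:value-of select="1 to 5" separator=", "/>
    <xsl:text>&#xa;</xsl:text>
    <!-- sequence constructor form, also with a separator -->
    <xsl:value-of separator="-">
      <xsl:sequence select="'2005', '02', '14'"/>
    </xsl:value-of>
  </xsl:template>

</xsl:stylesheet>
```

The first instruction writes 1, 2, 3, 4, 5 and the second writes 2005-02-14. Without a separator attribute, the select form in 2.0 joins the items with a single space, whereas a 1.0 processor would have output only the first item.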

xsl:variable

  • An as attribute can be used to specify the type of the variable.

xsl:with-param

  • An as attribute can be used to specify the type of the parameter.

  • A tunnel attribute is used to indicate whether tunneling is supported for this parameter. See Recipe 6.6.

  • Can now be used with xsl:apply-imports and the new xsl:next-match instruction. See Recipe 6.6.

current()

  • The function can return generalized items (such as a string) in addition to nodes.

  • Can now be used within a match pattern to refer to the element that matched.

    <!-- Match elements that have a descendant element with an attribute 
    whose value equals the local name of the matched element -->
    <xsl:template match="*[descendant::*/@* = local-name(current())]">

document()

  • The first argument can now be an arbitrary sequence of URIs.

    Tip

    The new XPath doc() function is a simpler alternative to XSLT’s document().

function-available()

  • Now takes a second argument that specifies the arity (number of arguments) of the function being tested.

key()

  • An optional third argument is used to specify the document that should be searched.

    Tip

    This eliminates the need to introduce xsl:for-each instructions whose only purpose is to switch context to a new document so that key() can be used relative to that document:

    <!-- Code like this is no longer necessary -->
    <xsl:for-each select="doc('other.xml')">
      <xsl:if test="key('some-key', $val)">
        <!-- ...-->
      </xsl:if>
    </xsl:for-each>
    
    <!-- Write this instead -->
    <xsl:if test="key('some-key', $val,
                      doc('other.xml'))">
      <!-- ...-->
    </xsl:if>

system-property()

  • An xsl:product-name property is defined to return the name of the XSLT processor (e.g., Saxon).

  • An xsl:product-version property is defined to return the version of the XSLT processor (e.g., 8.1).

  • An xsl:is-schema-aware property is defined to return yes or no to indicate if the processor is schema aware.

  • An xsl:supports-serialization property is defined to return yes or no to indicate if the processor supports serialization.

  • An xsl:supports-backwards-compatibility property is defined to return yes or no to indicate if the processor supports backward-compatibility mode (e.g., version="1.0").
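A quick way to find out what your processor reports is to print every property. This sketch also includes xsl:version, xsl:vendor, and xsl:vendor-url, which were already available in 1.0. The initial template name main is my own choice.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Iterate over the property names as a sequence of strings -->
    <xsl:for-each select="('xsl:version',
                           'xsl:vendor',
                           'xsl:vendor-url',
                           'xsl:product-name',
                           'xsl:product-version',
                           'xsl:is-schema-aware',
                           'xsl:supports-serialization',
                           'xsl:supports-backwards-compatibility')">
      <!-- Write name=value, one property per line -->
      <xsl:value-of select="concat(., '=', system-property(.), '&#xa;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The values naturally differ from one processor and release to the next, so I make no claim about what you will see.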

Discussion

The main themes that govern the new capabilities of old XSLT instructions are type support and consistency. Here consistency largely means support for a select attribute when only a sequence constructor was supported in the past or vice versa.

Tip

One quirk of the XSLT creators is that they continue to introduce two distinct names for constructs where, to my way of thinking, one would have sufficed. In particular, consider the attributes as and type. They are never used together, so why not just type? A similar argument could have been made for eliminating xsl:with-param in favor of just xsl:param when 1.0 was specified.
