Chapter 14. Extending and Embedding XSLT

I think everybody should have a great Wonderbra. There’s so many ways to enhance them, everybody does it.

Christina Aguilera

Introduction

Truly ambitious programmers are never satisfied with what they are given and are obsessively driven to enhance what they have. I say “ambitious” rather than “great” because greatness, in my opinion, comes from knowing when it is wiser to work within the system versus when it is best to extend the system. Nevertheless, this chapter is dedicated to extending the system both from the perspective of XSLT needing functionality best implemented in another language and from the perspective of other languages needing XSLT.

Extending XSLT is, by definition, a facility on the fringes of the specification. Extensions decrease the portability of an XSLT stylesheet. This is definitely a hazard when you use extensions provided natively by your XSLT processor or when implementing your own extension. It is true even if you implement your extensions in a highly portable language like Java. The most obvious reason is that some XSLT processors are not written in Java and are thus unlikely ever to support Java-based extensions. However, even if you only want your extensions to work in Java-based XSLT processors, you might still run into trouble because the extension mechanism of XSLT was not fully standardized in Version 1.0. This state of affairs improved in Version 1.1, but 1.1 is no longer an official XSLT release and many processors do not support it. Surprisingly, XSLT 2.0 took a step back from 1.1 by leaving the method of binding extension functions undefined.

EXSLT.org is a portal whose supporters are dedicated to establishing standards XSLT implementers can follow when implementing common extensions. Chapter 2 and Chapter 3 mentioned EXSLT with respect to math extensions and date extensions. EXSLT.org also organized other extension categories, some of which this chapter touches upon. It is certainly a site worth visiting before going off and implementing your own extension. There is a good chance that someone either developed such an extension or put some thought into how the extension should work.

In contrast to extending XSLT, embedding XSLT involves invoking XSLT transformations from another language without forking your XSLT processor in a separate process. You will see how XSLT can be accessed from within Java- and Perl-based programs.
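As a rough illustration of what embedding looks like from Java, the following sketch runs a stylesheet in-process through the standard JAXP API (javax.xml.transform). The class name EmbedSketch, the transform method, and the inline stylesheet are hypothetical names chosen for this example; which XSLT processor actually does the work depends on what TransformerFactory.newInstance() finds at runtime.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: runs a transformation inside the current JVM,
// so no separate XSLT processor process is started.
class EmbedSketch {

    // Applies the stylesheet text to the XML text and returns the serialized result.
    static String transform(String stylesheet, String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers of this sketch need no checked exceptions.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String stylesheet =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:text>Hello, </xsl:text><xsl:value-of select='/greeting/@to'/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        // Prints: Hello, world
        System.out.println(transform(stylesheet, "<greeting to='world'/>"));
    }
}
```

In a real application you would normally compile the stylesheet once and reuse it rather than rebuilding the Transformer on every call.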

When writing this chapter, it quickly became apparent that you could easily dedicate a whole book to extension and embedding—especially when you consider the cross between implementations, extension languages, and interesting examples. To keep this chapter manageable, I compromised by alternating between Xalan-Java 2 and Saxon and sticking mostly to Java and JavaScript. This chapter also discusses MSXML.

To prevent repetition, this section explains how to use extensions in Saxon, Xalan-Java 2, and MSXML.

14.1. Saxon Extension Functions

XSLT 1.0 (Saxon Version 6.5.4)

Saxon lets you access extension functions written in Java by using the interface defined in the XSLT 1.1 draft standard.

Java is the only extension language currently supported by Saxon, so extension function bindings are defined along the lines of the following example:

<xsl:stylesheet 
 version="1.1" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:Math="java:java.lang.Math" 
 exclude-result-prefixes="Math">
  
 <xsl:script implements-prefix="Math"
             xmlns:Math="java:java.lang.Math"
             language="java"
             src="java:java.lang.Math"/>

 <!-- ... -->

</xsl:stylesheet>

Here the naming convention used for the namespace is not strictly required. However, if followed, it makes the xsl:script element optional. Hence, if you need to access an extension only once, you can write something like:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
 <xsl:variable name="PI" select="4 * Math:atan(1.0)" 
               xmlns:Math="java:java.lang.Math"/>
<!-- ... -->
</xsl:stylesheet>

Here the namespace encodes the binding to the Java implementation rather than the xsl:script. Note that these binding techniques are independent; if you use the xsl:script element, then the namespace’s content does not matter. On the other hand, if you omit the xsl:script, the namespace has the sole responsibility of binding the Java implementation.
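To make the namespace-based binding more concrete, here is a minimal Java sketch of how a processor could turn a namespace such as java:java.lang.Math plus a local name such as atan into a method call by reflection. This is a hypothetical model, not Saxon's actual code: the class JavaBindingSketch and its call method are invented for illustration, and the sketch handles only public static methods that take a single double.

```java
import java.lang.reflect.Method;

// Hypothetical model of namespace-based binding; not Saxon's implementation.
class JavaBindingSketch {

    // Resolves a call such as Math:atan(1.0), where the prefix is bound to
    // "java:java.lang.Math", to a public static method taking one double.
    static double call(String namespaceUri, String localName, double arg) {
        if (!namespaceUri.startsWith("java:")) {
            throw new IllegalArgumentException("Not a java: namespace: " + namespaceUri);
        }
        // Everything after "java:" is treated as the fully qualified class name.
        String className = namespaceUri.substring("java:".length());
        try {
            Class<?> cls = Class.forName(className);
            Method method = cls.getMethod(localName, double.class);
            Object result = method.invoke(null, arg); // null target: static method
            return ((Number) result).doubleValue();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot bind " + localName + " in " + className, e);
        }
    }

    public static void main(String[] args) {
        // Mirrors the stylesheet expression 4 * Math:atan(1.0).
        System.out.println(4 * call("java:java.lang.Math", "atan", 1.0));
    }
}
```

A real processor must also convert between XPath and Java types and choose among overloaded methods, which this sketch leaves out.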

XSLT 2.0 (Saxon Version 8.x)

The mechanism for invoking Java-based extension functions is largely the same as in Saxon 6.5.4. However, the saxon:script element is deprecated and, according to the product’s web site, may be withdrawn in a future release. The preferred method is to declare a top-level namespace of the form java: followed by the fully qualified class name:

<xsl:stylesheet version="2.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns:math="java:java.lang.Math" 
 exclude-result-prefixes="math">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  
  <xsl:template match="/">
    <pi><xsl:value-of select="4 * math:atan(1.0)"/></pi>
  </xsl:template>
  
</xsl:stylesheet>

14.2. Saxon Extension Elements

XSLT 1.0 (Saxon Version 6.5.4)

Extension elements in Saxon can be implemented only in Java. You must define a namespace that binds the extension to its implementation. However, the rules are more explicit here than with extension functions. The namespace URI must end with a / followed by the fully qualified class name of a Java class that implements the interface com.icl.saxon.style.ExtensionElementFactory:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:acmeX="http://www.acmeX.com/com.acmeX.SuperExtensionFactory"
extension-element-prefixes="acmeX">
   
<!-- ... -->

</xsl:stylesheet>

The prefix must also be listed in the stylesheet’s extension-element-prefixes attribute.

Details of the ExtensionElementFactory are covered in Recipe 14.15.

XSLT 2.0 (Saxon Version 8.x)

The mechanism is basically the same but the fully qualified class names of the Saxon library have changed (for example, net.sf.saxon.style.ExtensionElementFactory).

14.3. Xalan-Java 2 Extension Functions

XSLT 1.0 (Xalan-Java 2.6.2)

Extension functions in Xalan-Java 2 are bound using two Xalan extensions, xalan:component and xalan:script, where the relevant Xalan namespace URI is http://xml.apache.org/xslt.

The xalan:component element associates the extension namespace prefix with the names of extension functions or elements that will be defined by the enclosed xalan:script element. The xalan:script element defines the language used to implement the extension and its associated implementation. The choices here vary. Casual users of Java-based extensions should note that Xalan supports an abbreviated syntax that does not require the use of the xalan:component or xalan:script elements. Simply declare the namespace in one of the forms shown here, and invoke the Java function using the appropriate syntax. For scripting languages, this shortcut does not apply.

14.4. Java Extension Function Using the Class Format Namespace

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:Math="xalan://java.lang.Math">
   
<xalan:component prefix="Math" functions="sin cos tan atan">
 <xalan:script lang="javaclass" src="xalan://java.lang.Math"/>
</xalan:component>
   
<xsl:variable name="pi" select="4.0 * Math:atan(1.0)"/>

<!-- ... -->

</xsl:stylesheet>

If you use this form and omit the xalan:component element, then your stylesheet can work with both Saxon and Xalan.

14.5. Java Extension Function Using the Package Format Namespace

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:myJava="xalan://java.lang">
   
<!-- The component prefix must match the declared namespace prefix. -->
<xalan:component prefix="myJava" functions="sin cos tan atan">
 <xalan:script lang="javaclass" src="xalan://java.lang"/>
</xalan:component>
   
<xsl:variable name="pi" select="4.0 * myJava:Math.atan(1.0)"/>
   
<!-- ... -->
   
</xsl:stylesheet>

This form is useful if you want to reference many classes within the same package.

14.6. Java Extension Function Using the Java Format Namespace

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:java="http://xml.apache.org/xslt/java">
   
<!-- The component prefix must match the declared namespace prefix. -->
<xalan:component prefix="java" functions="sin cos tan atan">
 <xalan:script lang="javaclass" src="http://xml.apache.org/xslt/java"/>
</xalan:component>
   
<!-- The class name and method name are joined with a period, not a colon. -->
<xsl:variable name="pi" select="4.0 * java:java.lang.Math.atan(1.0)"/>
   
<!-- ... -->
   
</xsl:stylesheet>

Use this form if you want to access a wide variety of Java-based extensions with a single namespace declaration. The disadvantage is that each invocation becomes more verbose.

14.7. Scripting Extension Function Using Inline Script Code

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:trig="http://www.acmeX.com/extend/trig">
  
<xalan:component prefix="trig" functions="sin cos tan atan">
  <xalan:script lang="javascript">
    function sin (arg){ return Math.sin(arg);} 
    function cos (arg){ return Math.cos(arg);} 
    function tan (arg){ return Math.tan(arg);} 
    function atan (arg){ return Math.atan(arg);} 
  </xalan:script>
</xalan:component>
   
<xsl:variable name="pi" select="4.0 * trig:atan(1.0)"/>
   
<!-- ... -->
   
</xsl:stylesheet>

Xalan currently supports JavaScript, NetRexx, BML, JPython, Jacl, JScript, VBScript, and PerlScript, but appropriate extensions need to be obtained from third parties supporting the respective languages. See http://xml.apache.org/xalan-j/extensions.html#supported-lang for details.

14.8. Xalan-Java 2 Extension Elements

Extension elements in Xalan can be written in Java or in a supported scripting language. Java-based extension elements also allow the shortcut syntax that dispenses with the xalan:component and xalan:script elements.

14.9. Java Extension Element

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:MyExt="xalan://com.AcmeX.MyExtensionElement"
extension-element-prefixes="MyExt">
  
<xalan:component prefix="MyExt" elements="superExtension">
  <xalan:script lang="javaclass" 
                src="xalan://com.AcmeX.MyExtensionElement"/>
</xalan:component>
   
<xsl:template match="*">
  <MyExt:superExtension attr1="val1" attr2="val2">
    <!-- ... -->
  </MyExt:superExtension>
</xsl:template>
   
</xsl:stylesheet>

The implementation must be via a Java class and method with the following signature:

package com.AcmeX;

public class MyExtensionElement
{
    // SomeType is a placeholder for the return type of your choice.
    public SomeType superExtension(
          org.apache.xalan.extensions.XSLProcessorContext ctx,
          org.apache.xalan.templates.ElemExtensionCall extensionElement)
    {
        //...
    }
}

where SomeType designates the return type, ctx is an instance of the processing context, and extensionElement is the node corresponding to the stylesheet’s extension element. In the method signature, you may also use the indicated types’ superclasses. The com.AcmeX.MyExtensionElement base class can be anything you like, including none, as shown here.

Whatever the function returns is put into the result tree, so use void if you do not want this effect. See Recipe 14.15 for further details on the XSLProcessorContext and ElemExtensionCall classes.

14.10. Scripting Extension Elements

Scripted extensions are very similar to Java extensions, except the extension is implemented inside of the xalan:script element:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:MyExt="xalan://com.AcmeX.MyExtensionElement"
extension-element-prefixes="MyExt">
  
<xalan:component prefix="MyExt" elements="superExtension">
  <xalan:script lang="javascript">
    function superExtension(ctx, elem)
    {
      /* ... */
      return null;
    }
  </xalan:script>
</xalan:component>
   
<xsl:template match="*">
  <MyExt:superExtension attr1="val1" attr2="val2">
    <!-- ... -->
  </MyExt:superExtension>
</xsl:template>
</xsl:stylesheet>

As with Java, the return value is placed into the result tree, but you return null to disable this effect with scripting languages. See Recipe 14.13 for an example.

XSLT 2.0

At this time, I am unaware of any effort to upgrade Xalan to XSLT 2.0.

14.11. MSXML Extension Functions

XSLT 1.0

Microsoft’s MSXML 3.0, 4.0, and .NET XSLT processors are extensible via JScript and VBScript. MSXML .NET adds C# extensibility. Extensions in MSXML are specified using the ms:script element:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:ms="urn:schemas-microsoft-com:xslt" 
  xmlns:myExt="urn:AcmeX.com:xslt">
   
  <ms:script language="JScript" implements-prefix="myExt">
    <![CDATA[
    function superExtension(ops) {
      /* ... */
      return result;
    }
    ]]>
  </ms:script>
   
</xsl:stylesheet>

XSLT 2.0

At the present time, Microsoft has no plans to implement XSLT 2.0 on the .NET platform, choosing instead to support only XQuery. However, an effort is well under way to port Saxon to C# .NET (http://sourceforge.net/projects/saxondotnet/ and http://weblog.saxondotnet.org/).

See Also

The XSLT C library for Gnome (libxslt) also supports extensibility. See http://xmlsoft.org/XSLT/extensions.html for details.

14.12. Using Saxon’s and Xalan’s Native Extensions

Problem

You want to know how to exploit some of the useful extensions available in these popular XSLT implementations.

Solution

XSLT 1.0

This recipe is broken into a bunch of mini-recipes showcasing the most important Saxon and Xalan extensions. For all examples, the saxon namespace prefix is associated with http://icl.com/saxon, and the xalan namespace prefix is associated with http://xml.apache.org/xslt.

You want to output to more than one destination

This book has used Saxon’s facility several times to output results to more than one file. Saxon uses the saxon:output element. It also provides the xsl:document element, but it will only work if the stylesheet version attribute is 1.1 and is therefore not preferred. The href attribute specifies the output destination. This attribute can be an attribute value template:

<saxon:output href="toc.html">
  <html>
    <head><title>Table of Contents</title></head>
    <body>
      <xsl:apply-templates mode="toc" select="*"/>
    </body>
  </html>
</saxon:output>

Xalan takes a significantly different approach to multidestination output. Rather than one instruction, Xalan gives you three: redirect:open, redirect:close, and redirect:write. The extension namespace associated with these elements is xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect". For the most common cases, you can get away with using redirect:write by itself because, if used alone, it will open, write, and close the file.

Each element includes a file attribute and/or a select attribute to designate the output file. The file attribute takes a string, so you can use it to specify the output filename directly. The select attribute takes an XPath expression, so you can use it to generate the output file name dynamically. If you include both attributes, the redirect extension first evaluates the select attribute and falls back to the file attribute if the select attribute expression does not return a valid filename:

<redirect:write file="toc.html">
  <html>
    <head><title>Table of Contents</title></head>
    <body>
      <xsl:apply-templates mode="toc" select="*"/>
    </body>
  </html>
</redirect:write>

By using Xalan’s extended capabilities, you can switch from writing a primary output file to other secondary files while the primary remains open. This step undermines the no-side-effects nature of XSLT, but presumably, Xalan will ensure a predictable operation:

<xsl:template match="doc">
  <redirect:open file="regular.xml"/>
  <xsl:apply-templates select="*"/>
  <redirect:close file="regular.xml"/>
</xsl:template>
   
<xsl:template match="regular">
  <redirect:write file="regular.xml">
     <xsl:copy-of select="."/>
  </redirect:write>
</xsl:template>
   
<xsl:template match="*">
  <xsl:variable name="file" select="concat(local-name(),'.xml')"/>
  <redirect:write select="$file">
     <xsl:copy-of select="."/>
  </redirect:write>
</xsl:template>

XSLT 2.0 provides native support for multiple result destinations via a new element called xsl:result-document:

<xsl:result-document format="html" href="toc.html">
  <html>
    <head><title>Table of Contents</title></head>
    <body>
      <xsl:apply-templates mode="toc" select="*"/>
    </body>
  </html>
</xsl:result-document>

You want to split a complex transformation into a series of transformations in a pipeline

Developers who have worked a lot with Unix are intimately familiar with the notion of a processing pipeline in which the output of a command is fed into the input of another. This facility is also available in other operating systems, such as Windows. The genius of the pipelining approach to software development is that it enables the assembly of complex tasks from more basic commands.

Since an XSLT transformation is ultimately a tree-to-tree transformation, applying the pipelining approach is natural. Here the result tree of one transform becomes the input tree of the next. You have seen numerous examples in which the node-set extension function can create intermediate results that can be processed by subsequent stages. Alternatively, Saxon provides this functionality via the saxon:next-in-chain extension attribute of xsl:output. The saxon:next-in-chain attribute directs the output to another stylesheet. The value is the URL of a stylesheet that should be used to process the output stream. The output stream must always be pure XML, and attributes that control the output’s format (e.g., method, cdata-section-elements, etc.) have no effect. The second stylesheet’s output is directed to the destination that would have been used for the first stylesheet if no saxon:next-in-chain attribute were present.

Xalan has a different approach to this functionality; it uses a pipeDocument extension element. The nice thing about pipeDocument is that you can use it in an otherwise empty stylesheet to create a pipeline between independent stylesheets that do not know they are used in this way. The Xalan implementation is therefore much more like the Unix pipe because the pipeline is not hardcoded into the participating stylesheets. Imagine that a stylesheet called strip.xslt stripped out specific elements from an XML document representing a book, and a stylesheet called contents.xslt created a table of contents based on the hierarchical structure of the document’s markup. You could create a pipeline between the stylesheets as follows:

<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:pipe="http://xml.apache.org/xalan/PipeDocument"
        extension-element-prefixes="pipe">
 
 <xsl:param name="source"/>
 <xsl:param name="target"/>
 <!-- A list of elements to preserve. All others are stripped. -->
 <xsl:param name="preserve-elems"/>
 
 <!-- pipe:pipeDocument is an instruction, so it must appear inside a template. -->
 <xsl:template match="/">
   <pipe:pipeDocument source="{$source}" target="{$target}">
     <stylesheet href="strip.xslt">
       <param name="preserve-elems" value="{$preserve-elems}"/>
     </stylesheet>
     <stylesheet href="contents.xslt"/>
   </pipe:pipeDocument>
 </xsl:template>
 
</xsl:stylesheet>

This code would create a table of contents based on the specified elements without disabling the independent use of strip.xslt or contents.xslt.
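The pipelining idea itself, in which the result tree of one transformation becomes the input tree of the next, can also be sketched from the host language using only the standard JAXP API. The following is a minimal illustration under stated assumptions, not a replacement for either extension: the class PipelineSketch and its pipe method are hypothetical, and intermediate results are held in memory as DOM trees.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: chains independent stylesheets into a pipeline.
class PipelineSketch {

    // Runs the stylesheets in order; each stage's result tree feeds the next stage.
    static String pipe(String xml, String... stylesheets) {
        if (stylesheets.length == 0) {
            throw new IllegalArgumentException("At least one stylesheet is required");
        }
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Source current = new StreamSource(new StringReader(xml));
            // All stages but the last write to an in-memory DOM tree.
            for (int i = 0; i < stylesheets.length - 1; i++) {
                Transformer stage = factory.newTransformer(
                    new StreamSource(new StringReader(stylesheets[i])));
                DOMResult intermediate = new DOMResult();
                stage.transform(current, intermediate);
                current = new DOMSource(intermediate.getNode());
            }
            // The last stage serializes, so its xsl:output settings apply.
            Transformer last = factory.newTransformer(
                new StreamSource(new StringReader(stylesheets[stylesheets.length - 1])));
            StringWriter out = new StringWriter();
            last.transform(current, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Pipeline stage failed", e);
        }
    }
}
```

As with the Xalan approach, the participating stylesheets do not need to know that they are part of a pipeline.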

You want to work with dates and times

Chapter 4 provided a host of recipes dealing with dates and times but no pure XSLT facility that could determine the current date and time. Both Saxon and Xalan implement core functions from the EXSLT dates and times module. This section includes EXSLT’s date-and-time documentation for easy reference. The functions are shown in Table 14-1 with their return type, followed by the function and arguments. A question mark (?) indicates optional arguments.

Table 14-1. EXSLT’s date-and-time functions

Function

Behavior

string date:date-time()

The date:date-time function returns the current date and time as a date/time string. The returned date/time string must be in the format XML schema defines as the lexical representation of xs:dateTime.

string date:date(string?)

The date:date function returns the date specified in the date/time string given as the argument. If no argument is given, the current local date/time, as returned by date:date-time, is used as a default argument.

string date:time(string?)

The date:time function returns the time specified in the date/time string given as the argument. If no argument is given, the current local date/time, as returned by date:date-time, is used as a default argument.

number date:year(string?)

The date:year function returns the date’s year as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as a default argument.

boolean date:leap-year(string?)

The date:leap-year function returns true if the year given in a date is a leap year. If no argument is given, then the current local date/time, as returned by date:date-time, is used as a default argument.

number date:month-in-year(string?)

The date:month-in-year function returns the month of a date as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

string date:month-name(string?)

The date:month-name function returns the full name of the month of a date. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

string date:month-abbreviation(string?)

The date:month-abbreviation function returns the abbreviation of the month of a date. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

number date:week-in-year(string?)

The date:week-in-year function returns the week of the year as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument. Counting follows ISO 8601: Week 1 in a year is the week containing the first Thursday of the year, with new weeks beginning on Mondays.

number date:day-in-year(string?)

The date:day-in-year function returns the day of a date in a year as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

number date:day-in-month(string?)

The date:day-in-month function returns the day of a date as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

number date:day-of-week-in-month(string?)

The date:day-of-week-in-month function returns the day of the week in a month as a number (e.g., 3 for the third Tuesday in May). If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

number date:day-in-week(string?)

The date:day-in-week function returns the day of the week given in a date as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

string date:day-name(string?)

The date:day-name function returns the full name of the day of the week of a date. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

string date:day-abbreviation(string?)

The date:day-abbreviation function returns the abbreviation of the day of the week of a date. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

number date:hour-in-day(string?)

The date:hour-in-day function returns the hour of the day as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

number date:minute-in-hour(string?)

The date:minute-in-hour function returns the minute of the hour as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

number date:second-in-minute(string?)

The date:second-in-minute function returns the second of the minute as a number. If no argument is given, then the current local date/time, as returned by date:date-time, is used as the default argument.

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:date="http://exslt.org/dates-and-times">
   
<xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="/">
  <html>
    <head><title>My Dull Home Page</title></head>
    <body>
      <h1>My Dull Homepage</h1>
      <div>It's <xsl:value-of select="date:time()"/> on <xsl:value-of 
      select="date:date()"/> and this page is as dull as it was yesterday.</div>
    </body>
  </html>
   
</xsl:template>
     
</xsl:stylesheet>

XSLT 2.0 has direct support for dates and times, as discussed in Chapter 4, so these extensions are not necessary.
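The ISO 8601 week-numbering rule quoted for date:week-in-year in Table 14-1 can be surprising near year boundaries, so a small worked example may help. The sketch below uses Java's standard java.time library, which follows the same ISO rule (week 1 is the week containing the first Thursday of the year, and weeks begin on Monday). The class IsoWeekSketch and its weekInYear method are hypothetical names, and the sketch shows the ISO rule itself rather than the output of any particular EXSLT implementation.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

// Hypothetical helper illustrating ISO 8601 week numbering.
class IsoWeekSketch {

    // Returns the ISO 8601 week number for a date given as YYYY-MM-DD.
    static int weekInYear(String isoDate) {
        return LocalDate.parse(isoDate).get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        // 2021-01-01 was a Friday, so it still belongs to the last week
        // of the previous year's numbering: prints 53.
        System.out.println(weekInYear("2021-01-01"));
        // 2021-01-04 was the Monday of the week containing the first
        // Thursday of 2021: prints 1.
        System.out.println(weekInYear("2021-01-04"));
    }
}
```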

You need a more efficient implementation of set operations

Chapter 9 investigated various means of implementing set operations other than set union, which XPath supplies natively via the union operator (|). These solutions were not necessarily the most efficient or obvious.

Both Saxon and Xalan remedy this problem by implementing the set operations defined by EXSLT’s set module (see Table 14-2).

Table 14-2. EXSLT’s set module’s set operations

Function

Behavior

node-set set:difference(node-set, node-set)

The set:difference function returns the difference between two node sets—nodes that are in the node set passed as the first argument that are not in the node set passed as the second argument.

node-set set:intersection(node-set, node-set)

The set:intersection function returns a node set comprising the nodes that are within both the node sets passed to it as arguments.

node-set set:distinct(node-set)

The set:distinct function returns a subset of the nodes contained in the node set NS passed as the first argument. Specifically, it selects a node N if no node in NS that precedes N in document order has the same string value as N.

boolean set:has-same-node(node-set, node-set)

The set:has-same-node function returns true if the node set passed as the first argument shares nodes with the node set passed as the second argument. If no nodes are in both node sets, it returns false.

node-set set:leading(node-set, node-set)

The set:leading function returns the nodes in the node set passed as the first argument that precede, in document order, the first node in the node set passed as the second argument. If the first node in the second node set is not contained in the first node set, then an empty node set is returned. If the second node set is empty, then the first node set is returned.

node-set set:trailing(node-set, node-set)

The set:trailing function returns the nodes in the node set passed as the first argument that follow, in document order, the first node in the node set passed as the second argument. If the first node in the second node set is not contained in the first node set, then an empty node set is returned. If the second node set is empty, then the first node set is returned.

set:distinct is a convenient way to remove duplicates, as long as equality is defined as string-value equality:

<xsl:variable name="firstNames" select="set:distinct(person/firstname)"/>

set:leading and set:trailing can extract nodes bracketed by other nodes. For example, Recipe 12.9 used a complex expression to locate the xslx:elsif and xslx:else nodes that went with your enhanced xslx:if. Extensions can simplify this process:

<xsl:apply-templates 
        select="set:leading(following-sibling::*, following-sibling::xslx:if)
                  [self::xslx:else or self::xslx:elsif]"/>

This code specifies that you select all xslx:else and xslx:elsif siblings that come after the current node, but before the next xslx:if.
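The edge cases in the definitions of set:distinct, set:leading, and set:trailing are easy to misread, so the following Java sketch models them on plain lists. It is only a model under stated assumptions: each list stands in for a node set already sorted in document order, list items stand in for nodes, and the class SetOpsSketch and its methods are hypothetical names rather than part of any EXSLT implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical model of three EXSLT set functions on ordered lists.
class SetOpsSketch {

    // Models set:distinct: keeps the first item for each string value.
    static List<String> distinct(List<String> nodes) {
        return new ArrayList<>(new LinkedHashSet<>(nodes));
    }

    // Models set:leading: items of 'first' that precede the first item of 'second'.
    static <T> List<T> leading(List<T> first, List<T> second) {
        if (second.isEmpty()) {
            return new ArrayList<>(first);      // empty second set: return first set
        }
        int end = first.indexOf(second.get(0));
        if (end < 0) {
            return new ArrayList<>();           // boundary not in first set: empty
        }
        return new ArrayList<>(first.subList(0, end));
    }

    // Models set:trailing: items of 'first' that follow the first item of 'second'.
    static <T> List<T> trailing(List<T> first, List<T> second) {
        if (second.isEmpty()) {
            return new ArrayList<>(first);
        }
        int start = first.indexOf(second.get(0));
        if (start < 0) {
            return new ArrayList<>();
        }
        return new ArrayList<>(first.subList(start + 1, first.size()));
    }
}
```

Note the case in which the boundary node is not a member of the first set: both functions then return an empty result, which matters when you choose the first argument.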

You want extended information about a node in the source tree

Xalan provides functions that allow you to get information about the location of nodes in the source tree. Saxon 6.5.2 provides only saxon:systemId and saxon:lineNumber. Debugging is one application of these functions. To use the functions, set the TransformerFactory source_location attribute to true with either the command-line utility -L flag or the TransformerFactory.setAttribute() method.

systemId()

systemId(node-set)

Returns the system ID for the current node and the first node in the node set, respectively.

lineNumber()

lineNumber(node-set)

Returns the line number in the source document for the current node and the first node in the node set, respectively. This function returns -1 if the line number is unknown (for example, when the source is a DOM Document).

columnNumber()

columnNumber(node-set)

Returns the column number in the source document for the current node and the first node in the node set, respectively. This function returns -1 if the column number is unknown (for example, when the source is a DOM Document):

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns:xalan="http://xml.apache.org/xslt"
 xmlns:info="xalan://org.apache.xalan.lib.NodeInfo">
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
     
  <xsl:template match="foo">
    <xsl:comment>Matched a foo on line <xsl:value-of 
    select="info:lineNumber()"/> and column <xsl:value-of 
    select="info:columnNumber()"/>.</xsl:comment>
    <!-- ... -->
  </xsl:template>     
     
</xsl:stylesheet>
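Location information of this kind originates in the parser. As a rough illustration of where such numbers come from (and not a description of Xalan's internals), the following sketch uses the standard SAX Locator interface to record the line on which each start tag of a given element ends. The class LocationSketch and its linesOf method are hypothetical names. A DOM tree built beforehand carries no such positions, which is consistent with the -1 result described above for DOM sources.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects source line numbers while parsing.
class LocationSketch {

    // Returns the line number reported for each start tag with the given name.
    static List<Integer> linesOf(String xml, String elementName) {
        List<Integer> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands over a locator before any other event.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(elementName) && locator != null) {
                    lines.add(locator.getLineNumber());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return lines;
    }
}
```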

You want to interact with a relational database

Interfacing XSLT to a relational database opens up a whole new world of possibilities. Both Saxon and Xalan have extensions to support SQL. If you write stylesheets that modify databases, you violate the XSLT no-side-effects rule.

Warning

Michael Kay has this to say about Saxon’s SQL extensions, “These are not intended as being necessarily a production-quality piece of software (there are many limitations in the design), but more as an illustration of how extension elements can be used to enhance the capability of the processor.”

Saxon provides database interaction via five extension elements: sql:connect, sql:query, sql:insert, sql:column, and sql:close. Anyone who has ever interacted with a relational database through ODBC or JDBC should feel comfortable using these elements.

<sql:connect driver="jdbc-driver" database="db name" user="user name" password="user password"/>

Creates a database connection. Each attribute can be an attribute value template. The driver attribute names the JDBC driver class, and the database must be a name that JDBC can associate with an actual database.

<sql:query table="the table" column="column names" where="where clause" row-tag="row element name" column-tag="column element name" disable-output-escaping="yes or no"/>

Performs a query and writes the results to the output tree using elements to represent the rows and columns. The names of these elements are specified by row-tag and column-tag, respectively. The column attribute can contain a list of columns or use * for all.

<sql:insert table="table name">

Performs an SQL INSERT. The child elements (sql:column) specify the data to be added to the table.

<sql:column name="col name" select="xpath expr"/>

Used as a child of sql:insert. The value can be specified by the select attribute or by the evaluation of the sql:column’s child elements. However, in both cases only the string value can be used. Hence, there is no way to deal with other standard SQL data types.

Xalan’s SQL support is richer than Saxon’s. This chapter covers only the basics. The “See Also” section provides pointers to more details. Unlike Saxon, Xalan uses extension functions that provide relational database access.

sql:new(driver, db, user, password)

Establishes a connection.

sql:new(nodelist)

Sets up a connection using information embedded as XML in the input document or stylesheet. For example:

<DBINFO>
  <dbdriver>org.enhydra.instantdb.jdbc.idbDriver</dbdriver>
  <dburl>jdbc:idb:../../instantdb/sample.prp</dburl>
  <user>jbloe</user>
  <password>geron07moe</password>
</DBINFO>
   
<xsl:param name="cinfo" select="//DBINFO"/>
<xsl:variable name="db" select="sql:new($cinfo)"/>

sql:query(xconObj, sql-query)

Queries the database. The xconObj is returned by new(). The function returns a streamable result set in the form of a row-set node. You can work your way through the row set one row at a time. The same row element is used repeatedly, so you can begin transforming the row set before the entire result set is returned.

sql:pquery(xconObj,sql-query-with-params)

sql:addParameter(xconObj, paramValue)

sql:addParameterFromElement(xconObj,element)

sql:addParameterFromElement(xconObj,node-list)

sql:clearParameters(xconObj)

Used together to implement parameterized queries. Parameters take the form of ? characters embedded in the query. The various addParameter() functions set these parameters with actual values before the query is executed. Use clearParameters() to make the connection object forget about prior values.

sql:close(xconObj)

Closes the connection to the database.

The sql:query() and sql:pquery() extension functions return a Document node that contains (as needed) an array of column-header elements, a single row element that is used repeatedly, and an array of col elements. Each column-header element (one per column in the row set) contains an attribute (ColumnAttribute) for each column descriptor in the ResultSetMetaData object. Each col element contains a text node with a textual representation of the value for that column in the current row.

You can find more information on using XSLT to access relational data in Doug Tidwell’s XSLT (O’Reilly, 2001).

You want to dynamically evaluate an XPath expression created at runtime

Saxon and Xalan have a very powerful extension function called evaluate that takes a string and evaluates it as an XPath expression. EXSLT.org also defines dyn:evaluate(), which gives you greater portability. Such a feature was considered for XSLT 2.0, but the XSLT 2.0 working group decided not to pursue it. Their justification is that dynamic evaluation “. . . has significant implications on the runtime architecture of the processor, as well as the ability to do static optimization.”

Dynamic capabilities can come in handy when creating a table-driven stylesheet. The following stylesheet can format information on people into a table, but you can customize it to handle an almost infinite variety of XML formats simply by altering entries in a table:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:saxon="http://icl.com/saxon" 
 xmlns:paths="http://www.ora.com/XSLTCookbook/NS/paths" 
 exclude-result-prefixes="paths">
   
<xsl:output method="html"/>
   
<!-- This parameter is used to specify a document containing a table that -->
<!-- specifies how to locate info on people -->
<xsl:param name="pathsDoc"/>
     
<xsl:template match="/">
<html>
  <head>
    <title>People</title>
  </head>
  <body>
  <!-- We load an XPath expression out of a table -->
  <!-- in an external document. -->
  <xsl:variable name="peoplePath" 
       select="document($pathsDoc)/*/paths:path[@type='people']/@xpath"/>
    <table>
    <tbody>
      <tr>
        <th>First</th>
        <th>Last</th>
      </tr>
      <!-- Dynamically evaluate the xpath that locates information on --> 
      <!-- each person -->
      <xsl:for-each select="saxon:evaluate($peoplePath)">
        <xsl:call-template name="process-person"/>
      </xsl:for-each>
    </tbody>
  </table>
  </body>
</html>
</xsl:template>
   
<xsl:template name="process-person">
  <xsl:variable name="firstnamePath" 
      select="document($pathsDoc)/*/paths:path[@type='first']/@xpath"/> 
  <xsl:variable name="lastnamePath" 
      select="document($pathsDoc)/*/paths:path[@type='last']/@xpath"/> 
  <tr>
    <!-- Dynamically evaluate the xpath that locates the person -->
    <!-- specific info we want to process -->
    <td><xsl:value-of select="saxon:evaluate($firstnamePath)"/></td>
    <td><xsl:value-of select="saxon:evaluate($lastnamePath)"/></td>
  </tr>
</xsl:template>
   
</xsl:stylesheet>

You can use this table to process person data encoded as elements:

<paths:paths 
  xmlns:paths="http://www.ora.com/XSLTCookbook/NS/paths">
  <paths:path type="people" xpath="people/person"/>
  <paths:path type="first" xpath="first"/>
  <paths:path type="last" xpath="last"/>
</paths:paths>

Use this table instead to process person data encoded as attributes:

<paths:paths xmlns:paths="http://www.ora.com/XSLTCookbook/NS/paths">
  <paths:path type="people" xpath="people/person"/>
  <paths:path type="first" xpath="@first"/>
  <paths:path type="last" xpath="@last"/>
</paths:paths>
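
When XSLT is embedded in a Java program, the same table-driven idea can be sketched with the standard JAXP XPath API (javax.xml.xpath), which also accepts expressions as strings chosen at runtime. This is only an illustration of the technique, not how saxon:evaluate or dyn:evaluate is implemented; the class DynamicPaths, its names method, and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the three path arguments play the role of the
// entries in the paths table.
class DynamicPaths {

    static List<String> names(String xml, String peoplePath,
                              String firstPath, String lastPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Evaluate the people path relative to the document node.
            NodeList people = (NodeList) xpath.evaluate(
                    peoplePath, doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < people.getLength(); i++) {
                Node person = people.item(i);
                // Evaluate the per-person paths relative to each person node.
                String first = xpath.evaluate(firstPath, person);
                String last = xpath.evaluate(lastPath, person);
                result.add(first + " " + last);
            }
            return result;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String asElements =
            "<people><person><first>Al</first><last>Zed</last></person></people>";
        String asAttributes =
            "<people><person first='Al' last='Zed'/></people>";
        System.out.println(names(asElements, "people/person", "first", "last"));
        System.out.println(names(asAttributes, "people/person", "@first", "@last"));
    }
}
```

Only the path strings differ between the two calls in main, which is the same property the two paths tables give the stylesheet.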

You want to change the value of a variable

Almost any book you read on XSLT will describe the inability to change the value of variables and parameters once they are bound as a feature of XSLT rather than a defect. This is true because it prevents a certain class of bugs, makes stylesheets easier to understand, and enables certain performance optimizations. However, sometimes being unable to change the values is simply inconvenient. Saxon provides a way around this obstacle with its saxon:assign extension element. You can use saxon:assign only on variables designated as assignable with the extension attribute saxon:assignable="yes":

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:saxon="http://icl.com/saxon"
extension-element-prefixes="saxon">
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:variable name="countFoo" select="0" saxon:assignable="yes"/>
   
<xsl:template name="foo">
    <saxon:assign name="countFoo" select="$countFoo + 1"/>
    <xsl:comment>This is invocation number <xsl:value-of select="$countFoo"/> of 
template foo.</xsl:comment>       
</xsl:template>
   
<!-- ... -->
   
</xsl:stylesheet>

You want to write first-class extension functions in XSLT 1.0

Many examples in this book are implemented as named templates accessed via xsl:call-template. Often, this implementation is inconvenient and awkward because what you really want is to access this code as first-class functions that can be invoked as easily as native XPath functions. This is supported in XSLT 2.0, but in 1.0, you might consider using an EXSLT extension called func:function that is implemented by Saxon and the latest version of Xalan (Version 2.3.2 at this writing). The following code is a template from Chapter 2 reimplemented as a function:

<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:func="http://exslt.org/functions" 
  xmlns:str="http://www.ora.com/XSLTCookbook/namespaces/strings"
  extension-element-prefixes="func">
   
  <xsl:template match="/">
    <xsl:value-of 
      select="str:substring-before-last('123456789a123456789a123','a')"/>
  </xsl:template>
     
  <func:function name="str:substring-before-last"> 
  <xsl:param name="input"/>
  <xsl:param name="substr"/>
  
  <func:result>
    <xsl:if test="$substr and contains($input, $substr)">
      <xsl:variable name="temp" 
                    select="substring-after($input, $substr)" />
      <xsl:value-of select="substring-before($input, $substr)" />
      <xsl:if test="contains($temp, $substr)">
        <xsl:value-of 
             select="concat($substr,
                            str:substring-before-last($temp, $substr))"/>
      </xsl:if>
    </xsl:if>
  </func:result>
</func:function>
     
</xsl:stylesheet>

XSLT 2.0

Most of the extensions available in Saxon 6 are also available in Saxon 8. However, some, such as saxon:function(), are no longer needed in XSLT 2.0. Some additional functions exist to enhance the abilities of XQuery because these capabilities exist in XSLT already. For example, saxon:index() and saxon:find() achieve results similar to XSLT keys (xsl:key and the key() function). However, there are a few additional goodies in Saxon 8 that are not available in the older product.

You want to get an XPath expression to the current node

The saxon:path() function takes no arguments and returns a string whose value is an XPath expression to the context node. This is similar to the XSLT solution (introduced in Recipe 15.2 for debugging purposes).

You want to handle and recover from dynamic errors

Many modern languages like Java and C++ have a try-throw-catch mechanism for handling dynamic (runtime) errors. Saxon adds a saxon:try pseudo function in its commercial version (Saxon-SA) that provides similar, if more limited, capabilities. saxon:try takes an expression as its first argument. The expression is evaluated, and if a dynamic error occurs (e.g., division by zero or a type error), the value of the second argument is returned:

<xsl:template match="/">
  <test>
    <xsl:value-of select="saxon:try(*[0], 'Index out of bounds')"/>
  </test>
</xsl:template>

The value of the second argument could be an error string, as shown in this example, or a default value. Michael Kay calls saxon:try a pseudo function because it does not follow the rules of a normal XPath function: it evaluates the second argument only if the first fails.

Discussion

Using vendor-specific extensions is a double-edged sword. On the one hand, they can provide you with the ability to deliver an XSLT solution faster or more simply than you could if you constrained yourself to standard XSLT. In a few cases, they allow you to do things that are impossible with standard XSLT. On the other hand, they can lock you into an implementation whose future is uncertain.

EXSLT.org encourages implementers to adopt uniform conventions for the most popular extensions, so you should certainly prefer an EXSLT solution to a vendor-specific one if you have a choice.

Another tactic is to avoid vendor-specific implementations altogether in favor of your own custom implementation. In this way, you control the source and can port the extension to more than one processor, if necessary. Recipe 14.2, Recipe 14.3, and Recipe 14.4 address custom extensions.

See Also

This book has not covered all of the extensions available in Saxon and Xalan. Additional information and features of Saxon extensions can be found at http://saxon.sourceforge.net/saxon6.5.4/extensions.html or http://www.saxonica.com/documentation/documentation.html (for the XSLT 2.0 version). Additional Xalan extension information can be found at http://xml.apache.org/xalan-j/extensionslib.html.

14.13. Extending XSLT with JavaScript

Problem

You want to execute JavaScript to implement functionality missing from XSLT.

Solution

The following examples use Xalan-Java 2’s ability to invoke scripting languages such as JavaScript. A typical use of a JavaScript-based extension invokes a function that is not native to XSLT or XPath. One common example is trigonometric functions:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:trig="http://www.ora.com/XSLTCookbook/extend/trig">
  
<xsl:output method="text"/>
   
<xalan:component prefix="trig" functions="sin">
               <xalan:script lang="javascript">
               function sin (arg){ return Math.sin(arg);} 
               </xalan:script>
               </xalan:component>
   
<xsl:template match="/">
  The sin of 45 degrees is <xsl:text/>
  <xsl:value-of select="trig:sin(3.14159265 div 4)"/>
</xsl:template>
     
</xsl:stylesheet>

With JavaScript, you can actually implement functions that have side effects and objects that maintain state:[1]

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:count="http://www.ora.com/XSLTCookbook/extend/counter">
  
<xsl:output method="text"/>
   
<xalan:component prefix="count" 
                 functions="counter nextCount resetCount makeCounter">
               <xalan:script lang="javascript">
   
    
               function counter(initValue)
               {
               this.value = initValue ;
               } 
       
               function nextCount(ctr) 
               {
               return ctr.value++ ;
               }
   
               function resetCount(ctr, value) 
               {
               ctr.value = value  ;
               return "" ;
               }
   
               function makeCounter(initValue)
               {
               return new counter(initValue) ;
               }
    
               </xalan:script>
               </xalan:component>
   
<xsl:template match="/">
  <xsl:variable name="aCounter" select="count:makeCounter(0)"/>
  Count: <xsl:value-of select="count:nextCount($aCounter)"/>
  Count: <xsl:value-of select="count:nextCount($aCounter)"/>
  Count: <xsl:value-of select="count:nextCount($aCounter)"/>
  Count: <xsl:value-of select="count:nextCount($aCounter)"/>
  <xsl:value-of select="count:resetCount($aCounter,0)"/>
  Count: <xsl:value-of select="count:nextCount($aCounter)"/>

</xsl:template>
     
</xsl:stylesheet>

In most implementations, this code results in:

  Count: 0
  Count: 1
  Count: 2
  Count: 3
  Count: 0

A processor that expects no side effects can potentially change the order of evaluation and undermine the expected results.

Here you can access JavaScript’s regular expression library:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:regex="http://www.ora.com/XSLTCookbook/extend/regex">
  
<xsl:output method="text"/>
   
<xalan:component prefix="regex" 
     functions="match leftContext rightContext getParenMatch makeRegExp">
               <xalan:script lang="javascript">
   
               function Matcher(pattern)
               {
               this.re = new RegExp(pattern) ;
               this.re.compile(pattern) ;
               this.result="" ;
               this.left="" ;
               this.right="" ;
               } 
   
               function match(matcher, input)
               {
               matcher.result = matcher.re.exec(input) ;
               matcher.left = RegExp.leftContext ;
               matcher.right = RegExp.rightContext ;
               return matcher.result[0] ;
               }
           
               function leftContext(matcher) 
               {
               return matcher.left ;
               }
   
               function rightContext(matcher) 
               {
               return matcher.right ;
               }
   
               function getParenMatch(matcher, which)
               {
               return matcher.result[which] ;
               }
    
               function makeRegExp(pattern)
               {
               return new Matcher(pattern) ;
               }
    
               </xalan:script>
               </xalan:component>
   
<xsl:template match="/">
  <xsl:variable name="dateParser" 
       select="regex:makeRegExp('(\d\d?)[/-](\d\d?)[/-](\d{4}|\d{2})')"/>
  Match: <xsl:value-of 
              select="regex:match($dateParser, 
                     'I was born on 05/03/1964 in New York City.')"/>
  Left: <xsl:value-of select="regex:leftContext($dateParser)"/>
  Right: <xsl:value-of select="regex:rightContext($dateParser)"/>
  Month: <xsl:value-of select="regex:getParenMatch($dateParser, 1)"/>
  Day: <xsl:value-of select="regex:getParenMatch($dateParser,2)"/>
  Year: <xsl:value-of select="regex:getParenMatch($dateParser,3)"/>
</xsl:template>     
</xsl:stylesheet>

This example results in:

  Match: 05/03/1964
  Left: I was born on
  Right:  in New York City.
  Month: 05
  Day: 03
  Year: 1964

In addition, Xalan lets you create JavaScript-based extension elements. Here is an extension element that repeats the execution of its content n times. It is useful for duplicating strings or structure, or as a simple looping construct:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
xmlns:rep="http://www.ora.com/XSLTCookbook/extend/repeat" 
extension-element-prefixes="rep">
  
<xsl:output method="xml"/>
   
<xalan:component prefix="rep" elements="repeat">
               <xalan:script lang="javascript">
               <![CDATA[
               function repeat(ctx, elem)
               {
               //Get the attribute value n as an integer
               n = parseInt(elem.getAttribute("n")) ;
               //get the transformer which is required to execute nodes
               xformer = ctx.getTransformer() ;
               //Execute content of repeat element n times
               for(var ii=0; ii < n; ++ii)
               {
               node = elem.getFirstChild() ;
               while(node)
               {
               node.execute(xformer) ;
               node = node.getNextSibling() ;
               }
               }
               //The return value is inserted into the output
               //so return null to prevent this
               return null ;
               } 
               ]]>
               </xalan:script>
               </xalan:component>
   
<xsl:template match="/">
  <tests>
    <!--Use to duplicate text-->
    <test1><rep:repeat n="10">a</rep:repeat></test1>
    <!--Use to duplicate structure-->
    <test2>
      <rep:repeat n="10">
        <Malady>
          <FirstPart>Shim's</FirstPart>
          <SecondPart>Syndrome</SecondPart>
        </Malady>
      </rep:repeat>
    </test2>
    <!--Use to repeat the execution of xslt code -->
    <!--(which is really what we've been doing in test1 and test2)-->
    <test3>
      <rep:repeat n="10">
        <xsl:for-each select="*">
          <xsl:copy/>
        </xsl:for-each>
      </rep:repeat>
    </test3>
  </tests>
</xsl:template>
   
</xsl:stylesheet>

Discussion

Creating extensions in JavaScript (or another embedded scripting language) is seductive because although you need to switch languages mentally, there is no need to switch to a different development environment or invoke a separate compiler. However, possibly the greatest benefit is that languages like JavaScript or VBScript are very easy to learn.[2]

The challenge to using scripting-based extensions is that the documentation on how to tie XSLT and the scripts together tends to be thin. A few pointers are in order. Most of this information is available in the Xalan extension documents (http://xml.apache.org/xalan-j/extensions.html), but it is easy to miss when you are in a hurry to get something working.

First, script-based extensions are available only in Xalan-Java, not Xalan C++.

Second, make sure you add bsf.jar and js.jar (for JavaScript) to your class path either on the command line when invoking Java from a Unix shell:

java -cp /xalan/bin/xalan.jar:/xalan/bin/xercesImpl.jar:/xalan/bin/bsf.jar:/xalan/bin/js.jar org.apache.xalan.xslt.Process -in input.xml -xsl trans.xslt

or in the CLASSPATH environment variable:

export CLASSPATH=/xalan/bin/xalan.jar:/xalan/bin/xercesImpl.jar:/xalan/bin/bsf.jar:/xalan/bin/js.jar

For Windows, replace colon path separators with semicolons and use set rather than export.

Third, note that js.jar is not part of the Xalan distribution. You must get it separately from Mozilla.org (http://www.mozilla.org/rhino/).

Once you configure your environment correctly, you need to specify your stylesheet to conform to Xalan’s requirements for script-based extensions. See the introduction of this chapter for the gory details.

Implementing extension functions is much easier than implementing extension elements, and the examples in the “Solution” section (in conjunction with Xalan’s documentation) should be sufficient. The rest of this section focuses on extension elements.

When an extension element’s associated function is invoked, it is automatically passed two objects. The first is a context of type org.apache.xalan.extensions.XSLProcessorContext. This object is a handle for getting several other useful objects, such as the context node, the Stylesheet object, and the transformer. It also implements a function outputToResultTree(Stylesheet stylesheetTree, java.lang.Object obj) that can output data to the result tree. The fact that all these objects are Java based but accessible from JavaScript is a function of the Bean Scripting Framework (http://jakarta.apache.org/bsf/), which is contained in bsf.jar.

The second object is an instance of org.apache.xalan.templates.ElemExtensionCall. This object represents the extension element itself. From this element, you can extract attributes and child elements that your script needs to interpret to implement the extension’s functionality. This is done using standard DOM function calls such as getAttribute(), getFirstChild(), getLastChild(), etc.

There are few limitations on what you can do with a scripting-based extension element. You simply must be capable and willing to dig into the Xalan-Java source code and documentation to find out how to make it do what you want. However, you should use scripting-based extensions only for simple tasks because they are significantly slower than native Java extensions.
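
For comparison, the regular-expression helper from the “Solution” section could be written as a native Java class using java.util.regex. This is a minimal sketch rather than a drop-in replacement: the class name RegexMatcher is hypothetical, the stylesheet binding is not shown, and no timing claims are made about it. To call it from a stylesheet, you would declare the class and its members public in its own source file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java counterpart of the JavaScript Matcher object.
class RegexMatcher {
    private final Pattern pattern;
    private Matcher last;        // most recent successful match, or null
    private String input = "";   // text used in the most recent match attempt

    RegexMatcher(String regex) {
        pattern = Pattern.compile(regex);
    }

    // Returns the first match in text, or "" when nothing matches.
    String match(String text) {
        input = text;
        Matcher m = pattern.matcher(text);
        last = m.find() ? m : null;
        return last == null ? "" : last.group();
    }

    // Text before the most recent match.
    String leftContext() {
        return last == null ? "" : input.substring(0, last.start());
    }

    // Text after the most recent match.
    String rightContext() {
        return last == null ? "" : input.substring(last.end());
    }

    // Parenthesized group number 'which' of the most recent match.
    String getParenMatch(int which) {
        if (last == null || which < 0 || which > last.groupCount()) {
            return "";
        }
        String group = last.group(which);
        return group == null ? "" : group;
    }

    public static void main(String[] args) {
        RegexMatcher dateParser =
            new RegexMatcher("(\\d\\d?)[/-](\\d\\d?)[/-](\\d{4}|\\d{2})");
        System.out.println(dateParser.match(
            "I was born on 05/03/1964 in New York City."));
        System.out.println(dateParser.getParenMatch(3));
    }
}
```

Unlike the JavaScript version, the left and right context are computed from the match offsets, so the class does not depend on global RegExp state.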

See Also

The definitive source for information on Xalan extensibility is http://xml.apache.org/xalan-j/extensionslib.html.

14.14. Adding Extension Functions Using Java

Problem

You want to add your own custom extension functions written in Java.

Solution

This chapter’s introduction covered the mechanism for binding the stylesheet to the Java implementations, so this section concentrates on examples.

Chapter 2 showed how to convert numbers from base 10 to other bases (such as base 16 (hex)). You can implement a hex converter in Java easily:

package com.ora.xsltckbk.util;
   
public class HexConverter 
{
   
  public static String toHex(String intString) 
  {
    try 
    {
      // Parse the decimal string and prefix its hex digits with 0x
      int value = Integer.parseInt(intString) ;
      return "0x" + Integer.toHexString(value) ;
    } 
    catch (NumberFormatException e) 
    {
      // Fall back to zero when the input is not a valid integer
      return "0x0" ;
    }
  }
}

You can probably tell by the way the return value is formatted with a leading 0x that this particular function will be used in a code-generation application. The following example shows how it might be used:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt" 
xmlns:hex="xalan://com.ora.xsltckbk.util.HexConverter" 
exclude-result-prefixes="hex xalan">
   
<xsl:template match="group">
enum <xsl:value-of select="@name"/> 
{
  <xsl:apply-templates mode="enum"/>
} ;
</xsl:template> 
   
<xsl:template match="constant" mode="enum">
  <xsl:variable name="rep">
    <xsl:call-template name="getRep"/>
  </xsl:variable>
  <xsl:value-of select="@name"/> = <xsl:value-of select="$rep"/>
  <xsl:if test="following-sibling::constant">
    <xsl:text>,</xsl:text>
  </xsl:if>
</xsl:template> 
   
<xsl:template match="constant">
  <xsl:variable name="rep">
    <xsl:call-template name="getRep"/>
  </xsl:variable>
const int <xsl:value-of select="@name"/> = <xsl:value-of select="$rep"/> ;
</xsl:template> 
   
<xsl:template name="getRep">
  <xsl:choose>
    <xsl:when test="@rep = 'hex'">
      <xsl:value-of select="hex:toHex(@value)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="@value"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
   
</xsl:stylesheet>

The next example shows how you can construct Java objects and call their methods. Dealing with text layout is difficult when transforming XML to Scalable Vector Graphics. SVG gives you no way to determine how long a string will be when it is rendered. Fortunately, Java provides the functionality you need. The question is whether Java’s opinion of how long a string will be when rendered in a particular font matches the SVG engine’s opinion. Nevertheless, this idea is seductive enough to try:

package com.ora.xsltckbk.util ;
import java.awt.* ;
import java.awt.geom.* ;
import java.awt.font.* ;
import java.awt.image.*;
   
public class SVGFontMetrics
{
  public SVGFontMetrics(String fontName, int size)
  {
    m_font = new Font(fontName, Font.PLAIN, size) ;
    BufferedImage bi
        = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
    m_graphics2D = bi.createGraphics() ;
  }
   
  public SVGFontMetrics(String fontName, int size, boolean bold, 
                        boolean italic)
  {
    m_font = new Font(fontName, style(bold,italic) , size) ;
    BufferedImage bi
        = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
    m_graphics2D = bi.createGraphics() ;
  }
   
  public double stringWidth(String str)
  {
    FontRenderContext frc = m_graphics2D.getFontRenderContext();
    TextLayout layout = new TextLayout(str, m_font, frc);
    Rectangle2D rect = layout.getBounds() ;
    return rect.getWidth() ;
  }
   
  public double stringHeight(String str)
  {
    FontRenderContext frc = m_graphics2D.getFontRenderContext();
    TextLayout layout = new TextLayout(str, m_font, frc);
    Rectangle2D rect = layout.getBounds() ;
    return rect.getHeight() ;
  }
   
  static private int style(boolean bold, boolean italic)
  {
    int style = Font.PLAIN ;
    if (bold) { style |= Font.BOLD;}
    if (italic) { style |= Font.ITALIC;}
    return style ;
  }
   
  private Font m_font = null ;
  private Graphics2D m_graphics2D = null;
}

Here Java 2’s (JDK 1.3.1) Graphics2D and TextLayout classes provide the information you need. The class has two public constructors: one for plain fonts and one for fonts that are bold, italic, or both. Two public methods, stringWidth() and stringHeight(), get dimensional information about how a particular string would be rendered in the font specified by the constructor. This technique is generally accurate on most common fonts, but there are no precise guarantees, so you will have to experiment.

The following stylesheet tests the results:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt" 
xmlns:font="xalan://com.ora.xsltckbk.util.SVGFontMetrics" 
exclude-result-prefixes="font xalan">
   
<xsl:output method="xml"/>
   
<xsl:template match="/">
  <svg width="100%" height="100%">
    <xsl:apply-templates/>
  </svg>
</xsl:template>
   
<xsl:template match="text">
  <xsl:variable name="fontMetrics" 
      select="font:new(@font, @size, boolean(@weight), boolean(@style))"/>
  <xsl:variable name="text" select="."/>
  <xsl:variable name="width" select="font:stringWidth($fontMetrics, $text)"/>
  <xsl:variable name="height" select="font:stringHeight($fontMetrics, $text)"/>
  <xsl:variable name="style">
    <xsl:if test="@style">
      <xsl:value-of select="concat('font-style:',@style)"/>
    </xsl:if>
  </xsl:variable>
  <xsl:variable name="weight">
    <xsl:if test="@weight">
      <xsl:value-of select="concat('font-weight:',@weight)"/>
    </xsl:if>
  </xsl:variable>
  <g style="font-family:{@font};font-size:{@size};{$style};{$weight}">
    <!-- Use the SVGFontMetrics info to render a rectangle that is -->
    <!-- slightly bigger than the expected size of the text. -->
    <!-- Adjust the y position based on the previous text size. -->
    <rect x="10" 
          y="{sum(preceding-sibling::text/@size) * 2}pt" 
          width="{$width + 2}" 
          height="{$height + 2}"
          style="fill:none;stroke: black;stroke-width:0.5;"/>
    <!-- Render the text so it is centered in the rectangle -->
    <text x="11" 
          y="{sum(preceding-sibling::text/@size) * 2 + @size div 2 + 2}pt">
      <xsl:value-of select="."/>
    </text>
  </g>
        
</xsl:template>
   
</xsl:stylesheet>

Your test run produced pretty good results on some commonly available fonts, as shown in Figure 14-1.

Figure 14-1. Creating correctly sized text-bounding rectangles with the SVGFontMetrics extension
<TextWidthTest>
  <text font="Serif" size="9">M's are BIG; l's are small;</text>
  <text font="Serif" size="10">SVG makes handling text no fun at all</text>
  <text font="Helvetica" size="12">But if I cheat with a little Java</text>
  <text font="Arial" size="14" weight="bold">PROMISE ME YOU WON'T TELL MY MAMMA!
  </text>
  <text font="Century" size="16" style="italic">But if you do, I won't lose cheer.
  </text>
  <text font="Courier New" size="18" weight="bold" style="italic">Its really my tech editor that I fear!</text>
</TextWidthTest>

Discussion

The examples shown in the “Solution” section work unchanged with either Xalan or Saxon (despite the xml.apache.org/xslt namespace). They work because you used the processors’ shortcut conventions for encoding the Java class in the namespace.

Notice that constructors are accessed using a function with the name new() and that the XSLT processors can figure out which overloaded constructor to call based on the arguments. Member functions of a Java class are called by passing an extra initial argument corresponding to this. The HexConverter example shows that static members are called without the extra this parameter.
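
These calling conventions can be illustrated with a small stateful class. The sketch below is hypothetical (the class Counter and a count prefix bound to it are assumptions, not part of the earlier examples). Assuming the prefix is bound to the class the same way hex and font are bound in the “Solution” section, count:new(0) would select the constructor, count:nextCount($c) would pass the object held in $c as this, and count:description() would call the static method with no extra argument. For real use, the class and its members would need to be public in their own source file.

```java
// Hypothetical extension class with a constructor, instance methods,
// and one static method.
class Counter {
    private int value;

    // Reached from a stylesheet through new()
    Counter(int initValue) {
        value = initValue;
    }

    // Instance method: the stylesheet supplies the Counter object as the
    // extra first argument. Returns the current value, then increments.
    int nextCount() {
        return value++;
    }

    // Returns an empty string so that nothing is written to the output
    // when the call is used in xsl:value-of.
    String resetCount(int newValue) {
        value = newValue;
        return "";
    }

    // Static method: called without an object argument.
    static String description() {
        return "post-incrementing counter";
    }

    public static void main(String[] args) {
        Counter c = new Counter(0);
        System.out.println(c.nextCount());
        System.out.println(c.nextCount());
        c.resetCount(0);
        System.out.println(c.nextCount());
    }
}
```

Because nextCount() has a side effect, the earlier caveat applies: a processor that assumes side-effect-free functions may evaluate the calls in an unexpected order.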

The SVGFontMetrics example does not work with older versions of the JDK, but similar results can be obtained if you use the java.awt.FontMetrics class in conjunction with the original java.awt.Graphics class:

package com.ora.xsltckbk.util ;
import java.awt.* ;
import java.awt.geom.* ;
import java.lang.System ;
   
public class FontMetrics
{
  public FontMetrics(String fontName, int size)
  {
    //Any concrete component will do
    Label component = new Label() ;
    m_metrics
      = component.getFontMetrics(
           new Font(fontName, Font.PLAIN, size)) ;
    m_graphics = component.getGraphics() ;
  }
   
  public FontMetrics(String fontName, int size, boolean bold, boolean italic)
  {
    //Any concrete component will do
    Label component = new Label() ;
    m_metrics
      = component.getFontMetrics(
           new Font(fontName, style(bold,italic) , size)) ;
    m_graphics = component.getGraphics() ;
  }
   
  //Simple, but less accurate on some fonts
  public int stringWidth(String str)
  {
    return  m_metrics.stringWidth(str) ;
  }
   
  //Better accuracy on most fonts
  public double stringWidthImproved(String str)
  {
    Rectangle2D rect = m_metrics.getStringBounds(str, m_graphics) ;
    return rect.getWidth() ;
  }
   
  static private int style(boolean bold, boolean italic)
  {
    int style = Font.PLAIN ;
    if (bold) { style |= Font.BOLD;}
    if (italic) { style |= Font.ITALIC;}
    return style ;
  }
   
  private java.awt.FontMetrics m_metrics = null;
  private java.awt.Graphics m_graphics = null ;
}

Although these particular examples may not fulfill your immediate needs, they demonstrate the mechanisms by which you can harness your own Java-based extension functions. Other possibilities, in increasing level of difficulty, include:

  1. Using Java’s Hashtable instead of xsl:key. This allows better control over which elements are indexed and allows the index to be changed during execution. It also overcomes the limitation that xsl:key definitions cannot reference variables. You can also use it to build a master index that spans multiple documents.

  2. Implementing a node-sorting function that can compensate for xsl:sort’s limitations, for example, constructing a sort based on foreign language rules. Doug Tidwell demonstrates this example with Saxon in XSLT (O’Reilly, 2001).

  3. Reading and writing multiple file formats in a single stylesheet. For example, an extension could let the stylesheet read text files other than XML, such as CSV or proprietary binary files. XSLT 2.0 provides capabilities in this area with the xsl:result-document element (see Chapter 6).

  4. Processing compressed XML straight from a zip file using java.util.zip.ZipFile. Studying the source code of your XSLT processor’s document function would be helpful.
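
As a minimal sketch of the first idea, the class below keeps an index in a java.util.Hashtable that a stylesheet could fill and query through extension-function calls. The class name KeyIndex and its methods are hypothetical, and it stores string values rather than nodes to stay processor-neutral; a real extension would store whatever node type your processor passes to Java.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Hypothetical helper: a mutable index keyed by string, filled from the stylesheet.
// Declare it public (in its own file) before binding it as an extension class.
final class KeyIndex {
    private static final Hashtable<String, List<String>> INDEX =
        new Hashtable<String, List<String>>();

    // Adds a value under a key; returns how many values the key now holds.
    public static synchronized int put(String key, String value) {
        List<String> values = INDEX.get(key);
        if (values == null) {
            values = new ArrayList<String>();
            INDEX.put(key, values);
        }
        values.add(value);
        return values.size();
    }

    // Returns the first value stored under the key, or "" if there is none.
    public static synchronized String get(String key) {
        List<String> values = INDEX.get(key);
        return values == null ? "" : values.get(0);
    }

    // Returns the number of values stored under the key.
    public static synchronized int count(String key) {
        List<String> values = INDEX.get(key);
        return values == null ? 0 : values.size();
    }

    // Empties the index, something an xsl:key cannot do mid-transformation.
    public static synchronized void clear() {
        INDEX.clear();
    }
}
```

Because the key is an ordinary string computed at call time, it can be built from variables, and the same index can be filled while walking several documents.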

See Also

Chapter 9 punted on the problem of laying out text within your generated SVG tree nodes. You could use SVGFontMetrics as an ingredient in the solution.

Although not specifically related to XSLT extensions, developers interested in Java and SVG should check out Batik (http://xml.apache.org/batik/index.html).

14.15. Adding Extension Elements Using Java

Problem

You want to extend the functionality of XSLT by adding elements with custom behavior.

Solution

Prior sections considered how extensions provided by the XSLT implementers could be used to your advantage. This section develops your own extension elements from scratch. Unlike extension functions, creating extension elements requires much more intimacy with a particular processor’s implementation details. Because processor designs vary widely, much of the code will not be portable between processors.

This section begins with a simple extension that provides syntactic sugar rather than extended functionality. A common requirement in XSLT coding is to switch context to another node. Using an xsl:for-each is an idiomatic way of accomplishing this. The process is somewhat confusing because the intent is not to loop but to change context to the single node defined by the xsl:for-each’s select:

<xsl:for-each select="document('new.xml')">
     <!-- Process new document -->
</xsl:for-each>

You will implement an extension element called xslx:set-context, which acts exactly like xsl:for-each, but only on the first node of the node set defined by the select (normally, you have only one node anyway).

Saxon requires an implementation of the com.icl.saxon.style.ExtensionElementFactory interface for all extension elements associated with a particular namespace. The factory is responsible for creating the extension elements from the element’s local name. The second extension, named templtext, is covered later:

package com.ora.xsltckbk;
import com.icl.saxon.style.ExtensionElementFactory;
import org.xml.sax.SAXException;
   
public class CkBkElementFactory implements ExtensionElementFactory {
   
    public Class getExtensionClass(String localname)  {
        if (localname.equals("set-context")) return CkBkSetContext.class;
        if (localname.equals("templtext")) return CkBkTemplText.class;
        return null;
    }
   
}

When using a stylesheet extension element, you must use a namespace URI that ends in a /, followed by the factory’s fully qualified class name. (In the following example, the class name follows the final / of http://.) The namespace prefix must also appear in the xsl:stylesheet’s extension-element-prefixes attribute:

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xslx="http://com.ora.xsltckbk.CkBkElementFactory" 
 extension-element-prefixes="xslx">
   
<xsl:template match="/">
  <xslx:set-context select="foo/bar">
    <xsl:value-of select="."/>
  </xslx:set-context>
</xsl:template>
   
</xsl:stylesheet>
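
The naming rule can be illustrated with a few lines of Java. This is not Saxon’s own code, only a sketch of the convention described above, and the class and method names are invented for the illustration: whatever follows the last / of the namespace URI is taken as the factory class name.

```java
// Hypothetical illustration of the namespace convention; not taken from Saxon.
final class FactoryNamespace {

    // Returns the text after the last '/' of the namespace URI,
    // which the convention treats as the factory's class name.
    static String factoryClassName(String namespaceUri) {
        int slash = namespaceUri.lastIndexOf('/');
        return namespaceUri.substring(slash + 1);
    }
}
```

Applied to http://com.ora.xsltckbk.CkBkElementFactory, the method yields com.ora.xsltckbk.CkBkElementFactory, which is why the unusual-looking URI in the stylesheet works.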

The set-context element implementation derives from com.icl.saxon.style.StyleElement and must implement prepareAttributes() and process(); it usually implements the other methods shown in Table 14-3 as well.

Table 14-3. Important Saxon StyleElement methods

| Method | Effect |
| --- | --- |
| isInstruction() | Extensions always return true. |
| mayContainTemplateBody() | Returns true if this element can contain child elements. Often returns true to allow an xsl:fallback child. |
| prepareAttributes() | Called at compile time to allow the class to parse information contained in the extension’s attributes. It is also the time to do local validation. |
| validate() | Called at compile time after all stylesheet elements have done local validation. It allows cross validation between this element and its parents or children. |
| process(Context context) | Called at runtime to execute the extension. This method can access or modify information in the context, but must not modify the stylesheet tree. |

The xslx:set-context element was easy to implement because the code was stolen from Saxon’s XSLForEach implementation and modified to do what XSLForEach does, but only once:

public class CkBkSetContext extends com.icl.saxon.style.StyleElement {
   
    Expression select = null;
   
    public boolean isInstruction() {
        return true;
    }
   
    public boolean mayContainTemplateBody() {
        return true;
    }

Here you make sure @select is present. If it is, call makeExpression, which parses it into an XPath expression:

    public void prepareAttributes() 
                      throws TransformerConfigurationException {
   
        StandardNames sn = getStandardNames();
        AttributeCollection atts = getAttributeList();
   
        String selectAtt = null;
   
        for (int a=0; a<atts.getLength(); a++) {
            int nc = atts.getNameCode(a);
            int f = nc & 0xfffff;
            if (f == sn.SELECT) {
                selectAtt = atts.getValue(a);
            } else {
                checkUnknownAttribute(nc);
            }
        }
   
        if (selectAtt == null) {
            reportAbsence("select");
        } else {
            select = makeExpression(selectAtt);
        }
    }
   
    public void validate() throws TransformerConfigurationException {
        checkWithinTemplate();
    }

This code is nearly identical to Saxon’s for-each, except that instead of looping while selection.hasMoreElements() is true, it checks once, extracts the first node, sets the context and current node, processes the children, and returns the result to the context:

    public void process(Context context) throws TransformerException
    {
        NodeEnumeration selection = select.enumerate(context, false);
        if (!(selection instanceof LastPositionFinder)) {
            selection = new LookaheadEnumerator(selection);
        }
   
        Context c = context.newContext();
        c.setLastPositionFinder((LastPositionFinder)selection);
        int position = 1;
   
          if (selection.hasMoreElements()) {
              NodeInfo node = selection.nextElement();
              c.setPosition(position++);
              c.setCurrentNode(node);
              c.setContextNode(node);
              processChildren(c);
              context.setReturnValue(c.getReturnValue());
          }
    }
}

The next example extension is not as simple because it extends XSLT’s capabilities rather than creating an alternate implementation for existing functionality.

Because a whole chapter of this book is dedicated to code generation, you can tell that the task interests me. However, although XSLT is near optimal in its XML-manipulation capabilities, generating plain text with it is cumbersome because of XML’s verbosity. Consider a simple C++ code-generation task in native XSLT, starting from the following input:

<classes>
  <class>
    <name>MyClass1</name>
  </class>
   
  <class>
    <name>MyClass2</name>
  </class>
   
  <class>
    <name>MyClass3</name>
    <bases>
      <base>MyClass1</base>
      <base>MyClass2</base>
    </bases>
  </class>
  
</classes>

A stylesheet that transforms this XML into C++ might look like this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
<xsl:output method="text"/>
   
<xsl:template match="class">
class <xsl:value-of select="name"/> <xsl:apply-templates select="bases"/>
{
public:
   
  <xsl:value-of select="name"/>() ;
  ~<xsl:value-of select="name"/>() ;
  <xsl:value-of select="name"/>(const <xsl:value-of select="name"/>&amp; other) ;
  <xsl:value-of select="name"/>&amp; operator =(const <xsl:value-of select="name"/>&amp; other) ;
} ;
</xsl:template>     
   
<xsl:template match="bases">
<xsl:text>: public </xsl:text>
<xsl:for-each select="base">
  <xsl:value-of select="."/>
  <xsl:if test="position() != last()">
    <xsl:text>, public </xsl:text>
  </xsl:if>
</xsl:for-each>
</xsl:template>
   
<xsl:template match="text()"/>
   
</xsl:stylesheet>

This code is tedious to write and difficult to read because the C++ is lost in a rat’s nest of markup.

The extension xslx:templtext addresses this problem by creating an alternate implementation of xsl:text that can contain special escapes indicating special processing. An escape is delimited by surrounding backslashes (\) and comes in two forms. An obvious alternative would use { and } to mimic attribute value templates and XQuery; however, because braces are so common in generated code, I opted for backslashes.

| Escape | Equivalent XSLT |
| --- | --- |
| \expression\ | <xsl:value-of select="expression"/> |
| \expression%delimit\ [3] | <xsl:for-each select="expression"><xsl:value-of select="."/><xsl:if test="position() != last()"><xsl:value-of select="delimit"/></xsl:if></xsl:for-each> |

[3] XSLT 2.0 provides this functionality via <xsl:value-of select="expression" separator="delimit" />.

Given this facility, your code generator would look as follows:

<xsl:stylesheet 
 version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xslx="http://com.ora.xsltckbk.CkBkElementFactory" 
 extension-element-prefixes="xslx">
   
<xsl:output method="text"/>
   
<xsl:template match="class">
<xslx:templtext>
class \name\ <xsl:apply-templates select="bases"/> 
{
public:
   
  \name\() ;
  ~\name\() ;
  \name\(const \name\&amp; other) ;
  \name\&amp; operator =(const \name\&amp; other) ;
} ;
</xslx:templtext>
</xsl:template>     
   
<xsl:template match="bases">
<xslx:templtext>: public \base%', public '\</xslx:templtext>
</xsl:template>
   
<xsl:template match="text()"/>
   
</xsl:stylesheet>

This code is substantially easier to read and write. This facility is applicable to any context where a lot of boilerplate text will be generated. An XSLT purist may frown on such an extension because it introduces a foreign syntax into XSLT that is not subject to simple XML manipulation. This argument is valid; however, from a practical standpoint, many developers would reject XSLT (in favor of Perl) for boilerplate generation simply because it lacks a concise and unobtrusive syntax for getting the job done. So enough hemming and hawing; let’s just code it:

package com.ora.xsltckbk;
import java.util.Vector ;
import java.util.Enumeration ;
import com.icl.saxon.tree.AttributeCollection;
import com.icl.saxon.*;
import com.icl.saxon.expr.*;
import javax.xml.transform.*;
import com.icl.saxon.output.*;
import com.icl.saxon.trace.TraceListener;
import com.icl.saxon.om.NodeInfo;
import com.icl.saxon.om.NodeEnumeration;
import com.icl.saxon.style.StyleElement;
import com.icl.saxon.style.StandardNames;
import com.icl.saxon.tree.AttributeCollection;
import com.icl.saxon.tree.NodeImpl;

Your extension class first declares constants that will be used in a simple state machine that parses the escapes:

public class CkBkTemplText extends com.icl.saxon.style.StyleElement
{
  private static final int SCANNING_STATE = 0 ;
  private static final int FOUND1_STATE   = 1 ;
  private static final int EXPR_STATE     = 2 ;
  private static final int FOUND2_STATE   = 3 ;
  private static final int DELIMIT_STATE  = 4 ;
...

Then define four private classes that implement the mini-language contained within the xslx:templtext element. The base class, CkBkTemplParam, captures literal text that may come before an escape:

  private class CkBkTemplParam
  {
    public CkBkTemplParam(String prefix)
    {
      m_prefix = prefix ;
    }
   
    public void process(Context context) throws TransformerException
    {
      if (!m_prefix.equals(""))
      {
          Outputter out = context.getOutputter();
          out.setEscaping(false);
          out.writeContent(m_prefix);
          out.setEscaping(true);
      }
    }
   
    protected String m_prefix ;
  }

The CkBkValueTemplParam class derives from CkBkTemplParam and implements the behavior of a simple value-of escape, \expr\. To simplify the implementation in this example, disabled output escaping is the norm inside an xslx:templtext element:

  private class CkBkValueTemplParam extends CkBkTemplParam
  {
    public CkBkValueTemplParam(String prefix, Expression value)
    {
      super(prefix) ;
      m_value = value ;
    }
   
    public void process(Context context) throws TransformerException
    {
      super.process(context) ;
      Outputter out = context.getOutputter();
      out.setEscaping(false);
      if (m_value != null)
      {
          m_value.outputStringValue(out, context);
      }
      out.setEscaping(true);
    }
   
    private Expression m_value ;
   
  }

The CkBkListTemplParam class implements the \expr%delimit\ behavior, largely by mimicking the behavior of Saxon’s XSLForEach class:

  private class CkBkListTemplParam extends CkBkTemplParam
  {
    public CkBkListTemplParam(String prefix, Expression list,
                              Expression delimit)
    {
      super(prefix) ;
      m_list = list ;
      m_delimit = delimit ;
    }
   
    public void process(Context context) throws TransformerException
    {
      super.process(context) ;
      if (m_list != null)
      {
        NodeEnumeration m_listEnum = m_list.enumerate(context, false);
   
        Outputter out = context.getOutputter();
        out.setEscaping(false);
        while(m_listEnum.hasMoreElements())
        {
          NodeInfo node = m_listEnum.nextElement();
          if (node != null)
          {
            node.copyStringValue(out);
          }
          if (m_listEnum.hasMoreElements() && m_delimit != null)
          {
            m_delimit.outputStringValue(out, context);
          }
        }
        out.setEscaping(true);
      }
    }
   
    private Expression m_list = null;
    private Expression m_delimit = null ;
  }

The last private class is CkBkStyleTemplParam, and it is used as a holder of elements nested within the xslx:templtext, for example, xsl:apply-templates:

  private class CkBkStyleTemplParam extends CkBkTemplParam
  {
    public CkBkStyleTemplParam(StyleElement snode)
    {
      //A nested element carries no literal prefix
      super("") ;
      m_snode = snode ;
    }
   
    public void process(Context context) throws TransformerException
    {
      if (m_snode.validationError != null)
      {
        fallbackProcessing(m_snode, context);
      }
      else
      {
        try
        {
          context.setStaticContext(m_snode.staticContext);
          m_snode.process(context);
        }
        catch (TransformerException err)
        {
          throw m_snode.styleError(err);
        }
      }
    }
   
    private StyleElement m_snode ;
  }

The next three methods are standard. If you wanted the standard disable-output-escaping attribute to control output escaping, you would capture its value in prepareAttributes(). The Saxon XSLText.java source provides the necessary code:

  public boolean isInstruction()
  {
      return true;
  }
   
  public boolean mayContainTemplateBody()
  {
    return true;
  }
   
  public void prepareAttributes() throws TransformerConfigurationException
  {
    StandardNames sn = getStandardNames();
     AttributeCollection atts = getAttributeList();
     for (int a=0; a<atts.getLength(); a++)
    {
       int nc = atts.getNameCode(a);
      checkUnknownAttribute(nc);
    }  
   }

The validate stage is an opportunity to parse the contents of the xslx:templtext element, looking for escapes. You send every text node to a parser function. Nested stylesheet elements are converted into instances of CkBkStyleTemplParam. The member m_TemplParms is a vector where the results of parsing are stored:

  public void validate() throws TransformerConfigurationException
  {
      checkWithinTemplate();
      m_TemplParms = new Vector() ;
   
      NodeImpl node = (NodeImpl)getFirstChild();
      String value ;
      while (node!=null)
      {
        if (node.getNodeType() == NodeInfo.TEXT)
        {
          parseTemplText(node.getStringValue()) ;
        }
        else
        if (node instanceof StyleElement)
        {
           StyleElement snode = (StyleElement) node;
          m_TemplParms.addElement(new CkBkStyleTemplParam(snode)) ;
        }
        node = (NodeImpl)node.getNextSibling();
      }
  }

The process method loops over m_TemplParms and calls each implementation’s process method:

  public void process(Context context) throws TransformerException
  {
    Enumeration iter = m_TemplParms.elements() ;
    while (iter.hasMoreElements())
    {
       CkBkTemplParam param = (CkBkTemplParam) iter.nextElement() ;
       param.process(context) ;
    }
  }

The following private functions implement a simple state-machine-driven parser. It would be easier to implement with a regular-expression engine (java.util.regex has been available since Java 1.4). The parser handles two consecutive backslashes (\\) as a request for a literal backslash. Likewise, %% is translated into a single %:

  private void parseTemplText(String value)
  {
      //This state machine parses the text looking for parameters
      int ii = 0 ;
      int len = value.length() ;
   
      int state = SCANNING_STATE ;
      StringBuffer temp = new StringBuffer("") ;
      StringBuffer expr = new StringBuffer("") ;
      while(ii < len)
      {
        char c = value.charAt(ii++) ;
        switch (state)
        {
          case SCANNING_STATE:
          {
            if (c == '\\')
            {
              state = FOUND1_STATE ;
            }
            else
            {
              temp.append(c) ;
            }
          }
          break ;
   
          case FOUND1_STATE:
          {
            if (c == '\\')
            {
              temp.append(c) ;
              state = SCANNING_STATE ;
            }
            else
            {
              expr.append(c) ;
              state = EXPR_STATE ;
            }
          }
          break ;
   
          case EXPR_STATE:
          {
            if (c == '\\')
            {
              state = FOUND2_STATE ;
            }
            else
            {
              expr.append(c) ;
            }
          }
          break ;
   
          case FOUND2_STATE:
          {
            if (c == '\\')
            {
              state = EXPR_STATE ;
              expr.append(c) ;
            }
            else
            {
              processParam(temp, expr) ;
              state = SCANNING_STATE ;
              temp = new StringBuffer("") ;
              temp.append(c) ;
              expr = new StringBuffer("") ;
            }
          }
          break ;
        }
      }
      if (state == FOUND1_STATE || state == EXPR_STATE)
      {
          compileError("xslx:templtext dangling \\");
      }
      else
      if (state == FOUND2_STATE)
      {
        processParam(temp, expr) ;
      }
      else
      {
        processParam(temp, new StringBuffer("")) ;
      }
  }
   
  private void processParam(StringBuffer prefix, StringBuffer expr)
  {
    if (expr.length() == 0)
    {
      m_TemplParms.addElement(new CkBkTemplParam(new String(prefix))) ;
    }
    else
    {
      processParamExpr(prefix, expr) ;
    }
  }
   
  private void processParamExpr(StringBuffer prefix, StringBuffer expr)
  {
      int ii = 0 ;
      int len = expr.length() ;
   
      int state = SCANNING_STATE ;
      StringBuffer list = new StringBuffer("") ;
      StringBuffer delimit = new StringBuffer("") ;
      while(ii < len)
      {
        char c = expr.charAt(ii++) ;
        switch (state)
        {
          case SCANNING_STATE:
          {
            if (c == '%')
            {
              state = FOUND1_STATE ;
            }
            else
            {
              list.append(c) ;
            }
          }
          break ;
   
          case FOUND1_STATE:
          {
            if (c == '%')
            {
              list.append(c) ;
              state = SCANNING_STATE ;
            }
            else
            {
              delimit.append(c) ;
              state = DELIMIT_STATE ;
            }
          }
          break ;
   
          case DELIMIT_STATE:
          {
            if (c == '%')
            {
              state = FOUND2_STATE ;
            }
            else
            {
              delimit.append(c) ;
            }
          }
          break ;
        }
      }
      try
      {
        if (state == FOUND1_STATE)
        {
            compileError("xslx:templtext trailing %");
        }
        else
        if (state == FOUND2_STATE)
        {
            compileError("xslx:templtext extra %");
        }
        else
        if (state == SCANNING_STATE)
        {
          String prefixStr = new String(prefix) ;
          Expression value = makeExpression(new String(list)) ;
          m_TemplParms.addElement(
                 new CkBkValueTemplParam(prefixStr, value)) ;
        }
        else
        {
          String prefixStr = new String(prefix) ;
          Expression listExpr = makeExpression(new String(list)) ;
          Expression delimitExpr = makeExpression(new String(delimit)) ;
          m_TemplParms.addElement(
            new CkBkListTemplParam(prefixStr, listExpr, delimitExpr)) ;
        }
      }
      catch(Exception e)
      {
      }
  }
  //A vector of CkBkTemplParam objects parsed from the text
  private Vector m_TemplParms = null;
 }
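
The regular-expression alternative mentioned before the parser might look roughly like the following sketch. It is not part of CkBkTemplText: the class name TemplTextScanner and its string-based tokens are invented for illustration, and unlike the state machine it quietly treats an unterminated escape as literal text instead of reporting a compile error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based scanner for the templtext mini-language.
final class TemplTextScanner {

    // Either a doubled backslash (a literal backslash) or \...\ ,
    // where doubled backslashes may appear inside the escape.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\\\\\|\\\\((?:[^\\\\]|\\\\\\\\)+)\\\\");

    // The list part (with %% standing for a literal %),
    // optionally followed by a single % and a delimiter expression.
    private static final Pattern LIST =
        Pattern.compile("((?:[^%]|%%)*)(?:%(.*))?", Pattern.DOTALL);

    // Returns tokens of the form "text:...", "value:expr" or "list:expr|delimit".
    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder literal = new StringBuilder();
        Matcher m = ESCAPE.matcher(text);
        int last = 0;
        while (m.find()) {
            literal.append(text, last, m.start());
            last = m.end();
            if (m.group(1) == null) {
                // Doubled backslash: keep a single literal backslash.
                literal.append('\\');
                continue;
            }
            if (literal.length() > 0) {
                tokens.add("text:" + literal);
                literal.setLength(0);
            }
            String expr = m.group(1).replace("\\\\", "\\");
            Matcher parts = LIST.matcher(expr);
            parts.matches(); // the pattern accepts any string
            String list = parts.group(1).replace("%%", "%");
            String delimit = parts.group(2);
            tokens.add(delimit == null
                ? "value:" + list
                : "list:" + list + "|" + delimit);
        }
        literal.append(text, last, text.length());
        if (literal.length() > 0) {
            tokens.add("text:" + literal);
        }
        return tokens;
    }
}
```

In a real extension, each token would become a CkBkTemplParam, CkBkValueTemplParam, or CkBkListTemplParam, with the expression strings passed through makeExpression().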

You can make some useful enhancements to the functionality of xslx:templtext. For example, you could expand the functionality of the list escape to multiple lists (e.g., \expr1%delim1%expr2%delim2\). This enhancement would roughly translate into the following XSLT equivalent:

<xsl:for-each select="expr1">
  <xsl:variable name="pos" select="position()"/>
  <xsl:value-of select="."/>
  <xsl:if test="$pos != last()">
    <xsl:value-of select="delim1"/>
  </xsl:if>
  <xsl:value-of select="expr2[$pos]"/>
  <xsl:if test="$pos != last()">
    <xsl:value-of select="delim2"/>
  </xsl:if>
</xsl:for-each>

This facility would be useful when pairs of lists need to be sequenced into text. For example, consider a C++ function’s parameters, which consist of name and type pairs. The XSLT code is only a rough specification of semantics because it assumes that the node sets specified by expr1 and expr2 have the same number of elements. I believe that an actual implementation would continue to expand the lists as long as any set still had nodes, suppressing delimiters for those that did not. Better yet, the behavior could be controlled by attributes of xslx:templtext.
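
One possible reading of that more forgiving behavior is sketched below on plain string lists. The class name PairedListJoiner is hypothetical, and the delimiter rules are an assumption rather than a specification: delim1 is written only when both lists still have an item at the current position, and delim2 is written only when another position follows.

```java
import java.util.List;

// Hypothetical sketch of joining two lists pairwise with two delimiters.
final class PairedListJoiner {

    static String join(List<String> first, String delim1,
                       List<String> second, String delim2) {
        StringBuilder out = new StringBuilder();
        // Keep going as long as either list still has items.
        int n = Math.max(first.size(), second.size());
        for (int i = 0; i < n; i++) {
            boolean hasFirst = i < first.size();
            boolean hasSecond = i < second.size();
            if (hasFirst) {
                out.append(first.get(i));
            }
            // Suppress the inner delimiter when one side is exhausted.
            if (hasFirst && hasSecond) {
                out.append(delim1);
            }
            if (hasSecond) {
                out.append(second.get(i));
            }
            // Separate this position from the next one, if any.
            if (i < n - 1) {
                out.append(delim2);
            }
        }
        return out.toString();
    }
}
```

With types int and double, names count and ratio, a space as delim1, and a comma plus space as delim2, the result is a C++ parameter list such as int count, double ratio.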

Discussion

Space does not permit full implementations of these extension elements in Xalan. However, based on the information provided in the introduction, the path should be relatively clear.

See Also

Developers interested in extending Saxon should read Michael Kay’s article on Saxon design (http://www-106.ibm.com/developerworks/library/x-xslt2).

14.16. Using XSLT from Perl

Problem

You have a problem that is more appropriately solved in Perl, but would be easier with a pinch of XSLT.

Solution

There are several choices for embedding XSLT in Perl. XML::LibXML and XML::LibXSLT are Perl front ends to the GNOME project’s libxml2 XML parser and libxslt XSLT processor. The following example, borrowed from Erik T. Ray and Jason McIntosh’s Perl and XML (O’Reilly, 2002), shows a Perl program that batch-processes several XML files with a single XSLT script, compiled once:

use XML::LibXSLT;
use XML::LibXML;
   
# the arguments for this command are stylesheet and source files
my( $style_file, @source_files ) = @ARGV;
   
# initialize the parser and XSLT processor
my $parser = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();
my $stylesheet = $xslt->parse_stylesheet_file( $style_file );
   
# for each source file: parse, transform, print out result
foreach my $file ( @source_files ) {
  my $source_doc = $parser->parse_file( $file );
  my $result = $stylesheet->transform( $source_doc );
  print $stylesheet->output_string( $result );
}

Parameters to the stylesheet can be passed in as a Perl hash, as shown in the following code:

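# Code similar to the previous example has been elided.
   
# Parameter values are evaluated as XPath expressions,
# so a string literal needs its own quotes.
my %params = (
    param1 => 10,
    param2 => "'foo'",
);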
   
foreach my $file ( @source_files ) {
  my $source_doc = $parser->parse_file( $file );
  my $result = $stylesheet->transform($source_doc, %params);
  print $stylesheet->output_string( $result );
}

Passing parameters from Perl to the stylesheet would enable, among other things, a Perl-based CGI program that received input from an HTML form and queried an XML database using XSLT. See Recipe 13.5, where we cheated by forking the XSLT processor rather than embedding it.

XML::Xalan is another Perl XSLT module that allows Perl to invoke Xalan’s processor. Edwin Pratomo, the author of this module, still considers it alpha-level software.

Using XML::Xalan through external files is your simplest option:

use XML::Xalan;
   
  #Construct the transformer
  my $tr = new XML::Xalan::Transformer;
   
  #Compile the stylesheet
  my $compiled = $tr->compile_stylesheet_file("my.xsl");
  
  #Parse the input source document
  my $parsed = $tr->parse_file("my.xml");
   
  my $dest_file = "myresult.xml" ;
   
  #Execute the transformation saving the result
  $tr->transform_to_file($parsed, $compiled, $dest_file)
    or die $tr->errstr;

A more useful mode of usage returns the result into a variable for further processing:

my $res = $tr->transform_to_data($parsed, $compiled);

You do not need to preparse the input or precompile the stylesheet, since either can be passed as files or literal strings:

my $res = $tr->transform_to_data($src_file, $xsl_file);

This returns the literal result as a string, so this usage probably makes most sense when the output format is text that you want to post-process in Perl.

Alternatively, you can receive the results in an event-driven manner:

#Create a handler sub
my $out_handler = sub {
     my ($ctx, $mesg) = @_;
     print $ctx $mesg;
 };
#Invoke the transformation using the handler
$tr->transform_to_handler(
     $xmlfile, $xslfile, 
     *STDERR, $out_handler);

Discussion

Many Perl developers have not fully embraced XSLT because once you master Perl, it is difficult to do something in anything but the Perl way. To be fair, most Perl developers realize that other languages have their place, and XSLT certainly can simplify a complex XML transformation even if most of the overall program remains purely Perl.

See Also

Other Perl XSLT solutions include T. J. Mather’s XML::GNOME::XSLT, which is a Perl front end to libxslt, a C-based XSLT processor from GNOME. You can also use the native Perl XSLT implementation XML::XSLT by Jonathan Stowe. Currently, it does not implement many of XSLT 1.0’s more advanced features, including xsl:sort, xsl:key, and xsl:import, and it has only partial support in several other areas. A third option is Pavel Hlavnicka’s XML::Sablotron, which is a Perl front end to the Ginger Alliance’s C++-based XSLT offering. Information on these modules can be found at http://www.cpan.org.

Another solution that mixes Perl with XSLT is Apache’s AxKit (http://axkit.org), an XML Application server for Apache. AxKit uses a pipelining processing model that allows processing of content in stages. It uses the Sablotron processor for XSLT functionality.

14.17. Using XSLT from Java

Problem

You want to invoke XSLT processing from within a Java application.

Solution

You can invoke XSLT functionality from Java in three basic ways:

  • Using the native interface of your favorite Java-based XSLT implementation

  • Using the more portable TrAX API

  • Using JAXP 1.2 or 1.3 (a superset of TrAX; see http://java.sun.com/xml/jaxp/index.jsp)

If you are familiar with the internals of a specific Java-based XSLT implementation, you might be tempted to use its API directly. However, this solution is not desirable, since your code will not be portable.

An alternative is Transformation API for XML (TrAX), an initiative initially sponsored by Apache.org (http://xml.apache.org/xalan-j/trax.html). The philosophy behind TrAX is best explained by quoting the TrAX site:

The Java community will greatly benefit from a common API that will allow them to understand and apply a single model, write to consistent interfaces, and apply the transformations polymorphically. TrAX attempts to define a model that is clean and generic, yet fills general application requirements across a wide variety of uses.

TrAX was subsumed into Java’s JAXP 1.1 (and more recently 1.2 and 1.3 for J2SE 5.0) specification, so there are now only two ways to interface Java to XSLT: portably and nonportably. However, the choice is not simply a question of right and wrong. Each processor implementation has special features that are sometimes needed, and if portability is not a concern, you can take advantage of a particular facility that you require. Nevertheless, this section covers only the portable JAXP 1.2 API.

You can implement a simple XSLT command-line processor in terms of JAXP 1.1, as shown in an example borrowed from Eric M. Burke’s Java and XSLT (O’Reilly, 2001):

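import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
   
public class Transform
{
   
  public static void main(String[] args) throws Exception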
  {
    if (args.length != 2)
    {
      System.err.println(
        "Usage: java Transform [xmlfile] [xsltfile]");
      System.exit(1);
    }
   
    //Open the source and style sheet files
    File xmlFile = new File(args[0]);
    File xsltFile = new File(args[1]);
   
    //JAXP uses a Source interface to read data
    Source xmlSource = new StreamSource(xmlFile);
    Source xsltSource = new StreamSource(xsltFile);
   
    //Factory classes allow the specific XSLT processor
    //to be hidden from the application by returning a
    //standard Transformer interface
    TransformerFactory transFact =
      TransformerFactory.newInstance();
    Transformer trans = transFact.newTransformer(xsltSource);
   
    //Applies the stylesheet to the source document
    trans.transform(xmlSource, new StreamResult(System.out));
 }
}

In addition to a StreamResult, a DOMResult can capture the result as a DOM tree for further processing, or a SAXResult can be specified to receive the results in an event-driven manner.

In the case of DOM, the user can obtain the result as a DOM Document, DocumentFragment or Element, depending on the type of node passed in the DOMResult constructor.
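
A minimal sketch of the DOM case follows. The class name DomResultDemo and the inline stylesheet are made up for the example; the JAXP calls themselves are standard. Because no node is passed to the DOMResult constructor here, the transformer creates a Document to hold the result.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical demo: capture a transformation result as a DOM tree.
final class DomResultDemo {

    // Illustrative stylesheet: wraps the text of <greeting> in <message>.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/greeting'>"
      + "<message><xsl:value-of select='.'/></message>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static Document transform(String xml) {
        try {
            Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            // An empty DOMResult lets the transformer create the Document.
            DOMResult result = new DOMResult();
            trans.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The returned Document can then be walked or modified with the usual org.w3c.dom calls before being serialized or handed to another transformation.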

In the case of SAXResult, a user-specified ContentHandler is passed to the SAXResult constructor and is the object that actually receives the SAX events. Recall that a SAX content handler receives callbacks for events such as startDocument(), startElement(), characters(), endElement(), and endDocument(). See http://www.saxproject.org/ for more information on SAX.
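
The event-driven case can be sketched in the same way. The class SaxResultDemo and its method are hypothetical; the handler simply records the qualified name of every element the transformation emits, using DefaultHandler so that only the callback of interest needs to be overridden.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: receive a transformation result as SAX events.
final class SaxResultDemo {

    // Returns the names of the result elements in document order.
    static List<String> elementNames(String xslt, String xml) {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        try {
            Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            // The handler is called back as the result is produced.
            trans.transform(new StreamSource(new StringReader(xml)),
                            new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

Since no result tree is built, this style suits large outputs or cases where the application only needs to extract a few facts from the result.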

Discussion

The beauty of accessing XSLT transformation capabilities from Java is not that you can write your own XSLT processor front end, as you did in the solution section, but that you can extend the already formidable capabilities of Java to include XSLT’s transformational abilities.

Consider a server process written in Java that must deal with constantly changing XML files stored in an XML database or XML arriving in the form of SOAP messages. Perhaps this server needs to support multiple versions of document schema or multiple SOAP clients for backward compatibility. Thus the server must handle several schemas transparently. If data in an older schema can be transformed to newer ones, then the server code will be that much simpler.

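The nice thing about using XSLT via the JAXP interface is that a compiled stylesheet can be reused, so you need to parse the stylesheet only once, when the server loads. Compile it into a javax.xml.transform.Templates object, which is safe to share among threads. A Transformer instance, however, must not be used by more than one thread at a time, so if your server is multithreaded and each thread must handle transformations, each thread needs its own Transformer, obtained from the shared Templates object.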
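
A minimal sketch of that arrangement follows, assuming the stylesheet is available as a string; the class name SharedStylesheet is invented for the example. The stylesheet is compiled once in the constructor, and every call to apply() creates a fresh Transformer, so the method can be called from several threads at once.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper: compile a stylesheet once, apply it many times.
final class SharedStylesheet {

    // Templates is the compiled, shareable form of the stylesheet.
    private final Templates templates;

    SharedStylesheet(String xslt) {
        try {
            templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Creates a new Transformer per call, so calls do not share state.
    String apply(String xml) {
        StringWriter out = new StringWriter();
        try {
            templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Creating a Transformer from a Templates object is cheap compared with compiling the stylesheet, which is the design reason for splitting the two.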

See Also

Eric M. Burke’s Java and XSLT (O’Reilly, 2001) contains extensive coverage of Java and XSLT integration, especially via JAXP 1.1. It includes several complete application examples, such as Discussion Forum and Wireless Markup Language (WML) applications.



[1] Shame on me for suggesting such a thing! Seriously, though, when you leave the confines of XSLT, you need not play by its rules, but you must accept the consequences.

[2] In fact, before writing this book, I could count the number of lines of JavaScript I had written on two hands and a few toes.
