Arithmetic is being able to count up to 20 without taking off your shoes.
Mickey Mouse
Chapter 2 lamented the absence of sophisticated string-processing facilities in native XSLT 1.0. By comparison, XSLT’s handling of numerical computation is truly “Mickey Mouse”! XSLT 1.0 gives you facilities for basic arithmetic, counting, summing, and formatting numbers, but the remaining mathematics is up to your sheer wit. Fortunately, as with strings, XSLT’s recursive powers permit reasonable mathematical feats with reasonable effort.
In XPath/XSLT 2.0, some of the more obvious math functions have been
added including abs()
, avg()
,
max()
, min()
,
round-to-half-even()
. Further, there are now more
numerical types (xs:integer
,
xs:double
, and the other numeric types defined in
XML Schema datatypes) where XSLT 1.0 only had a single floating point
numeric type. Thankfully, XPath 2.0 now recognizes scientific
notation, whose absence was often a major inconvenience in 1.0.
However, whether you are using 1.0 or 2.0, do not expect to find matrix multiplication or Fast-Fourier transform recipes in this section. If you really need to perform sophisticated math on XML-encoded data, then XSLT is not the language for you. Instead, bring the data into a more mathematically inclined language using an XSLT front-end converter or native SAX or DOM interface. Nevertheless, a web page called “Gallery of Stupid XSL and XSLT Tricks” (http://www.incrementaldevelopment.com/xsltrick/) contains some interesting mathematical XSLT curiosities such as computing primes and differentiating polynomials. These tricks can be instructive because they might extend your understanding of XSLT. Instead, this chapter concentrates on recipes that demonstrate commonly used mathematics that can be implemented economically within the confines of XSLT.
Some of this section’s early examples read more like tutorials on how to use native functionality in XSLT. I include these examples because they represent XSLT facilities that are sometimes misunderstood.
Many of the recipes shown here are implementations of
EXSLT’s math definitions. When a pure XSLT
implementation is available at EXSLT.org, we will discuss that first
and then consider alternative solutions.[1] Pure XSLT
implementations are provided for all extensions defined in EXSLT math
except for trigonometric functions (sin
,
cos
, etc.). If you desperately need a pure XSLT
implementation of trigonometric functions, then Recipe 3.5 will point you in one general direction.
Many discussion sections in this chapter will explore alternative implementations of the solution. Readers uninterested in technical details are encouraged simply to use the example shown in the solution section since it will be the best solution or as good as the alternatives.
This problem has two general solutions.
The top-level element xsl:decimal-format
establishes
a named formatting rule that can be referenced by the
format-number()
function
whenever that format rule
is required. xsl:decimal-format
has a rich set of
attributes that describe the formatting rules. Table 3-1 explains each attribute and shows its default
value in parentheses.
Attribute |
Purpose |
Name |
An optional name for the rule. If absent, this rule becomes the default rule. There can be only one default, and all names must be unique (even when there is a difference in import precedence). |
decimal-separator (.) |
The character used to separate the whole and fractional parts of a number. |
grouping-separator (,) |
The character used to separate groups of digits. |
infinity (Infinity) |
The string that represents infinity. |
minus-sign (-) |
The character used as a minus sign. |
NaN (NaN) |
The string representing the numerical value “not a number.” |
percent (%) |
The percent sign. |
per-mille (‰) |
The per-mille sign. |
zero-digit (0) |
The character used in a formatting pattern to indicate where a leading or trailing zero should be placed. Setting this character resets the origin of the numbering system. See Example 3-1. |
digit (#) |
The character used in a formatting pattern to indicate where a digit will appear, provided it is significant. |
pattern-separator (;) |
The character used in a formatting pattern to separate the positive subpattern from the negative subpattern. |
The format-number()
function takes the arguments
shown in Table 3-2.
The most common use of
xsl:number
is to number nodes sequentially. However, it can also format numbers.
When used to perform the later, the relevant attributes are shown in
Table 3-3.
Name |
Purpose |
Value |
The number to be formatted. |
Format |
A format string (described in the discussion). |
Lang |
A language code as defined by the |
letter-value |
Must be either |
grouping-separator |
A single character used to separate groups. For example, in the U.S., the comma (,) is standard. |
grouping-size |
The number of digits in each group. |
Table 3-4 shows how formatting tokens are used with a format attribute.
Format token |
Example value |
Resulting output |
1 |
1 |
1 |
1 |
99 |
99 |
01 |
1 |
01 |
001 |
1 |
001 |
a |
1 |
A |
a |
10 |
J |
a |
27 |
Aa |
A |
1 |
A |
A |
27 |
AA |
i |
1 |
I |
i |
10 |
X |
I |
1 |
I |
I |
11 |
XI |
The format string is an alternating sequence of format and punctuation tokens. Using multiple format tokens makes sense only when the value contains a set of numbers.
Given the formatting machinery defined earlier, almost any numeric-formatting task can be handled.
Here we can take advantage of leading and trailing zero padding and then map the leading zeros to spaces and use a trailing minus sign. This solution gives a nice columnar, right-justified output when the final display medium uses a fixed-width font. Example 3-1 through Example 3-3 show more conventional ways of padding. The examples illustrate the behavior of the 0 digit when used as a format character.
<numbers> <number>10</number> <number>3.5</number> <number>4.44</number> <number>77.7777</number> <number>-8</number> <number>1</number> <number>444</number> <number>1.1234</number> <number>7.77</number> <number>3.1415927</number> <number>10</number> <number>9</number> <number>8</number> <number>7</number> <number>666</number> <number>5555</number> <number>-4444444</number> <number>22.33</number> <number>18</number> <number>36.54</number> <number>43</number> <number>99999</number> <number>999999</number> <number>9999999</number> <number>32</number> <number>64</number> <number>-64.0001</number> </numbers>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="text" /> <xsl:variable name="numCols" select="3"/> <xsl:template match="numbers"> <xsl:for-each select="number[position() mod $numCols = 1]"> <xsl:apply-templates select=". | following-sibling::number[position() < $numCols]" mode="format"/> <xsl:text>
</xsl:text> </xsl:for-each> </xsl:template> <xsl:template match="number" name="format" mode="format"> <xsl:param name="number" select="." /> <xsl:call-template name="leading-zero-to-space"> <xsl:with-param name="input" select="format-number($number, '0000000.0000 ;0000000.0000- ')"/> </xsl:call-template> </xsl:template> <xsl:template name="leading-zero-to-space"> <xsl:param name="input"/> <xsl:choose> <xsl:when test="starts-with($input,'0')"> <xsl:value-of select="' '"/> <xsl:call-template name="leading-zero-to-space"> <xsl:with-param name="input" select="substring-after($input,'0')"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$input"/> </xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet>
Example 3-4 gives a variation of the previous format template that will make your accountant happy.
<xsl:template match="number" name="format" mode="format">
<xsl:param name="number" select="." />
<xsl:text> $ </xsl:text>
<xsl:call-template name="leading-zero-to-space">
<xsl:with-param name="input"
select="format-number($number,
' 0000000.00 ;(0000000.00)')"/>
</xsl:call-template>
</xsl:template>
Output:
$ 10.00 $ 3.50 $ 4.44
$ 77.78 $ (8.00) $ 1.00
$ 444.00 $ 1.12 $ 7.77
$ 3.14 $ 10.00 $ 9.00
$ 8.00 $ 7.00 $ 666.00
$ 5555.00 $ (4444444.00) $ 22.33
$ 18.00 $ 36.54 $ 43.00
$ 99999.00 $ 999999.00 $ 9999999.00
$ 32.00 $ 64.00 $ (64.00)
Example 3-5 demonstrates the use of a named format.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://www.ora.com/XSLTCookbook/namespaces/strings">
<xsl:output method="text" />
<!-- From Recipe 2.5 ... -->
<xsl:include href="../strings/str.dup.xslt"/>
<xsl:variable name="numCols" select="3"/>
<xsl:decimal-format name="WesternEurope"
decimal-separator="," grouping-separator="."/>
<xsl:template match="numbers">
<xsl:for-each select="number[position() mod $numCols = 1]">
<xsl:apply-templates
select=". | following-sibling::number[position() < $numCols]"
mode="format"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
<xsl:template match="number" name="format" mode="format">
<xsl:param name="number" select="." />
<xsl:call-template name="pad">
<xsl:with-param name="string"
select="format-number($number,'#.###,00','WesternEurope')"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="pad">
<xsl:param name="string"/>
<xsl:param name="width" select="16"/>
<xsl:call-template name="str:dup">
<xsl:with-param name="input" select="' '"/>
<xsl:with-param name="count" select="$width - string-length($string)"/>
</xsl:call-template>
<xsl:value-of select="$string"/>
</xsl:template>
</xsl:stylesheet>
Output:
10,00 3,50 4,44
77,78 -8,00 1,00
444,00 1,12 7,77
3,14 10,00 9,00
8,00 7,00 666,00
5.555,00 -4.444.444,00 22,33
18,00 36,54 43,00
99.999,00 999.999,00 9.999.999,00
32,00 64,00 -64,00
Example 3-6 uses xsl:number
as a Roman
numeral formatter to label columns as rows:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://www.ora.com/XSLTCookbook/namespaces/strings">
<xsl:output method="text" />
<xsl:include href="../strings/str.dup.xslt"/>
<xsl:variable name="numCols" select="3"/>
<xsl:template match="numbers">
<xsl:for-each select="number[position() <= $numCols]">
<xsl:text> </xsl:text>
<xsl:number value="position()" format="I"/><xsl:text> </xsl:text>
</xsl:for-each>
<xsl:text>
 </xsl:text>
<xsl:for-each select="number[position() <= $numCols]">
<xsl:text>---------------- </xsl:text>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:for-each select="number[position() mod $numCols = 1]">
<xsl:call-template name="pad">
<xsl:with-param name="string">
<xsl:number value="position()" format="i"/>
</xsl:with-param>
<xsl:with-param name="width" select="4"/>
</xsl:call-template>|<xsl:text/> <!-- See recipe 7.1-->
<xsl:apply-templates
select=". | following-sibling::number[position() < $numCols]"
mode="format"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
<xsl:template match="number" name="format" mode="format">
<xsl:param name="number" select="." />
<xsl:call-template name="pad">
<xsl:with-param name="string" select="format-number(.,'#,###.00')"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="pad">
<xsl:param name="string"/>
<xsl:param name="width" select="16"/>
<xsl:call-template name="str:dup">
<xsl:with-param name="input" select="' '"/>
<xsl:with-param name="count" select="$width - string-length($string)"/>
</xsl:call-template>
<xsl:value-of select="$string"/>
</xsl:template>
</xsl:stylesheet>
Output:
I II III
---------------- -------------- --------------
i| 10.00 3.50 4.44
ii| 77.78 -8.00 1.00
iii| 444.00 1.12 7.77
iv| 3.14 10.00 9.00
v| 8.00 7.00 666.00
vi| 5,555.00 -4,444,444.00 22.33
vii| 18.00 36.54 43.00
viii| 99,999.00 999,999.00 9,999,999.00
ix| 32.00 64.00 -64.00
Spreadsheets number columns in the
alpha sequence A, B, C ... ZZ, and we can do the same using
xsl:number
(see Example 3-7).
<xsl:template match="numbers">
<xsl:for-each select="number[position() <= $numCols]">
<xsl:text> </xsl:text>
<xsl:number value="position()" format="A"/><xsl:text> </xsl:text>
<xsl:text> </xsl:text>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:for-each select="number[position() <= $numCols]">
<xsl:text> ---------------- </xsl:text>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:for-each select="number[position() mod $numCols = 1]">
<xsl:value-of select="position()"/><xsl:text>|</xsl:text>
<xsl:apply-templates
select=". | following-sibling::number[position() < $numCols]"
mode="format"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
<xsl:template match="number" name="format" mode="format">
<xsl:param name="number" select="." />
<xsl:call-template name="pad">
<xsl:with-param name="string" select="format-number(.,'#,###.00')"/>
</xsl:call-template>
<xsl:text> </xsl:text>
</xsl:template>
Output:
A B C
------------ ------------ ------------
1| 10.0000 3.5000 4.4400
2| 77.7777 8.0000- 1.0000
3| 444.0000 1.1234 7.7700
4| 3.1416 10.0000 9.0000
5| 8.0000 7.0000 666.0000
6| 5555.0000 4444444.0000- 22.3300
7| 18.0000 36.5400 43.0000
8| 99999.0000 999999.0000 9999999.0000
9| 32.0000 64.0000 64.0001-
Numeric characters from other
languages can be used by setting the zero digit in
xsl:decimal-format
to the zero of that language.
This example uses the Unicode character 0x660 (Arabic-Indic Digit
Zero):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:str="http://www.ora.com/XSLTCookbook/namespaces/strings"> <xsl:output method="text" encoding="UTF-8"/> <xsl:include href="../strings/str.dup.xslt"/> <!-- This states that zero starts at character 0x660, which implies one is 0x661, etc. --> <xsl:decimal-format name="Arabic" zero-digit="٠"/> <xsl:template match="numbers"> <xsl:for-each select="number"> <xsl:call-template name="pad"> <xsl:with-param name="string" select="format-number(.,'#,###.00')"/> </xsl:call-template> = <xsl:text/> <xsl:value-of select="format-number(.,'#,###.٠٠','Arabic')"/> <xsl:text>
</xsl:text> </xsl:for-each> </xsl:template> <xsl:template name="pad"> <xsl:param name="string"/> <xsl:param name="width" select="16"/> <xsl:value-of select="$string"/> <xsl:call-template name="str:dup"> <xsl:with-param name="input" select="' '"/> <xsl:with-param name="count" select="$width - string-length($string)"/> </xsl:call-template> </xsl:template> </xsl:stylesheet>
Here is the output of this code:
You want to
round to a specific
number of decimal places; however, XSLT’s
round
, ceiling
, and
floor
functions always map numbers to integer
values.
Multiply, round, and divide using a power of ten that determines how
many decimal digits are required. Assuming $pi
=
3.1415926535897932:
<xsl:value-of select="round($pi * 10000) div 10000"/>
results in 3.1416. Similarily:
<xsl:value-of select="ceiling($pi * 10000) div 10000"/>
results in 3.1416, and:
<xsl:value-of select="floor($pi * 10000) div 10000"/>
results in 3.1415.
Rounding to a specific number of decimal places is also achieved
using
format-number()
:
<xsl:value-of select="format-number($pi,'#.####')"/>
This results in 3.1416. This will work even if more than one
significant digit is in the whole part because
format-number
never uses a format specification as
an indication to remove significant digits from the whole part:
<xsl:value-of select="format-number($pi * 100,'#.####')"/>
This results in 314.1593.
You can use format-number
to get the effect of
truncating rather than rounding by using one more formatting digit
than required and then chopping off the last character:
<xsl:variable name="pi-to-5-sig" select="format-number($pi,'#.#####')"/> <xsl:value-of select="substring($pi-to-5-sig,1,string-length($pi-to-5-sig) -1)"/>
This results in 3.1415.
The new XPath 2.0 round-half-to-even()
function
will do the trick for most applications. The half to
even rule means that when the value being rounded is smack
between the lower and higher values, then rounding will be in the
direction that produces an even result. So
round-half-to-even(1.115,2) eq 1.12
and
round-half-to-even(1.125,2) eq 1.12
also! In the
first case, we round up because 2 is even, and the second rounds down
because 3 is not. The theory behind this is that if you have a bunch
of numbers, you should roughly round up as often as you round down so
rounding errors cancel out. As you probably guessed, the second
argument is the number of decimal places to round to.
If your application mandates that 5 in the last decimal place always rounds upward, you can use the techniques employed in the XSLT 1.0 recipe.
The multiply, round, and divide technique works well as long as the numbers involved remain within the representational limits of IEEE floating point. If you try to capture too many places after the decimal, then the rules of IEEE floating point will interfere with the expected result. For example, trying to pick up 16 decimal digits of pi will give you only 15:
<xsl:value-of select="round($pi * 10000000000000000) div 10000000000000000"/>
This results in 3.141592653589793, not 3.1415926535897932.
An alternative technique manipulates the number as a string and truncates:
<xsl:value-of select="concat(substring-before($pi,'.'), '.', substring(substring-after($pi,'.'),1,4))"/>
and results in 3.1415.
The effect of ceiling
or round
can be obtained by this technique at the cost of additional
complexity:
<xsl:variable name="whole" select="substring-before($pi,'.')"/> <xsl:variable name="frac" select="substring-after($pi,'.')"/> <xsl:value-of select="concat($whole, '.', substring($frac,1,3), round(substring($frac,4,2) div 10))"/>
This results in 3.1416.
Roman numbers do not use a place value system; instead, the number is composed by adding or subtracting the fixed value of the specified Roman numeral characters. If the following character has a lower or equal value, you add; otherwise, you subtract:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:math="http://www.ora.com/XSLTCookbook/math"> <ckbk:romans> <ckbk:roman value="1">i</ckbk:roman> <ckbk:roman value="1">I</ckbk:roman> <ckbk:roman value="5">v</ckbk:roman> <ckbk:roman value="5">V</ckbk:roman> <ckbk:roman value="10">x</ckbk:roman> <ckbk:roman value="10">X</ckbk:roman> <ckbk:roman value="50">l</ckbk:roman> <ckbk:roman value="50">L</ckbk:roman> <ckbk:roman value="100">c</ckbk:roman> <ckbk:roman value="100">C</ckbk:roman> <ckbk:roman value="500">d</ckbk:roman> <ckbk:roman value="500">D</ckbk:roman> <ckbk:roman value="1000">m</ckbk:roman> <ckbk:roman value="1000">M</ckbk:roman> </ckbk:romans> <xsl:variable name="ckbk:roman-nums" select="document('')/*/*/ckbk:roman"/> <xsl:template name="ckbk:roman-to-number"> <xsl:param name="roman"/> <xsl:variable name="valid-roman-chars"> <xsl:value-of select="document('')/*/ckbk:romans"/> </xsl:variable> <xsl:choose> <!-- returns true if there are any non-Roman characters in the string --> <xsl:when test="translate($roman,$valid-roman-chars,'')">NaN</xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:roman-to-number-impl"> <xsl:with-param name="roman" select="$roman"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="ckbk:roman-to-number-impl"> <xsl:param name="roman"/> <xsl:param name="value" select="0"/> <xsl:variable name="len" select="string-length($roman)"/> <xsl:choose> <xsl:when test="not($len)"> <xsl:value-of select="$value"/> </xsl:when> <xsl:when test="$len = 1"> <xsl:value-of select="$value + $ckbk:roman-nums[. = $roman]/@value"/> </xsl:when> <xsl:otherwise> <xsl:variable name="roman-num" select="$ckbk:roman-nums[. = substring($roman, 1, 1)]"/> <xsl:choose> <xsl:when test="$roman-num/following-sibling::ckbk:roman = substring($roman, 2, 1)"> <xsl:call-template name="ckbk:roman-to-number-impl"> <xsl:with-param name="roman" select="substring($roman,2,$len - 1)"/> <xsl:with-param name="value" select="$value - $roman-num/@value"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:roman-to-number-impl"> <xsl:with-param name="roman" select="substring($roman,2,$len - 1)"/> <xsl:with-param name="value" select="$value + $roman-num/@value"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet>
The xsl:number
element
provides a convenient way to convert numbers to Roman numerals;
however, for converting from Roman numerals to numbers, you are on
your own. The recursive template shown earlier is straightforward and
much like that already found in Jeni Tennison’s
XSLT and XPath on the Edge (M&T Books,
2001).
There are two small caveats, but they should not cause trouble in
most cases. The first is that the previous solution will not work
with Roman numerals using mixed case (e.g., IiI
).
Such odd strings would hardly appear in reasonable data source, but
this code will neither reject such input nor arrive at the
“correct” value. Adding code to
convert to one case allows the code to reject or correctly process
these mixed Romans.
The second caveat relates to the fact that there is no standard Roman
representation for numbers higher than 1,000. Saxon and Xalan keep
stringing M
s together, but another processor might
do something else.
If for some reason you object to storing data about Roman numerals in the stylesheet, then the following XPath 1.0 decodes a Roman numeral:
<xsl:variable name="roman-value" select="($c = 'i' or $c = 'I') * 1 + ($c = 'v' or $c = 'V') * 5 + ($c = 'x' or $c = 'X') * 10 + ($c = 'l' or $c = 'L') * 50 + ($c = 'c' or $c = 'C') * 100 + ($c = 'd' or $c = 'D') * 500 + ($c = 'm' or $c = 'M') * 1000)"/>
In XSLT 2.0, you can use a nested if-then-else expression or use a lookup within a sequence:
<xsl:variable name="roman-value" select="(1,5,10,50,100,500,1000) [index-of(('I', 'V', 'X', 'L', 'C', 'D', 'M'),upper-case($c))]"/>
This example provides a general solution for converting from any base between 2 and 36 to any base in the same range. It uses two global variables to encode the value of all characters in a base 36 system as offsets into the string—one for uppercase encoding and the other for lowercase:
<xsl:variable name="ckbk:base-lower" select="'0123456789abcdefghijklmnopqrstuvwxyz'"/> <xsl:variable name="ckbk:base-upper" select="'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/> <xsl:template name="ckbk:convert-base"> <xsl:param name="number"/> <xsl:param name="from-base"/> <xsl:param name="to-base"/> <xsl:variable name="number-base10"> <xsl:call-template name="ckbk:convert-to-base-10"> <xsl:with-param name="number" select="$number"/> <xsl:with-param name="from-base" select="$from-base"/> </xsl:call-template> </xsl:variable> <xsl:call-template name="ckbk:convert-from-base-10"> <xsl:with-param name="number" select="$number-base10"/> <xsl:with-param name="to-base" select="$to-base"/> </xsl:call-template> </xsl:template>
This template reduces the general problem to two subproblems of converting to and from base 10. Performing base 10 conversions is easier because it is the native base of XPath numbers.
The template ckbk:convert-to-base-10
normalizes
the input number to lowercase. Thus, for example, you treat
ffff
hex the same as FFFF
hex,
which is the normal convention. Two error checks are performed to
make sure the base is in the range you can handle and that the number
does not contain illegal characters inconsistent with the base. The
trivial case of converting from base 10 to base 10 is also handled:
<xsl:template name="ckbk:convert-to-base-10"> <xsl:param name="number"/> <xsl:param name="from-base"/> <xsl:variable name="num" select="translate($number,$ckbk:base-upper, $ckbk:base-lower)"/> <xsl:variable name="valid-in-chars" select="substring($ckbk:base-lower,1,$from-base)"/> <xsl:choose> <xsl:when test="$from-base < 2 or $from-base > 36">NaN</xsl:when> <xsl:when test="not($num) or translate($num,$valid-in-chars,'')">NaN</xsl:when> <xsl:when test="$from-base = 10"> <xsl:value-of select="$number"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:convert-to-base-10-impl"> <xsl:with-param name="number" select="$num"/> <xsl:with-param name="from-base" select="$from-base"/> <xsl:with-param name="from-chars" select="$valid-in-chars"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Once error checking is taken care of, you can defer the actual conversion to another recursive template that does the work. This template looks up the decimal value of each character as an offset into the string of characters you obtained from the caller. The recursion keeps multiplying the result by the base and adding in the value of the first character until you are left with a string of length 1:
<xsl:template name="ckbk:convert-to-base-10-impl"> <xsl:param name="number"/> <xsl:param name="from-base"/> <xsl:param name="from-chars"/> <xsl:param name="result" select="0"/> <xsl:variable name="value" select="string-length(substring-before($from-chars,substring($number,1,1)))"/> <xsl:variable name="total" select="$result * $from-base + $value"/> <xsl:choose> <xsl:when test="string-length($number) = 1"> <xsl:value-of select="$total"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:convert-to-base-10-impl"> <xsl:with-param name="number" select="substring($number,2)"/> <xsl:with-param name="from-base" select="$from-base"/> <xsl:with-param name="from-chars" select="$from-chars"/> <xsl:with-param name="result" select="$total"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
The other half of the problem requires a conversion from base 10 to any base. Again, you separate error checking from the actual conversion:
<xsl:template name="ckbk:convert-from-base-10"> <xsl:param name="number"/> <xsl:param name="to-base"/> <xsl:choose> <xsl:when test="$to-base < 2 or $to-base > 36">NaN</xsl:when> <xsl:when test="number($number) != number($number)">NaN</xsl:when> <xsl:when test="$to-base = 10"> <xsl:value-of select="$number"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:convert-from-base-10-impl"> <xsl:with-param name="number" select="$number"/> <xsl:with-param name="to-base" select="$to-base"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
The actual conversion is simply a matter of picking out each digit
form the $ckbk:base-lower
table based on the
remainder (i.e., mod) obtained when dividing by the
$to-base
. You recurse on the leftover integer
portion, concatenating the digit onto the front of the result.
Recursion ends when the number is less than the base:
<xsl:template name="ckbk:convert-from-base-10-impl"> <xsl:param name="number"/> <xsl:param name="to-base"/> <xsl:param name="result"/> <xsl:variable name="to-base-digit" select="substring($ckbk:base-lower,$number mod $to-base + 1,1)"/> <xsl:choose> <xsl:when test="$number >= $to-base"> <xsl:call-template name="ckbk:convert-from-base-10-impl"> <xsl:with-param name="number" select="floor($number div $to-base)"/> <xsl:with-param name="to-base" select="$to-base"/> <xsl:with-param name="result" select="concat($to-base-digit,$result)"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="concat($to-base-digit,$result)"/> </xsl:otherwise> </xsl:choose> </xsl:template>
Base conversions are a common programming task and most developers already know how to perform them. Many languages have built-in provisions for these conversions; XSLT does not. The fact that XPath 1.0 and XSLT 1.0 provide no means of getting the integer value of a Unicode character makes these conversions more cumbersome. In XPath 2.0 you can use the functions string-to-codepoints and codepoints-to-string. Hence, you must resort to playing tricks with strings that act like lookup tables. These manipulations are inefficient, but reasonable for most conversion needs.
The code assumes that bases higher than 10 will use the standard convention of assigning successive alphas to digits higher than 9. If you work with an unconventional encoding, then you will have to adjust the mapping strings accordingly. You can potentially extend this code beyond base 36 by adding the characters used to encode digits higher than Z.
XSLT 1.0 implementations are provided here for absolute value, square root, logarithms, power, and factorial.
The obvious but long-winded way to determine the absolute value of a number is shown here:
<xsl:template name="ckbk:abs"> <xsl:param name="x"/> <xsl:choose> <xsl:when test="$x < 0"> <xsl:value-of select="$x * -1"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$x"/> </xsl:otherwise> </xsl:choose> </xsl:template>
The short but obscure way relies on the fact that the
true
always converts to the number 1 and
false
to the number 0:
<xsl:template name="ckbk:abs"> <xsl:param name="x"/> <xsl:value-of select="(1 - 2 *($x < 0)) * $x"/> </xsl:template>
I prefer the latter because it is concise. Alternatively, you can use an extension function (see Chapter 12).
Nate Austin contributed a
native XSLT sqrt
to EXSLT that uses
Newton’s method:
<xsl:template name="ckbk:sqrt"> <!-- The number you want to find the square root of --> <xsl:param name="number" select="0"/> <!-- The current 'try'. This is used internally. --> <xsl:param name="try" select="1"/> <!-- The current iteration, checked against maxiter to limit loop count --> <xsl:param name="iter" select="1"/> <!-- Set this up to ensure against infinite loops --> <xsl:param name="maxiter" select="20"/> <!-- This template was written by Nate Austin using Sir Isaac Newton's method of finding roots --> <xsl:choose> <xsl:when test="$try * $try = $number or $iter > $maxiter"> <xsl:value-of select="$try"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:sqrt"> <xsl:with-param name="number" select="$number"/> <xsl:with-param name="try" select="$try - (($try * $try - $number) div (2 * $try))"/> <xsl:with-param name="iter" select="$iter + 1"/> <xsl:with-param name="maxiter" select="$maxiter"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Changing the initial value of try
can improve
performance significantly:
<xsl:template name="ckbk:sqrt"> <!-- The number you want to find the square root of --> <xsl:param name="number" select="0"/> <!-- The current 'try'. This is used internally. --> <xsl:param name="try" select="($number < 100) + ($number >= 100 and $number < 1000) * 10 + ($number >= 1000 and $number < 10000) * 31 + ($number >= 10000) * 100"/> <!-- rest of code the same --> </xsl:template>
This little trick (using
Boolean-to-numeric
conversion again) causes try
to better approximate
the square root on the first get go. On a test that computes all
roots from 1 to 10,000, I achieved a 10% performance boost. More
significantly, the change reduced the average error from 1
10-5 to 6
10-13. This means you
can reduce the number of iterations to get even more performance at
the same average error rate. For example, I was able to reduce the
iterations from 10 to 6 and achieve the same error rate of
10-5, but at a 50% performance increase
using the same test. If you need to compute square roots of numbers
much greater than 10,000, you should keep the iterations at least 10
or add higher ranges to the try
initialization.
If
your XSLT processor supports EXSLT’s
exsl:log()
(natural log), then implementing a logarithm
to any other base is easy. In pseudocode:
<!-- A fundamental rule of logarithms --> ckbk:logN(x,base) = ckbk:logN(x,diffBase) div ckbk:logN(base, diffBase)
Unfortunately, no XSLT processors are currently listed on EXSLT.org
that implement exsl:log
. Your next best bet is to
implement an extension function in terms of Java’s
java.lang.Math.log
class or
JavaScript’s Math.log
. Finally,
if you must avoid extensions, the following pure XSLT implementation
computes log10 to a good degree of accuracy at an acceptable speed
for most (sane) applications. Once you have log10()
,
then log()
follows from
the rule of logarithms:
<xsl:template name="ckbk:log10"> <xsl:param name="number" select="1"/> <xsl:param name="n" select="0"/> <!-- book keeping for whole part of result --> <xsl:choose> <xsl:when test="$number <= 0"> <!-- Logarithms are undefined for 0 and negative numbers. --> <xsl:value-of select="'NaN'"/> </xsl:when> <xsl:when test="$number < 1"> <!-- Fractional numbers have negative logs --> <xsl:call-template name="ckbk:log10"> <xsl:with-param name="number" select="$number * 10"/> <xsl:with-param name="n" select="$n - 1"/> </xsl:call-template> </xsl:when> <xsl:when test="$number > 10"> <!-- Numbers greater than 10 have logs greater than 1 --> <xsl:call-template name="ckbk:log10"> <xsl:with-param name="number" select="$number div 10"/> <xsl:with-param name="n" select="$n + 1"/> </xsl:call-template> </xsl:when> <xsl:when test="$number = 10"> <xsl:value-of select="$n + 1"/> </xsl:when> <xsl:otherwise> <!-- We only need to know how to compute for numbers in range [1,10) --> <xsl:call-template name="ckbk:log10-util"> <xsl:with-param name="number" select="$number"/> <xsl:with-param name="n" select="$n"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <!-- Computes log (natural) of number--> <xsl:template name="ckbk:log"> <xsl:param name="number" select="1"/> <xsl:variable name="log10-e" select="0.4342944819"/> <xsl:variable name="log10"> <xsl:call-template name="ckbk:log10"> <xsl:with-param name="number" select="$number"/> </xsl:call-template> </xsl:variable> <xsl:value-of select="$log10 div $log10-e"/> </xsl:template> <!-- Computes log to base b of number--> <xsl:template name="ckbk:log-b"> <xsl:param name="number" select="1"/> <xsl:param name="base" select="2"/> <xsl:variable name="log10-base"> <xsl:call-template name="ckbk:log10"> <xsl:with-param name="number" select="$base"/> </xsl:call-template> </xsl:variable> <xsl:variable name="log10"> <xsl:call-template name="ckbk:log10"> <xsl:with-param name="number" select="$number"/> </xsl:call-template> </xsl:variable> <xsl:value-of select="$log10 div $log10-base"/> </xsl:template> <!-- Computes log10 of numbers in the range [1,10) and returns the result + n--> <xsl:template name="ckbk:log10-util"> <xsl:param name="number"/> <xsl:param name="n"/> <xsl:param name="frac" select="0"/> <!-- book keeping variable for fractional part --> <xsl:param name="k" select="0"/> <!-- iteration counter --> <xsl:param name="divisor" select="2"/> <!-- successive powers of 2 used to build up frac --> <xsl:param name="maxiter" select="38"/> <!-- Number of iterations. 38 is more than sufficient to get at least 10 dec place prec --> <xsl:variable name="x" select="$number * $number"/> <xsl:choose> <xsl:when test="$k >= $maxiter"> <!-- Round to 10 decimal places --> <xsl:value-of select="$n + round($frac * 10000000000) div 10000000000"/> </xsl:when> <xsl:when test="$x < 10"> <xsl:call-template name="ckbk:log10-util"> <xsl:with-param name="number" select="$x"/> <xsl:with-param name="n" select="$n"/> <xsl:with-param name="k" select="$k + 1"/> <xsl:with-param name="divisor" select="$divisor * 2"/> <xsl:with-param name="frac" select="$frac"/> <xsl:with-param name="maxiter" select="$maxiter"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:log10-util"> <xsl:with-param name="number" select="$x div 10"/> <xsl:with-param name="n" select="$n"/> <xsl:with-param name="k" select="$k + 1"/> <xsl:with-param name="divisor" select="$divisor * 2"/> <xsl:with-param name="frac" select="$frac + (1 div $divisor)"/> <xsl:with-param name="maxiter" select="$maxiter"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Ckbk:log10
’s main purpose is to
reduce the problem of computing log10(x)
to the
simpler problem of computing log10(x
: 1
<=
x
<
10)
.
This is done by observing that log10(x
: x
>
10)
=
log10(
x
div
10)
+
1
and
log10(x
: x
<
1)
=
log10(
x
*
10)
-
1
.
Error checking is also performed because logarithms are not defined
for zero or negative numbers.
The utility template, ckbk:log10-util
,
does
the hard part; it is a tail-recursive implementation of an iterative
technique found in
Knuth.[2] To keep the
implementation tail recursive and greatly
simplify the implementation, several
bookkeeping parameters are defined:
The whole part of the answer passed in by
ckbk:log10
. This parameter is not strictly
necessary because it could have held onto it until
ckbk:log10-util
did its part. However, it
eliminates the need to capture the result of
ckbk:log10-util
in a variable.
The fractional part of the answer. This is what we are really after.
An iteration counter that is incremented on each recursive call. The
recursion terminates when $k > $maxiter
.
A number that is set to the next higher power of 2 on each recursive
call (e.g., 2,4,8,16,...). The value 1 div
$divisor
is added to frac
as you
approximate the logarithm.
The number of iterations used to compute frac
. The
higher maxiter
is, the greater the precision of
the result (up to the limits of IEEE floating point). A parameter
need not be used, but it opens the possibility of extending
log10
to allow the caller to determine the
required number of iterations and hence tweak speed versus accuracy.
At this
time, EXSLT.org does not list any implementations that support
ckbk:power()
. However, as it is defined by EXSLT,
power()
is easy to implement in pure XSLT. Jeni
Tennison provides the following implementation:
<xsl:template name="ckbk:power"> <xsl:param name="base"/> <xsl:param name="power"/> <xsl:choose> <xsl:when test="$power = 0"> <xsl:value-of select="1"/> </xsl:when> <xsl:otherwise> <xsl:variable name="temp"> <xsl:call-template name="ckbk:power"> <xsl:with-param name="base" select="$base"/> <xsl:with-param name="power" select="$power - 1"/> </xsl:call-template> </xsl:variable> <xsl:value-of select="$base * $temp"/> </xsl:otherwise> </xsl:choose> </xsl:template>
For most applications, this code will do just fine; however, it is
neither tail recursive nor the most algorithmically efficient
implementation. The following implementation is tail recursive and
reduces the number of multiplications from
O($power)
to O(log2($power))
.
It also adds error handling, which prevents an infinite recursion if
$power
is NaN
:
<xsl:template name="ckbk:power"> <xsl:param name="base"/> <xsl:param name="power"/> <xsl:param name="result" select="1"/> <xsl:choose> <xsl:when test="number($base) != $base or number($power) != $power"> <xsl:value-of select="'NaN'"/> </xsl:when> <xsl:when test="$power = 0"> <xsl:value-of select="$result"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:power"> <xsl:with-param name="base" select="$base * $base"/> <xsl:with-param name="power" select="floor($power div 2)"/> <xsl:with-param name="result" select="$result * $base * ($power mod 2) + $result * not($power mod 2)"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
This section introduces a bookkeeping parameter,
$result
, to build up the final answer. It allows
you to make the function tail recursive. On each recursive step,
square the base and halve the power. Because you use floor()
, $result
will reach 0 in
ceiling(log2($power))
recursions. This accounts
for the better performance. The tricky part is the computation of
$result
at each step.
Let’s analyze this expression by looking at each
side of the addition. The expression $result * $base *
($power mod 2)
will be equal to $result *
$base
when $power
is odd and 0
otherwise. Conversely, $result * not($power mod 2)
will be equal to 0 when $power
is odd and
$result
otherwise. In XPath 2.0, you would write
this expression as if (power % 2) eq 1 then result * base
else result
. In XPath 1.0, you emulate it with this bit of
XSLT hackery. The net result is that this template ends up computing
b1 * base + b2
* base2 + b3 *
base4 * b4 *
base8 ..., where
bi is either 0 or 1. It should be fairly
easy to see that this sum can add up to
basepower for an arbitrary integer power
by setting the bis to appropriate
values—which is just what the expression
$power
mod
2
determines. If this concept is still unclear, then work out a few
cases by hand to convince yourself that it works.[3]
The power function shown earlier only computes powers for positive
integral powers. However, as any high-school algebra student can tell
you, xy is a real number for all real x
and y, not just positive integers. It would be nice to have a
generalized version of power()
lying around in
case you need it, so here it is. Not to be confused with
power()
, this template is called
power-f()
,
where the f
stands for
floating point. If you prefer to have the most general version called
power()
, then go right ahead and rename it in
your own code. However, having the restricted version available as a
separate function is still useful:
<xsl:template name="ckbk:power-f"> <xsl:param name="base"/> <xsl:param name="power"/> <xsl:choose> <xsl:when test="number($base) != $base or number($power) != $power"> <xsl:value-of select="'NaN'"/> </xsl:when> <xsl:when test="$power < 0"> <xsl:variable name="result"> <xsl:call-template name="ckbk:power-f"> <xsl:with-param name="base" select="$base"/> <xsl:with-param name="power" select="-1 * $power"/> </xsl:call-template> </xsl:variable> <xsl:value-of select="1 div $result"/> </xsl:when> <xsl:otherwise> <xsl:variable name="powerN" select="floor($power)"/> <xsl:variable name="resultN"> <xsl:call-template name="ckbk:power"> <xsl:with-param name="base" select="$base"/> <xsl:with-param name="power" select="$powerN"/> </xsl:call-template> </xsl:variable> <xsl:choose> <xsl:when test="$power - $powerN"> <xsl:variable name="resultF"> <xsl:call-template name="ckbk:power-frac"> <xsl:with-param name="base" select="$base"/> <xsl:with-param name="power" select="$power - $powerN"/> </xsl:call-template> </xsl:variable> <xsl:value-of select="$resultN * $resultF"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$resultN"/> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="ckbk:power-frac"> <xsl:param name="base"/> <xsl:param name="power"/> <xsl:param name="n" select="1"/> <xsl:param name="ln_base"> <xsl:call-template name="ckbk:log"> <xsl:with-param name="number" select="$base"/> </xsl:call-template> </xsl:param> <xsl:param name="ln_base_n" select="$ln_base"/> <xsl:param name="power_n" select="$power"/> <xsl:param name="n_fact" select="$n"/> <xsl:param name="result" select="1"/> <xsl:choose> <xsl:when test="20 >= $n"> <xsl:call-template name="ckbk:power-frac"> <xsl:with-param name="base" select="$base"/> <xsl:with-param name="power" select="$power"/> <xsl:with-param name="n" select="$n + 1"/> <xsl:with-param name="ln_base" select="$ln_base "/> <xsl:with-param name="ln_base_n" select="$ln_base_n * $ln_base"/> <xsl:with-param name="power_n" select="$power_n * $power"/> <xsl:with-param name="n_fact" select="$n_fact * ($n+1)"/> <xsl:with-param name="result" select="$result + ($power_n * $ln_base_n) div $n_fact"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="round($result * 1000000000) div 1000000000"/> </xsl:otherwise> </xsl:choose> </xsl:template>
The ckbk:power-f
template does not do the meat of
the calculation. Instead, it checks for errors and then computes the
result in terms of the existing template,
ckbk:power
, and a new template,
ckbk:power-frac
. Template
ckbk:power-f
takes advantage of the following two
mathematical truths about powers:
base-y = 1 / basey, which allows you to handle negative powers easily.
base(power1+power2) =
basepower1 *
basepower2, which lets you reuse the
accuracy and efficiency ckbk:power()
for the
whole part of $power
and use a good approximation
for the fractional part.
The template ckbk:power-frac
is
a recursive implementation of the
Maclaurin
series for xy, as shown in Figure 3-1.
One way to implement the trigonometric functions is to create similar recursive implementations of their Maclaurin representation:
Oddly, EXSLT has not defined a factorial function. Factorial is, of course, easy to implement:
<xsl:template name="ckbk:fact"> <xsl:param name="number" select="0"/> <xsl:param name="result" select="1"/> <xsl:choose> <xsl:when test="$number < 0 or floor($number) != $number"> <xsl:value-of select="'NaN'"/> </xsl:when> <xsl:when test="$number < 2"> <xsl:value-of select="$result"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:fact"> <xsl:with-param name="number" select="$number - 1"/> <xsl:with-param name="result" select="$number * $result"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
A useful generalization of factorial is a function that computes the product of all numbers in a range:
<xsl:template name="ckbk:prod-range"> <xsl:param name="start" select="1"/> <xsl:param name="end" select="1"/> <xsl:param name="result" select="1"/> <xsl:choose> <xsl:when test="$start > $end"> <xsl:value-of select="$result"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:prod-range"> <xsl:with-param name="start" select="$start + 1"/> <xsl:with-param name="end" select="$end"/> <xsl:with-param name="result" select="$start * $result"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
In 2.0, abs()
is built-in. You can implement the others
as functions for maximum convenience:
<!-- Power --> <xsl:function name="ckbk:power" as="xs:double"> <xsl:param name="base" as="xs:double"/> <xsl:param name="exp" as="xs:integer"/> <xsl:sequence select="if ($exp lt 0) then ckbk:power(1.0 div $base, -$exp) else if ($exp eq 0) then 1e0 else $base * ckbk:power($base, $exp - 1)"/> </xsl:function> <!-- Sqrt --> <xsl:function name="ckbk:sqrt" as="xs:double"> <xsl:param name="number" as="xs:double"/> <xsl:variable name="try" select="if ($number lt 100.0) then 1.0 else if ($number gt 100.0 and $number lt 1000.0) then 10.0 else if ($number gt 1000.0 and $number lt 10000.0) then 31.0 else 100.00" as="xs:decimal"/> <xsl:sequence select="if ($number ge 0) then ckbk:sqrt($number,$try,1,20) else 'NaN'"/> </xsl:function> <xsl:function name="ckbk:sqrt" as="xs:double"> <xsl:param name="number" as="xs:double"/> <xsl:param name="try" as="xs:double"/> <xsl:param name="iter" as="xs:integer"/> <xsl:param name="maxiter" as="xs:integer"/> <xsl:variable name="result" select="$try * $try" as="xs:double"/> <xsl:sequence select="if ($result eq $number or $iter gt $maxiter) then $try else ckbk:sqrt($number, ($try - (($result - $number) div (2 * $try))), $iter + 1, $maxiter)"/> </xsl:function> <!-- Factorial --> <xsl:function name="ckbk:factorial" as="xs:decimal"> <xsl:param name="n" as="xs:integer"/> <xsl:sequence select="if ($n eq 0) then 1 else $n * ckbk:factorial($n - 1)"/> </xsl:function> <!-- Prod-range --> <xsl:function name="ckbk:prod-range" as="xs:decimal"> <xsl:param name="from" as="xs:integer"/> <xsl:param name="to" as="xs:integer"/> <xsl:sequence select="if ($from ge $to) then $from else $from * ckbk:prod-range($from + 1, $to)"/> </xsl:function> <!-- Log10 --> <xsl:function name="ckbk:log10" as="xs:double"> <xsl:param name="number" as="xs:double"/> <xsl:sequence select="if ($number le 0) then 'NaN' else ckbk:log10($number,0)"/> </xsl:function> <xsl:function name="ckbk:log10" as="xs:double"> <xsl:param name="number" as="xs:double"/> <xsl:param name="n" as="xs:double"/> <xsl:sequence select="if ($number le 1) then ckbk:log10($number * 10, $n - 1) else if($number gt 10) then ckbk:log10($number div 10, $n + 1) else if($number eq 10) then $n + 1 else $n + ckbk:log10-util($number,0,0,2,38)"/> </xsl:function> <xsl:function name="ckbk:log10-util" as="xs:double"> <xsl:param name="number" as="xs:double"/> <xsl:param name="frac" as="xs:double"/> <xsl:param name="iter" as="xs:integer"/> <xsl:param name="divisor" as="xs:double"/> <xsl:param name="maxiter" as="xs:integer"/> <xsl:variable name="x" select="$number * $number"/> <xsl:sequence select="if ($iter ge $maxiter) then round-half-to-even($frac,10) else if ($x lt 10) then ckbk:log10-util($x,$frac,$iter + 1, $divisor * 2, $maxiter) else ckbk:log10-util($x div 10, $frac + (1 div $divisor), $iter + 1, $divisor * 2, $maxiter)"/> </xsl:function>
I would guess that at least 80% of your garden-variety XSLT applications never require math beyond the native capabilities of XPath. When the remaining 20 percent need to go beyond XPath 1.0, chances are high that one or more of the previous functions are necessary.
The biggest drawback of a pure XSLT 1.0 implementation is that the templates cannot be invoked as first-class functions from XPath expression. This makes doing math awkward and slightly more inefficient because you need to create artificial variables to capture the results of template calls as result-tree fragments. Internally, the XSLT processor has to convert these fragments back to numbers when you use them in subsequent calculations.
Another problem with XSLT implementations is that the public
interface of these templates is polluted with bookkeeping parameters
that obscure the nature of the function. The bookkeeping parameters
are often necessary to make the implementation tail recursive and
prevent unnecessary work. For example, the implementation of
power-frac
needs to compute the log of the base
and have it available at all times. If ln_base
were not a parameter, then log
would be invoked on
every recursive call, resulting in unacceptable performance.
XSLT 2.0 comes to the rescue by giving you first-class functions.
Thus, the 2.0 solutions are more concise and easier to use. However,
parameters in xsl:function
cannot have default
values; thus, you will need to emulate this capability by overloading
functions with different arities to emulate defaults. Unfortunately,
2.0 provides no way of encapsulating the private parameters that are
used in recursive functions for bookkeeping purposes. When designing
libraries of functions, you might want to consider placing the
implementation functions (i.e., the one with the artificial
bookkeeping parameters) in a different implementation namespace to
hint that these are not for general use.
The second problem would be alleviated if a future XSLT version had private parameters that can be used only in a recursive call.
The abstract form of sum for processors that support tail-recursive optimization is as follows:
<xsl:template name="ckbk:sum"> <!-- Initialize nodes to empty node set --> <xsl:param name="nodes" select="/.."/> <xsl:param name="result" select="0"/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:value-of select="$result"/> </xsl:when> <xsl:otherwise> <!-- call or apply template that will determine value of node unless the node is literally the value to be summed --> <xsl:variable name="value"> <xsl:call-template name="some-function-of-a-node"> <xsl:with-param name="node" select="$nodes[1]"/> </xsl:call-template> </xsl:variable> <!-- recurse to sum rest --> <xsl:call-template name="ckbk:sum"> <xsl:with-param name="nodes" select="$nodes[position() != 1]"/> <xsl:with-param name="result" select="$result + $value"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Two techniques can handle a large number of nodes in the absence of tail-recursive optimization. The first is commonly called divide and conquer. The idea behind this technique is to reduce the amount of work by at least a factor of two on each recursive step:
<xsl:template name="ckbk:sum-dvc"> <xsl:param name="nodes" select="/.."/> <xsl:param name="result" select="0"/> <xsl:param name="dvc-threshold" select="100"/> <xsl:choose> <xsl:when test="count($nodes) <= $dvc-threshold"> <xsl:call-template name="ckbk:sum"> <xsl:with-param name="nodes" select="$nodes"/> <xsl:with-param name="result" select="$result"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:variable name="half" select="floor(count($nodes) div 2)"/> <xsl:variable name="sum1"> <xsl:call-template name="ckbk:sum-dvc"> <xsl:with-param name="nodes" select="$nodes[position() <= $half]"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="dvc-threshold" select="$dvc-threshold"/> </xsl:call-template> </xsl:variable> <xsl:call-template name="ckbk:sum-dvc"> <xsl:with-param name="nodes" select="$nodes[position() > $half]"/> <xsl:with-param name="result" select="$sum1"/> <xsl:with-param name="dvc-threshold" select="$dvc-threshold"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
The second is called batching, which uses two recursive stages. The first stage divides the large problem into batches of reasonable size. The second stage processes each batch recursively:
<xsl:template name="ckbk:sum-batcher"> <xsl:param name="nodes" select="/.."/> <xsl:param name="result" select="0"/> <xsl:param name="batch-size" select="500"/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:value-of select="$result"/> </xsl:when> <xsl:otherwise> <xsl:variable name="batch-sum"> <xsl:call-template name="ckbk:sum"> <xsl:with-param name="nodes" select="$nodes[position() < $batch-size]"/> <xsl:with-param name="result" select="$result"/> </xsl:call-template> </xsl:variable> <xsl:call-template name="ckbk:sum-batcher"> <xsl:with-param name="nodes" select="$nodes[position() >= $batch-size]"/> <xsl:with-param name="result" select="$batch-sum"/> <xsl:with-param name="batch-size" select="$batch-size"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
The solutions for product
are similar:
<xsl:template name="ckbk:product"> <xsl:param name="nodes" select="/.."/> <xsl:param name="result" select="1"/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:value-of select="$result"/> </xsl:when> <xsl:otherwise> <!-- call or apply template that will determine value of node unless the node is literally the value to be multiplied --> <xsl:variable name="value"> <xsl:call-template name="some-function-of-a-node"> <xsl:with-param name="node" select="$nodes[1]"/> </xsl:call-template> </xsl:variable> <xsl:call-template name="ckbk:product"> <xsl:with-param name="nodes" select="$nodes[position() != 1]"/> <xsl:with-param name="result" select="$result * $value"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="ckbk:product-batcher"> <xsl:param name="nodes" select="/.."/> <xsl:param name="result" select="1"/> <xsl:param name="batch-size" select="500"/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:value-of select="$result"/> </xsl:when> <xsl:otherwise> <xsl:variable name="batch-product"> <xsl:call-template name="ckbk:product"> <xsl:with-param name="nodes" select="$nodes[position() < $batch-size]"/> <xsl:with-param name="result" select="$result"/> </xsl:call-template> </xsl:variable> <xsl:call-template name="ckbk:product-batcher"> <xsl:with-param name="nodes" select="$nodes[position() >= $batch-size]"/> <xsl:with-param name="result" select="$batch-product"/> <xsl:with-param name="batch-size" select="$batch-size"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="ckbk:product-dvc"> <xsl:param name="nodes" select="/.."/> <xsl:param name="result" select="1"/> <xsl:param name="dvc-threshold" select="100"/> <xsl:choose> <xsl:when test="count($nodes) <= $dvc-threshold"> <xsl:call-template name="ckbk:product"> <xsl:with-param name="nodes" select="$nodes"/> <xsl:with-param name="result" select="$result"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:variable name="half" select="floor(count($nodes) div 2)"/> <xsl:variable name="product1"> <xsl:call-template name="ckbk:product-dvc"> <xsl:with-param name="nodes" select="$nodes[position() <= $half]"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="dvc-threshold" select="$dvc-threshold"/> </xsl:call-template> </xsl:variable> <xsl:call-template name="ckbk:product-dvc"> <xsl:with-param name="nodes" select="$nodes[position() > $half]"/> <xsl:with-param name="result" select="$product1"/> <xsl:with-param name="dvc-threshold" select="$dvc-threshold"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Using the built-in XPath sum()
function
is the simplest way to perform simple
sums. However, if you want to compute sums of the
nodes’ arbitrary function in a node-set, then you
need either to:
Use one of the recipes in this section.
Compute the function of the nodes first, capturing the result in a
variable as a result-tree fragment. Then use an extension function to
convert the fragment to a node set that can be fed to
sum
. In XSLT 2.0, generalized sums become trivial
because of the banishment of result-tree fragments.
Sums of arbitrary functions are trivial in 2.0 because of the
introduction of sequences and the for
expression
and sum
function in XPath:
<!--Sum of squares --> <xsl:value select="sum(for $i in $nodes return $i * $i)"/>
However, XPath does not give you a native prod()
function, so you will need to provide your own:
<xsl:function name="ckbk:prod" as="xs:double"> <xsl:param name="numbers" as="xs:double*"/> <xsl:sequence select="if (count($numbers) eq 0) then 0 else if (count($numbers) = 1) then $numbers[1] else $numbers[1] * ckbk:prod(subsequence($numbers,2))"/> </xsl:function>
Batching and divide and conquer are two techniques for managing recursion that are useful whenever you must process a potentially large set of nodes. Experimentation shows that even when using an XSLT processor that recognizes tail recursion, better performance results from these approaches.
Chapter 14 shows how to make a reusable batch and divide-and-conquer drivers.
Given this prod function, you might be tempted use it to compute factorials:
<xsl:function name="ckbk:factorial" as="xs:double"> <xsl:param name="n" as="xs:integer"/> <xsl:sequence select="if ($n eq 0) then 1 else ckbk:prod(1 to $n)"/> </xsl:function>
This is probably an okay solution if $n
is small,
but as it gets larger, the overhead of constructing the sequence may
give worse performance then the more direct solution. This is
something to keep in mind when taking advantage of the convenience of
sequences. Michael Kay provides other
examples in discussing the differences
between his Knights Tour 1.0 and 2.0 solutions (XSLT 2.0,
Third Edition, Wrox, 2004, pg. 753):
//what does this code do?
<xsl:function name="ckbk:factorial" as="xs:decimal">
<xsl:param name="n" as="xs:integer"/>
<xsl:sequence select="if ($n eq 0) then 1 else $n * ckbk:factorial($n - 1)"/>
</xsl:function>
Dimitre Novatchev and Slawomir Tyszko compare batching with divide and conquer at http://www.vbxml.com/xsl/articles/recurse/.
The EXSLT functions that perform these operations are
exsl:min
, exsl:max
,
exsl:lowest
, and exsl:highest
.
The
functions
min
and
max
find the value of the node with minimum and
maximum numerical value, respectively. EXSLT defines
exsl:min
as listed here:
The minimum value is defined as follows: the node set passed as an
argument is sorted in ascending order as it would be by
xsl:sort
with a data type of
number
. The minimum is the result of converting
the string value of the first node in this sorted list to a number
using the number
function.
If the node set is empty, or if the result of converting the string
values of any of the nodes to a number is NaN
,
then NaN
is returned.
exsl:max
is defined similarly. EXSLT provides pure
XSLT implementations that are literal implementations of this
definition, as shown in Example 3-8.
<xsl:template name="ckbk:min"> <xsl:param name="nodes" select="/.." /> <xsl:choose> <xsl:when test="not($nodes)">NaN</xsl:when> <xsl:otherwise> <xsl:for-each select="$nodes"> <xsl:sort data-type="number" /> <xsl:if test="position() = 1"> <xsl:value-of select="number(.)" /> </xsl:if> </xsl:for-each> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="ckbk:max"> <xsl:param name="nodes" select="/.." /> <xsl:choose> <xsl:when test="not($nodes)">NaN</xsl:when> <xsl:otherwise> <xsl:for-each select="$nodes"> <xsl:sort data-type="number" order="descending" /> <xsl:if test="position() = 1"> <xsl:value-of select="number(.)" /> </xsl:if> </xsl:for-each> </xsl:otherwise> </xsl:choose> </xsl:template>
You may be scratching your head over the default value for the nodes
parameter (select="/
..”). This
is simply an idiomatic way to initialize nodes to an empty node set
(i.e., the parent of the root is empty by definition).
Although the definitions of ckbk:min
and
ckbk:max
use xsl:sort
, the
implementations need not, so results that are more efficient are
possible provided your XSLT processor supports tail recursion (see
Example 3-9).
<xsl:template name="ckbk:max"> <xsl:param name="nodes" select="/.."/> <xsl:param name="max"/> <xsl:variable name="count" select="count($nodes)"/> <xsl:variable name="aNode" select="$nodes[ceiling($count div 2)]"/> <xsl:choose> <xsl:when test="not($count)"> <xsl:value-of select="number($max)"/> </xsl:when> <xsl:when test="number($aNode) != number($aNode)"> <xsl:value-of select="number($aNode)"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:max"> <xsl:with-param name="nodes" select="$nodes[not(. <= number($aNode))]"/> <xsl:with-param name="max"> <xsl:choose> <xsl:when test="not($max) or $aNode > $max"> <xsl:value-of select="$aNode"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$max"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="ckbk:min"> <xsl:param name="nodes" select="/.."/> <xsl:param name="min"/> <xsl:variable name="count" select="count($nodes)"/> <xsl:variable name="aNode" select="$nodes[ceiling($count div 2)]"/> <xsl:choose> <xsl:when test="not($count)"> <xsl:value-of select="number($min)"/> </xsl:when> <xsl:when test="number($aNode) != number($aNode)"> <xsl:value-of select="number($aNode)"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:min"> <xsl:with-param name="nodes" select="$nodes[not(. >= number($aNode))]"/> <xsl:with-param name="min"> <xsl:choose> <xsl:when test="not($min) or $aNode < $min"> <xsl:value-of select="$aNode"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$min"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Typically, the preceding implementations are faster than the version
using xsl:sort
. In some degenerate cases, they are
likely to be slower. The reason is that efficiency hinges on removing
half the nodes from consideration (on average) at each recursive
step. One can imagine a scenario in which at each pass,
aNode
is the minimum node (in the case of
ckbk:max
) or the maximum remaining node (in the
case of ckbk:min
). If this were to occur, each
pass would remove only one node, resulting in poor performance.
Luckily, data tends to come in two configurations: presorted and
random. In both cases, these implementations should hold their own.
While you had to look out for non-numeric data explicitly,
EXSLT’s implementations let
xsl:sort
take care of this. The XSLT standard
mandates that non-numeric data be placed up front by sort when
data-type='number
‘.
Don’t be tempted to write
not(number($var))
to test for
NaN
! I often catch myself doing this because it
sounds correct. The number function does not test
for a number; instead, it attempts to convert
its argument to a number. This is not what you
want—this test will conclude 0 is not a number due to the
conversion of 0 to false
. The correct test is
number($var) != number($var)
. This test works
because NaN
is never equal to
NaN
, but any number is always equal to itself. Do
not be tempted to shorten this idiom to number($var) !=
$var
. Doing so works most of the time, but if
$var
is an empty node set, it will fail. If you
prefer, a more direct approach
string(number($var))
= 'NaN
'
also works.
EXSLT defines ckbk:lowest
as follows.
The ckbk:lowest
function
returns
the nodes in the node set whose value is the minimum value for the
node set. The minimum value for the node set is the same as the value
as calculated by ckbk:min
. A node has this minimum
value if the result of converting its string value to a number as if
by the number
function is equal to the minimum
value, where the equality comparison is defined as a numerical
comparison using the =
operator.
If any of the nodes in the node set has a non-numeric value, the
ckbk:min
function will return
NaN
. The definition numeric comparisons entails
that NaN != NaN
. Therefore, if any of the nodes in
the node set has a non-numeric value, ckbk:lowest
will return an empty node set.
The EXSLT implementation is literally based on this definition and might not be very efficient:
<xsl:template name="ckbk:lowest"> <xsl:param name="nodes" select="/.." /> <xsl:if test="$nodes and not($nodes[number(.) != number(.)])"> <xsl:variable name="min"> <xsl:for-each select="$nodes"> <xsl:sort data-type="number" /> <xsl:if test="position() = 1"> <xsl:value-of select="number(.)" /> </xsl:if> </xsl:for-each> </xsl:variable> <xsl:copy-of select="$nodes[. = $min]" /> </xsl:if> </xsl:template>
The xsl:if
test scans all nodes to handle cases
when a non-numeric is present. It then sorts to find the
min
and finally collects all nodes with that
min
. Example 3-10 reuses
ckbk:min
to do the same without needing to sort.
<xsl:template name="ckbk:lowest"> <xsl:param name="nodes" select="/.."/> <xsl:variable name="min"> <xsl:call-template name="ckbk:min"> <xsl:with-param name="nodes" select="$nodes"/> </xsl:call-template> </xsl:variable> <xsl:choose> <xsl:when test="number($min) = number($min)"> <xsl:copy-of select="$nodes[. = $min]"/> </xsl:when> </xsl:choose> </xsl:template>
Finally, you can implement ckbk:lowest
with only
one pass over the nodes if you are willing to forgo reuse of
ckbk:min
(see Example 3-11).
<xsl:template name="ckbk:lowest"> <xsl:param name="nodes" select="/.."/> <xsl:param name="lowest" select="/.."/> <xsl:variable name="index" select="ceiling(count($nodes) div 2)"/> <xsl:variable name="aNode" select="$nodes[$index]"/> <xsl:choose> <xsl:when test="not($index)"> <xsl:copy-of select="$lowest"/> </xsl:when> <xsl:when test="number($aNode) != number($aNode)"/> <xsl:otherwise> <xsl:choose> <xsl:when test="not($lowest) or $aNode < $lowest"> <xsl:call-template name="ckbk:lowest"> <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/> <xsl:with-param name="lowest" select="$nodes[. = $aNode]"/> </xsl:call-template> </xsl:when> <xsl:when test="$aNode = $lowest"> <xsl:call-template name="ckbk:lowest"> <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/> <xsl:with-param name="lowest" select="$lowest|$nodes[$index]"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:lowest"> <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/> <xsl:with-param name="lowest" select="$lowest"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template>
Interestingly, this implementation does worse, probably because of
the additional copying that occurs. In performance tests on 10,000
data points using various distributions of data (sorted, reverse
sorted, semirandom, and random), the
ckbk:min
-based implementation beat the
xsl:sort
-based implementation by about 40% on
average (often better). The recursive implementation that did not use
ckbk:min
was 24% slower than the one that did.
The ckbk:highest
definition and implementations
follow directly from inverting the relational logic of
ckbk:lowest
, so I will not discuss them here.
XPath 2.0 has native min
and
max
functions.
The minimum and maximum values of a node set can be determined by the
simple XPath expressions <xsl:value-of
select="$nodes[not($nodes < .)]"/>
and
<xsl:value-of select="$nodes[not($nodes >
.)]"/>
. In English, the first says,
“Select all nodes for which there is no node less
than its value,” and the second says,
“Select all nodes for which there is no node greater
than its value.”
Although very simple, these expressions have
O(N2)
performance, where N is the number of nodes. Therefore, unless you
are certain that the number of nodes is small, avoid this shortcut if
you can. Occasionally, you are forced to use it because, for example,
you have to find the min
/max
within the select attribute of xsl:sort
or the
use
attribute of xsl:key
(for
which you cannot call a template).
In another XSLT publication, the following recursive implementation
for finding minimums is described as more efficient than one using
xsl:sort
:
<xsl:template name="ckbk:min"> <xsl:param name="nodes" select="/.."/> <xsl:param name="min" select="number('NaN')"/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:value-of select="number($min)"/> </xsl:when> <xsl:otherwise> <xsl:variable name="aNode" select="$nodes[1]"/> <xsl:call-template name="ckbk:min"> <xsl:with-param name="nodes" select="$nodes[position() > 1]"/> <xsl:with-param name="min"> <xsl:choose> <xsl:when test="$aNode < $min or string($min) = 'NaN'"> <xsl:value-of select="$aNode"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$min"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
This is simply not the case on any XSLT implementation I tested, and
I can’t imagine that it ever could be. The reason
this is much more likely to be slow is because only one node is
removed from consideration at each step. Even with tail recursion,
there will be a lot of copying of nodes. It is easy to be fooled into
thinking that this recursive solution is as efficient as the
iterative-indexing solution you might come up with in C or Java.
However, indexing is not the same as creating a brand-new node set
with the first item removed, as is the case with
$nodes[position() > 1]
.
In many cases when you need to find the minimum of a data set, you end up also needing the maximum. It would be nice to have a function handy that gives two for the price of one. The following will do so for a slight increase in complexity:
<xsl:template name="ckbk:min-max"> <xsl:param name="nodes" select="/.."/> <xsl:param name="nodes-for-max" select="$nodes"/> <xsl:param name="min"/> <xsl:param name="max"/> <xsl:variable name="count1" select="count($nodes)"/> <xsl:variable name="aNode1" select="$nodes[ceiling($count1 div 2)]"/> <xsl:variable name="count2" select="count($nodes-for-max)"/> <xsl:variable name="aNode2" select="$nodes-for-max[ceiling($count2 div 2)]"/> <xsl:choose> <xsl:when test="not($count1) and not($count2)"> <xsl:value-of select="concat(number($min),',',number($max))"/> </xsl:when> <xsl:when test="number($aNode1) != number($aNode1) and number($aNode2) != number($aNode2)"> <xsl:value-of select="concat(number($aNode1),',',number($aNode2))"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="ckbk:min-max"> <xsl:with-param name="nodes" select="$nodes[not(. >= number($aNode1))]"/> <xsl:with-param name="nodes-for-max" select="$nodes-for-max[not(. <= number($aNode2))]"/> <xsl:with-param name="min"> <xsl:choose> <xsl:when test="not($min) or $aNode1 < $min"> <xsl:value-of select="$aNode1"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$min"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> <xsl:with-param name="max"> <xsl:choose> <xsl:when test="not($max) or $aNode2 > $max"> <xsl:value-of select="$aNode2"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$max"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Testing shows that this approach continues to outperform a sort-based solution.
When considering minima and maxima, only one special case was addressed: when the nodes are literally the numbers you must process. A more general problem involves finding the minimum or maximum of a function of the nodes in a node set. For example, consider the case in which you have a set of positive and negative numbers and you need the minimum of the square of their value. Although hacking the previous implementations to do the squaring is simple, a single reusable implementation is more desirable. Chapter 14 revisits this problem and describes several alternatives for creating generic solutions.
Three types of averages are used by statisticians: the mean (layperson’s average), the median, and the mode.
The mean is trivial—simply sum using Recipe 3.6 and divide by the count.
The median is the number that falls in the middle of the set of numbers when they are sorted. If the count is even, then the mean of the two middle numbers is generally taken:
<xsl:template name="ckbk:median"> <xsl:param name="nodes" select="/.."/> <xsl:variable name="count" select="count($nodes)"/> <xsl:variable name="middle" select="ceiling($count div 2)"/> <xsl:variable name="even" select="not($count mod 2)"/> <xsl:variable name="m1"> <xsl:for-each select="$nodes"> <xsl:sort data-type="number"/> <xsl:if test="position() = $middle"> <xsl:value-of select=". + ($even * ./following-sibling::*[1])"/> </xsl:if> </xsl:for-each> </xsl:variable> <!-- The median --> <xsl:value-of select="$m1 div ($even + 1)"/> </xsl:template>
Handling the even case relies on the Boolean-to-number conversion
trick used in several other examples in this book. If the number of
nodes is odd, $m1
ends up being equal to the
middle node, and you divide by 1 to get the answer. On the other
hand, if the number of nodes is odd, $m1
ends up
being the sum of the two middle nodes, and you divide by two to get
the answer.
The mode is the most frequently occurring element(s) in a set of elements that need not be numbers. If identical nodes compare with equality on their string values, then the following solution does the trick:
<xsl:template name="ckbk:mode"> <xsl:param name="nodes" select="/.."/> <xsl:param name="max" select="0"/> <xsl:param name="mode" select="/.."/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:copy-of select="$mode"/> </xsl:when> <xsl:otherwise> <xsl:variable name="first" select="$nodes[1]"/> <xsl:variable name="try" select="$nodes[. = $first]"/>
<xsl:variable name="count" select="count($try)"/> <!-- Recurse with nodes not equal to first --> <xsl:call-template name="ckbk:mode"> <xsl:with-param name="nodes" select="$nodes[not(. = $first)]"/>
<!-- If we have found a node that is more frequent then pass the count otherwise pass the old max count --> <xsl:with-param name="max" select="($count > $max) * $count + not($count > $max) * $max"/> <!-- Compute the new mode as ... --> <xsl:with-param name="mode"> <xsl:choose> <!-- the first element in try if we found a new max --> <xsl:when test="$count > $max"> <xsl:copy-of select="$try[1]"/> </xsl:when> <!-- the old mode union the first element in try if we found an equivalent count to current max --> <xsl:when test="$count = $max"> <!-- Caution: you will need to convert $mode to a --> <!-- node set if you are using a version of XSLT --> <!-- that does not convert automatically --> <xsl:copy-of select="$mode | $try[1]"/> </xsl:when> <!-- othewise the old mode stays the same --> <xsl:otherwise> <xsl:copy-of select="$mode"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
If not, then replace the comparisons with an appropriate test. For
example, if equality is contingent on an attribute called
age
, the test would be ./@age =
$first/@age
.
The variance and standard deviation are common statistical measures
of dispersion or the spread in the values about the average. The
easiest way to compute a variance is to obtain three values:
sum
= the sum of the numbers,
sum-sq
= the sum of each number squared, and
count
= the size of the set of numbers. The
variance is then (sum-sq
-
sum2
/
count)
/
count
-
1
.
You can compute them all in one shot with the following
tail-recursive template:
<xsl:template name="ckbk:variance"> <xsl:param name="nodes" select="/.."/> <xsl:param name="sum" select="0"/> <xsl:param name="sum-sq" select="0"/> <xsl:param name="count" select="0"/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:value-of select="($sum-sq - ($sum * $sum) div $count) div ($count - 1)"/> </xsl:when> <xsl:otherwise> <xsl:variable name="value" select="$nodes[1]"/> <xsl:call-template name="ckbk:variance"> <xsl:with-param name="nodes" select="$nodes[position() != 1]"/> <xsl:with-param name="sum" select="$sum + $value"/> <xsl:with-param name="sum-sq" select="$sum-sq + ($value * $value)"/> <xsl:with-param name="count" select="$count + 1"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
You may recognize this template as a variation of
ckbk:sum
that was extended to compute the other
two components that comprise the variance calculation. As such, an
XSLT implementation without support for tail recursion runs into
trouble on large sets. In that case, you must take an alternate
piecewise strategy based on the standard definition of variance:
(mean - xi)2 /
(count - 1). First, compute the mean by using the divide-and-conquer
or batch forms of sum and diving by the count. Then use a
divide-and-conquer or batch template that computes the sum of the
squares of the difference between the mean and each number. Finally,
divide the result by count - 1.
Once you can compute the variance, the standard deviation follows as the square root of the variance. See Recipe 3.5 for square root.
<-- Median --> <xsl:function name="ckbk:median"> <xsl:param name="nodes" as="xs:double*" /> <xsl:variable name="count" select="count($nodes)"/> <xsl:variable name="middle" select="ceiling($count div 2)"/> <xsl:variable name="sorted" as="xs:double*"> <xsl:perform-sort select="$nodes"> <xsl:sort data-type="number"/> </xsl:perform-sort> </xsl:variable> <-- Mode --> <xsl:function name="ckbk:mode" as="item()*"> <xsl:param name="nodes" as="item()*"/> <!-- First locate the distinct values --> <xsl:variable name="distinct" select="distinct-values($nodes)" as="item()*"/> <!--Get a sequence of occurrence counts of the distinct values --> <xsl:variable name="counts" select="for $i in $distinct return count($nodes[. = $i])" as="xs:integer*"/> <!--Find the max of the counts --> <xsl:variable name="max" select="max($counts)" as="xs:integer?"/> <!-- Now return those values that have the max count --> <xsl:sequence select="$distinct[position() = index-of($counts,$max)]"/> </xsl:function> <-- Variance --> <xsl:function name="ckbk:variance" as="xs:double"> <xsl:param name="nodes" as="xs:double*"/> <xsl:variable name="sum" select="sum($nodes)"/> <xsl:variable name="count" select="count($nodes)"/> <xsl:sequence select="if ($count lt 2) then 0 else sum(for $i in $nodes return $i * $i) - $sum * $sum) div ($count * $count - $count)"/> </xsl:function>
Statistical functions are common tools for analyzing numerical data, and these templates can be a useful addition to your toolkit. However, XSLT was never intended as a tool for statistical analysis. An alternate approach would use XSLT as a front end for converting XML data to comma- or tab-delimited data and then import this data into a spreadsheet or statistics package.
Again you can see that the 2.0 solutions are much simpler. However, even more enlightening is that the 2.0 implementation of mode was far easier to derive, code, and debug than the 1.0 solution. I recall spending at least a few hours coming up with the 1.0 solution while writing the first edition of this book and probably an hour more debugging it. The corresponding 2.0 solution took about 15 minutes once I realized how I could take advantage of the distinct-values function.
If you know the formula for permutations of size r is N! / r! and you know that the formula for combinations is N! / r! * (N-r)!, then you might disregard this example; this book already gave an example for factorial. However, since factorials get very big quickly, you need to be a little crafty to get the best bang for your calculating buck:
<xsl:template name="ckbk:P"> <xsl:param name="n" select="1"/> <xsl:param name="r" select="1"/> <xsl:choose> <xsl:when test="$n < 0 or $r < 0">NaN</xsl:when> <xsl:when test="$n = 0">0</xsl:when> <xsl:otherwise> <xsl:call-template name="prod-range"> <xsl:with-param name="start" select="$r + 1"/> <xsl:with-param name="end" select="$n"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="ckbk:C"> <xsl:param name="n" select="1"/> <xsl:param name="r" select="1"/> <xsl:choose> <xsl:when test="$n < 0 or $r < 0">NaN</xsl:when> <xsl:when test="$n = 0">0</xsl:when> <xsl:otherwise> <xsl:variable name="min" select="($r <= $n - $r) * $r + ($r > $n - $r) * $n - $r"/> <xsl:variable name="max" select="($r >= $n - $r) * $r + ($r < $n - $r) * $n - $r"/> <xsl:variable name="numerator"> <xsl:call-template name="prod-range"> <xsl:with-param name="start" select="$max + 1"/> <xsl:with-param name="end" select="$n"/> </xsl:call-template> </xsl:variable> <xsl:variable name="denominator"> <xsl:call-template name="ckbk:fact"> <xsl:with-param name="number" select="$min"/> </xsl:call-template> </xsl:variable> <xsl:value-of select="$numerator div $denominator"/> </xsl:otherwise> </xsl:choose> </xsl:template>
<xsl:function name="ckbk:P" as="xs:decimal"> <xsl:param name="r" as="xs:integer"/> <xsl:param name="n" as="xs:integer"/> <xsl:sequence select="if ($n eq 0) then 0 else ckbk:prod-range($r + 1, $n)"/> </xsl:function> <xsl:function name="ckbk:C" as="xs:decimal"> <xsl:param name="r" as="xs:integer"/> <xsl:param name="n" as="xs:integer"/> <xsl:variable name="min" select="min( ($r, $n - $r) )" as="xs:integer"/> <xsl:variable name="max" select="max( ($r, $n - $r) )" as="xs:integer"/> <xsl:sequence select="if ($n eq 0) then 0 else ckbk:prod-range($max + 1, $n) div ckbk:factorial($min)"/> </xsl:function>
The solutions are designed to reduce the number of multiplications;
if you divide one factorial by a smaller factorial, then the smaller
factorial effectively cancels out that many multiplications from the
larger. Hence, it is better to implement such functions using
prod-range
(Recipe 3.5)
rather than factorial. The combinatorial is slightly more complex
because you want
to cancel out the large of r and (n -
r).
You want to treat numbers as bit masks even though XSLT does not have integers or associated bitwise operators.
The following solution works on 16-bit numbers, but can easily be extended up to 32:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" id="bittesting"> <!--powers of two--> <xsl:variable name="bit15" select="32768"/> <xsl:variable name="bit14" select="16384"/> <xsl:variable name="bit13" select="8192"/> <xsl:variable name="bit12" select="4096"/> <xsl:variable name="bit11" select="2048"/> <xsl:variable name="bit10" select="1024"/> <xsl:variable name="bit9" select="512"/> <xsl:variable name="bit8" select="256"/> <xsl:variable name="bit7" select="128"/> <xsl:variable name="bit6" select="64"/> <xsl:variable name="bit5" select="32"/> <xsl:variable name="bit4" select="16"/> <xsl:variable name="bit3" select="8"/> <xsl:variable name="bit2" select="4"/> <xsl:variable name="bit1" select="2"/> <xsl:variable name="bit0" select="1"/> <xsl:template name="bitTest"> <xsl:param name="num"/> <xsl:param name="bit" select="$bit0"/> <xsl:choose> <xsl:when test="( $num mod ( $bit * 2 ) ) - ( $num mod ( $bit ) )">1</xsl:when> <xsl:otherwise>0</xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="bitAnd"> <xsl:param name="num1"/> <xsl:param name="num2"/> <xsl:param name="result" select="0"/> <xsl:param name="test" select="$bit15"/> <xsl:variable name="nextN1" select="($num1 >= $test) * ($num1 - $test) + not($num1 >= $test) * $num1"/> <xsl:variable name="nextN2" select="($num2 >= $test) * ($num2 - $test) + not($num2 >= $test) * $num2"/> <xsl:choose> <xsl:when test="$test < 1"> <xsl:value-of select="$result"/> </xsl:when> <xsl:when test="$num1 >= $test and $num2 >= $test"> <xsl:call-template name="bitAnd"> <xsl:with-param name="num1" select="$nextN1"/> <xsl:with-param name="num2" select="$nextN2"/> <xsl:with-param name="result" select="$result + $test"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="bitAnd"> <xsl:with-param name="num1" select="$nextN1"/> <xsl:with-param name="num2" select="$nextN2"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="bitOr"> <xsl:param name="num1"/> <xsl:param name="num2"/> <xsl:param name="result" select="0"/> <xsl:param name="test" select="$bit15"/> <xsl:variable name="nextN1" select="($num1 >= $test) * ($num1 - $test) + not($num1 >= $test) * $num1"/> <xsl:variable name="nextN2" select="($num2 >= $test) * ($num2 - $test) + not($num2 >= $test) * $num2"/> <xsl:choose> <xsl:when test="$test < 1"> <xsl:value-of select="$result"/> </xsl:when> <xsl:when test="$num1 >= $test or $num2 >= $test"> <xsl:call-template name="bitOr"> <xsl:with-param name="num1" select="$nextN1"/> <xsl:with-param name="num2" select="$nextN2"/> <xsl:with-param name="result" select="$result + $test"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="bitOr"> <xsl:with-param name="num1" select="$nextN1"/> <xsl:with-param name="num2" select="$nextN2"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="bitXor"> <xsl:param name="num1"/> <xsl:param name="num2"/> <xsl:param name="result" select="0"/> <xsl:param name="test" select="$bit15"/> <xsl:variable name="nextN1" select="($num1 >= $test) * ($num1 - $test) + not($num1 >= $test) * $num1"/> <xsl:variable name="nextN2" select="($num2 >= $test) * ($num2 - $test) + not($num2 >= $test) * $num2"/> <xsl:choose> <xsl:when test="$test < 1"> <xsl:value-of select="$result"/> </xsl:when> <xsl:when test="$num1 >= $test and not($num2 >= $test) or not($num1 >= $test) and $num2 >= $test"> <xsl:call-template name="bitXor"> <xsl:with-param name="num1" select="$nextN1"/> <xsl:with-param name="num2" select="$nextN2"/> <xsl:with-param name="result" select="$result + $test"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="bitXor"> <xsl:with-param name="num1" select="$nextN1"/> <xsl:with-param name="num2" select="$nextN2"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="bitNot"> <xsl:param name="num"/> <xsl:param name="result" select="0"/> <xsl:param name="test" select="$bit15"/> <xsl:choose> <xsl:when test="$test < 1"> <xsl:value-of select="$result"/> </xsl:when> <xsl:when test="$num >= $test"> <xsl:call-template name="bitNot"> <xsl:with-param name="num" select="$num - $test"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="bitNot"> <xsl:with-param name="num" select="$num"/> <xsl:with-param name="result" select="$result + $test"/> <xsl:with-param name="test" select="$test div 2"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet>
This solution for testing bits ( bitTest template) based on modulo arithmetic was discussed at XML DevCon London in February 2001. We implemented the bitwise logical operations recursively using comparison and subtraction, but you could also use division and mod, as shown in the following code:
<xsl:template name="bitAnd"> <xsl:param name="num1"/> <xsl:param name="num2"/> <xsl:param name="result" select="0"/> <xsl:param name="pow2" select="$bit0"/> <xsl:choose> <xsl:when test="$num1 < 1 or $num2 < 1"> <xsl:value-of select="$result"/> </xsl:when> <xsl:when test="$num1 mod 2 and $num2 mod 2"> <xsl:call-template name="bitAnd"> <xsl:with-param name="num1" select="floor($num1 div 2)"/> <xsl:with-param name="num2" select="floor($num2 div 2)"/> <xsl:with-param name="result" select="$result + $pow2"/> <xsl:with-param name="pow2" select="$pow2 * 2"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="bitAnd"> <xsl:with-param name="num1" select="floor($num1 div 2)"/> <xsl:with-param name="num2" select="floor($num2 div 2)"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="pow2" select="$pow2 * 2"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="bitOr"> <xsl:param name="num1"/> <xsl:param name="num2"/> <xsl:param name="result" select="0"/> <xsl:param name="pow2" select="$bit0"/> <xsl:choose> <xsl:when test="$num1 < 1 and $num2 < 1"> <xsl:value-of select="$result"/> </xsl:when> <xsl:when test="boolean($num1 mod 2) or boolean($num2 mod 2)"> <xsl:call-template name="bitOr"> <xsl:with-param name="num1" select="floor($num1 div 2)"/> <xsl:with-param name="num2" select="floor($num2 div 2)"/> <xsl:with-param name="result" select="$result + $pow2"/> <xsl:with-param name="pow2" select="$pow2 * 2"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="bitOr"> <xsl:with-param name="num1" select="floor($num1 div 2)"/> <xsl:with-param name="num2" select="floor($num2 div 2)"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="pow2" select="$pow2 * 2"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="bitXor"> <xsl:param name="num1"/> <xsl:param name="num2"/> <xsl:param name="result" select="0"/> <xsl:param name="pow2" select="$bit0"/> <xsl:choose> <xsl:when test="$num1 < 1 and $num2 < 1"> <xsl:value-of select="$result"/> </xsl:when> <xsl:when test="$num1 mod 2 + $num2 mod 2 = 1"> <xsl:call-template name="bitXor"> <xsl:with-param name="num1" select="floor($num1 div 2)"/> <xsl:with-param name="num2" select="floor($num2 div 2)"/> <xsl:with-param name="result" select="$result + $pow2"/> <xsl:with-param name="pow2" select="$pow2 * 2"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="bitXor"> <xsl:with-param name="num1" select="floor($num1 div 2)"/> <xsl:with-param name="num2" select="floor($num2 div 2)"/> <xsl:with-param name="result" select="$result"/> <xsl:with-param name="pow2" select="$pow2 * 2"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
[1] The EXSLT implementations shown are as they existed at the time of this writing. Naturally, these may be improved over time or deprecated as new versions of XPath and XSLT emerge.
[2] The math is beyond the scope of this book. See Knuth, D.E. The Art of Computer Programming, Vol. 1, p. 24 (Addison Wesley, 1973) for details.
[3] Despite the fact that formal proofs of algorithmic correctness are important, I find them tedious and boring, so you will not see any in this book. I prefer intuition and testing my code. Chapter 12 shows how the homoiconic nature (in which the program form is the same as the data form) of XSLT greatly simplifies testing. A real computer scientist would prove everything by induction and not need to test.