The EXSLT functions that perform these operations are
math:min
, math:max
,
math:lowest
, and math:highest
.
min
and
max
find the value of the node with minimum and
maximum numerical value, respectively. EXSLT defines
math:min
as follows:
The minimum value is defined as follows. The node set passed as an argument is sorted in ascending order as it would be by xsl:sort with a data type of number. The minimum is the result of converting the string value of the first node in this sorted list to a number using the number function.
If the node set is empty, or if the result of converting the string values of any of the nodes to a number is NaN, then NaN is returned.
math:max
is defined similarly. EXSLT provides pure
XSLT implementations that are literal implementations of this
definition, as shown in Example 2-9.
Example 2-9. EXSLT min and max implement directly from the definition
<xsl:template name="math:min"> <xsl:param name="nodes" select="/.." /> <xsl:choose> <xsl:when test="not($nodes)">NaN</xsl:when> <xsl:otherwise> <xsl:for-each select="$nodes"> <xsl:sort data-type="number" /> <xsl:if test="position( ) = 1"> <xsl:value-of select="number(.)" /> </xsl:if> </xsl:for-each> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="math:max"> <xsl:param name="nodes" select="/.." /> <xsl:choose> <xsl:when test="not($nodes)">NaN</xsl:when> <xsl:otherwise> <xsl:for-each select="$nodes"> <xsl:sort data-type="number" order="descending" /> <xsl:if test="position( ) = 1"> <xsl:value-of select="number(.)" /> </xsl:if> </xsl:for-each> </xsl:otherwise> </xsl:choose> </xsl:template>
You may be scratching your head over the default value for the nodes
parameter (select="/
.."
). This
is simply an idiomatic way to initialize nodes to an empty node set
(i.e., the parent of the root is empty by definition).
Although the definitions of math:min
and
math:max
use xsl:sort
, the
implementations need not, so results that are more efficient are
possible provided your XSLT processor supports tail recursion (see
Example 2-10).
Example 2-10. min and max implemented with divide and conquer
<xsl:template name="math:max"> <xsl:param name="nodes" select="/.."/> <xsl:param name="max"/> <xsl:variable name="count" select="count($nodes)"/> <xsl:variable name="aNode" select="$nodes[ceiling($count div 2)]"/> <xsl:choose> <xsl:when test="not($count)"> <xsl:value-of select="number($max)"/> </xsl:when> <xsl:when test="number($aNode) != number($aNode)"> <xsl:value-of select="number($aNode)"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="math:max"> <xsl:with-param name="nodes" select="$nodes[not(. <= number($aNode))]"/> <xsl:with-param name="max"> <xsl:choose> <xsl:when test="not($max) or $aNode > $max"> <xsl:value-of select="$aNode"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$max"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template name="math:min"> <xsl:param name="nodes" select="/.."/> <xsl:param name="min"/> <xsl:variable name="count" select="count($nodes)"/> <xsl:variable name="aNode" select="$nodes[ceiling($count div 2)]"/> <xsl:choose> <xsl:when test="not($count)"> <xsl:value-of select="number($min)"/> </xsl:when> <xsl:when test="number($aNode) != number($aNode)"> <xsl:value-of select="number($aNode)"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="math:min"> <xsl:with-param name="nodes" select="$nodes[not(. >= number($aNode))]"/> <xsl:with-param name="min"> <xsl:choose> <xsl:when test="not($min) or $aNode < $min"> <xsl:value-of select="$aNode"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$min"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Typically, the preceding implementations are faster than the version
using xsl:sort
. In some degenerate cases, they are
likely to be slower. The reason is that efficiency hinges on removing
half the nodes from consideration (on average) at each recursive
step. One can imagine a scenario in which at each pass
aNode
is the minimum node (in the case of
math:max
) or the maximum remaining node (in the
case of math:min
). If this were to occur, each
pass would remove only one node, resulting in poor performance.
Luckily, data tends to come in two configurations: presorted and
random. In both cases, these implementations should hold their own.
While you had to look out for non-numeric data explicitly,
EXSLT’s implementations let
xsl:sort
take care of this. The XSLT standard
mandates that non-numeric data be placed up front by sort when
data-type='number'
.
Don’t be tempted to write
not(number($var))
to test for
NaN
! I often catch myself doing this because it
“sounds” correct. The number
function does not test for a number; instead, it
attempts to convert its argument to a number.
This is not what you want—this test will conclude 0 is not a
number due to the conversion of 0 to false
. The
correct test is number($var) != number($var)
. This
test works because NaN
is never equal to
NaN
, but any number is always equal to itself. Do
not be tempted to shorten this idiom to number($var) != $var
. Doing so works most of the time, but if
$var
is an empty node set, it will fail. If you
prefer, a more direct approach string(number($var)) = 'NaN'
also works.
EXSLT defines math:lowest
as follows.
The
math:lowest
function returns the nodes in the node set whose value is the minimum value for the node set. The minimum value for the node set is the same as the value as calculated by math:min. A node has this minimum value if the result of converting its string value to a number as if by the number function is equal to the minimum value, where the equality comparison is defined as a numerical comparison using the = operator.If any of the nodes in the node set has a non-numeric value, the math:min function will return
NaN
. The definition numeric comparisons entails thatNaN != NaN
. Therefore if any of the nodes in the node set has a non-numeric value,math:lowest
will return an empty node set.
The EXSLT implementation is literally based on this definition and might not be very efficient:
<xsl:template name="math:lowest"> <xsl:param name="nodes" select="/.." /> <xsl:if test="$nodes and not($nodes[number(.) != number(.)])"> <xsl:variable name="min"> <xsl:for-each select="$nodes"> <xsl:sort data-type="number" /> <xsl:if test="position( ) = 1"> <xsl:value-of select="number(.)" /> </xsl:if> </xsl:for-each> </xsl:variable> <xsl:copy-of select="$nodes[. = $min]" /> </xsl:if> </xsl:template>
The xsl:if
test scans all nodes to handle cases
when a non-numeric is present. It then sorts to find the
min
and finally collects all nodes with that
min
. Example 2-11 reuses
math:min
to do the same without needing to sort.
Example 2-11. Lowest implemented by reusing math:min
<xsl:template name="math:lowest"> <xsl:param name="nodes" select="/.."/> <xsl:variable name="min"> <xsl:call-template name="math:min"> <xsl:with-param name="nodes" select="$nodes"/> </xsl:call-template> </xsl:variable> <xsl:choose> <xsl:when test="number($min) = number($min)"> <xsl:copy-of select="$nodes[. = $min]"/> </xsl:when> </xsl:choose> </xsl:template>
Finally, you can implement math:lowest
with only
one pass over the nodes if you are willing to forego reuse of
math:min
(see Example 2-12).
Example 2-12. Lowest implemented without reuse of math:min
<xsl:template name="math:lowest"> <xsl:param name="nodes" select="/.."/> <xsl:param name="lowest" select="/.."/> <xsl:variable name="index" select="ceiling(count($nodes) div 2)"/> <xsl:variable name="aNode" select="$nodes[$index]"/> <xsl:choose> <xsl:when test="not($index)"> <xsl:copy-of select="$lowest"/> </xsl:when> <xsl:when test="number($aNode) != number($aNode)"/> <xsl:otherwise> <xsl:choose> <xsl:when test="not($lowest) or $aNode < $lowest"> <xsl:call-template name="math:lowest"> <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/> <xsl:with-param name="lowest" select="$nodes[. = $aNode]"/> </xsl:call-template> </xsl:when> <xsl:when test="$aNode = $lowest"> <xsl:call-template name="math:lowest"> <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/> <xsl:with-param name="lowest" select="$lowest|$nodes[$index]"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:call-template name="math:lowest"> <xsl:with-param name="nodes" select="$nodes[not(. >= $aNode)]"/> <xsl:with-param name="lowest" select="$lowest"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template>
Interestingly, this implementation does worse, probably because of
the additional copying that occurs. In performance tests on 10,000
data points using various distributions of data (sorted, reverse
sorted, semirandom, and random), the
math:min
-based implementation beat the
xsl:sort
-based implementation by about 40% on
average (often better). The recursive implementation that did not use
math:min
was 24% slower than the one that did.
The math:highest
definition and implementations
follow directly from inverting the relational logic of
math:lowest
, so I will not discuss them here.
The minimum and maximum values of a node set can be determined by the
simple XPath expressions <xsl:value-of select="$nodes[not($nodes < .)]"/>
and
<xsl:value-of select="$nodes[not($nodes > .)]"/>
. In English, the first says,
“Select all nodes for which there is no node less
than its value,” and the second says,
“Select all nodes for which there is no node greater
than its value.”
Although very simple, these expressions have
O(N2) performance, where N is the number
of nodes. Therefore, unless you are certain that the number of nodes
is small, avoid this shortcut if you can. Occasionally, you are
forced to use it because, for example, you have to find the
min
/max
within the select
attribute of xsl:sort
or the
use
attribute of xsl:key
(for
which you cannot call a template).
In another XSLT publication, the following recursive implementation
for finding minimums is described as more efficient than one using
xsl:sort
:
<xsl:template name="math:min"> <xsl:param name="nodes" select="/.."/> <xsl:param name="min" select="number('NaN')"/> <xsl:choose> <xsl:when test="not($nodes)"> <xsl:value-of select="number($min)"/> </xsl:when> <xsl:otherwise> <xsl:variable name="aNode" select="$nodes[1]"/> <xsl:call-template name="math:min"> <xsl:with-param name="nodes" select="$nodes[position( ) > 1]"/> <xsl:with-param name="min"> <xsl:choose> <xsl:when test="$aNode < $min or string($min) = 'NaN'"> <xsl:value-of select="$aNode"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$min"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
This is simply not the case on any XSLT implementation I tested, and
I can’t imagine that it ever could be. The reason
this is much more likely to be slow is because only one node is
removed from consideration at each step. Even with tail recursion,
there will be a lot of copying of nodes. It is easy to be fooled into
thinking that this recursive solution is as efficient as the
iterative-indexing solution you might come up with in C or Java.
However, indexing is not the same as creating a brand-new node set
with the first item removed, as is the case with
$nodes[position( ) > 1]
.
In many cases when you need to find the minimum of a data set, you end up also needing the maximum. It would be nice to have a function handy that gives two for the price of one. The following will do so for a slight increase in complexity:
<xsl:template name="math:min-max"> <xsl:param name="nodes" select="/.."/> <xsl:param name="nodes-for-max" select="$nodes"/> <xsl:param name="min"/> <xsl:param name="max"/> <xsl:variable name="count1" select="count($nodes)"/> <xsl:variable name="aNode1" select="$nodes[ceiling($count1 div 2)]"/> <xsl:variable name="count2" select="count($nodes-for-max)"/> <xsl:variable name="aNode2" select="$nodes-for-max[ceiling($count2 div 2)]"/> <xsl:choose> <xsl:when test="not($count1) and not($count2)"> <xsl:value-of select="concat(number($min),',',number($max))"/> </xsl:when> <xsl:when test="number($aNode1) != number($aNode1) and number($aNode2) != number($aNode2)"> <xsl:value-of select="concat(number($aNode1),',',number($aNode2))"/> </xsl:when> <xsl:otherwise> <xsl:call-template name="math:min-max"> <xsl:with-param name="nodes" select="$nodes[not(. >= number($aNode1))]"/> <xsl:with-param name="nodes-for-max" select="$nodes-for-max[not(. <= number($aNode2))]"/> <xsl:with-param name="min"> <xsl:choose> <xsl:when test="not($min) or $aNode1 < $min"> <xsl:value-of select="$aNode1"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$min"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> <xsl:with-param name="max"> <xsl:choose> <xsl:when test="not($max) or $aNode2 > $max"> <xsl:value-of select="$aNode2"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="$max"/> </xsl:otherwise> </xsl:choose> </xsl:with-param> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
Testing shows that this approach continues to outperform a sort-based solution.
When considering minima and maxima, only one special case was addressed: when the nodes are literally the numbers you must process. A more general problem involves finding the minimum or maximum of a function of the nodes in a node set. For example, consider the case in which you have a set of positive and negative numbers and you need the minimum of the square of their value. Although hacking the previous implementations to do the squaring is simple, a single reusable implementation is more desirable. Chapter 14 revisits this problem and describes several alternatives for creating generic solutions.