Using recursion, you can emulate a reverse search with a search for
the last occurrence of substr
. Using this
technique, you can create a substring-before-last
and a substring-after-last
.
<xsl:template name="substring-before-last"> <xsl:param name="input" /> <xsl:param name="substr" /> <xsl:if test="$substr and contains($input, $substr)"> <xsl:variable name="temp" select="substring-after($input, $substr)" /> <xsl:value-of select="substring-before($input, $substr)" /> <xsl:if test="contains($temp, $substr)"> <xsl:value-of select="$substr" /> <xsl:call-template name="substring-before-last"> <xsl:with-param name="input" select="$temp" /> <xsl:with-param name="substr" select="$substr" /> </xsl:call-template> </xsl:if> </xsl:if> </xsl:template> <xsl:template name="substring-after-last"> <xsl:param name="input"/> <xsl:param name="substr"/> <!-- Extract the string which comes after the first occurence --> <xsl:variable name="temp" select="substring-after($input,$substr)"/> <xsl:choose> <!-- If it still contains the search string the recursively process --> <xsl:when test="$substr and contains($temp,$substr)"> <xsl:call-template name="substring-after-last"> <xsl:with-param name="input" select="$temp"/> <xsl:with-param name="substr" select="$substr"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="$temp"/> </xsl:otherwise> </xsl:choose> </xsl:template>
Both XSLT string-searching functions
(substring-before
and
substring-after
)
begin searching at the start of the
string. Sometimes you need to search a string from the end. The
simplest way to do this in XSLT is to apply the built-in search
functions recursively until the last instance of the substring is
found.
There was a nasty “gotcha” in my
first attempt at these templates, which you should keep in mind when
working with recursive templates that search strings. Recall that
contains($anything,'')
will always return
true
! For this reason, I make sure that I also
test the existence of a non-null $substr
value in
the recursive invocations of substring-before-last
and substring-after-last
. Without these checks,
the code will go into an infinite loop for null search input or
overflow the stack on implementations that do not handle tail
recursion.
Another algorithm is divide and conquer.
The basic idea is to split the
string in half. If the search string is in the second half, then you
can discard the first half, thus turning the problem into a problem
half as large. This process repeats recursively. The tricky part is
when the search string is not in the second half because you may have
split the search string between the two halves. Here is a solution
for substring-before-last
:
<xsl:template name="str:substring-before-last"> <xsl:param name="input"/> <xsl:param name="substr"/> <xsl:variable name="mid" select="ceiling(string-length($input) div 2)"/> <xsl:variable name="temp1" select="substring($input,1, $mid)"/> <xsl:variable name="temp2" select="substring($input,$mid +1)"/> <xsl:choose> <xsl:when test="$temp2 and contains($temp2,$substr)"> <!-- search string is in second half so just append first half --> <!-- and recurse on second --> <xsl:value-of select="$temp1"/> <xsl:call-template name="str:substring-before-last"> <xsl:with-param name="input" select="$temp2"/> <xsl:with-param name="substr" select="$substr"/> </xsl:call-template> </xsl:when> <!--search string is in boundary so a simple substring-before --> <!-- will do the trick--> <xsl:when test="contains(substring($input, $mid - string-length($substr) +1), $substr)"> <xsl:value-of select="substring-before($input,$substr)"/> </xsl:when> <!--search string is in first half so throw away second half--> <xsl:when test="contains($temp1,$substr)"> <xsl:call-template name="str:substring-before-last"> <xsl:with-param name="input" select="$temp1"/> <xsl:with-param name="substr" select="$substr"/> </xsl:call-template> </xsl:when> <!-- No occurences of search string so we are done --> <xsl:otherwise/> </xsl:choose> </xsl:template>
As it turns out, divide and conquer is of little or no advantage unless you search large texts (roughly 4,000 characters). You might have a wrapper template that chooses the appropriate algorithm based on the length or switches from divide and conquer to the simpler algorithm when the subpart becomes small enough.