To change and to change for the better are two different things.
German proverb
One of the beauties of XML is that if you don’t like some XML document, you can change it. Since it is impossible to please everyone, transforming XML to XML is extremely common. However, you will not transform XML only to improve the structure of a poorly designed schema. Sometimes you need to merge disparate XML documents into a single document. At other times, you want to break up a large document into smaller subdocuments. You might also wish to preprocess a document to filter out only the relevant information, without changing its structure, before sending it off for further processing.
A simple but important tool in many XML-to-XML transformations is the identity transform. This tool is a stylesheet that copies an input document to an output document without changing it. This task may seem better suited to the operating system's copy operation, but as the following examples demonstrate, this simple stylesheet can be imported into other stylesheets to yield very common types of transformations with little added coding effort.
Example 8-1 shows the identity stylesheet. I actually prefer calling this stylesheet the copying stylesheet, and I call the techniques that utilize it the overriding copy idiom.
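The identity transform is usually written as a single template. The following is a minimal sketch of that usual form; it is assumed, not confirmed, to be equivalent to the copy.xslt file that the later stylesheets import.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match every attribute and every other kind of node -->
  <xsl:template match="node() | @*">
    <!-- Shallow-copy the current node ... -->
    <xsl:copy>
      <!-- ... then let template matching decide how its attributes
           and children are handled. Importing stylesheets override
           the copy by supplying more specific templates. -->
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the recursion goes through xsl:apply-templates rather than xsl:copy-of, any template in an importing stylesheet takes precedence over the copy for the nodes it matches.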
You have a document that encodes information with attributes, and you would like to use child elements instead.
This problem is tailor-made for what the introduction to this chapter calls the overriding copy idiom. This example transforms attributes to elements globally:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:template match="@*"> <xsl:element name="{local-name(.)}" namespace="{namespace-uri(..)}"> <xsl:value-of select="."/> </xsl:element> </xsl:template> </xsl:stylesheet>
The stylesheet works by overriding the copy behavior for attributes. It replaces the behavior with a template that converts an attribute into an element (of the same name) whose content is the attribute’s value. It also assumes that this new element should be in the same namespace as the attribute’s parent. If you prefer not to make assumptions, then use the following code:
<xsl:template match="@*">
  <xsl:variable name="namespace">
    <xsl:choose>
      <!-- Use the attribute's namespace, if there is one -->
      <xsl:when test="namespace-uri()">
        <xsl:value-of select="namespace-uri()"/>
      </xsl:when>
      <!-- Otherwise use the parent's namespace -->
      <xsl:otherwise>
        <xsl:value-of select="namespace-uri(..)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:element name="{name()}" namespace="{$namespace}">
    <xsl:value-of select="."/>
  </xsl:element>
</xsl:template>
You’ll often want to be selective when transforming attributes to elements (see Example 8-1 to Example 8-3).
<people which="MeAndMyFriends"> <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/> <person firstname="Mike" lastname="Palmieri" age="28" height="5.10"/> <person firstname="Vito" lastname="Palmieri" age="38" height="6.0"/> <person firstname="Vinny" lastname="Mari" age="37" height="5.8"/> </people>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:template match="person/@*"> <xsl:element name="{local-name(.)}" namespace="{namespace-uri(..)}"> <xsl:value-of select="."/> </xsl:element> </xsl:template> </xsl:stylesheet>
<people which="MeAndMyFriends"> <person> <firstname>Sal</firstname> <lastname>Mangano</lastname> <age>38</age> <height>5.75</height> </person> <person> <firstname>Mike</firstname> <lastname>Palmieri</lastname> <age>28</age> <height>5.10</height> </person> <person> <firstname>Vito</firstname> <lastname>Palmieri</lastname> <age>38</age> <height>6.0</height> </person> <person> <firstname>Vinny</firstname> <lastname>Mari</lastname> <age>37</age> <height>5.8</height> </person> </people>
In XSLT 2.0, the solution can be streamlined, but the recipe remains essentially the same. Here you can replace the awkward xsl:choose with an XPath 2.0 if-expression.
<xsl:template match="@*"> <xsl:variable name="namespace" select="if (namespace-uri()) then namespace-uri() else namespace-uri(..)"/> <xsl:element name="{name()}" namespace="{$namespace}"> <xsl:value-of select="." /> </xsl:element> </xsl:template>
This section and Recipe 8.2 address the problems that arise when a document designer makes a poor choice between encoding information in attributes versus elements. The attribute-versus-element decision is one of the most controversial aspects of document design.[1] These examples are helpful because they allow you to correct your own or others’ (perceived) mistakes.
You have a document that encodes information using child elements, and you would like to use attributes instead.
As with Recipe 8.1, you can use the overriding copy idiom. However, when transforming elements to attributes, you must selectively determine where the transformation will be applied. This is because the idea of transforming all elements to attributes is nonsensical. The following stylesheet reverses the attribute-to-element transformation we performed in Recipe 8.1:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8"/> <xsl:template match="person"> <xsl:copy> <xsl:for-each select="*"> <xsl:attribute name="{local-name(.)}"> <xsl:value-of select="."/> </xsl:attribute> </xsl:for-each> </xsl:copy> </xsl:template> </xsl:stylesheet>
Converting from elements to attributes is not always as straightforward as transforming in the opposite direction. If the elements being converted to attributes have attributes themselves, you must decide what will become of them. In the preceding solution, they would be lost. One alternative is to promote them to the new parent:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="copy.xslt"/>
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>
<xsl:template match="person">
<xsl:copy>
<xsl:for-each select="*">
<xsl:attribute name="{local-name(.)}">
<xsl:value-of select="."/>
</xsl:attribute>
<xsl:copy-of select="@*"/>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
However, this works only if all the attribute names in question are unique. If this is not the case, you will have to rename attributes, perhaps as follows:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8"/> <xsl:template match="person"> <xsl:copy> <xsl:for-each select="*"> <xsl:attribute name="{local-name(.)}"> <xsl:value-of select="."/> </xsl:attribute> <xsl:variable name="elem-name" select="local-name(.)"/> <xsl:for-each select="@*"> <xsl:attribute name="{concat($elem-name,'-',local-name(.))}"> <xsl:value-of select="."/> </xsl:attribute> </xsl:for-each> </xsl:for-each> </xsl:copy> </xsl:template> </xsl:stylesheet>
Another complication arises if the sibling elements do not have unique names, because in this case, they would clash upon becoming attributes. Another possible strategy is to create an attribute from an element only if the element does not have attributes or element children of its own, does not repeat in its parent element, and has parents without attributes:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="copy.xslt"/>
  <xsl:output method="xml" indent="yes" version="1.0" encoding="UTF-8"/>
  <!-- Match elements that are parents -->
  <xsl:template match="*[*]">
    <xsl:choose>
      <!-- Only convert children if this element has no attributes of its own -->
      <xsl:when test="not(@*)">
        <xsl:copy>
          <!-- Convert children to attributes if the child has no children
               or attributes and has a unique name among its siblings -->
          <xsl:for-each select="*">
            <xsl:choose>
              <xsl:when test="not(*) and not(@*) and
                              not(preceding-sibling::*[name() = name(current())]) and
                              not(following-sibling::*[name() = name(current())])">
                <xsl:attribute name="{local-name(.)}">
                  <xsl:value-of select="."/>
                </xsl:attribute>
              </xsl:when>
              <xsl:otherwise>
                <xsl:apply-templates select="."/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy>
          <!-- Select attributes explicitly so the element keeps them -->
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
In XSLT 2.0, you can partially simplify and speed up the 1.0 solution by utilizing xsl:for-each-group. The trick is to use group-by="name()" to determine if there are siblings with identical names. In addition, this solution promotes the attributes of the converted element to the parent, which you can do as well in the 1.0 solution:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="copy.xslt"/>
  <xsl:output method="xml" indent="yes" version="1.0" encoding="UTF-8"/>
  <!-- Match elements that are parents -->
  <xsl:template match="*[*]">
    <xsl:choose>
      <!-- Only convert children if this element has no attributes of its own -->
      <xsl:when test="not(@*)">
        <xsl:copy>
          <!-- Convert children to attributes if the child has no children
               and has a unique name among its siblings -->
          <xsl:for-each-group select="*" group-by="name()">
            <xsl:choose>
              <xsl:when test="not(*) and count(current-group()) eq 1">
                <xsl:attribute name="{local-name(.)}">
                  <xsl:value-of select="."/>
                </xsl:attribute>
                <!-- Copy attributes of child to parent element -->
                <xsl:copy-of select="@*"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
There is a limitation in both the 1.0 and 2.0 stylesheets in that they assume a certain kind of canonical structure where at a certain level, all elements that qualify to be converted to attributes appear before those that do not. Below is an example document that violates this assumption:
<E1> <E2> <e31>a</e31> <e32>b</e32> <e33>c</e33> </E2> <test>a</test> <E2> <e31>u</e31> <e32>v</e32> <e33>w</e33> </E2> <E2> <e31>x</e31> <e32>y</e32> <e33>z</e33> </E2> </E1>
Notice how the test element could, in theory, be converted to an attribute of E1. In fact, the stylesheet will attempt to do so. However, it will fail because of a constraint in XSLT that attributes can only be added to an element before any child nodes are added to it. If you need to deal with messy documents such as this, you can do so in two passes. During the first pass, do not actually convert the elements to attributes but rather copy them as elements tagged with a special attribute indicating they are eligible for conversion. Then on the second pass, convert all tagged elements to attributes of their parent first and then copy all other child elements to the parent unchanged.
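The second pass described above might be sketched as follows. This is only an illustration: it assumes the first pass marked each eligible element with a hypothetical tmp:convert="yes" attribute in a made-up namespace (urn:example:two-pass), and that copy.xslt is the identity stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tmp="urn:example:two-pass">

  <xsl:import href="copy.xslt"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Match any element with at least one child tagged by pass one.
       The tmp:convert marker is an assumption of this sketch. -->
  <xsl:template match="*[*/@tmp:convert = 'yes']">
    <xsl:copy>
      <!-- Attributes must be written before any child nodes -->
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="*[@tmp:convert = 'yes']">
        <xsl:attribute name="{local-name()}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:for-each>
      <!-- Then process everything that was not tagged, in document order -->
      <xsl:apply-templates select="node()[not(self::*[@tmp:convert = 'yes'])]"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because all attributes are emitted before the untagged children are processed, the position of a tagged element among its siblings no longer matters. Note that the temporary namespace declaration will likely survive in the output, since xsl:copy carries over the namespace nodes of the source element.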
If you need to rename a small number of attributes or elements, use a straightforward version of the overriding copy idiom, as shown in Example 8-4.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8"/> <xsl:template match="person"> <individual> <xsl:apply-templates select="@* | node()"/> </individual> </xsl:template> </xsl:stylesheet>
Or, alternatively, use xsl:element:
...
<xsl:template match="person">
  <xsl:element name="individual">
    <!-- Select attributes explicitly so they are carried over -->
    <xsl:apply-templates select="@* | node()"/>
  </xsl:element>
</xsl:template>
...
Renaming attributes is just as straightforward:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8"/> <xsl:template match="@lastname"> <xsl:attribute name="surname"> <xsl:value-of select="."/> </xsl:attribute> </xsl:template> </xsl:stylesheet>
Sometimes you need to re-namespace rather than rename, as shown in Example 8-5.
<foo:someElement xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"> <foo:aChild> <foo:aGrandChild/> <foo:aGrandChild> </foo:aGrandChild> </foo:aChild> </foo:someElement>
For each element in the foo namespace, create a new element in the bar namespace, as shown in Example 8-6 and Example 8-7.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo" xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="foo:*"> <xsl:element name="bar:{local-name()}"> <xsl:apply-templates/> </xsl:element> </xsl:template> </xsl:stylesheet>
Naming is an important skill that few software practitioners (including yours truly) have mastered.[2] Hence, you should know how to rename things when you don't get the names quite right on the first go.
If many elements or attributes need renaming, then you may want to use a generic table-driven approach, as shown in Example 8-8 to Example 8-10.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ren="http://www.ora.com/namespaces/rename">
  <xsl:import href="copy.xslt"/>
  <!-- Override in importing stylesheet -->
  <xsl:variable name="lookup" select="/.."/>
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="*">
    <xsl:choose>
      <!-- Use name() in both the test and the lookup so they always agree -->
      <xsl:when test="$lookup/ren:element[@from=name(current())]">
        <xsl:element name="{$lookup/ren:element[@from=name(current())]/@to}">
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-imports/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:choose>
      <xsl:when test="$lookup/ren:attribute[@from=name(current())]">
        <xsl:attribute name="{$lookup/ren:attribute[@from=name(current())]/@to}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-imports/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ren="http://www.ora.com/namespaces/rename"> <xsl:import href="TableDrivenRename.xslt"/> <!-- Load the lookup table. We define it locally but it can also come from an external file --> <xsl:variable name="lookup" select="document('')/*[ren:*]"/> <!-- Define the renaming rules --> <ren:element from="person" to="individual"/> <ren:attribute from="firstname" to="givenname"/> <ren:attribute from="lastname" to="surname"/> <ren:attribute from="age" to="yearsOld"/> </xsl:stylesheet>
<?xml version="1.0" encoding="UTF-8"?> <people which="MeAndMyFriends"> <individual givenname="Sal" surname="Mangano" yearsOld="38" height="5.75"/> <individual givenname="Mike" surname="Palmieri" yearsOld="28" height="5.10"/> <individual givenname="Vito" surname="Palmieri" yearsOld="38" height="6.0"/> <individual givenname="Vinny" surname="Mari" yearsOld="37" height="5.8"/> </people>
You can still use this approach if some elements or attributes need context-sensitive handling. For example, consider the following document fragment:
<clubs>
  <club name="The 500 Club">
    <members>
      <member name="Joe Smith">
        <position name="president"/>
      </member>
      <member name="Jill McFonald">
        <position name="treasurer"/>
      </member>
      <!-- ... -->
    </members>
  </club>
  <!-- ... -->
</clubs>
Suppose you want to change attribute @name to attribute @title, but only for position elements. If you use the table-driven approach, all elements containing a name attribute will be changed. The solution is to create a template that overrides the default behavior for all elements except position:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ren="http://www.ora.com/namespaces/rename">
  <xsl:import href="TableDrivenRename.xslt"/>
  <!-- Load the lookup table. We define it locally but it can also
       come from an external file -->
  <xsl:variable name="lookup" select="document('')/*[ren:*]"/>
  <!-- Define the renaming rules -->
  <ren:attribute from="name" to="title"/>
  <!-- OVERRIDE: Simply copy every name attribute that does not
       belong to a position element -->
  <xsl:template match="@name[not(parent::position)]">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
When re-namespacing using copy, the old namespace may stubbornly refuse to go away even when it is not needed. Consider the foo document again with an additional element from a doc namespace:
<foo:someElement xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"
    xmlns:doc="http://www.ora.com/XMLCookbook/namespaces/doc">
  <foo:aChild>
    <foo:aGrandChild/>
    <foo:aGrandChild>
      <doc:doc>This documentation should not be removed or altered in any way. </doc:doc>
    </foo:aGrandChild>
  </foo:aChild>
</foo:someElement>
If you apply the re-namespacing stylesheet to this document, the foo namespace is carried along with the doc element:
<bar:someElement xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar"> <bar:aChild> <bar:aGrandChild/> <bar:aGrandChild> <doc:doc xmlns:doc="http://www.ora.com/XMLCookbook/namespaces/doc" xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"> This documentation should not be removed or altered in any way. </doc:doc> </bar:aGrandChild> </bar:aChild> </bar:someElement>
This is because the doc element is processed by xsl:copy. In XSLT 1.0, both xsl:copy and xsl:copy-of always copy all namespaces associated with an element. In XSLT 2.0, both xsl:copy and xsl:copy-of have an optional attribute called copy-namespaces, which you can set to yes or no.
Since the doc element is enclosed in elements from the foo namespace, it has a foo namespace node, even though it is not directly visible in the input. To avoid copying this unwanted namespace, use xsl:element to make sure that elements are recreated, not copied:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo" xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <!-- For all elements create a new element with the same name and namespace --> <xsl:template match="*"> <xsl:element name="{name()}" namespace="{namespace-uri()}"> <xsl:apply-templates/> </xsl:element> </xsl:template> <xsl:template match="foo:*"> <xsl:element name="bar:{local-name()}"> <xsl:apply-templates/> </xsl:element> </xsl:template> </xsl:stylesheet>
You can even use this technique to strip all namespaces from a document:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="*"> <xsl:element name="{local-name()}"> <xsl:apply-templates/> </xsl:element> </xsl:template> </xsl:stylesheet>
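With an XSLT 2.0 processor, the copy-namespaces attribute mentioned earlier offers another way to keep the unwanted foo namespace out of the output. The following is a sketch rather than a tested solution; it inlines the identity template instead of importing copy.xslt so that the attribute can be set on xsl:copy.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"
    xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template that does not carry over in-scope namespaces.
       Namespaces needed for the element's own name and for its
       attributes are still declared by the processor. -->
  <xsl:template match="@* | node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Move every foo element into the bar namespace -->
  <xsl:template match="foo:*">
    <xsl:element name="bar:{local-name()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the document containing doc:doc, this should leave the doc element with only its own namespace declaration.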
You have two or more identically structured documents and you would like to merge them into a single document.
If the content of the documents is distinct or you are not concerned about duplicates, then the solution is simple:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="yes"/> <xsl:param name="doc2"/> <xsl:template match="/*"> <xsl:copy> <xsl:copy-of select="* | document($doc2)/*/*"/> </xsl:copy> </xsl:template> </xsl:stylesheet>
If duplicates exist among input documents but you want the output document to contain unique entries, you can use techniques discussed in Recipe 5.1 for removing duplicates. Consider the following two documents in Example 8-11 and Example 8-12.
<people which="MeAndMyFriends"> <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/> <person firstname="Mike" lastname="Palmieri" age="28" height="5.10"/> <person firstname="Vito" lastname="Palmieri" age="38" height="6.0"/> <person firstname="Vinny" lastname="Mari" age="37" height="5.8"/> </people>
<people which="MeAndMyCoWorkers"> <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/> <person firstname="Al" lastname="Zehtooney" age="33" height="5.3"/> <person firstname="Brad" lastname="York" age="38" height="6.0"/> <person firstname="Charles" lastname="Xavier" age="32" height="5.8"/> </people>
This stylesheet merges the documents and removes the duplicate element using xsl:sort and the exsl:node-set extension:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common">
  <xsl:import href="exsl.xsl" />
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:param name="doc2"/>
  <!-- Here we introduce a 'key' attribute to make removing duplicates easier -->
  <xsl:variable name="all">
    <xsl:for-each select="/*/person | document($doc2)/*/person">
      <xsl:sort select="concat(@lastname,@firstname)"/>
      <person key="{concat(@lastname, @firstname)}">
        <xsl:copy-of select="@* | node()" />
      </person>
    </xsl:for-each>
  </xsl:variable>
  <xsl:template match="/">
    <people>
      <!-- After sorting, duplicates are adjacent, so compare each
           person only with its immediate predecessor -->
      <xsl:for-each select="exsl:node-set($all)/person[not(@key = preceding-sibling::person[1]/@key)]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </people>
  </xsl:template>
</xsl:stylesheet>
Removing duplicates this way has three drawbacks. First, it alters the order of the elements, which might be undesirable. Second, it requires the use of the node-set extension in XSLT 1.0. Third, it is not generic in the sense that you must rewrite the entire stylesheet for every situation when you want a non-duplicating merge.
One way to address these problems uses xsl:key:
<!-- Stylesheet: merge-simple-using-key.xslt --> <!-- Import this stylesheet into another that defines the key --> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:merge="http:www.ora.com/XSLTCookbook/mnamespaces/merge"> <xsl:param name="doc2"/> <xsl:template match="/*"> <!--Copy the outermost element of the source document --> <xsl:copy> <!-- For each child in the source, determine if it should be copied to the destination based on its existence in the other document. --> <xsl:for-each select="*"> <!-- Call a template which determines a unique key value for this element. It must be defined in the including stylesheet. --> <xsl:variable name="key-value"> <xsl:call-template name="merge:key-value"/> </xsl:variable> <xsl:variable name="element" select="."/> <!--This for-each is simply to change context to the second document --> <xsl:for-each select="document($doc2)/*"> <!-- Use key as a mechanism for testing the presence of the element in the second document. The key should be defined by the including stylesheet --> <xsl:if test="not(key('merge:key', $key-value))"> <xsl:copy-of select="$element"/> </xsl:if> </xsl:for-each> </xsl:for-each> <!--Copy all elements in the second document --> <xsl:copy-of select="document($doc2)/*/*"/> </xsl:copy> </xsl:template> </xsl:stylesheet>
The following stylesheet includes the previous one and defines the key and a template to retrieve the key's value:
<!-- This stylesheet defines uniqueness of elements in terms of a key. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:merge="http:www.ora.com/XSLTCookbook/mnamespaces/merge">
  <xsl:include href="merge-simple-using-key.xslt"/>
  <!-- A person is uniquely defined by the concatenation of last and first names -->
  <xsl:key name="merge:key" match="person" use="concat(@lastname,@firstname)"/>
  <xsl:output method="xml" indent="yes"/>
  <!-- This template retrieves the key value for an element -->
  <xsl:template name="merge:key-value">
    <xsl:value-of select="concat(@lastname,@firstname)"/>
  </xsl:template>
</xsl:stylesheet>
A second way to merge and remove duplicates uses the value-based set operations discussed in Recipe 9.2. The solution is presented here; see that recipe for a full explanation. Example 8-13 and Example 8-14 show the stylesheets.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset"> <xsl:import href="../query/vset.ops.xslt"/> <xsl:output method="xml" indent="yes"/> <xsl:param name="doc2"/> <xsl:template match="/*"> <xsl:copy> <xsl:call-template name="vset:union"> <xsl:with-param name="nodes1" select="*"/> <xsl:with-param name="nodes2" select="document($doc2)/*/*"/> </xsl:call-template> </xsl:copy> </xsl:template> </xsl:stylesheet>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset"> <xsl:import href="merge-using-vset-union.xslt"/> <xsl:template match="person" mode="vset:element-equality"> <xsl:param name="other"/> <xsl:if test="concat(@lastname,@firstname) = concat($other/@lastname,$other/@firstname)"> <xsl:value-of select="true()"/> </xsl:if> </xsl:template> </xsl:stylesheet>
The vset:union-based solution involves less new code than the key-based solution; however, for large documents, the xsl:key-based solution is likely to be faster.
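If an XSLT 2.0 processor is available, grouping gives a third option that needs neither the node-set extension nor a separate key definition. The sketch below is not one of the solutions above; it swaps in xsl:for-each-group and assumes, as before, that a person is identified by last name plus first name.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <xsl:param name="doc2"/>

  <xsl:template match="/*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Group the people from both documents by identity. The comma
           operator keeps the source document's people first. -->
      <xsl:for-each-group
          select="(person, document($doc2)/*/person)"
          group-by="concat(@lastname, @firstname)">
        <!-- Keep only the first member of each group -->
        <xsl:copy-of select="current-group()[1]"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Groups are processed in order of first appearance, so the original ordering of the entries is preserved.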
Merging documents is often necessary when separate individuals or processes produce parts of the document. Merging is also necessary when reconstituting a very large document that was split up to be processed in parallel or because it was too cumbersome to handle as a whole.
The examples in this section address the simple case when just two documents are merged. If an arbitrary number of documents are merged, a mechanism is required to pass a list of documents into the stylesheet. One technique uses a parameter containing all filenames separated by spaces and employs a simple tokenizer (Recipe 2.9) to extract the names. Another technique passes all the filenames in the source document, as shown in Example 8-15 and Example 8-16.
<mergeDocs> <doc path="people1.xml"/> <doc path="people2.xml"/> <doc path="people3.xml"/> <doc path="people4.xml"/> </mergeDocs>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="yes"/> <xsl:variable name="docs" select="/*/doc"/> <xsl:template match="mergeDocs"> <xsl:apply-templates select="doc[1]"/> </xsl:template> <!--Match the first doc to create the topmost element --> <xsl:template match="doc"> <xsl:variable name="path" select="@path"/> <xsl:for-each select="document($path)/*"> <xsl:copy> <!-- Merge children of doc 1 --> <xsl:copy-of select="@* | *"/> <!--Loop over remaining docs to merge their children --> <xsl:for-each select="$docs[position() > 1]"> <xsl:copy-of select="document(@path)/*/*"/> </xsl:for-each> </xsl:copy> </xsl:for-each> </xsl:template> </xsl:stylesheet>
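The other technique mentioned above, passing the filenames in a single space-separated parameter, can be sketched in XSLT 1.0 with a small recursive template. The parameter name docs and the template name merge-remaining are made up for this illustration; this is not the tokenizer of Recipe 2.9.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: space-separated list of extra files -->
  <xsl:param name="docs" select="''"/>

  <xsl:template match="/*">
    <xsl:copy>
      <!-- Children of the source document come first -->
      <xsl:copy-of select="@* | *"/>
      <xsl:call-template name="merge-remaining">
        <xsl:with-param name="paths" select="normalize-space($docs)"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>

  <!-- Peel one filename off the front of the list and recurse -->
  <xsl:template name="merge-remaining">
    <xsl:param name="paths"/>
    <xsl:if test="$paths">
      <!-- Appending a space guarantees substring-before finds one -->
      <xsl:variable name="first"
          select="substring-before(concat($paths, ' '), ' ')"/>
      <xsl:copy-of select="document($first)/*/*"/>
      <xsl:call-template name="merge-remaining">
        <xsl:with-param name="paths" select="substring-after($paths, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because the filenames reach document() as strings, relative paths are resolved against the stylesheet's location rather than the source document's, which may or may not be what you want.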
You have two or more dissimilar documents, and you would like to merge them into a single document.
The process of merging dissimilar data can vary from application to application. Therefore, this chapter cannot present a single generic solution. Instead, it anticipates common ways for two dissimilar documents to be brought together and provides solutions for each case.
Incorporating a document as a subpart is the most trivial interpretation of this type of merge. The basic idea is to use xsl:copy-of to copy one document or document part into the appropriate part of a second document. The following example merges two documents into a container document that uses element names in the container as indications of what files to merge:
<MyNoteBook>
  <friends>
  </friends>
  <coworkers>
  </coworkers>
  <projects>
    <project>Replace mapML with XSLT engine using Xalan C++</project>
    <project>Figure out the meaning of life.</project>
    <project>Figure out where the dryer is hiding all those missing socks</project>
  </projects>
</MyNoteBook>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="copy.xslt"/>
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="friends | coworkers">
    <xsl:copy>
      <!-- The element name determines which file to pull in -->
      <xsl:variable name="file" select="concat(local-name(),'.xml')"/>
      <xsl:copy-of select="document($file)/*/*"/>
    </xsl:copy>
  </xsl:template>
  ...
</xsl:stylesheet>

<?xml version="1.0" encoding="UTF-8"?>
<MyNoteBook>
  <friends>
    <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
    <person firstname="Mike" lastname="Palmieri" age="28" height="5.10"/>
    <person firstname="Vito" lastname="Palmieri" age="38" height="6.0"/>
    <person firstname="Vinny" lastname="Mari" age="37" height="5.8"/>
  </friends>
  <coworkers>
    <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
    <person firstname="Al" lastname="Zehtooney" age="33" height="5.3"/>
    <person firstname="Brad" lastname="York" age="38" height="6.0"/>
    <person firstname="Charles" lastname="Xavier" age="32" height="5.8"/>
  </coworkers>
  <projects>
    <project>Replace mapML with XSLT engine using Xalan C++</project>
    <project>Figure out the meaning of life.</project>
    <project>Figure out where the dryer is hiding all those missing socks </project>
  </projects>
</MyNoteBook>
An interesting variation of this case is a document that signals the inline inclusion of another document. The W3C defines a standard way of doing this, called XInclude (http://www.w3.org/TR/xinclude/). You can implement a general-purpose XInclude processor in XSLT by extending copy.xslt:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="xi:include" xmlns:xi="http://www.w3.org/2001/XInclude"> <xsl:for-each select="document(@href)"> <xsl:apply-templates/> </xsl:for-each> </xsl:template> </xsl:stylesheet>
The xsl:for-each only changes the context to the included document. The xsl:apply-templates inside it then continues copying the included document's content.
A variation of simple inclusion combines elements that are children of common parent element types. Consider two biologists who have collected information about animals separately. As a first step to building a unified animal database, they may decide to weave the data together at a point of structural commonality.
Biologist1 has this file:
<animals>
  <mammals>
    <animal common="chimpanzee" species="Pan troglodytes" order="Primates"/>
    <animal common="human" species="Homo Sapien" order="Primates"/>
  </mammals>
  <reptiles>
    <animal common="boa constrictor" species="Boa constrictor" order="Squamata"/>
    <animal common="gecko" species="Gekko gecko" order="Squamata"/>
  </reptiles>
  <birds>
    <animal common="sea gull" species="Larus occidentalis" order="Charadriiformes"/>
    <animal common="Black-Backed Woodpecker" species="Picoides arcticus" order="Piciformes"/>
  </birds>
</animals>
Biologist2 has this file:
<animals> <mammals> <animal common="hippo" species="Hippopotamus amphibius" family=" Hippopotamidae"/> <animal common="arabian camel" species="Camelus dromedarius" family="Camelidae"/> </mammals> <insects> <animal common="Lady Bug" species="Adalia bipunctata" family="Coccinellidae"/> <animal common="Dung Bettle" species=" Onthophagus australis" family="Scarabaeidae"/> </insects> <amphibians> <animal common="Green Sea Turtle" species="Chelonia mydas" family="Cheloniidae"/> <animal common="Green Tree Frog" species=" Hyla cinerea" family="Hylidae "/> </amphibians> </animals>
The files have similar but not identical schemas. Both files contain the class Mammalia, but differ in the other organizational levels. At the animal level, one biologist recorded information about the animal's order, while the other recorded data about the animal's family. The following stylesheet weaves the documents together at the animal's class level (the second level in document structure):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:param name="doc2file"/> <xsl:variable name="doc2" select="document($doc2file)"/> <xsl:variable name="thisDocsClasses" select="/*/*"/> <xsl:template match="/*"> <xsl:copy> <!-- Merge common sections between source doc and doc2. Also includes sections unique to source doc. --> <xsl:for-each select="*"> <xsl:copy> <xsl:copy-of select="*"/> <xsl:copy-of select="$doc2/*/*[name() = name(current())]/*"/> </xsl:copy> </xsl:for-each> <!-- Merge sections unique to doc2 --> <xsl:for-each select="$doc2/*/*"> <xsl:if test="not($thisDocsClasses[name() = name(current())])"> <xsl:copy-of select="."/> </xsl:if> </xsl:for-each> </xsl:copy> </xsl:template> </xsl:stylesheet>
Application of the stylesheet results in a document that can be further normalized by hand or through another automated method:
<animals> <mammals> <animal common="chimpanzee" species="Pan troglodytes" order="Primates"/> <animal common="human" species="Homo Sapien" order="Primates"/> <animal common="hippo" species="Hippopotamus amphibius" family=" Hippopotamidae"/> <animal common="arabian camel" species="Camelus dromedarius" family="Camelidae"/> </mammals> <reptiles> <animal common="boa constrictor" species="Boa constrictor" order="Squamata"/> <animal common="gecko" species="Gekko gecko" order="Squamata"/> </reptiles> <birds> <animal common="sea gull" species="Larus occidentalis" order="Charadriiformes"/> <animal common="Black-Backed Woodpecker" species="Picoides arcticus" order="Piciformes"/> </birds> <insects> <animal common="Lady Bug" species="Adalia bipunctata" family="Coccinellidae"/> <animal common="Dung Bettle" species=" Onthophagus australis" family="Scarabaeidae"/> </insects> <amphibians> <animal common="Green Sea Turtle" species="Chelonia mydas" family="Cheloniidae"/> <animal common="Green Tree Frog" species=" Hyla cinerea" family="Hylidae "/> </amphibians> </animals>
A less trivial merge joins two documents: elements of one document are placed alongside elements of the other, or made children of them, based on some characteristic that the elements share. For example, consider the following merge of documents containing different information about people:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:param name="doc2file"/> <xsl:variable name="doc2" select="document($doc2file)"/> <xsl:template match="person"> <xsl:copy> <xsl:for-each select="@*"> <xsl:element name="{local-name()}"> <xsl:value-of select="."/> </xsl:element> </xsl:for-each> <xsl:variable name="matching-person" select="$doc2/*/person[@name=concat(current()/@firstname,' ', current()/@lastname)]"/> <xsl:element name="smoker"> <xsl:value-of select="$matching-person/@smoker"/> </xsl:element> <xsl:element name="sex"> <xsl:value-of select="$matching-person/@sex"/> </xsl:element> </xsl:copy> </xsl:template> </xsl:stylesheet>
This stylesheet performs two tasks. It converts attribute-encoded information in the input documents to elements, and it merges in information from $doc2 that is not present in the source document.
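To make the join concrete, here is a minimal sketch of inputs that the stylesheet could process. The attribute names firstname, lastname, name, smoker, and sex are the ones the stylesheet refers to; the people wrapper element, the age attribute, and all the values are assumptions invented for illustration. The source document might look like this:

```xml
<!-- Hypothetical source document: names are split into two attributes -->
<people>
  <person firstname="Sal" lastname="Mangano" age="37"/>
</people>
```

The document named by the doc2file parameter might look like this:

```xml
<!-- Hypothetical second document: the full name is a single attribute -->
<people>
  <person name="Sal Mangano" smoker="no" sex="m"/>
</people>
```

Under those assumptions, and assuming copy.xslt is the identity stylesheet, the result should have the following shape, apart from whitespace. The order of the elements generated from attributes may differ, because the order in which a processor visits attributes is implementation-dependent:

```xml
<people>
  <person>
    <firstname>Sal</firstname>
    <lastname>Mangano</lastname>
    <age>37</age>
    <smoker>no</smoker>
    <sex>m</sex>
  </person>
</people>
```

If no person in the second document matches, the smoker and sex elements are still created, but they are empty.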
Merging XML with disparate schemas is less well-defined than merging documents of identical schema. This chapter discusses three interpretations of merging, but other, more complicated types could exist. One possibility is that a merge could bring documents together so that inclusion, weaving, and joining all play a part in the final result. As such, it would be difficult to create a single, generic, XSLT-based merge utility that solves everyone’s particular merge problems. However, the examples in this section provide a useful head start in crafting more ambitious types of merges.
For XSLT 1.0, you must rely on a widely available but nonstandard extension that allows multiple output documents.[3] The solution determines the level in the document structure to serialize and determines the name of the resulting file. The following stylesheet splits the salesBySalesPerson.xml from Chapter 4 into separate files for each salesperson.
The stylesheet works in Saxon. Saxon allows use of the XSLT 1.1 xsl:document element when the stylesheet version is set to 1.1, and some processors support exslt:document from exslt.org.[4] If you prefer not to use version 1.1, you can use the saxon:output extension instead. The following stylesheet uses xsl:document:
<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:include href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="salesperson"> <xsl:variable name="outFile" select="concat('salesperson.',translate(@name,' ','_'),'.xml')"/> <!-- Non-standard saxon xsl:document! --> <xsl:document href="{$outFile}"> <xsl:copy> <xsl:copy-of select="@*"/> <xsl:apply-templates/> </xsl:copy> </xsl:document> </xsl:template> <xsl:template match="salesBySalesperson"> <xsl:apply-templates/> </xsl:template> </xsl:stylesheet>
Although the previous stylesheet is specific to Saxon, the technique works with most XSLT 1.0 processors with only minor changes. Saxon also has the saxon:output extension element (xmlns:saxon="http://icl.com/saxon"). Xalan uses xalan:redirect (xmlns:xalan="http://xml.apache.org/xalan").
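For processors that implement the EXSLT document extension, the same split might be sketched as follows. This is an illustration rather than a tested port: it assumes the processor supports the exsl:document element in the http://exslt.org/common namespace, and it inlines an identity template in place of the imported copy.xslt so that the stylesheet stands on its own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:strip-space elements="*"/>

  <!-- Inline identity template, standing in for the imported copy.xslt -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write each salesperson to its own file -->
  <xsl:template match="salesperson">
    <xsl:variable name="outFile"
        select="concat('salesperson.', translate(@name, ' ', '_'), '.xml')"/>
    <!-- Assumes processor support for the EXSLT extension element -->
    <exsl:document href="{$outFile}" method="xml" encoding="UTF-8" indent="yes">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </exsl:document>
  </xsl:template>

  <!-- Emit nothing for the wrapper element; only process its children -->
  <xsl:template match="salesBySalesperson">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Only the element that opens the secondary output and the namespace declaration differ from the Saxon version; the naming of the files and the copying logic are unchanged.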
An interesting variation of splitting also produces an output file that uses XInclude to pull in the generated subfiles:
<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="salesperson"> <xsl:variable name="outFile" select="concat('salesperson.',translate(@name,' ','_'),'.xml')"/> <xsl:document href="{$outFile}"> <xsl:copy> <xsl:copy-of select="@*"/> <xsl:apply-templates/> </xsl:copy> </xsl:document> <xi:include href="{$outFile}" xmlns:xi="http://www.w3.org/2001/XInclude"/> </xsl:template> </xsl:stylesheet>
If you worry that your XSLT processor might someday recognize XInclude and mistakenly try to include the same file that was just output, you can replace the xi:include literal result element with xsl:element:
<xsl:element name="xi:include" xmlns:xi="http://www.w3.org/2001/XInclude"> <xsl:attribute name="href"> <xsl:value-of select="$outFile"/> </xsl:attribute> </xsl:element>
Recipe 14.1 contains more examples that use multiple output document extensions.
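In XSLT 2.0, the standard xsl:result-document instruction makes the extension unnecessary. The following is a minimal sketch of the same split under that assumption; it requires an XSLT 2.0 processor, and it inlines an identity template in place of copy.xslt so that the stylesheet stands on its own.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Inline identity template, standing in for the imported copy.xslt -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write each salesperson to its own file -->
  <xsl:template match="salesperson">
    <xsl:variable name="outFile"
        select="concat('salesperson.', translate(@name, ' ', '_'), '.xml')"/>
    <xsl:result-document href="{$outFile}">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:result-document>
  </xsl:template>

  <!-- Emit nothing for the wrapper element; only process its children -->
  <xsl:template match="salesBySalesperson">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Because nothing is written to the principal result, whether an empty principal output file appears depends on the processor and on how it is invoked.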
You have a document with elements organized in a more deeply nested fashion than you would prefer. You want to flatten the tree.
If your goal is simply to flatten without regard to the information encoded by the deeper structure, then you need to apply an overriding copy. The overriding template must match the elements you wish to discard and apply templates without copying.
Consider the following input, which segregates people into two categories—salaried and union:
<people> <union> <person> <firstname>Warren</firstname> <lastname>Rosenbaum</lastname> <age>37</age> <height>5.75</height> </person> <person> <firstname>Dror</firstname> <lastname>Seagull</lastname> <age>28</age> <height>5.10</height> </person> <person> <firstname>Mike</firstname> <lastname>Heavyman</lastname> <age>45</age> <height>6.0</height> </person> <person> <firstname>Theresa</firstname> <lastname>Archul</lastname> <age>37</age> <height>5.5</height> </person> </union> <salaried> <person> <firstname>Sal</firstname> <lastname>Mangano</lastname> <age>37</age> <height>5.75</height> </person> <person> <firstname>Jane</firstname> <lastname>Smith</lastname> <age>28</age> <height>5.10</height> </person> <person> <firstname>Rick</firstname> <lastname>Winters</lastname> <age>45</age> <height>6.0</height> </person> <person> <firstname>James</firstname> <lastname>O'Riely</lastname> <age>33</age> <height>5.5</height> </person> </salaried> </people>
This stylesheet simply discards the extra structure:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8"/> <xsl:template match="people"> <xsl:copy> <!--discard parents of person elements --> <xsl:apply-templates select="*/person" /> </xsl:copy> </xsl:template> </xsl:stylesheet>
Having additional structure in a document is generally good because it usually makes the document easier to process with XSLT. However, too much structure bloats the document and makes it harder for people to understand. Humans generally prefer to infer relationships by spatial text organization rather than with extra syntactic baggage.
The following example shows that the extra structure is not superfluous, but encodes additional information. If you want to retain information about the structure while flattening, then you should probably create an attribute or child element to capture the information.
This stylesheet creates an attribute:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="yes"/> <!--discard parents of person elements --> <xsl:template match="*[person]"> <xsl:apply-templates/> </xsl:template> <xsl:template match="person"> <xsl:copy> <xsl:apply-templates select="@*"/> <xsl:attribute name="class"> <xsl:value-of select="local-name(..)"/> </xsl:attribute> <xsl:apply-templates/> </xsl:copy> </xsl:template> </xsl:stylesheet>
This variation creates an element:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:strip-space elements="*"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" /> <!--discard parents of person elements --> <xsl:template match="*[person]"> <xsl:apply-templates/> </xsl:template> <xsl:template match="person"> <xsl:copy> <xsl:copy-of select="@*"/> <xsl:element name="class"> <xsl:value-of select="local-name(..)"/> </xsl:element> <xsl:apply-templates/> </xsl:copy> </xsl:template> </xsl:stylesheet>
You can use xsl:strip-space and indent="yes" on the xsl:output element so the output will not contain the whitespace gap shown here:
<people>
...
<person>
<class>union</class>
<firstname>Warren</firstname>
<lastname>Rosenbaum</lastname>
<age>37</age>
<height>5.75</height>
</person>
<-- Whitespace gap here!
<person>
<class>salaried</class>
<firstname>Sal</firstname>
<lastname>Mangano</lastname>
<age>37</age>
<height>5.75</height>
</person>
...
</people>
You have a poorly designed document that can use extra structure.[5]
This is the opposite problem from that solved in Recipe 8.7. Here you need to add additional structure to a document, possibly to organize its elements by some additional criteria.
This deepening transformation undoes the flattening transformation performed in Recipe 8.7:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="people"> <!-- Keep the people element so the result has a single root --> <xsl:copy> <union> <xsl:apply-templates select="person[@class = 'union']" /> </union> <salaried> <xsl:apply-templates select="person[@class = 'salaried']" /> </salaried> </xsl:copy> </xsl:template> </xsl:stylesheet>
In a misguided effort to streamline XML, some people attempt to encode information by inserting sibling elements rather than parent elements.[6]
For example, suppose someone distinguished between union and salaried employees in the following way:
<people> <class name="union"/> <person> <firstname>Warren</firstname> <lastname>Rosenbaum</lastname> <age>37</age> <height>5.75</height> </person> ... <person> <firstname>Theresa</firstname> <lastname>Archul</lastname> <age>37</age> <height>5.5</height> </person> <class name="salaried"/> <person> <firstname>Sal</firstname> <lastname>Mangano</lastname> <age>37</age> <height>5.75</height> </person> ... <person> <firstname>James</firstname> <lastname>O'Riely</lastname> <age>33</age> <height>5.5</height> </person> </people>
Notice that the class elements signifying union and salaried are now empty. The intent is that all following siblings of a class element belong to that class until another class element is encountered or there are no more siblings. This type of encoding is easy to grasp, but more difficult for an XSLT program to process. To correct this representation, you need to create a stylesheet that computes the set difference between all person elements following a class element and the person elements following the next class element. XSLT 1.0 does not have an explicit set difference function. You can get essentially the same effect and be more efficient by selecting only the person elements following a class element whose position comes before that of the person elements following the next class element:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="class"> <!-- The number of people between this class element and the next one: all people following this class minus those following any later class. --> <xsl:variable name="pos" select="count(following-sibling::person) - count(following-sibling::class/following-sibling::person)"/> <xsl:element name="{@name}"> <!-- Copy people that follow this class but whose position is less than or equal to $pos. --> <xsl:copy-of select="following-sibling::person[position() &lt;= $pos]"/> </xsl:element> </xsl:template> <!-- Ignore person elements. They were copied above. --> <xsl:template match="person"/> </xsl:stylesheet>
More subtly, a key can be used as follows:
<xsl:key name="people" match="person" use="preceding-sibling::class[1]/@name" /> <xsl:template match="people"> <people> <xsl:apply-templates select="class" /> </people> </xsl:template> <xsl:template match="class"> <xsl:element name="{@name}"> <xsl:copy-of select="key('people', @name)" /> </xsl:element> </xsl:template>
A step-by-step approach is another alternative:
<xsl:template match="people"> <people> <xsl:apply-templates select="class[1]" /> </people> </xsl:template> <xsl:template match="class"> <xsl:element name="{@name}"> <xsl:apply-templates select="following-sibling::*[1][self::person]" /> </xsl:element> <xsl:apply-templates select="following-sibling::class[1]" /> </xsl:template> <xsl:template match="person"> <xsl:copy-of select="." /> <xsl:apply-templates select="following-sibling::*[1][self::person]" /> </xsl:template>
Using XSLT 2.0’s xsl:for-each-group allows you to achieve a more generic solution than the 1.0 solution. Although there are 1.0 solutions that are generic (see Discussion), none is quite as simple:
<xsl:template match="people"> <xsl:for-each-group select="person" group-by="preceding-sibling::class[1]/@name"> <xsl:element name="{current-grouping-key()}"> <xsl:apply-templates select="current-group()" /> </xsl:element> </xsl:for-each-group> </xsl:template>
You can exploit xsl:for-each-group with the group-starting-with option to solve this problem:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:template match="people"> <xsl:copy> <xsl:for-each-group select="*" group-starting-with="class"> <xsl:element name="{@name}"> <xsl:apply-templates select="current-group()[not(self::class)]"/> </xsl:element> </xsl:for-each-group> </xsl:copy> </xsl:template> </xsl:stylesheet>
When you added structure based on existing data, you explicitly referred to the criteria that formed the categories of interest (e.g., union and salaried). It would be better if the stylesheet figured out these categories by itself. This makes the stylesheet more generic at the cost of added complexity:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <!-- build a unique list of all classes --> <xsl:variable name="classes" select="/*/*/@class[not(. = ../preceding-sibling::*/@class)]"/> <xsl:template match="/*"> <!-- For each class create an element named after that class that contains elements of that class --> <xsl:for-each select="$classes"> <xsl:variable name="class-name" select="."/> <xsl:element name="{$class-name}"> <xsl:for-each select="/*/*[@class=$class-name]"> <xsl:copy> <xsl:apply-templates/> </xsl:copy> </xsl:for-each> </xsl:element> </xsl:for-each> </xsl:template> </xsl:stylesheet>
Although not 100% generic, this stylesheet avoids making assumptions about what kinds of classes exist in the document. The only application-specific information in this stylesheet is the fact that the categories are encoded in an attribute @class and that the attribute occurs in elements that are two levels down from the root.
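With an XSLT 2.0 processor, the same generic grouping could be sketched with xsl:for-each-group. This is an illustration under stated assumptions, not a drop-in replacement: it assumes every value of @class is a legal element name, and, as a choice of this sketch, it copies the root element so that the result has a single root. Like the 1.0 version, it drops the attributes of the grouped elements, since the class is now expressed by the parent.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/*">
    <xsl:copy>
      <!-- One group per distinct value of @class, in order of first appearance -->
      <xsl:for-each-group select="*[@class]" group-by="@class">
        <!-- Name the new parent element after the class -->
        <xsl:element name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <!-- Copy the element and its children, but not its attributes -->
            <xsl:copy>
              <xsl:copy-of select="node()"/>
            </xsl:copy>
          </xsl:for-each>
        </xsl:element>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The grouping instruction replaces both the global variable that collects the unique classes and the inner loop that rescans the document for each class.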
The solution for sibling-encoded classes can also be implemented explicitly in terms of set difference. This solution is elegant, but impractical for large documents with many categories. The trick used here for computing set difference is explained in Recipe 9.1:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="copy.xslt"/> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="class"> <!--All people following this class element --> <xsl:variable name="nodes1" select="following-sibling::person"/> <!--All people following the next class element --> <xsl:variable name="nodes2" select="following-sibling::class/following-sibling::person"/> <xsl:element name="{@name}"> <xsl:copy-of select="$nodes1[count(. | $nodes2) != count($nodes2)]"/> </xsl:element> </xsl:template> <xsl:template match="person"/> </xsl:stylesheet>
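The set difference idiom can be seen in isolation in the following sketch. A node belongs to $b exactly when adding it to $b leaves the size of the union unchanged, so the predicate keeps the nodes of $a for which the two counts differ. The skip attribute and the difference element are hypothetical names chosen for this illustration only.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/*">
    <!-- $a: every child element of the root -->
    <xsl:variable name="a" select="*"/>
    <!-- $b: the children that carry a (hypothetical) skip attribute -->
    <xsl:variable name="b" select="*[@skip]"/>
    <difference>
      <!-- Keep the nodes of $a that are not in $b: for such a node,
           the union (. | $b) has one more member than $b itself -->
      <xsl:copy-of select="$a[count(. | $b) != count($b)]"/>
    </difference>
  </xsl:template>

</xsl:stylesheet>
```

The predicate recomputes a union for every candidate node, which is why the approach becomes expensive when both node sets are large.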
You need to reorganize the information in an XML document to make some implicit information explicit and some explicit information implicit.
Again, consider the SalesBySalesPerson.xml document from Chapter 4:
<salesBySalesperson> <salesperson name="John Adams" seniority="1"> <product sku="10000" totalSales="10000.00"/> <product sku="20000" totalSales="50000.00"/> <product sku="25000" totalSales="920000.00"/> </salesperson> <salesperson name="Wendy Long" seniority="5"> <product sku="10000" totalSales="990000.00"/> <product sku="20000" totalSales="150000.00"/> <product sku="30000" totalSales="5500.00"/> </salesperson> <salesperson name="Willie B. Aggressive" seniority="10"> <product sku="10000" totalSales="1110000.00"/> <product sku="20000" totalSales="150000.00"/> <product sku="25000" totalSales="2920000.00"/> <product sku="30000" totalSales="115500.00"/> <product sku="70000" totalSales="10000.00"/> </salesperson> <salesperson name="Arty Outtolunch" seniority="10"/> </salesBySalesperson>
Which products were sold by which salesperson and how much income the salesperson created for each sold product is explicit. The total income generated by each product is implicit, as are the names of all salespeople who sold any given product.
Therefore, to reorganize this document, you would need to convert to a view that shows sales by product. The following stylesheet accomplishes this transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:key name="sales_key" match="salesperson" use="product/@sku"/> <xsl:variable name="products" select="//product"/> <xsl:variable name="unique-products" select="$products[not(@sku = preceding::product/@sku)]"/> <xsl:template match="/"> <salesByProduct> <xsl:for-each select="$unique-products"> <xsl:variable name="sku" select="@sku"/> <xsl:copy> <xsl:copy-of select="$sku"/> <xsl:attribute name="totalSales"> <xsl:value-of select="sum($products[@sku=$sku]/@totalSales)"/> </xsl:attribute> <xsl:for-each select="key('sales_key',$sku)"> <xsl:copy> <xsl:copy-of select="@*"/> <xsl:attribute name="sold"> <xsl:value-of select="product[@sku=$sku]/@totalSales"/> </xsl:attribute> </xsl:copy> </xsl:for-each> </xsl:copy> </xsl:for-each> </salesByProduct> </xsl:template> </xsl:stylesheet>
The resulting output is shown here:
<salesByProduct> <product sku="10000" totalSales="2110000"> <salesperson name="John Adams" seniority="1" sold="10000.00"/> <salesperson name="Wendy Long" seniority="5" sold="990000.00"/> <salesperson name="Willie B. Aggressive" seniority="10" sold="1110000.00"/> </product> <product sku="20000" totalSales="350000"> <salesperson name="John Adams" seniority="1" sold="50000.00"/> <salesperson name="Wendy Long" seniority="5" sold="150000.00"/> <salesperson name="Willie B. Aggressive" seniority="10" sold="150000.00"/> </product> <product sku="25000" totalSales="3840000"> <salesperson name="John Adams" seniority="1" sold="920000.00"/> <salesperson name="Willie B. Aggressive" seniority="10" sold="2920000.00"/> </product> <product sku="30000" totalSales="121000"> <salesperson name="Wendy Long" seniority="5" sold="5500.00"/> <salesperson name="Willie B. Aggressive" seniority="10" sold="115500.00"/> </product> <product sku="70000" totalSales="10000"> <salesperson name="Willie B. Aggressive" seniority="10" sold="10000.00"/> </product> </salesByProduct>
An alternative solution is based on the Muenchian Method, named after Steve Muench. This method uses an xsl:key to facilitate the extraction of unique products. The expression $products[count(.|key('product_key', @sku)[1]) = 1] selects the first product in each group, where the grouping is by sku:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:variable name="doc" select="/"/> <xsl:key name="product_key" match="product" use="@sku"/> <xsl:key name="sales_key" match="salesperson" use="product/@sku"/> <xsl:variable name="products" select="//product"/> <xsl:template match="/"> <salesByProduct> <xsl:for-each select="$products[count(.|key('product_key',@sku)[1]) = 1]"> <xsl:variable name="sku" select="@sku"/> <xsl:copy> <xsl:copy-of select="$sku"/> <xsl:attribute name="totalSales"> <xsl:value-of select="sum(key('product_key',$sku)/@totalSales)"/> </xsl:attribute> <xsl:for-each select="key('sales_key',$sku)"> <xsl:copy> <xsl:copy-of select="@*"/> <xsl:attribute name="sold"> <xsl:value-of select="product[@sku=$sku]/@totalSales"/> </xsl:attribute> </xsl:copy> </xsl:for-each> </xsl:copy> </xsl:for-each> </salesByProduct> </xsl:template> </xsl:stylesheet>
This problem becomes straightforward in XSLT 2.0 because you can take advantage of xsl:for-each-group:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:template match="/"> <salesByProduct> <!-- Group products by sku --> <xsl:for-each-group select="//product" group-by="@sku"> <xsl:copy> <xsl:copy-of select="@sku"/> <!-- Use current-group() to total up sales --> <xsl:attribute name="totalSales" select="format-number(sum(current-group()/@totalSales),'#')"/> <!-- Copy salesperson elements that contain a child product with sku of current product group --> <xsl:for-each select="/*/salesperson"> <xsl:if test="product[@sku eq current-grouping-key()]"> <xsl:copy> <xsl:copy-of select="@*"/> </xsl:copy> </xsl:if> </xsl:for-each> </xsl:copy> </xsl:for-each-group> </salesByProduct> </xsl:template> </xsl:stylesheet>
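The 2.0 stylesheet above does not emit the sold attribute that the 1.0 versions produce. One way to restore it, sketched here as an alternative and not as the book's method, is to iterate over the members of the current group: each member is one salesperson's product entry, so its parent supplies the salesperson attributes and its own totalSales supplies the sold value. The sketch assumes that a salesperson lists any given sku at most once.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <salesByProduct>
      <!-- Group products by sku -->
      <xsl:for-each-group select="//product" group-by="@sku">
        <product sku="{current-grouping-key()}"
                 totalSales="{format-number(sum(current-group()/@totalSales), '#')}">
          <!-- Each group member is one salesperson's entry for this sku -->
          <xsl:for-each select="current-group()">
            <salesperson>
              <!-- Attributes of the salesperson who owns this entry -->
              <xsl:copy-of select="../@*"/>
              <!-- What that salesperson sold of this product -->
              <xsl:attribute name="sold" select="@totalSales"/>
            </salesperson>
          </xsl:for-each>
        </product>
      </xsl:for-each-group>
    </salesByProduct>
  </xsl:template>

</xsl:stylesheet>
```

Walking the group directly also avoids rescanning every salesperson for each product.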
The solution presents a very application-specific example. This cannot be helped. Presenting a generic reorganizing stylesheet is difficult, if not impossible, because these types of reorganizations vary based on the nature of the particular transformed document.
However, some common idioms are likely to appear in these sorts of reorganizations.
First, since you reorganize the document tree completely, it is unlikely that a solution will rely primarily on matching and applying templates. These sorts of stylesheets are much more likely to use an iterative style. In other words, the solutions will probably rely heavily on xsl:for-each.
Second, recipes in this class almost always initialize global variables that contain elements extracted from deep within the XML structure. In addition, you will probably need to determine a unique subset of these elements. See Recipe 5.3 for a complete discussion of the techniques available for constructing unique sets of elements.
Third, reorganization often involves reaggregating data by using sums, products, or other more complex aggregations. Chapter 3 and Chapter 16 discuss advanced techniques for computing these aggregations.
Clearly, the power of xsl:for-each-group makes complex reorganizations of XML elements much easier. The key is to develop a clear understanding of the criteria that constitute the group and then to reorganize the other elements of the document by their relation to the group.
Recipe 6.2 contains more examples of using XSLT 2.0’s for-each-group.
[1] The only other stylistic issue I have seen software developers get more passionate about is where to put the curly braces in C-like programming languages (e.g., C++ and Java).
[2] As evidence of my naming ineptitude, my son actually spent two whole days in this world without a name. My wife and I simply could not think of a good one that we both liked. To our credit, we both understood the importance of picking a good name and we think Leonardo agrees.
[3] In XSLT 2.0, this facility is available and uses a new element called xsl:result-document. See Chapter 6 for details.
[4] XSLT 1.1 is no longer an official version. It was abandoned in favor of XSLT 2.0.
[5] It may be well-designed from a particular set of goals, but those goals aren’t yours.
[6] To be fair, not every occurrence of this technique is misguided. Design is a navigation between competing tradeoffs.