Chapter 8. XML to XML

To change and to change for the better are two different things.

German proverb

Introduction

One of the beauties of XML is that if you don’t like some XML document, you can change it. Since it is impossible to please everyone, transforming XML to XML is extremely common. However, you will not transform XML only to improve the structure of a poorly designed schema. Sometimes you need to merge disparate XML documents into a single document. At other times, you want to break up a large document into smaller subdocuments. You might also wish to preprocess a document to extract only the relevant information, without changing its structure, before sending it off for further processing.

A simple but important tool in many XML-to-XML transformations is the identity transform. This tool is a stylesheet that copies an input document to an output document without changing it. This task may seem better suited to the operating system’s copy operation, but as the following examples demonstrate, this simple stylesheet can be imported into other stylesheets to yield very common types of transformations with little added coding effort.

Example 8-1 shows the identity stylesheet. I actually prefer calling this stylesheet the copying stylesheet, and I call the techniques that utilize it the overriding copy idiom.

Example 8-1. copy.xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>
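
As a minimal sketch of the overriding copy idiom, the following stylesheet imports copy.xslt and overrides the copy behavior for comments only, so that everything except comments is copied to the output. The choice of comments is purely illustrative and is not taken from the recipes that follow.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="copy.xslt"/>

<!-- Illustrative override: an empty template produces no output,
     so comments are dropped while all other nodes are still copied -->
<xsl:template match="comment()"/>

</xsl:stylesheet>
```

Templates in an importing stylesheet take precedence over those it imports, so the empty template is chosen for comments and the imported copy template handles every other node.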

8.1. Converting Attributes to Elements

Problem

You have a document that encodes information with attributes, and you would like to use child elements instead.

Solution

XSLT 1.0

This problem is tailor-made for what the introduction to this chapter calls the overriding copy idiom. This example transforms attributes to elements globally:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="@*">
  <xsl:element name="{local-name(.)}" namespace="{namespace-uri(..)}">
    <xsl:value-of select="."/>
  </xsl:element>  
</xsl:template>
   
</xsl:stylesheet>

The stylesheet works by overriding the copy behavior for attributes. It replaces the behavior with a template that converts an attribute into an element (of the same name) whose content is the attribute’s value. It also assumes that this new element should be in the same namespace as the attribute’s parent. If you prefer not to make assumptions, then use the following code:

<xsl:template match="@*">
  <xsl:variable name="namespace">
    <xsl:choose>
      <!-- Use namespace of attribute, if there is one -->
      <xsl:when test="namespace-uri()">
        <xsl:value-of select="namespace-uri()" />
      </xsl:when>
      <!-- Otherwise use parent's namespace -->
      <xsl:otherwise>
        <xsl:value-of select="namespace-uri(..)" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:element name="{name()}" namespace="{$namespace}">
    <xsl:value-of select="." />
  </xsl:element>
</xsl:template>

You’ll often want to be selective when transforming attributes to elements (see Example 8-2 to Example 8-4).

Example 8-2. Input
<people which="MeAndMyFriends">
  <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
  <person firstname="Mike" lastname="Palmieri" age="28" height="5.10"/>
  <person firstname="Vito" lastname="Palmieri" age="38" height="6.0"/>
  <person firstname="Vinny" lastname="Mari" age="37" height="5.8"/>
</people>
Example 8-3. A stylesheet that transforms person attributes only
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="person/@*">
  <xsl:element name="{local-name(.)}" namespace="{namespace-uri(..)}">
    <xsl:value-of select="."/>
  </xsl:element>  
</xsl:template>
   
</xsl:stylesheet>
Example 8-4. Output
<people which="MeAndMyFriends">
   
   <person>
      <firstname>Sal</firstname>
      <lastname>Mangano</lastname>
      <age>38</age>
      <height>5.75</height>
   </person>
   
   <person>
      <firstname>Mike</firstname>
      <lastname>Palmieri</lastname>
      <age>28</age>
      <height>5.10</height>
   </person>
   
   <person>
      <firstname>Vito</firstname>
      <lastname>Palmieri</lastname>
      <age>38</age>
      <height>6.0</height>
   </person>
   
   <person>
      <firstname>Vinny</firstname>
      <lastname>Mari</lastname>
      <age>37</age>
      <height>5.8</height>
   </person>
   
</people>

XSLT 2.0

In XSLT 2.0, the solution can be streamlined but the recipe remains essentially the same. Here you can replace the awkward xsl:choose with an XPath 2.0 if-expression.

<xsl:template match="@*">
  <xsl:variable name="namespace" 
                select="if (namespace-uri()) then namespace-uri() 
                                             else namespace-uri(..)"/>
  <xsl:element name="{name()}" namespace="{$namespace}">
    <xsl:value-of select="." />
  </xsl:element>
</xsl:template>

Discussion

This section and Recipe 8.2 address the problems that arise when a document designer makes a poor choice between encoding information in attributes versus elements. The attribute-versus-element decision is one of the most controversial aspects of document design.[1] These examples are helpful because they allow you to correct your own or others’ (perceived) mistakes.

8.2. Converting Elements to Attributes

Problem

You have a document that encodes information using child elements, and you would like to use attributes instead.

Solution

As with Recipe 8.1, you can use the overriding copy idiom. However, when transforming elements to attributes, you must selectively determine where the transformation will be applied. This is because the idea of transforming all elements to attributes is nonsensical. The following stylesheet reverses the attribute-to-element transformation we performed in Recipe 8.1:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>
   
<xsl:template match="person">
  <xsl:copy>
    <xsl:for-each select="*">
      <xsl:attribute name="{local-name(.)}">
        <xsl:value-of select="."/>
      </xsl:attribute>  
    </xsl:for-each>
  </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

Discussion

XSLT 1.0

Converting from elements to attributes is not always as straightforward as transforming in the opposite direction. If the elements being converted to attributes have attributes themselves, you must decide what will become of them. In the preceding solution, they would be lost. Another alternative would be to promote them to the new parent:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>
   
<xsl:template match="person">
  <xsl:copy>
    <xsl:for-each select="*">
      <xsl:attribute name="{local-name(.)}">
        <xsl:value-of select="."/>
      </xsl:attribute>  
      <xsl:copy-of select="@*"/>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

However, this works only if all the attribute names in question are unique. If this is not the case, you will have to rename attributes, perhaps as follows:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>
   
<xsl:template match="person">
  <xsl:copy>
    <xsl:for-each select="*">
      <xsl:attribute name="{local-name(.)}">
        <xsl:value-of select="."/>
      </xsl:attribute>  
      <xsl:variable name="elem-name" select="local-name(.)"/>
      <xsl:for-each select="@*">
        <xsl:attribute name="{concat($elem-name,'-',local-name(.))}">
          <xsl:value-of select="."/>
        </xsl:attribute>  
      </xsl:for-each>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>
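
To see the effect of this renaming, consider a hypothetical person element (not one of this chapter's sample documents) whose children carry attributes of their own. With the preceding stylesheet, the input below should produce a result along the lines of the one shown after it; the order of attributes in the output may vary by processor.

```xml
<!-- Hypothetical input -->
<person>
  <name title="Mr">Sal</name>
  <phone type="home">555-0100</phone>
</person>

<!-- Expected result: each child's own attributes are prefixed
     with the name of the child they came from -->
<person name="Sal" name-title="Mr" phone="555-0100" phone-type="home"/>
```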

Another complication arises if the sibling elements do not have unique names, because in this case, they would clash upon becoming attributes. Another possible strategy is to create an attribute from an element only if the element does not have attributes or element children of its own, does not repeat in its parent element, and has parents without attributes:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" indent="yes" version="1.0" encoding="UTF-8"/>
   
<!-- Match elements that are parents -->
<xsl:template match="*[*]">
  <xsl:choose>
    <!-- Only convert children if this element has no attributes -->
    <!-- of its own -->
    <xsl:when test="not(@*)">
      <xsl:copy>
        <!-- Convert children to attributes if the child has -->
        <!-- no children or attributes and has a unique name -->
        <!-- among its siblings -->
        <xsl:for-each select="*">
          <xsl:choose>
            <xsl:when test="not(*) and not(@*) and
                            not(preceding-sibling::*[name() =
                                                     name(current())]) 
                            and 
                            not(following-sibling::*[name() = 
                                                     name(current())])">
              <xsl:attribute name="{local-name(.)}">
                <xsl:value-of select="."/>
              </xsl:attribute>  
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates select="."/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </xsl:copy>
    </xsl:when>
    <xsl:otherwise>
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
   
</xsl:stylesheet>

XSLT 2.0

Here you partially simplify and speed up the 1.0 solution by utilizing xsl:for-each-group. The trick is to use group-by="name()" to determine if there are siblings with identical names. In addition, this solution promotes the attributes of the converted element to the parent, which you can also do in the 1.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" indent="yes" version="1.0" encoding="UTF-8"/>
   
<!-- Match elements that are parents -->
<xsl:template match="*[*]">
  <xsl:choose>
    <!-- Only convert children if this element has no attributes -->
    <!-- of its own -->
    <xsl:when test="not(@*)">
      <xsl:copy>
        <!-- Convert children to attributes if the child has -->
        <!-- no children or attributes and has a unique name -->
        <!-- among its siblings -->
        <xsl:for-each-group select="*" group-by="name()">
          <xsl:choose>
            <xsl:when test="not(*) and count(current-group()) eq 1">
              <xsl:attribute name="{local-name(.)}">
                <xsl:value-of select="."/>
              </xsl:attribute>
              <!-- Copy attributes of child to parent element -->
              <xsl:copy-of select="@*"/>  
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates select="current-group()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:when>
    <xsl:otherwise>
      <xsl:copy>
        <xsl:apply-templates select="@*| node()"/>
      </xsl:copy>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
  
</xsl:stylesheet>

Warning

There is a limitation in both the 1.0 and 2.0 stylesheets in that they assume a certain kind of canonical structure where at a certain level, all elements that qualify to be converted to attributes appear before those that do not. Below is an example document that violates this assumption:

<E1>
    <E2>
        <e31>a</e31>
        <e32>b</e32>
        <e33>c</e33>
    </E2>
    <test>a</test>
    <E2>
        <e31>u</e31>
        <e32>v</e32>
        <e33>w</e33>
    </E2>
    <E2>
        <e31>x</e31>
        <e32>y</e32>
        <e33>z</e33>
    </E2>
</E1>

Notice how the test element could, in theory, be converted to an attribute of E1. In fact, the stylesheet will attempt to do so. However, it will fail because XSLT allows attributes to be added to an element only before any child nodes have been added to it, and by the time test is processed, the first E2 has already been copied. If you need to deal with messy documents such as this, you can do so in two passes. During the first pass, do not actually convert the elements to attributes but rather copy them as elements tagged with a special attribute indicating they are eligible for conversion. Then, on the second pass, first convert all tagged elements to attributes of their parent and then copy all other child elements to the parent unchanged.
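
The two-pass approach could be sketched as follows in XSLT 2.0, where a temporary tree holds the result of the first pass. The tmp namespace URI, the tmp:convert marker attribute, and the mode names tag and convert are assumptions made for this sketch. In XSLT 1.0, the same idea would need either two separate transformations or a node-set extension.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:tmp="urn:example:two-pass">

<xsl:output method="xml" indent="yes"/>

<!-- Run pass 1 into a temporary tree, then run pass 2 over it -->
<xsl:template match="/">
  <xsl:variable name="tagged">
    <xsl:apply-templates mode="tag"/>
  </xsl:variable>
  <xsl:apply-templates select="$tagged/node()" mode="convert"/>
</xsl:template>

<!-- Pass 1: copy elements, tagging those eligible for conversion
     (leaf, no attributes, unique name, parent without attributes) -->
<xsl:template match="*" mode="tag">
  <xsl:variable name="n" select="name()"/>
  <xsl:copy>
    <xsl:if test="parent::* and not(*) and not(@*) and not(../@*)
                  and count(../*[name() = $n]) = 1">
      <xsl:attribute name="tmp:convert">yes</xsl:attribute>
    </xsl:if>
    <xsl:apply-templates select="@* | node()" mode="tag"/>
  </xsl:copy>
</xsl:template>

<!-- Pass 2: tagged children become attributes first,
     then the remaining children are copied -->
<xsl:template match="*" mode="convert">
  <xsl:copy>
    <xsl:apply-templates select="@*" mode="convert"/>
    <xsl:for-each select="*[@tmp:convert]">
      <xsl:attribute name="{local-name()}">
        <xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:for-each>
    <xsl:apply-templates select="node()[not(self::*[@tmp:convert])]"
                         mode="convert"/>
  </xsl:copy>
</xsl:template>

<!-- Both passes: copy attributes, text, comments, and PIs unchanged -->
<xsl:template match="@* | text() | comment() | processing-instruction()"
              mode="tag convert">
  <xsl:copy/>
</xsl:template>

</xsl:stylesheet>
```

Applied to the sample document above, this sketch should give E1 a test attribute and give each E2 the attributes e31, e32, and e33, regardless of where the test element appears among its siblings.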

8.3. Renaming Elements or Attributes

Problem

You need to rename or re-namespace elements or attributes in an XML document.

Solution

If you need to rename a small number of attributes or elements, use a straightforward version of the overriding copy idiom, as shown in Example 8-5.

Example 8-5. Rename person to individual
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>
   
<xsl:template match="person">
  <individual>
    <xsl:apply-templates select="@* | node()"/>
  </individual>
</xsl:template>
   
</xsl:stylesheet>

Or, alternatively, use xsl:element:

...
<xsl:template match="person">
  <xsl:element name="individual">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:element>
</xsl:template>
...

Renaming attributes is just as straightforward:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>
   
<xsl:template match="@lastname">
  <xsl:attribute name="surname">
    <xsl:value-of select="."/>
  </xsl:attribute>
</xsl:template>
   
</xsl:stylesheet>

Sometimes you need to re-namespace rather than rename, as shown in Example 8-6.

Example 8-6. A document using the namespace foo
<foo:someElement xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo">
  <foo:aChild>
    <foo:aGrandChild/>
    <foo:aGrandChild>
    </foo:aGrandChild>
  </foo:aChild>
</foo:someElement>

For each element in the foo namespace, create a new element in the bar namespace, as shown in Example 8-7 and Example 8-8.

Example 8-7. A stylesheet that maps foo to bar
<xsl:stylesheet version="1.0"   
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"
 xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:strip-space elements="*"/>
   
<xsl:template match="foo:*">
  <xsl:element name="bar:{local-name()}">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:element>
</xsl:template>     
   
</xsl:stylesheet>
Example 8-8. Output
<bar:someElement xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar">
   <bar:aChild>
      <bar:aGrandChild/>
      <bar:aGrandChild/>
   </bar:aChild>
</bar:someElement>

Discussion

Naming is an important skill that few software practitioners (including yours truly) have mastered.[2] Hence, you should know how to rename things when you don’t get the names quite right on the first go.

If many elements or attributes need renaming, then you may want to use a generic table-driven approach, as shown in Example 8-9 to Example 8-11.

Example 8-9. A generic table-driven rename stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ren="http://www.ora.com/namespaces/rename">
   
<xsl:import href="copy.xslt"/>
   
<!--Override in importing stylesheet -->
<xsl:variable name="lookup"  select="/.."/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="*">
  <xsl:choose>
    <xsl:when test="$lookup/ren:element[@from=name(current())]">
      <xsl:element 
           name="{$lookup/ren:element[@from=name(current())]/@to}">
        <xsl:apply-templates select="@*"/>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-imports/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
   
<xsl:template match="@*">
  <xsl:choose>
    <xsl:when test="$lookup/ren:attribute[@from=name(current())]">
      <xsl:attribute name="{$lookup/ren:attribute[@from=name(current())]/@to}">
        <xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-imports/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
   
</xsl:stylesheet>
Example 8-10. Using the table-driven stylesheet
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ren="http://www.ora.com/namespaces/rename">
   
<xsl:import href="TableDrivenRename.xslt"/>
   
<!-- Load the lookup table. We define it locally but it can also
 come from an external file -->     
<xsl:variable name="lookup"  select="document('')/*[ren:*]"/>
   
<!-- Define the renaming rules -->
<ren:element from="person" to="individual"/>
<ren:attribute from="firstname" to="givenname"/>
<ren:attribute from="lastname" to="surname"/>
<ren:attribute from="age" to="yearsOld"/>
   
</xsl:stylesheet>
Example 8-11. Output
<?xml version="1.0" encoding="UTF-8"?>
<people which="MeAndMyFriends">
   
   <individual givenname="Sal" surname="Mangano" yearsOld="38" height="5.75"/>
   
   <individual givenname="Mike" surname="Palmieri" yearsOld="28" height="5.10"/>
   
   <individual givenname="Vito" surname="Palmieri" yearsOld="38" height="6.0"/>
   
   <individual givenname="Vinny" surname="Mari" yearsOld="37" height="5.8"/>
   
</people>
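
The comment in Example 8-10 notes that the lookup table can also come from an external file. A sketch of that variation follows; the file name rename-rules.xml and the ren:rules wrapper element are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ren="http://www.ora.com/namespaces/rename">

<xsl:import href="TableDrivenRename.xslt"/>

<!-- rename-rules.xml is a hypothetical file, located next to this
     stylesheet, whose top-level element wraps the rules, e.g.:

     <ren:rules xmlns:ren="http://www.ora.com/namespaces/rename">
       <ren:element from="person" to="individual"/>
       <ren:attribute from="lastname" to="surname"/>
     </ren:rules>
-->
<xsl:variable name="lookup" select="document('rename-rules.xml')/*"/>

</xsl:stylesheet>
```

Because the imported templates only look for ren:element and ren:attribute children of $lookup, any wrapper element should work as long as the rules are its direct children. A relative URI passed to document() as a string is resolved against the location of the stylesheet.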

You can still use this approach if some elements or attributes need context-sensitive handling. For example, consider the following document fragment:

<clubs>
  <club name="The 500 Club">
    <members>
      <member name="Joe Smith">
        <position name="president"/>
      </member>
      <member name="Jill McFonald">
        <position name="treasurer"/>
      </member>
      <!-- ... -->
    </members>
  </club>
  <!-- ... -->
</clubs>

Suppose you want to change attribute @name to attribute @title, but only for position elements. If you use the table-driven approach, all elements containing a name attribute will be changed. The solution is to create a template that overrides the default behavior for all elements except position:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ren="http://www.ora.com/namespaces/rename">
   
<xsl:import href="TableDrivenRename.xslt"/>
   
<!-- Load the lookup table. We define it locally but it can also
 come from an external file -->     
<xsl:variable name="lookup"  select="document('')/*[ren:*]"/>
   
<!-- Define the renaming rules -->
<ren:attribute from="name" to="title"/>
   
<!-- OVERRIDE: Simply copy all names that are not attributes of position element -->
<xsl:template match="@name[not(parent::position)]">
     <xsl:copy/>
</xsl:template>
   
</xsl:stylesheet>
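
If this is the only rename required, the lookup table may be more machinery than needed. A sketch of a more direct alternative, built on copy.xslt alone, overrides the copy behavior just for the name attribute of position elements:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="copy.xslt"/>

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- Rename name to title, but only on position elements;
     every other name attribute is copied by copy.xslt -->
<xsl:template match="position/@name">
  <xsl:attribute name="title">
    <xsl:value-of select="."/>
  </xsl:attribute>
</xsl:template>

</xsl:stylesheet>
```

The table-driven version is likely the better fit when the context-sensitive rule is one exception among many ordinary renames.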

When re-namespacing using copy, the old namespace may stubbornly refuse to go away even when it is not needed. Consider the foo document again with an additional element from a doc namespace:

<foo:someElement xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"
                 xmlns:doc="http://www.ora.com/XMLCookbook/namespaces/doc">
  <foo:aChild>
    <foo:aGrandChild/>
    <foo:aGrandChild>
      <doc:doc>This documentation should not be removed or altered in any way.
      </doc:doc>
    </foo:aGrandChild>
  </foo:aChild>
</foo:someElement>

If you apply the re-namespacing stylesheet to this document, the foo namespace is carried along with the doc element:

<bar:someElement xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar">
   <bar:aChild>
      <bar:aGrandChild/>
      <bar:aGrandChild>
         <doc:doc xmlns:doc="http://www.ora.com/XMLCookbook/namespaces/doc" 
            xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo">
          This documentation should not be removed or altered in any way.
         </doc:doc>
      </bar:aGrandChild>
   </bar:aChild>
</bar:someElement>

This is because the doc element is processed by xsl:copy. Since the doc element is enclosed in elements from the foo namespace, it has a foo namespace node, even though it is not directly visible in the input. In XSLT 1.0, both xsl:copy and xsl:copy-of always copy all namespace nodes associated with an element. (In XSLT 2.0, both instructions have an optional attribute called copy-namespaces, which you can set to yes or no.) To avoid copying this unwanted namespace, use xsl:element to make sure that elements are recreated, not copied:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"
 xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:strip-space elements="*"/>
   
<!-- For all elements create a new element with the same 
name and namespace 
-->
<xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:element>
</xsl:template>
   
<xsl:template match="foo:*">
  <xsl:element name="bar:{local-name()}">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:element>
</xsl:template>     
   
</xsl:stylesheet>

You can even use this technique to strip all namespaces from a document:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:strip-space elements="*"/>
   
<xsl:template match="*">
  <xsl:element name="{local-name()}">
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
   
</xsl:stylesheet>
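
With an XSLT 2.0 processor, the copy-namespaces attribute mentioned earlier offers an alternative to recreating every element. The following is a sketch of the foo-to-bar stylesheet rewritten that way, not a tested replacement for the 1.0 version:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:foo="http://www.ora.com/XMLCookbook/namespaces/foo"
 xmlns:bar="http://www.ora.com/XMLCookbook/namespaces/bar">

<xsl:import href="copy.xslt"/>

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:strip-space elements="*"/>

<!-- Copy elements without their in-scope namespace nodes; the
     processor still declares any namespace that the element's own
     name or its attributes require -->
<xsl:template match="*">
  <xsl:copy copy-namespaces="no">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<!-- Elements in the foo namespace are recreated in bar, as before -->
<xsl:template match="foo:*">
  <xsl:element name="bar:{local-name()}">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:element>
</xsl:template>

</xsl:stylesheet>
```

With this sketch, the doc element should keep its own doc namespace declaration while the unused foo declaration is left behind.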

8.4. Merging Documents with Identical Schema

Problem

You have two or more identically structured documents and you would like to merge them into a single document.

Solution

If the content of the documents is distinct or you are not concerned about duplicates, then the solution is simple:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:output method="xml" indent="yes"/>
   
<xsl:param name="doc2"/> 
   
<xsl:template match="/*">
  <xsl:copy>
    <xsl:copy-of select="* | document($doc2)/*/*"/>
  </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

If duplicates exist among input documents but you want the output document to contain unique entries, you can use techniques discussed in Recipe 5.1 for removing duplicates. Consider the two documents in Example 8-12 and Example 8-13.

Example 8-12. Document 1
<people which="MeAndMyFriends">
     <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
     <person firstname="Mike" lastname="Palmieri" age="28" height="5.10"/>
     <person firstname="Vito" lastname="Palmieri" age="38" height="6.0"/>
     <person firstname="Vinny" lastname="Mari" age="37" height="5.8"/>
</people>
Example 8-13. Document 2
<people which="MeAndMyCoWorkers">
     <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
     <person firstname="Al" lastname="Zehtooney" age="33" height="5.3"/>
     <person firstname="Brad" lastname="York" age="38" height="6.0"/>
     <person firstname="Charles" lastname="Xavier" age="32" height="5.8"/>
</people>

This stylesheet merges the documents and removes the duplicate element using xsl:sort and the exsl:node-set extension:

<xsl:stylesheet version="1.0"  
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common">
    
    <xsl:import href="exsl.xsl" />

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:param name="doc2"/> 
<!-- Here we introduce a 'key' attribute to make removing duplicates -->
<!-- easier -->
<xsl:variable name="all">
  <xsl:for-each select="/*/person | document($doc2)/*/person">
    <xsl:sort select="concat(@lastname,@firstname)"/>
    <person key="{concat(@lastname, @firstname)}">
      <xsl:copy-of select="@* | node()" />
    </person>
  </xsl:for-each>
</xsl:variable>
   
<xsl:template match="/">
     
<people>
     <xsl:for-each 
         select="exsl:node-set($all)/person[not(@key = 
                          preceding-sibling::person[1]/@key)]">
          <xsl:copy-of select="."/>
     </xsl:for-each>
</people>
     
</xsl:template>
   
</xsl:stylesheet>

Removing duplicates this way has three drawbacks. First, it alters the order of the elements, which might be undesirable. Second, it requires the use of the node-set extension in XSLT 1.0. Third, it is not generic in the sense that you must rewrite the entire stylesheet for every situation when you want a non-duplicating merge.

One way to address these problems uses xsl:key:

<!-- Stylesheet: merge-simple-using-key.xslt -->
<!-- Include this stylesheet in another that defines the key -->
   
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:merge="http:www.ora.com/XSLTCookbook/mnamespaces/merge">
     
<xsl:param name="doc2"/> 
   
<xsl:template match="/*">
  <!--Copy the outermost element of the source document -->
  <xsl:copy>
    <!-- For each child in the source, determine if it should be 
    copied to the destination based on its existence in the other document.
    -->
    <xsl:for-each select="*">
    
      <!-- Call a template which determines a unique key value for this
           element. It must be defined in the including stylesheet. 
      -->  
      <xsl:variable name="key-value">
        <xsl:call-template name="merge:key-value"/>
      </xsl:variable>
      
      <xsl:variable name="element" select="."/>
      <!--This for-each is simply to change context 
          to the second document 
      -->
      <xsl:for-each select="document($doc2)/*">
        <!-- Use key as a mechanism for testing the presence 
             of the element in the second document. The 
             key should be defined by the including stylesheet
        -->
        <xsl:if test="not(key('merge:key', $key-value))">
          <xsl:copy-of select="$element"/>
        </xsl:if>
      </xsl:for-each>
      
    </xsl:for-each>
   
    <!--Copy all elements in the second document -->
    <xsl:copy-of select="document($doc2)/*/*"/>
    
  </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

The following stylesheet includes the previous one and defines the key and a template to retrieve the key’s value:

<!-- This stylesheet defines uniqueness of elements in terms of a key. -->
<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:merge="http:www.ora.com/XSLTCookbook/mnamespaces/merge">
   
<xsl:include href="merge-simple-using-key.xslt"/>
   
<!--A person is uniquely defined by the concatenation of 
    last and first names -->
<xsl:key name="merge:key" match="person" 
         use="concat(@lastname,@firstname)"/>
   
<xsl:output method="xml" indent="yes"/>
   
<!-- This template retrieves the key value for an element -->
<xsl:template name="merge:key-value">
  <xsl:value-of select="concat(@lastname,@firstname)"/>
</xsl:template>
   
</xsl:stylesheet>

A second way to merge and remove duplicates uses the value-based set operations discussed in Recipe 9.2. The solution is presented here, in Example 8-14 and Example 8-15; see that recipe for more information.

Example 8-14. A reusable stylesheet that implements the merge in terms of a union
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset">
   
<xsl:import href="../query/vset.ops.xslt"/>
   
<xsl:output method="xml" indent="yes"/>
   
<xsl:param name="doc2"/> 
   
<xsl:template match="/*">
  <xsl:copy>
    <xsl:call-template name="vset:union">
      <xsl:with-param name="nodes1" select="*"/>
      <xsl:with-param name="nodes2" select="document($doc2)/*/*"/>
    </xsl:call-template>
  </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>
Example 8-15. A stylesheet defining what element equality means
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset">
   
<xsl:import href="merge-using-vset-union.xslt"/>
   
<xsl:template match="person" mode="vset:element-equality">
  <xsl:param name="other"/>
  <xsl:if test="concat(@lastname,@firstname) = 
                concat($other/@lastname,$other/@firstname)">  
    <xsl:value-of select="true()"/>
  </xsl:if>
</xsl:template>
   
</xsl:stylesheet>

The vset:union-based solution involves less new code than the key-based solution; however, for large documents, the xsl:key-based solution is likely to be faster.
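
If an XSLT 2.0 processor is available, xsl:for-each-group (used in Recipe 8.2) may offer a shorter route that needs no extension function and keeps the elements in their original order. This is a sketch under that assumption, using the same doc2 parameter and the same last-name-plus-first-name key as the stylesheets above:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:param name="doc2"/>

<xsl:template match="/*">
  <xsl:copy>
    <!-- Group people from both documents by the same key used above;
         the comma operator keeps the first document's people first -->
    <xsl:for-each-group select="person, document($doc2)/*/person"
                        group-by="concat(@lastname, @firstname)">
      <!-- The context item is the first member of each group -->
      <xsl:copy-of select="."/>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

For Document 1 and Document 2, Sal Mangano should appear once in the result, taken from the first document. Like the sort-based stylesheet, this sketch hardcodes the key, so it is not generic in the way the key-based and vset:union-based solutions are.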

Discussion

Merging documents is often necessary when separate individuals or processes produce parts of the document. Merging is also necessary when reconstituting a very large document that was split up to be processed in parallel or because it was too cumbersome to handle as a whole.

The examples in this section address the simple case when just two documents are merged. If an arbitrary number of documents are merged, a mechanism is required to pass a list of documents into the stylesheet. One technique uses a parameter containing all filenames separated by spaces and employs a simple tokenizer (Recipe 2.9) to extract the names. Another technique passes all the filenames in the source document, as shown in Example 8-16 and Example 8-17.

Example 8-16. XML listing the documents to be merged
<mergeDocs>
  <doc path="people1.xml"/>
  <doc path="people2.xml"/>
  <doc path="people3.xml"/>
  <doc path="people4.xml"/>
</mergeDocs>
Example 8-17. A stylesheet for merging the documents (assumes no duplicates are in the content)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:output method="xml" indent="yes"/>
   
<xsl:variable name="docs" select="/*/doc"/>
   
<xsl:template match="mergeDocs">
     <xsl:apply-templates select="doc[1]"/>
</xsl:template>
   
<!--Match the first doc to create the topmost element -->
<xsl:template match="doc">
  <xsl:variable name="path" select="@path"/>
  <xsl:for-each select="document($path)/*">
    <xsl:copy>
      <!-- Merge children of doc 1 -->
      <xsl:copy-of select="@* | *"/>
      <!-- Loop over remaining docs to merge their children -->
      <xsl:for-each select="$docs[position() > 1]">
        <xsl:copy-of select="document(@path)/*/*"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:for-each> 
</xsl:template>
   
</xsl:stylesheet>
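
The first technique mentioned above, a space-separated parameter, could be sketched as follows. This version swaps the recursive tokenizer of Recipe 2.9 for the built-in tokenize() function of XPath 2.0, so it assumes an XSLT 2.0 processor. The parameter name docs is hypothetical, and the first document is assumed to be supplied as the source document.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- Hypothetical parameter: the file names of the remaining
     documents, separated by spaces -->
<xsl:param name="docs" select="''"/>

<xsl:template match="/*">
  <xsl:copy>
    <!-- Children of the source document come first -->
    <xsl:copy-of select="@* | *"/>
    <!-- Then the children of each listed document, in list order -->
    <xsl:for-each select="tokenize(normalize-space($docs), ' ')">
      <xsl:copy-of select="document(.)/*/*"/>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

As with Example 8-17, this sketch assumes that the documents contain no duplicates. It also assumes that no file name contains a space.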

8.5. Merging Documents with Unlike Schema

Problem

You have two or more dissimilar documents, and you would like to merge them into a single document.

Solution

The process of merging dissimilar data can vary from application to application. Therefore, this chapter cannot present a single generic solution. Instead, it anticipates common ways for two dissimilar documents to be brought together and provides solutions for each case.

Incorporate one document as a subpart of a parent document

Incorporating a document as a subpart is the most trivial interpretation of this type of merge. The basic idea is to use xsl:copy-of to copy one document or document part into the appropriate part of a second document. The following example merges two documents into a container document that uses element names in the container as indications of what files to merge:

<MyNoteBook>
  <friends>
  </friends>
  <coworkers>
  </coworkers>
  <projects>
    <project>Replalce mapML with XSLT engine using Xalan C++</project>
    <project>Figure out the meaning of life.</project>
    <project>Figure out where the dryer is hiding all those missing socks</project>
  </projects>  
</MyNoteBook>
   
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  
  <xsl:template match="friends | coworkers">
    <xsl:copy>
      <xsl:variable name="file" select="concat(local-name(),'.xml')"/>
      <xsl:copy-of select="document($file)/*/*"/>
    </xsl:copy>
  </xsl:template>
...
</xsl:stylesheet>
   
<?xml version="1.0" encoding="UTF-8"?>
<MyNoteBook>
   <friends>
      <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
      <person firstname="Mike" lastname="Palmieri" age="28" height="5.10"/>
      <person firstname="Vito" lastname="Palmieri" age="38" height="6.0"/>
      <person firstname="Vinny" lastname="Mari" age="37" height="5.8"/>
   </friends>
   <coworkers>
      <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
      <person firstname="Al" lastname="Zehtooney" age="33" height="5.3"/>
      <person firstname="Brad" lastname="York" age="38" height="6.0"/>
      <person firstname="Charles" lastname="Xavier" age="32" height="5.8"/>
   </coworkers>
   <projects>
      <project>Replalce mapML with XSLT engine using Xalan C++</project>
      <project>Figure out the meaning of life.</project>
      <project>Figure out where the dryer is hiding all those missing socks
      </project>
   </projects>
</MyNoteBook>

An interesting variation of this case is a document that signals the inline inclusion of another document. The W3C defines a standard way of doing this, called XInclude (http://www.w3.org/TR/xinclude/). You can implement a general-purpose XInclude processor in XSLT by extending copy.xslt:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
   
<xsl:template match="xi:include" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xsl:for-each select="document(@href)"> 
    <xsl:apply-templates/>
  </xsl:for-each>
</xsl:template> 
   
</xsl:stylesheet>

The xsl:for-each serves only to change the context to the included document. The xsl:apply-templates then continues copying the included document’s content.
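
As an illustration, an input along the following lines would exercise that template. The document is hypothetical; it reuses the notebook element names from the earlier example and assumes that friends.xml and coworkers.xml sit next to it, since document(@href) resolves relative names against the including document.

```xml
<MyNoteBook xmlns:xi="http://www.w3.org/2001/XInclude">
  <friends>
    <!-- Replaced by the content of friends.xml, starting at its root element -->
    <xi:include href="friends.xml"/>
  </friends>
  <coworkers>
    <!-- Replaced by the content of coworkers.xml in the same way -->
    <xi:include href="coworkers.xml"/>
  </coworkers>
</MyNoteBook>
```

Unlike the container example, which copied only the grandchildren of the merged documents, this stylesheet brings in each included document whole, root element included. The stylesheet as shown looks only at the href attribute of xi:include.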

Weave two documents together

A variation of simple inclusion combines elements that are children of common parent element types. Consider two biologists who have collected information about animals separately. As a first step to building a unified animal database, they may decide to weave the data together at a point of structural commonality.

Biologist1 has this file:

<animals>
  <mammals>
    <animal common="chimpanzee" species="Pan troglodytes" order="Primates"/>
    <animal common="human" species="Homo Sapien" order="Primates"/>
  </mammals>
  <reptiles>
    <animal common="boa constrictor" species="Boa constrictor" order="Squamata"/>
    <animal common="gecko" species="Gekko gecko" order="Squamata"/>
  </reptiles>
  <birds>
    <animal common="sea gull" species="Larus occidentalis" order="Charadriiformes"/>
    <animal common="Black-Backed Woodpecker" species="Picoides arcticus"
    order="Piciformes"/>
  </birds>
</animals>

Biologist2 has this file:

<animals>
  <mammals>
    <animal common="hippo" species="Hippopotamus amphibius" 
    family=" Hippopotamidae"/>
    <animal common="arabian camel" species="Camelus dromedarius" family="Camelidae"/>
  </mammals>
  <insects>
    <animal common="Lady Bug" species="Adalia bipunctata" family="Coccinellidae"/>
    <animal common="Dung Bettle" species=" Onthophagus australis"
    family="Scarabaeidae"/>
  </insects>
  <amphibians>
    <animal common="Green Sea Turtle" species="Chelonia mydas" family="Cheloniidae"/>
    <animal common="Green Tree Frog" species=" Hyla cinerea" family="Hylidae "/>
  </amphibians>
</animals>

The files have similar but not identical schemas. Both files contain the class Mammalia, but differ in the other organizational levels. At the animal level, one biologist recorded information about the animal’s order, while the other recorded data about the animal’s family. The following stylesheet weaves the documents together at the animal’s class level (the second level in document structure):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
   
  <xsl:param name="doc2file"/>
  
  <xsl:variable name="doc2" select="document($doc2file)"/>
  <xsl:variable name="thisDocsClasses" select="/*/*"/>
  
<xsl:template match="/*">
  <xsl:copy>
    <!-- Merge common sections between source doc and doc2. Also includes
          sections unique to source doc. -->
    <xsl:for-each select="*">
      <xsl:copy>
        <xsl:copy-of select="*"/>
        <xsl:copy-of select="$doc2/*/*[name() = name(current())]/*"/>
      </xsl:copy>
    </xsl:for-each>
   
    <!-- Merge sections unique to doc2 -->
    <xsl:for-each select="$doc2/*/*">
      <xsl:if test="not($thisDocsClasses[name() = name(current())])">
        <xsl:copy-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>
  
</xsl:stylesheet>

Application of the stylesheet results in a document that can be further normalized by hand or through another automated method:

<animals>
   <mammals>
      <animal common="chimpanzee" species="Pan troglodytes" order="Primates"/>
      <animal common="human" species="Homo Sapien" order="Primates"/>
      <animal common="hippo" species="Hippopotamus amphibius" 
      family=" Hippopotamidae"/>
      <animal common="arabian camel" species="Camelus dromedarius" 
      family="Camelidae"/>
   </mammals>
   <reptiles>
      <animal common="boa constrictor" species="Boa constrictor" order="Squamata"/>
      <animal common="gecko" species="Gekko gecko" order="Squamata"/>
   </reptiles>
   <birds>
      <animal common="sea gull" species="Larus occidentalis" 
      order="Charadriiformes"/>
      <animal common="Black-Backed Woodpecker" species="Picoides arcticus" 
      order="Piciformes"/>
   </birds>
   <insects>
      <animal common="Lady Bug" species="Adalia bipunctata" family="Coccinellidae"/>
      <animal common="Dung Bettle" species=" Onthophagus australis" 
      family="Scarabaeidae"/>
   </insects>
   <amphibians>
      <animal common="Green Sea Turtle" species="Chelonia mydas"
      family="Cheloniidae"/>
      <animal common="Green Tree Frog" species=" Hyla cinerea" family="Hylidae "/>
   </amphibians>
</animals>

Join elements from two documents to make new elements

A less trivial merge occurs when elements of one document are juxtaposed with, or made children of, elements of another document based on some matching characteristic. For example, consider the following merge of documents containing different information about people:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  
  <xsl:param name="doc2file"/>
  
  <xsl:variable name="doc2" select="document($doc2file)"/>
   
  <xsl:template match="person">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:element name="{local-name()}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
      <xsl:variable name="matching-person" 
          select="$doc2/*/person[@name=concat(current()/@firstname,' ',
                                              current()/@lastname)]"/>
      <xsl:element name="smoker">
        <xsl:value-of select="$matching-person/@smoker"/>
      </xsl:element>
      <xsl:element name="sex">
        <xsl:value-of select="$matching-person/@sex"/>
      </xsl:element>
    </xsl:copy>
</xsl:template>
   
</xsl:stylesheet>

This stylesheet performs two tasks. It converts attribute-encoded information in the input document to elements, and it merges in information from $doc2 that is not present in the source document.
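
To make the join concrete, here is a small illustrative pair of inputs and the kind of result to expect. The second document is hypothetical (its name, smoker, and sex attributes are inferred from the stylesheet, and the values are invented), and the order of the elements generated from attributes depends on the processor, so treat the result as a sketch rather than exact output.

```xml
<!-- Source document (one person taken from the friends data shown earlier) -->
<people>
  <person firstname="Sal" lastname="Mangano" age="38" height="5.75"/>
</people>

<!-- Hypothetical second document, named by the doc2file parameter -->
<people>
  <person name="Sal Mangano" smoker="no" sex="m"/>
</people>

<!-- Approximate merged result -->
<people>
  <person>
    <firstname>Sal</firstname>
    <lastname>Mangano</lastname>
    <age>38</age>
    <height>5.75</height>
    <smoker>no</smoker>
    <sex>m</sex>
  </person>
</people>
```

When the second document has no person whose name matches, the smoker and sex elements are still created, but they are empty.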

Discussion

Merging XML with disparate schemas is less well-defined than merging documents of identical schema. This chapter discusses three interpretations of merging, but other, more complicated types could exist. One possibility is that a merge could bring documents together so that inclusion, weaving, and joining all play a part in the final result. As such, it would be difficult to create a single, generic, XSLT-based merge utility that solves everyone’s particular merge problems. However, the examples in this section provide a useful head start in crafting more ambitious types of merges.

See Also

The examples in this section focused on merging elements in a one-to-one relationship. Recipe 9.5 shows how to join information in disparate XML from the perspective of database queries. These techniques are also applicable to merging in a one-to-many relationship.

8.6. Splitting Documents

Problem

You want to partition elements from a single document into subdocuments.

Solution

XSLT 1.0

For XSLT 1.0, you must rely on a widely available but nonstandard extension that allows multiple output documents.[3] The solution determines the level in the document structure to serialize and determines the name of the resulting file. Saxon allows use of the XSLT 1.1 xsl:document element when the stylesheet version is set to 1.1, and some processors support exslt:document from exslt.org.[4] If you prefer not to use version 1.1, you can use the saxon:output extension in Saxon instead.

The following stylesheet, which works in Saxon and uses xsl:document, splits the salesBySalesPerson.xml from Chapter 4 into separate files for each salesperson:

<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:include href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
   
<xsl:template match="salesperson">
  <xsl:variable name="outFile" 
  select="concat('salesperson.',translate(@name,' ','_'),'.xml')"/>        
  <!-- Non-standard saxon xsl:document! -->
  <xsl:document href="{$outFile}"> 
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:document>
</xsl:template>
   
<xsl:template match="salesBySalesperson">
  <xsl:apply-templates/>
</xsl:template>
   
</xsl:stylesheet>

Discussion

Although the previous stylesheet is specific to Saxon, the technique works with most XSLT 1.0 processors with only minor changes. Saxon also has the saxon:output extension element (xmlns:saxon="http://icl.com/saxon"). Xalan uses xalan:redirect (xmlns:xalan="http://xml.apache.org/xalan").
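
With an XSLT 2.0 processor, no extension is needed, because xsl:result-document is a standard instruction for writing secondary output documents. The following is a minimal sketch that assumes the same copy.xslt and input as the Saxon example; the output filenames are resolved relative to the processor's base output location.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="copy.xslt"/>

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="salesperson">
  <xsl:variable name="outFile"
      select="concat('salesperson.',translate(@name,' ','_'),'.xml')"/>
  <!-- Standard XSLT 2.0 instruction for writing a secondary result document -->
  <xsl:result-document href="{$outFile}">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:result-document>
</xsl:template>

<!-- Process the salespeople without copying the root element -->
<xsl:template match="salesBySalesperson">
  <xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>
```

Because no format attribute is given, each file is serialized according to the unnamed xsl:output declaration.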

An interesting variation of splitting also produces an output file that uses XInclude to include the generated subfiles:

<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
<xsl:import href="copy.xslt"/>
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
     
<xsl:template match="salesperson">
  <xsl:variable name="outFile" 
      select="concat('salesperson.',translate(@name,' ','_'),'.xml')"/>        
  <xsl:document href="{$outFile}">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:document>
   
  <xi:include href="{$outFile}" 
                        xmlns:xi="http://www.w3.org/2001/XInclude"/>
  
</xsl:template>   
   
</xsl:stylesheet>

If you worry that your XSLT processor might someday recognize XInclude and mistakenly try to include the same file that was just output, you can replace the xi:include literal result element with xsl:element:

  <xsl:element name="xi:include" 
         xmlns:xi="http://www.w3.org/2001/XInclude">
    <xsl:attribute name="href">
      <xsl:value-of select="$outFile"/>
    </xsl:attribute> 
  </xsl:element>

See Also

Recipe 14.1 contains more examples that use multiple output document extensions.

8.7. Flattening an XML Hierarchy

Problem

You have a document with elements organized in a more deeply nested fashion than you would prefer. You want to flatten the tree.

Solution

If your goal is simply to flatten without regard to the information encoded by the deeper structure, then you need to apply an overriding copy. The overriding template must match the elements you wish to discard and apply templates without copying.

Consider the following input, which segregates people into two categories—salaried and union:

<people>
  <union>
    <person>
      <firstname>Warren</firstname>
      <lastname>Rosenbaum</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
    <person>
      <firstname>Dror</firstname>
      <lastname>Seagull</lastname>
      <age>28</age>
      <height>5.10</height>
    </person>
    <person>
      <firstname>Mike</firstname>
      <lastname>Heavyman</lastname>
      <age>45</age>
      <height>6.0</height>
    </person>
    <person>
      <firstname>Theresa</firstname>
      <lastname>Archul</lastname>
      <age>37</age>
      <height>5.5</height>
    </person>
  </union>
  <salaried>
    <person>
      <firstname>Sal</firstname>
      <lastname>Mangano</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
    <person>
      <firstname>Jane</firstname>
      <lastname>Smith</lastname>
      <age>28</age>
      <height>5.10</height>
    </person>
    <person>
      <firstname>Rick</firstname>
      <lastname>Winters</lastname>
      <age>45</age>
      <height>6.0</height>
    </person>
    <person>
      <firstname>James</firstname>
      <lastname>O'Riely</lastname>
      <age>33</age>
      <height>5.5</height>
    </person>
  </salaried>
</people>

This stylesheet simply discards the extra structure:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
    
  <xsl:template match="people">
    <xsl:copy>
      <!--discard parents of person elements --> 
      <xsl:apply-templates select="*/person" />
    </xsl:copy>
  </xsl:template>
   
</xsl:stylesheet>

Discussion

Having additional structure in a document is generally good because it usually makes the document easier to process with XSLT. However, too much structure bloats the document and makes it harder for people to understand. Humans generally prefer to infer relationships by spatial text organization rather than with extra syntactic baggage.

In the preceding example, the extra structure is not superfluous; it encodes additional information (whether a person is union or salaried). If you want to retain that information while flattening, then you should probably create an attribute or child element to capture it.

This stylesheet creates an attribute:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8" 
  omit-xml-declaration="yes"/>
      
  <!--discard parents of person elements --> 
  <xsl:template match="*[person]">
    <xsl:apply-templates/>
  </xsl:template>
   
  <xsl:template match="person">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="class">
        <xsl:value-of select="local-name(..)"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
   
</xsl:stylesheet>

This variation creates an element:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
   
  <xsl:strip-space elements="*"/>
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
      
  <!--discard parents of person elements --> 
  <xsl:template match="*[person]">
    <xsl:apply-templates/>
  </xsl:template>
   
  <xsl:template match="person">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:element name="class">
        <xsl:value-of select="local-name(..)"/>
      </xsl:element>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
   
</xsl:stylesheet>

Using xsl:strip-space and indent="yes" on the xsl:output element keeps the output from containing a whitespace gap like the one shown here:

<people>
...
    <person>
      <class>union</class>
      <firstname>Warren</firstname>
      <lastname>Rosenbaum</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
                                      <-- Whitespace gap here!
   
    <person>
      <class>salaried</class>
      <firstname>Sal</firstname>
      <lastname>Mangano</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
...
 </people>

8.8. Deepening an XML Hierarchy

Problem

You have a poorly designed document that could use extra structure.[5]

Solution

This is the opposite problem from that solved in Recipe 8.7. Here you need to add additional structure to a document, possibly to organize its elements by some additional criteria.

Add structure based on existing data

This example of a deepening transformation undoes the flattening performed in Recipe 8.7:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
   
  <xsl:template match="people">
    <!-- Copy the root so the output has a single document element -->
    <xsl:copy>
      <union>
        <xsl:apply-templates select="person[@class = 'union']" />
      </union>
      <salaried>
        <xsl:apply-templates select="person[@class = 'salaried']" />
      </salaried>
    </xsl:copy>
  </xsl:template>  
   
</xsl:stylesheet>

Add structure to correct a poorly designed document

In a misguided effort to streamline XML, some people attempt to encode information by inserting sibling elements rather than parent elements.[6]

For example, suppose someone distinguished between union and salaried employees in the following way:

<people>
  <class name="union"/>
  <person>
    <firstname>Warren</firstname>
    <lastname>Rosenbaum</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
...
  <person>
    <firstname>Theresa</firstname>
    <lastname>Archul</lastname>
    <age>37</age>
    <height>5.5</height>
  </person>
  <class name="salaried"/>
  <person>
    <firstname>Sal</firstname>
    <lastname>Mangano</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
...
  <person>
    <firstname>James</firstname>
    <lastname>O'Riely</lastname>
    <age>33</age>
    <height>5.5</height>
  </person>
</people>

Notice that the class elements signifying union and salaried are now empty. The intent is that all following siblings of a class element belong to that class until another class element is encountered or there are no more siblings. This type of encoding is easy to grasp, but more difficult for an XSLT program to process. To correct this representation, you need to create a stylesheet that computes the set difference between all person elements following a given class element and the person elements following the next class element. XSLT 1.0 does not have an explicit set difference function. You can get essentially the same effect and be more efficient by considering all elements following a class element whose position is less than the position of elements following the next class element:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
   
  <xsl:template match="class">
    <!--The last position we want to consider: the number of people that
        follow this class minus the number that follow the next class. -->
    <xsl:variable name="pos" 
             select="count(following-sibling::person) - 
               count(following-sibling::class/following-sibling::person)"/>
    <xsl:element name="{@name}">
      <!-- Copy people that follow this class but whose position is 
           less than or equal to $pos.-->   
      <xsl:copy-of 
              select="following-sibling::person[position() &lt;= $pos]"/>
     </xsl:element> 
  </xsl:template>
   
<!-- Ignore person elements. They were copied above. -->
<xsl:template match="person"/>
   
</xsl:stylesheet>

More subtly, a key can be used as follows:

<xsl:key name="people" match="person" 
         use="preceding-sibling::class[1]/@name" />
   
<xsl:template match="people">
  <people>
    <xsl:apply-templates select="class" />
  </people>
</xsl:template>
   
<xsl:template match="class">
  <xsl:element name="{@name}">
    <xsl:copy-of select="key('people', @name)" />
  </xsl:element>
</xsl:template>

A step-by-step approach is another alternative:

<xsl:template match="people">
  <people>
    <xsl:apply-templates select="class[1]" />
  </people>
</xsl:template>
   
<xsl:template match="class">
  <xsl:element name="{@name}">
    <xsl:apply-templates select="following-sibling::*[1][self::person]" />
  </xsl:element>
  <xsl:apply-templates select="following-sibling::class[1]" />
</xsl:template>
   
<xsl:template match="person">
  <xsl:copy-of select="." />
  <xsl:apply-templates select="following-sibling::*[1][self::person]" />
</xsl:template>

XSLT 2.0

Add structure based on existing data

Using XSLT 2.0’s xsl:for-each-group allows you to achieve a more generic solution than we did in the 1.0 solution. Although there are 1.0 solutions that are generic (see Discussion), none is quite as simple:

<xsl:template match="people">
  <xsl:copy>
    <!-- Group by the class attribute created in Recipe 8.7 -->
    <xsl:for-each-group select="person" group-by="@class">
      <xsl:element name="{current-grouping-key()}">
        <xsl:apply-templates select="current-group()" />
      </xsl:element>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

Add structure to correct a poorly designed document

You can exploit xsl:for-each-group with the group-starting-with option to solve this problem:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:import href="copy.xslt"/>

    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:template match="people">
      <xsl:copy>
        <xsl:for-each-group select="*" group-starting-with="class">
          <xsl:element name="{@name}">
            <xsl:apply-templates select="current-group()[not(self::class)]"/>
          </xsl:element>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>

Discussion

Add structure based on existing data

When you added structure based on existing data, you explicitly referred to the criteria that formed the categories of interest (e.g., union and salaried). It would be better if the stylesheet figured out these categories by itself. This makes the stylesheet more generic at the cost of added complexity:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
   
  <!-- build a unique list of all classes -->
  <xsl:variable name="classes" 
            select="/*/*/@class[not(. = ../preceding-sibling::*/@class)]"/>  
  <xsl:template match="/*">
    <!-- For each class create an element named after that 
         class that contains elements of that class -->
    <xsl:for-each select="$classes">
      <xsl:variable name="class-name" select="."/>
      <xsl:element name="{$class-name}">
        <xsl:for-each select="/*/*[@class=$class-name]">
          <xsl:copy>
            <xsl:apply-templates/>
          </xsl:copy>
        </xsl:for-each>
      </xsl:element>
   </xsl:for-each>
  </xsl:template>       
   
</xsl:stylesheet>

Although not 100% generic, this stylesheet avoids making assumptions about what kinds of classes exist in the document. The only application-specific information in this stylesheet is the fact that the categories are encoded in an attribute @class and that the attribute occurs in elements that are two levels down from the root.
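
For larger documents, the same idea can be expressed with a key, which avoids the repeated scans of preceding siblings. The following is a sketch under the same assumptions about the input (a class attribute on elements two levels down from the root); the key name by-class is made up for illustration. Unlike the stylesheet above, it also wraps the result in a copy of the root element so that the output has a single document element.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="copy.xslt"/>

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index the second-level elements by their class attribute -->
  <xsl:key name="by-class" match="/*/*" use="@class"/>

  <xsl:template match="/*">
    <xsl:copy>
      <!-- Visit only the first element of each class -->
      <xsl:for-each
          select="*[@class][count(. | key('by-class', @class)[1]) = 1]">
        <!-- Create an element named after the class -->
        <xsl:element name="{@class}">
          <!-- Copy every element of that class, dropping its attributes -->
          <xsl:for-each select="key('by-class', @class)">
            <xsl:copy>
              <xsl:apply-templates/>
            </xsl:copy>
          </xsl:for-each>
        </xsl:element>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The predicate with count() is the same test for the first member of a group that the Muenchian Method relies on in Recipe 8.9.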

Add structure to correct a poorly designed document

The solution can be implemented explicitly in terms of set difference. This solution is elegant, but impractical for large documents with many categories. The trick used here for computing set difference is explained in Recipe 9.1:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
     
  <xsl:template match="class">
    <!--All people following this class element -->
    <xsl:variable name="nodes1" select="following-sibling::person"/>
    <!--All people following the next class element -->
    <xsl:variable name="nodes2" 
          select="following-sibling::class/following-sibling::person"/>
    <xsl:element name="{@name}">
      <xsl:copy-of select="$nodes1[count(. | $nodes2) != count($nodes2)]"/>
     </xsl:element> 
  </xsl:template>
   
<xsl:template match="person"/>
   
</xsl:stylesheet>

8.9. Reorganizing an XML Hierarchy

Problem

You need to reorganize the information in an XML document to make some implicit information explicit and some explicit information implicit.

Solution

XSLT 1.0

Again, consider the SalesBySalesPerson.xml document from Chapter 4:

<salesBySalesperson>
  <salesperson name="John Adams" seniority="1">
    <product sku="10000" totalSales="10000.00"/>
    <product sku="20000" totalSales="50000.00"/>
    <product sku="25000" totalSales="920000.00"/>
  </salesperson>
  <salesperson name="Wendy Long" seniority="5">
    <product sku="10000" totalSales="990000.00"/>
    <product sku="20000" totalSales="150000.00"/>
    <product sku="30000" totalSales="5500.00"/>
  </salesperson>
  <salesperson name="Willie B. Aggressive" seniority="10">
    <product sku="10000" totalSales="1110000.00"/>
    <product sku="20000" totalSales="150000.00"/>
    <product sku="25000" totalSales="2920000.00"/>
    <product sku="30000" totalSales="115500.00"/>
    <product sku="70000" totalSales="10000.00"/>
  </salesperson>
  <salesperson name="Arty Outtolunch" seniority="10"/>
</salesBySalesperson>

Which products were sold by which salesperson and how much income the salesperson created for each sold product is explicit. The total income generated by each product is implicit, as are the names of all salespeople who sold any given product.

Therefore, to reorganize this document, you would need to convert to a view that shows sales by product. The following stylesheet accomplishes this transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:key name="sales_key" match="salesperson" use="product/@sku"/>
   
<xsl:variable name="products" select="//product"/>
<xsl:variable name="unique-products" 
    select="$products[not(@sku = preceding::product/@sku)]"/>
   
<xsl:template match="/">
  <salesByProduct>
    <xsl:for-each select="$unique-products">
      <xsl:variable name="sku" select="@sku"/>
      <xsl:copy> 
        <xsl:copy-of select="$sku"/>
        <xsl:attribute name="totalSales">
          <xsl:value-of select="sum($products[@sku=$sku]/@totalSales)"/>
        </xsl:attribute>
        <xsl:for-each select="key('sales_key',$sku)">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="sold">
              <xsl:value-of select="product[@sku=$sku]/@totalSales"/>
            </xsl:attribute>
          </xsl:copy>
        </xsl:for-each>
      </xsl:copy>
    </xsl:for-each>
  </salesByProduct>
</xsl:template>
   
</xsl:stylesheet>

The resulting output is shown here:

<salesByProduct>
   <product sku="10000" totalSales="2110000">
      <salesperson name="John Adams" seniority="1" sold="10000.00"/>
      <salesperson name="Wendy Long" seniority="5" sold="990000.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="1110000.00"/>
   </product>
   <product sku="20000" totalSales="350000">
      <salesperson name="John Adams" seniority="1" sold="50000.00"/>
      <salesperson name="Wendy Long" seniority="5" sold="150000.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="150000.00"/>
   </product>
   <product sku="25000" totalSales="3840000">
      <salesperson name="John Adams" seniority="1" sold="920000.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="2920000.00"/>
   </product>
   <product sku="30000" totalSales="121000">
      <salesperson name="Wendy Long" seniority="5" sold="5500.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="115500.00"/>
   </product>
   <product sku="70000" totalSales="10000">
      <salesperson name="Willie B. Aggressive" seniority="10" sold="10000.00"/>
   </product>
</salesByProduct>

An alternative solution is based on the Muenchian Method, named after Steve Muench. This method uses an xsl:key to facilitate the extraction of unique products. The expression $products[count(.|key('product_key', @sku)[1]) = 1] selects the first product in each group, where the grouping is by sku. The union of a product and the first node returned by the key contains exactly one node only when the two are the same node:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:variable name="doc" select="/"/>
   
<xsl:key name="product_key" match="product" use="@sku"/>
<xsl:key name="sales_key" match="salesperson" use="product/@sku"/>
   
<xsl:variable name="products" select="//product"/>
   
<xsl:template match="/">
  <salesByProduct>
    <xsl:for-each select="$products[count(.|key('product_key',@sku)[1]) 
                           = 1]">
      <xsl:variable name="sku" select="@sku"/>
      <xsl:copy> 
        <xsl:copy-of select="$sku"/>
        <xsl:attribute name="totalSales">
          <xsl:value-of select="sum(key('product_key',$sku)/@totalSales)"/>
        </xsl:attribute>
        <xsl:for-each select="key('sales_key',$sku)">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="sold">
              <xsl:value-of select="product[@sku=$sku]/@totalSales"/>
            </xsl:attribute>
          </xsl:copy>
        </xsl:for-each>
      </xsl:copy>
    </xsl:for-each>
  </salesByProduct>
</xsl:template>
   
</xsl:stylesheet>

XSLT 2.0

This problem becomes straightforward in XSLT 2.0 because you can take advantage of xsl:for-each-group:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
 <xsl:template match="/">
  <salesByProduct>
   <!-- Group products by sku -->
   <xsl:for-each-group select="//product" group-by="@sku">
    <xsl:copy>
     <xsl:copy-of select="@sku"/>
     <!-- Use current-group() to total up sales -->
     <xsl:attribute name="totalSales" 
                    select="format-number(sum(current-group()/@totalSales),'#')"/>
     <!-- Copy salesperson elements that contain a child product with sku of current 
          product group -->
     <xsl:for-each select="/*/salesperson">
      <xsl:if test="product[@sku eq current-grouping-key()]">
       <xsl:copy>
        <xsl:copy-of select="@*"/>
       </xsl:copy>
      </xsl:if>
     </xsl:for-each>
    </xsl:copy>
   </xsl:for-each-group>
  </salesByProduct>
 </xsl:template>
</xsl:stylesheet>
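
Note that this stylesheet does not reproduce the sold attribute generated by the XSLT 1.0 solutions. One way to restore it is to iterate over the products in the current group and work back to each parent salesperson. The following variation is a sketch only; it assumes that each product element is a child of a salesperson element and that a salesperson lists a given sku at most once.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
 <xsl:template match="/">
  <salesByProduct>
   <!-- Group products by sku -->
   <xsl:for-each-group select="//product" group-by="@sku">
    <xsl:copy>
     <xsl:copy-of select="@sku"/>
     <xsl:attribute name="totalSales"
                    select="format-number(sum(current-group()/@totalSales),'#')"/>
     <!-- Each product in the group was sold by its parent salesperson -->
     <xsl:for-each select="current-group()">
      <salesperson>
       <!-- Attributes of the parent salesperson (name, seniority) -->
       <xsl:copy-of select="../@*"/>
       <!-- The amount this salesperson sold of the current product -->
       <xsl:attribute name="sold" select="@totalSales"/>
      </salesperson>
     </xsl:for-each>
    </xsl:copy>
   </xsl:for-each-group>
  </salesByProduct>
 </xsl:template>
</xsl:stylesheet>
```

This version also avoids rescanning every salesperson for each group, because the members of the group already identify the relevant salespeople.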

Discussion

XSLT 1.0

The solution is a very application-specific example, and this cannot be helped. Presenting a generic reorganizing stylesheet is difficult, if not impossible, because these reorganizations depend on the nature of the particular document being transformed.

However, some common idioms are likely to appear in these sorts of reorganizations.

First, since you reorganize the document tree completely, it is unlikely that a solution will rely primarily on matching and applying templates. These sorts of stylesheets are much more likely to use an iterative style. In other words, the solutions will probably rely heavily on xsl:for-each.

Second, recipes in this class almost always initialize global variables that contain elements extracted from deep within the XML structure. In addition, you will probably need to determine a unique subset of these elements. See Recipe 5.3 for a complete discussion of the techniques available for constructing unique sets of elements.
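As a minimal sketch of this second idiom, the unique set of products could also be selected without keys, by rejecting any product whose sku already appeared earlier in the document. This is an alternative to the key-based test used in the Solution, not the technique the Solution relies on. It assumes the same product elements with sku attributes and assumes that product elements are never nested inside one another. The uniqueProducts variable and the skus and sku output elements are hypothetical names. Because every product is compared against all earlier ones, this form is likely to scale poorly on large documents.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Global variable holding every product, wherever it occurs -->
<xsl:variable name="products" select="//product"/>

<!-- Hypothetical name: keep a product only if no earlier product has its sku -->
<xsl:variable name="uniqueProducts"
              select="$products[not(@sku = preceding::product/@sku)]"/>

<xsl:template match="/">
  <skus>
    <!-- Emit one element per distinct sku, in document order -->
    <xsl:for-each select="$uniqueProducts">
      <sku><xsl:value-of select="@sku"/></sku>
    </xsl:for-each>
  </skus>
</xsl:template>

</xsl:stylesheet>
```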

Third, reorganization often involves reaggregating data by using sums, products, or other more complex aggregations. Chapter 3 and Chapter 16 discuss advanced techniques for computing these aggregations.
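For instance, a simple reaggregation might combine sum() and count() over the same key lookup to report an average per product. The sketch below assumes the product elements and the product_key definition from the Solution; the averageSale attribute and the productAverages wrapper element are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:key name="product_key" match="product" use="@sku"/>

<xsl:template match="/">
  <productAverages>
    <!-- Visit the first product of each distinct sku -->
    <xsl:for-each select="//product[count(.|key('product_key',@sku)[1]) = 1]">
      <!-- All products in the document that share this sku -->
      <xsl:variable name="group" select="key('product_key',@sku)"/>
      <!-- The group always contains at least the current product,
           so count($group) is never zero -->
      <product sku="{@sku}"
               averageSale="{sum($group/@totalSales) div count($group)}"/>
    </xsl:for-each>
  </productAverages>
</xsl:template>

</xsl:stylesheet>
```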

XSLT 2.0

Clearly, the power of xsl:for-each-group makes complex reorganizations of XML elements much easier. The key is to develop a clear understanding of the criteria that define the group and then to reorganize the other elements of the document by their relation to that group.
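The XSLT 2.0 stylesheet in the Solution copies only each salesperson's own attributes. If a per-salesperson figure like the sold attribute of the XSLT 1.0 version is also wanted, one possible variant walks the members of the current group and reaches each salesperson through the parent axis. This is a sketch under two assumptions: every product is a direct child of its salesperson, and a salesperson lists a given sku at most once (otherwise that salesperson would appear more than once in a group).

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
 <xsl:template match="/">
  <salesByProduct>
   <!-- Group products by sku -->
   <xsl:for-each-group select="//product" group-by="@sku">
    <product sku="{current-grouping-key()}"
             totalSales="{format-number(sum(current-group()/@totalSales),'#')}">
     <!-- Each member of the group belongs to one salesperson who sold this sku -->
     <xsl:for-each select="current-group()">
      <salesperson>
       <!-- Copy the attributes of the parent salesperson (assumed direct parent) -->
       <xsl:copy-of select="../@*"/>
       <!-- This member's totalSales is what that salesperson sold of the sku -->
       <xsl:attribute name="sold" select="@totalSales"/>
      </salesperson>
     </xsl:for-each>
    </product>
   </xsl:for-each-group>
  </salesByProduct>
 </xsl:template>
</xsl:stylesheet>
```

Driving the inner loop from current-group() rather than from /*/salesperson avoids rescanning every salesperson for each group, at the cost of relying on the parent relationship stated above.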

See Also

Recipe 6.2 contains more examples of using XSLT 2.0’s xsl:for-each-group.



[1] The only other stylistic issue I have seen software developers get more passionate about is where to put the curly braces in C-like programming languages (e.g., C++ and Java).

[2] As evidence of my naming ineptitude, my son actually spent two whole days in this world without a name. My wife and I simply could not think of a good one that we both liked. To our credit, we both understood the importance of picking a good name and we think Leonardo agrees.

[3] In XSLT 2.0, this facility is available and uses a new element called xsl:result-document. See Chapter 6 for details.

[4] XSLT 1.1 is no longer an official version. It was abandoned in favor of XSLT 2.0.

[5] It may be well-designed from a particular set of goals, but those goals aren’t yours.

[6] To be fair, not every occurrence of this technique is misguided. Design is a navigation between competing tradeoffs.
