Reorganizing an XML Hierarchy

Problem

You need to reorganize the information in an XML document to make some implicit information explicit and some explicit information implicit.

Solution

Again, consider the SalesBySalesPerson.xml document from Chapter 4:

<salesBySalesperson>
  <salesperson name="John Adams" seniority="1">
    <product sku="10000" totalSales="10000.00"/>
    <product sku="20000" totalSales="50000.00"/>
    <product sku="25000" totalSales="920000.00"/>
  </salesperson>
  <salesperson name="Wendy Long" seniority="5">
    <product sku="10000" totalSales="990000.00"/>
    <product sku="20000" totalSales="150000.00"/>
    <product sku="30000" totalSales="5500.00"/>
  </salesperson>
  <salesperson name="Willie B. Aggressive" seniority="10">
    <product sku="10000" totalSales="1110000.00"/>
    <product sku="20000" totalSales="150000.00"/>
    <product sku="25000" totalSales="2920000.00"/>
    <product sku="30000" totalSales="115500.00"/>
    <product sku="70000" totalSales="10000.00"/>
  </salesperson>
  <salesperson name="Arty Outtolunch" seniority="10"/>
</salesBySalesperson>

The document makes explicit which products each salesperson sold and how much income each salesperson generated per product. The total income generated by each product is implicit, as are the names of all salespeople who sold any given product.

To make this implicit information explicit, you need to reorganize the document into a view that shows sales by product. The following stylesheet accomplishes this transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:key name="sales_key" match="salesperson" use="product/@sku"/>
   
<!-- All products, then the first occurrence of each distinct sku in document order -->
<xsl:variable name="products" select="//product"/>
<xsl:variable name="unique-products" 
    select="$products[not(@sku = preceding::product/@sku)]"/>
   
<xsl:template match="/">
  <salesByProduct>
    <xsl:for-each select="$unique-products">
      <xsl:variable name="sku" select="@sku"/>
      <xsl:copy> 
        <xsl:copy-of select="$sku"/>
        <xsl:attribute name="totalSales">
          <xsl:value-of select="sum($products[@sku=$sku]/@totalSales)"/>
        </xsl:attribute>
        <xsl:for-each select="key('sales_key',$sku)">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="sold">
              <xsl:value-of select="product[@sku=$sku]/@totalSales"/>
            </xsl:attribute>
          </xsl:copy>
        </xsl:for-each>
      </xsl:copy>
    </xsl:for-each>
  </salesByProduct>
</xsl:template>
   
</xsl:stylesheet>

The resulting output is shown here:

<salesByProduct>
   <product sku="10000" totalSales="2110000">
      <salesperson name="John Adams" seniority="1" sold="10000.00"/>
      <salesperson name="Wendy Long" seniority="5" sold="990000.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="1110000.00"/>
   </product>
   <product sku="20000" totalSales="350000">
      <salesperson name="John Adams" seniority="1" sold="50000.00"/>
      <salesperson name="Wendy Long" seniority="5" sold="150000.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="150000.00"/>
   </product>
   <product sku="25000" totalSales="3840000">
      <salesperson name="John Adams" seniority="1" sold="920000.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="2920000.00"/>
   </product>
   <product sku="30000" totalSales="121000">
      <salesperson name="Wendy Long" seniority="5" sold="5500.00"/>
      <salesperson name="Willie B. Aggressive" seniority="10" sold="115500.00"/>
   </product>
   <product sku="70000" totalSales="10000">
      <salesperson name="Willie B. Aggressive" seniority="10" sold="10000.00"/>
   </product>
</salesByProduct>

An alternative solution is based on the Muenchian Method, named after Steve Muench. This method uses an xsl:key to facilitate the extraction of unique products. The expression $products[count(.|key('product_key', @sku)[1]) = 1] selects the first product in each group, where the grouping is by sku. The union of a product with the first product in its group contains exactly one node only when the two are the same node:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:variable name="doc" select="/"/>
   
<xsl:key name="product_key" match="product" use="@sku"/>
<xsl:key name="sales_key" match="salesperson" use="product/@sku"/>
   
<xsl:variable name="products" select="//product"/>
   
<xsl:template match="/">
  <salesByProduct>
    <xsl:for-each select="$products[count(.|key('product_key',@sku)[1]) 
                           = 1]">
      <xsl:variable name="sku" select="@sku"/>
      <xsl:copy> 
        <xsl:copy-of select="$sku"/>
        <xsl:attribute name="totalSales">
          <xsl:value-of select="sum(key('product_key',$sku)/@totalSales)"/>
        </xsl:attribute>
        <xsl:for-each select="key('sales_key',$sku)">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="sold">
              <xsl:value-of select="product[@sku=$sku]/@totalSales"/>
            </xsl:attribute>
          </xsl:copy>
        </xsl:for-each>
      </xsl:copy>
    </xsl:for-each>
  </salesByProduct>
</xsl:template>
   
</xsl:stylesheet>
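
The first-in-group test is also commonly written with generate-id() instead of count(). The following is a minimal sketch of that variant, assuming the same input document and the same product_key definition; the output element names skus and sku are hypothetical and chosen only for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- Same key as in the Muenchian solution: products indexed by sku -->
<xsl:key name="product_key" match="product" use="@sku"/>

<xsl:template match="/">
  <!-- skus and sku are illustrative names, not part of the recipe -->
  <skus>
    <!-- Keep a product only if it is the first node, in document order,
         that the key returns for its own sku -->
    <xsl:for-each select="//product[generate-id() =
                          generate-id(key('product_key', @sku)[1])]">
      <sku><xsl:value-of select="@sku"/></sku>
    </xsl:for-each>
  </skus>
</xsl:template>

</xsl:stylesheet>
```

Both forms select the same nodes. The generate-id() form compares node identities directly, while the count() form relies on a union collapsing duplicate nodes.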

Discussion

The solution presents a very application-specific example. This is unavoidable: presenting a generic reorganizing stylesheet is difficult, if not impossible, because these reorganizations depend on the structure of the particular document being transformed.

However, some common idioms are likely to appear in these sorts of reorganizations.

First, since you reorganize the document tree completely, it is unlikely that a solution will rely primarily on matching and applying templates. These sorts of stylesheets are much more likely to use an iterative style. In other words, the solutions will probably rely heavily on xsl:for-each.

Second, recipes in this class almost always initialize global variables that contain elements extracted from deep within the XML structure. In addition, you will probably need to determine a unique subset of these elements. See Recipe 4.3 for a complete discussion of the techniques available for constructing unique sets of elements.
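
If an XSLT 2.0 processor is available, the unique-subset step can be folded into the grouping itself with xsl:for-each-group. This is a swapped-in XSLT 2.0 technique, not the XSLT 1.0 approach the recipe uses, and the sketch below assumes the same input document and aims for the same output shape as the solution.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
  <salesByProduct>
    <!-- One iteration per distinct sku, in order of first appearance -->
    <xsl:for-each-group select="//product" group-by="@sku">
      <!-- Summing as xs:decimal rather than the default xs:double is an
           assumption made here to avoid exponent notation for large totals -->
      <product sku="{current-grouping-key()}"
               totalSales="{sum(current-group()/xs:decimal(@totalSales))}">
        <!-- current-group() holds every product element with this sku -->
        <xsl:for-each select="current-group()">
          <salesperson>
            <!-- Copy name and seniority from the owning salesperson -->
            <xsl:copy-of select="../@*"/>
            <xsl:attribute name="sold" select="@totalSales"/>
          </salesperson>
        </xsl:for-each>
      </product>
    </xsl:for-each-group>
  </salesByProduct>
</xsl:template>

</xsl:stylesheet>
```

Here no global variable or key is needed, because the grouping instruction exposes both the distinct value (current-grouping-key()) and its members (current-group()).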

Third, reorganization often involves reaggregating data by using sums, products, or other more complex aggregations. Chapter 2 and Chapter 14 discuss advanced techniques for computing these aggregations.
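
As a small illustration of reaggregation beyond a plain sum, the following XSLT 1.0 sketch reports, for each product, how many salespeople sold it and their average sale. It assumes the same input document; the names productStats, sellers, and averageSale are hypothetical and not part of the recipe.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:key name="product_key" match="product" use="@sku"/>

<xsl:template match="/">
  <!-- productStats, sellers, and averageSale are illustrative names -->
  <productStats>
    <!-- Muenchian test: visit only the first product of each sku group -->
    <xsl:for-each select="//product[count(.|key('product_key', @sku)[1]) = 1]">
      <!-- All product elements that share the current sku -->
      <xsl:variable name="group" select="key('product_key', @sku)"/>
      <product sku="{@sku}"
               sellers="{count($group)}"
               averageSale="{format-number(
                   sum($group/@totalSales) div count($group), '0.00')}"/>
    </xsl:for-each>
  </productStats>
</xsl:template>

</xsl:stylesheet>
```

Binding the group to a variable lets the sum and the count work on the same node set without calling key() twice.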
