A join is the process of considering all pairs of elements as being related (i.e., a Cartesian product) and keeping only those pairs that meet the join relationship (usually equality).
To demonstrate, I have adapted the supplier parts database found in Date's An Introduction to Database Systems (Addison-Wesley, 1986) to XML:
<database>
  <suppliers>
    <supplier id="S1" name="Smith" status="20" city="London"/>
    <supplier id="S2" name="Jones" status="10" city="Paris"/>
    <supplier id="S3" name="Blake" status="30" city="Paris"/>
    <supplier id="S4" name="Clark" status="20" city="London"/>
    <supplier id="S5" name="Adams" status="30" city="Athens"/>
  </suppliers>
  <parts>
    <part id="P1" name="Nut" color="Red" weight="12" city="London"/>
    <part id="P2" name="Bult" color="Green" weight="17" city="Paris"/>
    <part id="P3" name="Screw" color="Blue" weight="17" city="Rome"/>
    <part id="P4" name="Screw" color="Red" weight="14" city="London"/>
    <part id="P5" name="Cam" color="Blue" weight="12" city="Paris"/>
    <part id="P6" name="Cog" color="Red" weight="19" city="London"/>
  </parts>
  <inventory>
    <invrec sid="S1" pid="P1" qty="300"/>
    <invrec sid="S1" pid="P2" qty="200"/>
    <invrec sid="S1" pid="P3" qty="400"/>
    <invrec sid="S1" pid="P4" qty="200"/>
    <invrec sid="S1" pid="P5" qty="100"/>
    <invrec sid="S1" pid="P6" qty="100"/>
    <invrec sid="S2" pid="P1" qty="300"/>
    <invrec sid="S2" pid="P2" qty="400"/>
    <invrec sid="S3" pid="P2" qty="200"/>
    <invrec sid="S4" pid="P2" qty="200"/>
    <invrec sid="S4" pid="P4" qty="300"/>
    <invrec sid="S4" pid="P5" qty="400"/>
  </inventory>
</database>
The join to be performed will answer the question, “Which suppliers and parts are in the same city (co-located)?”
You can use two basic techniques to approach this problem in XSLT.
The first uses nested xsl:for-each loops:
<xsl:template match="/">
  <result>
    <xsl:for-each select="database/suppliers/*">
      <xsl:variable name="supplier" select="."/>
      <!-- In this select, current() is still the supplier of the outer loop -->
      <xsl:for-each select="/database/parts/*[@city=current()/@city]">
        <colocated>
          <xsl:copy-of select="$supplier"/>
          <xsl:copy-of select="."/>
        </colocated>
      </xsl:for-each>
    </xsl:for-each>
  </result>
</xsl:template>
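The predicate on the inner select performs the filtering half of the join definition in a single step. For comparison only (this is not one of the two techniques discussed here), the same equi-join can be sketched so that the Cartesian product and the filter appear as separate steps. The inner loop visits every part, and xsl:if keeps only the pairs whose cities are equal. For the sample data it should emit the same colocated elements as the nested-loop stylesheet above. The predicate form is simply more compact.

```xml
<!-- Sketch: the join definition spelled out literally -->
<xsl:template match="/">
  <result>
    <xsl:for-each select="database/suppliers/supplier">
      <xsl:variable name="supplier" select="."/>
      <!-- Cartesian product: pair this supplier with every part -->
      <xsl:for-each select="/database/parts/part">
        <!-- Join condition: keep the pair only when the cities are equal -->
        <xsl:if test="@city = $supplier/@city">
          <colocated>
            <xsl:copy-of select="$supplier"/>
            <xsl:copy-of select="."/>
          </colocated>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </result>
</xsl:template>
```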
The second approach uses xsl:apply-templates:
<xsl:template match="/">
  <result>
    <xsl:apply-templates select="database/suppliers/supplier"/>
  </result>
</xsl:template>
<xsl:template match="supplier">
  <xsl:apply-templates select="/database/parts/part[@city = current()/@city]">
    <xsl:with-param name="supplier" select="."/>
  </xsl:apply-templates>
</xsl:template>
<xsl:template match="part">
  <xsl:param name="supplier" select="/.."/>
  <colocated>
    <xsl:copy-of select="$supplier"/>
    <xsl:copy-of select="."/>
  </colocated>
</xsl:template>
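If one of the sets of elements to be joined has a large number of members, consider using xsl:key to improve performance. The key indexes the parts by city, so the processor can look up each supplier's matching parts instead of scanning every part: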
<xsl:key name="part-city" match="part" use="@city"/>
<xsl:template match="/">
  <result>
    <xsl:for-each select="database/suppliers/*">
      <xsl:variable name="supplier" select="."/>
      <!-- Fetch the parts indexed under this supplier's city -->
      <xsl:for-each select="key('part-city', $supplier/@city)">
        <colocated>
          <xsl:copy-of select="$supplier"/>
          <xsl:copy-of select="."/>
        </colocated>
      </xsl:for-each>
    </xsl:for-each>
  </result>
</xsl:template>
Each stylesheet produces the same result:
<result>
  <colocated>
    <supplier id="S1" name="Smith" status="20" city="London"/>
    <part id="P1" name="Nut" color="Red" weight="12" city="London"/>
  </colocated>
  <colocated>
    <supplier id="S1" name="Smith" status="20" city="London"/>
    <part id="P4" name="Screw" color="Red" weight="14" city="London"/>
  </colocated>
  <colocated>
    <supplier id="S1" name="Smith" status="20" city="London"/>
    <part id="P6" name="Cog" color="Red" weight="19" city="London"/>
  </colocated>
  <colocated>
    <supplier id="S2" name="Jones" status="10" city="Paris"/>
    <part id="P2" name="Bult" color="Green" weight="17" city="Paris"/>
  </colocated>
  <colocated>
    <supplier id="S2" name="Jones" status="10" city="Paris"/>
    <part id="P5" name="Cam" color="Blue" weight="12" city="Paris"/>
  </colocated>
  <colocated>
    <supplier id="S3" name="Blake" status="30" city="Paris"/>
    <part id="P2" name="Bult" color="Green" weight="17" city="Paris"/>
  </colocated>
  <colocated>
    <supplier id="S3" name="Blake" status="30" city="Paris"/>
    <part id="P5" name="Cam" color="Blue" weight="12" city="Paris"/>
  </colocated>
  <colocated>
    <supplier id="S4" name="Clark" status="20" city="London"/>
    <part id="P1" name="Nut" color="Red" weight="12" city="London"/>
  </colocated>
  <colocated>
    <supplier id="S4" name="Clark" status="20" city="London"/>
    <part id="P4" name="Screw" color="Red" weight="14" city="London"/>
  </colocated>
  <colocated>
    <supplier id="S4" name="Clark" status="20" city="London"/>
    <part id="P6" name="Cog" color="Red" weight="19" city="London"/>
  </colocated>
</result>
The join you just performed is called an equi-join because the elements are related by equality. More generally, joins can be formed using other relations. For example, consider the query "Select all combinations of supplier and part information for which the supplier city follows the part city in alphabetical order."
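It would be nice if you could simply write the following stylesheet, but XSLT 1.0 (through XPath 1.0) does not define relational operations on strings. The operators <, <=, >, and >= convert both operands to numbers, so every city name becomes NaN and every comparison is false: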
<xsl:template match="/">
  <result>
    <xsl:for-each select="database/suppliers/*">
      <xsl:variable name="supplier" select="."/>
      <!-- This does not work! -->
      <xsl:for-each select="/database/parts/*[current()/@city > @city]">
        <colocated>
          <xsl:copy-of select="$supplier"/>
          <xsl:copy-of select="."/>
        </colocated>
      </xsl:for-each>
    </xsl:for-each>
  </result>
</xsl:template>
Instead, you must use xsl:sort to build a table that maps each city name onto an integer reflecting its alphabetical position. Here you rely on Saxon's ability to treat variables containing result-tree fragments as node-sets when the stylesheet version is set to 1.1. Alternatively, you can use the node-set extension function of your particular XSLT 1.0 processor, or use an XSLT 2.0 processor:
<xsl:stylesheet version="1.1"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:variable name="unique-cities"
    select="//@city[not(. = ../preceding::*/@city)]"/>
  <xsl:variable name="city-ordering">
    <xsl:for-each select="$unique-cities">
      <xsl:sort select="."/>
      <city name="{.}" order="{position()}"/>
    </xsl:for-each>
  </xsl:variable>
  <xsl:template match="/">
    <result>
      <xsl:for-each select="database/suppliers/*">
        <xsl:variable name="s" select="."/>
        <xsl:for-each select="/database/parts/*">
          <xsl:variable name="p" select="."/>
          <xsl:if test="$city-ordering/*[@name = $s/@city]/@order > $city-ordering/*[@name = $p/@city]/@order">
            <supplier-city-follows-part-city>
              <xsl:copy-of select="$s"/>
              <xsl:copy-of select="$p"/>
            </supplier-city-follows-part-city>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
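For an XSLT 1.0 processor without Saxon's version 1.1 behavior, the same stylesheet can be sketched with exsl:node-set() from the EXSLT common module, which processors such as xsltproc (libxslt) and Xalan implement. Check your processor's documentation to confirm support. Only three things change: the declared version, the exsl namespace declaration, and an extra variable that converts the result-tree fragment into a node-set. The variable name city-ordering-rtf is illustrative.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="exsl">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- One attribute node per distinct city (first occurrence only) -->
  <xsl:variable name="unique-cities"
    select="//@city[not(. = ../preceding::*/@city)]"/>
  <!-- Result-tree fragment: one city element per name, numbered in sorted order -->
  <xsl:variable name="city-ordering-rtf">
    <xsl:for-each select="$unique-cities">
      <xsl:sort select="."/>
      <city name="{.}" order="{position()}"/>
    </xsl:for-each>
  </xsl:variable>
  <!-- XSLT 1.0 forbids location steps into a result-tree fragment;
       exsl:node-set() returns its root node so it can be queried -->
  <xsl:variable name="city-ordering" select="exsl:node-set($city-ordering-rtf)"/>
  <xsl:template match="/">
    <result>
      <xsl:for-each select="database/suppliers/*">
        <xsl:variable name="s" select="."/>
        <xsl:for-each select="/database/parts/*">
          <xsl:variable name="p" select="."/>
          <!-- Numeric comparison of the two cities' sort positions -->
          <xsl:if test="$city-ordering/*[@name = $s/@city]/@order > $city-ordering/*[@name = $p/@city]/@order">
            <supplier-city-follows-part-city>
              <xsl:copy-of select="$s"/>
              <xsl:copy-of select="$p"/>
            </supplier-city-follows-part-city>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

The exclude-result-prefixes attribute keeps the exsl namespace declaration off the literal result elements in the output.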
This query results in the following output:
<result>
  <supplier-city-follows-part-city>
    <supplier id="S2" name="Jones" status="10" city="Paris"/>
    <part id="P1" name="Nut" color="Red" weight="12" city="London"/>
  </supplier-city-follows-part-city>
  <supplier-city-follows-part-city>
    <supplier id="S2" name="Jones" status="10" city="Paris"/>
    <part id="P4" name="Screw" color="Red" weight="14" city="London"/>
  </supplier-city-follows-part-city>
  <supplier-city-follows-part-city>
    <supplier id="S2" name="Jones" status="10" city="Paris"/>
    <part id="P6" name="Cog" color="Red" weight="19" city="London"/>
  </supplier-city-follows-part-city>
  <supplier-city-follows-part-city>
    <supplier id="S3" name="Blake" status="30" city="Paris"/>
    <part id="P1" name="Nut" color="Red" weight="12" city="London"/>
  </supplier-city-follows-part-city>
  <supplier-city-follows-part-city>
    <supplier id="S3" name="Blake" status="30" city="Paris"/>
    <part id="P4" name="Screw" color="Red" weight="14" city="London"/>
  </supplier-city-follows-part-city>
  <supplier-city-follows-part-city>
    <supplier id="S3" name="Blake" status="30" city="Paris"/>
    <part id="P6" name="Cog" color="Red" weight="19" city="London"/>
  </supplier-city-follows-part-city>
</result>
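With an XSLT 2.0 processor, the ordering table becomes unnecessary because XPath 2.0 can compare strings directly. The following minimal sketch assumes an XSLT 2.0 processor such as Saxon 9 or later. It uses the built-in compare() function, which returns -1, 0, or 1 under the default collation, so a result of 1 means the supplier's city sorts after the part's city. The default collation is implementation-defined and is usually Unicode codepoint order. For this data, whose city names are capitalized ASCII words, the stylesheet should produce the same six pairs as the output above.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <result>
      <xsl:for-each select="database/suppliers/supplier">
        <xsl:variable name="s" select="."/>
        <!-- compare() atomizes both @city attributes to strings and
             returns 1 when the supplier's city sorts after the part's city -->
        <xsl:for-each select="/database/parts/part[compare($s/@city, @city) eq 1]">
          <supplier-city-follows-part-city>
            <xsl:copy-of select="$s"/>
            <xsl:copy-of select="."/>
          </supplier-city-follows-part-city>
        </xsl:for-each>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```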