Correct ordering

Deciding on the correct order is not as trivial as it may first appear. A number of factors, including language, capitalization and relative string length, all contribute to this decision.

Sorting order differs from one human language to another. The Roman alphabet used in English has an obvious order, with 'A' always appearing before 'B', and 'Y' always appearing before 'Z'. But it is far less obvious whether 'é' and other forms of the same letter are to be treated as the same character or as different characters, and, if different, which should appear first. The problem is even more pronounced for other alphabets, such as Greek and Cyrillic.

Fortunately, the Unicode initiative (to create standard ways to store text electronically for all the world's languages) covers sorting issues, and the XSLT standard recommends that processors follow its recommendations. For this to work, however, it is important that the language of the text be recognizable by the XSLT processor. The lang attribute is used for this purpose. The value 'en' represents the English language. The standard states that when this attribute is absent, a default value is derived from the system environment. In practice, this may mean a derivation from the 'xml:lang' attribute in the source document, but when this is also absent, English will typically be assumed.
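As a minimal sketch of how the language might be declared, the stylesheet below sorts a set of items as English text. It assumes a source document with a DIV element containing P elements; those element names are illustrative only.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rebuild the DIV with its P children in sorted order -->
  <xsl:template match="DIV">
    <DIV>
      <xsl:for-each select="P">
        <!-- lang names the language whose ordering rules should apply -->
        <xsl:sort select="." lang="en"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </DIV>
  </xsl:template>

</xsl:stylesheet>
```

Stating the language explicitly avoids relying on whatever default a given processor derives from its environment.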

The simplest sorting algorithm imaginable would sort using the ASCII (American Standard Code for Information Interchange) codes of the characters. This would certainly ensure that words beginning with 'A' (ASCII code 65) appear before words beginning with 'B' (ASCII code 66), and that terms that start with digits, such as '16-bit', appear first ('1' has an ASCII code of 49). However, such an algorithm would also place every word beginning with an upper-case letter before the first word beginning with a lower-case letter, putting 'Zebra' before 'abacus' ('Z' is ASCII code 90, and 'a' is 97), which is rarely desired. XSLT therefore requires a more sophisticated approach that ignores capitalization (though not totally, as the order in which 'a' and 'A' appear may still be important). By default, 'a' typically precedes 'A', although the XSLT standard leaves this default dependent on the language.
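Where the relative order of 'a' and 'A' matters, it can be stated rather than left to the default. The sketch below uses the case-order attribute of the sort element, which accepts 'upper-first' or 'lower-first'; the DIV and P element names are assumed for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="DIV">
    <DIV>
      <xsl:for-each select="P">
        <!-- lower-first asks for 'a' to be placed before 'A';
             upper-first would reverse the pair -->
        <xsl:sort select="." lang="en" case-order="lower-first"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </DIV>
  </xsl:template>

</xsl:stylesheet>
```

Case is only a tie-breaker here: 'Zebra' should still follow 'abacus', because the letters themselves are compared first.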

The length of the text string is also significant. A shorter string that has the same characters as the first part of a longer string appears first. For example, 'the' appears before 'theory'.

A space character is deemed to be more significant than any other symbol. This means that 'a z' appears before 'abc', but the shorter-string rule takes precedence and so 'ab' also appears before 'a z'.

This example reflects the various rules described above:

<DIV>
  <P>1</P>
  <P>5</P>
  <P>55a</P>
  <P>a</P>
  <P>ab</P>
  <P>Ab</P>
  <P>a z</P>
  <P>abc</P>
</DIV>

Finally, when two items have exactly the same content, they must be output in their original order. This may seem a trivial point to make; if the strings are identical, then the order in which they are output should be irrelevant. However, it is possible to sort an item using only part of its content as the sort key, and in this circumstance the ordering becomes significant. XSLT can deal with this problem by assigning a secondary sort order for all items that have identical primary sort values.
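A secondary sort order might be expressed by supplying more than one sort element, as in the sketch below. The people, person, surname and forename element names are hypothetical and stand in for whatever structure the source document uses.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- 'people', 'person', 'surname' and 'forename' are hypothetical names -->
  <xsl:template match="people">
    <people>
      <xsl:for-each select="person">
        <!-- Primary key: only part of each item's content -->
        <xsl:sort select="surname"/>
        <!-- Secondary key: consulted only when the surnames are equal -->
        <xsl:sort select="forename"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </people>
  </xsl:template>

</xsl:stylesheet>
```

Items that are still equal on every key are output in their original document order.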
