Pseudo HTML output

The HTML data format used to tag Web pages for presentation in browsers is, like XML, based on SGML. For this reason, there is enough similarity between HTML documents and well-formed XML documents that it is possible to output an XML file that will be accepted as an HTML document by existing Web browsers. The formats of start-tags, end-tags, attributes and entity references are essentially the same. For example, the HTML tag '<P>' (Paragraph) can also be considered a valid XML tag. The following example replaces a Note element with HTML P (Paragraph) and B (Bold) elements:

<xsl:template match="note">
  <P CLASS="notepara1"><B><xsl:apply-templates/></B></P>
</xsl:template>
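
To show where such a template sits, here is a minimal sketch of a complete stylesheet. The XSLT 1.0 namespace, the 'document' root element name and the sample input are assumptions made for illustration. The explicit xsl:output element is also an assumption: it asks for XML serialization, since an XSLT 1.0 processor may otherwise select its HTML output method when the result root is named HTML.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the "document" root element is hypothetical. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Keep XML serialization, as discussed in this section -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Wrap the whole result in HTML and BODY elements -->
  <xsl:template match="/document">
    <HTML>
      <BODY><xsl:apply-templates/></BODY>
    </HTML>
  </xsl:template>

  <!-- The template from the example above -->
  <xsl:template match="note">
    <P CLASS="notepara1"><B><xsl:apply-templates/></B></P>
  </xsl:template>

</xsl:stylesheet>
```

Given the hypothetical input '<document><note>Save your work first.</note></document>', the result should be along the lines of '<HTML><BODY><P CLASS="notepara1"><B>Save your work first.</B></P></BODY></HTML>', which is both well-formed XML and acceptable to a browser as HTML.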

Note, however, that none of the markup minimization techniques allowed in HTML documents are permitted in XML, though this is only a minor inconvenience. For example, the following template is not legal for two reasons: the Paragraph end-tag is missing, and the attribute value is not quoted. The stylesheet itself is therefore not a well-formed XML document:

<xsl:template match="Note">
  <P CLASS=notepara1><!-- <<< MISSING QUOTES --><B>
    <xsl:apply-templates/>
    </B><!-- MISSING END PARA -->
</xsl:template>
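
For comparison, a well-formed version of the same template, with the attribute value quoted and the Paragraph end-tag restored, might look like this:

```xml
<xsl:template match="Note">
  <!-- Attribute value quoted; every start-tag has a matching end-tag -->
  <P CLASS="notepara1"><B>
    <xsl:apply-templates/>
    </B></P>
</xsl:template>
```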

Much more seriously, HTML and XML differ in how empty elements are represented. The XML specification allows an empty element to be represented by a single tag ending with '/>'. HTML does not use this form, but does allow the end-tag to be omitted (and in some cases a browser may even insist on the end-tag being omitted). The next section discusses a clean way to generate true HTML, but that option may not be supported by the chosen XSLT processor, in which case the more primitive techniques described here must be used instead.
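
To make the discrepancy concrete, the following sketch converts a hypothetical image element to an HTML IMG element. The element name and its 'file' and 'caption' attributes are assumptions for illustration. The comments show the forms a processor writing XML may produce, and the form HTML expects:

```xml
<!-- Hypothetical source element: <image file="logo.gif" caption="Logo"/> -->
<xsl:template match="image">
  <!-- In the stylesheet the element must be well-formed XML, so it is
       written here as an empty-element tag. -->
  <IMG SRC="{@file}" ALT="{@caption}"/>
  <!-- When writing XML, a processor may serialize this as either
         <IMG SRC="logo.gif" ALT="Logo"/>
       or
         <IMG SRC="logo.gif" ALT="Logo"></IMG>
       whereas HTML expects the start-tag alone:
         <IMG SRC="logo.gif" ALT="Logo">  -->
</xsl:template>
```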

In HTML 4.0, the empty elements are:

  • AREA (a clickable area of an image)

  • BASE (a specified new base 'home' directory)

  • BASEFONT (a specified font to use as the default)

  • BR (break – start a new line)

  • COL (definitions of various attributes for one column of a table)

  • FRAME (specification for a single independent region of the browser window)

  • HR (horizontal rule – a line across the page)

  • IMG (image identifier)

  • INPUT (form input control)

  • ISINDEX (query allowed)

  • LINK (linked resource – including stylesheets)

  • META (meta data)

  • PARAM (applet parameter)

The most commonly needed of these are usually IMG and BR. Some current browsers object to the '/>' empty-element markup, even if a space is inserted before it and the XSL processor retains that space on output. Using end-tags is preferable, though some XSL processors automatically reduce elements with no content to the single-tag form. It may therefore be necessary to prevent this by enclosing a single space within a Text element:

<xsl:template match="newline">
  <BR><xsl:text> </xsl:text></BR>
</xsl:template>

Current browsers accept this approach, but some interpret the separate start-tag and end-tag as two line-break instructions instead of one.

Where possible, these complications should be avoided using the technique described below.
