Chapter 5. Rendering and presenting XML documents

Word Power User Task

  • Word Markup Language (WordML)

  • Mixing WordML with other vocabularies

  • Creating WordML with stylesheets

In the previous chapter we learned how to use Word to create and edit XML documents as unrendered abstractions. We also learned how to convert a rendered Word document to XML. We did these things using Word’s default schema-independent presentation of XML documents: pink icons that represent tags.

In this chapter, we do the opposite. We learn how to transform an abstract XML document into a WordML rendition, both manually and with XSLT stylesheets. Doing so allows us to use Word’s WYSIWYG interface to view and print the documents, and even to edit the abstract XML data.

These presentations can include any formatting available in Word, such as styles and page numbers. The transformations can also filter out unwanted information, or summarize or reorganize it.

Skills required

In addition to general Word end user skills, it is helpful to understand the basics of XSLT, which can be found in Chapter 18, “XSL Transformations (XSLT)”, on page 392.

Word Markup Language (WordML)

The Word Markup Language (WordML) is the native XML representation for Microsoft Word. It captures everything that might be known about a Word document. It covers not just the text of the document itself, but also all the formatting, all the styles associated with that document (whether they are used or not), and all of the various settings (such as page margins and tabs). Since it covers so many things, it is very verbose, and it is somewhat difficult to understand just by reading it.

Nevertheless, WordML has a significant benefit over the equivalent .doc binary format of Word documents: Any tool that can parse XML can make use of the Word document. This includes tools that transform, display, search, validate, store, index and query XML documents.

As Office 2003 increases in popularity, we expect third-party tools to be released that will use WordML to process Word documents in new ways and to generate Word documents from other data sources.

Caution

Because WordML is a native Word document representation, Word treats it quite differently from other uses of XML. To avoid the constant interjection of “except for WordML”, we normally do not include WordML when we discuss Word’s treatment of XML documents. If we do mean to include it, that will be clear from the context.

The WordML vocabulary

WordML is a large, complex vocabulary with over 400 different element types. Fortunately, in order to create, or even parse, WordML documents, you only need to be familiar with a small fraction of the vocabulary.[1] In fact, the first WordML document you write can be quite small and simple. It is shown in Example 5-1.

Example 5-1. Your first WordML document (minimal WordML.xml)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
 xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:r><w:t>hello, Word</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>
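Because a minimal WordML document is this small, it is also easy to generate from a program. The following is a minimal sketch, not part of the original example files, that builds the same structure with Python's standard xml.etree.ElementTree module. The function name minimal_wordml is our own invention; the namespace URI and the processing instruction are copied from Example 5-1.

```python
import xml.etree.ElementTree as ET

# WordML namespace, as declared in Example 5-1.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)


def minimal_wordml(text):
    """Return a minimal WordML document (hypothetical helper) with one paragraph."""
    root = ET.Element("{%s}wordDocument" % W)
    body = ET.SubElement(root, "{%s}body" % W)
    para = ET.SubElement(body, "{%s}p" % W)
    run = ET.SubElement(para, "{%s}r" % W)
    ET.SubElement(run, "{%s}t" % W).text = text
    # ET.tostring serializes only the element tree, so the XML declaration
    # and the mso-application processing instruction are added as text.
    prolog = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
              '<?mso-application progid="Word.Document"?>\n')
    return prolog + ET.tostring(root, encoding="unicode")


document = minimal_wordml("hello, Word")
print(document)
```

Writing the returned string to a file with the .xml extension, encoded as UTF-8, should give a file equivalent to Example 5-1 apart from whitespace.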

Saving a Word document as WordML

Recall Doug’s article for Worldwide Widget’s newsletter. It started life as an ordinary Word document. We repeat it here in Figure 5-1 for your convenience.

Figure 5-1. Doug’s article (article.doc)

The default format when you save a Word document is still the binary .doc format. However, if you choose to save a document as XML and Word cannot associate that document with a schema, it will be saved as WordML.[2]

Let’s save Doug’s article as WordML and see what we get. To do so:

  1. On the File menu, click Save As.

  2. Select XML Document (*.xml) from the Save as type list.

  3. Click Save.

We’ll look at the WordML markup itself, because the saved document, when rendered in Word, looks identical to Figure 5-1. Because the WordML document is extremely long, we will excerpt pieces as examples as we go along.

Structure of a WordML document

The basic structure of a WordML document is shown in Model 5-1.

Model 5-1. WordML document structure

Document (wordDocument)
  [0..1]Document Properties -- General (DocumentProperties)
  [0..1]Lists (lists)
  [0..1]Styles(styles)
  [0..1]Document Properties -- Word-specific (docPr)
  [1..1]Body (body)

The root of a WordML document is always a wordDocument element. The most commonly used children of a wordDocument element are:

  • an optional DocumentProperties element, which contains general information about the document such as the date it was created and last updated, the author name, and the revision number

  • an optional lists element, which contains information about the formatting of lists, such as the type of bullet or number, and the indentation used

  • an optional styles element, which contains the information about the styles used in the document, such as the font and size, language, and paragraph formatting

  • an optional docPr element, which contains Word-specific information on the settings for the document, such as margins and header and footer properties

  • a required body element that contains the bulk of the document

As you can see, most of these elements can be left out. If you omit an optional element, it defaults to the settings for new documents in Word.

In the beginning

Example 5-2 shows the very beginning of the WordML document.[3]

Example 5-2. Beginning of WordML document (article WordML.xml)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
 xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
 xmlns:v="urn:schemas-microsoft-com:vml"
 xmlns:w10="urn:schemas-microsoft-com:office:word"
 xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/core"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
 xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xml:space="preserve">
  <o:DocumentProperties>
    <o:Title>Heading 1</o:Title>
    <o:Author>Priscilla Walmsley</o:Author>
    <o:LastAuthor>Priscilla Walmsley</o:LastAuthor>
    <o:Revision>2</o:Revision>

line 1

The document starts out on line 1 with an XML declaration, which identifies the document as XML and indicates the encoding used in the document.

line 2

On line 2, a processing instruction appears which identifies the document as a Word document. The purpose of this processing instruction is to tell Windows to open this file in Word, rather than in Internet Explorer, which is often the application associated with the .xml extension.

line 3

The root element is w:wordDocument, whose start-tag has a number of namespace declarations.

line 4

The namespace of the WordML vocabulary is: http://schemas.microsoft.com/office/word/2003/wordml This namespace is commonly mapped to the w prefix, although there is no requirement that this prefix be used.

line 13

The first child of w:wordDocument is an o:DocumentProperties element that contains general information about the document. It is followed by a huge number of elements representing style information, which is not shown.
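Since the document properties are ordinary XML elements, any XML parser can read them. Here is a minimal sketch in Python, assuming a trimmed copy of Example 5-2 in which we have closed the elements that the excerpt leaves open. The function name document_properties is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map; both URIs are taken from Example 5-2.
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "o": "urn:schemas-microsoft-com:office:office",
}

# Trimmed sample: the closing tags and the empty body are our additions.
SAMPLE = """<w:wordDocument
 xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
 xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:DocumentProperties>
    <o:Title>Heading 1</o:Title>
    <o:Author>Priscilla Walmsley</o:Author>
    <o:LastAuthor>Priscilla Walmsley</o:LastAuthor>
    <o:Revision>2</o:Revision>
  </o:DocumentProperties>
  <w:body/>
</w:wordDocument>"""


def document_properties(root):
    """Return the children of o:DocumentProperties as a name-to-text dict."""
    props = root.find("o:DocumentProperties", NS)
    if props is None:
        return {}  # the element is optional
    # A qualified tag looks like "{namespace}Title"; keep the local name.
    return {child.tag.split("}")[1]: child.text for child in props}


properties = document_properties(ET.fromstring(SAMPLE))
print(properties)
```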

The body

The body of the WordML document, represented by the body element, contains all the text of the document. Its structure is shown in Model 5-2.

The body can contain sections that contain paragraphs, or it can contain paragraphs directly. Paragraphs, in turn, contain text runs, which contain text elements, which contain data characters. There is a separate text run for every data character string that has a distinct style or other properties. A paragraph can also contain images, hyperlinks and other components.

Model 5-2. WordML body structure

Body (body)
  [0..*]Section (sect)
         [0..*]Paragraph (p)
                 [0..*]Text Run (r)
                         [0..*]Text (t)
  [0..*]Paragraph (p)
      ...

Paragraphs and text

Each paragraph is represented by a p element. The paragraph has a style (and possibly other settings) associated with it in its properties child, pPr. If no style is associated with the paragraph, it defaults to “Normal” style.

A text run (r) can contain multiple text elements, as well as pictures, footnotes, fields and other Word objects. A text element (t), on the other hand, can only contain data characters, with no child elements. Every data character in the document text is contained directly in a t element.

An excerpt from the body of the WordML representation of Figure 5-1 is shown in Example 5-3. It contains two paragraphs (p elements). The first paragraph has a pPr child that identifies properties of the paragraph, namely that the style is “Heading2”. It then contains a text run (r element), which contains a single text element (t).

Example 5-3. WordML paragraphs (article WordML.xml)

<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading2"/>
  </w:pPr>
  <w:r><w:t>A great month!</w:t></w:r>
</w:p>
<w:p>
  <w:r><w:t>This month's figures are a </w:t></w:r>
  <w:r>
    <w:rPr>
      <w:i/>
    </w:rPr>
    <w:t>huge</w:t>
  </w:r>
  <w:r>
    <w:t> improvement over this month last year. We sold
1,342 widgets for a total revenue of $14,327.</w:t>
  </w:r>
</w:p>

The second paragraph contains three text runs (w:r elements). As the word “huge” is in italics, it must have its own text run with its own properties (the w:rPr element) that specify the italics (the w:i element).
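The paragraph, run and text structure is regular enough that a few lines of code can recover the plain text and the formatting. The sketch below is an illustration only: it parses a shortened copy of Example 5-3, wrapped in a w:body element so that it is well-formed, and the three helper functions are names we made up.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}
W_VAL = "{%s}val" % W  # qualified name of the w:val attribute

# Shortened copy of Example 5-3; the w:body wrapper is our addition.
SAMPLE = """<w:body xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:p>
  <w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
  <w:r><w:t>A great month!</w:t></w:r>
</w:p>
<w:p>
  <w:r><w:t>This month's figures are a </w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t>huge</w:t></w:r>
  <w:r><w:t> improvement over this month last year.</w:t></w:r>
</w:p>
</w:body>"""


def paragraph_style(p):
    """Return the paragraph's style name, or "Normal" if none is given."""
    style = p.find("w:pPr/w:pStyle", NS)
    return style.get(W_VAL) if style is not None else "Normal"


def paragraph_text(p):
    """Concatenate the text elements of all text runs in the paragraph."""
    return "".join(t.text or "" for t in p.findall("w:r/w:t", NS))


def italic_text(p):
    """Return the text of each run whose properties include w:i."""
    return [r.findtext("w:t", default="", namespaces=NS)
            for r in p.findall("w:r", NS)
            if r.find("w:rPr/w:i", NS) is not None]


paragraphs = ET.fromstring(SAMPLE).findall("w:p", NS)
for p in paragraphs:
    print(paragraph_style(p), "|", paragraph_text(p), "|", italic_text(p))
```

Note that this sketch looks only at runs that are direct children of the paragraph; text inside hyperlinks or other containers would need additional handling.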

Lists

Bulleted and numbered lists are common in Word documents. In WordML, list items are simply paragraphs that refer to a list ID in their properties. The list ID corresponds to a list defined in the lists section of the document.

For example, suppose Doug wanted to list the identifying elements of his article in a bulleted list, as shown in Figure 5-2. The corresponding WordML would look like Example 5-4.

Figure 5-2. List in Word

Example 5-4. WordML list

<w:p>
  <w:pPr>
    <w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="2"/></w:listPr>
  </w:pPr>
  <w:r><w:t>Title: Sales Update</w:t></w:r>
</w:p>
<w:p>
  <w:pPr>
    <w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="2"/></w:listPr>
  </w:pPr>
  <w:r><w:t>Author: Doug Jones</w:t></w:r>
</w:p>
<w:p>
  <w:pPr>
    <w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="2"/></w:listPr>
  </w:pPr>
  <w:r><w:t>Date: February 3, 2004</w:t></w:r>
</w:p>

Each paragraph properties (pPr) element contains a list properties (listPr) element which in turn has two children:

  • The ilvl element indicates the level of the item in the list, starting with zero. If a list contains items at different outline levels, this property indicates this.

  • The ilfo element associates the paragraph with a specific list. The number specified in its val attribute is an ID that corresponds to the ilfo attribute of a list element in the lists section.

The lists element of the same document appears in Example 5-5. Notice that it has two types of children. The listDef element defines various properties of the list, such as the style used and a unique identifier. The list element has only a unique identifier and the link to a listDef element through its ilst child. The many levels of definitions for lists are due to the complexity of starting and stopping the numbering for numbered lists.

Example 5-5. The WordML lists element

<w:lists>

  <w:listDef w:listDefId="0">
    <w:lsid w:val="1E525C74"/>
    <w:listStyleLink w:val="Style1bulletpw"/>
  </w:listDef>

  <w:list w:ilfo="2">
    <w:ilst w:val="0"/>
  </w:list>

</w:lists>
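The chain of references from a list item to its definition can be followed mechanically: the paragraph's ilfo value selects a list element, whose ilst child selects a listDef element. The following sketch illustrates that lookup in Python. The sample is a trimmed combination of the two list examples, the list ID of 2 is an illustrative value chosen so that the paragraph and the list element agree, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}


def attr(name):
    """Return the qualified name of a w:-prefixed attribute."""
    return "{%s}%s" % (W, name)


# Trimmed sample; the list ID (2) is an assumption made for illustration.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:lists>
    <w:listDef w:listDefId="0">
      <w:lsid w:val="1E525C74"/>
      <w:listStyleLink w:val="Style1bulletpw"/>
    </w:listDef>
    <w:list w:ilfo="2"><w:ilst w:val="0"/></w:list>
  </w:lists>
  <w:body>
    <w:p>
      <w:pPr>
        <w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="2"/></w:listPr>
      </w:pPr>
      <w:r><w:t>Title: Sales Update</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def list_definition(root, paragraph):
    """Follow paragraph -> list -> listDef and return the listDef element."""
    ilfo = paragraph.find("w:pPr/w:listPr/w:ilfo", NS)
    if ilfo is None:
        return None  # the paragraph is not a list item
    for lst in root.findall("w:lists/w:list", NS):
        if lst.get(attr("ilfo")) == ilfo.get(attr("val")):
            def_id = lst.find("w:ilst", NS).get(attr("val"))
            for list_def in root.findall("w:lists/w:listDef", NS):
                if list_def.get(attr("listDefId")) == def_id:
                    return list_def
    return None


root = ET.fromstring(SAMPLE)
item = root.find("w:body/w:p", NS)
definition = list_definition(root, item)
style_link = definition.find("w:listStyleLink", NS).get(attr("val"))
level = int(item.find("w:pPr/w:listPr/w:ilvl", NS).get(attr("val")))
print(style_link, level)
```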

Tables

The structure of WordML tables (Model 5-3) is very similar to XHTML tables, so if you are familiar with HTML you have a head start. A table element (tbl) can appear anywhere a paragraph can appear, namely as a child of body.

Model 5-3. WordML table structure

Table (tbl)
  [1..1]Table Properties (tblPr)
  [1..1]Table Grid (tblGrid)
         [1..*]Table Grid Column (gridCol)
  [0..*]Row (tr)
         [0..1]Row Properties (trPr)
         [1..*]Cell (tc)
                 [1..1]Cell Properties (tcPr)
                 [0..*]Tables (tbl)
                 [1..*]Paragraphs (p)

The table properties element (tblPr) is used to specify the properties of the table, such as the style used, the cell spacing, and the borders. The element is required, but none of its children (which set the individual properties) is required, so it is possible to have an empty tblPr element. All of the settings have defaults, which are used in case they are not specified.

The table grid element (tblGrid) is used to set the column widths. For each column in the table it contains a gridCol element with a w attribute that specifies the column width in twips (twentieths of a point). The tblGrid element and its gridCol children are required.
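Twips are easy to convert once you remember that there are 20 twips to a point and 72 points to an inch, which makes 1,440 twips to an inch. A small sketch of the arithmetic follows; the function names are our own, and the three widths are the ones used in Example 5-6.

```python
TWIPS_PER_POINT = 20   # a twip is one twentieth of a point
POINTS_PER_INCH = 72
TWIPS_PER_INCH = TWIPS_PER_POINT * POINTS_PER_INCH  # 1440


def twips_to_points(twips):
    return twips / TWIPS_PER_POINT


def twips_to_inches(twips):
    return twips / TWIPS_PER_INCH


def inches_to_twips(inches):
    """Round to a whole number of twips, as attribute values are integers here."""
    return round(inches * TWIPS_PER_INCH)


for width in (828, 1620, 1440):
    print(width, "twips =", twips_to_points(width), "pt =",
          twips_to_inches(width), "in")
```

So the second column in Example 5-6, at 1620 twips, is 81 points or 1.125 inches wide.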

Each row in the table is represented by a tr element. Each tr element has an optional properties child, trPr, and one or more cells, represented by tc elements. Each tc may itself have a properties child, tcPr, and must have one or more other tables (tbl) or paragraphs (p). The last child of the tc must always be a paragraph rather than another table.

Suppose that Doug wants to display sales data in a table. The table shown in Example 5-6 will look like Figure 5-3 when shown in Word.

Figure 5-3. Sales table displayed in Word

Example 5-6. WordML table

<w:tbl>
  <w:tblPr/>
  <w:tblGrid>
    <w:gridCol w:w="828"/>
    <w:gridCol w:w="1620"/>
    <w:gridCol w:w="1440"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:p>
        <w:pPr><w:pStyle w:val="Heading3"/></w:pPr>
        <w:r><w:t>Q</w:t></w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:p>
        <w:pPr><w:pStyle w:val="Heading3"/></w:pPr>
        <w:r><w:t>Revenue</w:t></w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:p>
        <w:pPr><w:pStyle w:val="Heading3"/></w:pPr>
        <w:r><w:t>Profit</w:t></w:r>
      </w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>1</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$14,332.35</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$2,115.12</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>2</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$13,224.22</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$1,655.51</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>3</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$14,778.26</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$2,243.98</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>4</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$17,455.15</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$2,988.22</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>

For more complex tables, you can use the many table formatting features of Word, such as vertical and horizontal merge, and borders and shading. You can even include tables within other tables, as we saw.
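Going in the other direction, a program that receives a WordML table can turn it back into rows of data. The sketch below is only an illustration: it reads a shortened copy of Example 5-6 (header row plus the first data row), and the functions column_widths and table_rows are hypothetical names.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Shortened copy of Example 5-6.
SAMPLE = """<w:tbl xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:tblGrid>
    <w:gridCol w:w="828"/>
    <w:gridCol w:w="1620"/>
    <w:gridCol w:w="1440"/>
  </w:tblGrid>
  <w:tr>
    <w:tc><w:p><w:r><w:t>Q</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>Revenue</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>Profit</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>1</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$14,332.35</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>$2,115.12</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>"""


def column_widths(tbl):
    """Return the column widths, in twips, from the table grid."""
    return [int(col.get("{%s}w" % W))
            for col in tbl.findall("w:tblGrid/w:gridCol", NS)]


def table_rows(tbl):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    for tr in tbl.findall("w:tr", NS):
        # iter() collects every w:t below the cell, so the text of any
        # nested table ends up in the enclosing cell's string as well.
        rows.append(["".join(t.text or "" for t in tc.iter("{%s}t" % W))
                     for tc in tr.findall("w:tc", NS)])
    return rows


table = ET.fromstring(SAMPLE)
widths = column_widths(table)
rows = table_rows(table)
print(widths)
print(rows)
```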

Tip

When designing a complex table, the best approach is to create an example of the table in Word and save it as WordML. This will give you a model to work from, and will save you the effort of learning every single relevant WordML element.

Images

An image embedded in a Word document is represented in WordML by a pict element. Each pict element contains a Vector Markup Language (VML) description of the shape, location and size of the image, and the image data itself in base64Binary datatype format.

Tip

As with other Word components, the best way to include an image in a generated WordML document is to create a Word document that contains the image in the desired location and size, and save it as WordML. You can then copy the pict element from the saved WordML document and place it in your XSLT stylesheet.

Hyperlinks

A hyperlink is represented in WordML by an hlink element. Example 5-7 shows a paragraph that has an embedded hyperlink.

Example 5-7. Hyperlink in WordML

<w:p>
  <w:r>
    <w:t>More information on the new marketing
         plan can be found at </w:t>
  </w:r>
  <w:hlink w:dest="http://www.xmlinoffice.com/mkplan">
    <w:r>
      <w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>
      <w:t>http://www.xmlinoffice.com/mkplan</w:t>
    </w:r>
  </w:hlink>
  <w:r>
    <w:t>. </w:t>
  </w:r>
</w:p>

The hlink element is contained directly within the p element, rather than within a text run. In fact, it contains its own text run for the hyperlink text that appears when the document is presented, as in Figure 5-4. The dest attribute of the hlink element specifies the linked URL.
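Because hlink is a child of the paragraph and wraps its own text run, collecting the links in a document is straightforward. The following is a minimal sketch in Python, using a condensed copy of Example 5-7; the function name hyperlinks is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Condensed copy of Example 5-7.
SAMPLE = """<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:r><w:t>More information can be found at </w:t></w:r>
  <w:hlink w:dest="http://www.xmlinoffice.com/mkplan">
    <w:r>
      <w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>
      <w:t>http://www.xmlinoffice.com/mkplan</w:t>
    </w:r>
  </w:hlink>
  <w:r><w:t>. </w:t></w:r>
</w:p>"""


def hyperlinks(paragraph):
    """Return (destination, displayed text) pairs for each hlink child."""
    links = []
    for hlink in paragraph.findall("w:hlink", NS):
        text = "".join(t.text or "" for t in hlink.findall("w:r/w:t", NS))
        links.append((hlink.get("{%s}dest" % W), text))
    return links


links = hyperlinks(ET.fromstring(SAMPLE))
print(links)
```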

Figure 5-4. Hyperlink displayed in Word

Using Word styles

There are four kinds of style in Word:

  • A character style applies to a data character string within a paragraph.

  • A paragraph style applies to an entire paragraph.

  • A table style has special settings relating to tables, such as background color and justification.

  • A list style has special settings related to lists, such as the bullet or numbering used.

There are quite a few different properties of a style, ranging from character properties, such as font and size, to paragraph properties, such as indentation and tab settings. Any style setting that can be specified in Word can also be expressed in WordML.

A style example

The styles element that appears before the body contains all the information about the styles used in the document. Each style element has a unique name that is specified in its styleId attribute. The text in the body of the document then refers to these styles by name.

In Example 5-3, the first paragraph refers to the style whose name is “Heading2”. The style element for Heading2 is shown in Example 5-8.

Example 5-8. WordML style (article WordML.xml)

<w:style w:type="paragraph" w:styleId="Heading2">
  <w:name w:val="heading 2"/>
  <w:basedOn w:val="Normal"/>
  <w:next w:val="Normal"/>
  <w:rsid w:val="CF4316"/>
  <w:pPr>
    <w:pStyle w:val="Heading2"/>
    <w:spacing w:before="240" w:after="60"/>
  </w:pPr>
  <w:rPr>
    <w:rFonts w:ascii="Arial" w:h-ansi="Arial" w:cs="Arial"/>
    <w:b/>
    <w:b-cs/>
    <w:kern w:val="48"/>
    <w:sz w:val="48"/>
    <w:sz-cs w:val="48"/>
  </w:rPr>
</w:style>
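To see how a body paragraph's style name leads to concrete formatting, the sketch below looks up a style by its styleId and reports a few of its settings. It is an illustration under stated assumptions: the sample is a trimmed copy of Example 5-8 wrapped in a w:styles element, the function describe_style is a hypothetical name, and we assume that the sz value is expressed in half-points, so that 48 means 24 points.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}


def attr(name):
    """Return the qualified name of a w:-prefixed attribute."""
    return "{%s}%s" % (W, name)


# Trimmed copy of Example 5-8; the w:styles wrapper is our addition.
SAMPLE = """<w:styles xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:style w:type="paragraph" w:styleId="Heading2">
    <w:name w:val="heading 2"/>
    <w:basedOn w:val="Normal"/>
    <w:rPr>
      <w:rFonts w:ascii="Arial" w:h-ansi="Arial" w:cs="Arial"/>
      <w:b/>
      <w:sz w:val="48"/>
    </w:rPr>
  </w:style>
</w:styles>"""


def describe_style(styles, style_id):
    """Summarize one style; assumes the elements shown in the sample exist."""
    for style in styles.findall("w:style", NS):
        if style.get(attr("styleId")) == style_id:
            rpr = style.find("w:rPr", NS)
            return {
                "name": style.find("w:name", NS).get(attr("val")),
                "based_on": style.find("w:basedOn", NS).get(attr("val")),
                "font": rpr.find("w:rFonts", NS).get(attr("ascii")),
                "bold": rpr.find("w:b", NS) is not None,
                # Assumption: w:sz is measured in half-points.
                "size_pt": int(rpr.find("w:sz", NS).get(attr("val"))) / 2,
            }
    return None  # no style with that ID


summary = describe_style(ET.fromstring(SAMPLE), "Heading2")
print(summary)
```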

Generating WordML style definitions

Fortunately, there is no need to learn all the WordML elements for the style settings you need. Attempting to construct WordML style definitions by hand would be a tedious, trial-and-error process. Because Word already provides a user-friendly front-end for defining styles, you should use Word itself to create a document that has all the styles you want to use.

You can save that document as WordML using the procedure described in 5.1.2, “Saving a Word document as WordML”, on page 89. The result is a WordML document that contains all the styles you need. You can then copy the styles section of that document (and the lists section if needed).

This is a good approach not just for paragraph styles, but also for character styles. For example, if you wish to italicize a word in the middle of a sentence, you could do this using the i property for the text run, as shown in Example 5-3. However, it is sometimes difficult to remember the names of all the different properties that can be applied to text.

Using Word, you can create a character style for italics named, for example, “emphasis”. Any text that should be italicized because it should be emphasized can then refer to that style, rather than using the i property. In effect you are using the principles of generalized markup for style names, just as you do for XML element-type names.

As with XML, this approach to style definitions has the added benefit of making it easy to apply a change to all text of that type. For example, if you use italics for both emphasized words and citations, you can create two styles: “emphasis” and “citation”. If you later decide to put citations in a different font, you can simply change the “citation” style rather than having to change the font of some but not all of the italicized text.

Mixing WordML with other vocabularies

As we saw in 4.6, “Saving a document”, on page 79, WordML can be interspersed with other vocabularies. When a Word document associated with a schema is saved as XML, by default the saved file contains both WordML elements and elements of the associated schema.

For example, saving an article document as XML results in a document that contains elements from the article schema interspersed with WordML elements, as shown in Example 5-9.

Example 5-9. article/WordML mixture (article data and WordML.xml)

<ns0:section>
  <ns0:header>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Heading2"/>
      </w:pPr>
      <w:r>
        <w:t>A great month!</w:t>
      </w:r>
    </w:p>
  </ns0:header>
  <ns0:para>
    <w:p>
      <w:r>
        <w:t>This month's figures are a </w:t>
      </w:r>
      <ns0:em>
        <w:r>
          <w:rPr>
            <w:i/>
          </w:rPr>
          <w:t>huge</w:t>
        </w:r>
      </ns0:em>
      <w:r>
        <w:t> improvement over this month last year. We sold 1,342
widgets for a total revenue of $14,327.</w:t>
      </w:r>
    </w:p>
  </ns0:para>
</ns0:section>

Namespaces are used to distinguish between the two vocabularies. The WordML elements use the w prefix, as in w:p and w:t. The article vocabulary uses the ns0 prefix, as in ns0:section and ns0:header.

Because of the hierarchical structure of XML, an element from the article schema must always contain one or more entire WordML paragraphs, or be contained in a WordML paragraph. It is not possible for it to span part of one WordML paragraph plus part of the next paragraph.

In addition, each element from the article schema must contain its own text run. It cannot be included as a child of a text run element (r), nor as a child of a text element (t). For example, the following text run is illegal:

<w:r><w:t><ns0:title>Sales Update</ns0:title></w:t></w:r>

Instead, the ns0:title element should be moved out to contain the w:r element, as in:

<ns0:title><w:r><w:t>Sales Update</w:t></w:r></ns0:title>
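If you generate mixed documents yourself, it can be worth checking this nesting rule before handing the result to Word. The following sketch shows one way to do so in Python. The function name misplaced_elements is hypothetical, and we assume that the ns0 prefix is bound to the article namespace http://xmlinoffice.com/article.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
RUN_OR_TEXT = {"{%s}r" % W, "{%s}t" % W}

# Namespace declarations shared by both samples; the ns0 binding is assumed.
DECLS = ('xmlns:w="%s" xmlns:ns0="http://xmlinoffice.com/article"' % W)

ILLEGAL = ("<w:r %s><w:t><ns0:title>Sales Update</ns0:title></w:t></w:r>"
           % DECLS)
LEGAL = ("<ns0:title %s><w:r><w:t>Sales Update</w:t></w:r></ns0:title>"
         % DECLS)


def misplaced_elements(elem, inside_run=False):
    """List tags of non-WordML elements nested inside a w:r or w:t element."""
    inside_run = inside_run or elem.tag in RUN_OR_TEXT
    found = []
    for child in elem:
        if inside_run and not child.tag.startswith("{%s}" % W):
            found.append(child.tag)
        found.extend(misplaced_elements(child, inside_run))
    return found


illegal_result = misplaced_elements(ET.fromstring(ILLEGAL))
legal_result = misplaced_elements(ET.fromstring(LEGAL))
print(illegal_result)
print(legal_result)
```

The first call reports the misplaced title element; the second returns an empty list.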

Combining WordML with other vocabularies allows all of the Word formatting and other information to be retained, so that the document can be reopened in Word and the styles and settings will be intact. This is useful if you or other users will continue to edit the document in Word.

However, a mixed document is not valid according to the article schema. If you need an article document to pass to an application that is expecting it, just save the document as XML with the Save data only box checked.
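To make the relationship between the mixed form and the data-only form concrete, here is a rough sketch that strips the WordML markup from a mixed fragment, keeping only the article elements and the character data found in t elements. It illustrates the idea only and makes no claim about how Word itself implements Save data only. The function names are hypothetical, and the ns0 namespace URI is assumed to be http://xmlinoffice.com/article.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
W_PREFIX = "{%s}" % W
T = W_PREFIX + "t"

# Shortened fragment of Example 5-9; the ns0 binding is an assumption.
SAMPLE = """<ns0:para xmlns:ns0="http://xmlinoffice.com/article"
 xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:p>
    <w:r><w:t>This month's figures are a </w:t></w:r>
    <ns0:em><w:r><w:rPr><w:i/></w:rPr><w:t>huge</w:t></w:r></ns0:em>
    <w:r><w:t> improvement.</w:t></w:r>
  </w:p>
</ns0:para>"""


def _append_text(target, text):
    """Append text after the last child of target, or to target itself."""
    if not text:
        return
    if len(target):
        target[-1].tail = (target[-1].tail or "") + text
    else:
        target.text = (target.text or "") + text


def _append_content(target, source):
    for child in source:
        if child.tag == T:
            _append_text(target, child.text)   # keep the character data
        elif child.tag.startswith(W_PREFIX):
            _append_content(target, child)     # unwrap other WordML elements
        else:
            target.append(data_only(child))    # keep non-WordML elements


def data_only(elem):
    """Return a copy of a non-WordML element with WordML markup removed."""
    copy = ET.Element(elem.tag, elem.attrib)
    _append_content(copy, elem)
    return copy


result = data_only(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

Whitespace between elements is deliberately ignored, since only the content of t elements counts as document text.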

Creating WordML with stylesheets

WordML is very rarely created by hand, because it is much easier for a person to format a document with Word than to compose the equivalent WordML representation. But that approach is most useful for one-off tasks.

If you want to create Word renditions of multiple XML documents of the same document type, it is more efficient to generate WordML using XSLT. This technique also allows data from other sources to be incorporated into the Word documents. Moreover, multiple views of the same data (e.g. for different classes of user) can be created using several different transformations.

This section provides an overview of the creation and use of stylesheets for Microsoft Word. For more information on XSLT stylesheets, please see Chapter 18, “XSL Transformations (XSLT)”, on page 392.

Creating an XSLT stylesheet

Each desired rendition is expressed as an XSLT stylesheet that is associated with a particular schema in the Word Schema Library. An XSLT stylesheet contains XSLT instruction elements (usually prefixed with xsl:) that select elements and attributes from the source document. The instructions are interleaved with elements from other namespaces that are to appear in the result document.

For use with Word, a stylesheet must transform documents from your source vocabulary (e.g. article) to WordML. As we have seen, this can be a challenging task, since WordML is quite complex and not entirely intuitive. But fortunately, as we have also seen, you can copy most of the complex parts from an existing WordML document.

If you have (or create) a Word document that has the settings you want to use – for example the styles and page margins – you can simply save that document as WordML. You can then paste the beginning of the document, which contains all the settings, into your stylesheet.

Stylesheet structure

Worldwide Widget maintains an archive of its newsletter articles in XML so they can easily be reused. Authors frequently access the archive to rework old articles for new issues of the newsletter. The company has implemented a stylesheet that will transform an XML article (the source document) into a WordML/article combination (the result document).

Example 5-10 shows the general structure of the stylesheet (named article_view.xsl) that accomplishes this.

Example 5-10. Article stylesheet general structure (article_view.xsl)

<xsl:stylesheet version="1.0" xml:space="preserve"
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:art="http://xmlinoffice.com/article"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/">
   <w:wordDocument xml:space="preserve">
      <w:lists>  <!--taken from Word--> </w:lists>
      <w:styles> <!--taken from Word--> </w:styles>
      <w:docPr>  <!--taken from Word--> </w:docPr>
      <w:body>
        <xsl:apply-templates select="/art:article"/>
      </w:body>
   </w:wordDocument>
   </xsl:template>

<!--rest of template rules here-->

</xsl:stylesheet>

Rather than start from scratch with the styles, we took the lists and styles elements from a Word document that had styles defined to our liking, as was recommended in 5.1.6.2, “Generating WordML style definitions”, on page 101.[4]

The body element contains the xsl:apply-templates instruction to apply the correct template rule to the source elements whose element-type name is article. (Only one such element is allowed by the article schema.)[5]

Template rules

The article_view.xsl stylesheet has a template rule for every article schema element type. The template rule that matches the root element, article, is shown in Example 5-11. It inserts WordML paragraphs (p elements) for the title, author and date. It also applies other template rules that transform the article element’s children.

Example 5-11. Template rule for article schema element type (article_view.xsl)

<xsl:template match="art:article">
 <xsl:copy>
  <w:p>
   <w:pPr>
    <w:pStyle w:val="Heading1"/>
   </w:pPr>
   <xsl:apply-templates select="art:title"/>
  </w:p>
  <w:p>
   <xsl:apply-templates select="art:author"/>
  </w:p>
  <w:p>
   <w:pPr>
    <w:pBdr>
     <w:bottom w:val="single" w:sz="6" w:space="22" w:color="auto"/>
    </w:pBdr>
   </w:pPr>
   <xsl:apply-templates select="art:date"/>
  </w:p>
  <xsl:apply-templates select="art:body"/>
 </xsl:copy>
</xsl:template>

The xsl:copy instruction copies a source element so that it also appears in the resulting WordML; it does not copy child elements or attributes. If you want a pure WordML document with no elements from the article vocabulary, you can leave out the xsl:copy instructions in the templates.

The template rule shown in Example 5-12 is used to transform both author and date elements, since they are processed similarly. Their contents are simply included unchanged in a text element contained within a text run.

Example 5-12. Template rule for author and date element types (article_view.xsl)

<xsl:template match="art:author|art:date">
  <xsl:copy>
    <w:r>
      <w:rPr>
        <w:b/>
      </w:rPr>
      <w:t><xsl:value-of select="."/></w:t>
    </w:r>
  </xsl:copy>
</xsl:template>

A third template rule, for the section element type, is shown in Example 5-13. It includes a paragraph (p) for the header, then uses an XSLT for-each element to loop through the child paragraphs and process them individually.

Example 5-13. Template rule for section element type (article_view.xsl)

<xsl:template match="art:section">
  <xsl:copy>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Heading2"/>
      </w:pPr>
      <xsl:apply-templates select="art:header"/>
    </w:p>
    <xsl:for-each select="art:para">
      <xsl:apply-templates select="."/>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>

The result of applying the stylesheet is a WordML/article combination that can be displayed in Word with or without the article tag icons, as shown in Figure 5-5.

Figure 5-5. Result of applying stylesheet to an article
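For readers who find it easier to follow a mapping in procedural code, the sketch below mirrors the template rules in Python rather than XSLT. It is a swapped-in illustration, not the article_view.xsl stylesheet: it produces pure WordML (as if the xsl:copy instructions were left out), omits the border on the date paragraph, and guesses at the handling of title, header, para and em, whose template rules are not shown in this chapter. All function names and the sample article are our own.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
ART = "http://xmlinoffice.com/article"
ET.register_namespace("w", W)

# Hypothetical article instance, modeled on the elements named in the text.
SAMPLE = """<article xmlns="http://xmlinoffice.com/article">
  <title>Sales Update</title>
  <author>Doug Jones</author>
  <date>February 3, 2004</date>
  <body>
    <section>
      <header>A great month!</header>
      <para>This month's figures are a <em>huge</em> improvement.</para>
    </section>
  </body>
</article>"""


def w(name):
    return "{%s}%s" % (W, name)


def art(name):
    return "{%s}%s" % (ART, name)


def add_paragraph(body, style=None):
    """Append a w:p to body, with a w:pStyle if a style name is given."""
    p = ET.SubElement(body, w("p"))
    if style:
        ppr = ET.SubElement(p, w("pPr"))
        ET.SubElement(ppr, w("pStyle"), {w("val"): style})
    return p


def add_run(p, text, prop=None):
    """Append a text run; prop names an empty property such as "b" or "i"."""
    if not text:
        return
    r = ET.SubElement(p, w("r"))
    if prop:
        ET.SubElement(ET.SubElement(r, w("rPr")), w(prop))
    ET.SubElement(r, w("t")).text = text


def article_to_wordml(article):
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    add_run(add_paragraph(body, "Heading1"), article.findtext(art("title")))
    for name in ("author", "date"):          # bold, as in Example 5-12
        add_run(add_paragraph(body), article.findtext(art(name)), "b")
    for section in article.iter(art("section")):
        add_run(add_paragraph(body, "Heading2"),
                section.findtext(art("header")))
        for para in section.findall(art("para")):
            p = add_paragraph(body)
            add_run(p, para.text)
            for child in para:               # assumed: em becomes italics
                add_run(p, child.text, "i" if child.tag == art("em") else None)
                add_run(p, child.tail)
    return root


result = article_to_wordml(ET.fromstring(SAMPLE))
texts = ["".join(t.text for t in p.iter(w("t")))
         for p in result.iter(w("p"))]
print(texts)
```

In a real solution the lists, styles and docPr elements copied from Word would also be added to the result, as Example 5-10 does.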

Using stylesheets

This section explains how to use our newly-created stylesheet (article_view.xsl) with Word.

Associating stylesheets with schemas

Stylesheets are associated with schemas using the Schema Library, where they are known as solutions. Multiple stylesheets can be associated with the same schema. First, let’s associate the article_view.xsl stylesheet with the article schema. To do this:

  1. On the Tools menu, click Templates and Add-Ins.

  2. Click the XML Schema tab.

  3. Click Schema Library.

  4. Select the article schema.

  5. Click Add Solution.

  6. Browse to the location of the article_view.xsl document and select it.

  7. This will bring up the Solution Settings dialog shown in Figure 5-6.

    Figure 5-6. The Solution Settings dialog

  8. The default type is XSL Transformation, which is what we want in this case.

  9. In the Alias box, enter a nickname for the solution, such as “Article View”, and click OK.

  10. The solution (stylesheet) now appears in the bottom half of the Schema Library dialog, as shown in Figure 5-7.

    Figure 5-7. Solution listed in the Schema Library dialog

It is possible in the Schema Library dialog to specify a particular solution as the default. To do this, select a solution from the Default solution list. The default stylesheet is applied whenever a document associated with that schema is opened.

Opening a document with a stylesheet

There are three ways to choose the stylesheet that is applied to a document: by default, while opening the document, and after opening the document but before editing it.

Default stylesheet

Opening a document whose schema is associated with a default stylesheet will result in that stylesheet being applied automatically. For example, now, when we open article.xml, the article_view.xsl stylesheet is automatically applied, as shown in Figure 5-8.

Figure 5-8. Article with article_view stylesheet applied

If you previously had the Show XML tags in document box checked, you will have to uncheck it in order to see the document rendered according to the stylesheet. You can do this on the XML Structure task pane.

Choose while opening

If there is no default stylesheet for a schema, or if you wish to use a different stylesheet, you can choose one while opening a document. To do this:

  1. On the File menu, click Open.

  2. Select the file you wish to open.

  3. Click the down arrow on the right side of the Open button, and select Open with Transform from the list.

  4. Select the file name of the stylesheet you wish to use. The document will open using the specified transformation.

Choose before editing

The XML Document task pane lists all the associated stylesheets, as well as the generic Data only rendition that is used as a default when no other stylesheet is available. You can check out the different renditions of the document simply by selecting them from this list.

You can also click Browse to look for an additional XSLT stylesheet to apply.

You can choose between stylesheets only until you have begun editing the document. Once you change the document in any way, the XML Document task pane closes and you lose the ability to change to a different rendition of the document.

Tip

Styles are not necessarily applied to newly added elements. For example, if you insert a new section in your article, its header element will not automatically be given the Heading2 style like the original section headers. However, if you save the document using Save data only, then reopen it, the stylesheet will format it appropriately.

Saving a document using a transformation

You can apply a stylesheet when you are saving a document. This may seem confusing, since styles are generally associated with the way a document is formatted and presented rather than the way it is stored.

However, XSLT stylesheets are not just for formatting; you can also use them to transform XML documents from one vocabulary to another. This feature is useful if, for example, you want to allow the user to work with a small, simple vocabulary, but the documents need to be transformed into a more complex vocabulary to be used by another process.

To specify a stylesheet when saving a document:

  1. On the File menu, click Save As.

  2. Select XML Document (*.xml) from the Save as type list.

  3. Check the Apply transform box.

  4. Click the Transform... button to select a stylesheet.

  5. Select the stylesheet you wish to apply and click Open.

  6. Click Save.



[1] A reference guide that covers the entire WordML vocabulary is included with the Microsoft Word XML Content Development Kit that can be downloaded from the MSDN library at: http://msdn.microsoft.com

[2] We saw in 4.6, “Saving a document”, on page 79 how to save a document that is associated with a schema, using that schema alone. Later we will see how to save it using a combination of its own schema and WordML.

[3] Some whitespace was added to all examples to make them more readable.

[4] The comment <!--taken from Word--> appears instead of the actual definitions to reduce the size of the example; for a full listing, see article_view.xsl.

[5] The xsl:template instruction element is actually a template rule; its content is the template. The xsl:apply-templates instruction actually applies template rules, which include match patterns as well as templates.
