Well-Formed XML Documents

Let’s define a term that you will often encounter when working with XML: well-formedness. A well-formed XML document follows the syntax rules defined by the World Wide Web Consortium (W3C) in the XML 1.0 specification. Among other things, well-formedness means the following:

  • An XML document must contain at least one element, the root element.

  • There can be only one root element. All other elements are nested inside the root element, either directly or indirectly.

The XML document in Listing A.1 is well-formed, but the one in Listing A.2 is not: the end tag of the name element appears after the start tag of the description element, so the two elements overlap instead of nesting properly.

Listing A.2. An XML document that is not well-formed
<product>
    <name>
        ChocChic
    <description>
        chocolate with mint 100g
    </name>
    </description>
</product>
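A quick way to see these rules in action is to hand a document to an XML parser, which rejects anything that is not well-formed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name check_well_formed, the REPAIRED document, and the TWO_ROOTS document are illustrative assumptions, not part of the listings.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Listing A.2: the name and description elements overlap.
LISTING_A2 = """<product>
    <name>
        ChocChic
    <description>
        chocolate with mint 100g
    </name>
    </description>
</product>"""

# Hypothetical repair: each element is closed before the next one opens.
REPAIRED = """<product>
    <name>ChocChic</name>
    <description>chocolate with mint 100g</description>
</product>"""

# Hypothetical document that breaks the single-root rule.
TWO_ROOTS = "<product/><product/>"

for label, text in [("Listing A.2", LISTING_A2),
                    ("repaired", REPAIRED),
                    ("two roots", TWO_ROOTS),
                    ("empty", "")]:
    ok, message = check_well_formed(text)
    print(label, "->", "well-formed" if ok else "not well-formed", message or "")
```

Only the repaired document should be reported as well-formed. The exact error messages come from the underlying parser and may differ between versions.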

What does it take to write a well-formed XML document? The short answer is that the document must meet all the well-formedness constraints specified in the W3C’s XML 1.0 recommendation. Structurally, an XML document has three parts: the prolog, an element part, and a miscellaneous part.

The Prolog

The prolog starts an XML document. It can contain an XML declaration, a miscellaneous part, and a DTD. (I’ll explain DTDs later.) All the parts of the prolog are optional, so an XML document can still be well-formed even if the prolog is empty. However, a document with an empty prolog is not valid, because validity requires a DTD to check the document against.

The XML declaration part of the prolog contains the version information, an optional encoding declaration, and an optional standalone document declaration. These prolog examples contain only the XML declaration part:

<?xml version="1.0"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" standalone="yes"?>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

For a document that follows the XML 1.0 specification, the version number is 1.0. The encoding declaration names the character encoding of the document. If it is omitted, a parser assumes UTF-8 (or UTF-16 when the document starts with a byte order mark). A value of “yes” for the standalone declaration means the document does not depend on external markup declarations, such as an external DTD; “no” means that it may.
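To inspect what a parser actually sees in the XML declaration, you can register a declaration handler. The sketch below uses Python's standard xml.parsers.expat module. The function name read_declaration and the sample documents with the root element a are assumptions made for illustration.

```python
import xml.parsers.expat


def read_declaration(xml_bytes):
    """Return the XML declaration fields, or {} if there is no declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no), or -1 (omitted).
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return found


# Hypothetical one-element documents that differ only in their prolog.
print(read_declaration(b'<?xml version="1.0"?><a/>'))
print(read_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>'))
print(read_declaration(b"<a/>"))
```

Fields that the declaration omits are reported as None, and a document with no declaration at all yields an empty dictionary while still being well-formed.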

The Element Part

Right after the prolog comes the element part. An XML element begins with a start tag and ends with an end tag. A start tag begins with < and ends with >. An end tag starts with </ and ends with >. Here is an XML element whose tag name is productId:

<productId>1</productId>

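A tag name starts with a letter, an underscore, or a colon. The remaining characters can be letters, digits, underscores, hyphens, periods, and colons. A tag name cannot contain white space. In practice, avoid colons in your own names, because the colon is reserved for separating a namespace prefix from the rest of the name.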
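The naming rule can be expressed as a regular expression. The following Python sketch is a simplification that accepts only ASCII letters and digits, whereas the XML specification also allows many non-ASCII characters. The names NAME_PATTERN and is_valid_tag_name are assumptions for illustration.

```python
import re

# First character: letter, underscore, or colon.
# Remaining characters: letters, digits, underscores, periods, colons, hyphens.
# ASCII-only simplification of the rule described above.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_valid_tag_name(name):
    """Return True if name satisfies the simplified tag-name rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["productId", "in_stock", "_private", "1product", "in stock", "-x"]:
    print(candidate, is_valid_tag_name(candidate))
```

The first three candidates pass. The last three fail because they start with a digit, contain white space, or start with a hyphen.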

Empty elements are also possible. For example:

<bodyNumber></bodyNumber>

Empty elements can be written using only one tag:

<bodyNumber/>
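To a parser, the two forms are equivalent. This short Python sketch parses both with the standard xml.etree.ElementTree module and confirms that neither has text or children. The helper is_empty is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """Return True if the element has no text and no child elements."""
    return element.text is None and len(element) == 0


long_form = ET.fromstring("<bodyNumber></bodyNumber>")
short_form = ET.fromstring("<bodyNumber/>")

print(long_form.tag, is_empty(long_form))
print(short_form.tag, is_empty(short_form))
```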

An element can have attributes, which are name-value pairs containing additional data for the element. You separate the name and the value in an attribute with the equals sign. Attribute names follow the same rules as tag names. You must enclose attribute values in quotation marks, either double quotes or single quotes. Double quotes are more common, but you can use single quotes if the value itself contains double quotes. If an attribute value contains both single quotes and double quotes, you can encode them: &apos; represents a single quote and &quot; represents a double quote.

Note

The < and & characters are special and must be encoded wherever they appear as data. The > character is normally encoded as well. You can use &amp; for the ampersand (&), &lt; for <, and &gt; for >.
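When you generate XML from a program, it is usually safer to let a library do the encoding. The sketch below uses escape and quoteattr from Python's standard xml.sax.saxutils module and then parses the result to confirm that the original values survive. The attribute name label and the sample values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_name = "Tom & Jerry <3"
raw_label = '6" bar, chef\'s choice'  # contains both kinds of quotes

# escape() encodes &, <, and > for use in element content.
encoded_name = escape(raw_name)

# quoteattr() encodes the value and adds the surrounding quotation marks.
encoded_label = quoteattr(raw_label)

document = f"<product label={encoded_label}><name>{encoded_name}</name></product>"
print(document)

# Parsing decodes the entities again, so the original values come back.
root = ET.fromstring(document)
print(root.get("label"))
print(root.findtext("name"))
```

Because quoteattr chooses the quotation marks itself, the template must not add its own quotes around the attribute value.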


For example, the following product element has an attribute called in_stock. The attribute has the value of 6.

<product in_stock="6">
    <name>ChicChoc</name>
</product>

It is often unclear whether data related to an element should be written as an attribute or as a child element. For example, you can rewrite the preceding product element as follows:

<product>
    <in_stock>6</in_stock>
    <name>ChicChoc</name>
</product>
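The choice affects how a program reads the value. As a rough illustration, this Python sketch reads in_stock from both versions of the product element using the standard xml.etree.ElementTree module. The variable names are assumptions.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<product in_stock="6"><name>ChicChoc</name></product>')
as_child = ET.fromstring(
    "<product><in_stock>6</in_stock><name>ChicChoc</name></product>")

# An attribute is looked up by name on the element itself.
stock_from_attribute = int(as_attribute.get("in_stock"))

# A child element is located first, and then its text is read.
stock_from_child = int(as_child.findtext("in_stock"))

print(stock_from_attribute, stock_from_child)
```

Either way the value arrives as text, so converting it to a number is up to the program.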

Whether to use an attribute or a child element is entirely up to you. However, a common rule of thumb is to give an element no more than 10 attributes.

The Miscellaneous Part

This part can contain comments or processing instructions. An XML comment starts with <!-- and ends with -->.
