5.6. XML Schema: An Alternative to DTDs

Some people have complained that DTDs use an old and inflexible syntax and aren't expressive enough for some needs. Others find it strange that documents follow one syntax and DTDs another. The content models and attribute list declarations are difficult to read and understand, and it's frustrating that patterns for data in elements and attributes can't be specified.

For these reasons, there are a number of proposed alternatives to the venerable DTD. XML Schema (sometimes referred to as XSchema) is one example, which we introduce here. Though it is still just a candidate recommendation of the XML Schema Working Group at the W3C, the essentials represented here shouldn't change much. Unlike DTD syntax, XML Schema syntax is well-formed XML, making it possible to use your favorite XML tools to edit it. It also provides much more control over datatypes and patterns, making it a more attractive language for enforcing strict data entry requirements.

Consider this example, a census form. The census-taker, going door to door, enters information in a little electronic tablet. A schema helps keep her data organized by enforcing datatypes, in case she writes something in the wrong field. Here's how an instance of the document type might look in XML.

<census date="1999-04-29">
  <censustaker>738</censustaker>
  <address>
    <number>510</number>
    <street>Yellowbrick Road</street>
    <city>Munchkinville</city>
    <province>Negbo</province>
  </address>
  <occupants>
    <occupant status="adult">
      <firstname>Floyd</firstname>
      <surname>Fleegle</surname>
      <age>61</age>
    </occupant>
    <occupant>
      <firstname>Phylis</firstname>
      <surname>Fleegle</surname>
      <age>52</age>
    </occupant>
    <occupant>
      <firstname>Filbert</firstname>
      <surname>Fleegle</surname>
      <age>22</age>
    </occupant>
  </occupants>
</census>

Now, here's how we code the schema:

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

  <xsd:annotation>
    <xsd:documentation>
      Census form for the Republic of Oz
      Department of Paperwork, Emerald City
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="census" type="CensusType"/>

  <xsd:complexType name="CensusType">
    <xsd:element name="censustaker" type="xsd:decimal" minOccurs="0"/>
    <xsd:element name="address" type="Address"/>
    <xsd:element name="occupants" type="Occupants"/>
    <xsd:attribute name="date" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="Address">
    <xsd:element name="number" type="xsd:decimal"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city"   type="xsd:string"/>
    <xsd:element name="province"  type="xsd:string"/>
    <xsd:attribute name="postalcode" type="PCode"/>
  </xsd:complexType>

  <xsd:simpleType name="PCode" base="xsd:string">
    <xsd:pattern value="[A-Z]-\d{3}"/>
  </xsd:simpleType>

  <xsd:complexType name="Occupants">
    <xsd:element name="occupant" minOccurs="1" maxOccurs="50">
     <xsd:complexType>
      <xsd:element name="firstname" type="xsd:string"/>
      <xsd:element name="surname" type="xsd:string"/>
      <xsd:element name="age">
       <xsd:simpleType base="xsd:positive-integer">
        <xsd:maxExclusive value="200"/>
       </xsd:simpleType>
      </xsd:element>
     </xsd:complexType>
    </xsd:element>
   </xsd:complexType>

</xsd:schema>

The first line identifies this document as a schema and associates it with the XML Schema namespace. For convenience, we'll drop the namespace prefix xsd: for the remainder of the discussion. The next structure, <annotation>, is a place to document the schema's purpose and other details.

5.6.1. Declaring Elements and Attributes

Next in our example is the first element type declaration. The attribute name assigns a generic identifier, while the attribute type sets the type of the element. There are two element types: simple and complex. A simple element declaration is one that has no attributes or elements for content. Since this particular element is the root element, it must be the other type, complex. In this case, the complex type is actually given a name, CensusType, that we use later to describe it. Though names aren't required, it's a good idea to use them for your own sanity.

5.6.1.1. Complex and simple element types

In the next piece of the schema, CensusType is defined as a <complexType> element. It contains three more element declarations and an attribute declaration. This not only declares three elements, <censustaker>, <address>, and <occupants>, but also establishes the content model for CensusType. So a <census> element must contain these elements in that order, except that <censustaker> may be left out because its declaration sets minOccurs to 0, and it may optionally have an attribute date. This is quite different from the DTD style, where the content model consists of a string inside the element declaration, and attributes are declared separately in an attribute list declaration.
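
For comparison, here is a rough sketch of how the same content model might look DTD-style, written as an internal subset on a small census document. It is an illustration only, not part of the census schema: declaring date as CDATA with #IMPLIED is an assumption, and the child elements are reduced to plain text to keep the sketch short.

```xml
<?xml version="1.0"?>
<!DOCTYPE census [
  <!-- The content model is a single string inside the element declaration.
       The ? makes censustaker optional, like minOccurs="0" in the schema. -->
  <!ELEMENT census (censustaker?, address, occupants)>
  <!-- Attributes are declared separately, in an attribute list declaration.
       CDATA is the closest a DTD can get: it cannot require a real date. -->
  <!ATTLIST census date CDATA #IMPLIED>
  <!-- Placeholder declarations (assumed) so the document is valid as written. -->
  <!ELEMENT censustaker (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT occupants (#PCDATA)>
]>
<census date="1999-04-29">
  <censustaker>738</censustaker>
  <address>510 Yellowbrick Road</address>
  <occupants>Fleegle household</occupants>
</census>
```

In the schema version, the element order, the optional <censustaker>, and the date attribute all live together inside one <complexType>, and date can be held to an actual date type.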

5.6.1.2. Content model restrictions

If a sequence of single elements doesn't provide enough information, XML Schema provides other options. The attributes minOccurs and maxOccurs set the number of times something can appear. minOccurs="0" overrides the default value of 1 and makes the element optional. maxOccurs="*" removes any maximum number, so the element can appear any number of times.
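
A minimal sketch of these occurrence settings, written in the same draft-era syntax as the census schema. The Household type and its child elements are hypothetical names invented for this illustration. Later versions of the specification spell the no-maximum value as unbounded rather than *.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <!-- Hypothetical type, for illustration only. -->
  <xsd:complexType name="Household">
    <!-- No occurrence attributes: minOccurs defaults to 1, so the element is required. -->
    <xsd:element name="surname" type="xsd:string"/>
    <!-- minOccurs="0" makes the element optional. -->
    <xsd:element name="note" type="xsd:string" minOccurs="0"/>
    <!-- At least one and at most fifty. -->
    <xsd:element name="resident" type="xsd:string" minOccurs="1" maxOccurs="50"/>
    <!-- Optional and repeatable with no upper limit (draft syntax). -->
    <xsd:element name="pet" type="xsd:string" minOccurs="0" maxOccurs="*"/>
  </xsd:complexType>
</xsd:schema>
```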

5.6.2. Datatypes

Most element and attribute declarations carry a type attribute, as we saw in the first element declaration; the rest, such as <age>, define their type inline. Some types are predefined by XML Schema, such as string and decimal. A string type is ordinary character data like the CDATA type in DTD parlance. The decimal type is a number. Later, we declare an element <age> as type positive-integer and restrict it to values less than 200.

5.6.2.1. Predefined datatypes

Hey, we couldn't do this in DTDs! There is no way in a DTD to restrict character data to a range or pattern, while XML Schema offers quite a few ways. The following list shows some additional predefined types:


byte, float, long

Numerical formats. A byte is any signed 8-bit number and a long is any signed 64-bit number. A float is a floating-point number, for example 5.032E-6. A float can also take special values that represent abstract concepts rather than numbers: INF (infinity), -INF (negative infinity), and NaN (not a number, a category defined by IEEE for floating-point operations).


time, date, timeinstant, timeduration

Patterns for marking time, date, and duration.


boolean

A value of true or false. The numeric equivalents, 1 and 0, are also acceptable.


binary

A pattern for binary numbers, for example 00101110.


language

A language code such as en-US.


uri-reference

The pattern for any URI, such as http://www.donut.org/cruller.xml#ingredients.


ID, IDREF, IDREFS, NMTOKEN, NMTOKENS

Attribute types that function just like their counterparts in DTDs.

There are many more types, which makes XML Schema very exciting for certain documents, especially those dealing with specific kinds of data applications such as databases and order entry forms. Instead of writing a program to check the datatypes ourselves, we let the validating XML parser perform that job for us.
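
To make a few of these types concrete, here is a small sketch of declarations that use them, with a sample value for each shown in a comment. The Sample type and the element names are made up for this illustration, and the type spellings follow the draft names listed above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <!-- Hypothetical type, for illustration only. -->
  <xsd:complexType name="Sample">
    <!-- e.g. <verified>true</verified> -->
    <xsd:element name="verified" type="xsd:boolean"/>
    <!-- e.g. <lang>en-US</lang> -->
    <xsd:element name="lang" type="xsd:language"/>
    <!-- e.g. <source>http://www.donut.org/cruller.xml#ingredients</source> -->
    <xsd:element name="source" type="xsd:uri-reference"/>
    <!-- e.g. <reading>5.032E-6</reading> -->
    <xsd:element name="reading" type="xsd:float"/>
  </xsd:complexType>
</xsd:schema>
```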

5.6.2.2. Facets

Facets are properties used to specify a datatype, setting limits and boundaries on data values. For example, the <age> element whose datatype is positive-integer was given an upper bound of 200 with the max-exclusive facet (written maxExclusive in the schema), so its value must be less than 200. There are 13 other facets in XML Schema, including precision, scale, encoding, pattern, enumeration, and max-length.

5.6.2.3. Patterns

The Address complex type declaration introduces another kind of restriction. It has an attribute postalcode with a type of PCode, which is defined using a <pattern> declaration. If you can't find the pattern you want among the predefined types, you can create your own with this element. We defined PCode with the pattern string [A-Z]-\d{3}, which reads "Any uppercase letter followed by a dash and three digits."
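
A few hypothetical postalcode values show what a pattern of one uppercase letter, a dash, and three digits accepts and rejects. The codes and the <samples> wrapper are invented for this sketch, and the children of <address> are omitted because only the attribute matters here.

```xml
<!-- Illustration only: the pattern must match the whole attribute value. -->
<samples>
  <address postalcode="K-904"/>   <!-- accepted -->
  <address postalcode="Z-001"/>   <!-- accepted -->
  <address postalcode="k-904"/>   <!-- rejected: lowercase letter is outside [A-Z] -->
  <address postalcode="K904"/>    <!-- rejected: the dash is missing -->
  <address postalcode="K-9045"/>  <!-- rejected: four digits where exactly three are required -->
</samples>
```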

5.6.3. Advanced Capabilities

We won't go into more detail about writing schemas, because the standard is still not finished. However, some expected capabilities can be mentioned. First, there's much more that can be done with types. Elements as well as attributes can have enumerated values. Declarations can be grouped to inherit the same properties and to provide more complex content modeling, and they can also inherit the properties of other declarations in an object-oriented way.
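
As one hedged illustration of enumerated values, the status attribute that appears on <occupant> in the census instance could be limited to a fixed set of strings. This sketch assumes the enumeration facet is written the same way as the <pattern> facet in PCode. The type names Status and Occupant and the values other than adult are invented here, and the syntax may differ in the finished standard.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <!-- Hypothetical type: a string limited to three allowed values. -->
  <xsd:simpleType name="Status" base="xsd:string">
    <xsd:enumeration value="adult"/>
    <xsd:enumeration value="minor"/>
    <xsd:enumeration value="guest"/>
  </xsd:simpleType>

  <!-- The named type is then used like any other in an attribute declaration. -->
  <xsd:complexType name="Occupant">
    <xsd:attribute name="status" type="Status"/>
  </xsd:complexType>
</xsd:schema>
```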

XML Schema provides an interesting alternative to DTDs that allows document architects to design fields in much finer detail. But it isn't a replacement for DTDs at all. Why should there be only one way to describe structure in a document? DTDs still have their strengths: compact size, familiar syntax, simplicity. Together, the two provide alternate methods to achieve similar goals, and you can expect to see even more proposals added to the mix soon.
