W3C XML Schema Basics

W3C XML Schema is a schema-definition language expressed in XML syntax. To avoid ambiguity with other schema languages for XML, it is often referred to as XSD Schema, because an earlier version called it the XML Schema Definition Language. More recently, the abbreviation WXS has also come into use for the W3C XML Schema language.

Note

Other schema languages are also expressed in XML syntax, such as RELAX NG (a combination of TREX and RELAX) and XDR (XML-Data Reduced, from Microsoft).



In this section, you are introduced to some of the reasons why W3C XML Schema was developed as an alternative schema mechanism to the Document Type Definition (DTD).

Note

The W3C XML Schema specification is lengthy and very complex. This chapter can give you only an indication of some straightforward W3C XML Schema structures.



Limitations of DTDs

DTDs were inherited by XML from the Standard Generalized Markup Language (SGML). SGML was (and is) commonly used for document-centric data storage such as very large documents, including technical manuals. A DTD that describes most data as #PCDATA is adequate for many document-centric purposes because one piece of text is pretty much like another—simply a sequence of characters.

However, when XML is used to store data that might otherwise be held in a relational or other type of database-management system, you will likely want to say more about the type of data that an element can contain.

A piece of data conforms to a type if the characters it contains express a defined idea. For example, you might have a date type as the allowed content of an element. If the element contained the characters 2002-12-25, which follow the internationally recognized ISO 8601 date format, you could interpret them as a date. If the element contained the characters $100.50, you would conclude that its content didn’t conform to a date type.
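As a minimal sketch of how a schema can state such a type, the declaration below uses the built-in xsd:date datatype. The element name orderDate is hypothetical, and the xsd prefix is assumed to be bound to the W3C XML Schema namespace. Note that xsd:date expects dates written in the form YYYY-MM-DD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element whose content must be a valid date -->
  <xsd:element name="orderDate" type="xsd:date" />

</xsd:schema>
```

Against this declaration, <orderDate>2002-12-25</orderDate> would be accepted, whereas a validating processor would report <orderDate>$100.50</orderDate> as an error.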



In a DTD, when mixed content is allowed, very few constraints can be imposed on the allowed content. For example, with mixed content in a DTD, it isn’t possible to impose a defined order on the child elements. W3C XML Schema provides greater control in this situation.
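The following sketch suggests how ordered mixed content might be expressed in W3C XML Schema. The element names letter, name, and amount are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- mixed="true" permits character data between the child elements -->
  <xsd:element name="letter">
    <xsd:complexType mixed="true">
      <!-- xsd:sequence still fixes the order: name first, then amount -->
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string" />
        <xsd:element name="amount" type="xsd:decimal" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

With this schema, <letter>Dear <name>Ms. Smith</name>, you owe <amount>100.50</amount> in total.</letter> would be valid, but the same content with amount placed before name would not.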

W3C XML Schema also gives greater control over how many occurrences of an element are allowed. For example, it allows you to define that an element occurs at least twice and at most five times:

<xsd:element name="someName" minOccurs="2" maxOccurs="5" /> 

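A DTD has no direct equivalent. Its occurrence indicators are limited to ? (zero or one), * (zero or more), and + (one or more), so a range such as two to five could be expressed only by repeating the element in the content model.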
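In a complete schema, an occurrence constraint like the one shown above has to sit inside a content model, because minOccurs and maxOccurs are not permitted on top-level element declarations. The sketch below shows one possible context; the parent element name someParent and the xsd:string type are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical parent element that holds the repeated children -->
  <xsd:element name="someParent">
    <xsd:complexType>
      <xsd:sequence>
        <!-- someName must appear at least twice and at most five times -->
        <xsd:element name="someName" type="xsd:string"
                     minOccurs="2" maxOccurs="5" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

If either attribute is omitted, it defaults to 1, so an element declared without them must occur exactly once.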

W3C XML Schema also provides far more datatypes for element and attribute content than a DTD does. It has many built-in datatypes and also allows you to create your own, for example, by restricting allowed content to enumerated values or to values that match a regular expression.
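Both kinds of restriction can be sketched as simple type definitions. The type names sizeType and partNumberType, the element names, and the particular values and pattern are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Restriction by enumeration: only the listed values are allowed -->
  <xsd:simpleType name="sizeType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="small" />
      <xsd:enumeration value="medium" />
      <xsd:enumeration value="large" />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Restriction by regular expression:
       three uppercase letters, a hyphen, then four digits -->
  <xsd:simpleType name="partNumberType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{3}-[0-9]{4}" />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Elements that use the two developer-defined types -->
  <xsd:element name="size" type="sizeType" />
  <xsd:element name="partNumber" type="partNumberType" />

</xsd:schema>
```

Here <size>medium</size> and <partNumber>ABC-1234</partNumber> would be valid, while <size>huge</size> and <partNumber>abc-12</partNumber> would not. A pattern must match the whole value, so no explicit anchors are needed.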

W3C XML Schema Jargon

Let’s look briefly at some terminology. A W3C XML Schema document defines the allowed content for a class of XML documents. A single document of that class is called an instance document.

Elements and attributes are said to be declared in a W3C XML Schema document. The content of an element has a type, which is either a simple type or a complex type; the value of an attribute always has a simple type. Types can be built-in (that is, they are defined in the W3C XML Schema specification itself) or can be defined by a schema developer. In short, elements and attributes have declarations, whereas simple types and complex types have definitions.
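To make the terminology concrete, the sketch below labels each construct with the term that applies to it. All names (book, bookType, title, isbn, isbnType) are hypothetical, and the 13-character length is only an illustrative constraint.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Element declaration that refers to a developer-defined complex type -->
  <xsd:element name="book" type="bookType" />

  <!-- Complex type definition: can carry child elements and attributes -->
  <xsd:complexType name="bookType">
    <xsd:sequence>
      <!-- Element declaration that uses a built-in simple type -->
      <xsd:element name="title" type="xsd:string" />
    </xsd:sequence>
    <!-- Attribute declaration that uses a developer-defined simple type -->
    <xsd:attribute name="isbn" type="isbnType" />
  </xsd:complexType>

  <!-- Simple type definition: restricts the built-in xsd:string type -->
  <xsd:simpleType name="isbnType">
    <xsd:restriction base="xsd:string">
      <xsd:length value="13" />
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

An instance document of the class this schema describes could be as short as <book isbn="9780000000000"><title>Some Title</title></book>.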

Note

Typically, all but very simple schemas are created semi-automatically by programs such as XML Spy. The examples in this chapter are intended to show you basic structures within a W3C XML Schema document.


