It is fairly easy to create a schema once you know the syntax. It is harder to design one well. This chapter focuses on a strategy for designing schemas that are accurate, durable, and easy to implement. Carefully planning your schema design strategy is especially important when creating a complex set of schemas, or a standard schema that is designed to be used and extended by others.
Schemas are a fundamental part of many XML-based applications, whether XML is being used in temporary messages for information sharing or as an enduring representation of content (e.g., in publishing). Enterprise architects, DBAs, and software developers often devote a lot of time to data design: They create enterprise data models, data dictionaries with strict naming, and documentation standards, and carefully design and optimize relational databases. Unfortunately, software designers and implementers often do not pay as much attention to good design when it comes to XML messages.
There are several reasons for this. Some people feel that with transitory XML messages, it is not important how they are structured. Some decide that it is easier to use whatever schema is generated for them by a toolkit. Others decide to use an industry-standard XML vocabulary, but fail to figure out how their data really fits into that standard, or to come up with a strategy for customizing it for their needs.
As with any data design, there are many ways to organize XML messages. For example, decisions must be made about how many levels of elements to include, whether the elements should represent generic or more specific concepts, how to represent relationships, and how far to break down data into separate elements. In addition, there are multiple ways to express the same XML structure in XML Schema. Decisions must be made about whether to use global versus local declarations, whether to use named versus anonymous types, whether to achieve reuse through type extension or through named model groups, and how schemas should be broken down into separate schema documents.
The choices you make when designing a schema can have a significant impact on the ease of implementation, ease of maintenance, and even the ongoing relevance of the system itself. Failure to take into account design goals such as reuse, graceful versioning, flexibility, and tool support can have serious financial impacts on software development projects.
When designing a schema, it is first important to understand what it will be used for. Schemas actually play several roles.
• Validation. Validation is the purpose that is most often associated with schemas. Given an XML document, you can use a schema to automatically determine whether that document is valid or not. Are all of the required elements there, in the right order? Do they contain valid values according to their data types? Schema validation does a good job of checking the basic structure and content of elements.
• A service contract. A schema serves as part of the understanding between two parties. The document provider and the document consumer can both use the schema as a machine-enforceable set of rules describing an interface between two systems or services.
• Documentation. Schemas are used to document the XML structure for the developers and end users that will be implementing or using it. Narrative human-readable annotations can be added to schema components to further document them. Although schemas themselves are not particularly human-readable, they can be viewed by less technical users in a graphical XML editor tool. In addition, there are a number of tools that will generate HTML documentation from schemas, making them more easily understood.
• Providing type information. Schemas contain information about the data types that can affect how the information is processed. For example, if the schema tells an XSLT 2.0 stylesheet that a value is an integer, it will know to sort it and compare to other values as an integer instead of a string.
• Assisted editing. For documents that will be hand-modified by human users, a schema can be used by XML editing software to provide context-sensitive validation, help, and content completion.
• Code generation. Schemas are also commonly used, particularly in web services and other structured data interfaces, to generate classes and interfaces that read and write the XML message payloads. When a schema is designed first, classes can be generated automatically from the schema definitions, ensuring that they match. Other software artifacts can also be generated from schemas, for example, data entry forms.
• Debugging. Schemas can assist in the debugging and testing processes for applications that will process the XML. For example, importing a schema into a XSLT 2.0 stylesheet or an XQuery query can help identify invalid paths and type errors in the code that, otherwise, may not have been found during testing.
As you can see, schemas are an important part of an XML implementation, and can be involved at both design time and run time. Although it is certainly possible to use XML without schemas, valuable functionality would be lost. You would be forced to use a nonstandard method to validate your messages, document your system interfaces, and generate code for web services. You also would not be able to take advantage of the many schema-based tools that implement this functionality at low cost.
The various roles of schemas should be taken into account when designing them. For example, use of obscure schema features can make code generation difficult, and not adequately documenting schemas can impact the usefulness of generated documentation.
Designing schemas well is a matter of paying attention to certain important design considerations: flexibility and extensibility, reusability, clarity and simplicity, support for versioning, interoperability, and tool support. This section takes a closer look at each of these design goals.
Schema design often requires a balancing act between flexibility, on the one hand, versus rigidity on the other. For example, suppose I am selling digital cameras that have a variety of features, such as resolution, battery type, and screen size. Each camera model has a different set of features, and the types of features change over time as new technology is developed. When designing a message that incorporates these camera descriptions, I want enough flexibility to handle variations in feature types, without having to redesign my message every time a new feature comes along. On the other hand, I want to be able to accurately and precisely specify these features.
To allow for total flexibility in the camera features, I could declare a features
element whose type contains an element wildcard, which means that any well-formed XML is allowed. This would have the advantage of being extremely versatile and adaptable to change. The disadvantage is that the message structure is very poorly defined. A developer trying to write an application to process the message would have no idea what features to expect and what format they might have.
On the other hand, I can declare highly constrained elements for each feature, with no opportunity for variation. This has the benefit of making the features well defined, easy to validate, and much more predictable. Validation is more effective because certain features can be required and their values can be constrained by specific data types. However, the schema is brittle because it must be changed every time a new feature is introduced. When the schema changes, the applications that process the documents must also often change.
The ideal design is usually somewhere in the middle. A balanced approach in the case of the camera features might be to create a repeating feature
element that contains the name of the feature as an attribute and the value of the feature as its content. This eliminates the brittleness while still providing a predictable structure for implementers.
Reuse is an important goal in the design of any software. Schemas that reuse XML components across multiple kinds of documents are easier for developers and users to learn, are more consistent, and save development and maintenance time that could be spent writing redundant software components.
Using XML Schema, reuse can be achieved in a number of ways.
• Reusing types. It is highly desirable to reuse complex and simple types in multiple element and attribute declarations. For example, you can define a complex type named AddressType
that represents a mailing address, and then use it for both BillingAddress
and ShippingAddress
elements. Only named, global types can be reused, so types in XML Schema should generally be named.
• Type inheritance. In XML Schema, complex types can be specialized from other types using type extensions. For example, I can create a more generic type ProductType
and derive types named CameraType
and LensType
from it. This is a form of reuse because CameraType
and LensType
inherit a shared set of properties from ProductType
.
• Named model groups and attribute groups. Through the use of named model groups, it is possible to define reusable pieces of content models. This is a useful alternative to type inheritance for types that are semantically different but just happen to share some properties with other types.
• Reusing schema documents. Entire schema documents can be reused by taking advantage of the include and import mechanisms of XML Schema. This is useful for defining components that might be used in several different contexts or services. In order to plan for reuse, schema documents should be broken down into logical components by subject area. Having schema documents that are too large and all-encompassing tends to inhibit reuse because it forces other schema documents to take all or nothing when importing them. It is also good practice to create a “core components” schema that has low-level building blocks, such as types for Address
and Quantity
, that are imported by all other schema documents.
When human users are creating and updating XML documents, clarity is of the utmost importance. If users have difficulty understanding the document structure, it will take far more time to edit a document, and the editing process will be much more prone to errors. Even when XML documents are both written and read by software applications, they still should be designed so that they are easy to conceptualize and process. Implementers on both sides—those who create XML documents and those who consume them—are writing and maintaining applications to process these messages, and they must understand them. Overly complex message designs lead to overly complex applications that create and process them, and both are hard to learn and maintain.
Properly and consistently naming schema components—elements, attributes, types, groups—can go a long way toward making the documents comprehensible. Using a common set of terms rather than multiple synonymous terms is good practice, as is the avoidance of obscure acronyms. In XML Schema, it is helpful to identify the kind of component in its name, for example by using the word “Type” at the end of type names. Namespaces should also be consistently and meaningfully named.
Of course, good documentation is very important to achieving clarity. XML Schema allows components to be documented using annotations. While you probably have other documentation that describes your system, having human-readable definitions of the components in your schema is very useful for people who maintain and use that schema. It also allows you to use tools that automatically generate schema documentation more effectively.
Consistent structure can also help improve clarity. For example, if many different types have child elements Identifier
and Name
, put them first and always in the same order. Reuse of components helps to ensure consistent structure.
It is often difficult to determine how many levels of elements to put in a message. Using intermediate elements that group together related properties can help with understanding. For example, embedding all address-related elements (street
, city
, etc.) inside an Address
child element, not directly inside a Customer
element, is an obvious choice. It makes the components of the address clearly contained and allows you to make the entire address optional or repeating.
It is also often useful to use intermediate elements to contain lists of list-like elements. For example, it is a good idea to embed a repeating sequence of OrderedItem
elements inside an OrderedItems
(plural) container, rather than directly inside a PurchaseOrder
element. These container elements can make messages easier to process and often work better with code generation tools.
However, there is such a thing as excessive use of intermediate elements. XML messages that are a dozen levels deep can become unwieldy and difficult to process.
It is best to minimize the number of ways a particular type of data or content can be expressed. Having multiple ways to represent a particular kind of data or content in your XML documents may seem like a good idea because it is more flexible. However, allowing too many choices is confusing to users, puts more of a burden on applications that process the documents, and can lead to interoperability problems.
Systems will change over time. Schemas should be designed with a plan for how to handle changes in a way that causes minimum impact on the systems that create and process XML documents.
A typical schema versioning strategy differentiates between major versions and minor versions. Major versions, such as 1.0, 2.0, or 3.0, are by definition disruptive and not backward-compatible; at times this is an unavoidable part of software evolution. On the other hand, minor versions, such as 1.1, 1.2, or 1.3, are backward-compatible. They involve changes to schemas that will still allow old message instances to be valid according to the new schema. For example, a version 1.2 message can be valid according to a version 1.3 schema if the version 1.3 limits itself to backward-compatible changes.
Schemas are used heavily by tools—not just for validation but also for the generation of code and documentation. In an ideal world, all schema parsers and toolkits would support the exact same schema language, and all schemas would be interoperable. The unfortunate reality is that tools, especially code generation tools, vary in their support for XML Schema, for several reasons.
• Some toolkits incorrectly implement features of XML Schema because the recommendation is complex and in some cases even ambiguous.
• Some web services toolkits deliberately do not support certain features of XML Schema because they do not find them to be relevant or useful to a particular use case, such as data binding.
• Some XML Schema concepts do not map cleanly onto object-oriented concepts. Even if a toolkit attempts to support these features, it may do so in a less than useful way.
In general, it is advisable to stick to a subset of the XML Schema language that is well supported by the kinds of toolkits you will be using in your environment. For example, features of XML Schema to avoid in a web services environment where data-binding toolkits are in use include
• Mixed content (elements that allow text content as well as children)
• choice
and all
model groups
• Complex content models with nested model groups
• Substitution groups
• Dynamic type substitution using the xsi:type
attribute
• Default and fixed values for elements or attributes
• Redefinition of schema documents
It is advisable to test your schemas against a variety of toolkits to be sure that they can handle them gracefully.
Many organizations that are implementing medium- to large-scale XML vocabularies develop enterprise-wide guidelines for schema design, taking into account the considerations described in this chapter. Sometimes these guidelines are organized into documents that are referred to as Naming and Design Rules (NDR) documents.
Having a cohesive schema design strategy has a number of benefits.
• It promotes a standard approach to schema development that improves consistency and therefore clarity.
• It ensures that certain strategies, such as how to approach versioning, are well thought out before too much investment has been made in development.
• It allows the proposed approach to be tested with toolkits in use in the organization to see if they generate manageable code.
• It serves as a basis for design reviews, which are a useful way for centralized data architects to guide or even enforce design standards within an organization.
A schema design strategy should include the following topics:
• Naming standards: standard word separators, upper versus lower case names, a standard glossary of terms, special considerations for naming types and groups. Naming standards are discussed in Section 21.6 on p. 559.
• Namespaces: what they should be named, how many to have, how many schema documents to use per namespace, how they should be documented. See Section 21.7 on p. 564 for namespace guidelines.
• Schema structure strategy: how many schema documents to have, recommended folder structure, global versus local components. Section 21.5 on p. 550 covers these topics.
• Documentation standards: the types of documentation required for schema components, where they are to be documented. Schema documentation is covered in Section 21.8 on p. 580.
• XML Schema features: a list of allowed (or prohibited) XML Schema features, limited to promote simplicity, better tool support, and interoperability.
• Versioning strategy: whether to require forward compatibility (and if so how to accomplish it), rules for backward compatibility of releases, patterns for version numbering. All of Chapter 23 is devoted to versioning, with particular attention paid to developing a versioning strategy in Section 23.4.1 on p. 636.
• Reuse strategy: recommended methods of achieving reuse, an approach for a common component library. Reuse is covered in Section 22.1 on p. 596.
• Extension strategy: which external standards are approved for use, description of the correct way to incorporate or extend them, how other standards should extend yours and under what conditions. Section 22.2 on p. 599 compares and contrasts six methods for extending schemas.
These considerations are covered in the rest of this chapter and the next two chapters.
There are a number of design decisions that affect the way a schema is organized, without impacting validation of instances. They include whether to use global or local declarations and how to modularize your schemas.
Some schema components can be either global or local. Element and attribute declarations, for example, can be scoped entirely with a complex type (local) or at the top level of the schema document (global). Type definitions (both simple and complex) can be scoped to a particular element or attribute declaration, in which case they are anonymous, or at the top level of the schema document, in which case they are named. Sections 6.1.3 on p. 95, 7.2.3 on p. 119, and 8.2.3 on p. 133 cover the pros and cons of global versus local components.
It is possible to decide individually for each component whether it should be global or local, but it is better to have a consistent strategy that is planned in advance. Table 21–1 provides an overview of the four possible approaches to the global/local decision. The names associated with the approaches (with the exception of Garden of Eden) were developed as the result of a discussion on XML-DEV led by Roger Costello, who wrote them up as a set of best practices at www.xfront.com/GlobalVersusLocal.pdf.
This section provides an overview of the advantages and disadvantages of each approach. All four of these approaches will validate the same instance, so the question is more one of schema design than XML document design.
In all four approaches, the attribute declarations are locally declared. This follows the recommended practice of allowing unqualified attribute names when the attributes are part of the vocabulary being defined by the schema.
The Russian Doll approach is characterized by all local definitions, with the exception of the root element declaration. All types are anonymous, and all element and attribute declarations are local. Example 21–1 is a Russian Doll schema.
The main disadvantage of this approach is that neither the elements nor the types are reusable. This can result in code that is redundant and hard to maintain. It can also be cumbersome to read. With all the indenting it is easy to lose track of where you are in the hierarchy.
There are a few advantages of this approach but they are less compelling.
• Since the elements are locally declared, it is possible to have more than one element with the same name but a different type or other characteristics. For example, there can be a number
child of product
that has a format different from a number
child of order
.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified">
<xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element name="product" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attribute name="dept" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
• Since the elements are locally declared, their names can be unqualified in the instance. However, this practice is not recommended, as described in Section 21.7.3.6 on p. 579.
• There is only one global element declaration, so it is obvious which one is the root.
• Since there is no reuse, it is easier to see the impact of a change: You simply look up the hierarchy.
The Salami Slice uses global element declarations but anonymous (local) types. This places importance on the element as the unit of reuse. Example 21–2 is a Salami Slice schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified">
<xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element ref="product" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="product">
<xs:complexType>
<xs:sequence>
<xs:element ref="number"/>
<xs:element ref="name"/>
<xs:element ref="size"/>
</xs:sequence>
<xs:attribute name="dept" type="xs:string"/>
</xs:complexType>
</xs:element>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:schema>
The disadvantage of this approach is that the types are not reusable by multiple element declarations. Often you will have multiple element names that have the same structure, such as billingAddress
and shippingAddress
with the same address structure. Using this model, the entire address structure would need to be respecified each time, or put into a named model group. Anonymous types also cannot be used in derivation—another form of reuse and sometimes an important expression of an information model. Although you can reuse the element declarations, this might mean watered-down element names, such as address
instead of a more specific kind of address. Since elements are globally declared, it is not possible to have more than one element with the same name but a different type or other characteristics; all element names in the entire schema must be unique.
This approach does have some advantages over Russian Doll, namely that it is more readable and does allow some degree of reuse through element declarations. Unlike Russian Doll, it does allow the use of substitution groups, which require global element declarations.
The Venetian Blind approach has local element declarations but named global types. Example 21–3 is a Venetian Blind schema.
The significant advantage to this approach is that the types are reusable. The advantages of named types over anonymous types are compelling and make this approach more flexible and better defined than either of the previous two approaches. Since elements are locally declared, it is possible to have more than one element with the same name but a different type or other characteristics, which also improves flexibility.
The disadvantage of this approach is that element declarations are not reused, so if they have complex constraints such as type alternatives or identity constraints, these need to be respecified in multiple places. Also, element declarations cannot participate in substitution groups.
If substitution groups aren’t needed, this is the author’s preferred approach and is the style used in most of this book. It allows for full reuse through types, but also allows the flexibility of varying element names. It maps very cleanly onto an object-oriented model, where the complex types are analogous to classes and the element declarations are analogous to instance variables that have that class.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified">
<xs:element name="catalog" type="CatalogType"/>
<xs:complexType name="CatalogType">
<xs:sequence>
<xs:element name="product" type="ProductType"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="SizeType"/>
</xs:sequence>
<xs:attribute name="dept" type="xs:string"/>
</xs:complexType>
<xs:simpleType name="SizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
The Garden of Eden approach has all global (named) types and global element declarations. Example 21–4 is a Garden of Eden schema.
The only disadvantage of this approach, other than its verbosity, is that since elements are globally declared, it is not possible to have more than one element with the same name but a different type or other characteristics. All element names in the entire schema must be unique. However, some would consider unique, meaningful element names to be an advantage, and indeed it can simplify processing of the document using technologies like XSLT and SAX.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified">
<xs:element name="catalog" type="CatalogType"/>
<xs:complexType name="CatalogType">
<xs:sequence>
<xs:element ref="product" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:element name="product" type="ProductType"/>
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element ref="number"/>
<xs:element ref="name"/>
<xs:element ref="size"/>
</xs:sequence>
<xs:attribute name="dept" type="xs:string"/>
</xs:complexType>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="SizeType"/>
<xs:simpleType name="SizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
Garden of Eden is a very viable approach to schema design. Its big advantage is that it allows both element declarations and types to be referenced from multiple components, maximizing their reuse potential. In addition, compared to Venetian Blind, it allows the use of substitution groups. If substitution groups are needed, and it is acceptable to force the uniqueness of element names, this approach is the right choice. Many standard XML vocabularies use this approach.
Overall, the Garden of Eden and Venetian Blind, depending on your requirements, are the recommended approaches. The Russian Doll approach has obvious limitations in terms of reuse, and the Salami Slice approach does not benefit from the very significant advantages of named types over anonymous types.
Another decision related to schema structure is how to modularize your schema documents. Consider a project that involves orders for retail products. The order will contain information from several different domains. It will contain general information that applies to the order itself, such as the order number and date. It may also contain customer information, such as customer name, number, and address. Finally, it may contain product information, such as product number, name, description, and size.
Do you want one big schema document, or three schema documents, one for each of the subject areas (order, customer, and product)? There are a number of advantages to composing your schema representation from multiple schema documents.
• Easier reuse. If schema documents are small and focused, they are more likely to be reused. For example, a product catalog application might want to reuse the definitions from the product schema. This is much more efficient if the product catalog application is not forced to include everything from the order application.
• Ease of maintenance. Smaller schema documents are more readable and manageable.
• Reduced chance of name collisions. If different namespaces are used for the different schema documents, name collisions are less likely.
• Versioning granularity. It is helpful to separate components that change more frequently, or change on a different schedule, into a separate schema document. This creates less disruption when a new version is released. Code lists (simple types with enumerations) are often placed into individual schema documents so they can be versioned separately.
• Access control granularity. Security can be managed per schema document, allowing more granular access control.
Dividing up your schema documents too much, though, can make them hard to manage. For example, having one element declaration or type definition per schema document would necessitate the use of dozens of imports or includes in your schema documents. If different namespaces are used in these schema documents, that compounds the complexity, because an instance document will have to declare dozens of namespaces.
There are several ways to distribute your components among schema documents, for example:
• Subject area. If your instance will contain application data, this could mean one schema document per application or per database. If the instances incorporate XML documents of different types, such as test reports and product specifications, and each is defined by its own root element declaration, it would be logical to use a separate schema document for each document type.
• General/specific. There may be a base set of components that can be extended for a variety of different purposes. For example, you may create a schema document that contains generic (possibly abstract) definitions for purchase orders and invoices, and separate schema documents for each set of industry-specific extensions.
• Basic/advanced. Suppose you have a core set of components that are used in all instances, plus a number of optional components. You may want to define these optional components in a separate schema document. This allows an instance to be validated against just the core set of components or the enhanced set, depending on the application.
• Governance. Schemas should be divided so that a single schema document is not governed by more than one group of people.
• Versioning Schedule. As mentioned above, it is helpful to separate components that are versioned frequently or on different schedules, such as code lists.
Another issue when you break up schema documents is whether to use the same namespace for all of them, or break them up into separate namespaces. This issue is covered in detail in Section 21.7.2 on p. 565.
This section provides detailed recommendations for choosing and managing names for XML elements and attributes, as well as for other XML Schema components. It discusses naming guidelines, the use of qualified and unqualified names, and organizing a namespace.
Consistency in naming can be as important as the names themselves. Consistent names are easier to understand, remember, and maintain. This section provides guidelines for defining an XML naming standard to ensure the quality and consistency of names.
These guidelines apply primarily to the names that will appear in an instance—namely, element, attribute, and notation names. However, much of this section is also applicable to the names used within the schema, such as type, group, and identity constraint names.
Names in XML must start with a letter or underscore (_
), and can contain only letters, digits, underscores (_
), colons (:
), hyphens (-
), and periods (.
). Colons should be reserved for use with namespace prefixes. In addition, an XML name cannot start with the letters xml
in either upper or lower case.
Names in XML are always case-sensitive, so accountNumber
and AccountNumber
are two different element names.
Since schema components have XML names, these name restrictions apply not only to the element and attribute names that appear in instances, but also to the names of the types, named model groups, attribute groups, identity constraints, and notations you define in your schemas.
If a name is made up of several terms, such as “account number,” you should decide on a standard way to separate the terms. It can be done through capitalization (e.g., accountNumber
) or through punctuation (e.g., account-number
).
Some programming languages, database management systems, and other technologies do not allow hyphens or other punctuation in the names they use. Therefore, if you want to directly match your element names, for example, with variable names or database column names, you should use capitalization to separate terms.
If you choose to use capitalization, the next question is whether to use mixed case (e.g., AccountNumber
) or camel case (e.g., accountNumber
). In some programming languages, it is a convention to use mixed case for class names and camel case for instance variables. In XML, this maps roughly to using mixed case for type names and camel case for element names. This is the convention used in this book.
Regardless of which approach you choose, the most important thing is being consistent.
There is no technical limit to the length of an XML name. However, the ideal length of a name is somewhere between four and twelve characters. Excessively long element names, such as HazardousMaterialsHandlingFeeDomestic
, can, if used frequently, add dramatically to the size of the instance. They are also difficult to type, hard to distinguish from other long element names, and not very readable. On the other hand, very short element names, such as b
, can be too cryptic for people who are not familiar with the vocabulary.
In order to encourage consistency, it is helpful to choose a standard set of terms that will be used in your names. Table 21–2 shows a sample list of standard terms. This term list is used in the examples in this book. Synonyms are included in the list to prevent new terms from being created that have the same meaning as other terms.
These consistent terms are then combined to form element and attribute names. For example, “product number” might become productNumber
.
In some cases, the name will be too long if all of the terms are concatenated together. Therefore, it is useful to have a standard abbreviation associated with each term. Instead of productNumber
, prodNumber
may be more manageable by being shorter.
In some contexts, using an element name such as prodNumber
may be redundant. In Example 21–5, it is obvious from the context that the number is a product number as opposed to some other kind of number.
<product>
<prodNumber>557</prodNumber>
<prodName>Short-Sleeved Linen Blouse</prodName>
<prodSize sizeSystem="US-DRESS">10</prodSize>
</product>
In this case, it may be clearer to leave off the prod
term on the child elements, as shown in Example 21–6.
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<size system="US-DRESS">10</size>
</product>
There may be other cases where the object is not so obvious. In Example 21–7, there are two names: a customer name and a product name. If we took out the terms cust
and prod
, we would not be able to distinguish between the two names. In this case, it should be left as shown.
<letter>Dear <custName>Priscilla Walmsley</custName>,
Unfortunately, we are out of stock of the
<prodName>Short-Sleeved Linen Blouse</prodName> in size
<prodSize>10</prodSize> that you ordered...</letter>
When creating element and attribute names, it is helpful to list two names: one to be used when the object is obvious, and one to be used in other contexts. This is illustrated in Table 21–3.
Some designers of XML vocabularies wonder whether they should even use namespaces. In general, using namespaces has a lot of advantages:
• It indicates clear ownership of the definitions in that namespace.
• The namespace name, if it is a URL, provides a natural place to locate more information about that XML vocabulary.
• It allows the vocabulary to be combined with other XML vocabularies with a clear separation and without the risk of name collision.
• It is an indication to processing software what kind of document it is.
The downside of using namespaces is the complexity, or perceived complexity. You have to declare them in your schemas and your instances, and the code you write to process the documents has to pay attention to them. Another possible disadvantage is their limited support in DTDs. If you are writing both a DTD and a schema for your vocabulary, it requires special care to flexibly allow namespace declarations and prefixes in the DTD.1
Despite some negative perceptions about namespaces, it is fairly unusual for standard, reusable vocabularies not to use namespaces. If you are writing a one-off XML vocabulary that is internal to a single organization or application, it is probably fine not to use namespaces. However, if you are planning for your vocabulary to be used by a variety of organizations, or combined with other vocabularies, namespaces are highly recommended.
The complexity of namespaces can be somewhat mitigated by choosing a straightforward namespace strategy. For example, using fewer namespaces, using qualified local element names, and using conventional prefixes consistently can make namespaces seem less cumbersome.
Consider a project that involves orders for retail products. An order will contain information from several different domains. It will contain general information that applies to the order itself, such as order number and date. It may also contain customer information, such as customer name, number, and address. Finally, it may contain product information, such as product number, name, description, and size. Is it best to use the same namespace for all of them, or break them up into separate namespaces? There are three approaches:
1. Same namespace: Use the same namespace for all of the schema documents.
2. Different namespaces: Use multiple namespaces, perhaps a different one for each schema document.
3. Chameleon namespaces: Use a namespace for the parent schema document, but no namespaces for the included schema documents.
It is possible to give all the schema documents the same target namespace and use include
to assemble them to represent a schema with that namespace. This is depicted in Figure 21–1.
Example 21–8 shows our three schema documents using this approach. They all have the same target namespace, and ord.xsd
includes the other two schema documents.
ord.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/all"
targetNamespace="http://datypic.com/all"
elementFormDefault="qualified">
<xs:include schemaLocation="prod.xsd"/>
<xs:include schemaLocation="cust.xsd"/>
<xs:element name="order" type="OrderType"/>
<xs:complexType name="OrderType">
<xs:sequence>
<xs:element name="customer" type="CustomerType"/>
<xs:element name="items" type="ItemsType"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/all"
targetNamespace="http://datypic.com/all"
elementFormDefault="qualified">
<xs:complexType name="ItemsType">
<xs:sequence maxOccurs="unbounded">
<xs:element name="product" type="ProductType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
cust.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/all"
targetNamespace="http://datypic.com/all"
elementFormDefault="qualified">
<xs:complexType name="CustomerType">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
Example 21–9 shows an instance that conforms to the schema. Since there is only one namespace for all of the elements, a default namespace declaration is used.
The advantages of this approach are that it is uncomplicated and the instance is not cluttered by prefixes. The disadvantage is that you cannot have multiple global components with the same name in the same namespace, so you will have to be careful of name collisions.
<order xmlns="http://datypic.com/all"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datypic.com/all ord.xsd">
<customer>
<name>Priscilla Walmsley</name>
</customer>
<items>
<product>
<number>557</number>
</product>
</items>
</order>
This approach assumes that you have control over all the schema documents. If you are using elements from a namespace over which you have no control, such as the XHTML namespace, you should use the approach described in the next section.
This approach is best within a particular application where you have control over all the schema documents involved.
It is also possible to give each schema document a different target namespace and use an import
(or other method) to assemble the multiple schema documents. This is depicted in Figure 21–2.
Example 21–10 shows our three schema documents using this approach. They all have different target namespaces, and ord.xsd
imports the other two schema documents.
ord.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:prod="http://datypic.com/prod"
xmlns:cust="http://datypic.com/cust"
xmlns="http://datypic.com/ord"
targetNamespace="http://datypic.com/ord"
elementFormDefault="qualified">
<xs:import schemaLocation="prod.xsd"
namespace="http://datypic.com/prod"/>
<xs:import schemaLocation="cust.xsd"
namespace="http://datypic.com/cust"/>
<xs:element name="order" type="OrderType"/>
<xs:complexType name="OrderType">
<xs:sequence>
<xs:element name="customer" type="cust:CustomerType"/>
<xs:element name="items" type="prod:ItemsType"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
prod.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified">
<xs:complexType name="ItemsType">
<xs:sequence maxOccurs="unbounded">
<xs:element name="product" type="ProductType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
cust.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/cust"
targetNamespace="http://datypic.com/cust"
elementFormDefault="qualified">
<xs:complexType name="CustomerType">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
Example 21–11 shows an instance that conforms to this schema. You are required to declare all three namespaces in the instance and to prefix the element names appropriately. However, since ord.xsd
imports the other two schema documents, you are not required to specify xsi:schemaLocation
pairs for all three schema documents, just the “main” one.
<order xmlns="http://datypic.com/ord"
xmlns:prod="http://datypic.com/prod"
xmlns:cust="http://datypic.com/cust"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datypic.com/ord ord.xsd">
<customer>
<cust:name>Priscilla Walmsley</cust:name>
</customer>
<items>
<prod:product>
<prod:number>557</prod:number>
</prod:product>
</items>
</order>
To slightly simplify the instance, different default namespace declarations could appear at different levels of the document, resulting in the instance shown in Example 21–12. It could be simplified even further by the use of unqualified local element names, as discussed in Section 21.7.3.2 on p. 576.
<order xmlns="http://datypic.com/ord"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datypic.com/ord ord.xsd">
<customer>
<name xmlns="http://datypic.com/cust">Priscilla Walmsley</name>
</customer>
<items>
<product xmlns="http://datypic.com/prod">
<number>557</number>
</product>
</items>
</order>
This is an obvious approach when you are using namespaces over which you have no control—for example, if you want to include XHTML elements in your product description. There is no point trying to copy all the XHTML element declarations into a new namespace. This would create maintenance problems and would not be very clear to users. In addition, applications that process XHTML may require that the elements be in the XHTML namespace.
The advantage to this approach is that the source and context of an element are very clear. In addition, it allows different groups to be responsible for different namespaces. Finally, you can be less concerned about name collisions, because the names must only be unique within a namespace.
The disadvantage of this approach is that instances are more complex, requiring prefixes for multiple different namespaces. Also, you cannot use the redefine
or override
feature on these components, since they are in a different namespace.
The third possibility is to specify a target namespace only for the “main” schema document, not the included schema documents. The included components then take on the target namespace of the including document. In our example, all of the definitions and declarations in both prod.xsd
and cust.xsd
would take on the http://datypic.com/ord namespace once they are included in ord.xsd
. This is depicted in Figure 21–3.
Example 21–13 shows our three schema documents using this approach. Neither prod.xsd
nor cust.xsd
has a target namespace, while ord.xsd
does. ord.xsd
includes the other two schema documents and changes their namespace as a result.
The instance in this case would look similar to that of the samenamespace approach shown in Example 21–9, since all the elements would be in the same namespace. The only difference is that in this case the namespace would be http://datypic.com/ord instead of http://datypic.com/all.
ord.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/ord"
targetNamespace="http://datypic.com/ord"
elementFormDefault="qualified">
<xs:include schemaLocation="prod.xsd"/>
<xs:include schemaLocation="cust.xsd"/>
<xs:element name="order" type="OrderType"/>
<xs:complexType name="OrderType">
<xs:sequence>
<xs:element name="customer" type="CustomerType"/>
<xs:element name="items" type="ItemsType"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:complexType name="ItemsType">
<xs:sequence maxOccurs="unbounded">
<xs:element name="product" type="ProductType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
cust.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:complexType name="CustomerType">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
The advantage of this approach is its flexibility. Components can be included in multiple namespaces, and redefined or overridden whenever desired.
The disadvantage is that the risk of name collisions is even more serious. If the non-namespace schema documents grow over time, the risk increases that there will be name collisions with the schema documents that include them. If they were in their own namespace, unexpected collisions would be far less likely.
Another disadvantage is that the chameleon components lack an identity. Namespaces can be well-defined containers that provide a recognizable context as well as specific semantics, documentation, and application code.
Many instances include elements from more than one namespace. This can potentially result in instances with a large number of different prefixes, one for each namespace. However, when an element declaration is local—that is, when it isn’t at the top level of a schema document—you have the choice of using either qualified or unqualified element names in instances, a concept that was introduced in Section 6.3 on p. 98. Let’s recap the two alternatives, this time looking at more complex multinamespace documents.
Example 21–14 shows an instance where all element names are qualified. Every element name has a prefix that maps it to either the product namespace or the order namespace.
<ord:order xmlns:ord="http://datypic.com/ord"
xmlns:prod="http://datypic.com/prod">
<ord:number>123123</ord:number>
<ord:items>
<prod:product>
<prod:number>557</prod:number>
</prod:product>
</ord:items>
</ord:order>
This instance is very explicit about which namespace each element is in. There will be no confusion about whether a particular number
element is in the order namespace or in the product namespace. However, the application or person that generates this instance must be aware which elements are in the order namespace and which are in the product namespace.
Example 21–15, on the other hand, shows an instance where only the root element name, order
, is qualified. The other element names have no prefix, and since there is no default namespace provided, they are not in any namespace.
This instance has the advantage of looking slightly less complicated and not requiring the instance author to care about what namespace each element belongs in. In fact, the instance author does not even need to know of the existence of the product namespace.
<ord:order xmlns:ord="http://datypic.com/ord">
<number>123123</number>
<items>
<product>
<number>557</number>
</product>
</items>
</ord:order>
Let’s look at the schemas that would describe these two instances. Example 21–16 shows how to represent the schema for the instance in Example 21–14, which has qualified element names. The representation is made up of two schema documents: ord.xsd
, which defines components in the order namespace, and prod.xsd
, which defines components in the product namespace.
Both schema documents have elementFormDefault
set to qualified
. As a result, locally declared elements must use qualified element names in the instance. In this example, the declaration for order
is global and the declarations for product
, items
, and number
are local.
ord.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:prod="http://datypic.com/prod"
xmlns="http://datypic.com/ord"
targetNamespace="http://datypic.com/ord"
elementFormDefault="qualified">
<xs:import schemaLocation="prod.xsd"
namespace="http://datypic.com/prod"/>
<xs:element name="order" type="OrderType"/>
<xs:complexType name="OrderType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="items" type="prod:ItemsType"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
prod.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified">
<xs:complexType name="ItemsType">
<xs:sequence maxOccurs="unbounded">
<xs:element name="product" type="ProductType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
To create a schema for the instance in Example 21–15, which has unqualified names, you can simply change the value of elementFormDefault
in both schema documents to unqualified
. Since the default value is unqualified
, you could alternatively simply omit the attribute. In this case, globally declared elements still must use qualified element names, hence the use of ord:order
in the instance.
An important thing to notice is that the choice between qualified and unqualified names applies only to local element declarations. All globally declared elements must have qualified element names in the instance; there is no way to override this. In our example, all of the element declarations except order
are local. If the product
declaration had been global, the product
elements would have to use qualified element names, regardless of the value of elementFormDefault
.
This can cause confusion if you choose to use unqualified local element names, and you want to mix global and local element declarations. Not only would an instance author be required to know what namespace each element is in, but he or she would also need to know whether it was globally or locally declared in the schema.
This can be avoided by making all element declarations local, except for the declaration for the root elements. However, you may not have this choice if you import element declarations from namespaces that are not under your control. Also, if you plan to use substitution groups, the participating element declarations must be global.
Default namespaces do not mix well with unqualified element names. The instance in Example 21–17 declares the order namespace as the default namespace. However, this will not work with a schema document where elementFormDefault
is set to unqualified
, because it will be looking for the elements items
, product
, and number
in the order namespace, in which they are not—they are not in any namespace.
<order xmlns="http://datypic.com/ord">
<number>123ABBCC123</number>
<items>
<product>
<number>557</number>
</product>
</items>
</order>
Whether to use qualified or unqualified local element names is a matter of style. The advantages of using qualified local element names are:
• You can tell by looking at the document which namespace a name is in. If you see that a b
element is in the XHTML namespace, you can more quickly understand its meaning.
• There is no ambiguity to a person or application what namespace an element belongs in. In our example, there was a number
element in each of the namespaces. In most cases, you can determine from its position in the instance whether it is an order number or a product number, but not always.
• Certain applications or processors, for example an XHTML processor, might be expecting the element names to be qualified with the appropriate namespace.
• You can mix global and local element declarations without affecting the instance authors. You may be forced to make some element declarations global because you are using substitution groups, or because you are importing a schema document over which you have no control. If you use unqualified local names, the instance author has to know which element declarations are global and which are local.
The advantages of using unqualified local element names are:
• The instance author does not have to be aware of which namespace each element name is in. If many namespaces are used in the instance, this can simplify creation of instances.
• The lack of prefixes and namespace declarations makes the instance look less cluttered.
In general, it is best to use qualified element names, for the reasons stated above. If consistent prefixes are used, they just become part of the name, and authors get used to writing prod:number
rather than just number
. Most XML editors assist instance authors in choosing the right element names, prefixed or not.
The decision about qualified versus unqualified forms is simpler for attributes than elements. Qualified attribute names should only be used for attributes that apply to a variety of elements in a variety of namespaces, such as xml:lang
or xsi:type
. Such attributes are almost always declared globally. For locally declared attributes, whose scope is the type definition in which they appear, prefixes add extra text without any additional meaning.
The best way to handle qualification of attribute names is to ignore the form
and attributeFormDefault
attributes completely. Then, globally declared attributes will have qualified names, and locally declared attributes will have unqualified names, which makes sense.
XML Schema is a full-featured language for describing the structure of XML documents. However, it cannot express everything there is to know about an instance or the data it contains. This section explains how you can extend XML Schema to include additional information for users and applications, using two methods: annotations and non-native attributes.
Annotations are represented by annotation
elements, whose syntax is shown in Table 21–4. An annotation
may appear in almost any element in the schema, with the exception of annotation
itself and its children, appinfo
and documentation
. The schema
, override
, and redefine
elements can contain multiple annotation
elements anywhere among their children. All other elements may only contain one annotation
, and it must be their first child.
The content model for annotation
allows two types of children: documentation
and appinfo
. A documentation
element is intended to be human-readable user documentation, and appinfo
is machine-readable for applications. A single annotation
may contain multiple documentation
and appinfo
elements, in any order, which may serve different purposes.
User documentation is represented by the documentation
element. Sometimes it will consist of simple text content used for a description of a component. However, as it can contain child elements, it can be used for a more complex structure. Often, it is preferable to store more detail than just a simple description. The types of user information that you might add to a schema include:
• Descriptive information about what the component means. The name and structure of a type definition or element declaration can explain a lot, but they cannot impart all the semantics of the schema component.
• An explanation of why a component is structured in a particular way, or why certain XML Schema mechanisms are used.
• Metadata such as copyright information, who is responsible for the schema component, its version, and when it was last changed.
• Internationalization and localization parameters, including language translations.
• Examples of valid instances.
The syntax for a documentation
element is shown in Table 21–5. It uses an element wildcard for its content model, which specifies that it may contain any number of elements from any namespace (or no namespace), in any order. Its content is mixed, so it may contain character data as well as children.
The source
attribute can contain a URI reference that points to further documentation. The schema processor does not dereference this URI during validation.
Example 21–18 shows the use of documentation
to document the product
element declaration.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:doc="http://datypic.com/doc">
<xs:element name="product" type="ProductType">
<xs:annotation>
<xs:documentation xml:lang="en"
source="http://datypic.com/prod.html#product">
<doc:description>This element represents a product.
</doc:description>
</xs:documentation>
</xs:annotation>
</xs:element>
<!--...-->
</xs:schema>
Although you can put character data content directly in documentation
or appinfo
, it is preferable to structure it using at least one child element. This allows the type of information (e.g., description) to be uniquely identified in the case of future additions.
Instead of (or in addition to) including the information in the annotation, you can also provide links to one or more external documents. To do this, you can either use the source
attribute or other mechanisms, such as XLink. This allows reuse of documentation that may apply to more than one schema component.
When creating reusable schema documents such as type libraries, it is helpful to have a complete definition of each component declared or defined in it. This ensures that schema authors are reusing the correct components, and allows you to automatically generate human-readable documentation about the components. Example 21–19 shows a schema document that includes a complete definition of a simple type CountryType
in its documentation. The example is roughly based on ISO 11179, the ISO standard for the specification and standardization of data elements.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:doc="http://datypic.com/doc">
<xs:simpleType name="CountryType">
<xs:annotation>
<xs:documentation>
<doc:name>Country identifier</doc:name>
<doc:identifier>3166</doc:identifier>
<doc:version>1990</doc:version>
<doc:registrationAuthority>ISO</doc:registrationAuthority>
<doc:definition>A code for the names of countries of the
world.</doc:definition>
<doc:keyword>geopolitical entity</doc:keyword>
<doc:keyword>country</doc:keyword>
<!--...-->
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:token">
<!--...-->
</xs:restriction>
</xs:simpleType>
</xs:schema>
Another type of user documentation is code control information, such as when it was created and by whom, its version, and its dependencies. This is illustrated in Example 21–20. The element names used in the example are similar to the keywords in Javadoc.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:doc="http://datypic.com/doc">
<xs:simpleType name="CountryType">
<xs:annotation>
<xs:documentation>
<doc:author>Priscilla Walmsley</doc:author>
<doc:version>1.1</doc:version>
<doc:since>1.0</doc:since>
<doc:see>
<doc:label>Country Code Listings</doc:label>
<doc:link>http://datypic.com/countries.html</doc:link>
</doc:see>
<doc:deprecated>false</doc:deprecated>
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:token">
<!--...-->
</xs:restriction>
</xs:simpleType>
</xs:schema>
There is a reason that schema
, override
, and redefine
elements can have multiple annotations anywhere in their content. These annotations can be used to break a schema document into sections and provide comments on each section. Example 21–21 shows annotations that serve as section comments. Although they are more verbose than regular XML comments (which are also permitted), they are more structured. This means that they can be used, for example, to generate XHTML documentation for the schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:annotation><xs:documentation><sectionHeader>
********* Product-Related Element Declarations ***************
</sectionHeader></xs:documentation></xs:annotation>
<xs:element name="product" type="ProductType"/>
<xs:element name="size" type="SizeType"/>
<xs:annotation><xs:documentation><sectionHeader>
********* Order-Related Element Declarations *****************
</sectionHeader></xs:documentation></xs:annotation>
<xs:element name="order" type="OrderType"/>
<xs:element name="items" type="ItemsType"/>
<!--...-->
</xs:schema>
There is a wide variety of use cases for adding application information to schemas. Some of the typical kinds of application information to include are:
• Extra validation rules, such as co-constraints. XML Schema alone cannot express every constraint you might want to impose on your instances.
• Mappings to other structures, such as databases or EDI messages. These mappings are a flexible way to tell an application where to store or extract individual elements.
• Mapping to XHTML forms or other user input mechanisms. The mappings can include special presentation information for each data element, such as translations of labels to other languages.
• Formatting information, such as a stylesheet fragment that can convert the instance element to XHTML, making it presentable to the user.
The syntax for appinfo
, shown in Table 21–6, is identical to that of documentation
, minus the xml:lang
attribute.
Example 21–22 shows the use of appinfo
to map product
to a database table.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:app="http://datypic.com/app">
<xs:element name="product" type="ProductType">
<xs:annotation>
<xs:appinfo>
<app:dbmapping>
<app:tb>PRODUCT_MASTER</app:tb>
</app:dbmapping>
</xs:appinfo>
</xs:annotation>
</xs:element>
<!--...-->
</xs:schema>
In this example, we declare a namespace http://datypic.com/app
for the dbmapping
and tb
elements used in the annotation. This is not required; appinfo
can contain elements with no namespace. However, it is preferable to use a namespace because it makes the extension easily distinguishable from other information that may be included for use by other applications.
In addition to annotations, all schema elements are permitted to have additional attributes. These attributes are known as non-native attributes, since they must be in a namespace other than the XML Schema Namespace. Example 21–23 shows an element declaration that has the non-native attribute description
.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:doc="http://datypic.com/doc"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://datypic.com/doc doc.xsd">
<xs:element name="product" type="ProductType"
doc:description="This element represents a product."/>
<!--...-->
</xs:schema>
As with appinfo
and documentation
contents, the non-native attributes are validated through a wildcard with lax validation. If attribute declarations can be found for the attributes, they will be validated, otherwise the processor will ignore them. In this case, the xsi:schemaLocation
attribute points to a schema document for the additional attributes. Example 21–24 shows a schema that might be used to validate the non-native attributes.
The schema does not include new declarations for schema elements. Rather, it contains global declarations of any non-native attributes.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/doc"
targetNamespace="http://datypic.com/doc">
<xs:attribute name="description" type="xs:string"/>
</xs:schema>
This is roughly the same as the general question of whether to use elements or attributes. Both can convey additional information, and both are made available to the application by the schema processor. Non-native attributes are less verbose, and perhaps more clear because they are closer to the definitions to which they apply. However, they have drawbacks: they cannot be used more than once in a particular element, they cannot be extended in the future, and they cannot contain other elements. For example, if you decide that you want the descriptions to be expressed in XHTML, this cannot be done if the description is an attribute. For more information on attributes versus elements, see Section 7.1 on p. 113.
It is generally a good idea to use URLs for namespace names and to put a resource at the location referenced by the URL. There are many reasons not to require your application to dereference a namespace name at runtime, including security, performance, and network availability. However, a person might want to dereference the namespace in order to find out more information about it.
It might seem logical to put a schema document at that location. Having a namespace name resolve to a schema document, though, is not ideal because:
• Many schemas may describe that namespace. Which one do you choose?
• A variety of documents in other formats may also describe that namespace, including DTDs, human-readable documentation, schemas written in other schema languages, and stylesheets. Each may be applicable in different circumstances.
• Schema documents are not particularly human-readable, even by humans who write them!
A better choice is a resource directory, which lists all the resources related to a namespace. Such a directory can be both human- and application-readable. It can also allow different resources to be used depending on the application or purpose.
One language that can be used to define a resource directory is RDDL (Resource Directory Description Language). RDDL is an extension of XHTML that is used to define a resource directory. It does not only apply to namespaces, but it is an excellent choice for documenting a namespace. Example 21–25 shows an RDDL document that might be placed at the location http://datypic.com/prod.
<?xml version='1.0'?>
<!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN"
"rddl/rddl-xhtml.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:rddl="http://www.rddl.org/">
<head><title>Product Catalog</title></head>
<body><h1>Product Catalog</h1>
<div id="toc"><h2>Table of Contents</h2>
<ol>
<li><a href="#intro">Introduction</a></li>
<li><a href="#related.resources">Resources</a></li>
</ol>
</div>
<div id="intro"><h2>Introduction</h2>
<p>This document describes the <a href="#xmlschemap1">Product
Catalog</a> namespace and contains a directory of links to
related resources.</p>
</div>
<div id="related.resources">
<h2>Related Resources for the Product Catalog Namespace</h2>
<!-- start resource definitions -->
<div class="resource" id="DTD">
<rddl:resource xlink:title="DTD for validation"
xlink:arcrole="http://www.rddl.org/purposes#validation"
xlink:role="http://www.isi.edu/in-
notes/iana/assignments/media-types/text/xml-dtd"
xlink:href="prod.dtd">
<h3>DTD</h3>
<p>A <a href="prod.dtd">DTD</a> for the Product Catalog.</p>
</rddl:resource>
</div>
<div class="resource" id="xmlschema">
<rddl:resource xlink:title="Products schema"
xlink:role="http://www.w3.org/2001/XMLSchema"
xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
xlink:href="prod.xsd">
<h3>XML Schema</h3>
<p>An <a href="prod.xsd">XML Schema</a> for the Product
Catalog.</p>
</rddl:resource>
</div>
<div class="resource" id="documentation">
<rddl:resource xlink:title="Application Documentation"
xlink:role="http://www.w3.org/TR/html4/"
xlink:arcrole="http://www.rddl.org/purposes#reference"
xlink:href="prod.html">
<h3>Application Documentation</h3>
<p><a href="prod.html">Application documentation</a> for
the Product Catalog application.</p>
</rddl:resource>
</div>
</div>
</body></html>
This document defines three related resources. Each resource has a role, which describes the nature of the resource (e.g., schema, DTD, stylesheet), and an arcrole, which indicates the purpose of the resource (e.g., validation, reference). An application that wants to do schema validation, for example, can read this document and extract the location of the schema document to be used for validation. A person could also read this document in a browser, as shown in Figure 21–4.
For more information on RDDL, see www.rddl.org.