Chapter 21. Schema design and documentation

It is fairly easy to create a schema once you know the syntax. It is harder to design one well. This chapter focuses on a strategy for designing schemas that are accurate, durable, and easy to implement. Carefully planning your schema design strategy is especially important when creating a complex set of schemas, or a standard schema that is designed to be used and extended by others.

21.1. The importance of schema design

Schemas are a fundamental part of many XML-based applications, whether XML is being used in temporary messages for information sharing or as an enduring representation of content (e.g., in publishing). Enterprise architects, DBAs, and software developers often devote a lot of time to data design: They create enterprise data models, data dictionaries with strict naming, and documentation standards, and carefully design and optimize relational databases. Unfortunately, software designers and implementers often do not pay as much attention to good design when it comes to XML messages.

There are several reasons for this. Some people feel that with transitory XML messages, it is not important how they are structured. Some decide that it is easier to use whatever schema is generated for them by a toolkit. Others decide to use an industry-standard XML vocabulary, but fail to figure out how their data really fits into that standard, or to come up with a strategy for customizing it for their needs.

As with any data design, there are many ways to organize XML messages. For example, decisions must be made about how many levels of elements to include, whether the elements should represent generic or more specific concepts, how to represent relationships, and how far to break down data into separate elements. In addition, there are multiple ways to express the same XML structure in XML Schema. Decisions must be made about whether to use global versus local declarations, whether to use named versus anonymous types, whether to achieve reuse through type extension or through named model groups, and how schemas should be broken down into separate schema documents.

The choices you make when designing a schema can have a significant impact on the ease of implementation, ease of maintenance, and even the ongoing relevance of the system itself. Failure to take into account design goals such as reuse, graceful versioning, flexibility, and tool support can have serious financial impacts on software development projects.

21.2. Uses for schemas

When designing a schema, it is important first to understand what it will be used for. Schemas actually play several roles.

• Validation. Validation is the purpose that is most often associated with schemas. Given an XML document, you can use a schema to automatically determine whether that document is valid or not. Are all of the required elements there, in the right order? Do they contain valid values according to their data types? Schema validation does a good job of checking the basic structure and content of elements.

• A service contract. A schema serves as part of the understanding between two parties. The document provider and the document consumer can both use the schema as a machine-enforceable set of rules describing an interface between two systems or services.

• Documentation. Schemas are used to document the XML structure for the developers and end users who will be implementing or using it. Narrative human-readable annotations can be added to schema components to further document them. Although schemas themselves are not particularly human-readable, they can be viewed by less technical users in a graphical XML editor tool. In addition, there are a number of tools that will generate HTML documentation from schemas, making them more easily understood.

• Providing type information. Schemas contain information about the data types that can affect how the information is processed. For example, if the schema tells an XSLT 2.0 stylesheet that a value is an integer, the stylesheet will know to sort it and compare it with other values as an integer instead of as a string.

• Assisted editing. For documents that will be hand-modified by human users, a schema can be used by XML editing software to provide context-sensitive validation, help, and content completion.

• Code generation. Schemas are also commonly used, particularly in web services and other structured data interfaces, to generate classes and interfaces that read and write the XML message payloads. When a schema is designed first, classes can be generated automatically from the schema definitions, ensuring that they match. Other software artifacts can also be generated from schemas, for example, data entry forms.

• Debugging. Schemas can assist in the debugging and testing processes for applications that will process the XML. For example, importing a schema into an XSLT 2.0 stylesheet or an XQuery query can help identify invalid paths and type errors in the code that might otherwise not have been found during testing.

As you can see, schemas are an important part of an XML implementation, and can be involved at both design time and run time. Although it is certainly possible to use XML without schemas, valuable functionality would be lost. You would be forced to use a nonstandard method to validate your messages, document your system interfaces, and generate code for web services. You also would not be able to take advantage of the many schema-based tools that implement this functionality at low cost.

The various roles of schemas should be taken into account when designing them. For example, use of obscure schema features can make code generation difficult, and not adequately documenting schemas can impact the usefulness of generated documentation.

21.3. Schema design goals

Designing schemas well is a matter of paying attention to certain important design considerations: flexibility and extensibility, reusability, clarity and simplicity, support for versioning, interoperability, and tool support. This section takes a closer look at each of these design goals.

21.3.1. Flexibility and extensibility

Schema design often requires a balancing act between flexibility on the one hand and rigidity on the other. For example, suppose I am selling digital cameras that have a variety of features, such as resolution, battery type, and screen size. Each camera model has a different set of features, and the types of features change over time as new technology is developed. When designing a message that incorporates these camera descriptions, I want enough flexibility to handle variations in feature types, without having to redesign my message every time a new feature comes along. On the other hand, I want to be able to accurately and precisely specify these features.

To allow for total flexibility in the camera features, I could declare a features element whose type contains an element wildcard, which means that any well-formed XML is allowed. This would have the advantage of being extremely versatile and adaptable to change. The disadvantage is that the message structure is very poorly defined. A developer trying to write an application to process the message would have no idea what features to expect and what format they might have.
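A minimal sketch of this fully flexible design follows. The choice of processContents="skip" and the unbounded occurrence range are assumptions made for illustration; they are one way to allow any well-formed content inside features.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="features">
    <xs:complexType>
      <xs:sequence>
        <!-- Element wildcard: any element from any namespace is accepted,
             and its content is not validated at all -->
        <xs:any namespace="##any" processContents="skip"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Nothing in this declaration tells an implementer which child elements to expect, which is exactly the weakness described above.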

On the other hand, I can declare highly constrained elements for each feature, with no opportunity for variation. This has the benefit of making the features well defined, easy to validate, and much more predictable. Validation is more effective because certain features can be required and their values can be constrained by specific data types. However, the schema is brittle because it must be changed every time a new feature is introduced. When the schema changes, the applications that process the documents must also often change.

The ideal design is usually somewhere in the middle. A balanced approach in the case of the camera features might be to create a repeating feature element that contains the name of the feature as an attribute and the value of the feature as its content. This eliminates the brittleness while still providing a predictable structure for implementers.
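The balanced approach might be sketched as follows. The type name FeatureType, the attribute name name, and the choice of xs:string for the value are assumptions for illustration, not a prescribed design.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="features">
    <xs:complexType>
      <xs:sequence>
        <!-- One repeating element covers every current and future feature -->
        <xs:element name="feature" type="FeatureType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- The feature name is an attribute; the feature value is the content -->
  <xs:complexType name="FeatureType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="name" type="xs:token" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
```

An instance that conforms to this sketch (the feature names and values are invented):

```xml
<features>
  <feature name="resolution">24 megapixels</feature>
  <feature name="batteryType">lithium-ion</feature>
  <feature name="screenSize">3 inches</feature>
</features>
```

A new kind of feature requires no schema change. The trade-off is that individual feature values can no longer be constrained by feature-specific data types.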

21.3.2. Reusability

Reuse is an important goal in the design of any software. Schemas that reuse XML components across multiple kinds of documents are easier for developers and users to learn, are more consistent, and save development and maintenance time that would otherwise be spent writing redundant software components.

Using XML Schema, reuse can be achieved in a number of ways.

• Reusing types. It is highly desirable to reuse complex and simple types in multiple element and attribute declarations. For example, you can define a complex type named AddressType that represents a mailing address, and then use it for both BillingAddress and ShippingAddress elements. Only named, global types can be reused, so types in XML Schema should generally be named.

• Type inheritance. In XML Schema, complex types can be specialized from other types using type extensions. For example, I can create a more generic type ProductType and derive types named CameraType and LensType from it. This is a form of reuse because CameraType and LensType inherit a shared set of properties from ProductType.

• Named model groups and attribute groups. Through the use of named model groups, it is possible to define reusable pieces of content models. This is a useful alternative to type inheritance for types that are semantically different but just happen to share some properties with other types.

• Reusing schema documents. Entire schema documents can be reused by taking advantage of the include and import mechanisms of XML Schema. This is useful for defining components that might be used in several different contexts or services. In order to plan for reuse, schema documents should be broken down into logical components by subject area. Having schema documents that are too large and all-encompassing tends to inhibit reuse because it forces other schema documents to take all or nothing when importing them. It is also good practice to create a “core components” schema that has low-level building blocks, such as types for Address and Quantity, that are imported by all other schema documents.
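The first three mechanisms can be illustrated in a single sketch. AddressType, ProductType, CameraType, and LensType come from the discussion above; the group name IdentityGroup and all child element names are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Reusing a type: one named type, two element declarations -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="BillingAddress" type="AddressType"/>
  <xs:element name="ShippingAddress" type="AddressType"/>

  <!-- Named model group: a reusable fragment of a content model -->
  <xs:group name="IdentityGroup">
    <xs:sequence>
      <xs:element name="Identifier" type="xs:token"/>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- Type inheritance: a generic base type and two extensions -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:group ref="IdentityGroup"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CameraType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="Resolution" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="LensType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="FocalLength" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Reuse of a whole schema document might look like the following, assuming a hypothetical core components document core.xsd that defines AddressType in the made-up namespace http://example.org/core:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:core="http://example.org/core"
           targetNamespace="http://example.org/order"
           elementFormDefault="qualified">
  <!-- import is used because core.xsd has a different target namespace;
       include would be used for a document in the same namespace -->
  <xs:import namespace="http://example.org/core" schemaLocation="core.xsd"/>
  <xs:element name="ShippingAddress" type="core:AddressType"/>
</xs:schema>
```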

21.3.3. Clarity and simplicity

When human users are creating and updating XML documents, clarity is of the utmost importance. If users have difficulty understanding the document structure, it will take far more time to edit a document, and the editing process will be much more prone to errors. Even when XML documents are both written and read by software applications, they still should be designed so that they are easy to conceptualize and process. Implementers on both sides—those who create XML documents and those who consume them—are writing and maintaining applications to process these messages, and they must understand them. Overly complex message designs lead to overly complex applications that create and process them, and both are hard to learn and maintain.

21.3.3.1. Naming and documentation

Properly and consistently naming schema components—elements, attributes, types, groups—can go a long way toward making the documents comprehensible. Using a common set of terms rather than multiple synonymous terms is good practice, as is the avoidance of obscure acronyms. In XML Schema, it is helpful to identify the kind of component in its name, for example by using the word “Type” at the end of type names. Namespaces should also be consistently and meaningfully named.

Of course, good documentation is very important to achieving clarity. XML Schema allows components to be documented using annotations. While you probably have other documentation that describes your system, having human-readable definitions of the components in your schema is very useful for people who maintain and use that schema. It also allows you to use tools that automatically generate schema documentation more effectively.
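As a brief illustration, an annotated type definition might look like this. The definitions themselves are invented; only the annotation and documentation elements are part of XML Schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:annotation>
      <xs:documentation xml:lang="en">
        A mailing address used for billing or shipping.
      </xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="Street" type="xs:string">
        <xs:annotation>
          <xs:documentation xml:lang="en">
            The street name and house number.
          </xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Documentation generators typically pick up these annotations and display them next to the component they describe.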

21.3.3.2. Clarity of structure

Consistent structure can also help improve clarity. For example, if many different types have child elements Identifier and Name, put them first and always in the same order. Reuse of components helps to ensure consistent structure.

It is often difficult to determine how many levels of elements to put in a message. Using intermediate elements that group together related properties can help with understanding. For example, embedding all address-related elements (street, city, etc.) inside an Address child element, not directly inside a Customer element, is an obvious choice. It makes the components of the address clearly contained and allows you to make the entire address optional or repeating.

It is also often useful to use intermediate elements as containers for repeating, list-like elements. For example, it is a good idea to embed a repeating sequence of OrderedItem elements inside an OrderedItems (plural) container, rather than directly inside a PurchaseOrder element. These container elements can make messages easier to process and often work better with code generation tools.
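The container pattern might be declared as in the following sketch. The element names PurchaseOrder, OrderedItems, and OrderedItem come from the text; the type names and the children of OrderedItem are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PurchaseOrder" type="PurchaseOrderType"/>

  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <!-- A single container element appears in the order itself -->
      <xs:element name="OrderedItems" type="OrderedItemsType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="OrderedItemsType">
    <xs:sequence>
      <!-- The repetition is confined to the container -->
      <xs:element name="OrderedItem" type="OrderedItemType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="OrderedItemType">
    <xs:sequence>
      <xs:element name="ProductNumber" type="xs:token"/>
      <xs:element name="Quantity" type="xs:positiveInteger"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Data-binding tools often map a container type like OrderedItemsType to a list or collection class, which is one reason the pattern tends to generate cleaner code.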

However, there is such a thing as excessive use of intermediate elements. XML messages that are a dozen levels deep can become unwieldy and difficult to process.

21.3.3.3. Simplicity

It is best to minimize the number of ways a particular type of data or content can be expressed. Having multiple ways to represent a particular kind of data or content in your XML documents may seem like a good idea because it is more flexible. However, allowing too many choices is confusing to users, puts more of a burden on applications that process the documents, and can lead to interoperability problems.

21.3.4. Support for graceful versioning

Systems will change over time. Schemas should be designed with a plan for how to handle changes in a way that causes minimum impact on the systems that create and process XML documents.

A typical schema versioning strategy differentiates between major versions and minor versions. Major versions, such as 1.0, 2.0, or 3.0, are by definition disruptive and not backward-compatible; at times this is an unavoidable part of software evolution. On the other hand, minor versions, such as 1.1, 1.2, or 1.3, are backward-compatible. They involve changes to schemas that will still allow old message instances to be valid according to the new schema. For example, a version 1.2 message can be valid according to a version 1.3 schema if the version 1.3 schema limits itself to backward-compatible changes.
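One common kind of backward-compatible change is adding an optional element at the end of a content model. The sketch below shows a hypothetical version 1.3 of a ProductType; the element names and the use of the version attribute to record the release are assumptions for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.3">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- Present since the hypothetical version 1.2 -->
      <xs:element name="Number" type="xs:token"/>
      <xs:element name="Name" type="xs:string"/>
      <!-- New in 1.3: optional, so 1.2 instances that omit it stay valid -->
      <xs:element name="Description" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Making Description required, or removing Name, would invalidate existing version 1.2 messages and would therefore belong in a major version.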

21.3.5. Interoperability and tool compatibility

Schemas are used heavily by tools—not just for validation but also for the generation of code and documentation. In an ideal world, all schema parsers and toolkits would support the exact same schema language, and all schemas would be interoperable. The unfortunate reality is that tools, especially code generation tools, vary in their support for XML Schema, for several reasons.

• Some toolkits incorrectly implement features of XML Schema because the recommendation is complex and in some cases even ambiguous.

• Some web services toolkits deliberately do not support certain features of XML Schema because they do not find them to be relevant or useful to a particular use case, such as data binding.

• Some XML Schema concepts do not map cleanly onto object-oriented concepts. Even if a toolkit attempts to support these features, it may do so in a less than useful way.

In general, it is advisable to stick to a subset of the XML Schema language that is well supported by the kinds of toolkits you will be using in your environment. For example, features of XML Schema to avoid in a web services environment where data-binding toolkits are in use include the following:

• Mixed content (elements that allow text content as well as children)

• choice and all model groups

• Complex content models with nested model groups

• Substitution groups

• Dynamic type substitution using the xsi:type attribute

• Default and fixed values for elements or attributes

• Redefinition of schema documents

It is advisable to test your schemas against a variety of toolkits to be sure that they can handle them gracefully.

21.4. Developing a schema design strategy

Many organizations that are implementing medium- to large-scale XML vocabularies develop enterprise-wide guidelines for schema design, taking into account the considerations described in this chapter. Sometimes these guidelines are organized into documents that are referred to as Naming and Design Rules (NDR) documents.

Having a cohesive schema design strategy has a number of benefits.

• It promotes a standard approach to schema development that improves consistency and therefore clarity.

• It ensures that certain strategies, such as how to approach versioning, are well thought out before too much investment has been made in development.

• It allows the proposed approach to be tested with toolkits in use in the organization to see if they generate manageable code.

• It serves as a basis for design reviews, which are a useful way for centralized data architects to guide or even enforce design standards within an organization.

A schema design strategy should include the following topics:

• Naming standards: standard word separators, upper versus lower case names, a standard glossary of terms, special considerations for naming types and groups. Naming standards are discussed in Section 21.6.

• Namespaces: what they should be named, how many to have, how many schema documents to use per namespace, how they should be documented. See Section 21.7 for namespace guidelines.

• Schema structure strategy: how many schema documents to have, recommended folder structure, global versus local components. Section 21.5 covers these topics.

• Documentation standards: the types of documentation required for schema components, where they are to be documented. Schema documentation is covered in Section 21.8.

• XML Schema features: a list of allowed (or prohibited) XML Schema features, limited to promote simplicity, better tool support, and interoperability.

• Versioning strategy: whether to require forward compatibility (and if so how to accomplish it), rules for backward compatibility of releases, patterns for version numbering. All of Chapter 23 is devoted to versioning, with particular attention paid to developing a versioning strategy in Section 23.4.1.

• Reuse strategy: recommended methods of achieving reuse, an approach for a common component library. Reuse is covered in Section 22.1.

• Extension strategy: which external standards are approved for use, description of the correct way to incorporate or extend them, how other standards should extend yours and under what conditions. Section 22.2 compares and contrasts six methods for extending schemas.

These considerations are covered in the rest of this chapter and the next two chapters.

21.5. Schema organization considerations

There are a number of design decisions that affect the way a schema is organized, without impacting validation of instances. They include whether to use global or local declarations and how to modularize your schemas.

21.5.1. Global vs. local components

Some schema components can be either global or local. Element and attribute declarations, for example, can be scoped entirely within a complex type (local) or at the top level of the schema document (global). Type definitions (both simple and complex) can be scoped to a particular element or attribute declaration, in which case they are anonymous, or at the top level of the schema document, in which case they are named. Sections 6.1.3, 7.2.3, and 8.2.3 cover the pros and cons of global versus local components.
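The distinction can be seen in a small sketch that mixes both scopes. All names and the value range are invented for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global element declaration that refers to a named (global) type -->
  <xs:element name="product" type="ProductType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- Local element declaration with an anonymous (local) simple type -->
      <xs:element name="size">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="2"/>
            <xs:maxInclusive value="18"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
    <!-- Local attribute declaration -->
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
</xs:schema>
```

Here product and ProductType can be referenced from elsewhere in the schema, while size, its anonymous type, and effDate exist only inside ProductType.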

It is possible to decide individually for each component whether it should be global or local, but it is better to have a consistent strategy that is planned in advance. Table 21–1 provides an overview of the four possible approaches to the global/local decision. The names associated with the approaches (with the exception of Garden of Eden) were developed as the result of a discussion on XML-DEV led by Roger Costello, who wrote them up as a set of best practices at www.xfront.com/GlobalVersusLocal.pdf.

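Table 21–1. Schema structure patterns

Pattern            Element declarations     Type definitions
Russian Doll       Local (except root)      Local (anonymous)
Salami Slice       Global                   Local (anonymous)
Venetian Blind     Local (except root)      Global (named)
Garden of Eden     Global                   Global (named)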

This section provides an overview of the advantages and disadvantages of each approach. All four of these approaches will validate the same instance, so the question is more one of schema design than XML document design.

In all four approaches, the attribute declarations are locally declared. This follows the recommended practice of allowing unqualified attribute names when the attributes are part of the vocabulary being defined by the schema.

21.5.1.1. Russian Doll

The Russian Doll approach is characterized by all local definitions, with the exception of the root element declaration. All types are anonymous, and all element and attribute declarations are local. Example 21–1 is a Russian Doll schema.

The main disadvantage of this approach is that neither the elements nor the types are reusable. This can result in code that is redundant and hard to maintain. It can also be cumbersome to read. With all the indenting it is easy to lose track of where you are in the hierarchy.

There are a few advantages to this approach, but they are less compelling.

• Since the elements are locally declared, it is possible to have more than one element with the same name but a different type or other characteristics. For example, there can be a number child of product that has a format different from a number child of order.

Example 21–1. Schema for Russian Doll approach
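A Russian Doll schema along these lines might look like the following sketch. It is not the original listing; the element names, attribute names, and data types are assumptions chosen to show the pattern.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The root element declaration is the only global component -->
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <!-- Local element declarations -->
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="size">
          <!-- Anonymous type nested inside the local declaration -->
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:integer">
                <xs:attribute name="system" type="xs:string"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <!-- Local attribute declaration -->
      <xs:attribute name="dept" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Each level of the document structure is nested inside the one above it, which is where the pattern gets its name and also why deeply nested schemas of this kind become hard to read.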
