Identity constraints allow you to uniquely identify nodes in a document and ensure the integrity of references between them. This chapter explains how to define and use identity constraints.
There are three categories of identity constraints.
• Uniqueness constraints enforce that a value (or combination of values) is unique within a specified scope. For example, all product numbers must be unique within a catalog.
• Key constraints also enforce uniqueness, and additionally require that all values be present. For example, every product must have a number and it must be unique within a catalog.
• Key references enforce that a value (or combination of values) corresponds to a value represented by a key or uniqueness constraint. For example, for every product number that appears as an item in a purchase order, there must be a corresponding product number in the product description section.
The identity constraints described in this chapter are much more powerful than using attributes of types ID and IDREF. Limitations of ID and IDREF include:
• They are recommended for use only for attributes, not elements.
• They are scoped to the entire document only.
• They are based on one value, as opposed to multifield keys.
• They require ID or IDREF to be the type of the attribute, precluding data validation of that attribute.
• They are based on string equality, as opposed to value equality.
• They require that the values be based on XML names, meaning they must start with a letter or underscore and can only contain letters, digits, and a few punctuation marks.
However, if ID and IDREF fulfill your requirements, there is no reason not to use them, particularly when representing simple cross-references in narrative documents or converting DTDs that are already in use.
The three categories of identity constraints are similar in their definitions and associated rules. This section describes the basic structure of identity constraints. Example 17–1 shows an instance that contains product catalog information.
<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <number>563</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>
Example 17–2 shows the definition of a uniqueness constraint that might be applied to the instance in Example 17–1.
<xs:element name="catalog" type="CatalogType">
  <xs:unique name="prodNumKey">
    <xs:selector xpath="*/product"/>
    <xs:field xpath="number"/>
  </xs:unique>
</xs:element>
All three categories of identity constraints are defined entirely within an element declaration. It can be either a global or local element declaration, but it cannot be an element reference. Identity constraints must be defined at the end of the element declaration, after any simpleType or complexType child. There can be multiple identity constraints in a single element declaration.
Every identity constraint has a name, which takes on the target namespace of the schema document. The qualified name must be unique among all identity constraints of all categories within the entire schema. For example, it would be illegal to have a key constraint named customerNumber and a uniqueness constraint named customerNumber in the same schema, even if they were scoped to different elements.
There are three parts to an identity constraint definition.
1. The scope is an element whose declaration contains the constraint. In our example, a catalog element is the scope. It is perfectly valid to have two products with the same number if they are contained in two different catalog elements.
2. The selector serves to select all the nodes to which the constraint applies. In our example, the selector value is */product, which selects all the product grandchildren of catalog.
3. The one or more fields are the element and attribute values whose combination must be unique among the selected nodes. There can be only one instance of the field per selected node. In our example, there is one field specified: the number child of each product element.
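To make the interplay of scope, selector, and field concrete, here is a hypothetical variation of Example 17–1. The second department is an assumption added for illustration, and the sketch assumes CatalogType allows more than one department. Because the selector */product reaches the products in both departments, and both lie within the same catalog scope, this instance would be expected to violate the uniqueness constraint in Example 17–2.

```xml
<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
  </department>
  <!-- hypothetical second department, not part of Example 17-1 -->
  <department number="022">
    <product>
      <!-- 557 already appears within this catalog: violates prodNumKey -->
      <number>557</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
  </department>
</catalog>
```

If the two departments were instead placed in two separate catalog elements, the duplicate number would be acceptable, since each catalog is its own scope.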
A uniqueness constraint is used to validate that the values of certain elements or attributes are unique within a particular scope. This is represented by a unique element, whose syntax is shown in Table 17–1.
In Example 17–2, we used a uniqueness constraint to ensure that all the product numbers in the catalog are unique. It is also possible to ensure uniqueness of a combination of multiple fields. In the instance shown in Example 17–3, each product may have an effective date.
<catalog>
  <department number="021">
    <product effDate="2000-02-27">
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product effDate="2001-04-12">
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">39.99</price>
    </product>
    <product effDate="2001-04-12">
      <number>563</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>
It is valid for two products to have the same number, as long as they have different effective dates. In other words, we want to validate that the combinations of number and effDate are unique. Example 17–4 shows the uniqueness constraint that accomplishes this.
<xs:element name="catalog" type="CatalogType">
  <xs:unique name="dateAndProdNumKey">
    <xs:selector xpath="department/product"/>
    <xs:field xpath="number"/>
    <xs:field xpath="@effDate"/>
  </xs:unique>
</xs:element>
Note that this example works because both number and effDate are subordinate to the product elements. Using the instance in Example 17–3, it would be invalid to define a multifield uniqueness constraint on the department number and the product number. If you defined the selector to select all departments, the product/number field would yield more than one field node per selected node, which is not permitted. If you defined the selector to select all products, you would have to access an ancestor node to get the department number, which is not permitted.
You can get around this by defining two uniqueness constraints: one in the scope of catalog to ensure that all department numbers are unique within a catalog, and another in the scope of department to ensure that all product numbers are unique within a department.
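A rough sketch of this two-constraint workaround follows. The constraint names deptNumKey and prodNumInDeptKey are hypothetical, and the sketch assumes that department is declared globally and referenced from CatalogType, so that the constraint can live in the department declaration itself. It is meant for an instance like Example 17–1, where product numbers do not repeat within a department.

```xml
<xs:element name="catalog" type="CatalogType">
  <!-- department numbers must be unique within a catalog -->
  <xs:unique name="deptNumKey">
    <xs:selector xpath="department"/>
    <xs:field xpath="@number"/>
  </xs:unique>
</xs:element>

<!-- assumed to be referenced from CatalogType via ref="department" -->
<xs:element name="department" type="DepartmentType">
  <!-- product numbers must be unique within each department -->
  <xs:unique name="prodNumInDeptKey">
    <xs:selector xpath="product"/>
    <xs:field xpath="number"/>
  </xs:unique>
</xs:element>
```

Each constraint uses a selector and field that stay within its own scope, so neither needs more than one field node per selected node or access to an ancestor.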
A key constraint is similar to a uniqueness constraint in that the combined fields in the key must be unique. Key constraints have an additional requirement that all of the field values must be present in the document. Therefore, you should not define keys on elements or attributes that are optional. In addition, the fields on which the key is defined cannot be nillable.
Key constraints are represented by key elements, whose syntax is shown in Table 17–2. It is identical to that of the unique element.
Example 17–5 changes Example 17–2 to be a key constraint instead of a uniqueness constraint. In this case, every product element in the instance would be required to have a number child, regardless of whether the complex type of product requires it. The values of those number children have to be unique within the scope of catalog.
<xs:element name="catalog" type="CatalogType">
  <xs:key name="prodNumKey">
    <xs:selector xpath="*/product"/>
    <xs:field xpath="number"/>
  </xs:key>
</xs:element>
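The practical difference between the two constraint kinds shows up when a field is missing. The following hypothetical instance assumes that the complex type of product makes number optional. Under the uniqueness constraint of Example 17–2 the second product would simply be ignored by the constraint, whereas under the key constraint of Example 17–5 it would be expected to cause a validation error.

```xml
<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <!-- no number child: tolerated by xs:unique, an error under xs:key -->
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
  </department>
</catalog>
```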
Key references are used to ensure that there is a match between two sets of values in an instance. They are similar to foreign keys in databases. Key references are represented by keyref elements, whose syntax is shown in Table 17–3.
The refer attribute is used to reference a key or uniqueness constraint by its qualified name. If the constraint is defined in a schema document with a target namespace, the refer attribute must reference a name that is either prefixed or in the scope of a default namespace declaration.
Suppose we have an order for three items: two shirts and one hat, as shown in Example 17–6. The two shirts are the same except for their color, so they both have the same product number. All the descriptive product information appears at the end of the order. We want a way to ensure that every item in the order has a corresponding product description in the document.
<order>
  <number>123ABBCC123</number>
  <items>
    <shirt number="557">
      <quantity>1</quantity>
      <color value="blue"/>
    </shirt>
    <shirt number="557">
      <quantity>1</quantity>
      <color value="sage"/>
    </shirt>
    <hat number="563">
      <quantity>1</quantity>
    </hat>
  </items>
  <products>
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <number>563</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
  </products>
</order>
Example 17–7 shows the definition of a key reference and its associated key. In this example, the number attribute of any child of items must match a number child of a product element. The meaning of the XPath syntax will be described in detail later in this chapter.
Note that the key reference field values are not required to be unique; that is not their purpose. It is valid to have duplicate shirt numbers in the items section.
As with key and uniqueness constraints, key references can be on multiple fields. There must be an equal number of fields in the key reference as there are in the key or uniqueness constraint that it references. The fields are matched in the same order, and they must have related types.
<xs:element name="order" type="OrderType">
  <xs:keyref name="prodNumKeyRef" refer="prodNumKey">
    <xs:selector xpath="items/*"/>
    <xs:field xpath="@number"/>
  </xs:keyref>
  <xs:key name="prodNumKey">
    <xs:selector xpath=".//product"/>
    <xs:field xpath="number"/>
  </xs:key>
</xs:element>
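A key reference on multiple fields might be sketched as follows. This is a hypothetical extension of the order example: it assumes that both the children of items and the product elements carry an effDate attribute, which the instance in Example 17–6 does not show, and the constraint names are invented for illustration.

```xml
<xs:element name="order" type="OrderType">
  <xs:keyref name="prodNumAndDateKeyRef" refer="prodNumAndDateKey">
    <xs:selector xpath="items/*"/>
    <!-- same number of fields as the key, in the same order -->
    <xs:field xpath="@number"/>
    <xs:field xpath="@effDate"/>
  </xs:keyref>
  <xs:key name="prodNumAndDateKey">
    <xs:selector xpath=".//product"/>
    <!-- first field pairs with @number above, second with @effDate -->
    <xs:field xpath="number"/>
    <xs:field xpath="@effDate"/>
  </xs:key>
</xs:element>
```

The field names need not be identical on both sides; what matters is the position of each field and that the paired fields have related types.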
There is an additional constraint on the scope of key references and key constraints. The key referenced by a keyref must be defined in the same element declaration or in a declaration of one of its descendants. It is not possible for a keyref to reference a key that is defined in a sibling or ancestor element declaration. In our example, the key and keyref were both defined in the declaration of order. It would also have been valid if the key had been defined in the products declaration. However, it would have been invalid if the keyref had been defined in the items declaration, because items is a child of order, which would place the key in an ancestor declaration.
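The valid alternative mentioned above, with the key moved down into the products declaration, could be sketched like this. The anonymous complex type and the type names ItemsType and ProductsType are assumptions made for illustration; the chapter does not show the definition of OrderType.

```xml
<xs:element name="order">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="items" type="ItemsType"/>
      <xs:element name="products" type="ProductsType">
        <!-- key defined in a descendant of order; selector is now relative to products -->
        <xs:key name="prodNumKey">
          <xs:selector xpath="product"/>
          <xs:field xpath="number"/>
        </xs:key>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <!-- keyref stays on order, so the key it refers to is in a descendant declaration -->
  <xs:keyref name="prodNumKeyRef" refer="prodNumKey">
    <xs:selector xpath="items/*"/>
    <xs:field xpath="@number"/>
  </xs:keyref>
</xs:element>
```

Note that the identity constraints still come last in each element declaration, after the complexType child in the case of order.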
When defining key references, it is important to understand XML Schema’s concept of equality. When determining whether two values are equal, their type is taken into account. Values with unrelated types will never be considered equal. For example, a value 2 of type string is not equal to a value 2 of type integer. However, if two types are related by restriction, such as integer and positiveInteger, they can have equal values. When you define a key reference, make sure that the types of its fields are related to the types of the fields in the referenced key or uniqueness constraint. In Example 17–7, if the number attribute of shirt were declared as an integer and the number child of product were declared as a string, there would have been no matches. For more information on type equality, see Section 11.7 on p. 253.
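As an illustration, declarations along the following lines would allow the key reference in Example 17–7 to find matches. These type definitions are hypothetical, since the chapter does not show the actual types behind the order example; PriceType and ColorType are likewise assumed to be defined elsewhere.

```xml
<!-- hypothetical type for the product elements selected by the key -->
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:positiveInteger"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="price" type="PriceType"/>
  </xs:sequence>
</xs:complexType>

<!-- hypothetical type for the children of items selected by the key reference -->
<xs:complexType name="ItemType">
  <xs:sequence>
    <xs:element name="quantity" type="xs:integer"/>
    <xs:element name="color" type="ColorType" minOccurs="0"/>
  </xs:sequence>
  <!-- xs:positiveInteger is derived from xs:integer by restriction, so 557 can equal 557 -->
  <xs:attribute name="number" type="xs:integer"/>
</xs:complexType>
```

Changing either number declaration to xs:string would leave the schema valid but would cause every key reference lookup to fail.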
All three categories of identity constraints are specified in terms of a selector and one or more fields. This section explains selectors and fields in more detail.
The purpose of a selector is to identify the set of nodes to which the constraint applies. The selector is relative to the scoping element. In Example 17–2, our selector was */product. This selects all the product grandchildren of catalog. There may be other grandchildren of catalog, or other product elements elsewhere in the document, but the constraint does not apply to them.
The selector is represented by a selector element, whose syntax is shown in Table 17–4.
Each field must identify a single node relative to each node selected by the selector. The key reference in Example 17–7 works because there can only ever be one number attribute per selected node. In the instance in Example 17–6, the selector selects three nodes (the three children of items), and there is only one number attribute per node.
You might have been tempted to define a uniqueness constraint as shown in Example 17–8. This would not work because the selector would select one node (the single department element) and there would be three product/number nodes relative to it.
<xs:element name="catalog" type="CatalogType">
  <xs:unique name="prodNumKey">
    <xs:selector xpath="department"/>
    <xs:field xpath="product/number"/>
  </xs:unique>
</xs:element>
The elements or attributes that are used as fields must have simple content and cannot be declared nillable.
Fields are represented by field elements, whose syntax is shown in Table 17–5.
All values of the xpath attribute in the selector and field tags must be legal XPath expressions. However, they must also conform to a subset of XPath that is defined specifically for identity constraints.
XPath expressions are made up of paths, separated by vertical bars. For example, the XPath expression department/product/name | department/product/price uses two paths to select all the nodes that are either name or price children of product elements whose parent is department.
Each path may begin with the .// literal, which means that the matching nodes may appear anywhere among the descendants of the current scoping element. If it is not included, it is assumed that matching nodes may appear only as direct children of the scoping element.
Each path is made up of steps, separated by forward slashes. For example, the path department/product/name is made up of three steps: department, product, and name. Table 17–6 lists the types of steps that may appear in the identity constraint XPath subset.
The context node of the selector expression is the element in whose declaration the identity constraint is defined. The context node of the field expression is the result of evaluating the selector expression.
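The following sketch combines these pieces: a selector built from two paths joined by a vertical bar, each evaluated relative to the scoping catalog element, and a field evaluated relative to each selected product. The discontinued element and the constraint name allProdNumKey are hypothetical and do not appear in the chapter's instances.

```xml
<xs:element name="catalog" type="CatalogType">
  <xs:unique name="allProdNumKey">
    <!-- context node: catalog; selects product under department or under discontinued -->
    <xs:selector xpath="department/product | discontinued/product"/>
    <!-- context node: each selected product -->
    <xs:field xpath="number"/>
  </xs:unique>
</xs:element>
```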
Table 17–7 shows some legal XPath expressions for selectors and fields. They assume that the scope of the identity constraint is the catalog element, as shown in Example 17–3.
Technically, any of the XPath expressions in Table 17–7 is legal for a field. However, since the field XPath can only identify a node that appears once relative to the selected node, most of the expressions that contain wildcards to select multiple nodes are inappropriate for fields. The field XPath will usually consist of a single child element or a single attribute.
Table 17–8 shows some expressions that, while they are legal XPath, are not in the identity constraint XPath subset.
Special consideration must be given to namespaces when defining identity constraints. By default, qualified element names and attribute names used in the XPath expressions must be prefixed in order to be legal. Let’s take another look at our uniqueness constraint from Example 17–4. That definition assumed that the schema document had no target namespace. If we add a target namespace, it looks like Example 17–9.
Each of the element names in the XPath is prefixed with prod, mapping it to the http://datypic.com/prod namespace. In our example, all element declarations (department, product, and number) are global, and therefore their names must be prefixed. Let’s assume that the attribute effDate is locally declared and unqualified, so its name is not prefixed in the XPath expression.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="prod:CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="prod:department/prod:product"/>
      <xs:field xpath="prod:number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <xs:element name="department" type="prod:DepartmentType"/>
  <xs:element name="product" type="prod:ProductType"/>
  <xs:element name="number" type="xs:integer"/>
  <!--...-->
</xs:schema>
The names that must be qualified in an XPath expression are those that must be qualified in an instance, namely:
• All element names and attribute names in global declarations
• Element names and attribute names in local declarations whose form is qualified, either directly, using the form attribute, or indirectly through elementFormDefault or attributeFormDefault
Note that the target namespace is mapped to a prefix, rather than being the default namespace. This is because XPath expressions are not affected by default namespace declarations. Unprefixed names in XPath expressions are assumed to be in no namespace, even if a default namespace declaration is in scope.
Therefore, if you want to use identity constraints in a schema document that has a target namespace, you must map the target namespace to a prefix. Example 17–10 uses unprefixed names in the XPath expressions, assuming that these names take on the default namespace. This is not the case; in fact, these elements will not be found because the processor will be looking for elements with unqualified names when evaluating the XPath expressions.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <xs:element name="department" type="DepartmentType"/>
  <xs:element name="product" type="ProductType"/>
  <xs:element name="number" type="xs:integer"/>
  <!--...-->
</xs:schema>
In version 1.1, this problem is alleviated somewhat because you can specify an xpathDefaultNamespace attribute, which designates the default namespace for all unprefixed element names that are used in the XPath. It does not affect attribute names.
Example 17–11 uses the xpathDefaultNamespace attribute on the schema element. This means that the element names department, product, and number used in the selector and field XPaths are interpreted as being in the http://datypic.com/prod namespace.
Instead of specifying a namespace name, the xpathDefaultNamespace attribute can contain one of three special keywords: ##targetNamespace, ##defaultNamespace, or ##local. These are described in detail in Section 14.1.3.1 on p. 373.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           xpathDefaultNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <xs:element name="department" type="DepartmentType"/>
  <xs:element name="product" type="ProductType"/>
  <xs:element name="number" type="xs:integer"/>
  <!--...-->
</xs:schema>
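The keyword form can be sketched as a small variation on Example 17–11. Here ##targetNamespace stands in for the literal namespace name, so that the XPath default follows the target namespace if it is ever changed. As with the xpathDefaultNamespace attribute in general, this assumes a version 1.1 processor.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           xpathDefaultNamespace="##targetNamespace">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <!-- unprefixed element names resolve to the target namespace -->
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <!-- attribute names are unaffected by xpathDefaultNamespace -->
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <!--...-->
</xs:schema>
```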
In version 1.1, identity constraints can be defined once and referenced from multiple elements. This is true for all three kinds of identity constraints: uniqueness constraints, key constraints, and key references. This is useful if you have the same constraints in multiple scopes and want to reuse the code.
The syntax for referencing an identity constraint is shown in Table 17–9. It is the same for all three kinds of identity constraints. Instead of a name attribute, it has a ref attribute that references the identity constraint by its qualified name. References to identity constraints do not contain selector or field elements; they take their definition from the constraint they reference.
Example 17–12 shows a new element declaration discontinuedProductList that has the same uniqueness constraint as catalog. To indicate this, it contains a unique element, but with a ref attribute instead of a name. Note that the two element declarations specify the same type; this is not a requirement, but it is common since most identity constraints would only be shared among elements that contain a similar structure.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           xpathDefaultNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <xs:element name="discontinuedProductList" type="CatalogType">
    <xs:unique ref="dateAndProdNumKey"/>
  </xs:element>
  <!--...-->
</xs:schema>
Being able to reference identity constraints is also useful when restricting types. In version 1.0, if you used a local element declaration that contained an identity constraint, it was impossible to restrict the complex type that contained it because there was no formal definition of a valid restriction of an identity constraint. Now that it can be named and referenced, there is a formal way of indicating that an identity constraint is the same as the identity constraint in the base type. This is shown in Example 17–13, where the catalog element declaration in the base type has an identity constraint, and the catalog element declaration in the derived type references that identity constraint.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns="http://datypic.com/prod"
           xpathDefaultNamespace="http://datypic.com/prod">
  <xs:complexType name="CatalogListType">
    <xs:sequence>
      <xs:element name="catalog" type="CatalogType"
                  maxOccurs="unbounded">
        <xs:unique name="dateAndProdNumKey">
          <xs:selector xpath="department/product"/>
          <xs:field xpath="number"/>
          <xs:field xpath="@effDate"/>
        </xs:unique>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="RestrictedCatalogListType">
    <xs:complexContent>
      <xs:restriction base="CatalogListType">
        <xs:sequence>
          <xs:element name="catalog" type="CatalogType"
                      maxOccurs="1">
            <xs:unique ref="dateAndProdNumKey"/>
          </xs:element>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <!--...-->
</xs:schema>