We’ll start with the basics of XML Schemas and XML Namespaces. It’s assumed that you already understand how to use basic XML elements and attributes. If you don’t, you should probably read a primer on XML before proceeding. I recommend the book Learning XML by Erik T. Ray (O’Reilly). If you already understand how XML Schema and XML Namespaces work, skip ahead to the section on SOAP.
An XML Schema is similar in purpose to a DTD (Document Type Definition), which validates the structure of an XML document. To illustrate some of the basic concepts of XML Schema, let’s start with an XML document with address information:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<address>
  <street>3243 West 1st Ave.</street>
  <city>Madison</city>
  <state>WI</state>
  <zip>53591</zip>
</address>
To ensure that the XML document contains the proper types of elements and data, the address information must be evaluated for correctness. The correctness of an XML document can be measured in two ways: whether it is well formed and whether it is valid. To be well formed, an XML document must obey the syntactic rules of the XML markup language: it must use proper attribute declarations, the correct characters to denote the start and end of elements, and so on. Most XML parsers based on standards like SAX and DOM detect documents that aren’t well formed automatically.
In addition to being well formed, it’s sometimes important to check that the document uses the right types of elements and attributes in the correct order and structure. A document that meets these criteria is called valid. However, the criteria for validity have nothing to do with XML itself; they have more to do with the application in which the document is used. For example, the Address document would not be valid if it didn’t include the zip or state element. In order to validate an XML document, you need a way to represent these application-specific constraints.
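As a rough illustration of the well-formedness check, the following sketch uses the DOM parser that ships with the JDK. The class name `WellFormedCheck` and the sample strings are made up for this example; the only point is that a standard parser rejects markup that breaks XML's syntactic rules before any schema is involved.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the Titan Cruises example.
class WellFormedCheck {

    /** Returns true if the text parses as XML, that is, if it is well formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Syntax errors such as a mismatched end tag surface as SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<address><city>Madison</city><zip>53591</zip></address>";
        String bad = "<address><city>Madison</zip></address>"; // mismatched end tag
        System.out.println("good is well formed: " + isWellFormed(good));
        System.out.println("bad is well formed:  " + isWellFormed(bad));
    }
}
```

Note that this check says nothing about validity: a well-formed document can still be missing elements the application requires.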
The XML Schema for the Address XML document looks like this:
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:titan="http://www.titan.com/Reservation"
        targetNamespace="http://www.titan.com/Reservation">
  <element name="address" type="titan:AddressType"/>
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>
The first thing to focus on in this XML Schema is the <complexType> element, which declares a type of element in much the same way that a Java class declares a type of object. The <complexType> element explicitly declares the names, types, and order of elements that an AddressType element may contain. In this case, it may contain four elements of type string in the following order: street, city, state, and zip. Validation is pretty strict, so any XML document that claims conformance with this XML Schema must contain exactly the right elements with the right data types, in the correct order.
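To see how strict this is in practice, here is a minimal sketch that validates address documents with the `javax.xml.validation` API from the JDK. It makes one simplifying assumption that differs from the listing above: the schema has no target namespace (and uses an `xsd` prefix for the schema language), so the unprefixed instance documents can be checked directly. The class name `AddressValidation` is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
class AddressValidation {

    // Assumption: a simplified Address schema without a target namespace.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='address' type='AddressType'/>"
      + "<xsd:complexType name='AddressType'><xsd:sequence>"
      + "<xsd:element name='street' type='xsd:string'/>"
      + "<xsd:element name='city' type='xsd:string'/>"
      + "<xsd:element name='state' type='xsd:string'/>"
      + "<xsd:element name='zip' type='xsd:string'/>"
      + "</xsd:sequence></xsd:complexType></xsd:schema>";

    /** Returns true if the document conforms to the schema above. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // With no ErrorHandler set, the first validation error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<address><street>3243 West 1st Ave.</street><city>Madison</city>"
                  + "<state>WI</state><zip>53591</zip></address>";
        String wrongOrder = "<address><city>Madison</city><street>3243 West 1st Ave.</street>"
                  + "<state>WI</state><zip>53591</zip></address>";
        System.out.println("in order:     " + isValid(ok));
        System.out.println("out of order: " + isValid(wrongOrder));
    }
}
```

Both sample documents are well formed; only the first one respects the order that the <sequence> element demands.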
XML Schema defines more than 40 simple data types, called built-in types. Built-in types are a part of the XML Schema language and are automatically supported by any XML Schema-compliant parser. Table 14-1 shows a short list of some of the built-in types. It also shows Java types that correspond to each built-in type. (Table 14-1 presents only a subset of all the XML Schema (XSD) built-in types, but it’s more than enough for this book.)
Table 14-1. XML Schema built-in types and their corresponding Java types
XML Schema built-in type | Java type |
---|---|
byte | byte |
boolean | boolean |
short | short |
int | int |
long | long |
float | float |
double | double |
string | java.lang.String |
dateTime | java.util.Calendar |
integer | java.math.BigInteger |
decimal | java.math.BigDecimal |
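As a small sketch of what this mapping means in code, the following example converts a few lexical values, as they might appear in element content, into the Java types listed in the table. The class name `BuiltInTypeMapping` and the sample values are assumptions for illustration; a web services toolkit normally performs these conversions for you.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Calendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical helper class for illustration only.
class BuiltInTypeMapping {

    /** Converts an xsd:dateTime lexical value into a java.util.Calendar. */
    static Calendar parseDateTime(String lexical) {
        try {
            return DatatypeFactory.newInstance()
                    .newXMLGregorianCalendar(lexical)
                    .toGregorianCalendar();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int cruiseId = Integer.parseInt("123");                    // xsd:int
        double pricePaid = Double.parseDouble("6234.55");          // xsd:double
        BigInteger big = new BigInteger("0394029302894028930");    // xsd:integer
        BigDecimal exact = new BigDecimal("6234.55");              // xsd:decimal
        Calendar when = parseDateTime("2005-09-30T12:00:00");      // xsd:dateTime

        System.out.println(cruiseId + " " + pricePaid + " " + big + " " + exact);
        System.out.println("year of dateTime: " + when.get(Calendar.YEAR));
    }
}
```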
By default, each element declared by a <complexType> must occur exactly once in an XML document, but you can specify that an element is optional or that it may occur more than once by using the occurrence attributes. For example, we can say that the street element must occur at least once but may occur twice:
<complexType name="AddressType">
  <sequence>
    <element name="street" type="string" maxOccurs="2" minOccurs="1" />
    <element name="city" type="string"/>
    <element name="state" type="string"/>
    <element name="zip" type="string"/>
  </sequence>
</complexType>
The maxOccurs and minOccurs attributes both default to 1, indicating that the element must occur exactly once. Setting maxOccurs to "2" allows an XML document to have two street elements or just one. You can also set maxOccurs to "unbounded", which means the element may occur as many times as needed. Setting minOccurs to "0" means the element is optional and can be omitted.
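The effect of the occurrence attributes can be checked with a short experiment. The sketch below assumes the same simplification as before (a schema without a target namespace, written with an `xsd` prefix) and declares street with minOccurs="1" and maxOccurs="2"; the class `OccurrenceValidation` and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
class OccurrenceValidation {

    // Assumption: simplified schema without a target namespace.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='address' type='AddressType'/>"
      + "<xsd:complexType name='AddressType'><xsd:sequence>"
      + "<xsd:element name='street' type='xsd:string' minOccurs='1' maxOccurs='2'/>"
      + "<xsd:element name='city' type='xsd:string'/>"
      + "<xsd:element name='state' type='xsd:string'/>"
      + "<xsd:element name='zip' type='xsd:string'/>"
      + "</xsd:sequence></xsd:complexType></xsd:schema>";

    /** Builds an address document that contains the given number of street elements. */
    static String addressWithStreets(int count) {
        StringBuilder xml = new StringBuilder("<address>");
        for (int i = 0; i < count; i++) {
            xml.append("<street>Line ").append(i + 1).append("</street>");
        }
        return xml.append("<city>Madison</city><state>WI</state><zip>53591</zip></address>")
                  .toString();
    }

    /** Returns true if the document conforms to the schema above. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // too few or too many street elements end up here
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int count = 0; count <= 3; count++) {
            System.out.println(count + " street element(s): " + isValid(addressWithStreets(count)));
        }
    }
}
```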
The <element> declarations are nested under a <sequence> element, which indicates that the elements must occur in the order they are declared. You can also nest the elements under an <all> declaration, which allows the elements to appear in any order. Under <all>, XML Schema 1.0 allows each element to occur at most once, so the street element cannot carry maxOccurs="2" here. The following shows the AddressType declared with an <all> element instead of a <sequence> element:
<complexType name="AddressType">
  <all>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
    <element name="state" type="string"/>
    <element name="zip" type="string"/>
  </all>
</complexType>
In addition to declaring elements of XSD built-in types, you can declare elements based on complex types. This is similar to how Java class types declare fields that are other Java class types. For example, we can define a CustomerType that makes use of the AddressType:
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:titan="http://www.titan.com/Reservation"
        targetNamespace="http://www.titan.com/Reservation">
  <element name="customer" type="titan:CustomerType"/>
  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="titan:AddressType"/>
    </sequence>
  </complexType>
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string" />
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>
This XSD tells us that an element of CustomerType must contain a <last-name> and a <first-name> element of built-in type string, and an element of type AddressType. This is pretty straightforward, except for the titan: prefix on AddressType. That prefix identifies the XML Namespace of the AddressType; we’ll discuss namespaces later in the chapter. For now, just think of it as declaring that the AddressType is a custom type defined by Titan Cruises; it’s not a standard XSD built-in type. An XML document that conforms to the Customer XSD would look like this:
<?xml version='1.0' encoding='UTF-8' ?>
<customer>
  <last-name>Jones</last-name>
  <first-name>Sara</first-name>
  <address>
    <street>3243 West 1st Ave.</street>
    <city>Madison</city>
    <state>WI</state>
    <zip>53591</zip>
  </address>
</customer>
Building on what you’ve learned so far, we can create a Reservation schema, using the CustomerType and the AddressType, and a new CreditCardType:
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:titan="http://www.titan.com/Reservation"
        targetNamespace="http://www.titan.com/Reservation">
  <element name="reservation" type="titan:ReservationType"/>
  <complexType name="ReservationType">
    <sequence>
      <element name="customer" type="titan:CustomerType"/>
      <element name="cruise-id" type="int"/>
      <element name="cabin-id" type="int"/>
      <element name="price-paid" type="double"/>
    </sequence>
  </complexType>
  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="titan:AddressType"/>
      <element name="credit-card" type="titan:CreditCardType"/>
    </sequence>
  </complexType>
  <complexType name="CreditCardType">
    <sequence>
      <element name="exp-date" type="dateTime"/>
      <element name="number" type="string"/>
      <element name="name" type="string"/>
      <element name="organization" type="string"/>
    </sequence>
  </complexType>
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>
An XML document that conforms to the Reservation XSD would include information describing the customer (name and address), credit card information, and the identity of the cruise and cabin that is being reserved. This document might be sent to Titan Cruises from a travel agency that cannot access the TravelAgent EJB to make reservations. Here’s an XML document that conforms to the Reservation XSD:
<?xml version='1.0' encoding='UTF-8' ?>
<reservation>
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>
    <address>
      <street>3243 West 1st Ave.</street>
      <city>Madison</city>
      <state>WI</state>
      <zip>53591</zip>
    </address>
    <credit-card>
      <exp-date>09-2005</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>
At runtime, the XML parser compares the document to its Schema, ensuring that the document conforms to the rules set down by the Schema. If the document doesn’t adhere to the Schema, it is considered invalid, and the parser produces error messages. An XML Schema checks that XML documents received by your system are properly structured, so you won’t encounter errors while parsing the documents and extracting the data. For example, if someone sent your application a Reservation document that omitted the credit-card element, the XML parser could reject the document as invalid before your code even sees it: you don’t have to worry about errors in your code caused by missing information in the document.
This brief overview represents only the tip of the iceberg. XML Schema is a very rich XML typing system and can only be given sufficient attention in a text dedicated to the subject. For an in-depth and insightful coverage of XML Schema, read XML Schema: The W3C’s Object-Oriented Descriptions for XML by Eric van der Vlist (O’Reilly) or read the XML Schema specification, starting with the primer at the W3C (World Wide Web Consortium) web site (http://www.w3.org/TR/xmlschema-0/).
The Reservation schema defines an XML markup language that describes the structure of a specific kind of XML document. Just as a class is a type of Java object, an XML markup language, defined by an XML Schema, is a type of XML document. In some cases, it’s convenient to combine two or more XML markup languages into a single document, so that the elements from each markup language can be validated separately using different XML Schemas. This is especially useful when you want to reuse a markup language in many different contexts. For example, the AddressType defined in the previous section is useful in a variety of contexts, not just the Reservation XSD, so it could be defined as a separate markup language in its own XML Schema.
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.titan.com/Address">
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>
In order to use different markup languages in the same XML document, you must clearly identify the markup language to which each element belongs. Here is an XML document for a reservation, but this time we are using XML Namespaces to separate the address information from the reservation information:
<?xml version='1.0' encoding='UTF-8' ?>
<res:reservation xmlns:res="http://www.titan.com/Reservation">
  <res:customer>
    <res:last-name>Jones</res:last-name>
    <res:first-name>Sara</res:first-name>
    <addr:address xmlns:addr="http://www.titan.com/Address">
      <addr:street>3243 West 1st Ave.</addr:street>
      <addr:city>Madison</addr:city>
      <addr:state>WI</addr:state>
      <addr:zip>53591</addr:zip>
    </addr:address>
    <res:credit-card>
      <res:exp-date>09-2005</res:exp-date>
      <res:number>0394029302894028930</res:number>
      <res:name>Sara Jones</res:name>
      <res:organization>VISA</res:organization>
    </res:credit-card>
  </res:customer>
  <res:cruise-id>123</res:cruise-id>
  <res:cabin-id>333</res:cabin-id>
  <res:price-paid>6234.55</res:price-paid>
</res:reservation>
All the elements for the address information are prefixed with the characters addr:, and all the reservation elements are prefixed with res:. These prefixes allow parsers to identify and separate the elements that belong to the Address markup from those that belong to the Reservation markup. As a result, the address elements can be validated against the Address XSD while the reservation elements are validated against the Reservation XSD. The prefixes are assigned using XML Namespace declarations, which are the xmlns:res and xmlns:addr attributes in the previous listing. An XML Namespace declaration follows this format:
xmlns:prefix="URI"
The prefix can be anything you like, as long as it does not include blanks or any special characters. We use prefixes that are abbreviations for the name of the markup language: res stands for Reservation XSD and addr stands for Address XSD. This is the convention that most XML documents follow, but it’s not a requirement; you could use prefixes like foo or bar or anything else you fancy.
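To illustrate that the prefix is only a local alias, the following sketch parses the same element twice with different prefixes using the JDK's namespace-aware DOM parser. The class `PrefixDemo` and its methods are hypothetical names chosen for this example. What identifies the element is the pair of namespace URI and local name, not the prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class PrefixDemo {

    /** Parses the text with namespace processing enabled and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in the DOM factory
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Two elements have the same name if namespace URI and local name both match. */
    static boolean sameName(Element a, Element b) {
        return a.getNamespaceURI().equals(b.getNamespaceURI())
            && a.getLocalName().equals(b.getLocalName());
    }

    public static void main(String[] args) {
        Element addr = parseRoot("<addr:address xmlns:addr='http://www.titan.com/Address'/>");
        Element foo = parseRoot("<foo:address xmlns:foo='http://www.titan.com/Address'/>");
        System.out.println("prefixes: " + addr.getPrefix() + ", " + foo.getPrefix());
        System.out.println("same element name: " + sameName(addr, foo));
    }
}
```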
While the prefix can be any arbitrary token, the URI must be very specific. A URI (Uniform Resource Identifier) is an identifier that is a superset of the URL (Uniform Resource Locator) that you use every day to look up web pages. In most cases, people use the stricter URL format for XML Namespaces because URLs are familiar and easy to understand. The URI used in the XML Namespace declaration identifies the exact markup language that is employed. It doesn’t have to point at a web page or an XML document; it just needs to be unique to that markup language. For example, the XML Namespace used by the Address markup is different from the one used by the Reservation markup.
xmlns:addr="http://www.titan.com/Address"
xmlns:res="http://www.titan.com/Reservation"
The URI in the XML Namespace declaration should match the target namespace declared by an XML Schema. Here is the Address XSD again; note its targetNamespace attribute. The URL value of the targetNamespace attribute is identical to the URL assigned to the addr: prefix in the reservation document, shown earlier.
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.titan.com/Address">
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>
The targetNamespace attribute identifies the unique URI of the markup language; it is the permanent identifier for that XML Schema. Whenever elements from the Address XSD are used in some other document, the document must use an XML Namespace declaration to identify those elements as belonging to the Address markup language.
Prefixing every element in an XML document with its namespace identifier is a bit tedious, so the XML Namespaces specification allows you to declare a default namespace that applies to all elements that are not prefixed. The default namespace is simply an XML Namespace declaration that has no prefix (xmlns="URL"). For example, we can use a default namespace in the reservation document for all Reservation elements:
<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation">
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>
    <addr:address xmlns:addr="http://www.titan.com/Address">
      <addr:street>3243 West 1st Ave.</addr:street>
      <addr:city>Madison</addr:city>
      <addr:state>WI</addr:state>
      <addr:zip>53591</addr:zip>
    </addr:address>
    <credit-card>
      <exp-date>09-2005</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>
None of the Reservation element names is prefixed. Any nonprefixed element belongs to the default namespace. The Address elements do not belong to the Reservation namespace, so they are prefixed to indicate which namespace they belong to. The default namespace declaration has scope; in other words, it applies to the element in which it is declared (if that element has no namespace prefix), and to all nonprefixed elements nested under that element. We can use the scoping rules of namespaces to further simplify the Reservation document by allowing the Address elements to override the default namespace with their own default namespace.
<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation">
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>
    <address xmlns="http://www.titan.com/Address">
      <street>3243 West 1st Ave.</street>
      <city>Madison</city>
      <state>WI</state>
      <zip>53591</zip>
    </address>
    <credit-card>
      <exp-date>09-2005</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>
The Reservation default namespace applies to the <reservation> element and all of its children except for the Address elements. The <address> element and its children have defined their own default namespace, which overrides the default namespace of the <reservation> element.
Default namespaces do not apply to attributes: an unprefixed attribute belongs to no namespace at all, regardless of the default namespace in scope. As a result, any attribute that is defined in a namespace must be prefixed with a namespace identifier. The xmlns attribute, which establishes an XML Namespace declaration, doesn’t need to be prefixed because it is reserved by the XML Namespaces specification itself.
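The scoping rules for default namespaces, and the fact that they skip attributes, can be observed with a namespace-aware DOM parser. In the sketch below, the class `DefaultNamespaceDemo`, the trimmed-down sample document, and the `id` attribute on <customer> are all assumptions added for illustration; they are not part of the Titan Cruises schemas.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class DefaultNamespaceDemo {

    // Shortened reservation document; the id attribute is invented for the demo.
    static final String SAMPLE =
        "<reservation xmlns='http://www.titan.com/Reservation'>"
      + "<customer id='42'>"
      + "<last-name>Jones</last-name>"
      + "<address xmlns='http://www.titan.com/Address'>"
      + "<street>3243 West 1st Ave.</street>"
      + "</address>"
      + "<credit-card><name>Sara Jones</name></credit-card>"
      + "</customer>"
      + "</reservation>";

    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(SAMPLE)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Namespace URI of the first element with the given local name, in any namespace. */
    static String namespaceOf(String localName) {
        Element element = (Element) parse().getElementsByTagNameNS("*", localName).item(0);
        return element.getNamespaceURI();
    }

    /** Namespace URI of an unprefixed attribute; null means "no namespace". */
    static String attributeNamespace(String elementLocalName, String attributeName) {
        Element element = (Element) parse().getElementsByTagNameNS("*", elementLocalName).item(0);
        return element.getAttributeNode(attributeName).getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println("customer    -> " + namespaceOf("customer"));
        System.out.println("street      -> " + namespaceOf("street"));
        System.out.println("credit-card -> " + namespaceOf("credit-card"));
        System.out.println("id attr     -> " + attributeNamespace("customer", "id"));
    }
}
```

The <credit-card> element follows <address> in the document but lies outside it, so it falls back to the Reservation default namespace.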
An XML Namespace is just a URI that uniquely identifies a markup language; it does not have to point at an actual resource. In other words, you don’t normally use the URI of an XML Namespace to look something up. It’s usually just an identifier. However, you might want to indicate the location of the XML Schema associated with an XML Namespace so that a parser can load it and use it in validation. This is accomplished using the schemaLocation attribute:
<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.titan.com/Reservation
                                 http://www.titan.com/schemas/reservation.xsd">
  <customer>
    <last-name>Jones</last-name>
    <first-name>Sara</first-name>
    <address xmlns="http://www.titan.com/Address"
             xsi:schemaLocation="http://www.titan.com/Address
                                 http://www.titan.com/schemas/address.xsd">
      <street>3243 West 1st Ave.</street>
      <city>Madison</city>
      <state>WI</state>
      <zip>53591</zip>
    </address>
    <credit-card>
      <exp-date>09-2005</exp-date>
      <number>0394029302894028930</number>
      <name>Sara Jones</name>
      <organization>VISA</organization>
    </credit-card>
  </customer>
  <cruise-id>123</cruise-id>
  <cabin-id>333</cabin-id>
  <price-paid>6234.55</price-paid>
</reservation>
The schemaLocation attribute provides a list of values as Namespace-Location value pairs. The first value is the URI of the XML Namespace; the second is the physical location (URL) of the XML Schema. The following schemaLocation attribute states that all elements belonging to the Reservation Namespace (http://www.titan.com/Reservation) can be validated against an XML Schema located at the URL http://www.titan.com/schemas/reservation.xsd:
xsi:schemaLocation="http://www.titan.com/Reservation http://www.titan.com/schemas/reservation.xsd">
The schemaLocation attribute is not a part of the XML language, so we’ll actually need to prefix it with the appropriate namespace in order to use it. The XML Schema specification defines a special namespace that can be used for schemaLocation (as well as other attributes). That namespace is http://www.w3.org/2001/XMLSchema-instance. Namespace URIs are case-sensitive, so the lowercase "instance" matters. In order to properly declare the schemaLocation attribute, declare its XML namespace and prefix it with the identifier for that namespace as shown in the following snippet:
<?xml version='1.0' encoding='UTF-8' ?>
<reservation xmlns="http://www.titan.com/Reservation"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.titan.com/Reservation http://www.titan.com/schemas/reservation.xsd">
A namespace declaration only needs to be defined once; it applies to all elements nested under the element in which it’s declared. The convention is to use the prefix xsi for the XML Schema Instance namespace (http://www.w3.org/2001/XMLSchema-instance).
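As a sketch of how the Namespace-Location pairs might be read by a program, the following example pulls the xsi:schemaLocation attribute off the root element and splits it into pairs. The class `SchemaLocationDemo` is hypothetical, the sample document is shortened, and the code only inspects the root element; a validating parser handles all of this internally.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class SchemaLocationDemo {

    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Shortened reservation document carrying one Namespace-Location pair.
    static final String SAMPLE =
        "<reservation xmlns='http://www.titan.com/Reservation'"
      + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
      + " xsi:schemaLocation='http://www.titan.com/Reservation"
      + " http://www.titan.com/schemas/reservation.xsd'/>";

    /** Returns the namespace-to-location pairs declared on the root element. */
    static Map<String, String> schemaLocations(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        // getAttributeNS returns an empty string when the attribute is absent.
        String value = root.getAttributeNS(XSI_NS, "schemaLocation").trim();
        if (value.isEmpty()) {
            return pairs;
        }
        String[] tokens = value.split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]); // namespace URI, then schema URL
        }
        return pairs;
    }

    public static void main(String[] args) {
        schemaLocations(SAMPLE).forEach((namespace, location) ->
            System.out.println(namespace + " is described by " + location));
    }
}
```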
XML Schemas also use XML Namespaces. Let’s look at the XML Schema for the Address markup language with a new focus on the use of XML Namespaces:
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.titan.com/Address"
        xmlns:addr="http://www.titan.com/Address">
  <element name="address" type="addr:AddressType"/>
  <complexType name="AddressType">
    <sequence>
      <element name="street" type="string"/>
      <element name="city" type="string"/>
      <element name="state" type="string"/>
      <element name="zip" type="string"/>
    </sequence>
  </complexType>
</schema>
In this file, namespaces are used in three separate declarations. The first namespace declaration states that the default namespace is http://www.w3.org/2001/XMLSchema, which is the namespace of the XML Schema specification. This declaration makes it easier to read the XSD because most of the elements do not need to be prefixed. The second declaration states that the target namespace of the XML Schema is the namespace of the Address markup. This tells us that all the types and elements defined in this XSD belong to that namespace. Finally, the third namespace declaration assigns the prefix addr to the target namespace so that types can be referenced exactly. For example, the top-level <element> definition uses the name addr:AddressType to say that the element is of type AddressType, belonging to the namespace http://www.titan.com/Address.
Why do you have to declare a prefix for the target namespace? The reason is clearer when you examine the Reservation XSD:
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:addr="http://www.titan.com/Address"
        xmlns:res="http://www.titan.com/Reservation"
        targetNamespace="http://www.titan.com/Reservation">
  <import namespace="http://www.titan.com/Address"
          schemaLocation="http://www.titan.com/Address.xsd" />
  <element name="reservation" type="res:ReservationType"/>
  <complexType name="ReservationType">
    <sequence>
      <element name="customer" type="res:CustomerType"/>
      <element name="cruise-id" type="int"/>
      <element name="cabin-id" type="int"/>
      <element name="price-paid" type="double"/>
    </sequence>
  </complexType>
  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="addr:AddressType"/>
      <element name="credit-card" type="res:CreditCardType"/>
    </sequence>
  </complexType>
  <complexType name="CreditCardType">
    <sequence>
      <element name="exp-date" type="dateTime"/>
      <element name="number" type="string"/>
      <element name="name" type="string"/>
      <element name="organization" type="string"/>
    </sequence>
  </complexType>
</schema>
The Reservation XSD imports the Address XSD so that the AddressType can be used to define the CustomerType. You can see the use of namespaces in the definition of the CustomerType, which references types from both the Reservation and Address namespaces, prefixed by addr and res:
<?xml version='1.0' encoding='UTF-8' ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:addr="http://www.titan.com/Address"
        xmlns:res="http://www.titan.com/Reservation"
        targetNamespace="http://www.titan.com/Reservation">
  ...
  <complexType name="CustomerType">
    <sequence>
      <element name="last-name" type="string"/>
      <element name="first-name" type="string"/>
      <element name="address" type="addr:AddressType"/>
      <element name="credit-card" type="res:CreditCardType"/>
    </sequence>
  </complexType>
Assigning a prefix to the Reservation namespace allowed us to distinguish between elements that are defined as Reservation types (e.g., credit-card) and elements that are defined as Address types (e.g., address). The built-in types string and int referenced by the type attributes belong to the XML Schema namespace, which is the default namespace here, so we don’t need to prefix them. We could, though, for clarity. That is, after declaring xmlns:xsd="http://www.w3.org/2001/XMLSchema", we’d replace string and int with xsd:string and xsd:int. The prefix xsd references the XML Schema namespace; using it allows us to identify built-in types defined by XML Schema more clearly. It’s not a problem that the default namespace is the same as the namespace prefixed by xsd. By convention, the xsd prefix is the one used in most XML schemas.