DTDs validate XML data formats, as do XML schema languages such as XDR and XSD. This section compares DTDs with XDR schemas in detail.
First, look at the DTD shown in Listing A.4.
    <?xml version='1.0' encoding='UTF-8' ?>
    <!ELEMENT Transactions (Transaction+)>
    <!ATTLIST Transactions location CDATA #REQUIRED
                           type CDATA #REQUIRED >
    <!ELEMENT Transaction (Amount , Type , Facility , Location)>
    <!ATTLIST Transaction date CDATA #REQUIRED
                          id CDATA #REQUIRED >
    <!ELEMENT Amount (#PCDATA)*>
    <!ATTLIST Amount currency (usd ) #REQUIRED >
    <!ELEMENT Type (#PCDATA)*>
    <!ELEMENT Facility (#PCDATA)*>
    <!ELEMENT Location (Name , Address)>
    <!ELEMENT Name (#PCDATA)*>
    <!ELEMENT Address (#PCDATA)*>
The first line in the DTD looks like an XML processing instruction; strictly speaking, it is a text declaration. It tells the application that it is dealing with an XML 1.0 DTD, rather than an SGML DTD, and that the characters are encoded in UTF-8, a Unicode encoding built from 8-bit units.
The second line declares the <Transactions> element and dictates that it must contain one or more <Transaction> elements. The plus sign (+) in the second line is the occurrence operator: it determines how many times an element may appear at that point in the content model. The result is easiest to see in a node list, which is an ordered collection of the nodes that meet certain criteria. For instance, the length of the node list for <Transaction> elements in our XML instance is 2, because there are two <Transaction> nodes in the document (on a zero-based scale, for those programmers who will hold me to that, they sit at indexes 0 and 1). If we add another <Transaction> element to the document, the length will be 3.
The third and fourth lines define the attributes that belong to the <Transactions> element. In this case, you have two attributes, named location and type. Both are required, and each consists of CDATA, or Character Data, which means that the XML parser treats the attribute value as plain text rather than as markup (it still expands entity references and normalizes whitespace).
The fifth through the fourteenth lines convey much the same to the XML application as the second, third, and fourth lines. The application, in this case, should expect a <Transaction> element with an <Amount> child node, a <Type> child node, a <Facility> child node, and a <Location> child node, in that order. The <Location> element in turn contains a <Name> and an <Address> child node. Each of the remaining elements contains PCDATA, or Parsed Character Data, which means that the XML parser scans the text for markup and resolves any entity references it finds.
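To make the declarations concrete, here is a minimal sketch of an instance document that should validate against Listing A.4. The file name Transactions.dtd and every attribute and text value are made up for illustration; they are assumptions, not data from the original example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: Listing A.4 is saved as Transactions.dtd next to this file -->
<!DOCTYPE Transactions SYSTEM "Transactions.dtd">
<Transactions location="Seattle" type="checking">
  <!-- Two Transaction children satisfy the (Transaction+) content model -->
  <Transaction date="2001-01-15" id="T1001">
    <Amount currency="usd">125.00</Amount>
    <Type>deposit</Type>
    <Facility>ATM</Facility>
    <Location>
      <Name>Main Street Branch</Name>
      <Address>100 Main Street</Address>
    </Location>
  </Transaction>
  <Transaction date="2001-01-16" id="T1002">
    <Amount currency="usd">40.00</Amount>
    <Type>withdrawal</Type>
    <Facility>Teller</Facility>
    <Location>
      <Name>Harbor Branch</Name>
      <Address>22 Harbor Road</Address>
    </Location>
  </Transaction>
</Transactions>
```

Removing both <Transaction> elements, or omitting a required attribute such as id, should cause a validating parser to reject the document.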
At this point, you should be grasping how a DTD defines the element and attribute names in an XML document, as well as the markup's structure. Now look at Listing A.5, which shows how the same XML is defined using XDR.
    <?xml version = "1.0" encoding = "UTF-8"?>
    <Schema name = "Transaction.xdr"
        xmlns = "urn:schemas-microsoft-com:xml-data"
        xmlns:dt = "urn:schemas-microsoft-com:datatypes">
        <ElementType name = "Transactions" content = "eltOnly" order = "seq" model = "closed">
            <AttributeType name = "location" dt:type = "string" required = "yes"/>
            <AttributeType name = "type" dt:type = "string" required = "yes"/>
            <attribute type = "location"/>
            <attribute type = "type"/>
            <element type = "Transaction" minOccurs = "1" maxOccurs = "*"/>
        </ElementType>
        <ElementType name = "Transaction" content = "eltOnly" order = "seq" model = "closed">
            <AttributeType name = "date" dt:type = "string" required = "yes"/>
            <AttributeType name = "id" dt:type = "string" required = "yes"/>
            <attribute type = "date"/>
            <attribute type = "id"/>
            <element type = "Amount"/>
            <element type = "Type"/>
            <element type = "Facility"/>
            <element type = "Location"/>
        </ElementType>
        <ElementType name = "Amount" content = "textOnly" dt:type = "fixed.14.4" model = "closed">
            <AttributeType name = "currency" dt:type = "string" dt:value = "usd " required = "yes"/>
            <attribute type = "currency"/>
        </ElementType>
        <ElementType name = "Type" order = "many" model = "closed"/>
        <ElementType name = "Facility" order = "many" model = "closed"/>
        <ElementType name = "Location" content = "eltOnly" order = "seq" model = "closed">
            <element type = "Name"/>
            <element type = "Address"/>
        </ElementType>
        <ElementType name = "Name" order = "many" model = "closed"/>
        <ElementType name = "Address" order = "many" model = "closed"/>
    </Schema>
The first thing you might notice is that the XDR version is longer than the DTD and that the XDR version is written using the XML syntax. You will soon see that this extra robustness—and the decision to use XML as the syntax of choice—adds enough benefits to make it worth its heavy verbiage.
The first line of Listing A.5 is equivalent to the first line in the DTD version (only the quoting and spacing differ), and its purpose is similar as well: it identifies the XML version and the character encoding.
The similarity ends at the second line of Listing A.5. Although the XDR schema defines the structure of the XML documents, as the DTD did, the XDR language allows us to be much more specific about the data that will be encapsulated in the XML.
The following code from Listing A.5 represents the document element of the schema and the namespaces that will be used:
    <Schema name = "Transaction.xdr"
        xmlns = "urn:schemas-microsoft-com:xml-data"
        xmlns:dt = "urn:schemas-microsoft-com:datatypes">
Every XML document requires a document element, and in this case, the schema's document element is the <Schema> element. The name attribute is of little importance, but it can help identify the schema to an application. The xmlns attributes are namespace declarations; they tell the processor which vocabulary each element and attribute belongs to, and therefore what to do with it. In this case, the default namespace is identified by a Uniform Resource Name (URN), a unique identifier that the parser associates with the XDR schema language.
Note
It is worth pausing for a moment to discuss what namespaces actually represent. A namespace provides a simple method for qualifying element and attribute names used in XML documents by associating them with namespaces identified by URI references. This means that a namespace is a way to ensure that the element and attribute names you use in your XML do not collide with identically named items from other vocabularies.
In xmlns = "urn:schemas-microsoft-com:xml-data", the default namespace is set to the namespace for XDR. Therefore, each unprefixed element in the schema belongs to the XDR namespace, unless it is explicitly placed in another one. The fourth line of Listing A.5 declares another namespace, this one for the specification that defines how the XML (within the scope of the XDR language) uses data types. As you work with XDR, you will learn that any element or attribute prefixed with dt: belongs to this data type namespace.
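As a minimal sketch of how the two declarations divide the work, the fragment below annotates which namespace each name falls into. The Amount element type is taken from Listing A.5; the comments are an interpretation based on the standard namespace rules, not part of the original listing.

```xml
<Schema name="Transaction.xdr"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- ElementType has no prefix, so it falls in the default (XDR) namespace. -->
  <!-- dt:type carries the dt prefix, so it falls in the datatypes namespace. -->
  <!-- name and content are unprefixed attributes: the default namespace does
       not apply to attributes, so they are interpreted by their element. -->
  <ElementType name="Amount" content="textOnly" dt:type="fixed.14.4"/>
</Schema>
```

The prefix itself is only a local shorthand; what identifies the vocabulary is the URN it is bound to.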
Before diving much deeper into the XDR schema language, you can use Table A.1 as a guide through the rest of this section.
Schema Element | Description |
---|---|
attribute | Refers to a declared attribute type that can appear within the scope of the named ElementType element |
AttributeType | Defines an attribute type for use within the Schema element |
datatype | Specifies the data type for the ElementType or AttributeType element |
description | Provides documentation about an ElementType or AttributeType element |
element | Refers to a declared element type that can appear within the scope of the named ElementType element |
ElementType | Defines an element type for use within the Schema element |
group | Organizes content into a group to specify a sequence |
Schema | Identifies the start of a schema definition |
The following lines of code from Listing A.5 define the Transactions element—that is, the <Transactions> tag.
    <ElementType name = "Transactions" content = "eltOnly" order = "seq" model = "closed">
        <AttributeType name = "location" dt:type = "string" required = "yes"/>
        <AttributeType name = "type" dt:type = "string" required = "yes"/>
        <attribute type = "location"/>
        <attribute type = "type"/>
        <element type = "Transaction" minOccurs = "1" maxOccurs = "*"/>
    </ElementType>
In the line
<ElementType name = "Transactions" content = "eltOnly" order = "seq" model = "closed">
the properties of an XML element are defined by the <ElementType>. The element type is used to describe the specific details of a particular tag in an XML document, which in this case happens to be the <Transactions> tag. The name of the tag is defined by the name attribute. The tag name in this case will be "Transactions", as shown in Listing A.6.
Note
The words “tag” and “element” are used synonymously throughout this appendix.
    <ElementType name="idref"
        content="{empty | textOnly | eltOnly | mixed}"
        dt:type="datatype"
        model="{open | closed}"
        order="{one | seq | many}">
Next, because the content attribute is set to eltOnly, the inclusion of text is disallowed between the opening and closing <Transactions> </Transactions> tags. In other words, according to the content model, putting the words “Here is some text” between the <Transactions> tags is invalid, but adding other elements—for example, the <Transaction> tag—to the element list is completely valid. Table A.2 shows some other properties of the content attribute.
Value | Description |
---|---|
empty | The element cannot contain content. |
textOnly | The element can contain only text, not elements. If the model attribute is set to “open,” the element can contain text and other unnamed elements. |
eltOnly | The element can contain only the specified elements. It cannot contain any free text. |
mixed | The element can contain a mix of named elements and text. |
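A rough sketch of how each content value plays out follows. The element type names Separator, Note, Entry, and Paragraph are hypothetical and do not appear in Listing A.5; each comment shows instance markup that the declaration is meant to accept.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- empty: accepts <Separator/> and nothing between the tags -->
  <ElementType name="Separator" content="empty" model="closed"/>

  <!-- textOnly: accepts <Note>call back Monday</Note> -->
  <ElementType name="Note" content="textOnly" model="closed"/>

  <!-- eltOnly: accepts <Entry><Note>call back Monday</Note></Entry>,
       but no free text directly inside Entry -->
  <ElementType name="Entry" content="eltOnly" model="closed">
    <element type="Note"/>
  </ElementType>

  <!-- mixed: accepts <Paragraph>See <Note>this</Note> first</Paragraph> -->
  <ElementType name="Paragraph" content="mixed" model="closed">
    <element type="Note"/>
  </ElementType>
</Schema>
```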
The order and model attributes are also vital to the classification of the element type. The order attribute specifies how the child elements will be sequenced within the context of this element. For instance, think of HTML. The order is important within the context of the <HTML> element because <HEAD> must go before <BODY>. However, within the context of the <BODY> tag, the order of certain elements is not as pressing. For example, a <DIV> tag might come before a <P> tag, which is followed by another <P> tag. There is no set order in which they must occur under the <BODY> element; in XDR terms, the order is set to “many.” Table A.3 shows possible attribute values for the element <ElementType>.
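The three order values named in the <ElementType> syntax can be sketched side by side. The element types Phone, Email, and the three Contact variants are hypothetical names chosen for illustration, and the comments describe the behavior as it is generally documented for XDR.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="Phone" content="textOnly"/>
  <ElementType name="Email" content="textOnly"/>

  <!-- seq: Phone must appear first, followed by Email -->
  <ElementType name="ContactBoth" content="eltOnly" order="seq">
    <element type="Phone"/>
    <element type="Email"/>
  </ElementType>

  <!-- one: either a Phone or an Email, but not both -->
  <ElementType name="ContactEither" content="eltOnly" order="one">
    <element type="Phone"/>
    <element type="Email"/>
  </ElementType>

  <!-- many: Phone and Email may appear in any order, or not at all -->
  <ElementType name="ContactAny" content="eltOnly" order="many">
    <element type="Phone"/>
    <element type="Email"/>
  </ElementType>
</Schema>
```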
The model attribute controls the extensibility of the document. Remember, we are working with XML here, which means that extensibility is the key. A document can be expanded at the whim of the developer because XDR content models are “open” by default. However, if the model is “closed,” then a developer is not allowed to add elements that the schema does not declare. In the case of our transactions document, a developer would not be allowed to add any undeclared elements under the <Transactions> tag because the model is closed.
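As an illustration of what a closed model rules out, consider the instance below. The <Memo> element and all data values are hypothetical; the point is only that <Memo> is not declared for <Transactions> in Listing A.5.

```xml
<Transactions location="Seattle" type="checking">
  <Transaction date="2001-01-15" id="T1001">
    <Amount currency="usd">125.00</Amount>
    <Type>deposit</Type>
    <Facility>ATM</Facility>
    <Location>
      <Name>Main Street Branch</Name>
      <Address>100 Main Street</Address>
    </Location>
  </Transaction>
  <!-- Not declared in the schema: with model="closed" on Transactions,
       a validating parser should reject this element. -->
  <Memo>Reviewed by branch manager</Memo>
</Transactions>
```

Had the <Transactions> element type been left open, the extra element would be the kind of developer extension the schema tolerates.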
The attribute types of the <Transactions> element are defined in Listing A.7. The <AttributeType> element defines the properties of attributes.
    <AttributeType default="default-value"
        dt:type="primitive-type"
        dt:values="enumerated-values"
        name="idref"
        required="{yes | no}">
In the attribute type, you can identify the default value of the attribute. For example, if you wanted to create a schema for an XML document describing apple trees, you could set the default value for the color attribute to “red.” However, it is important to remember that the default value must be legal for that attribute instance, according to its data type.
The attribute's data type may be specified using dt:type (remember that the dt: namespace prefix corresponds to Microsoft's implementation of XML data types). As you can see in the attribute types for the <Transactions> element, the data types have been set to the type "string". Here, we could set a default value for the attribute to either "bank", or "checking", or whatever else would make sense in light of the application. Note that if you use the "enumeration" data type, then the enumerated values must go into the dt:values attribute, with the default value being listed first.
The final property to remember is the required attribute. Fairly self-explanatory, this attribute specifies whether the attribute must exist in the selected element.
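As a hedged sketch of the enumeration case, the fragment below restates the currency attribute from the DTD in Listing A.4 as an XDR enumeration. The second value, cad, is a hypothetical addition to show a multi-value list; Listing A.5 itself declares currency as a string.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="Amount" content="textOnly" dt:type="fixed.14.4" model="closed">
    <!-- dt:values is a space-separated list of the legal values;
         the default must be one of them -->
    <AttributeType name="currency"
                   dt:type="enumeration"
                   dt:values="usd cad"
                   default="usd"/>
    <attribute type="currency"/>
  </ElementType>
</Schema>
```

With this declaration, <Amount currency="cad"> is acceptable, while a value outside the list should be rejected.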
The following code lines contain the <attribute> element:
    <attribute type = "location"/>
    <attribute type = "type"/>
This element refers to a declared attribute type that can appear within the scope of the named <ElementType> element. In other words, you build an attribute definition with the <AttributeType> and effectively “place” it at one or more locations throughout the schema by using the <attribute> element.
You reference an attribute type within the attribute element by using the type property. For instance
<attribute type = "location"/>
references the attribute type in
<AttributeType name = "location" dt:type = "string" required = "yes"/>
because the value of its type property is "location". In a sense, the <AttributeType> element is the real representation of the actual attribute, and the <attribute> element is the pointer.
There are two other properties for the <attribute> element: the default property and the required property. The default property represents a default value for the attribute that supersedes the default value set in the <AttributeType> element.
The required property simply states whether the attribute is required in the XML document. Keep two things in mind with regard to the required property. First, when required is set to "yes" and the default property specifies a default value, the supplied default must always be the value, and documents containing other attribute values are invalid. Second, when required is set to "yes" and no default is specified, each element whose type is declared to have the attribute must supply its value.
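The two cases can be sketched as follows, assuming the default value on <attribute> is spelled default, as it is in the <AttributeType> syntax. The attribute types are declared at the schema level here so that they can be referenced from more than one element type; that arrangement is a choice made for this example, not something Listing A.5 does.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="currency" dt:type="string"/>
  <AttributeType name="id" dt:type="string"/>

  <ElementType name="Amount" content="textOnly" model="closed">
    <!-- required with a default: the value is effectively fixed at "usd" -->
    <attribute type="currency" default="usd" required="yes"/>
  </ElementType>

  <ElementType name="Transaction" content="eltOnly" model="closed">
    <!-- required without a default: each Transaction must supply its own id -->
    <attribute type="id" required="yes"/>
    <element type="Amount"/>
  </ElementType>
</Schema>
```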
The following code line introduces the <element> element:
<element type = "Transaction" minOccurs = "1" maxOccurs = "*"/>
Much like the <attribute> element you just learned about, the <element> is simply a pointer to the <ElementType> and tells the parser where a particular element needs to be within the XML document. In the preceding code line, the parser knows that the <Transaction> element is required under the <Transactions> node. This is also referred to as being nested within a particular node. The value of the type property is a pointer to the element type. Therefore, when you see
<element type = "Transaction" minOccurs = "1" maxOccurs = "*"/>
it is pointing to the following <ElementType>:
<ElementType name = "Transaction" content = "eltOnly" order = "seq" model = "closed">
So, all you need is one <ElementType> for all your <element> declarations, as long as you want them to inherit their properties from that particular <ElementType>.
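A short sketch of that reuse: Name and Address are each declared once and then referenced from two parents. Location comes from Listing A.5, while Customer is a hypothetical second parent added for illustration.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Each element type is declared once -->
  <ElementType name="Name" content="textOnly" model="closed"/>
  <ElementType name="Address" content="textOnly" model="closed"/>

  <!-- ...and referenced wherever it is needed -->
  <ElementType name="Location" content="eltOnly" order="seq" model="closed">
    <element type="Name"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Customer" content="eltOnly" order="seq" model="closed">
    <element type="Name"/>
    <element type="Address"/>
  </ElementType>
</Schema>
```

A change to the Address declaration then applies to both parents at once.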
Only two other properties belong to the <element>: minOccurs and maxOccurs. The value of minOccurs can be "0" or "1", and the value of maxOccurs can be "1" or "*". The asterisk means that there is no upper limit. This means that an element can be used in a number of different ways, as you can see in Table A.4.
minOccurs : maxOccurs | Description |
---|---|
1 : * | 1 or more times |
0 : * | 0 or more times |
0 : 1 | 0 or 1 time |
1 : 1 | Only 1 time |
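The four combinations in Table A.4 line up with the DTD occurrence operators. The sketch below pairs each one with its DTD counterpart in a comment. Header, Memo, and Summary are hypothetical element types, and Transaction is simplified to text-only content to keep the example short.

```xml
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="Header" content="textOnly"/>
  <ElementType name="Transaction" content="textOnly"/>
  <ElementType name="Memo" content="textOnly"/>
  <ElementType name="Summary" content="textOnly"/>

  <ElementType name="Transactions" content="eltOnly" order="seq" model="closed">
    <!-- only 1 time; DTD equivalent: Header -->
    <element type="Header" minOccurs="1" maxOccurs="1"/>
    <!-- 1 or more times; DTD equivalent: Transaction+ -->
    <element type="Transaction" minOccurs="1" maxOccurs="*"/>
    <!-- 0 or more times; DTD equivalent: Memo* -->
    <element type="Memo" minOccurs="0" maxOccurs="*"/>
    <!-- 0 or 1 time; DTD equivalent: Summary? -->
    <element type="Summary" minOccurs="0" maxOccurs="1"/>
  </ElementType>
</Schema>
```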
If you have been able to stick with me so far, you are going to have a much better understanding of how the BizTalk Editor works because many of the values you fill in on the Editor will correspond directly to the elements that we've just reviewed. However, we have covered only a portion of what a complete BizTalk specification represents. After we discuss a few other variations of XML schema languages, we will move on to a thorough discussion of the BizTalk specification and how it uses XDR.
Note
If you aren't completely satisfied with this introduction to XDR, visit the Microsoft MSDN site at http://msdn.microsoft.com/library/psdk/xmlsdk/xmls5gkl.htm. You will find it helpful in learning more about XML-Data Reduced and other XML-related technologies.
Although we recommend keeping your focus on XDR and the W3C XML Schema language (XSD), we want to share a few other languages with you to illustrate the diverse origins from which schemas have evolved.