The Fundamentals of XML Technologies

This section covers the fundamentals of XML and how markup technologies will revolutionize the way we build business applications for a long time to come.

XML Background

Let's begin with a short history lesson about where XML began and then discuss where it will take us. A few years back (the late 1960s), some IBM scientists were interested in ways to abstract the data in their documents from their proprietary formatting codes. In other words, they wanted to separate their precious data from the proprietary codes normally dispersed throughout the data in the document. They also wanted a document that could add context to their data—that is, add meaning to the document. More important, they wanted computer programs to be able to interpret a document as well as any human being could.

One way to accomplish this is with the use of metadata. Metadata is data used to describe information or put information into a proper context. For instance, if I wanted you to understand the string “Washington,” it might help if I were to add some context to it. How about if I prefaced it with another string, “George.” That might put things in a better perspective for you, right? If, instead, I added the string “D.C.” to the end of the original string, it would bring about an entirely different connotation. However, the same does not hold true for a computer application. Even though our minds can decipher what is meant when we read the words “George Washington” or “Washington, D.C.,” any computer application would be hard-pressed to make sense of them.

XML has an answer to this problem. XML adds context to data in a standard, ubiquitous way, so that applications can interpret contextual data in the same way that humans were meant to interpret it. For instance, if we wrap the string “George Washington” in XML, it might look something like this:

<President country="US" lineage="1">George Washington</President> 

Now, not only would a human being be able to determine that George Washington was the first President of the United States, but a computer application could also come to the same conclusion using a minimal amount of programming.
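As a minimal sketch of that "minimal amount of programming," the following Python snippet reads the element shown above with the standard library's xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; any XML parser would serve equally well.

```python
import xml.etree.ElementTree as ET

# The element from the text, held in a string for illustration.
doc = '<President country="US" lineage="1">George Washington</President>'

president = ET.fromstring(doc)

# The tag name and attributes carry the context; the text carries the data.
name = president.text
country = president.get("country")
lineage = int(president.get("lineage"))

summary = f"{name} was {president.tag} number {lineage} of {country}"
print(summary)
```

The program never has to guess what “Washington” refers to: the tag name and the two attributes state it explicitly.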

So what does this have to do with BizTalk? Let's think about that for a minute. At the core, BizTalk is a business document router and business process/application integration server. Ask yourself this question: Would BizTalk be of any value to the enterprise if it could process only Microsoft Word documents or SQL Server calls? No, it wouldn't, because not everyone uses Microsoft Word and SQL Server to share and store data. Therefore, Microsoft needed to come up with a way to share and store information using standard, nonproprietary formats that everyone could agree to use. XML is quickly becoming the syntax for defining those formats. The argument goes, if it is in XML format, then an application should be able to interpret and process the data.

But there's more to the story than just semantics. BizTalk also adds value to your enterprise because it does not confine you to proprietary transports, such as COM, DCOM, or IIOP. In fact, to be successful, Microsoft needed to create a server that would provide the underlying plumbing and tools necessary to integrate applications regardless of their platform or how the data is stored. It would need to be able to communicate using standard Internet protocols and data formats as well as proprietary and/or legacy protocols and data formats. This standardization requirement led to the development of the Simple Object Access Protocol, or SOAP. We will discuss SOAP later in this chapter, but for now, it is only important to understand that SOAP was designed as an open, XML-based protocol for exchanging messages over standard Internet transports such as HTTP.

Put another way, a server that could route documents only over proprietary transports such as COM, DCOM, or IIOP would be of as little value as one that could process only proprietary document formats. Beyond transports and data formats, the server would also need to define a standard format for describing documents.

When developing BizTalk Server, Microsoft needed to find a way to aggregate data in a standard format that was both ubiquitous and robust enough to be useful but not proprietary or required. Microsoft couldn't “own” or require the use of the format because then not everyone would use it. Microsoft's answer was XML. XML could be used to describe not only the contents of the document and the data inside it but also routing parameters and data types. But the best thing about it was that everyone was willing to work with XML. Sun, IBM, Microsoft—as unbelievable as it might seem—all agreed on something. That something was (and is) XML.

Microsoft realized that although the movement towards XML seems inevitable to most, it's not going to happen overnight. Most integration solutions need to include access to legacy applications using legacy data formats and legacy protocols. With that in mind, BizTalk utilizes a number of XML-related technologies to help bridge the XML-to-legacy application gap as well as facilitate the entrance into an XML-centric world. If BizTalk were going to be successful, it needed to use the power of XML without requiring the developer to know how to use it. To accommodate this requirement, Microsoft built an engine that is XML-centric and an XML framework to help standardize the way messages are formatted and transported. So Microsoft designed a server that can accept any type of flat file document and translate it to any other flat file document. Internally, it parses the inbound document into its XML representation, applies an XSLT style sheet to transform it into its outbound XML document (a step also known as mapping), and then serializes it into its native format for transport. Microsoft also made sure that the server supports almost all of the standard Internet protocols, including HTTP, HTTPS, and SMTP, as well as proprietary protocols such as COM.
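The parse, map, and serialize steps can be sketched in a few lines of Python. This is only an illustration of the idea, not how BizTalk Server is implemented: the comma-delimited and pipe-delimited layouts, the element names, and the three function names are all hypothetical, and a plain Python function stands in for the XSLT style sheet because the Python standard library does not include an XSLT processor.

```python
import xml.etree.ElementTree as ET

def parse_flat(line):
    """Parse a hypothetical comma-delimited order line into inbound XML."""
    title, quantity = line.split(",")
    order = ET.Element("Order")
    ET.SubElement(order, "Title").text = title.strip()
    ET.SubElement(order, "Quantity").text = quantity.strip()
    return order

def map_order(order):
    """Stand-in for the XSLT map: reshape inbound XML into outbound XML."""
    po = ET.Element("PurchaseOrder", qty=order.findtext("Quantity"))
    ET.SubElement(po, "Item").text = order.findtext("Title")
    return po

def serialize_flat(po):
    """Serialize outbound XML into a hypothetical pipe-delimited layout."""
    return f"{po.findtext('Item')}|{po.get('qty')}"

inbound = "BizTalk Unleashed, 3"
inbound_xml = parse_flat(inbound)       # step 1: flat file -> XML
outbound_xml = map_order(inbound_xml)   # step 2: XML -> XML (the "map")
outbound = serialize_flat(outbound_xml) # step 3: XML -> flat file

print(ET.tostring(inbound_xml, encoding="unicode"))
print(outbound)
```

Because the middle step always works on XML, supporting a new flat file format means writing one parser or one serializer rather than a converter for every pair of formats.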

Note

In later chapters, we will discuss in great detail how BizTalk accomplishes these tasks, but hopefully this gives you a glimpse of the role XML plays in the BizTalk Server environment.


Now that you know a little about how BizTalk the server uses XML, let's talk a little about XML in general. In the perfect XML world, when an application needs to pass data to another application that it knows nothing about, it requires a standard way to do it and a self-describing format so that the application on the other side can be told how to handle it.

The BizTalk Framework is a specification for making a document identify itself to receiving systems that are not fully known at design time, or that cannot be known at all. By enveloping documents in an XML “wrapper” that conforms to the BizTalk Framework spec, you enable disparate BizTalk-compliant systems to process those documents correctly. This works because a BizTalk-compliant application can parse the BizTalk message and pull out the information it needs to process the message.

Listing 2.1 shows a BizTalk Framework 2.0 message and illustrates how this works.

Listing 2.1. BizTalk 2.0 Purchase Order
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
  <SOAP-ENV:Header>
    <eps:endpoints SOAP-ENV:mustUnderstand="1"
        xmlns:eps="http://schemas.biztalk.org/btf-2-0/endpoints"
        xmlns:agr="http://www.trading-agreements.org/types/">
      <eps:to>
        <eps:address xsi:type="agr:organization">Big Books, Inc.</eps:address>
      </eps:to>
      <eps:from>
        <eps:address xsi:type="agr:organization">Book Lovers Book Store</eps:address>
      </eps:from>
    </eps:endpoints>
    <prop:properties SOAP-ENV:mustUnderstand="1"
        xmlns:prop="http://schemas.biztalk.org/btf-2-0/properties">
      <prop:identity>uuid:74b9f5d0-33fb-4a81-b02b-5b760641c1d6</prop:identity>
      <prop:sentAt>2000-05-14T03:00:00+08:00</prop:sentAt>
      <prop:expiresAt>2000-05-15T04:00:00+08:00</prop:expiresAt>
      <prop:topic>http://electrocommerce.org/purchase_order/</prop:topic>
    </prop:properties>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <po:PurchaseOrder xmlns:po="http://electrocommerce.org/purchase_order/">
      <po:Title>BizTalk Unleashed</po:Title>
    </po:PurchaseOrder>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Listing 2.1 describes data being sent to Big Books, Inc. from Book Lovers Book Store. Without getting into too much detail, let's think about the vital data that is being conveyed here.

You need to go all the way down to the po:PurchaseOrder element inside the SOAP-ENV:Body element of Listing 2.1 to really understand what this message is all about. It's a purchase order for the book BizTalk Unleashed. Everything else you see is metadata that could be used by BizTalk Server to figure out where this document needs to go, where it is from, and how it should get there. The purchase order information is the meat, but the metadata contains information that ensures the safe and secure handling of the meat.
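To make the split between payload and metadata concrete, here is a small Python sketch that pulls both out of a trimmed copy of Listing 2.1 (the properties header is left out for brevity). It is not how BizTalk Server itself processes a message; the prefixes in the ns dictionary are local shorthand chosen for this example, and only the namespace URIs have to match the listing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Listing 2.1; the prop:properties header is omitted.
message = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
  <SOAP-ENV:Header>
    <eps:endpoints SOAP-ENV:mustUnderstand="1"
        xmlns:eps="http://schemas.biztalk.org/btf-2-0/endpoints"
        xmlns:agr="http://www.trading-agreements.org/types/">
      <eps:to>
        <eps:address xsi:type="agr:organization">Big Books, Inc.</eps:address>
      </eps:to>
      <eps:from>
        <eps:address xsi:type="agr:organization">Book Lovers Book Store</eps:address>
      </eps:from>
    </eps:endpoints>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <po:PurchaseOrder xmlns:po="http://electrocommerce.org/purchase_order/">
      <po:Title>BizTalk Unleashed</po:Title>
    </po:PurchaseOrder>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Local prefix-to-URI map used only for the lookups below.
ns = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "eps": "http://schemas.biztalk.org/btf-2-0/endpoints",
    "po": "http://electrocommerce.org/purchase_order/",
}

envelope = ET.fromstring(message)

# Metadata: where the document is going and where it came from.
recipient = envelope.findtext(
    "soap:Header/eps:endpoints/eps:to/eps:address", namespaces=ns)
sender = envelope.findtext(
    "soap:Header/eps:endpoints/eps:from/eps:address", namespaces=ns)

# Payload: the business document itself.
title = envelope.findtext(
    "soap:Body/po:PurchaseOrder/po:Title", namespaces=ns)

print(f"{sender} -> {recipient}: {title}")
```

A router needs only the header lookups; the application that fulfils the order needs only the body lookup. Neither has to understand the other's part of the message.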

Now imagine trying to share this information with another company without using XML. How would you do it? Would you send the data in an Excel spreadsheet? If the company doesn't own Microsoft Excel, it won't be able to process the data. How about a simple text document? The problem is that the receiving application would have to be deliberately programmed to understand that particular message layout, and you would still have to get everyone to agree on the layout. Seem familiar? Are you really ready for that kind of administrative nightmare once again? Probably not. Failing that, a human being would have to read the message and then manually enter the data into the application. With BizTalk messaging, the human being is bypassed, which minimizes the time and effort associated with such a transaction, as well as the chance of human error fouling the process. Using XML, you have an open format that everyone has agreed to use to exchange data.

The goal of this chapter is for you to see how XML can be used to convey important information about data in a programmatically accessible fashion, making it perfect for business-to-business (B2B) processing and enterprise application integration (EAI). But what is XML exactly? XML architects prefer to think of XML as more of a family of technologies than as a single entity. For instance, learning XSLT would be useless without a rudimentary understanding of XML because XSLT is most often used to transform XML documents into another format. So, as you will now see, each member of the XML family is firmly tied to one or more of the other members.

XML 1.0

The XML 1.0 specification defines the syntax for defining your own markup language. For instance, SOAP uses XML syntax to define how it will be expressed. HTML uses a similar syntax, but one described by the SGML (Standard Generalized Markup Language) specification, the precursor to XML. SGML is a much larger and more flexible language than XML, but that is exactly why it is not as useful in defining Web-based languages. It is simply too heavy: there are too many rules and options. XML is a slimmed-down subset of SGML that will define application languages for many years to come.

A typical XML document is made up of many nodes. Each node represents an item in the document, such as an element, an attribute, or a piece of text. There are other types of nodes, but for the sake of simplicity, we will deal with only these three. It might help to think of an XML document in terms of a tree structure, much like the directory structure in Windows Explorer. Each folder could be likened to an element, with additional folders (elements) inside. These are known as container or parent elements. Each file inside a folder could be likened to the content at the end of a branch (for example, a text node), and the statistics relating to that file would be regarded as attributes of the file.
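The folder analogy maps directly onto code. The sketch below walks a small, made-up document and lists each element with its attributes and text, indented by depth the way a folder tree would be. The document content, the isbn value, and the describe function are hypothetical; Python's xml.etree.ElementTree is used as one convenient parser among many.

```python
import xml.etree.ElementTree as ET

# A small, made-up document: two container elements and one text-bearing leaf.
doc = """<Library>
  <Book isbn="hypothetical-123">
    <Title>BizTalk Unleashed</Title>
  </Book>
</Library>"""

def describe(element, depth=0):
    """Return one line per element, indented by depth like a folder tree."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} attrs={element.attrib} text={text!r}"]
    for child in element:  # iterating an element yields its child elements
        lines.extend(describe(child, depth + 1))
    return lines

outline = describe(ET.fromstring(doc))
print("\n".join(outline))
```

Library and Book play the role of folders, isbn is an attribute hanging off Book, and the string inside Title is the text at the end of the branch.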

Schemas are used to define the structure of an XML document but are not required by most applications. Some typical schema languages are Document Type Definitions (DTD), XML-Data Reduced (XDR) schemas, and the W3C XML Schema (XSD) language. If a schema defines an XML document, and that document can be verified against the schema, it is said to be “valid.” Validity is generally not required; however, an XML document is required to be well-formed. This is to say that it has a single root element, nests its elements properly so that they form a tree, uses quotes around attribute values, and is sensitive to case, meaning that if you open an element in ALL CAPS, you must close the element in ALL CAPS as well.
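A parser enforces well-formedness for you. As a rough illustration, the following Python sketch feeds three strings to the standard library parser and reports which ones it accepts. The sample strings and the is_well_formed helper are invented for this example, and note that the check covers well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Opened and closed in ALL CAPS, attribute value quoted.
good = '<ORDER id="1"><Title>BizTalk Unleashed</Title></ORDER>'
# Opened in ALL CAPS but closed in lowercase.
bad_case = '<ORDER id="1"><Title>BizTalk Unleashed</Title></order>'
# Attribute value without quotes.
bad_quotes = '<ORDER id=1></ORDER>'

for label, text in [("good", good), ("bad_case", bad_case), ("bad_quotes", bad_quotes)]:
    print(label, is_well_formed(text))
```

Only the first string parses. The other two are rejected before any application logic runs, which is exactly the guarantee that well-formedness gives a receiving application.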

When you parse—or process—an XML document, the purpose is generally to ascertain the value of one of those particular nodes. In addition to this, it is possible that you want to update or delete the value of a particular node in the XML document. Next, you will take a look at some standard ways to parse through an XML document.
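As a quick preview of those three operations, here is a sketch that reads one node, updates another, and deletes a third. The PurchaseOrder document and its Quantity and Note elements are made up for the example, and xml.etree.ElementTree is only one of several parsing approaches.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<PurchaseOrder>"
    "<Title>BizTalk Unleashed</Title>"
    "<Quantity>1</Quantity>"
    "<Note>rush</Note>"
    "</PurchaseOrder>"
)

# Read: ascertain the value of a particular node.
title = order.findtext("Title")

# Update: change the text of an existing node.
order.find("Quantity").text = "2"

# Delete: remove a node from its parent.
order.remove(order.find("Note"))

result = ET.tostring(order, encoding="unicode")
print(title)
print(result)
```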
