Why Use the DOM?

We already agreed that we don't want to reinvent the wheel. So, from a programming perspective, the DOM fits the bill. However, it's not the only language-independent API we can use. There is one other predominant API, called SAX (Simple API for XML, not the musical instrument that John Coltrane and a certain past U.S. president played). SAX, like the DOM, specifies an interface that can be used to process XML documents in a somewhat platform-independent fashion. The choice doesn't matter a lot in practical terms, since it's becoming customary for most XML libraries to support both, but the DOM is an “official” W3C Recommendation and SAX isn't.

How are the DOM and SAX different, and why have I chosen the DOM? The DOM handles a complete document as an object in memory, specifically a tree. SAX, on the other hand, is an event-driven API. It calls a specific method (via callbacks) for each type of XML construct it encounters as it reads in an XML instance document. The most important difference between the two APIs is that the DOM makes it fairly easy to build or manipulate a document in memory. Some SAX implementations offer ways to build documents, but there isn't a standard SAX approach. So, SAX is not as well suited in this respect. On the other hand, SAX is well suited to parsing very large XML documents that might cause performance problems (or even crashes) if we tried to read them into memory using the DOM. Since it's much simpler to use just one API if possible, I'm using the DOM in this book. I'll discuss this topic again in Chapter 12, but the choice of the DOM for this book's utilities will become more understandable as the overall design approach progresses.
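To make the contrast concrete, here is a minimal Java sketch that counts elements in a small document both ways, using the JAXP parsers bundled with the JDK. It is only an illustration, not code from this book's utilities: the class name DomVsSax, its method names, and the sample order document are all hypothetical. The DOM version loads the whole tree and then queries or modifies it. The SAX version never holds the document in memory and just reacts to a callback for each start tag.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; the names are not from the book's utilities.
class DomVsSax {

    // DOM: read the complete document into an in-memory tree.
    static Document parseToTree(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM: query the finished tree for elements with the given tag name.
    static int countWithDom(String xml, String name) {
        return parseToTree(xml).getElementsByTagName(name).getLength();
    }

    // DOM: modifying the tree is easy because it is all in memory.
    // Appends one new element to the root, then counts again.
    static int countAfterAppend(String xml, String name) {
        Document doc = parseToTree(xml);
        doc.getDocumentElement().appendChild(doc.createElement(name));
        return doc.getElementsByTagName(name).getLength();
    }

    // SAX: no tree is built; the parser calls startElement for each
    // start tag as it streams through the input.
    static int countWithSax(String xml, String name) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<order><item/><item/><note/></order>";
        System.out.println(countWithDom(xml, "item"));      // prints 2
        System.out.println(countWithSax(xml, "item"));      // prints 2
        System.out.println(countAfterAppend(xml, "item"));  // prints 3
    }
}
```

Notice that the SAX handler has to carry its own state (the counter) between callbacks, while the DOM code simply asks the tree. That bookkeeping is the price SAX pays for not holding the document in memory.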

I should note, though, that while the current versions of the DOM make it fairly easy to build and manipulate XML documents in memory, through Level 2 the DOM doesn't specify how XML (or HTML) documents are actually read or written. Curious but true. I would have thought those things to be fairly fundamental, but I guess the HTML heritage as well as different priorities in the W3C left those details to the implementers. The draft Level 3 requirements do finally deal with such things. The fact that actually reading or writing XML documents isn't specified in Level 2 really isn't too much of a problem since most XML libraries provide the functionality. However, because it isn't specified there are often differences in the particular methods used. We'll see this as we develop the C++ and Java code later in the book.
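As a rough illustration of what "library-specific" means here, the following Java sketch reads a document into a DOM tree and writes it back out. Everything between the read and the write uses standard DOM calls, but the reading is done with JAXP's DocumentBuilder and the writing with a JAXP identity Transformer. Those are Java platform APIs, not part of the DOM Level 2 Recommendation, and another library would use different calls for the same two jobs. The class name DomReadWrite and its method names are hypothetical, and this is one common approach rather than the one this book's utilities necessarily take.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; the names are not from the book's utilities.
class DomReadWrite {

    // Reading: done through JAXP, not through anything DOM Level 2 defines.
    static Document read(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Writing: an identity transform copies the DOM tree to text.
    // Again this is JAXP, not DOM Level 2.
    static String write(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = read("<a><b>hi</b></a>");
        // The manipulation in the middle is plain, portable DOM.
        doc.getDocumentElement().setAttribute("id", "1");
        System.out.println(write(doc));
    }
}
```

Only the setAttribute line in the middle would carry over unchanged to another DOM implementation. The read and write methods are exactly the parts that tend to need rewriting when you switch libraries.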

One final note on the DOM: While many implementations use the exact W3C method and object names in their libraries, Microsoft's MSXML library sometimes slightly modifies the names specified by the DOM. It also allows many of the DOM object properties (or attributes) to be accessed directly rather than just through get methods. This means that the C++ code in this book may have to be modified if you want to use it with a different C++ DOM library. (At least a global search and replace will be required. Other changes will probably be required as well, but those are beyond the scope of this book.)
