1.7. Testing XML

Quality control is an important feature of XML. If XML is to be a universal language, working the same way everywhere and every time, the standards for data integrity have to be high. Writing an XML document from start to finish without making any mistakes in markup syntax is just about impossible, and any markup error can trip up an XML processor and lead to unpredictable results. Fortunately, there are tools available to test and diagnose problems in your document.

The first level of error checking determines whether a document is well-formed. Documents that fail this test usually have simple problems such as a misspelled tag or missing delimiting character. A well-formedness checker, or parser, is a program that sniffs out such mistakes and tells you in which file and at what line number they occur. When editing an XML document, use a well-formedness checker to make sure you haven't left behind any broken markup; then, if the parser finds errors, go back, fix them, and test again.
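As a rough illustration of the edit-check-fix cycle described above, the following Python sketch uses the standard library's xml.etree.ElementTree module as a well-formedness checker. The function name check_well_formed and the sample documents are assumptions made up for this example; the misspelled end tag in the second document is the kind of simple problem such a checker reports.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) where parsing stopped.
        line, column = err.position
        return (line, column, str(err))
    return None


# Hypothetical sample documents, invented for illustration.
good = "<essay><title>On Markup</title><para>Hello.</para></essay>"
bad = "<essay><title>On Markup</titel><para>Hello.</para></essay>"  # misspelled end tag

print(check_well_formed(good))  # None: no syntax problems found
print(check_well_formed(bad))   # a tuple locating the mismatched tag
```

This parser checks syntax only. It does not read a document model, so it says nothing about whether the right elements are present.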

Of course, well-formedness checking can't catch mistakes like forgetting the cast list for a play or omitting your name on an essay you've written. Those aren't syntactic mistakes, but rather contextual ones. Consequently, your well-formedness checker will tell you the document is well-formed, and you won't know your mistake until it's too late.

The solution is to use a document model validator, or validating parser. A validating parser goes beyond well-formedness checkers to find mistakes you might not catch, such as missing elements or improper order of elements. As mentioned earlier, a document model is a description of how a document should be structured: which elements must be included, what the elements can contain, and in what order they occur. When used to test documents for contextual mistakes, the validating parser becomes a powerful quality-control tool.
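To make the idea of a contextual check concrete, here is a minimal Python sketch that tests a well-formed document against a tiny hand-written rule table. It is not a validating parser, and a real one would read the rules from a DTD or schema instead. The element names (play, title, castlist, act), the REQUIRED_CHILDREN table, and the check_structure function are all hypothetical, chosen to echo the missing cast list example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a document model: each element name maps to the
# child elements it must contain, listed in the order they should appear.
REQUIRED_CHILDREN = {
    "play": ["title", "castlist", "act"],
}


def check_structure(xml_text, model=REQUIRED_CHILDREN):
    """Return a list of contextual problems found in a well-formed document."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    for element in root.iter():
        required = model.get(element.tag)
        if required is None:
            continue
        # Tags of the children that the model mentions, in document order.
        present = [child.tag for child in element if child.tag in required]
        for name in required:
            if name not in present:
                problems.append(f"<{element.tag}> is missing required <{name}>")
        # Compare the order of first appearance with the order in the model.
        first_seen = list(dict.fromkeys(present))
        expected = [name for name in required if name in first_seen]
        if first_seen != expected:
            problems.append(f"<{element.tag}> has children in the wrong order")
    return problems


complete = "<play><title>T</title><castlist/><act/></play>"
no_cast = "<play><title>T</title><act/></play>"  # well-formed, but no cast list

print(check_structure(complete))
print(check_structure(no_cast))
```

Both sample documents are well-formed, so a well-formedness checker accepts them equally. Only the structural check notices that the second one lacks its cast list.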

The following listing shows an example of the output from a validating parser after it has found several mistakes in a document:

% nsgmls -sv /usr/local/sp/pubtext/xml.dcl book.xml
/usr/local/prod/bin/nsgmls:I: SP version "1.3.3"
/usr/local/prod/bin/nsgmls:ch01.xml:54:13:E: document type does not allow element "itemizedlist" here
/usr/local/prod/bin/nsgmls:ch01.xml:57:0:W: character "<" is the first character of a delimiter but occurred as data
/usr/local/prod/bin/nsgmls:ch01.xml:57:0:E: character data is not allowed here

The first error message complains that an <itemizedlist> (a bulleted list) appears where it shouldn't (in this case, inside a paragraph). This is an example of a contextual error that a well-formedness checker would not report. The next two messages, a warning and an error reported at the same position, indicate that a special markup character (<) was found among content characters instead of in a markup tag. This is a syntactic error that a well-formedness checker would find, too.

Most of the best validating parsers are free, so you can't go wrong. For a comparison of the most popular parsers, read Michael Classen's excellent article (http://webreference.com/xml/column22). A few common validating parsers are described here:


Xerces

Produced by the Apache XML Project (the same folks who brought you the Apache web server), Xerces is a validating parser with both Java and C++ versions. It supports DTDs as well as the newer XML Schema standard for document models.


nsgmls

Created by the prolific developer James Clark, nsgmls is a freeware validating parser that is fast and multi-featured. Originally written for SGML document parsing, it is also compatible with XML.


XML4J and XML4C

Developed by IBM's alphaWorks R&D Labs, these are powerful validating parsers that are written in Java and C++, respectively.
