One way to validate XML documents at runtime from Python is to use xmlproc together with its callback interfaces and parser API. By implementing both the ErrorHandler and DTDConsumer interfaces, you can capture events about validity errors within the document (via ErrorHandler) and events about the DTD’s structure (via DTDConsumer).
To catch validity errors in the document, you can implement the ErrorHandler interface and provide it to the XMLValidator; both are part of xmlproc. Create the file xpHandlers.py and add the BadOrderErrorHandler class to it, as shown in Example 7-4.

    from xml.parsers.xmlproc.xmlapp import DTDConsumer
    from xml.parsers.xmlproc.xmlapp import ErrorHandler

    """
    BadOrderErrorHandler -- implement xmlproc's ErrorHandler Interface
    """
    class BadOrderErrorHandler(ErrorHandler):
        def warning(self, msg):
            print "Warning received!:", msg

        def error(self, msg):
            print "Error received!: ", msg

        def fatal(self, msg):
            print "Fatal Error received!: ", msg

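Printing each message is fine for a demonstration, but a program usually wants to act on the results. The following is a minimal sketch of a handler that records messages instead of printing them. The class name CollectingErrorHandler and its has_errors method are hypothetical, not part of xmlproc, and the fallback base class is only a stand-in so the sketch can run where xmlproc is not installed; it assumes the real ErrorHandler constructor takes a locator.

```python
# Sketch only: CollectingErrorHandler and has_errors() are hypothetical
# names introduced for illustration; they are not part of xmlproc.
try:
    from xml.parsers.xmlproc.xmlapp import ErrorHandler
except ImportError:
    # Stand-in base class for environments without xmlproc.
    # Assumption: the real ErrorHandler constructor accepts a locator.
    class ErrorHandler(object):
        def __init__(self, locator):
            self.locator = locator


class CollectingErrorHandler(ErrorHandler):
    """Record (severity, message) pairs instead of printing them."""

    def __init__(self, locator):
        ErrorHandler.__init__(self, locator)
        self.messages = []

    def warning(self, msg):
        self.messages.append(("warning", msg))

    def error(self, msg):
        self.messages.append(("error", msg))

    def fatal(self, msg):
        self.messages.append(("fatal", msg))

    def has_errors(self):
        """Return True if anything more serious than a warning was seen."""
        return any(severity != "warning" for severity, _ in self.messages)
```

A script that drives the validator could inspect handler.messages once parsing finishes and, for example, exit with a nonzero status when has_errors() returns true.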
To catch events related to the construction of the DTD itself, you can implement the DTDConsumer interface. To do this, add the DTDHandler class to xpHandlers.py, as shown in Example 7-5.

    """
    DTDHandler -- implements xmlproc's DTDConsumer Interface
    """
    class DTDHandler(DTDConsumer):
        def __init__(self, parser):
            self.parser = parser

        def dtd_start(self):
            print "Starting DTD..."

        def dtd_end(self):
            print "Finished DTD..."

        def new_general_entity(self, name, val):
            print "General Entity Received: ", name

        def new_external_entity(self, ent_name, pub_id, sys_id, ndata):
            print "External Entity Received: ", ent_name

        def new_element_type(self, elem_name, elem_cont):
            print "New Element Type Declaration: ", elem_name, "Content Model: ", elem_cont

        def new_attribute(self, elem, attr, a_type, a_decl, a_def):
            print "New Attribute Declaration: ", attr

Example 7-5 is self-explanatory. Each method represents an event related to the parsing of the DTD. Your methods can capture and utilize this information in any way you see fit.
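As one illustration of putting this information to use, the sketch below turns a content model back into DTD-like notation. Judging by the sample output later in this section, the elem_cont value passed to new_element_type appears to be a (separator, items, modifier) tuple whose items are (name, modifier) pairs; the handling of nested groups and of non-tuple values is an assumption, and format_content_model is a hypothetical helper, not an xmlproc function.

```python
def format_content_model(model):
    """Render a content model tuple as DTD-like text.

    Assumed shape (inferred from the printed output in this section):
    (separator, items, modifier), where each item is either a
    (name, modifier) pair or, by assumption, a nested 3-tuple group.
    """
    if not isinstance(model, tuple):
        # Assumption: anything that is not a tuple is shown as-is.
        return str(model)
    separator, items, modifier = model
    parts = []
    for item in items:
        if len(item) == 3:
            # Nested group: format it recursively.
            parts.append(format_content_model(item))
        else:
            name, item_modifier = item
            parts.append(name + item_modifier)
    return "(" + separator.join(parts) + ")" + modifier
```

Given the tuple reported for the order element, (',', [('customer_name', ''), ('sku', ''), ('qty', ''), ('unit_price', ''), ('product_name', '')], ''), this helper returns (customer_name,sku,qty,unit_price,product_name). A DTDConsumer method could call it from new_element_type to print a more readable declaration.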
Implementing the interfaces is where the real work happens. To put the validator to use, you create an instance, provide it with your handler objects, and set it to work on a particular resource. The file val.py, shown in Example 7-6, contains the small amount of code needed to parse and validate a document.

    """
    xml validation
    """
    import sys

    from xml.parsers.xmlproc import xmlval
    from xpHandlers import BadOrderErrorHandler, DTDHandler

    xv = xmlval.XMLValidator()
    dt = DTDHandler(xv.parser)
    bh = BadOrderErrorHandler(xv.app.locator)

    xv.set_error_handler(bh)
    xv.set_dtd_listener(dt)

    xv.parse_resource(sys.argv[1])

You can use val.py from the command line to see whether XML documents pass muster against their DTDs:

    $ python val.py order.xml
    New Element Type Declaration: customer_name Content Model: ('', [('#PCDATA', '')], '')
    New Element Type Declaration: sku Content Model: ('', [('#PCDATA', '')], '')
    New Element Type Declaration: qty Content Model: ('', [('#PCDATA', '')], '')
    New Element Type Declaration: unit_price Content Model: ('', [('#PCDATA', '')], '')
    New Element Type Declaration: product_name Content Model: ('', [('#PCDATA', '')], '')
    New Element Type Declaration: order Content Model: (',', [('customer_name', ''), ('sku', ''), ('qty', ''), ('unit_price', ''), ('product_name', '')], '')
    Finished DTD...

By supplying xmlproc’s XMLValidator with handlers, you can capture the information related to a document’s validity to suit your needs. In the next section, we put validation to the test by creating a translation and validation example that runs on a web server.