Input Validation in XMLToCSVBasic.java

JAXP performs validation in the DocumentBuilder's parse method. There is no special “validate” method as such. We tell JAXP that we want validation by specifying the type of DocumentBuilder the DocumentBuilderFactory should make. This is kind of like knowing that you want to build a barbecue out of titanium instead of stainless steel and calling the toolmaker to have him make tools to work with the former rather than the latter.

So, what options do we want to set on the DocumentBuilderFactory? Aside from the obvious one of telling it we want to validate, there are a few others. Here's the relevant code added to XMLToCSVBasic.java.

Validation Code in XMLToCSVBasic.java
//  Set up DOM XML environment
DocumentBuilderFactory Factory =
  DocumentBuilderFactory.newInstance();

//  Set the factory to create a Document Builder that
//    is:
//  Namespace aware - necessary for schema validation
Factory.setNamespaceAware(true);
//  Ignores whitespace on Element only nodes
Factory.setIgnoringElementContentWhitespace(true);
//  Ignores comments
Factory.setIgnoringComments(true);
//  Set the schema language - these attributes are
//  specific to Xerces2
Factory.setAttribute(JAXPConstants.JAXP_SCHEMA_LANGUAGE,
          JAXPConstants.W3C_XML_SCHEMA);
//  Validating, if requested
if (boValidate)
{
  Factory.setValidating(true);
}

//  Create the new document builder
DocumentBuilder Builder = Factory.newDocumentBuilder();

We set the options on the Factory after creating a new instance. We first set the Factory to be aware of namespaces. To validate an instance document against a schema, the DocumentBuilder must at least be able to handle the XMLSchema-instance namespace, where the noNamespaceSchemaLocation Attribute lives.

The next two properties don't directly affect validation, but we're interested in them anyway. As we mentioned in Chapter 2, Xerces returns “ignorable whitespace” as Text Nodes unless you specifically tell it not to. We're telling it not to. We're also telling it to ignore comments since we don't care about them.

The setAttribute method on the DocumentBuilderFactory is the next method that directly deals with validation. JAXP provides this mechanism for setting parameters that govern the behavior of the underlying parser. In this case we're telling Xerces to use the W3C XML Schema language as its schema language when doing validation. These constants are defined in the org.apache.xerces.jaxp.JAXPConstants interface, and we have chosen to use them from that source. However, literal values are also accepted. JAXP_SCHEMA_LANGUAGE has a value of:

http://java.sun.com/xml/jaxp/properties/schemaLanguage

W3C_XML_SCHEMA has a value of:

http://www.w3.org/2001/XMLSchema

If we don't set this attribute of the Factory, we get validation against a DTD.
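As a rough sketch of the literal-value route, the following configures a factory without referring to the Xerces constants interface. It assumes only the parser bundled with the JDK, not a separately installed Xerces. The class name SchemaFactorySketch and the helper newBuilder are hypothetical names chosen for illustration; the two constants simply hold the literal values shown above.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical illustration class; not part of XMLToCSVBasic.java.
public class SchemaFactorySketch {

    // Literal values, as given in the text.
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    // Builds a DocumentBuilder configured like the listing above.
    static DocumentBuilder newBuilder(boolean validate) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setIgnoringElementContentWhitespace(true);
        factory.setIgnoringComments(true);
        // Throws IllegalArgumentException if the underlying parser
        // does not recognize the attribute.
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setValidating(validate);
        try {
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            // Wrapped only to keep this sketch's signature simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilder builder = newBuilder(true);
        System.out.println("validating: " + builder.isValidating());
        System.out.println("namespace aware: " + builder.isNamespaceAware());
    }
}
```

Using the literals removes the compile-time dependency on the Xerces package, at the cost of losing the compiler's check that the constant names are spelled correctly.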

The last option we set configures the Factory to build a validating parser via the setValidating method. Note that if we configure a validating parser and an instance document does not specify a schema through either the noNamespaceSchemaLocation or the schemaLocation Attributes, the parser throws an exception.

So, even though there is a bit of configuration that we have to do, input validation in Java with JAXP and Xerces is not very involved. Validation errors are handled in the same fashion as general parsing errors from instance documents that aren't well formed. Both surface as SAX exceptions that we catch and report using the SAXExceptionHandler.
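The error path can be sketched as follows. This is not the SAXExceptionHandler used in XMLToCSVBasic.java, whose code isn't shown here; RethrowingHandler, ValidationErrorSketch, and tryParse are hypothetical names. The sketch assumes the parser bundled with the JDK. In SAX, validation problems are delivered to the ErrorHandler's error method and become exceptions only if the handler throws them, so this handler rethrows.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical illustration class; not part of XMLToCSVBasic.java.
public class ValidationErrorSketch {

    // Rethrows errors so that validation failures and well-formedness
    // failures both reach the caller as SAX exceptions.
    static class RethrowingHandler implements ErrorHandler {
        public void warning(SAXParseException e) {
            System.err.println("Warning: " + e.getMessage());
        }
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
        public void fatalError(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    // Returns null if the document parses cleanly, else the error message.
    static String tryParse(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setAttribute(
                "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new RethrowingHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            // Validation and well-formedness errors both land here.
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            // Wrapped only to keep this sketch's signature simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Not well formed: <b> is never closed.
        System.out.println(tryParse("<a><b></a>", false));
        // Well formed, but validating with no schema specified.
        System.out.println(tryParse("<a/>", true));
        // Well formed, no validation requested.
        System.out.println(tryParse("<a/>", false));
    }
}
```

Because both kinds of failure arrive as SAXException, a single catch block is enough to report them.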
