4.4. Implementing XML-Based Applications

You must make some decisions when formulating the implementation of an XML-based application. Briefly, you have to choose the XML programming models for your application. Note that multiple programming models are available and these models may be relevant for different situations—models may be complementary or even competing. As such, your application may use different models, sometimes even in conjunction with one another. That is, you may have to combine programming models in what may be called XML processing pipelines.

You may also have to consider and address other issues. For example, you may have to determine how to resolve external entity references in a uniform way across your application regardless of the programming models used.

4.4.1. Choosing an XML Processing Programming Model

A J2EE developer has the choice of four main XML processing models, available through the following APIs:

  1. Simple API for XML Parsing (SAX), which provides an event-based programming model

  2. Document Object Model (DOM), which provides an in-memory tree-traversal programming model

  3. XML data-binding, which provides an in-memory Java content class-bound programming model

  4. eXtensible Stylesheet Language Transformations (XSLT), which provides a template-based programming model

The most common processing models are SAX and DOM. These two models along with XSLT are available through the JAXP APIs. (See “Java APIs for XML Processing” on page 41.) The XML data binding model is available through the JAXB technology. (See “Emerging Standards” on page 40.)

Processing an XML document falls into two categories: parsing a source XML document so that its content is available in some form for an application to process, and writing or producing an XML document from content generated by an application. Parsing an XML representation into an equivalent data structure usable by an application is often called deserialization, or unmarshalling. Similarly, writing a data structure to an equivalent XML representation is often called serialization, or marshalling. Some processing models support both types of processing, but others, such as SAX, do not.

Just as you would avoid manually parsing XML documents, you should avoid manually constructing XML documents. It is better to rely on higher-level, reliable APIs (such as DOM, and DOM-like APIs, or JAXB technology) to construct XML documents, because these APIs enforce the construction of well-formed documents. In some instances, these APIs also allow you to validate constructed XML documents.

Now let's take a closer look at these XML processing APIs.

4.4.1.1. SAX Programming Model

When you use SAX to process an XML document, you have to implement event handlers to handle events generated by the parser when it encounters the various tokens of the markup language. Because a SAX parser generates a transient flow of these events, it is advisable to process the source document in the following fashion: intercept the relevant types of events generated by the parser, use the information passed as parameters of those events to identify the relevant information to extract from the source document, and, once the information is extracted, process it with the application logic.
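To make this concrete, here is a minimal sketch of a SAX content handler that extracts a single element value; the OrderId element name and document shape are illustrative assumptions, not from any particular schema:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OrderIdHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inOrderId = false;
    private String orderId;

    // Intercept the start of the element of interest
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if ("OrderId".equals(qName)) {
            inOrderId = true;
            buffer.setLength(0);
        }
    }
    // Accumulate character data while inside the element
    public void characters(char[] ch, int start, int length) {
        if (inOrderId) buffer.append(ch, start, length);
    }
    // Consolidate the extracted information at the end of the element
    public void endElement(String uri, String localName, String qName) {
        if ("OrderId".equals(qName)) {
            orderId = buffer.toString();
            inOrderId = false;
        }
    }
    public String getOrderId() { return orderId; }

    public static void main(String[] args) throws Exception {
        String xml = "<purchaseOrder><OrderId>PO-42</OrderId></purchaseOrder>";
        OrderIdHandler handler = new OrderIdHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        System.out.println(handler.getOrderId());
    }
}
```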

Typically, with SAX processing, an application may have to maintain some context so that it can logically aggregate or consolidate information from the flow of events. Such consolidation is often done before invoking or applying the application's logic. The developer has two choices when using SAX processing:

  1. The application can “on the fly” invoke the business logic on the extracted information. That is, the logic is invoked as soon as the information is extracted or after only a minimal consolidation. With this approach, referred to as stream processing, the document can be processed in one step.

  2. The application invokes the business logic after it completes parsing the document and has completely consolidated the extracted information. This approach takes two steps to process a document.

Note that what we refer to as consolidated information may in fact be domain-specific objects that can be directly passed to the business logic.

Stream processing (the first approach) lets an application immediately start processing the content of a source document. Not only does the application not have to wait for the entire document to be parsed, but, in some configurations, the application does not have to wait for the entire document to be retrieved. This includes retrieving the document from an earlier processing stage when implementing pipelines, or even retrieving the document from the network when exchanging documents between applications.

Stream processing, while it offers some performance advantages, also has some pitfalls and issues that must be considered. For instance, a document may appear to be well-formed and even valid for most of the processing. However, there may be unexpected errors by the end of the document that cause the document to be broken or invalid. An application using stream processing notices these problems only when it comes across erroneous tokens or when it cannot resolve an entity reference. Or, the application might realize the document is broken if the input stream from which it is reading the document unexpectedly closes, as with an end-of-file exception. Thus, an application that wants to implement a stream processing model may have to perform the document parsing and the application's business logic within the context of a transaction. Keeping these operations within a transaction leverages the container's transaction capabilities: The container's transaction mode accounts for unexpected parsing errors and rolls back any invalidated business logic processing.

With the second approach, parsing the document and applying business logic are performed in two separate steps. Before invoking the application's business logic, the application first ensures that the document and the information extracted from the document are valid. Once the document data is validated, the application invokes the business logic, which may be executed within a transaction if need be.

The SAX programming model provides no facility to produce XML documents. However, it is still possible to generate an XML document by initiating a properly balanced sequence of events—method calls—on a custom serialization handler. The handler intercepts the events and, using an XSLT identity transformation operation, writes the events in the corresponding XML syntax. The difficulty for the application developer lies in generating a properly balanced sequence of events. Keep in mind, though, that generating this sequence of events is prone to error and should be considered only for performance purposes.
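For example, a properly balanced sequence of SAX events can be fed to an identity TransformerHandler, obtained through JAXP's SAXTransformerFactory, which writes the events out in XML syntax. A minimal sketch, with an illustrative document shape:

```java
import java.io.StringWriter;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class SaxSerializer {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        SAXTransformerFactory factory =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        // A TransformerHandler created with no stylesheet performs an
        // identity transformation: it serializes the events it receives.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.setResult(new StreamResult(out));

        // The sequence of events must be properly balanced by hand.
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "order", "order", noAttrs);
        handler.startElement("", "id", "id", noAttrs);
        handler.characters("PO-42".toCharArray(), 0, 5);
        handler.endElement("", "id", "id");
        handler.endElement("", "order", "order");
        handler.endDocument();

        System.out.println(out.toString());
    }
}
```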

SAX generally is very convenient for extracting information from an XML document. It is also very convenient for data mapping when the document structure maps well to the application domain-specific objects—this is especially true when only part of the document is to be mapped. Using SAX has the additional benefit of avoiding the creation of an intermediate resource-consuming representation. Finally, SAX is good for implementing stream processing where the business logic is invoked in the midst of document processing. However, SAX can be tedious to use for more complex documents that necessitate managing sophisticated context, and in these cases, developers may find it better to use DOM or JAXB.

In summary, consider using the SAX processing model when any of the following circumstances apply:

You are familiar with event-based programming.

Your application only consumes documents without making structural modifications to them.

The document needs to be processed only once.

You have to effectively extract and process only parts of the document.

Memory usage is an issue or documents may potentially be very large.

You want to implement performant stream processing, such as for dealing with very large documents.

The structure of a document and the order of its information map well to the domain-specific objects or correspond to the order in which discrete methods of the application's logic must be invoked. Otherwise, you may have to maintain rather complicated contexts.

Note that the SAX model may not be the best candidate for application developers who are more concerned about implementing business logic.

4.4.1.2. DOM Programming Model

With the DOM programming model, you write code to traverse a tree-like data structure created by the parser from the source document. Typically, processing the XML input data is done in a minimum of two steps, as follows:

  1. The DOM parser creates a tree-like data structure that models the XML source document. This structure is called a DOM tree.

  2. The application code walks the DOM tree, searching for relevant information that it extracts, consolidates, and processes further. Developers can use consolidated information to create domain-specific objects. The cycle of searching for, extracting, and processing the information can be repeated as many times as necessary because the DOM tree persists in memory.
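These two steps might look as follows in a minimal sketch (the document shape is an illustrative assumption):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomWalk {
    public static void main(String[] args) throws Exception {
        String xml = "<order><item>screws</item><item>bolts</item></order>";
        // Step 1: the parser builds the DOM tree from the source document.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // Step 2: the application walks the tree, extracting relevant content.
        // Because the tree persists in memory, this step can be repeated.
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            System.out.println(items.item(i).getTextContent());
        }
    }
}
```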

There are limitations to the DOM model. DOM was designed to be both a platform- and language-neutral interface. Because of this, the Java binding of the DOM API is not particularly Java friendly. For example, the binding does not use the java.util.Collection API. Generally, DOM is slightly easier to use than the SAX model. However, due to the awkwardness of DOM's Java binding, application developers who are focused on the implementation of the business logic may still find DOM difficult to use effectively. For this reason, as with SAX, application developers should be shielded as much as possible from the DOM model.

In addition, the DOM API prior to version level 3 does not support serialization of DOM trees back to XML. Although some implementations do provide serialization features, these features are not standard. Thus, developers should instead rely on XSLT identity transformations, which provide a standard way to achieve serialization back to XML.

Java developers can also use other technologies, such as JDOM and dom4j, which have similar functionality to DOM. The APIs of these technologies tend to be more Java-friendly than DOM, plus they interface well with JAXP. They provide a more elaborate processing model that may alleviate some of DOM's inherent problems, such as its high memory usage and the limitation of processing document content only after a document has been parsed.

Although not standard for the Java platform until JAXP 1.3, an XPath API can be used in conjunction with the DOM programming model. (An XPath API can be found along with some DOM implementations, such as Xerces or dom4j.) Developers use XPath to locate and extract information from a source document's DOM tree. By allowing developers to specify path patterns to locate element content, attribute values, and subtrees, XPath not only greatly simplifies, but may even eliminate, tree-traversal code. Since XPath expressions are strings, they can easily be parameterized and externalized in a configuration file. As a result, developers can create more generic or reusable document processing programs.
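A minimal sketch using the javax.xml.xpath API standardized in JAXP 1.3 (the document shape and path expressions are illustrative):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathLookup {
    public static void main(String[] args) throws Exception {
        String xml = "<order id='PO-42'><total>99.5</total></order>";
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Path patterns replace hand-written tree-traversal code.
        String id = xpath.evaluate("/order/@id", doc);
        String total = xpath.evaluate("/order/total", doc);
        System.out.println(id + " " + total);
    }
}
```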

To sum up, consider using DOM when any of these circumstances apply:

You want to consume or produce documents.

You want to manipulate documents and need fine-grained control over the document structure that you want to create or edit.

You want to process the document more than once.

You want random access to parts of the document. For example, you may want to traverse back and forth within the document.

Memory usage is not a big issue.

You want to implement data binding but you cannot use JAXB technology because the document either has no schema or it conforms to a DTD schema definition rather than to an XSD schema definition. The document may also be too complex to use SAX to implement data binding. (See “SAX Programming Model” on page 165.)

You want to benefit from the flexibility of XPath and apply XPath expressions to DOM trees.

4.4.1.3. XML Data-Binding Programming Model

The XML data-binding programming model, contrary to the SAX and DOM models, allows the developer to program the processing of the content of an XML document without being concerned with XML document representations (infosets).

Using a binding compiler, the XML data-binding programming model, as implemented by JAXB, binds components of a source XSD schema to schema-derived Java content classes. JAXB binds an XML namespace to a Java package. Instance documents of the XSD schema can be unmarshalled into a tree of Java objects (called a content tree), whose members are instances of the Java content classes generated by compiling the schema. Applications can access the content of the source documents using JavaBeans-style get and set accessor methods. In addition, you can create or edit an in-memory content tree, then marshal it to an XML document instance of the source schema. Whether marshalling or unmarshalling, the developer can apply validation to ensure that either the source document or the document about to be generated satisfies the constraints expressed in the source schema.

The steps for using JAXB technology schema-derived classes to process an incoming XML document are very simple, and they are as follows:

  1. Set up the JAXB context (JAXBContext) with the list of schema-derived packages that are used to unmarshal the documents.

  2. Unmarshal an XML document into a content tree. Validation of the document is performed if enabled by the application.

  3. You can then directly apply the application's logic to the content tree. Or, you can extract and consolidate information from the content tree and then apply the application's logic on the consolidated information. As described later, this consolidated information may very well be domain-specific objects that may expose a more adequate, schema-independent interface.

This programming model also supports serialization to XML, or marshalling a content tree to an XML format. Marshalling of a document has the following steps:

  1. Modify an existing content tree, or create a new tree, from the application's business logic output.

  2. Optionally, validate the content tree against the source schema. Validation is performed in-memory and can be applied independently of the marshalling process.

  3. Marshal the content tree into an XML document.
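The unmarshal/marshal round trip can be sketched as follows. This is a non-runnable sketch: the com.example.po package and its PurchaseOrder content class stand in for hypothetical artifacts generated by the JAXB binding compiler, and the accessor names are illustrative:

```java
// Hypothetical schema-derived package produced by the binding compiler.
JAXBContext context = JAXBContext.newInstance("com.example.po");

// Unmarshal: XML document -> content tree (with optional validation).
Unmarshaller unmarshaller = context.createUnmarshaller();
unmarshaller.setValidating(true); // JAXB 1.0-style validation switch
PurchaseOrder order =
    (PurchaseOrder) unmarshaller.unmarshal(new File("po.xml"));

// Apply business logic through JavaBeans-style accessors.
order.setStatus("APPROVED");

// Marshal: content tree -> XML document instance of the source schema.
Marshaller marshaller = context.createMarshaller();
marshaller.marshal(order, new FileOutputStream("po-out.xml"));
```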

There are various ways a developer can design an application with the schema-derived classes that JAXB generates:

  1. The developer may use them directly in the business logic, but, as noted in “Choosing Processing Models” on page 151, this tightly binds the business logic to the schemas from which the classes were generated. This type of usage shares most of the issues of a document-centric processing model.

  2. The developer can use the schema-derived classes in conjunction with an object-centric processing model:

    1. The developer may design domain-specific classes whose instances will be populated from the content objects created by unmarshalling an XML document, and vice versa.

    2. The developer may design domain-specific classes, which inherit from the schema-derived classes, and define additional domain-oriented methods. The problem with this design is that these classes are tightly coupled to the implementation of the schema-derived classes, and they may also expose the methods from the schema-derived classes as part of the domain-specific class. Additionally, as a side effect, if the developer is not careful, this may result in tightly binding the business logic to the schemas from which the classes were generated.

    3. The developer can use aggregation or composition and design domain-specific classes that only expose domain-oriented methods and delegate to the schema-derived classes. Since the domain-specific classes only depend on the interfaces of the schema-derived classes, the interfaces of the domain-specific classes may therefore not be as sensitive to changes in the original schema.
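The third design, aggregation with delegation, can be sketched as follows; PurchaseOrderType stands in for a hypothetical schema-derived class, and the domain-oriented methods are illustrative:

```java
// Hypothetical stand-in for a class the JAXB binding compiler might
// generate: plain JavaBeans-style accessors, no domain logic.
class PurchaseOrderType {
    private String orderId;
    private double amount;
    public String getOrderId() { return orderId; }
    public void setOrderId(String id) { this.orderId = id; }
    public double getAmount() { return amount; }
    public void setAmount(double a) { this.amount = a; }
}

// Domain-specific class: exposes only domain-oriented methods and
// delegates to the schema-derived class, so its interface is less
// sensitive to changes in the original schema.
class PurchaseOrder {
    private final PurchaseOrderType content;
    PurchaseOrder(PurchaseOrderType content) { this.content = content; }
    boolean requiresApproval() { return content.getAmount() > 1000.0; }
    String reference() { return "REF-" + content.getOrderId(); }
}

public class DelegationDemo {
    public static void main(String[] args) {
        PurchaseOrderType generated = new PurchaseOrderType();
        generated.setOrderId("42");
        generated.setAmount(2500.0);
        PurchaseOrder order = new PurchaseOrder(generated);
        System.out.println(order.reference() + " " + order.requiresApproval());
    }
}
```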

Note that when no well-defined schema is available, or when the schemas are variable in nature (abstract) or in number (including new schema revisions), JAXB may be cumbersome to use because of the tight coupling between the schemas and the schema-derived classes. Also note that the more abstract the schema, the less effective the binding.

Consider using the XML data-binding programming model, such as JAXB, when you have any of the following conditions:

You want to deal directly with plain Java objects and do not care about, nor want to handle, document representation.

You are consuming or producing documents.

You do not need to maintain some aspects of a document, such as comments and entity references. The JAXB specification does not require giving access to the underlying document representation (infoset). For example, the JAXB reference implementation is based on SAX 2.0 and therefore does not maintain an underlying document representation. However, other implementations may be layered on top of a DOM representation. A developer may fall back on this DOM representation to access unexposed infoset elements.

You want to process the content tree more than once.

You want random access to parts of the document. For example, you may want to traverse back and forth within the document.

Memory usage may be less of an issue. A JAXB implementation, such as the standard implementation, creates a Java representation of the content of a document that is much more compact than the equivalent DOM tree. The standard implementation is layered on top of SAX 2.0 and does not maintain an additional underlying representation of the source document. Additionally, where DOM represents all XML schema numeric types as strings, JAXB's standard implementation maps these values directly to much more compact Java numeric data types. Not only does this use less memory to represent the same content, the JAXB approach saves time, because converting between the two representations is not necessary.

You previously were implementing XML data-binding manually with DOM, and an XSD schema definition is available.

4.4.1.4. XSLT Programming Model

XSLT is a higher-level processing model than the SAX, DOM, and XML data-binding models. Although developers can mimic XSLT by implementing transformations programmatically on top of SAX, DOM, or the XML data-binding model, XSLT does not compete with the other processing models and should be regarded as complementary, to be used along with them.

XSLT implements a functional declarative model as opposed to a procedural model. This requires skills that are quite different from Java programming skills. For the most part, XSLT requires developers to code rules, or templates, that are applied when specified patterns are encountered in the source document. The application of the rules adds new fragments or copies fragments from the source tree to a result tree. The patterns are expressed in the XPath language, which is used to locate and extract information from the source document.

Instead of writing Java code (as with SAX, DOM, and the XML data-binding model), developers using XSLT principally write style sheets, which are themselves XML documents. (Invoking the XSLT engine, however, does require the developer to write Java code.) Compared to the other programming models, XSLT programming gives developers the sort of flexibility that comes with scripting. In an XML-based application, XSLT processing is usually used along with the other three processing models. The XSLT API available with JAXP provides an abstraction for the source and result of transformations, allowing the developer not only to chain transformations but also to interface with other processing models, such as SAX, DOM, and JAXB technology. To interface with SAX and DOM, use the classes SAXSource, SAXResult, DOMSource, and DOMResult provided by JAXP. To interface with JAXB, use the classes JAXBSource and JAXBResult.

By definition, XSLT supports not only processing XML input documents but also producing XML documents as output. (Other output methods include text, HTML, and so forth.) Note that although the DOM level 2 API does not support serialization—that is, transformation of a DOM tree to an XML document—the JAXP implementation of XSLT addresses the serialization of a DOM tree using an identity transformer. An identity transformer copies a source tree to a result tree and applies the specified output method, thus solving the serialization problem in an easy, implementation-independent manner. For example, to output in XML, the output method is set to xml. The same technique can serialize to XML from SAX events as well.
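A minimal sketch of an identity transformation serializing a DOM tree (the document content is illustrative):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomSerializer {
    public static void main(String[] args) throws Exception {
        // Build a small DOM tree programmatically.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("invoice");
        root.setTextContent("123");
        doc.appendChild(root);
        // A Transformer created with no stylesheet is an identity
        // transformer: it copies the source tree to the result and
        // applies the specified output method.
        Transformer identity =
            TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out);
    }
}
```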

Consider using XSLT when any of the following circumstances apply:

You want to change the structure of an XML document, or insert, remove, rename, or filter its content.

You potentially have more than one transformation for the same document. Although one transformation can be hand coded using another API, multiple transformations, because of the scripting nature of style sheets, are better done using XSLT transformations.

You have to perform complex transformations. Because of XSLT's functional declarative model, it is easier to design complex transformations by coding individual rules or templates than by hard-coding procedures.

You want the ability to be flexible and leave room for future changes in the schemas of documents you are processing.

The documents you process do not contain a significant amount of data, which keeps the performance overhead of transformation to a minimum.

You need to transform a document for non-interactive presentation or in batch mode. The performance overhead of transformation is usually less of an issue with non-interactive presentations. Such a document might be a purchase order or an invoice, for example.

You must support multiple external schemas but you want to internally program only against a generic schema (schema adapter).

You want to promote the separation of skills between XML transformation style sheet developers and business logic developers.

In general, use XSLT when you must deal with non-interactive presentation, integrate various XML data sources, or perform XML data exchanges.

4.4.1.5. Recommendation Summary

In summary, choose the processing programming model and API according to your needs. If you need to deal with the content and structure of the document, consider using DOM and SAX because they provide more information about the document itself than JAXB usually does. On the other hand, if your focus is more on the actual, domain-oriented objects that the document represents, consider using JAXB, since JAXB hides the details of unmarshalling, marshalling, and validating the document. Developers should use JAXB—XML data-binding—if the document content has a representation in Java that is directly usable by the application (that is, close to domain-specific objects).

DOM, when used in conjunction with XPath, can be a very flexible and powerful tool when the focus is on the content and structure of the document. DOM may be more flexible than JAXB when dealing with documents whose schemas are not well-defined.

Finally, use XSLT to complement the three other processing models, particularly in a pre- or post-processing stage.

Figure 4.13 summarizes the different programming models from which the developer can choose and highlights the intermediary representations (which have a direct impact on performance) implied by each of them. Table 4.1 summarizes the features of the three most prevalent XML programming models.

Figure 4.13. Programming Models and Implied Intermediary Representations


Table 4.1. DOM, SAX, and XML Data-Binding Programming Models
| DOM | SAX | XML Data-Binding |
| --- | --- | --- |
| Tree-traversal model | Event-based model | Java-bound content tree model |
| Random access (in-memory data structure) using a generic (application-independent) API | Serial access (flow of events) using parameters passed to events | Random access (in-memory data structure) using JavaBeans-style accessors |
| High memory usage (the document is often completely loaded in memory, though techniques such as deferred or lazy DOM node creation may lower the memory usage) | Low memory usage (only events are generated) | Intermediate memory usage (the document is often completely loaded in memory, but the Java representation of the document is more effective than a DOM representation; some implementations may apply techniques to lower the memory usage) |
| To edit the document (processing the in-memory data structure) | To process parts of the document (handling relevant events) | To edit the document (processing the in-memory data structure) |
| To process multiple times (document loaded in memory) | To process the document only once (transient flow of events) | To process multiple times (document loaded in memory) |
| Processing once the parsing is finished | Stream processing (start processing before the parsing is finished, and even before the document is completely read) | Processing once the parsing is finished |

4.4.2. Combining XML Processing Techniques

The JAXP API provides support for chaining XML processing steps: The JAXP javax.xml.transform.Source and javax.xml.transform.Result interfaces constitute a standard mechanism for chaining them. There are implementations of these two interfaces for DOM, SAX, and even streams; JAXB and other XML processing technologies, such as JDOM and dom4j, provide their own implementations as well.

Basically, XML processing steps can be chained according to two designs:

  • The first design wraps the result of one processing step into a Source object that can be processed by the next step. XML processing techniques that can produce an in-memory representation of their results, such as DOM and JAXB, lend themselves to this design. This design is often called “batch sequential” because each processing step is relatively independent and each runs to completion until the next step begins.

  • The second design, called “stream processing” or “pipes and filters,” creates a chain of filters and each filter implements a processing step. XML processing techniques such as SAX work well with this design.

The two designs can, of course, be combined. When combined, transformations and identity transformations become handy techniques to use when the result of a processing step cannot be directly wrapped into a Source object compatible with the next step. JAXP also provides support for chaining transformations with the use of javax.xml.transform.sax.SAXTransformerFactory.
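The "pipes and filters" design can be sketched with JAXP's SAXTransformerFactory, which wraps a style sheet as a SAX XMLFilter that can be chained to a parser; the style sheet and document below are illustrative assumptions:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;

public class FilterChain {
    // Illustrative style sheet: renames the document element from the
    // external to the internal vocabulary and copies everything else.
    private static final String RENAME_XSL =
        "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='externalOrder'>"
        + "<internalOrder><xsl:apply-templates/></internalOrder>"
        + "</xsl:template>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // Wrap the style sheet as a SAX filter and chain it to a parser.
        XMLFilter renameFilter =
            stf.newXMLFilter(new StreamSource(new StringReader(RENAME_XSL)));
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        renameFilter.setParent(spf.newSAXParser().getXMLReader());
        // The filter chain becomes the Source of a final (identity)
        // transformation that serializes the result.
        Transformer identity =
            TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        String xml = "<externalOrder><id>7</id></externalOrder>";
        identity.transform(
            new SAXSource(renameFilter,
                new InputSource(new StringReader(xml))),
            new StreamResult(out));
        System.out.println(out);
    }
}
```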

Code Example 4.10 illustrates an XML processing pipeline that combines SAX and XSLT to validate an incoming purchase order document, extract on-the-fly the purchase order identifier, and transform the incoming document from its external, XSD-based schema to the internal, DTD-based schema supported by the business logic. The code uses a SAX filter chain as the Source of a transformation. Alternatively, the code could have used a SAXTransformerFactory to create an org.xml.sax.XMLFilter to handle the transformation and then chain it to the custom XMLFilter, which extracts the purchase order identifier.

Code example 4.10. Combining SAX and XSLT to Perform XML Processing Steps
public class SupplierOrderXDE extends
          XMLDocumentEditor.DefaultXDE {
   public static final String DEFAULT_ENCODING = "UTF-8";
   private XMLFilter filter;
   private Transformer transformer;
   private Source source = null;
   private String orderId = null;

   public SupplierOrderXDE(boolean validating, ...) {
      // Create a [validating] SAX parser
      SAXParser parser = ...;
      filter = new XMLFilterImpl(parser.getXMLReader()) {
         // Implements a SAX XMLFilter that extracts the OrderID
         // element value and assigns it to the orderId attribute
      };
      // Retrieve the style sheet as a stream
      InputStream stream = ...;
      // Create a transformer from the stylesheet
      transformer = TransformerFactory.newInstance()
         .newTransformer(new StreamSource(stream));
   }
   // Sets the document to be processed
   public void setDocument(Source source) throws ... {
      this.source = source;
   }
   // Builds an XML processing pipeline (chaining a SAX parser and
   // a style sheet transformer) which validates the source document,
   // extracts its orderId, transforms it into a different format,
   // and copies the resulting document into the Result object
   public void copyDocument(Result result) throws ... {
      orderId = null;
      InputSource inputSource
         = SAXSource.sourceToInputSource(source);
      SAXSource saxSource = new SAXSource(filter, inputSource);
      transformer.transform(saxSource, result);
   }
   // Returns the processed document as a Source object
   public Source getDocument() throws ... {
      return new StreamSource(new StringReader(
         getDocumentAsString()));
   }
   // Returns the processed document as a String object
   public String getDocumentAsString() throws ... {
      ByteArrayOutputStream stream = new ByteArrayOutputStream();
      copyDocument(new StreamResult(stream));
      return stream.toString(DEFAULT_ENCODING);
   }
   // Returns the orderId value extracted from the source document
   public String getOrderId() {
      return orderId;
   }
}

4.4.3. Entity Resolution

As mentioned earlier in this chapter, XML documents may contain references to other external XML fragments. Parsing replaces the references with the actual content of these external fragments. Similarly, XML schemas may refer to external type definitions, and these definitions must also be accessed for the schema to be completely interpreted. (This is especially true if you follow the modular design recommendations suggested in “Designing Domain-Specific XML Schemas” on page 131.) In both cases, an XML processor, in the course of processing a document or a schema, needs to find the content of any external entity to which the document or schema refers. This process of mapping external entity references to their actual physical location is called entity resolution. Note that entity resolution recursively applies to external entity references within parsed external entities.

Entity resolution is particularly critical for managing the XML schemas upon which your application is based. As noted in “Validating XML Documents” on page 139, the integrity and the reliability of your application may depend on the validation of incoming documents against specific schemas—typically the very same schemas used to initially design your application. Your application usually cannot afford for these schemas to be modified in any way, whether by malicious modifications or even legitimate revisions. (For revisions, you should at a minimum assess the impact of a revision on your application.)

Therefore, you may want to keep your own copies of the schemas underlying your application and redirect references to these copies. Or you may want to redirect any such references to trusted repositories. A custom entity resolution allows you to implement the desired mapping of external entity references to actual trusted physical locations. Moreover, implementing an entity catalog—even as simple as the one presented in Code Example 4.12—gives you more flexibility for managing the schemas upon which your application depends. Note that to achieve our overall goal, the entity catalog must itself be adequately protected. Additionally, redirecting references to local copies of the schemas may improve performance when compared to referring to remote copies, especially for remote references across the Internet. As described in “Reduce the Cost of Referencing External Entities” on page 189, performance can be improved further by caching in memory the resolved entities.

Code Example 4.11 illustrates an entity resolver that implements an interface from the SAX API (org.xml.sax.EntityResolver). The entity resolver uses a simple entity catalog to map external entity references to actual locations. The entity catalog is simply implemented as a Properties file that can be loaded from a URL (see Code Example 4.12). When invoked, this entity resolver first tries to use the catalog to map the declared public identifier or URI of the external entity to an actual physical location. If this fails—that is, if no mapping is defined in the catalog for this public identifier or URI—the entity resolver uses the declared system identifier or URL of the external entity as its actual physical location. In both cases, the resolver interprets the actual physical location either as a URL or, as a fallback, as a Java resource accessible from the class path of the entity resolver class. The latter case allows XML schemas to be bundled along with their dependent XML processing code. Such bundling can be useful when you must absolutely guarantee the consistency between the XML processing code and the schemas.

Code Example 4.11. Entity Resolver Using a Simple Entity Catalog
public class CustomEntityResolver implements EntityResolver {
   private Properties entityCatalog = null;

   public CustomEntityResolver(URL entityCatalogURL)
          throws IOException {
      entityCatalog = new Properties();
      entityCatalog.load(entityCatalogURL.openStream());
   }
   // Opens the physical location as a plain URL or, if this fails, as
   // a Java resource accessible from the class path.
   private InputSource openLocation(String location)
          throws IOException {
      URL url = null;
      InputStream entityStream = null;
      try { // Well-formed URL?
         url = new URL(location);
      } catch (MalformedURLException exception) { ... }
      if (url != null) { // Well-formed URL.
         try { // Try to open the URL.
            entityStream = url.openStream();
         } catch (IOException exception) { ... }
      }
      if (entityStream == null) { // Not a URL or not accessible.
         try { // Resource path?
            String resourcePath = url != null
               ? url.getPath() : location;
            entityStream
               = getClass().getResourceAsStream(resourcePath);
         } catch (Exception exception1) { ... }
      }
      if (entityStream != null) { // Readable URL or resource.
         InputSource source = new InputSource(entityStream);
         source.setSystemId(location);
         return source;
      }
      return null;
   }
   // Maps an external entity URI or public identifier to a
   // physical location.
   public String mapEntityURI(String entityURI) {
      return entityCatalog.getProperty(entityURI);
   }

   public InputSource resolveEntity(String entityURI,
          String entityURL) {
      InputSource source = null;
      try {
         // Try first to map its URI/PublicId using the catalog.
         if (entityURI != null) {
            String mappedLocation = mapEntityURI(entityURI);
            if (mappedLocation != null) {
               source = openLocation(mappedLocation);
               if (source != null) { return source; }
            }
         }
         // Try then to access the entity using its URL/System Id.
         if (entityURL != null) {
            source = openLocation(entityURL);
            if (source != null) { return source; }
         }
      } catch (Exception exception) { ... }
      return null; // Let the default entity resolver handle it.
   }
}
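To take effect, a resolver such as the one in Code Example 4.11 must be registered with the parser. The following self-contained sketch (the document, public identifier, and in-memory DTD content are hypothetical) registers a resolver by overriding resolveEntity on SAX's DefaultHandler, redirecting a public identifier to a trusted in-memory copy instead of fetching the declared URL:

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RedirectingResolverDemo {
   public static void main(String[] args) throws Exception {
      // Hypothetical document that declares an external DTD reference.
      String document =
         "<?xml version=\"1.0\"?>\n"
         + "<!DOCTYPE order PUBLIC \"-//Example//DTD Order 1.0//EN\""
         + " \"http://example.com/order.dtd\">\n"
         + "<order/>";

      // DefaultHandler implements EntityResolver; overriding
      // resolveEntity redirects the public identifier to a trusted copy.
      DefaultHandler handler = new DefaultHandler() {
         public InputSource resolveEntity(String publicId,
                String systemId) {
            if ("-//Example//DTD Order 1.0//EN".equals(publicId)) {
               return new InputSource(
                  new StringReader("<!ELEMENT order EMPTY>"));
            }
            return null; // Fall back to the default resolution.
         }
      };

      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setValidating(true);
      factory.newSAXParser()
             .parse(new InputSource(new StringReader(document)), handler);
      System.out.println("Parsed without fetching the remote DTD");
   }
}
```

With the DOM model, the equivalent hook is javax.xml.parsers.DocumentBuilder.setEntityResolver(), to which an instance of a resolver class such as CustomEntityResolver can be passed.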

Code Example 4.12. A Simple Entity Catalog
# DTD Public Identifier to physical location (URL or resource path)
-//Sun Microsystems, Inc. - J2EE Blueprints Group//DTD LineItem 1.0//EN: /com/sun/j2ee/blueprints/xmldocuments/rsrc/schemas/LineItem.dtd
-//Sun Microsystems, Inc. - J2EE Blueprints Group//DTD Invoice 1.0//EN: /com/sun/j2ee/blueprints/xmldocuments/rsrc/schemas/Invoice.dtd
# XSD URI to physical location (URL or resource path)
http://blueprints.j2ee.sun.com/LineItem: /com/sun/j2ee/blueprints/xmldocuments/rsrc/schemas/LineItem.xsd
http://blueprints.j2ee.sun.com/Invoice: /com/sun/j2ee/blueprints/xmldocuments/rsrc/schemas/Invoice.xsd
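Because the catalog is a java.util.Properties file, note that spaces and colons occurring inside a key—as they do in public identifiers and URIs—must be escaped with a backslash when the catalog is stored in a file; otherwise Properties.load() would treat the first such character as the key/value separator. A minimal sketch of loading one catalog entry and mapping a URI, the equivalent of mapEntityURI() in Code Example 4.11 (the entry mirrors Code Example 4.12, with the required escaping added):

```java
import java.io.StringReader;
import java.util.Properties;

public class CatalogLookupDemo {
   public static void main(String[] args) throws Exception {
      // One catalog entry; the ':' characters inside the key are escaped
      // so that Properties does not treat them as the separator.
      String catalog =
         "http\\://blueprints.j2ee.sun.com/Invoice: "
         + "/com/sun/j2ee/blueprints/xmldocuments/rsrc/schemas/Invoice.xsd\n";

      Properties entityCatalog = new Properties();
      entityCatalog.load(new StringReader(catalog));

      // Map the entity URI to its physical location.
      String location = entityCatalog.getProperty(
         "http://blueprints.j2ee.sun.com/Invoice");
      System.out.println(location);
      // prints /com/sun/j2ee/blueprints/xmldocuments/rsrc/schemas/Invoice.xsd
   }
}
```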

Although simple implementations such as Code Example 4.11 solve many of the common problems related to external entity management, developers should bear in mind that better solutions are on the horizon. For example, organizations such as the OASIS consortium are working on formal XML catalog specifications. (See the Web site http://www.oasis-open.org for more information on entity resolution and XML catalogs.)

In summary, you may want to consider implementing a custom entity resolver—or, even better, resorting to a more elaborate XML catalog solution—in the following circumstances:

To protect the integrity of your application against malicious modification of external schemas, by redirecting references to secured copies (either local copies or copies on trusted repositories).

During design and, even more so, during production, to protect your application against unexpected but legitimate evolution of the schemas upon which it is based. Instead, you want to defer accounting for this evolution until after you have properly assessed its impact on your application.

To improve performance by maintaining local copies of otherwise remotely accessible schemas.
