Working with SAX

This first example will show how to work with SAX. In this case, I'll use SAX to count the number of <CUSTOMER> elements in customer.xml, just as the first example in the previous chapter did. Here's customer.xml:

<?xml version = "1.0" standalone="yes"?>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>

Here, I'll base the new program on a new class named FirstParserSAX. We'll need an object of that class to pass to the SAX parser so that it can call the methods in that object when it encounters elements, the start of the document, the end of the document, and so on. I begin by creating an object of the FirstParserSAX class named SAXHandler:

import org.xml.sax.*;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX
{
    public static void main(String[] args)
    {
        FirstParserSAX SAXHandler = new FirstParserSAX();
        // ...
    }
}

Next, I create the actual SAX parser that we'll work with. This parser is an object of the org.apache.xerces.parsers.SAXParser class (just like the DOM parser objects we worked with in the previous chapter were objects of the org.apache.xerces.parsers.DOMParser class). To use the SAXParser class, I import that class and the supporting classes in the org.xml.sax package, and I'm free to create a new SAX parser named parser:

import org.xml.sax.*;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX
{
    public static void main(String[] args)
    {
        FirstParserSAX SAXHandler = new FirstParserSAX();

        SAXParser parser = new SAXParser();
        // ...
    }
}
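The listing above instantiates Xerces' parser class directly, which assumes the Xerces 1.x JAR is on your classpath. As a hedged alternative (not the approach this chapter uses), a modern JDK can supply a SAX 2 parser through the standard JAXP factory in javax.xml.parsers. The class name JaxpReaderDemo is hypothetical:

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class JaxpReaderDemo {
    // Asks the standard JAXP factory for a SAX 2 XMLReader instead of
    // instantiating a vendor class such as org.apache.xerces.parsers.SAXParser.
    public static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            // ParserConfigurationException or SAXException: no usable parser.
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        // Shows which implementation class the JDK selected.
        System.out.println(reader.getClass().getName());
    }
}
```

On OpenJDK the printed class name typically lives in a com.sun.org.apache.xerces.internal package, a repackaged copy of Xerces. The returned XMLReader accepts the same setContentHandler, setErrorHandler, and parse calls used in this chapter.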

The SAXParser class is derived from the XMLParser class, which in turn is based on the java.lang.Object class:

java.lang.Object
  |
  +--org.apache.xerces.framework.XMLParser
        |
        +--org.apache.xerces.parsers.SAXParser

The constructor of the SAXParser class is SAXParser(); the methods of the SAXParser class are listed in Table 12.1. The constructor of the XMLParser class is protected XMLParser(); the methods of the XMLParser class are listed in Table 12.2.

Table 12.1. SAXParser Methods
Method | Description
void attlistDecl(int elementTypeIndex, int attrNameIndex, int attType, java.lang.String enumString, int attDefaultType, int attDefaultValue) | Callback for an attribute list declaration
void characters(char[] ch, int start, int length) | Callback for character data, passed as a range of a character array
void comment(int dataIndex) | Callback for comments
void commentInDTD(int dataIndex) | Callback for comments in a DTD
void elementDecl(int elementType, XMLValidator.ContentSpec contentSpec) | Callback for an element type declaration
void endCDATA() | Callback for the end of a CDATA section
void endDocument() | Callback for the end of a document
void endDTD() | Callback for the end of the DTD
void endElement(int elementType) | Callback for the end of an element
void endEntityReference(int entityName, int entityType, int entityContext) | Callback for the end of an entity reference
void endNamespaceDeclScope(int prefix) | Callback for the end of the scope of a namespace declaration
void externalEntityDecl(int entityName, int publicId, int systemId) | Callback for a parsed external general entity declaration
void externalPEDecl(int entityName, int publicId, int systemId) | Callback for a parsed external parameter entity declaration
ContentHandler getContentHandler() | Gets the content handler
protected DeclHandler getDeclHandler() | Gets the DTD declaration event handler
DTDHandler getDTDHandler() | Gets the current DTD handler
boolean getFeature(java.lang.String featureId) | Gets the state of a parser feature
java.lang.String[] getFeaturesRecognized() | Gets a list of the features that the parser recognizes
protected LexicalHandler getLexicalHandler() | Gets the lexical handler for this parser
protected boolean getNamespacePrefixes() | Gets the value of the namespace prefixes setting
java.lang.String[] getPropertiesRecognized() | Gets a list of the properties that the parser recognizes
java.lang.Object getProperty(java.lang.String propertyId) | Gets the value of a property
void ignorableWhitespace(char[] ch, int start, int length) | Callback for ignorable whitespace
void internalEntityDecl(int entityName, int entityValue) | Callback for an internal general entity declaration
void internalPEDecl(int entityName, int entityValue) | Callback for an internal parameter entity declaration
void internalSubset(int internalSubset) | Callback for the DTD internal subset (supports DOM Level 2)
void notationDecl(int notationName, int publicId, int systemId) | Callback for a notation declaration
void processingInstruction(int piTarget, int piData) | Callback for processing instructions
void processingInstructionInDTD(int piTarget, int piData) | Callback for processing instructions in the DTD
void setContentHandler(ContentHandler handler) | Sets a content handler so that an application can handle SAX events
protected void setDeclHandler(DeclHandler handler) | Sets the DTD declaration event handler
void setDocumentHandler(DocumentHandler handler) | Sets the (SAX 1) document handler
void setDTDHandler(DTDHandler handler) | Sets the DTD handler
void setFeature(java.lang.String featureId, boolean state) | Sets the state of a feature
protected void setLexicalHandler(LexicalHandler handler) | Sets the lexical event handler
protected void setNamespacePrefixes(boolean process) | Specifies how the parser reports raw prefixed names and whether xmlns attributes are reported
void setProperty(java.lang.String propertyId, java.lang.Object value) | Sets the value of a property
void startCDATA() | Callback for the start of a CDATA section
void startDocument(int versionIndex, int encodingIndex, int standaloneIndex) | Callback for the start of the document
void startDTD(int rootElementType, int publicId, int systemId) | Callback for a <!DOCTYPE…> declaration
void startElement(int elementType, XMLAttrList attrList, int attrListIndex) | Callback for the start of an element
void startEntityReference(int entityName, int entityType, int entityContext) | Callback for the start of an entity reference
void startNamespaceDeclScope(int prefix, int uri) | Callback for the start of the scope of a namespace declaration
void unparsedEntityDecl(int entityName, int publicId, int systemId, int notationName) | Callback for an unparsed entity declaration

Table 12.2. XMLParser Methods
Method | Description
void addRecognizer(org.apache.xerces.readers.XMLDeclRecognizer recognizer) | Adds a recognizer
abstract void attlistDecl(int elementType, int attrName, int attType, java.lang.String enumString, int attDefaultType, int attDefaultValue) | Serves as a callback for an attribute list declaration
void callCharacters(int ch) | Calls the characters callback
void callComment(int comment) | Calls the comment callback
void callEndDocument() | Calls the end document callback
boolean callEndElement(int readerId) | Calls the end element callback
void callProcessingInstruction(int target, int data) | Calls the processing instruction callback
void callStartDocument(int version, int encoding, int standalone) | Calls the start document callback
void callStartElement(int elementType) | Calls the start element callback
org.apache.xerces.readers.XMLEntityHandler.EntityReader changeReaders() | Called by the reader subclasses at the end of input
abstract void characters(char[] ch, int start, int length) | Serves as a callback for characters
abstract void characters(int data) | Serves as a callback for characters using string pools
abstract void comment(int comment) | Serves as a callback for comments
void commentInDTD(int comment) | Serves as a callback for comments in the DTD
abstract void elementDecl(int elementType, XMLValidator.ContentSpec contentSpec) | Serves as a callback for an element declaration
abstract void endCDATA() | Serves as a callback for the end of a CDATA section
abstract void endDocument() | Serves as a callback for the end of a document
abstract void endDTD() | Serves as a callback for the end of the DTD
abstract void endElement(int elementType) | Serves as a callback for the end of an element
void endEntityDecl() | Serves as a callback for the end of an entity declaration
abstract void endEntityReference(int entityName, int entityType, int entityContext) | Serves as a callback for the end of an entity reference
abstract void endNamespaceDeclScope(int prefix) | Serves as a callback for the end of a namespace declaration scope
java.lang.String expandSystemId(java.lang.String systemId) | Expands a system ID and returns it as a URL
abstract void externalEntityDecl(int entityName, int publicId, int systemId) | Serves as a callback for an external general entity declaration
abstract void externalPEDecl(int entityName, int publicId, int systemId) | Serves as a callback for an external parameter entity declaration
protected boolean getAllowJavaEncodings() | Returns true if Java encoding names are allowed in the XML document
int getColumnNumber() | Gets the column number of the current position in the document
protected boolean getContinueAfterFatalError() | Returns true if the parser will continue after a fatal error
org.apache.xerces.readers.XMLEntityHandler.EntityReader getEntityReader() | Gets the entity reader
EntityResolver getEntityResolver() | Gets the current entity resolver
ErrorHandler getErrorHandler() | Gets the current error handler
boolean getFeature(java.lang.String featureId) | Gets the state of a feature
java.lang.String[] getFeaturesRecognized() | Gets a list of the features recognized by this parser
int getLineNumber() | Gets the current line number in the document
Locator getLocator() | Gets the locator used by the parser
protected boolean getNamespaces() | Returns true if the parser preprocesses namespaces
java.lang.String[] getPropertiesRecognized() | Gets the list of properties recognized by the parser
java.lang.Object getProperty(java.lang.String propertyId) | Gets the value of a property
java.lang.String getPublicId() | Gets the public ID of the InputSource
protected org.apache.xerces.validators.schema.XSchemaValidator getSchemaValidator() | Gets the current XML Schema validator
java.lang.String getSystemId() | Gets the system ID of the InputSource
protected boolean getValidation() | Returns true if validation is turned on
protected boolean getValidationDynamic() | Returns true if validation depends on whether the document contains a grammar
protected boolean getValidationWarnOnDuplicateAttdef() | Returns true if an error is created when an attribute is redefined in the grammar
protected boolean getValidationWarnOnUndeclaredElemdef() | Returns true if the parser creates an error when an undeclared element is referenced
abstract void ignorableWhitespace(char[] ch, int start, int length) | Serves as a callback for ignorable whitespace
abstract void ignorableWhitespace(int data) | Serves as a callback for ignorable whitespace using string pools
abstract void internalEntityDecl(int entityName, int entityValue) | Serves as a callback for an internal general entity declaration
abstract void internalPEDecl(int entityName, int entityValue) | Serves as a callback for an internal parameter entity declaration
abstract void internalSubset(int internalSubset) | Supports DOM Level 2 internal subsets
boolean isFeatureRecognized(java.lang.String featureId) | Returns true if the given feature is recognized
boolean isPropertyRecognized(java.lang.String propertyId) | Returns true if the given property is recognized
abstract void notationDecl(int notationName, int publicId, int systemId) | Serves as a callback for a notation declaration
void parse(InputSource source) | Parses the given input source
void parse(java.lang.String systemId) | Parses the input source given by a system identifier
boolean parseSome() | Supports application-driven parsing
boolean parseSomeSetup(InputSource source) | Sets up application-driven parsing
void processCharacters(char[] chars, int offset, int length) | Processes character data given a character array
void processCharacters(int data) | Processes character data
abstract void processingInstruction(int target, int data) | Serves as a callback for processing instructions
void processingInstructionInDTD(int target, int data) | Serves as a callback for processing instructions in a DTD
void processWhitespace(char[] chars, int offset, int length) | Processes whitespace
void processWhitespace(int data) | Processes whitespace using string pools
void reportError(Locator locator, java.lang.String errorDomain, int majorCode, int minorCode, java.lang.Object[] args, int errorType) | Reports errors
void reset() | Resets the parser so that it can be reused
protected void resetOrCopy() | Resets or copies the parser
int scanAttributeName(org.apache.xerces.readers.XMLEntityHandler.EntityReader entityReader, int elementType) | Scans an attribute name
int scanAttValue(int elementType, int attrName) | Scans an attribute value
void scanDoctypeDecl(boolean standalone) | Scans a doctype declaration
int scanElementType(org.apache.xerces.readers.XMLEntityHandler.EntityReader entityReader, char fastchar) | Scans an element type
boolean scanExpectedElementType(org.apache.xerces.readers.XMLEntityHandler.EntityReader entityReader, char fastchar) | Scans an expected element type
protected void setAllowJavaEncodings(boolean allow) | Specifies whether Java encoding names are allowed
protected void setContinueAfterFatalError(boolean continueAfterFatalError) | Lets the parser continue after fatal errors
void setEntityResolver(EntityResolver resolver) | Sets the resolver for external entities
void setErrorHandler(ErrorHandler handler) | Sets the error handler
void setFeature(java.lang.String featureId, boolean state) | Sets the state of a feature
void setLocale(java.util.Locale locale) | Sets the locale
void setLocator(Locator locator) | Sets the locator
protected void setNamespaces(boolean process) | Specifies whether the parser preprocesses namespaces
void setProperty(java.lang.String propertyId, java.lang.Object value) | Sets the value of a property
void setReaderFactory(org.apache.xerces.readers.XMLEntityReaderFactory readerFactory) | Sets the reader factory
protected void setSendCharDataAsCharArray(boolean flag) | Sets character data processing preferences
void setValidating(boolean flag) | Indicates to the parser that you are validating
protected void setValidation(boolean validate) | Specifies whether the parser validates
protected void setValidationDynamic(boolean dynamic) | Lets the parser validate a document only if it contains a grammar
protected void setValidationWarnOnDuplicateAttdef(boolean warn) | Specifies whether an error is created when attributes are redefined in the grammar
protected void setValidationWarnOnUndeclaredElemdef(boolean warn) | Specifies whether the parser causes an error when an element's content model references an element by name that is not declared
abstract void startCDATA() | Serves as a callback for the start of a CDATA section
abstract void startDocument(int version, int encoding, int standAlone) | Serves as a callback for the start of the document
abstract void startDTD(int rootElementType, int publicId, int systemId) | Serves as a callback for the start of the DTD
abstract void startElement(int elementType, XMLAttrList attrList, int attrListHandle) | Serves as a callback for the start of an element
boolean startEntityDecl(boolean isPE, int entityName) | Serves as a callback for the start of an entity declaration
abstract void startEntityReference(int entityName, int entityType, int entityContext) | Serves as a callback for the start of an entity reference
abstract void startNamespaceDeclScope(int prefix, int uri) | Serves as a callback for the start of a namespace declaration scope
boolean startReadingFromDocument(InputSource source) | Starts reading from a document
boolean startReadingFromEntity(int entityName, int readerDepth, int context) | Starts reading from an external entity
void startReadingFromExternalSubset(java.lang.String publicId, java.lang.String systemId, int readerDepth) | Starts reading from an external DTD subset
void stopReadingFromExternalSubset() | Stops reading from an external DTD subset
abstract void unparsedEntityDecl(int entityName, int publicId, int systemId, int notationName) | Serves as a callback for unparsed entity declarations
boolean validEncName(java.lang.String encoding) | Returns true if the given encoding is valid
boolean validVersionNum(java.lang.String version) | Returns true if the given version is valid

We have a SAXParser object now, and we need to register the SAXHandler object with it so that the parser calls SAXHandler's methods when it starts the document, finds a new element, and so forth. SAX parsers can call quite a number of methods, such as those for elements, processing instructions, declarations in DTDs, and so on. The methods a SAX parser calls to tell you that it has found a new item in the document are called callback methods; you make them available to the parser by registering the object that implements them.
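To watch callbacks happen, here is a minimal sketch, not part of the chapter's program, of a handler that records each callback the parser makes on it. It extends org.xml.sax.helpers.DefaultHandler, a convenience class described later in this section, and uses the JDK's JAXP XMLReader on the assumption that Xerces 1.x may not be installed. CallbackTrace and the sample document are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CallbackTrace extends DefaultHandler {
    // One entry per callback, in the order the parser made them.
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void endDocument() { events.add("endDocument"); }

    public static List<String> trace(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CallbackTrace handler = new CallbackTrace();
            reader.setContentHandler(handler); // register the object, not individual methods
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("<DOCUMENT><CUSTOMER/></DOCUMENT>")) {
            System.out.println(event);
        }
    }
}
```

For <DOCUMENT><CUSTOMER/></DOCUMENT> the trace is startDocument, startElement DOCUMENT, startElement CUSTOMER, endElement CUSTOMER, endElement DOCUMENT, endDocument. An empty element still produces both a start and an end callback.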

Four core SAX interfaces support the various callback methods:

  • EntityResolver lets you customize the handling of external entities.

  • DTDHandler receives basic DTD events (notation and unparsed entity declarations).

  • ContentHandler handles the content of a document, such as elements and processing instructions.

  • ErrorHandler handles errors that occur while parsing.
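ErrorHandler is the smallest of the four interfaces, declaring only warning, error, and fatalError. As a minimal sketch of implementing one of them directly (hypothetical class name CountingErrorHandler; the JDK's JAXP XMLReader stands in for Xerces' SAXParser), this handler counts each kind of problem and rethrows fatal errors so that parsing stops:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class CountingErrorHandler implements ErrorHandler {
    int warnings = 0;
    int errors = 0;
    int fatalErrors = 0;

    // Every method declared by ErrorHandler must be implemented.
    @Override
    public void warning(SAXParseException e) {
        warnings++;
    }

    @Override
    public void error(SAXParseException e) {
        errors++; // recoverable, e.g. validity errors; parsing continues
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        fatalErrors++;
        throw e; // well-formedness error: stop parsing
    }

    public static CountingErrorHandler check(String xml) {
        CountingErrorHandler handler = new CountingErrorHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Malformed input (rethrown by fatalError) ends up here.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        CountingErrorHandler h = check("<DOCUMENT><CUSTOMER></DOCUMENT>");
        System.out.println("Fatal errors: " + h.fatalErrors);
    }
}
```

Rethrowing from fatalError is a common choice because SAX promises no further events once a fatal error has been reported.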

There are many callback methods in these interfaces, and if you implement an interface, you have to implement all of its methods. SAX 2 makes this easier with a class named DefaultHandler, in the org.xml.sax.helpers package, which supplies default implementations of the callback methods of all four interfaces. The constructor of the DefaultHandler class is DefaultHandler(); its methods are listed in Table 12.3.

Table 12.3. DefaultHandler Methods
Method | Description
void characters(char[] ch, int start, int length) | Callback for character data inside an element
void endDocument() | Callback for the end of the document
void endElement(java.lang.String uri, java.lang.String localName, java.lang.String rawName) | Callback for the end of an element
void endPrefixMapping(java.lang.String prefix) | Callback for the end of a namespace mapping
void error(SAXParseException e) | Callback for a recoverable parser error
void fatalError(SAXParseException e) | Callback for a fatal XML parsing error
void ignorableWhitespace(char[] ch, int start, int length) | Callback for ignorable whitespace in element content
void notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId) | Callback for a notation declaration
void processingInstruction(java.lang.String target, java.lang.String data) | Callback for a processing instruction
InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId) | Callback for an external entity
void setDocumentLocator(Locator locator) | Receives a Locator object for document events
void skippedEntity(java.lang.String name) | Callback for a skipped entity
void startDocument() | Callback for the beginning of the document
void startElement(java.lang.String uri, java.lang.String localName, java.lang.String rawName, Attributes attributes) | Callback for the start of an element
void startPrefixMapping(java.lang.String prefix, java.lang.String uri) | Callback for the start of a namespace mapping
void unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName) | Callback for an unparsed entity declaration
void warning(SAXParseException e) | Callback for parser warnings

If you base your program on the DefaultHandler class, you need to implement only the callback methods you're interested in, so I'll derive the main class of this program, FirstParserSAX, from DefaultHandler, which you do with the extends keyword:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler
{
    public static void main(String[] args)
    {
        FirstParserSAX SAXHandler = new FirstParserSAX();

        SAXParser parser = new SAXParser();
        // ...
    }
}
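Deriving from DefaultHandler pays off as soon as a handler needs more than one callback. As a hedged illustration that goes slightly beyond this chapter's counting example, the following sketch overrides only startElement, characters, and endElement to collect the text of each <PRODUCT> element. It buffers text in a StringBuilder because a parser may deliver one element's text across several characters calls. ProductCollector is a hypothetical name, and the JDK's JAXP parser replaces the Xerces class:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ProductCollector extends DefaultHandler {
    final List<String> products = new ArrayList<>();
    private StringBuilder text = null; // non-null only while inside <PRODUCT>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("PRODUCT")) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("PRODUCT")) {
            products.add(text.toString().trim());
            text = null;
        }
    }

    public static List<String> collect(String xml) {
        try {
            ProductCollector handler = new ProductCollector();
            // JAXP's SAXParser.parse(InputSource, DefaultHandler) registers the handler.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.products;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ORDERS><ITEM><PRODUCT>Tomatoes</PRODUCT></ITEM>"
            + "<ITEM><PRODUCT>Oranges</PRODUCT></ITEM></ORDERS>";
        System.out.println(collect(xml)); // prints [Tomatoes, Oranges]
    }
}
```

Every callback it doesn't override (startDocument, processingInstruction, and so on) falls through to DefaultHandler's default implementation.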

Now we're ready to register the SAXHandler object with the SAX parser. In this case, I'm not going to worry about handling DTD events or resolving external entities; I'll just handle the document's content and any errors with the setContentHandler and setErrorHandler methods:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler
{
    public static void main(String[] args)
    {
        FirstParserSAX SAXHandler = new FirstParserSAX();

        SAXParser parser = new SAXParser();
        parser.setContentHandler(SAXHandler);
        parser.setErrorHandler(SAXHandler);
        // ...
    }
}

This registers the SAXHandler object so that it will receive SAX content and error events. I'll add the callback methods themselves once the main method is finished.

To actually parse the XML document, you use the parse method of the parser object. I'll let the user specify the name of the document on the command line and pass it to parse as args[0]. (Note that you don't need to pass the name of a local file to the parse method; you can pass the URL of a document on the Internet, and the parse method will retrieve that document.) The parse method is declared to throw the checked exceptions SAXException and java.io.IOException, which means that you have to enclose it in a try block followed by a catch block:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler
{
    public static void main(String[] args)
    {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();

            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
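The parse method also accepts an InputSource (see Table 12.2), which is convenient when the XML is already in memory rather than in a file or at a URL. The following minimal sketch assumes the JDK's JAXP XMLReader in place of Xerces' SAXParser and uses the hypothetical class name ParseFromString. It catches the checked exceptions separately, rather than with a single catch (Exception e), so a parsing error can be told apart from an I/O failure:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParseFromString {
    // Returns "ok" on success, or a short description of what went wrong.
    public static String tryParse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler(); // ignores content; rethrows fatal errors
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXException e) {
            return "SAX error: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            return "no suitable parser: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<DOCUMENT><CUSTOMER/></DOCUMENT>")); // prints ok
        System.out.println(tryParse("<DOCUMENT>"));
    }
}
```

The second call reports a SAX error because the document ends before <DOCUMENT> is closed; DefaultHandler's fatalError rethrows the SAXParseException, which surfaces from parse.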

That completes the main method, so I'll implement the methods that are called when the SAX parser parses the XML document. In this case, the goal is to determine how many <CUSTOMER> elements the document has, so I implement the startElement method like this:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler
{
    public void startElement(String uri, String localName, String rawName,
        Attributes attributes)
    {
        // ...
    }
    
    public static void main(String[] args)
    {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();

            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}

The startElement method is called each time the SAX parser sees the start of an element, and the endElement method is called when the SAX parser sees the end of an element.
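Because every startElement call is matched by an endElement call, a handler can track its position in the tree with a simple counter. This minimal sketch (hypothetical class name DepthTracker; the JDK's JAXP parser assumed) records the deepest nesting level reached. In customer.xml the deepest path is DOCUMENT, CUSTOMER, ORDERS, ITEM, PRODUCT, so the result there would be 5:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DepthTracker extends DefaultHandler {
    private int depth = 0; // current nesting level
    int maxDepth = 0;      // deepest level seen so far

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // each endElement closes the most recently opened element
    }

    public static int measure(String xml) {
        try {
            DepthTracker handler = new DepthTracker();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.maxDepth;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DOCUMENT><CUSTOMER><ORDERS><ITEM><PRODUCT>Bread</PRODUCT>"
            + "</ITEM></ORDERS></CUSTOMER></DOCUMENT>";
        System.out.println(measure(xml)); // prints 5
    }
}
```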

Note that two element names are passed to the startElement method: localName and rawName. You use the localName argument with namespace processing; this argument holds the name of the element without any namespace prefix. The rawName argument holds the full, qualified name of the element, including any namespace prefix.
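The difference only shows up when namespace processing is on and an element name carries a prefix. The following minimal sketch assumes a JAXP factory with setNamespaceAware(true) and a hypothetical namespace URI, urn:example:customers; NameParts is a hypothetical class name. SAX 2 documentation calls the third argument qName, which corresponds to rawName here:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NameParts extends DefaultHandler {
    // One "uri | localName | rawName" entry per element.
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String rawName, Attributes attributes) {
        names.add(uri + " | " + localName + " | " + rawName);
    }

    public static List<String> describe(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            NameParts handler = new NameParts();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<c:DOCUMENT xmlns:c=\"urn:example:customers\"><c:CUSTOMER/></c:DOCUMENT>";
        describe(xml).forEach(System.out::println);
    }
}
```

For <c:CUSTOMER>, localName is CUSTOMER while rawName is c:CUSTOMER, which is why a rawName comparison against "CUSTOMER" works only for unprefixed documents like customer.xml.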

We're just going to count the number of <CUSTOMER> elements, so I'll take a look at the element's rawName argument. If that argument equals "CUSTOMER", I'll increment a variable named customerCount:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler
{
    int customerCount = 0;

    public void startElement(String uri, String localName, String rawName,
        Attributes attributes)
    {
        if (rawName.equals("CUSTOMER")) {
            customerCount++;
        }
    }

    public static void main(String[] args)
    {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();

            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}

How do you know when you've reached the end of the document and there are no more <CUSTOMER> elements to count? You use the endDocument method, which is called when the end of the document is reached. I'll display the number of tallied <CUSTOMER> elements in that method:

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler
{
    int customerCount = 0;

    public void startElement(String uri, String localName, String rawName,
        Attributes attributes)
    {
        if (rawName.equals("CUSTOMER")) {
            customerCount++;
        }
    }

    public void endDocument()
    {
        System.out.println("The document has "
        + customerCount + " <CUSTOMER> elements.");
    }

    public static void main(String[] args)
    {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();

            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}

You can compile and run this program like this:

%javac FirstParserSAX.java
%java FirstParserSAX customer.xml
The document has 3 <CUSTOMER> elements.
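If the Xerces 1.x classes aren't available, the same program can be written against the JAXP API that ships with the JDK. This is a hedged adaptation rather than the chapter's listing: javax.xml.parsers.SAXParser replaces org.apache.xerces.parsers.SAXParser, the counting logic is unchanged, and the class name FirstParserJAXP and its count helper (which parses from a string) are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FirstParserJAXP extends DefaultHandler {
    int customerCount = 0;

    @Override
    public void startElement(String uri, String localName, String rawName, Attributes attributes) {
        if (rawName.equals("CUSTOMER")) {
            customerCount++;
        }
    }

    @Override
    public void endDocument() {
        System.out.println("The document has " + customerCount + " <CUSTOMER> elements.");
    }

    // Hypothetical helper: parses XML held in a string and returns the tally.
    public static int count(String xml) {
        try {
            FirstParserJAXP handler = new FirstParserJAXP();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.customerCount;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try {
            // args[0] is a file name or URL, as in FirstParserSAX.
            FirstParserJAXP handler = new FirstParserJAXP();
            SAXParserFactory.newInstance().newSAXParser().parse(args[0], handler);
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
```

You run it the same way, for example %java FirstParserJAXP customer.xml; no Xerces JAR is needed on the classpath because javax.xml.parsers is part of the JDK.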

And that's all it takes to get started with SAX.
