This first example will show how to work with SAX. In this case, I'll use SAX to count the number of <CUSTOMER> elements in customer.xml, just as the first example in the previous chapter did. Here's customer.xml:
<?xml version = "1.0" standalone="yes"?>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <DATE>October 15, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Tomatoes</PRODUCT>
                <NUMBER>8</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Oranges</PRODUCT>
                <NUMBER>24</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$14.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Apples</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Weber</LAST_NAME>
            <FIRST_NAME>Bill</FIRST_NAME>
        </NAME>
        <DATE>October 25, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Asparagus</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Lettuce</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$11.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
Here, I'll base the new program on a new class named FirstParserSAX. We'll need an object of that class to pass to the SAX parser so that it can call the methods in that object when it encounters elements, the start of the document, the end of the document, and so on. I begin by creating an object of the FirstParserSAX class named SAXHandler:
import org.xml.sax.*;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX {
    public static void main(String[] args) {
        FirstParserSAX SAXHandler = new FirstParserSAX();
        . . .
    }
}
Next, I create the actual SAX parser that we'll work with. This parser is an object of the org.apache.xerces.parsers.SAXParser class (just like the DOM parser objects we worked with in the previous chapter were objects of the org.apache.xerces.parsers.DOMParser class). To use the SAXParser class, I import that class and the supporting classes in the org.xml.sax package, and I'm free to create a new SAX parser named parser:
import org.xml.sax.*;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX {
    public static void main(String[] args) {
        FirstParserSAX SAXHandler = new FirstParserSAX();
        SAXParser parser = new SAXParser();
        . . .
    }
}
The SAXParser class is derived from the XMLParser class, which in turn is based on the java.lang.Object class:
java.lang.Object
  |
  +--org.apache.xerces.framework.XMLParser
        |
        +--org.apache.xerces.parsers.SAXParser
The constructor of the SAXParser class is SAXParser(); the methods of the SAXParser class are listed in Table 12.1. The constructor of the XMLParser class is protected XMLParser(); the methods of the XMLParser class are listed in Table 12.2.
We have a SAXParser object now, and we need to register the SAXHandler object we created with the SAXParser object so the methods of the SAXHandler object are called when the parser starts the document, finds a new element, and so forth. SAX parsers call quite a number of methods, such as those for elements, processing instructions, declarations in DTDs, and so on. The methods a SAX parser calls to inform you that a new item has been found in the document are called callback methods, and you must register those methods with the SAX parser.
Four core SAX interfaces support the various callback methods:
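EntityResolver implements customized handling for external entities.
DTDHandler handles DTD events, such as notation and unparsed entity declarations.
ContentHandler handles the content of a document, such as elements and processing instructions.
ErrorHandler handles errors that occur while parsing.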
There are many callback methods in these interfaces, and if you implement an interface directly, you have to implement all of its methods. The XML for Java package makes this easier by providing a class called DefaultHandler, which has default implementations for all the required callback methods. The constructor of the DefaultHandler class is DefaultHandler(); its methods are listed in Table 12.3.
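As a quick check of that point, here's a minimal sketch confirming that a plain DefaultHandler object can stand in wherever any of the four core handler interfaces is expected. The class name HandlerInterfaces and the method implementsAllFour are made up for illustration. The sketch relies only on the org.xml.sax classes that ship with the JDK, not on the XML for Java package used in this chapter.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the chapter's program.
public class HandlerInterfaces {
    // Returns true if a DefaultHandler object implements all four core SAX interfaces.
    public static boolean implementsAllFour() {
        Object handler = new DefaultHandler();
        return handler instanceof EntityResolver
            && handler instanceof DTDHandler
            && handler instanceof ContentHandler
            && handler instanceof ErrorHandler;
    }

    public static void main(String[] args) {
        System.out.println("DefaultHandler implements all four: " + implementsAllFour());
    }
}
```

Because the default implementations do nothing (apart from the fatal error callback, which rethrows the error), a subclass overrides only the callbacks it cares about.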
If you base your program on the DefaultHandler class, you need to override only the callback methods you're interested in, so I'll derive the main class of this program, FirstParserSAX, from the DefaultHandler class, which you do with the extends keyword:
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler {
    public static void main(String[] args) {
        FirstParserSAX SAXHandler = new FirstParserSAX();
        SAXParser parser = new SAXParser();
        . . .
    }
}
Now we're ready to register the SAXHandler object with the SAX parser. In this case, I'm not going to worry about handling DTD events or resolving external entities; I'll just handle the document's content and any errors, registering the object with the setContentHandler and setErrorHandler methods:
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler {
    public static void main(String[] args) {
        FirstParserSAX SAXHandler = new FirstParserSAX();
        SAXParser parser = new SAXParser();
        parser.setContentHandler(SAXHandler);
        parser.setErrorHandler(SAXHandler);
        . . .
    }
}
This registers the SAXHandler object so that it will receive SAX content and error events. After finishing the main method, I'll add the callback methods that the parser will call.
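To see what registering an error handler buys you, here's a minimal sketch that feeds a document with a missing end tag to a parser. It is an illustration under assumptions, not the chapter's program: it uses the JAXP parser bundled with the JDK (SAXParserFactory and XMLReader) in place of the org.apache.xerces.parsers.SAXParser class, and the class name ErrorEventDemo and the method reportsFatalError are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; assumes the JDK's built-in JAXP parser rather than Xerces classes.
public class ErrorEventDemo extends DefaultHandler {
    boolean sawFatalError = false;

    // Called by the parser when the document is not well formed.
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        sawFatalError = true;
        throw e; // stop parsing, as DefaultHandler's own version does
    }

    // Parses the XML text and reports whether the fatalError callback ran.
    public static boolean reportsFatalError(String xml) throws Exception {
        ErrorEventDemo handler = new ErrorEventDemo();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Expected for ill-formed input; the flag records that the callback ran.
        }
        return handler.sawFatalError;
    }

    public static void main(String[] args) throws Exception {
        // The first document is missing the </CUSTOMER> end tag; the second is well formed.
        System.out.println(reportsFatalError("<DOCUMENT><CUSTOMER></DOCUMENT>"));
        System.out.println(reportsFatalError("<DOCUMENT><CUSTOMER/></DOCUMENT>"));
    }
}
```

The same object serves as both content handler and error handler here, just as SAXHandler does in the chapter's example.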
To actually parse the XML document, you use the parse method of the parser object. I'll let the user specify the name of the document to parse on the command line by passing args[0] to the parse method. (Note that you don't need to pass the name of a local file to the parse method; you can pass the URL of a document on the Internet, and the parse method will retrieve that document.) The parse method can throw Java exceptions, so you have to enclose the call in a try block followed by a catch block:
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler {
    public static void main(String[] args) {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
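The point that parse accepts a URI as well as a file name can be tried out with a small sketch like the following. It assumes the JDK's bundled JAXP parser instead of the Xerces SAXParser class, and the names ParseByUri, countElements, and writeSample are hypothetical. With no command-line argument, it writes a tiny sample document to a temporary file and passes that file's file: URI to parse.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; assumes the JDK's built-in JAXP parser rather than Xerces classes.
public class ParseByUri extends DefaultHandler {
    int elementCount = 0;

    @Override
    public void startElement(String uri, String localName, String rawName, Attributes attributes) {
        elementCount++;
    }

    // Parses the document named by a system identifier (a URI) and returns its element count.
    public static int countElements(String systemId) throws Exception {
        ParseByUri handler = new ParseByUri();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(systemId);
        return handler.elementCount;
    }

    // Writes a small sample document to a temporary file and returns its file: URI.
    public static String writeSample() throws IOException {
        Path file = Files.createTempFile("sample", ".xml");
        Files.write(file, "<DOCUMENT><CUSTOMER/><CUSTOMER/></DOCUMENT>".getBytes(StandardCharsets.UTF_8));
        file.toFile().deleteOnExit();
        return file.toUri().toString();
    }

    public static void main(String[] args) {
        try {
            String systemId = args.length > 0 ? args[0] : writeSample();
            System.out.println("Elements found: " + countElements(systemId));
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
```

As in the chapter's listing, the whole parsing sequence sits inside a try block so that I/O and parsing exceptions end up in one catch block.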
That completes the main method, so I'll implement the methods that are called when the SAX parser parses the XML document. In this case, the goal is to determine how many <CUSTOMER> elements the document has, so I implement the startElement method like this:
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler {
    public void startElement(String uri, String localName,
        String rawName, Attributes attributes) {
        . . .
    }

    public static void main(String[] args) {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
The startElement method is called each time the SAX parser sees the start of an element, and the endElement method is called when the SAX parser sees the end of an element.
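Since startElement and endElement calls always arrive in matching pairs, you can use them together to track where you are in the document. Here's a minimal sketch that measures the deepest level of element nesting. It assumes the JDK's bundled JAXP parser rather than the Xerces SAXParser class, and the names DepthTracker and deepestNesting are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; assumes the JDK's built-in JAXP parser rather than Xerces classes.
public class DepthTracker extends DefaultHandler {
    int depth = 0;     // number of elements currently open
    int deepest = 0;   // largest value depth has reached so far

    @Override
    public void startElement(String uri, String localName, String rawName, Attributes attributes) {
        depth++;
        deepest = Math.max(deepest, depth);
    }

    @Override
    public void endElement(String uri, String localName, String rawName) {
        depth--;
    }

    // Parses the XML text and returns the deepest level of element nesting.
    public static int deepestNesting(String xml) throws Exception {
        DepthTracker handler = new DepthTracker();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.deepest;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DOCUMENT><CUSTOMER><NAME><LAST_NAME>Smith</LAST_NAME></NAME></CUSTOMER></DOCUMENT>";
        System.out.println("Deepest nesting: " + deepestNesting(xml));
    }
}
```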
Note that two element names are passed to the startElement method: localName and rawName. You use the localName argument with namespace processing; this argument holds the name of the element without any namespace prefix. The rawName argument holds the full, qualified name of the element, including any namespace prefix.
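To make the difference between the two names concrete, here's a minimal sketch that records the uri, localName, and rawName arguments for each element in a small document that uses a namespace prefix. It rests on a few assumptions: it uses the JDK's bundled JAXP parser rather than the Xerces SAXParser class, the names NameArguments and describe are hypothetical, and the namespace URI urn:example:customers is invented. Note that a JAXP factory leaves namespace processing off unless you call setNamespaceAware(true).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; assumes the JDK's built-in JAXP parser rather than Xerces classes.
public class NameArguments extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String rawName, Attributes attributes) {
        // Record the namespace URI, the local name, and the qualified (raw) name.
        seen.add(uri + " | " + localName + " | " + rawName);
    }

    // Parses the XML text with namespace processing on and returns one entry per element.
    public static List<String> describe(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
        NameArguments handler = new NameArguments();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<c:CUSTOMER xmlns:c=\"urn:example:customers\"><NAME/></c:CUSTOMER>";
        for (String entry : describe(xml)) {
            System.out.println(entry);
        }
    }
}
```

For the prefixed element, localName holds CUSTOMER and rawName holds c:CUSTOMER. For the unprefixed NAME element, the two names are identical.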
We're just going to count the number of <CUSTOMER> elements, so I'll take a look at the element's rawName argument. If that argument equals "CUSTOMER", I'll increment a variable named customerCount:
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler {
    int customerCount = 0;

    public void startElement(String uri, String localName,
        String rawName, Attributes attributes) {
        if (rawName.equals("CUSTOMER")) {
            customerCount++;
        }
    }

    public static void main(String[] args) {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
How do you know when you've reached the end of the document and there are no more <CUSTOMER> elements to count? You use the endDocument method, which is called when the end of the document is reached. I'll display the number of tallied <CUSTOMER> elements in that method:
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser;

public class FirstParserSAX extends DefaultHandler {
    int customerCount = 0;

    public void startElement(String uri, String localName,
        String rawName, Attributes attributes) {
        if (rawName.equals("CUSTOMER")) {
            customerCount++;
        }
    }

    public void endDocument() {
        System.out.println("The document has "
            + customerCount + " <CUSTOMER> elements.");
    }

    public static void main(String[] args) {
        try {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            parser.parse(args[0]);
        }
        catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
After compiling this program, you can run it like this:
%java FirstParserSAX customer.xml
The document has 3 <CUSTOMER> elements.
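If you don't have the org.apache.xerces.parsers.SAXParser class on your classpath, the same counting technique can be sketched with the JAXP parser that comes with the JDK. This is a variant under stated assumptions, not the program developed above: the class name CustomerCounter and the method countCustomers are made up, and the document is a shortened inline stand-in for customer.xml so that the example runs without any files.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the chapter's program; assumes the JDK's built-in JAXP parser.
public class CustomerCounter extends DefaultHandler {
    int customerCount = 0;

    // Called at the start of every element; tally only <CUSTOMER> elements.
    @Override
    public void startElement(String uri, String localName, String rawName, Attributes attributes) {
        if (rawName.equals("CUSTOMER")) {
            customerCount++;
        }
    }

    // Called once, after the whole document has been read.
    @Override
    public void endDocument() {
        System.out.println("The document has " + customerCount + " <CUSTOMER> elements.");
    }

    // Parses the XML text and returns the number of <CUSTOMER> elements.
    public static int countCustomers(String xml) throws Exception {
        CustomerCounter handler = new CustomerCounter();
        // This parse overload registers the handler for content and error events.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.customerCount;
    }

    public static void main(String[] args) throws Exception {
        // A shortened stand-in for customer.xml.
        countCustomers("<DOCUMENT><CUSTOMER/><CUSTOMER/><CUSTOMER/></DOCUMENT>");
    }
}
```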
And that's all it takes to get started with SAX.