XmlReader

The .NET Framework does not provide a SAX parser implementation. While the SAX API is the de facto standard for event-driven XML processing in Java, Microsoft has chosen a fundamentally different approach for .NET.

As a SAX parser processes XML input, application callbacks are used to signal events such as the start or end of an XML element; this approach is considered a push model.

The .NET approach, based on the abstract System.Xml.XmlReader class, provides a pull model, wherein the application invokes XmlReader members to control how and when the parser progresses through the XML document. This is analogous to a forward-only cursor that provides read-only access to the XML source. As with SAX, XmlReader is a noncaching parser.

The .NET Framework provides three concrete implementations of the XmlReader class: XmlTextReader, XmlValidatingReader, and XmlNodeReader. All classes are members of the System.Xml namespace.

XmlTextReader

The XmlTextReader class provides the most direct .NET alternative to a Java nonvalidating SAXParser. The XmlTextReader ensures that an XML document is well formed but will not perform validation against a DTD or an XML schema.

The XmlTextReader is a concrete implementation of the abstract XmlReader class but also provides a number of nonoverridden members, which we highlight as they are discussed.

Opening an XML Source

The XmlTextReader class provides a set of overloaded constructors offering flexibility for specifying the source of the XML document to be parsed. For example, the following statement creates a new XmlTextReader using the SomeXmlFile.xml file located in the assembly directory as a source:

XmlTextReader xmlReader = new XmlTextReader("SomeXmlFile.xml");

Alternatively, if the XML data were contained in a String variable named SomeXml, we could use a StringReader as the XmlTextReader source, as in the following example:

XmlTextReader xmlReader = new XmlTextReader(new StringReader(SomeXml));

The principal XmlTextReader constructors are summarized in Table 11-3.

Table 11-3. The Principal XmlTextReader Constructors

| Constructor | Comment |
| --- | --- |
| XmlTextReader(Stream) | Creates an XmlTextReader pulling XML from a System.IO.Stream derivative such as FileStream, MemoryStream, or NetworkStream. Streams are discussed in Chapter 10. |
| XmlTextReader(String) | Creates an XmlTextReader pulling XML from a file with the specified URL. |
| XmlTextReader(TextReader) | Creates an XmlTextReader pulling XML from a System.IO.TextReader such as StreamReader or StringReader. Readers are discussed in Chapter 10. |

Following creation, the XmlTextReader cursor is positioned before the first XML node in the source.
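
As a minimal sketch of the Stream and TextReader constructors, the following program feeds the same in-memory document to an XmlTextReader in both ways. The class name SourceDemo, the helper FirstElementName, and the sample XML are illustrative assumptions and do not come from the original text.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class SourceDemo {

    // Hypothetical helper: advances the reader to the first content
    // node and returns that node's name.
    public static string FirstElementName(XmlTextReader rdr) {
        rdr.MoveToContent();
        string name = rdr.Name;
        rdr.Close();
        return name;
    }

    public static void Main() {
        string someXml = "<catalog><book/></catalog>";

        // TextReader source
        XmlTextReader fromText =
            new XmlTextReader(new StringReader(someXml));

        // Stream source built from the same characters
        byte[] bytes = Encoding.UTF8.GetBytes(someXml);
        XmlTextReader fromStream =
            new XmlTextReader(new MemoryStream(bytes));

        Console.WriteLine(FirstElementName(fromText));
        Console.WriteLine(FirstElementName(fromStream));
    }
}
```

Both calls should report the same root element name, because only the source of the characters differs.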

XmlTextReader Properties

The XmlTextReader class exposes properties that both control the behavior of the reader and give the programmer access to the state of the reader; these properties are discussed in the following sections.

Reader state

The XmlTextReader.ReadState property provides read-only access to the current state of the XmlTextReader. Upon creation, the XmlTextReader has a state of Initial. The state changes to Interactive once read operations are performed. The XmlTextReader will maintain an Interactive state until the end of the input file is reached or an error occurs. The ReadState property returns one of the following values from the System.Xml.ReadState enumeration, listed in Table 11-4.

Table 11-4. The System.Xml.ReadState Enumeration

| Value | Comment |
| --- | --- |
| Closed | The XmlTextReader has been closed using the Close method. |
| EndOfFile | The end of the input source has been reached. This state can also be tested using the XmlTextReader.EOF property. |
| Error | An error has occurred that prevents further read operations. |
| Initial | The XmlTextReader has been created, but no read operations have been called. |
| Interactive | Read operations have been called at least once, and further read operations can be attempted. |

If a source stream contains more than one XML document, the ResetState method must be used to reinitialize the XmlTextReader before parsing the second and each subsequent document; the ResetState method sets the ReadState property to ReadState.Initial.
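
The state transitions described in this section can be observed with a short sketch. The class name ReadStateDemo, the helper Trace, and the sample XML are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public class ReadStateDemo {

    // Hypothetical helper: records the ReadState at four points in
    // the life of a reader over the supplied XML.
    public static ReadState[] Trace(string xml) {
        ReadState[] states = new ReadState[4];
        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));

        states[0] = rdr.ReadState;   // nothing read yet
        rdr.Read();
        states[1] = rdr.ReadState;   // after the first read
        while (rdr.Read()) { }       // consume the rest of the input
        states[2] = rdr.ReadState;   // input exhausted
        rdr.Close();
        states[3] = rdr.ReadState;   // after Close

        return states;
    }

    public static void Main() {
        foreach (ReadState state in Trace("<root><child/></root>")) {
            Console.WriteLine(state);
        }
    }
}
```

For a well-formed document the sequence should be Initial, Interactive, EndOfFile, and Closed, matching the values in Table 11-4.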

Controlling parsing behavior

The XmlTextReader has a number of properties that control the way XML files are parsed. Table 11-5 summarizes these properties.

Table 11-5. The XmlTextReader Properties

| Property | Comments |
| --- | --- |
| Namespaces | Controls whether the XmlTextReader supports namespaces in accordance with the W3C "Namespaces in XML" recommendation. The default value is true. |
| WhitespaceHandling | Controls how the XmlTextReader handles white space. The property must be set to a value of the System.Xml.WhitespaceHandling enumeration: All returns Whitespace and SignificantWhitespace nodes; None returns no Whitespace or SignificantWhitespace nodes; Significant returns SignificantWhitespace nodes only. The Whitespace and SignificantWhitespace node types are described later in the Working with XML Nodes section. The default value is All. Not inherited from XmlReader. |
| Normalization | Controls whether the XmlTextReader normalizes white space and attribute values in accordance with the "Attribute-Value Normalization" section of the W3C XML 1.0 specification. The default value is false. Not inherited from XmlReader. |
| XmlResolver | Controls the System.Xml.XmlResolver to use for resolving DTD references. By default, an instance of the System.Xml.XmlUrlResolver is used. Not inherited from XmlReader. |

The Namespaces property must be set before the first read operation (when the ReadState property is ReadState.Initial), or an InvalidOperationException will be thrown; the other properties can be set at any time while the XmlTextReader is not in a ReadState.Closed state and will affect future read operations.
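
The effect of the WhitespaceHandling property can be sketched by counting the nodes a reader returns under different settings. The class name WhitespaceDemo, the helper CountNodes, and the sample document are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Xml;

public class WhitespaceDemo {

    // Hypothetical helper: counts the nodes returned by Read for
    // the given WhitespaceHandling setting.
    public static int CountNodes(string xml, WhitespaceHandling handling) {
        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));
        rdr.WhitespaceHandling = handling;

        int count = 0;
        while (rdr.Read()) {
            count++;
        }
        rdr.Close();
        return count;
    }

    public static void Main() {
        // The line breaks and indentation become Whitespace nodes
        string xml = "<root>\n  <a/>\n</root>";

        Console.WriteLine("All  : {0}",
            CountNodes(xml, WhitespaceHandling.All));
        Console.WriteLine("None : {0}",
            CountNodes(xml, WhitespaceHandling.None));
    }
}
```

With the All setting the two runs of white space between the tags are returned as nodes; with None they are dropped, so the second count should be smaller by two.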

Working with XML Nodes

The XML source represents a hierarchy of nodes that the XmlTextReader retrieves sequentially. Progress through the XML is analogous to the use of a cursor that moves through the XML nodes. The node currently under the cursor is the current node.

The XmlTextReader exposes information about the current node through the properties of the XmlTextReader instance, although not all properties apply to all node types.

Node types

As the XmlTextReader reads a node, it identifies the node type. Each node type is assigned a value from the System.Xml.XmlNodeType enumeration. The XmlTextReader.NodeType property returns the type of the current node.

The node types include those defined in the W3C "DOM Level 1 Core" specification and five nonstandard extension types added by Microsoft. Node type values that can be returned by XmlTextReader.NodeType include the following: Attribute, CDATA, Comment, DocumentType, Element, EndElement, EntityReference, None, ProcessingInstruction, SignificantWhitespace, Text, Whitespace, and XmlDeclaration. The five node types that are not defined in DOM (None, EndElement, SignificantWhitespace, Whitespace, and XmlDeclaration) are summarized in Table 11-6.

Table 11-6. The Microsoft-Specific Node Types

| XmlNodeType | Description |
| --- | --- |
| None | Indicates that there is no current node. Either no read operations have been executed or the end of the XML input has been reached. |
| EndElement | Represents the end tag of an XML element, for example </book>. |
| SignificantWhitespace | Represents white space between markup in a mixed content model or white space within an xml:space='preserve' scope. |
| Whitespace | Represents white space between markup. |
| XmlDeclaration | Represents the declaration node <?xml version="1.0"...>. |

Node names

The Name and LocalName properties of the XmlTextReader return names of the current node. The Name property returns the qualified node name, including any namespace prefix. The LocalName property returns the node name with any namespace prefix stripped off. The name returned by the Name and LocalName properties depends on the current node type, summarized by the following list:

  • Attribute. The name of the attribute

  • DocumentType. The document type name

  • Element. The tag name

  • EntityReference. The entity reference name

  • ProcessingInstruction. The processing instruction target

  • XmlDeclaration. The string literal xml

  • Other node types. String.Empty

The XmlTextReader.Prefix property returns the namespace prefix of the current node, or String.Empty if it doesn’t have one.

Node values and contents

The XmlTextReader.Value property returns the text value of the current node. The value of a node depends on the node type and is summarized in the following list:

  • Attribute. The value of the attribute

  • CDATA. The content of the CDATA section

  • Comment. The content of the comment

  • DocumentType. The internal subset

  • SignificantWhitespace. The white space within an xml:space='preserve' scope

  • Text. The content of the text node

  • Whitespace. The white space between markup

  • ProcessingInstruction. The entire content excluding the target

  • XmlDeclaration. The content of the declaration

  • Other node types. String.Empty

The XmlTextReader.HasValue property returns true if the current node is a type that returns a value; otherwise, it returns false.
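
The naming and value properties can be inspected together with a small sketch that prints them for every node. The class name NodeDump, the helper Describe, and the sample document (including the urn:example namespace) are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public class NodeDump {

    // Hypothetical helper: formats the name and value properties
    // of the reader's current node.
    public static string Describe(XmlReader rdr) {
        return String.Format(
            "{0}: Name='{1}' LocalName='{2}' Prefix='{3}' " +
            "HasValue={4} Value='{5}'",
            rdr.NodeType, rdr.Name, rdr.LocalName, rdr.Prefix,
            rdr.HasValue, rdr.Value);
    }

    public static void Main() {
        string xml =
            "<!-- sample --><p:item xmlns:p='urn:example'>text</p:item>";

        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));
        while (rdr.Read()) {
            Console.WriteLine(Describe(rdr));
        }
        rdr.Close();
    }
}
```

For the Element node, Name should include the p prefix while LocalName omits it; for the Comment and Text nodes, the names are empty and HasValue is true.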

Other node properties

Other information about the current node available through XmlTextReader properties is summarized in Table 11-7.

Table 11-7. Other Node Properties Available Through XmlTextReader

| Property | Comments |
| --- | --- |
| AttributeCount | Gets the number of attributes on the current node. Valid node types: Element, DocumentType, XmlDeclaration. |
| BaseURI | Gets a String containing the base Uniform Resource Identifier (URI) of the current node. Valid node types: All. |
| CanResolveEntity | Always returns false for an XmlTextReader. See the Unimplemented Members section later in this chapter for details. |
| Depth | Gets the depth of the current node in the XML source. Valid node types: All. |
| HasAttributes | Returns true if the current node has attributes. Will always return false for node types other than Element, DocumentType, and XmlDeclaration. |
| IsEmptyElement | Returns true if the current node is an empty Element type ending in /> (for example, <SomeElement/>). For all other node types and nonempty Element nodes, IsEmptyElement returns false. |
| LineNumber | Gets the current line number of the XML source. Line numbers begin at 1. Not inherited from XmlReader; provides implementation of the System.Xml.IXmlLineInfo.LineNumber interface member. Valid node types: All. |
| LinePosition | Gets the current line position of the XML source. Line positions begin at 1. Not inherited from XmlReader; provides implementation of the System.Xml.IXmlLineInfo.LinePosition interface member. Valid node types: All. |
| NamespaceURI | Gets the namespace URI of the current node. Valid node types: Element and Attribute. |
| QuoteChar | Gets the quotation mark character used to enclose the value of an Attribute node. For nonattribute nodes, QuoteChar always returns a double quotation mark ("). |

Read operations

Operations that change the location of the cursor are collectively referred to as read operations. All read operations move the cursor relative to the current node. Note that while attributes are nodes, they are not returned as part of the normal node stream and never become the current node using the read operations discussed in this section; accessing attribute nodes is covered in the following section.

The simplest cursor operation is the XmlTextReader.Read method, which attempts to move the cursor to the next node in the XML source and returns true if successful. If there are no further nodes, Read returns false. The following code fragment visits every node in an XmlTextReader and displays the node name on the console:

XmlTextReader rdr = new XmlTextReader("MyXmlFile.xml");
while (rdr.Read()) {
    System.Console.WriteLine("Inspecting node : {0}", rdr.Name);
}

Using the Read method is the simplest way to process the nodes in an XML document but is often not the desired behavior because all nodes, including noncontent and EndElement nodes, are returned.

The MoveToContent method determines whether the current node is a content node. Content nodes include the following node types: Text, CDATA, Element, EndElement, EntityReference, and EndEntity. If the current node is a noncontent node, the cursor skips over all nodes until it reaches a content node or the end of the XML source. The MoveToContent method returns the XmlNodeType of the new current node (XmlNodeType.None if the end of the input is reached).

The Skip method causes the cursor to be moved to the next sibling of the current node; all child nodes will be ignored. For nodes with no children, the Skip method is equivalent to calling Read.
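
A minimal sketch of the Skip method follows: it locates a named element, skips that element and its children, and reports the node on which the cursor lands. The class name SkipDemo, the helper NameAfterSkip, and the element names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public class SkipDemo {

    // Hypothetical helper: skips the first element with the given
    // name and returns the name of the node the cursor lands on,
    // or null if the element is never found.
    public static string NameAfterSkip(string xml, string elementToSkip) {
        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));
        string result = null;

        while (rdr.Read()) {
            if (rdr.NodeType == XmlNodeType.Element
                    && rdr.Name == elementToSkip) {
                rdr.Skip();          // ignore all children
                result = rdr.Name;   // now on the next sibling
                break;
            }
        }
        rdr.Close();
        return result;
    }

    public static void Main() {
        string xml =
            "<root><skipme><child/><child/></skipme><keep/></root>";
        Console.WriteLine(NameAfterSkip(xml, "skipme"));
    }
}
```

The child elements are never visited; after the call to Skip the current node should be the keep element.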

The IsStartElement method calls MoveToContent and returns true if the current node is a start tag or an empty element. Overloaded versions of the IsStartElement method support the provision of a name or local name and a namespace URI; if names are specified, the method will return true if the new current node name matches the specified name. The following code fragment demonstrates the use of IsStartElement to display the name of each element start tag:

XmlTextReader rdr = new XmlTextReader("MyXmlFile.xml");
while (rdr.Read()) {
    if (rdr.IsStartElement()) {
        System.Console.WriteLine("Inspecting node : {0}", rdr.Name);
    }
}

The ReadStartElement method calls IsStartElement followed by the Read method; if the result of the initial call to IsStartElement is false, an XmlException is thrown. ReadStartElement provides the same set of overloads as IsStartElement, allowing an element name to be specified.

The ReadEndElement method checks that the current node is an end tag and then advances the cursor to the next node; an XmlException is thrown if the current node is not an end tag.

The XmlTextReader also includes a number of methods to return content from the current node and its descendants; these are summarized in Table 11-8.

Table 11-8. XmlTextReader Methods That Return Content from the Current Node

| Method | Comments |
| --- | --- |
| ReadInnerXml() | Returns a String containing the raw content (including markup) of the current node. The start and end tags are excluded. |
| ReadOuterXml() | The same as ReadInnerXml except that the start and end tags are included. |
| ReadString() | Returns the contents of an element or text node as a String. Nodes other than element and text nodes return String.Empty. The cursor advances past the text that is returned and stops at the first markup it encounters. |
| ReadChars() | Reads the text contents (including markup) of an element node into a specified char array a section at a time and returns the number of characters read. Subsequent calls to ReadChars continue reading from where the previous call finished. ReadChars returns 0 when no more content is available. Nonelement nodes always return 0. The cursor is not moved. |
| ReadBase64() | Like ReadChars but reads and decodes Base64-encoded content. |
| ReadBinHex() | Like ReadChars but reads and decodes BinHex-encoded content. |
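
When the structure of a document is known in advance, ReadStartElement, ReadString, and ReadEndElement can be combined to read it without an explicit loop. The following is a sketch under that assumption; the class name ReadElementsDemo, the helper ReadBook, and the book, title, and price element names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public class ReadElementsDemo {

    // Hypothetical helper: reads a <book> element that contains
    // exactly one <title> followed by one <price>.
    public static string[] ReadBook(string xml) {
        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));

        rdr.ReadStartElement("book");    // throws if not <book>

        rdr.ReadStartElement("title");
        string title = rdr.ReadString(); // text content of <title>
        rdr.ReadEndElement();            // </title>

        rdr.ReadStartElement("price");
        string price = rdr.ReadString(); // text content of <price>
        rdr.ReadEndElement();            // </price>

        rdr.ReadEndElement();            // </book>
        rdr.Close();

        return new string[] { title, price };
    }

    public static void Main() {
        string xml =
            "<book>\n  <title>Dune</title>\n  <price>9.99</price>\n</book>";
        string[] fields = ReadBook(xml);
        Console.WriteLine("{0} costs {1}", fields[0], fields[1]);
    }
}
```

Because these methods call MoveToContent internally, the white space between the elements needs no special handling. If the document does not match the expected structure, an XmlException is thrown.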

Accessing Attributes

Three types of node support attributes: Elements, XmlDeclarations, and DocumentType declarations. The XmlTextReader doesn’t treat attributes as normal nodes; attributes are always read as part of the containing node.

The XmlTextReader class offers two mechanisms to access the attributes of the current node; we discuss both approaches in the following sections.

Direct attribute value access

The attribute values of the current node can be accessed through the XmlTextReader class using both methods and indexers; the attribute is specified either by name or by index position. A URI can be specified for attributes contained in a namespace. The members used to access attribute values are summarized in Table 11-9.

Table 11-9. XmlTextReader Members Used to Access Attribute Values

| Member | Comment |
| --- | --- |
| <XmlTextReader>[int] and GetAttribute(int) | Indexer and equivalent method that get the value of an attribute based on its index. |
| <XmlTextReader>[String] and GetAttribute(String) | Indexer and equivalent method that get the value of an attribute by name. |
| <XmlTextReader>[String, String] and GetAttribute(String, String) | Indexer and equivalent method that get the value of an attribute in a specific namespace by name. |

Attribute node access

The XmlTextReader class provides support for accessing attributes as independent nodes, allowing attribute information to be accessed using the XmlTextReader class properties described earlier in this section.

If the current node has attributes, the XmlTextReader.MoveToFirstAttribute method returns true and moves the cursor to the first attribute of the current node; the first attribute becomes the current node. If the current node has no attributes, the method returns false and the cursor remains unmoved.

Once the cursor is positioned on an attribute node, calling MoveToNextAttribute will move it to the next attribute node. If another attribute exists, the method will return true; otherwise, the method returns false and the position of the cursor remains unchanged. If the current node is not an attribute, but a node with attributes, the MoveToNextAttribute method has the same effect as MoveToFirstAttribute.

The MoveToAttribute method provides three overloads for moving the cursor directly to a specific attribute node. These overloads are summarized in Table 11-10.

Table 11-10. The Overloaded Versions of the MoveToAttribute Method

| Method | Comments |
| --- | --- |
| MoveToAttribute(int) | Moves the cursor to the attribute node at the specified index. If there is no attribute at the specified index, an ArgumentOutOfRangeException is thrown and the cursor is not moved. |
| MoveToAttribute(String) | Moves the cursor to the attribute node with the specified name. This method returns true if the named attribute exists; otherwise, it returns false and the cursor doesn't move. |
| MoveToAttribute(String, String) | Same as MoveToAttribute(String) but also allows a namespace URI to be specified for the target attribute. |

The MoveToElement method moves the cursor back to the node containing the attributes.
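
As a brief sketch of MoveToAttribute and MoveToElement, the following program reads attribute values directly and then by moving the cursor onto a named attribute node. The class name AttributeDemo, the helper ValueViaCursor, and the sample element are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Xml;

public class AttributeDemo {

    // Hypothetical helper: moves the cursor onto the named attribute
    // of the first element, reads its value, and moves back.
    // Returns null if the attribute does not exist.
    public static string ValueViaCursor(string xml, string attName) {
        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));
        rdr.MoveToContent();

        string value = null;
        if (rdr.MoveToAttribute(attName)) {
            value = rdr.Value;       // current node is the attribute
            rdr.MoveToElement();     // back to the owning element
        }
        rdr.Close();
        return value;
    }

    public static void Main() {
        string xml = "<item id='7' name='widget'/>";

        // Direct access by name and by index
        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));
        rdr.MoveToContent();
        Console.WriteLine(rdr.GetAttribute("id"));
        Console.WriteLine(rdr[1]);
        rdr.Close();

        // Node access through the cursor
        Console.WriteLine(ValueViaCursor(xml, "name"));
    }
}
```

Direct access leaves the cursor on the element, whereas node access makes the attribute the current node until MoveToElement is called.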

The following example demonstrates attribute access using both direct and node access. The example class parses an XML file loaded from the http://localhost/test.xml URL and determines whether any element type node contains an attribute named att1. If the att1 attribute is present, the program displays its value on the console; otherwise, the program displays a list of all attributes and values of the current element.

XML Input (assumed to be located at http://localhost/test.xml):

<?xml version='1.0'?>
<root>
    <node1 att1='abc' att2='def'/>
    <node2 att2='ghi' att3='jkl' att4='mno'/>
    <node3 att1='uvw' att2='xyz'/>
</root>

Example code:

using System;
using System.Xml;

public class xmltest {

    public static void Main () {

        String myFile = "http://localhost/test.xml";
        XmlTextReader rdr = new XmlTextReader(myFile);

        while (rdr.Read()) {
            if (rdr.NodeType == XmlNodeType.Element) {
                Console.WriteLine("Inspecting node : {0}",rdr.Name);


                if (rdr["att1"] != null){
                    Console.WriteLine("	att1 =  {0}",rdr["att1"]);
                } else {
                    while(rdr.MoveToNextAttribute()) {
                        Console.WriteLine("	{0} = {1}",
                            rdr.Name, rdr.Value);
                    }
                    rdr.MoveToElement();
                }
            }
        }
    }
}

Output:

Inspecting node : root
Inspecting node : node1
        att1 =  abc
Inspecting node : node2
        att2 = ghi
        att3 = jkl
        att4 = mno
Inspecting node : node3
        att1 =  uvw

Closing an XmlTextReader

The XmlTextReader.GetRemainder method returns a System.IO.TextReader containing the remaining XML from a partially parsed source. Following the GetRemainder call, the XmlTextReader.EOF property returns true.

Instances of XmlTextReader should be closed using the Close method. This releases any resources used while reading and sets the ReadState property to the value ReadState.Closed.

Unimplemented Members

Because XmlTextReader doesn’t validate XML, the IsDefault, CanResolveEntity, and ResolveEntity members inherited from XmlReader exhibit default behavior as described in Table 11-11.

Table 11-11. The Default Behavior of Unimplemented XmlReader Methods in XmlTextReader

| Member | Comments |
| --- | --- |
| IsDefault | Always returns false. XmlTextReader doesn't expand default attributes defined in schemas. |
| CanResolveEntity | Always returns false. XmlTextReader cannot resolve entity references. |
| ResolveEntity() | Throws a System.InvalidOperationException. XmlTextReader cannot resolve general entity references. |

XmlValidatingReader

The XmlValidatingReader class is a concrete implementation of XmlReader that validates an XML source against one of the following:

  • Document type definitions as defined in the W3C Recommendation "Extensible Markup Language (XML) 1.0"

  • MSXML Schema specification for XML-Data Reduced (XDR) schemas

  • XML Schema as defined in the W3C Recommendations "XML Schema Part 0: Primer," "XML Schema Part 1: Structures," and "XML Schema Part 2: Datatypes," collectively referred to as XML Schema Definition (XSD)

The functionality of XmlValidatingReader is predominantly the same as XmlTextReader, described in the "XmlTextReader" section earlier in this chapter. However, XmlValidatingReader includes a number of new members and some members that operate differently than in XmlTextReader; these differences are the focus of this section.

Creating an XmlValidatingReader

The most commonly used XmlValidatingReader constructor takes an XmlReader instance as the source of XML. The following statements demonstrate the creation of an XmlValidatingReader from an XmlTextReader:

XmlTextReader rdr = new XmlTextReader("SomeXmlFile.xml");
XmlValidatingReader vRdr = new XmlValidatingReader(rdr);

Specifying a Validation Type

The XmlValidatingReader.ValidationType property gets and sets the type of validation the reader will perform. This property must be set before execution of the first read operation; otherwise, an InvalidOperationException will be thrown.

The ValidationType property must be set to a value from the ValidationType enumeration; Table 11-12 summarizes the available values.

Table 11-12. The ValidationType Enumeration

| Value | Comments |
| --- | --- |
| Auto | Validates based on the DTD or schema information the parser finds. This is the default value. |
| DTD | Validates according to a DTD. |
| None | Performs no validation. The only benefit of XmlValidatingReader in this mode is that general entity references can be resolved and default attributes are reported. |
| Schema | Validates according to an XSD schema. |
| XDR | Validates according to an XDR schema. |

Validation Events

If the ValidationType is set to Auto, DTD, Schema, or XDR and validation errors occur when parsing an XML document, an XmlSchemaException is thrown and parsing of the current node stops. Parsing cannot be resumed once an error has occurred.

Alternatively, the ValidationEventHandler member of XmlValidatingReader allows the programmer to specify a delegate that is called to handle validation errors, suppressing the exception that would be raised. The arguments of the delegate provide access to information about the severity of the validation error, the exception that would have occurred, and a textual message associated with the error.

Use of the ValidationEventHandler member allows the programmer to determine whether to resume or terminate the parser.
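
The following sketch registers a ValidationEventHandler and counts the errors reported while validating against an inline DTD. The class name ValidationDemo, the helper CountValidationErrors, the callback name, and the sample documents are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationDemo {

    private static int errorCount;

    // Hypothetical helper: validates the XML against the DTD in its
    // DOCTYPE declaration and returns the number of events raised.
    public static int CountValidationErrors(string xml) {
        errorCount = 0;

        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader vRdr = new XmlValidatingReader(rdr);
        vRdr.ValidationType = ValidationType.DTD;

        // With a handler registered, errors are reported through the
        // callback and parsing continues
        vRdr.ValidationEventHandler +=
            new ValidationEventHandler(OnValidationEvent);

        while (vRdr.Read()) { }
        vRdr.Close();
        return errorCount;
    }

    private static void OnValidationEvent(object sender,
        ValidationEventArgs args) {
        errorCount++;
        Console.WriteLine("{0} : {1}", args.Severity, args.Message);
    }

    public static void Main() {
        string dtd = "<!DOCTYPE root [<!ELEMENT root (item)>"
            + "<!ELEMENT item (#PCDATA)>]>";

        Console.WriteLine(CountValidationErrors(
            dtd + "<root><item>ok</item></root>"));
        Console.WriteLine(CountValidationErrors(
            dtd + "<root><bogus/></root>"));
    }
}
```

The first document conforms to the DTD and should raise no events; the second contains an undeclared element, so the handler should be called at least once while the read loop still runs to completion.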

Cached Schemas

The read-only XmlValidatingReader.Schemas property can be used in conjunction with the XmlSchemaCollection class to cache XSD and XDR schemas in memory, saving the reader from having to reload schema files. However, XmlValidatingReader doesn’t automatically cache schemas; any caching must be explicitly performed by the programmer. Once cached, schemas cannot be removed from an XmlSchemaCollection.

The XmlValidatingReader maintains an XmlSchemaCollection that is accessed via the Schemas property; the most common way to add new schema files to the collection is by using the Add method. The XmlSchemaCollection class implements the ICollection and IEnumerable interfaces and provides indexer access to schemas based on a namespace URI.

An important feature of the XmlSchemaCollection is the ValidationEventHandler event; this member is unrelated to the ValidationEventHandler member of the XmlValidatingReader class. This event specifies a method called to handle errors that occur when validating a schema loaded into the collection. XmlSchemaCollection throws an XmlSchemaException if no event handler is specified.

The following example demonstrates the steps necessary to configure the XmlSchemaCollection validation event handler and to cache a schema.

using System;
using System.Xml;
using System.Xml.Schema;

public class schematest {
    public static void Main() {
        // Create the validating reader
        XmlTextReader rdr = new XmlTextReader("MyXmlDocument.xml");
        XmlValidatingReader valRdr = new XmlValidatingReader(rdr);

        // Get the schema collection from the validating reader
        XmlSchemaCollection sCol = valRdr.Schemas;

        // Set the validation event handler for the schema collection
        sCol.ValidationEventHandler +=
            new ValidationEventHandler(ValidationCallBack);

        // Cache a schema in the schema collection
        sCol.Add("urn:mynamespace","myschema.xsd");

    }

    // Create handler for validation events
    public static void ValidationCallBack(object sender,
        ValidationEventArgs args) {
        Console.WriteLine("Schema error : "
            + args.Exception.Message);
    }
}

Differences from XmlTextReader

As mentioned at the start of this section, XmlValidatingReader has a number of new members or members with behavior different from that of the members in XmlTextReader; these members are summarized in Table 11-13.

Table 11-13. Differences Between XmlValidatingReader and XmlTextReader

| Category | Member | Comments |
| --- | --- | --- |
| Different | CanResolveEntity | Always returns true. |
| Different | IsDefault | Returns true if the current node is an attribute whose value was generated from a default specified in a DTD or a schema. |
| Different | LineNumber | While XmlValidatingReader implements the IXmlLineInfo interface, explicit interface implementation has been used to implement the LineNumber property. The XmlValidatingReader must be explicitly cast to an IXmlLineInfo type before LineNumber can be called. |
| Different | LinePosition | Same as LineNumber. |
| Different | ResolveEntity() | This method resolves the entity reference if the current node is an EntityReference. |
| New | Reader | Returns the XmlReader used to instantiate the XmlValidatingReader. |

XmlNodeReader

The System.Xml.XmlNodeReader class is a concrete implementation of XmlReader that provides read-only, forward-only cursor style access to a Document Object Model (DOM) node or subtree.

The XmlNodeReader provides predominantly the same behavior and functionality as the XmlTextReader described in the "XmlTextReader" section earlier in this chapter. However, XmlNodeReader offers a single constructor with the following signature:

public XmlNodeReader(XmlNode node);

The node argument provides the root element of the XmlNodeReader. Given that XmlDocument derives from XmlNode, the XmlNodeReader can be used to navigate a partial or full DOM tree.

The following code fragment demonstrates the creation of an XmlNodeReader using an XmlDocument as the source. We then iterate through the nodes and display the names of any Element type nodes on the console.

XmlDocument doc = new XmlDocument();
doc.Load("SomeXmlFile.xml");
XmlNodeReader rdr = new XmlNodeReader(doc);

while (rdr.Read()) {
    if (rdr.NodeType == XmlNodeType.Element) {
        System.Console.WriteLine("Node name = {0}", rdr.LocalName);
    }
}
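
To illustrate reading a partial DOM tree, the following sketch runs an XmlNodeReader first over a whole document and then over a single child subtree. The class name NodeReaderDemo, the helper ElementNames, and the sample document are illustrative assumptions.

```csharp
using System;
using System.Xml;

public class NodeReaderDemo {

    // Hypothetical helper: returns a comma-separated list of the
    // element names found at or below the given DOM node.
    public static string ElementNames(XmlNode node) {
        XmlNodeReader rdr = new XmlNodeReader(node);
        string names = "";

        while (rdr.Read()) {
            if (rdr.NodeType == XmlNodeType.Element) {
                if (names.Length > 0) {
                    names += ",";
                }
                names += rdr.LocalName;
            }
        }
        rdr.Close();
        return names;
    }

    public static void Main() {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<library><book><title>A</title></book>"
            + "<magazine><issue>7</issue></magazine></library>");

        // Full tree
        Console.WriteLine(ElementNames(doc));

        // Subtree rooted at the last child of the document element
        Console.WriteLine(ElementNames(doc.DocumentElement.LastChild));
    }
}
```

The second call should list only the magazine element and its descendants, because the reader never leaves the subtree it was given.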