System.Xml Document Support

The System.Xml namespace implements a variety of objects that support standards-based XML processing. The XML-specific standards supported by this namespace include XML 1.0, Document Type Definition (DTD) support, XML namespaces, XML schemas, XPath, XSLT, and DOM Level 1 and DOM Level 2 (Core implementations). The System.Xml namespace exposes over 30 separate classes to provide this level of standards compliance.

To generate and navigate XML documents, there are two styles of access:

1. Stream-based: System.Xml exposes a variety of classes that read XML from and write XML to a stream. This approach tends to be a fast way to consume or generate an XML document, because it represents a set of serial reads or writes. The limitation of this approach is that it does not view the XML data as a document composed of tangible entities, such as nodes, elements, and attributes. An example of where a stream could be used is when receiving XML documents from a socket or a file.
2. Document Object Model (DOM)-based: System.Xml exposes a set of objects that access XML documents as data. The data is accessed using entities from the XML document tree (nodes, elements, and attributes). This style of XML generation and navigation is flexible but may not yield the same performance as stream-based XML generation and navigation. This is because loading a document into the DOM loads the entire file into memory. DOM is an excellent technology for editing and manipulating documents. For example, the functionality exposed by DOM could simplify merging your checking, savings, and brokerage accounts.

XML Stream-Style Parsers

Stream-based parsers read a block of XML in a forward-only manner, keeping only the current node in memory. When an XML document is parsed using a stream parser, the parser always points to the current node in the document. To see what these nodes are, look at the following small XML example and at Table 8.2, which lists the nodes a stream parser encounters as it reads the example.

<?xml version="1.0" encoding="utf-8"?>
<FilmOrder filmId="101">
  <Name>Grease</Name>
  <Quantity>10</Quantity>
</FilmOrder>

Table 8.2 Nodes Encountered While Parsing the FilmOrder Example

Node Type         Name or Value
XmlDeclaration    <?xml version="1.0" encoding="utf-8"?>
XmlAttribute      version
XmlAttribute      encoding
XmlElement        FilmOrder
XmlAttribute      filmId
XmlElement        Name
XmlText           Grease
XmlEndElement     Name
XmlElement        Quantity
XmlText           10
XmlEndElement     Quantity
XmlWhitespace     (white space; no name)
XmlEndElement     FilmOrder

The following classes that access a stream of XML (read XML) and generate a stream of XML (write XML) are contained in the System.Xml namespace:

  • XmlWriter—This abstract class specifies a noncached, forward-only stream that writes an XML document (data and schema).
  • XmlReader—This abstract class specifies a noncached, forward-only stream that reads an XML document (data and schema).

The diagram of the classes associated with the XML stream-style parser refers to one other class, XslTransform. This class is found in the System.Xml.Xsl namespace and is not an XML stream-style parser. Rather, it is used in conjunction with XmlWriter and XmlReader. This class is covered in detail later. (XslTransform has been marked obsolete since .NET 2.0; XslCompiledTransform is its replacement.)

The System.Xml namespace exposes a plethora of XML manipulation classes beyond those shown in the architecture diagram. The classes shown in the diagram include the following:

  • XmlResolver—This abstract class resolves an external XML resource using a Uniform Resource Identifier (URI). XmlUrlResolver is an implementation of an XmlResolver.
  • XmlNameTable—This abstract class provides a fast means by which an XML parser can access element or attribute names.
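The XmlNameTable description can be made concrete with a short sketch. It pre-atomizes the name FilmOrder in a NameTable (the concrete XmlNameTable implementation in System.Xml), passes that table to the reader through XmlReaderSettings.NameTable, and then compares element names by object reference rather than by character content. The module name, function name, and sample XML are illustrative assumptions, not part of the book's projects.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module NameTableSketch
    ' Hypothetical helper: counts <FilmOrder> elements using atomized names.
    Function CountFilmOrders(xml As String) As Integer
        Dim names As New NameTable()
        ' Add returns the single shared String instance stored in the table
        Dim filmOrderAtom As String = names.Add("FilmOrder")

        Dim settings As New XmlReaderSettings()
        settings.NameTable = names  ' the reader stores element names in this table

        Dim count As Integer = 0
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            While reader.Read()
                ' Both strings come from the same table, so a reference check suffices
                If reader.NodeType = XmlNodeType.Element AndAlso
                   Object.ReferenceEquals(reader.Name, filmOrderAtom) Then
                    count += 1
                End If
            End While
        End Using
        Return count
    End Function

    Sub Main()
        Dim xml As String = "<Orders><FilmOrder filmId=""101""/><FilmOrder filmId=""102""/></Orders>"
        Console.WriteLine(CountFilmOrders(xml))
    End Sub
End Module
```

The reference comparison avoids a character-by-character string comparison for every node, which is the speed benefit a name table is meant to provide.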

Writing an XML Stream

An XML document can be created programmatically in .NET. One way to perform this task is by writing the individual components of an XML document (schema, attributes, elements, and so on) to an XML stream. Using a unidirectional write-stream means that each element and its attributes must be written in order; the idea is that data is always written at the end of the stream. To accomplish this, you use a writable XML stream class (a class derived from XmlWriter). Such a class ensures that the XML document you generate correctly implements the W3C Extensible Markup Language (XML) 1.0 specification and the Namespaces in XML specification.

Why is this necessary when you have XML serialization? You need to be very careful here to separate interface from implementation. XML serialization works for a specific class, such as the FilmOrder class used in the earlier samples. This class is a proprietary implementation and not the format in which data is exchanged. For this one specific case, the XML document generated when FilmOrder is serialized just so happens to be the XML format used when placing an order for some movies. You can use Source Code Style attributes to help it conform to a standard XML representation of a film order summary, but the eventual structure is tied to that class.

In a different application, if the software used to manage an entire movie distribution business wants to generate movie orders, then it must generate a document of the appropriate form. The movie distribution management software achieves this using the XmlWriter object.

Before reviewing the subtleties of XmlWriter, note that this class exposes over 40 methods and properties. The example in this section provides an overview that touches on a subset of these methods and properties. This subset enables the generation of an XML document that corresponds to a movie order.

The example, located in the FilmOrdersWriter project, builds the module that generates the XML document corresponding to a movie order. It uses an instance of XmlWriter, called FilmOrdersWriter, whose output goes to a file on disk. This means that the XML document generated is streamed directly to this file. Because the FilmOrdersWriter variable is backed by a file, you have to take a few actions against the file. For instance, you have to ensure the file is:

  • Created—The instance of XmlWriter, FilmOrdersWriter, is created by calling the shared XmlWriter.Create method, which applies the properties you have configured on an XmlWriterSettings object.
  • Opened—The file the XML is streamed to, FilmOrdersProgrammatic.xml, is opened by passing the filename to XmlWriter.Create. (XmlWriter is an abstract class, so you obtain an instance through Create rather than through a constructor.)
  • Generated—The process of generating the XML document is described in detail at the end of this section.
  • Closed—The file (the XML stream) is closed using the Close method of XmlWriter or by simply making use of the Using keyword, which ensures that the object is closed at the end of the Using statement.

Before you create the XmlWriter object, you first need to customize how the object operates by using the XmlWriterSettings object. This object, introduced in .NET 2.0, enables you to configure the behavior of the XmlWriter object before you instantiate it, as seen here:

Dim myXmlSettings As New XmlWriterSettings()
myXmlSettings.Indent = True
myXmlSettings.NewLineOnAttributes = True

XmlWriterSettings exposes several properties, such as Indent, IndentChars, NewLineOnAttributes, Encoding, and OmitXmlDeclaration, that define how XML creation will be handled by the XmlWriter object.

Once the XmlWriterSettings object has been instantiated and assigned the values you deem necessary, the next step is to create the XmlWriter object, passing the XmlWriterSettings object to its Create method so that the settings are applied to the new writer.

The basic infrastructure for managing the file (the XML text stream) and applying the settings class is either

Dim FilmOrdersWriter As XmlWriter = _
   XmlWriter.Create("FilmOrdersProgrammatic.xml", myXmlSettings)
' Write the XML document here.
FilmOrdersWriter.Close()

or the following, if you are utilizing the Using keyword, which is the recommended approach:

Using FilmOrdersWriter As XmlWriter = _
   XmlWriter.Create("FilmOrdersProgrammatic.xml", myXmlSettings)
   ' Write the XML document here.
End Using
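Putting the settings and the Create call together, the following sketch writes a small order to an in-memory StringBuilder instead of a file, using four-space indentation and no XML declaration. The module and function names are hypothetical; only the XmlWriterSettings properties and the XmlWriter.Create(StringBuilder, XmlWriterSettings) overload come from the framework.

```vb
Imports System
Imports System.Text
Imports System.Xml

Module WriterSettingsSketch
    ' Hypothetical helper: writes a small order using non-default settings.
    Function BuildOrderXml() As String
        Dim settings As New XmlWriterSettings()
        settings.Indent = True
        settings.IndentChars = "    "        ' four spaces instead of the default two
        settings.OmitXmlDeclaration = True   ' suppress the <?xml ...?> line

        Dim buffer As New StringBuilder()
        Using writer As XmlWriter = XmlWriter.Create(buffer, settings)
            writer.WriteStartElement("FilmOrder")
            writer.WriteAttributeString("FilmId", "101")
            writer.WriteElementString("Title", "Grease")
            writer.WriteEndElement()
        End Using   ' disposing the writer flushes its output into the StringBuilder
        Return buffer.ToString()
    End Function

    Sub Main()
        Console.WriteLine(BuildOrderXml())
    End Sub
End Module
```

Writing to a StringBuilder is convenient for experimenting with settings, because the effect of each property is visible without opening a file.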

With the preliminaries completed, file created, and formatting configured, the process of writing the actual attributes and elements of your XML document can begin. The sequence of steps used to generate your XML document is as follows:

1. Write an XML comment using the WriteComment method. This comment describes where the idea for this XML document originated and generates the following XML:
 <!-- Same as generated by serializing, FilmOrder -->
2. Begin writing the XML element, <FilmOrder>, by calling the WriteStartElement method. You can only begin writing this element, because its attributes and child elements must be written before the element can be ended with a corresponding </FilmOrder>. The XML generated by the WriteStartElement method is as follows:
 <FilmOrder>
3. Write the attributes associated with <FilmOrder> by calling the WriteAttributeString method twice, specifying a different attribute each time. The two calls add attributes to the <FilmOrder> element that is currently being written, which now looks like this:
 <FilmOrder FilmId="101" Quantity="10">
4. Using the WriteElementString method, write the child XML element <Title>. The XML generated by calling this method is as follows:
 <Title>Grease</Title>
5. Complete writing the <FilmOrder> parent XML element by calling the WriteEndElement method. The XML generated by calling this method is as follows:
 </FilmOrder>

The complete code for accomplishing this is shown here (code file: Main.vb):

Imports System.Xml
         
Module Main
    Sub Main()
        Dim myXmlSettings As New XmlWriterSettings
        myXmlSettings.Indent = True
        myXmlSettings.NewLineOnAttributes = True
        Using FilmOrdersWriter As XmlWriter =
            XmlWriter.Create("FilmOrdersProgrammatic.xml", myXmlSettings)
            FilmOrdersWriter.WriteComment(" Same as generated " &
               "by serializing, FilmOrder ")
            FilmOrdersWriter.WriteStartElement("FilmOrder")
            FilmOrdersWriter.WriteAttributeString("FilmId", "101")
            FilmOrdersWriter.WriteAttributeString("Quantity", "10")
            FilmOrdersWriter.WriteElementString("Title", "Grease")
            FilmOrdersWriter.WriteEndElement() ' End  FilmOrder
        End Using
    End Sub
End Module

Once this is run, you will find the XML file FilmOrdersProgrammatic.xml in the folder the application was executed from, which is most likely the bin directory. The content of this file is as follows:

<?xml version="1.0" encoding="utf-8"?>
<!-- Same as generated by serializing, FilmOrder -->
<FilmOrder
  FilmId="101"
  Quantity="10">
  <Title>Grease</Title>
</FilmOrder>

On closer inspection, you should see that the XML document generated by this code is virtually identical to the one produced by the serialization example. Also, notice that in the previous XML document, the <Title> element is indented two characters and that each attribute is on a different line in the document. This formatting was achieved using the Indent and NewLineOnAttributes properties of the XmlWriterSettings class.

The sample application covers only a small portion of the methods and properties exposed by the XML stream-writing class, XmlWriter. Other methods implemented by this class manipulate the underlying file, such as the Flush method; and some methods allow XML text to be written directly to the stream, such as the WriteRaw method.

The XmlWriter class also exposes a variety of methods that write a specific type of XML data to the stream. These methods include WriteBinHex, WriteCData, WriteString, and WriteWhitespace.
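To illustrate the difference between three of these methods, the sketch below writes the same text once with WriteString and once with WriteCData, and then writes a raw fragment with WriteRaw, collecting the output in a StringBuilder. The element names and module name are illustrative assumptions.

```vb
Imports System
Imports System.Text
Imports System.Xml

Module WriteMethodsSketch
    ' Hypothetical helper: contrasts WriteString, WriteCData, and WriteRaw.
    Function BuildNotes() As String
        Dim settings As New XmlWriterSettings()
        settings.OmitXmlDeclaration = True
        Dim buffer As New StringBuilder()
        Using writer As XmlWriter = XmlWriter.Create(buffer, settings)
            writer.WriteStartElement("Notes")

            ' WriteString escapes markup characters such as < and &
            writer.WriteStartElement("Escaped")
            writer.WriteString("Price < $10 & in stock")
            writer.WriteEndElement()

            ' WriteCData wraps the text in a CDATA section, so it is not escaped
            writer.WriteStartElement("CData")
            writer.WriteCData("Price < $10 & in stock")
            writer.WriteEndElement()

            ' WriteRaw copies the text verbatim; the writer neither escapes nor checks it
            writer.WriteRaw("<Raw>written verbatim</Raw>")

            writer.WriteEndElement()  ' </Notes>
        End Using
        Return buffer.ToString()
    End Function

    Sub Main()
        Console.WriteLine(BuildNotes())
    End Sub
End Module
```

Because WriteRaw bypasses the writer's checks, it is the one method here that can produce a malformed document if it is given bad input.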

You can now generate the same XML document in two different ways. You have used two different applications that took two different approaches to generating a document that represents a standardized movie order. The XML serialization approach uses the “shape” of the class to generate XML, whereas the XmlWriter allows you more flexibility in the output, at the expense of more effort.

However, there are even more ways to generate XML, depending on the circumstances. Using the previous scenario, you could receive a movie order from a store, and this order would have to be transformed from the XML format used by the supplier to your own order format.

Reading an XML Stream

In .NET, XML documents can be read from a stream as well. Data is traversed in the stream in order (first XML element, second XML element, and so on). This traversal is very quick because the data is processed in one direction, and features such as writing and moving backward in the traversal are not supported. At any given moment, only data at the current position in the stream can be accessed.

Before exploring how an XML stream can be read, you need to understand why it should be read in the first place. Returning to your movie supplier example, imagine that the application managing the movie orders can generate a variety of XML documents corresponding to current orders, preorders, and returns. All the documents (current orders, preorders, and returns) can be extracted in stream form and processed by a report-generating application. This application prints the orders for a given day, the preorders that are going to be due, and the returns that are coming back to the supplier. The report-generating application processes the data by reading in and parsing a stream of XML.

One class that can be used to read and parse such an XML stream is XmlReader. The .NET Framework includes more specific XML readers, such as XmlTextReader, that are derived from the XmlReader class. XmlTextReader provides the functionality to read XML from a file, a Stream, or a TextReader. This example, found in the FilmOrdersReader project, uses an XmlReader to read an XML document contained in a file. Reading XML from a file and writing it to a file is not the norm when it comes to XML processing, but a file is the simplest way to access XML data. This simplified access enables you to focus on XML-specific issues.

The first step in accessing a stream of XML data is to create an instance of the object that will open the stream. This is accomplished with the following code (code file: Main.vb):

Dim myXmlSettings As New XmlReaderSettings()
Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)

This code creates a new XmlReader, called readMovieInfo, using the specified filename and XmlReaderSettings instance. As with the XmlWriter, the XmlReader also has a settings class. You will use this class a little later.

The basic mechanism for traversing each stream is to move from node to node using the Read method. Node types in XML include element and white space. Numerous other node types are defined, but this example focuses on traversing XML elements and the white space that is used to make the elements more readable (carriage returns, line feeds, and indentation spaces). Once the stream is positioned at a node, the MoveToNextAttribute method can be called to read each attribute contained in an element. The MoveToNextAttribute method only traverses attributes for nodes that contain attributes (nodes of type element). You accomplish this basic node and attribute traversal using the following code (code file: Main.vb):

   While readMovieInfo.Read()
      ' Process node here.
      While readMovieInfo.MoveToNextAttribute()
         ' Process attribute here.
      End While
   End While

This code, which reads the contents of the XML stream, does not utilize any knowledge of the stream's contents. However, a great many applications know exactly how the stream they are going to traverse is structured. Such applications can use XmlReader in a more deliberate manner rather than simply traversing the stream without foreknowledge. For example, they can use the GetAttribute method, as well as the various ReadContentAs and ReadElementContentAs methods, to retrieve content by name rather than just walking through the XML.
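As a sketch of this more deliberate style, assuming orders shaped like the FilmOrder elements used in this chapter, the following function jumps directly to each <FilmOrder> element with ReadToFollowing, reads its filmId attribute with GetAttribute, and converts the <quantity> child with ReadElementContentAsInt. The function name, module name, and sample data are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ReaderByNameSketch
    ' Hypothetical helper: sums the <quantity> values of all <FilmOrder> elements.
    Function TotalQuantity(xml As String) As Integer
        Dim total As Integer = 0
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            ' Jump straight to each <FilmOrder> start tag, wherever it is nested
            While reader.ReadToFollowing("FilmOrder")
                ' Attributes must be read while the reader is still on the element
                Dim filmId As String = reader.GetAttribute("filmId")
                ' Move into the element and stop on its <quantity> child
                If reader.ReadToDescendant("quantity") Then
                    Dim quantity As Integer = reader.ReadElementContentAsInt()
                    Console.WriteLine("Film {0}: {1}", filmId, quantity)
                    total += quantity
                End If
            End While
        End Using
        Return total
    End Function

    Sub Main()
        Dim xml As String = "<FilmOrderList>" &
            "<FilmOrder filmId=""101""><name>Grease</name><quantity>10</quantity></FilmOrder>" &
            "<FilmOrder filmId=""104""><name>Shrek</name><quantity>2</quantity></FilmOrder>" &
            "</FilmOrderList>"
        Console.WriteLine("Total: {0}", TotalQuantity(xml))
    End Sub
End Module
```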

Once the example stream has been read, it can be cleaned up with the End Using statement:

End Using

The complete code for the method that reads the data is shown here (code file: Main.vb):

Private Sub ReadMovieXml(ByVal fileName As String)
   Dim myXmlSettings As New XmlReaderSettings()
   Using readMovieInfo As XmlReader = XmlReader.Create(fileName, _
      myXmlSettings)
      While readMovieInfo.Read()
         ' Process node here.
         ShowXmlNode(readMovieInfo)
         While readMovieInfo.MoveToNextAttribute()
            ' Process attribute here.
            ShowXmlNode(readMovieInfo)
         End While
      End While
   End Using
End Sub

The ReadMovieXml method takes a string parameter that specifies the name of the XML file to be read. For each node encountered after a call to the Read method, ReadMovieXml calls the ShowXmlNode subroutine. Similarly, for each attribute traversed, the ShowXmlNode subroutine is called. The code for the ShowXmlNode method is as follows (code file: Main.vb):

Private Sub ShowXmlNode(ByVal reader As XmlReader)
  If reader.Depth > 0 Then
     For depthCount As Integer = 1 To reader.Depth
        Console.Write(" ")
     Next
  End If
  If reader.NodeType = XmlNodeType.Whitespace Then
     Console.Out.WriteLine("Type: {0} ", reader.NodeType)
  ElseIf reader.NodeType = XmlNodeType.Text Then
     Console.Out.WriteLine("Type: {0}, Value: {1} ", _
                          reader.NodeType, _
                          reader.Value)
  Else
     Console.Out.WriteLine("Name: {0}, Type: {1}, " & _
                          "AttributeCount: {2}, Value: {3} ", _
                          reader.Name, _
                          reader.NodeType, _
                          reader.AttributeCount, _
                          reader.Value)
  End If
End Sub

This subroutine breaks down each node into its subentities:

  • Depth—This property of XmlReader determines the level at which a node resides in the XML document tree. To understand depth, consider the following XML document composed solely of elements:
 <A>
     <B></B>
     <C>
         <D></D>
     </C>
 </A>
Element <A> is the root element, and when parsed would return a depth of 0. Elements <B> and <C> are contained in <A> and hence reflect a depth value of 1. Element <D> is contained in <C>. The Depth property value associated with <D> (depth of 2) should, therefore, be one more than the Depth property associated with <C> (depth of 1).
  • Type—The type of each node is determined using the NodeType property of XmlReader. The value returned is of the enumeration type XmlNodeType. Permissible node types include Attribute, Element, and Whitespace. (Numerous other node types can also be returned, including CDATA, Comment, Document, Entity, and DocumentType.)
  • Name—The name of each node is retrieved using the Name property of XmlReader. The name of the node could be an element name, such as <FilmOrder>, or an attribute name, such as filmId.
  • Attribute Count—The number of attributes associated with a node is retrieved using the AttributeCount property of XmlReader.
  • Value—The value of a node is retrieved using the Value property of XmlReader. For example, the text node inside the <name> element has a value of Grease, whereas the <name> element node itself has an empty value.

The subroutine ShowXmlNode is implemented as follows. Within the ShowXmlNode subroutine, each level of node depth adds one space to the output generated:

If reader.Depth > 0 Then
  For depthCount As Integer = 1 To reader.Depth
    Console.Write(" ")
  Next
End If

You add these spaces in order to create human-readable output (so you can easily determine the depth of each node displayed). For each type of node, ShowXmlNode displays the value of the NodeType property. The ShowXmlNode subroutine makes a distinction between nodes of type Whitespace and other types of nodes. The reason for this is simple: a node of type Whitespace does not contain a name or an attribute count, and its value is just some combination of white-space characters (space, tab, carriage return, and so on). Therefore, it doesn't make sense to display those properties if the NodeType is XmlNodeType.Whitespace. Nodes of type Text have no name associated with them, so for this type, subroutine ShowXmlNode displays only the properties NodeType and Value. For all other node types (including elements and attributes), the Name, AttributeCount, Value, and NodeType properties are displayed.

To finalize this module, add a Sub Main as follows:

Sub Main(ByVal args() As String)
   ReadMovieXml("MovieManage.xml")
End Sub

The MovieManage.xml file, used as input for the example, looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<MovieOrderDump>
  <FilmOrderList>
    <multiFilmOrders>
      <FilmOrder filmId="101">
        <name>Grease</name>
        <quantity>10</quantity>
      </FilmOrder>
      <FilmOrder filmId="102">
        <name>Lawrence of Arabia</name>
        <quantity>10</quantity>
      </FilmOrder>
      <FilmOrder filmId="103">
        <name>Star Wars</name>
        <quantity>10</quantity>
      </FilmOrder>
    </multiFilmOrders>
  </FilmOrderList>
  <PreOrder>
    <FilmOrder filmId="104">
      <name>Shrek III - Shrek Becomes a Programmer</name>
      <quantity>10</quantity>
    </FilmOrder>
  </PreOrder>
  <Returns>
    <FilmOrder filmId="103">
      <name>Star Wars</name>
      <quantity>2</quantity>
    </FilmOrder>
  </Returns>
</MovieOrderDump>

Running this module produces the following output (a partial display, as it would be rather lengthy):

Name: xml, Type: XmlDeclaration, AttributeCount: 2, Value: version="1.0" encoding="utf-8"
 Name: version, Type: Attribute, AttributeCount: 2, Value: 1.0
 Name: encoding, Type: Attribute, AttributeCount: 2, Value: utf-8
Type: Whitespace
Name: MovieOrderDump, Type: Element, AttributeCount: 0, Value:
 Type: Whitespace
 Name: FilmOrderList, Type: Element, AttributeCount: 0, Value:
  Type: Whitespace
  Name: multiFilmOrders, Type: Element, AttributeCount: 0, Value:
   Type: Whitespace
   Name: FilmOrder, Type: Element, AttributeCount: 1, Value:
    Name: filmId, Type: Attribute, AttributeCount: 1, Value: 101
    Type: Whitespace
    Name: name, Type: Element, AttributeCount: 0, Value:
     Type: Text, Value: Grease
    Name: name, Type: EndElement, AttributeCount: 0, Value:
    Type: Whitespace
    Name: quantity, Type: Element, AttributeCount: 0, Value:
     Type: Text, Value: 10
    Name: quantity, Type: EndElement, AttributeCount: 0, Value:
    Type: Whitespace
   Name: FilmOrder, Type: EndElement, AttributeCount: 0, Value:
   Type: Whitespace

This example managed to use three methods and five properties of XmlReader. The output generated was informative but far from practical. XmlReader exposes over 50 methods and properties, which means that you have only scratched the surface of this highly versatile class. The remainder of this section looks at the XmlReaderSettings class, introduces a more realistic use of XmlReader, and demonstrates how the classes of System.Xml handle errors.

The XmlReaderSettings Class

Just as with XmlWriter, the XmlReader.Create method allows you to specify settings to be applied when the object is instantiated. This means that you can provide settings specifying how the XmlReader object behaves when it reads your XML, including settings for dealing with white space, comments, schemas, and other common options. An example of using this settings class to modify the behavior of the XmlReader class is as follows:

Dim myXmlSettings As New XmlReaderSettings()
myXmlSettings.IgnoreWhitespace = True
myXmlSettings.IgnoreComments = True
Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)
   ' Use XmlReader object here.
End Using

In this case, the XmlReader object that is created ignores the white space that it encounters, as well as any of the XML comments. These settings, once established with the XmlReaderSettings object, are then associated with the XmlReader object through its Create method.
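The effect of these two settings is easy to observe by counting the nodes a reader reports for the same small document with and without them. The sketch below is illustrative; the function and module names are assumptions.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ReaderSettingsSketch
    ' Hypothetical helper: counts every node the reader reports.
    Function CountNodes(xml As String, ignoreNoise As Boolean) As Integer
        Dim settings As New XmlReaderSettings()
        settings.IgnoreWhitespace = ignoreNoise
        settings.IgnoreComments = ignoreNoise
        Dim count As Integer = 0
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            While reader.Read()
                count += 1
            End While
        End Using
        Return count
    End Function

    Sub Main()
        Dim xml As String = "<!-- note --><a>" & Environment.NewLine &
                            "  <b/>" & Environment.NewLine & "</a>"
        ' Default: Comment, a, Whitespace, b, Whitespace, EndElement a
        Console.WriteLine("Default settings: {0} nodes", CountNodes(xml, False))
        ' Ignoring white space and comments: a, b, EndElement a
        Console.WriteLine("Ignoring white space and comments: {0} nodes", CountNodes(xml, True))
    End Sub
End Module
```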

Traversing XML Using XmlReader

In cases where the format of the XML is known, the XmlReader can be used to parse the document in a more deliberate manner rather than hitting every node. In the previous section, you implemented a class that serialized arrays of movie orders. The next example, found in the FilmOrdersReader2 project, takes an XML document containing multiple XML documents of that type and traverses them. Each movie order is forwarded to the movie supplier via fax. The general process for traversing this document is outlined by the following pseudocode:

Read root element: <MovieOrderDump>
    Process each <FilmOrderList> element
        Read <multiFilmOrders> element
            Process each <FilmOrder>
                Send fax for each movie order here

The basic outline for the program's implementation is to open a file containing the XML document, parse, then traverse it from element to element as follows (code file: Main.vb):

Dim myXmlSettings As New XmlReaderSettings()
Using readMovieInfo As XmlReader = XmlReader.Create(fileName, myXmlSettings)
      readMovieInfo.Read()
      readMovieInfo.ReadStartElement("MovieOrderDump")
      Do While (True)
         '****************************************************
         '* Process FilmOrder elements here                  *
         '****************************************************
      Loop
      readMovieInfo.ReadEndElement() '  </MovieOrderDump>
End Using

The preceding code opened the file using the Create method of XmlReader, and the End Using statement takes care of shutting everything down for you. The code also introduced two methods of the XmlReader class:

1. ReadStartElement(String)—This verifies that the current content node in the stream (after skipping any white space, comments, or processing instructions) is an element and that the element's name matches the string passed to ReadStartElement. If the verification is successful, the reader is advanced to the next node.
2. ReadEndElement()—This verifies that the current content node is an end tag; if the verification is successful, the reader is advanced to the next node.

The application knows that an element, <MovieOrderDump>, will be found at a specific point in the document. The ReadStartElement method verifies this foreknowledge of the document format. After all the elements contained in element <MovieOrderDump> have been traversed, the stream should point to the end tag </MovieOrderDump>. The ReadEndElement method verifies this.
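The checking performed by ReadStartElement can be seen in isolation with a small sketch: the hypothetical HasRootNamed function below returns False when the document's root element does not have the expected name, because ReadStartElement raises an XmlException in that case. The module name and sample document are assumptions for illustration.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ReadStartElementSketch
    ' Hypothetical helper: True if the root element has the expected name.
    Function HasRootNamed(xml As String, expectedRoot As String) As Boolean
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            Try
                ' Skips the XML declaration and any white space, then checks the name;
                ' throws XmlException on a name mismatch (or on malformed XML)
                reader.ReadStartElement(expectedRoot)
                Return True
            Catch ex As XmlException
                Return False
            End Try
        End Using
    End Function

    Sub Main()
        Dim xml As String = "<?xml version=""1.0""?><MovieOrderDump></MovieOrderDump>"
        Console.WriteLine(HasRootNamed(xml, "MovieOrderDump"))  ' root matches
        Console.WriteLine(HasRootNamed(xml, "FilmOrderList"))   ' root does not match
    End Sub
End Module
```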

The code that traverses each element of type <FilmOrder> similarly uses the ReadStartElement and ReadEndElement methods to indicate the start and end of the <FilmOrder> and <multiFilmOrders> elements. The code that ultimately parses the list of movie orders and then faxes the movie supplier (using the FranticallyFaxTheMovieSupplier subroutine) is as follows (code file: Main.vb):

    Private Sub ReadMovieXml(ByVal fileName As String)
        Dim myXmlSettings As New XmlReaderSettings()
        Dim movieName As String
        Dim movieId As String
        Dim quantity As String
         
        Using readMovieInfo As XmlReader =
            XmlReader.Create(fileName, myXmlSettings)
            'position to first element
            readMovieInfo.Read()
            readMovieInfo.ReadStartElement("MovieOrderDump")
            Do While (True)
                readMovieInfo.ReadStartElement("FilmOrderList")
                readMovieInfo.ReadStartElement("multiFilmOrders")
         
                'for each order
                Do While (True)                    
                    readMovieInfo.MoveToContent()
                    movieId = readMovieInfo.GetAttribute("filmId")
                    readMovieInfo.ReadStartElement("FilmOrder")
         
                    movieName = readMovieInfo.ReadElementString()
                    quantity = readMovieInfo.ReadElementString()
                    readMovieInfo.ReadEndElement() ' clear </FilmOrder>
         
                    FranticallyFaxTheMovieSupplier(movieName, movieId, quantity)
         
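                    ' Skip white space to the next content node; quit
                    ' unless it is another <FilmOrder> element
                    readMovieInfo.MoveToContent()
                    If ("FilmOrder" <> readMovieInfo.Name) Then
                        Exit Do
                    End If
                Loop
                readMovieInfo.ReadEndElement() ' clear </multiFilmOrders>
                readMovieInfo.ReadEndElement() ' clear </FilmOrderList>
                ' Skip white space to the next content node; quit
                ' unless it is another <FilmOrderList> element
                readMovieInfo.MoveToContent()
                If ("FilmOrderList" <> readMovieInfo.Name) Then
                    Exit Do
                End If
            Loop
            ' Skip sections this example does not process
            ' (such as <PreOrder> and <Returns> in MovieManage.xml)
            Do While readMovieInfo.IsStartElement()
                readMovieInfo.Skip()
            Loop
            readMovieInfo.ReadEndElement() '  </MovieOrderDump>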
        End Using
    End Sub

The values are read from the XML file using the ReadElementString and GetAttribute methods. Notice that GetAttribute is called before ReadStartElement("FilmOrder"). This is because ReadStartElement moves the reader past the <FilmOrder> start tag, after which that element's attributes are no longer accessible. The MoveToContent call before the call to GetAttribute ensures that the current read location is on the element, and not on white space.

While parsing the stream, it was known that an element named name existed and that this element contained the name of the movie. Rather than parse the start tag, get the value, and parse the end tag, it was easier to get the data using the ReadElementString method.

The intended output of this example is a fax, which is not implemented in order to focus on XML. The format of the document is still verified by XmlReader as it is parsed.

The XmlReader class also exposes members that provide more insight into the data contained in the XML document and the state of parsing: the IsEmptyElement, EOF, and HasAttributes properties, and the IsStartElement method.
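A minimal sketch of these members, using an illustrative document: the loop below relies on IsStartElement to find start tags (it calls MoveToContent internally), records IsEmptyElement and HasAttributes for each element, and stops when EOF becomes True. The module and function names are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module ReaderStateSketch
    ' Hypothetical helper: describes each element the reader encounters.
    Function DescribeElements(xml As String) As List(Of String)
        Dim result As New List(Of String)()
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            Do While Not reader.EOF
                ' IsStartElement moves to the next content node, then tests for a start tag
                If reader.IsStartElement() Then
                    result.Add(String.Format("{0} empty={1} attributes={2}",
                                             reader.Name, reader.IsEmptyElement,
                                             reader.HasAttributes))
                End If
                reader.Read()
            Loop
        End Using
        Return result
    End Function

    Sub Main()
        Dim xml As String = "<Order id=""7""><Title>Grease</Title><Gift/></Order>"
        For Each line As String In DescribeElements(xml)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

An empty element such as <Gift/> produces no separate EndElement node, which is why IsEmptyElement matters when code pairs start and end tags.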

.NET CLR types do not map one-to-one onto XML Schema types. The .NET Framework includes methods in the XmlReader class that make converting XML content to .NET types easier.

Using the ReadElementContentAs method, you can easily perform the necessary conversion. The method returns an Object, so the result is cast to the requested type, as seen here:

Dim username As String = _
   DirectCast(myXmlReader.ReadElementContentAs(GetType(String), Nothing), String)
Dim myDate As DateTime = _
   DirectCast(myXmlReader.ReadElementContentAs(GetType(DateTime), Nothing), DateTime)

In addition to the general ReadElementContentAs method, there are specific ReadElementContentAsXxx methods for the common data types (for example, ReadElementContentAsInt, ReadElementContentAsDouble, and ReadElementContentAsDateTime). The raw XML associated with the document can also be retrieved, using ReadInnerXml and ReadOuterXml. Again, this only scratches the surface of the XmlReader class, a class quite rich in functionality.
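The following sketch reads a small, hypothetical order document with the typed methods mentioned above: ReadElementContentAsInt, ReadElementContentAsDateTime, the general ReadElementContentAs, and finally ReadOuterXml to capture one element's markup. It assumes the child elements appear in exactly this order with no intervening white space, which is the kind of foreknowledge this style of reading depends on. The OrderInfo class and the module name are illustrative assumptions.

```vb
Imports System
Imports System.IO
Imports System.Xml

' Hypothetical container for the values read from one order.
Public Class OrderInfo
    Public Quantity As Integer
    Public Due As DateTime
    Public Note As String
    Public ItemXml As String
End Class

Module TypedReadSketch
    Function ReadOrder(xml As String) As OrderInfo
        Dim info As New OrderInfo()
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            reader.ReadStartElement("Order")                   ' now on <Quantity>
            info.Quantity = reader.ReadElementContentAsInt()   ' text -> Integer
            info.Due = reader.ReadElementContentAsDateTime()   ' xs:dateTime text -> DateTime
            ' General form: target type plus an optional namespace resolver
            info.Note = DirectCast(reader.ReadElementContentAs(GetType(String), Nothing), String)
            ' ReadOuterXml returns the current element's markup, tags included
            info.ItemXml = reader.ReadOuterXml()
        End Using
        Return info
    End Function

    Sub Main()
        Dim xml As String = "<Order><Quantity>10</Quantity><Due>2024-01-15T00:00:00</Due>" &
                            "<Note>Rush</Note><Item id=""1"">Grease</Item></Order>"
        Dim info As OrderInfo = ReadOrder(xml)
        Console.WriteLine("{0} due {1:yyyy-MM-dd}: {2}", info.Quantity, info.Due, info.Note)
        Console.WriteLine(info.ItemXml)
    End Sub
End Module
```

If an element's text cannot be converted (for example, letters inside <Quantity>), the typed methods raise an exception instead of returning a default value.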

Handling Exceptions

XML is text and could easily be read using mundane methods such as Read and ReadLine. A key feature of each class that reads and traverses XML is inherent support for error detection and handling. To demonstrate this, consider the following malformed XML document found in the file named Malformed.xml, also included in the FilmOrdersReader2 project:

<?xml version="1.0" encoding="IBM437" ?>
<FilmOrder FilmId="101", Qty="10">
   <Name>Grease</Name>
<FilmOrder>

This document may not immediately appear to be malformed. To see what type of exception is raised when XmlReader detects the malformed XML in this document, wrap the call to the method you developed (ReadMovieXml) in a Try...Catch block in Sub Main, as shown here. Comment out the line that reads MovieManage.xml, and uncomment the line that reads Malformed.xml:

Try
    'ReadMovieXml("MovieManage.xml")
    ReadMovieXml("Malformed.xml")
Catch xmlEx As XmlException
    Console.Error.WriteLine("XML Error: " & xmlEx.ToString())
Catch ex As Exception
    Console.Error.WriteLine("Some other error: " & ex.ToString())
End Try

The methods and properties exposed by the XmlReader class report parsing errors by raising exceptions of type System.Xml.XmlException, and the same exception type is used for well-formedness errors throughout the System.Xml namespace. (Some related classes use their own exception types, such as XmlSchemaException for schema errors and XsltException for XSLT errors.) Although this is a discussion of errors using an instance of type XmlReader, the concepts reviewed apply to errors generated by the other classes found in the System.Xml namespace. XmlException extends the basic Exception class with information about the location of the error within the XML file.

The error displayed when subroutine ReadMovieXml processes Malformed.xml is as follows:

XML Error: System.Xml.XmlException: The ',' character, hexadecimal value 0x2C,
 cannot begin a name. Line 2, position 49.

The preceding output indicates that a comma separates the attributes in the element <FilmOrder FilmId="101", Qty="10">. This comma is invalid, because XML attributes are separated only by whitespace. Removing it and running the code again results in the following output:

XML Error: System.Xml.XmlException: This is an unexpected token. Expected
'EndElement'. Line 5, position 27.

Again, you can recognize the precise error. In this case, the document contains an opening <FilmOrder> element but no matching </FilmOrder> end element; the last line holds a second opening tag instead.

The properties provided by the XmlException class (such as LineNumber, LinePosition, and Message) provide a useful level of precision when tracking down errors. The reader itself can report the same kind of position information while parsing: readers that implement the IXmlLineInfo interface, such as the readers returned by XmlReader.Create, expose LineNumber and LinePosition properties for the current node.
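As a hedged sketch of both sources of position information, the following module parses a string containing the same comma error as Malformed.xml and reports the XmlException location, then walks a well-formed string and asks the reader for each element's line through IXmlLineInfo. The names LineInfoSketch, DescribeError, and ElementLines are illustrative only, and the TryCast guards against a reader that does not implement the interface.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module LineInfoSketch
    ' Parses the whole string; on failure, reports where XmlException locates the error.
    Function DescribeError(xml As String) As String
        Try
            Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
                While reader.Read()
                End While
            End Using
            Return "well-formed"
        Catch ex As XmlException
            Return "Line " & ex.LineNumber.ToString() & ", position " &
                   ex.LinePosition.ToString() & ": " & ex.Message
        End Try
    End Function

    ' Lists each element with the line number the reader reports through IXmlLineInfo.
    Function ElementLines(xml As String) As List(Of String)
        Dim result As New List(Of String)
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            Dim info As IXmlLineInfo = TryCast(reader, IXmlLineInfo)
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element AndAlso
                   info IsNot Nothing AndAlso info.HasLineInfo() Then
                    result.Add(reader.Name & " @ line " & info.LineNumber.ToString())
                End If
            End While
        End Using
        Return result
    End Function

    Sub Main()
        ' Same comma problem as Malformed.xml, kept in a string for convenience.
        Console.WriteLine(DescribeError("<FilmOrder FilmId=""101"", Qty=""10""><Name>Grease</Name></FilmOrder>"))

        Dim good As String = "<FilmOrder>" & Environment.NewLine &
                             "  <Name>Grease</Name>" & Environment.NewLine & "</FilmOrder>"
        For Each line As String In ElementLines(good)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```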

Document Object Model (DOM)

The Document Object Model (DOM) is a logical view of an XML file. Within the DOM, an XML document is contained in a class named XmlDocument. Each node within this document is accessible and managed using XmlNode. Nodes can also be accessed and managed using a class designed to process a specific node type (XmlElement, XmlAttribute, and so on). An XmlDocument can be written out, using its Save method, to an XmlWriter, a TextWriter, a Stream, or a file specified by a filename of type String. XML documents are read into an XmlDocument, using its Load method, from the same kinds of sources: an XmlReader, a TextReader, a Stream, or a filename.
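A minimal sketch of these load and save mechanisms, assuming in-memory data: the document is loaded from a TextReader, saved to a TextWriter and to a Stream, and the Stream is then read back into a second XmlDocument. The names LoadSaveSketch and RoundTrip are hypothetical; the filename overloads (for example, doc.Save with a path string) behave the same way but touch the disk.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module LoadSaveSketch
    ' Loads from a TextReader, saves to a TextWriter and to a Stream,
    ' then reloads the Stream into a second XmlDocument.
    Function RoundTrip() As String
        Dim doc As New XmlDocument()
        doc.Load(New StringReader("<FilmOrder FilmId=""101""><Name>Grease</Name></FilmOrder>"))

        Using writer As New StringWriter()
            doc.Save(writer)                 ' TextWriter target
            Console.WriteLine(writer.ToString())
        End Using

        Using stream As New MemoryStream()
            doc.Save(stream)                 ' Stream target
            stream.Position = 0              ' rewind before reading it back
            Dim copy As New XmlDocument()
            copy.Load(stream)                ' Stream source
            Return copy.DocumentElement.GetAttribute("FilmId")
        End Using
    End Function

    Sub Main()
        Console.WriteLine(RoundTrip())
    End Sub
End Module
```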

A DOM-style parser differs from a stream-style parser with respect to movement. Using the DOM, the nodes can be traversed forward and backward; and nodes can be added to the document, removed from the document, and updated. However, this flexibility comes at a performance cost, because the entire document is read into memory. Reading or writing XML with the stream-style classes is faster.

The DOM-specific classes exposed by System.Xml include the following:

  • XmlDocument—Corresponds to an entire XML document. A document is loaded using the Load or LoadXml methods. The Load method loads the XML from a file (the filename specified as type String), a Stream, a TextReader, or an XmlReader; the LoadXml method loads it from a string that contains the XML document itself. The Save method is used to save XML documents. The methods exposed by XmlDocument reflect the intricate manipulation of an XML document. For example, the following creation methods are implemented by this class: CreateAttribute, CreateCDataSection, CreateComment, CreateDocumentFragment, CreateDocumentType, CreateElement, CreateEntityReference, CreateNavigator, CreateNode, CreateProcessingInstruction, CreateSignificantWhitespace, CreateTextNode, CreateWhitespace, and CreateXmlDeclaration. The elements contained in the document can be retrieved by tag name (GetElementsByTagName) or by ID (GetElementById). Other methods support the retrieving, importing, cloning, loading, and writing of nodes.
  • XmlNode—Corresponds to a node within the DOM tree. This is the base class for the other node type classes. A robust set of methods and properties is provided to create, delete, and replace nodes, and the tree around a node can be traversed using properties such as FirstChild, LastChild, NextSibling, ParentNode, and PreviousSibling.
  • XmlElement—Corresponds to an element within the DOM tree. This class adds a variety of methods for manipulating an element's attributes, such as GetAttribute, SetAttribute, HasAttribute, and RemoveAttribute.
  • XmlAttribute—Corresponds to an attribute of an element (XmlElement) within the DOM tree. An attribute holds a name and a value rather than child elements, so it is a simpler object than an XmlElement (XmlAttribute itself derives from XmlNode). An XmlAttribute can retrieve its owner document (property, OwnerDocument), its owner element (property, OwnerElement), and its name (property, Name). It also inherits the ParentNode property, but for an attribute this always returns Nothing; use OwnerElement to reach the element that holds it. The value of an XmlAttribute is available via a read-write property named Value.

Given the diverse number of methods and properties exposed by XmlDocument, XmlNode, XmlElement, and XmlAttribute (and there are many more than those listed here), it's clear that any XML 1.0-compliant document can be generated and manipulated using these classes. In comparison to their XML stream counterparts, these classes allow more flexible movement within an XML document and support editing it.
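The sketch below exercises a few of the XmlNode, XmlElement, and XmlAttribute members described above on a one-order document assumed for illustration (NodeClassesSketch and InspectOrder are hypothetical names). It moves around the tree with XmlNode properties, adds and removes an attribute through XmlElement, and edits an XmlAttribute through its Value property.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module NodeClassesSketch
    Function InspectOrder() As List(Of String)
        Dim report As New List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml("<FilmOrder FilmId=""101""><Name>Grease</Name><Quantity>10</Quantity></FilmOrder>")

        ' XmlNode navigation: down, sideways, back, and up the tree.
        Dim root As XmlNode = doc.DocumentElement
        Dim qtyNode As XmlNode = root.FirstChild.NextSibling
        report.Add(qtyNode.PreviousSibling.Name)    ' back to <Name>
        report.Add(qtyNode.ParentNode.Name)         ' up to <FilmOrder>

        ' XmlElement helpers for attributes.
        Dim order As XmlElement = doc.DocumentElement
        order.SetAttribute("Qty", "10")
        order.RemoveAttribute("Qty")
        report.Add(order.HasAttribute("Qty").ToString())

        ' XmlAttribute: Name, read-write Value, and its owner element.
        Dim idAttr As XmlAttribute = order.GetAttributeNode("FilmId")
        idAttr.Value = "102"
        report.Add(idAttr.Name & "=" & idAttr.Value & " on " & idAttr.OwnerElement.Name)
        report.Add((idAttr.ParentNode Is Nothing).ToString())   ' attributes have no parent
        Return report
    End Function

    Sub Main()
        For Each line As String In InspectOrder()
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

InspectOrder returns Name, FilmOrder, False, FilmId=102 on FilmOrder, and True, in that order; the last entry confirms that an attribute's ParentNode is Nothing.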

A similar comparison could be made between DOM and data serialized and deserialized using XML. Using serialization, the type of node (for example, attribute or element) and the node name are specified at compile time. There is no on-the-fly modification of the XML generated by the serialization process.
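To make the contrast concrete, here is a minimal XmlSerializer sketch; the FilmOrder class and its two properties are hypothetical. The attributes on the properties fix, at compile time, that FilmId is written as an id attribute and Title as a title element. Changing that layout means changing the class and recompiling, whereas the DOM could rearrange nodes at run time.

```vb
Imports System
Imports System.IO
Imports System.Xml.Serialization

' Hypothetical class: the attributes fix each member's node type and name at compile time.
Public Class FilmOrder
    <XmlAttribute("id")> Public Property FilmId As Integer
    <XmlElement("title")> Public Property Title As String
End Class

Module SerializationSketch
    Function SerializeOrder(order As FilmOrder) As String
        Dim serializer As New XmlSerializer(GetType(FilmOrder))
        Using writer As New StringWriter()
            serializer.Serialize(writer, order)
            Return writer.ToString()
        End Using
    End Function

    Sub Main()
        Console.WriteLine(SerializeOrder(New FilmOrder With {.FilmId = 101, .Title = "Grease"}))
    End Sub
End Module
```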

DOM Traversing XML

The first DOM example, located in the DomReading project, loads an XML document into an XmlDocument object using a string that contains the actual XML document. The example over the next few pages simply traverses each XML element (XmlNode) in the document (XmlDocument) and displays the data to the console. The data associated with this example is contained in a variable, rawData, which is initialized as follows:

Dim rawData  =
   <multiFilmOrders>
      <FilmOrder>
         <name>Grease</name>
         <filmId>101</filmId>
         <quantity>10</quantity>
      </FilmOrder>
      <FilmOrder>
         <name>Lawrence of Arabia</name>
         <filmId>102</filmId>
         <quantity>10</quantity>
      </FilmOrder>
   </multiFilmOrders>

The XML document in rawData is a portion of the XML hierarchy associated with a movie order. Notice the lack of quotation marks around the XML: this is an XML literal. XML literals allow you to insert a block of XML directly into your VB source code, and are covered a little later in this chapter. They can span a number of lines, and the compiler turns them into XML objects (here, an XElement), so they can stand in for XML you might otherwise load from a file. That is why the code that follows calls rawData.ToString() before passing the XML to LoadXml.

The basic idea in processing this data is to traverse each <FilmOrder> element in order to display the data it contains. Each node corresponding to a <FilmOrder> element can be retrieved from your XmlDocument using the GetElementsByTagName method (specifying a tag name of FilmOrder). The GetElementsByTagName method returns a list of XmlNode objects in the form of a collection of type XmlNodeList. Using a For Each loop, this XmlNodeList (movieOrderNodes) can be traversed one XmlNode (movieOrderNode) at a time. The general code for handling this is as follows:

Dim xmlDoc As New XmlDocument
Dim movieOrderNodes As XmlNodeList
Dim movieOrderNode As XmlNode
xmlDoc.LoadXml(rawData.ToString())
' Traverse each <FilmOrder>
movieOrderNodes = xmlDoc.GetElementsByTagName("FilmOrder")
For Each movieOrderNode In movieOrderNodes
    '**********************************************************
    ' Process <name>, <filmId> and <quantity> here
    '**********************************************************
Next

Each XmlNode can then have its contents displayed by traversing the children of this node through its ChildNodes property. This property returns an XmlNodeList (baseDataNodes) that can be traversed one XmlNode at a time, as shown here (code file: Main.vb):

Dim baseDataNodes As XmlNodeList
Dim bFirstInRow As Boolean
baseDataNodes = movieOrderNode.ChildNodes
bFirstInRow = True
For Each baseDataNode As XmlNode In baseDataNodes
  If (bFirstInRow) Then
    bFirstInRow = False
  Else
    Console.Write(", ")
  End If
  Console.Write(baseDataNode.Name & ": " & baseDataNode.InnerText)
Next
Console.WriteLine()

The bulk of the preceding code retrieves each child node's name using the Name property and its contents using the InnerText property. The InnerText property of each XmlNode retrieved contains the data associated with the XML elements (nodes) <name>, <filmId>, and <quantity>. The example displays the contents of the XML elements using Console.Write. The XML document is displayed to the console as follows:

name: Grease, filmId: 101, quantity: 10
name: Lawrence of Arabia, filmId: 102, quantity: 10
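Put together, the two fragments form a short program. The following is a self-contained sketch that assumes a plain string in place of the rawData XML literal (so it does not depend on Option Infer) and uses String.Join instead of the bFirstInRow flag; DomReadingSketch and FormatOrders are hypothetical names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module DomReadingSketch
    ' Returns one formatted line per <FilmOrder>, listing every child element.
    Function FormatOrders(rawData As String) As List(Of String)
        Dim lines As New List(Of String)
        Dim xmlDoc As New XmlDocument()
        xmlDoc.LoadXml(rawData)

        For Each movieOrderNode As XmlNode In xmlDoc.GetElementsByTagName("FilmOrder")
            Dim fields As New List(Of String)
            For Each baseDataNode As XmlNode In movieOrderNode.ChildNodes
                fields.Add(baseDataNode.Name & ": " & baseDataNode.InnerText)
            Next
            ' String.Join inserts the ", " separators that bFirstInRow handled by hand.
            lines.Add(String.Join(", ", fields.ToArray()))
        Next
        Return lines
    End Function

    Sub Main()
        ' A plain string stands in for the rawData XML literal.
        Dim rawData As String =
            "<multiFilmOrders>" &
            "<FilmOrder><name>Grease</name><filmId>101</filmId><quantity>10</quantity></FilmOrder>" &
            "<FilmOrder><name>Lawrence of Arabia</name><filmId>102</filmId><quantity>10</quantity></FilmOrder>" &
            "</multiFilmOrders>"
        For Each line As String In FormatOrders(rawData)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Because ChildNodes includes every child element, each line lists the filmId as well as the name and quantity.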

Other, more practical, uses of this data could have been implemented, including the following:

  • The contents could have been directed to an ASP.NET Response object, and the data retrieved could have been used to create an HTML table (<table> table, <tr> row, and <td> data) that would be written to the Response object.
  • The data traversed could have been directed to a ListBox or ComboBox Windows Forms control. This would enable the data returned to be selected as part of a GUI application.
  • The data could have been checked against your application's business rules. For example, you could have used the traversal to verify that each <filmId> matched its <name>. Something like this could be done if you wanted to validate the data entered into the XML document in any manner.
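The validation idea in the last bullet might look something like the following sketch. The Catalog dictionary stands in for whatever authoritative source your application would actually consult, and all names here (ValidationSketch, Catalog, SampleDocument, FindMismatches) are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module ValidationSketch
    ' Hypothetical reference data mapping film IDs to titles.
    Private ReadOnly Catalog As New Dictionary(Of String, String) From {
        {"101", "Grease"},
        {"102", "Lawrence of Arabia"}
    }

    ' Sample data: the second order pairs filmId 102 with the wrong name.
    Function SampleDocument() As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml("<multiFilmOrders>" &
            "<FilmOrder><name>Grease</name><filmId>101</filmId><quantity>10</quantity></FilmOrder>" &
            "<FilmOrder><name>Grease</name><filmId>102</filmId><quantity>5</quantity></FilmOrder>" &
            "</multiFilmOrders>")
        Return doc
    End Function

    ' Describes every <FilmOrder> whose <filmId> does not match its <name>.
    ' Assumes both child elements exist; Item returns Nothing if one is missing.
    Function FindMismatches(doc As XmlDocument) As List(Of String)
        Dim problems As New List(Of String)
        For Each order As XmlElement In doc.GetElementsByTagName("FilmOrder")
            Dim id As String = order.Item("filmId").InnerText
            Dim name As String = order.Item("name").InnerText
            Dim expected As String = Nothing
            If Not Catalog.TryGetValue(id, expected) OrElse expected <> name Then
                problems.Add(name & " (filmId " & id & ")")
            End If
        Next
        Return problems
    End Function

    Sub Main()
        For Each problem As String In FindMismatches(SampleDocument())
            Console.WriteLine("Mismatch: " & problem)
        Next
    End Sub
End Module
```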

Writing XML with the DOM

You can also use the DOM to create or edit XML documents. Creating new XML items is a two-step process, however. First, you use the containing document to create the new element, attribute, or comment (or other node type), and then you add that at the appropriate location in the document.

Just as there are a number of methods in the DOM for reading the XML, there are also methods for creating new nodes. The XmlDocument class has the basic CreateNode method, as well as specific methods for creating the different node types, such as CreateElement, CreateAttribute, CreateComment, and others. Once the node is created, you add it in place using the AppendChild method of XmlNode (or of one of the classes derived from XmlNode).
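The two-step pattern can be sketched as follows, using the general CreateNode method alongside CreateComment. The module and function names (CreateNodesSketch, BuildList) are hypothetical; the Boolean returned with the XML records that the new node has no parent until AppendChild places it.

```vb
Imports System
Imports System.Xml

Module CreateNodesSketch
    ' Returns whether the new node was still detached after step 1, plus the final XML.
    Function BuildList() As Tuple(Of Boolean, String)
        Dim doc As New XmlDocument()
        Dim root As XmlElement = doc.CreateElement("FilmOrderList")
        doc.AppendChild(root)

        ' Step 1: the document creates the nodes; they are not yet part of the tree.
        Dim order As XmlNode = doc.CreateNode(XmlNodeType.Element, "FilmOrder", "")
        Dim note As XmlComment = doc.CreateComment("created with CreateComment")
        Dim wasDetached As Boolean = order.ParentNode Is Nothing

        ' Step 2: attach them at the appropriate location.
        root.AppendChild(note)
        root.AppendChild(order)
        Return Tuple.Create(wasDetached, doc.OuterXml)
    End Function

    Sub Main()
        Dim result As Tuple(Of Boolean, String) = BuildList()
        Console.WriteLine("Detached before AppendChild: " & result.Item1.ToString())
        Console.WriteLine(result.Item2)
    End Sub
End Module
```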

The example for this section is in the DomWriting project and will be used to demonstrate writing XML with the DOM. Most of the work in this sample will be done in two functions, so the Main method can remain simple, as shown here (code file: Main.vb):

    Sub Main()
        Dim data As String
        Dim fileName As String = "filmorama.xml"
        data = GenerateXml(fileName)

        Console.WriteLine(data)
        Console.WriteLine("Press ENTER to continue")
        Console.ReadLine()
    End Sub

The GenerateXml function creates the initial XmlDocument and calls the CreateFilmOrder function multiple times to add a number of items to the structure. This creates a hierarchical XML document that can then be used elsewhere in your application. Typically, you would use the Save method to write the XML to a stream or file, but in this case the function just returns the OuterXml (that is, the full XML document) for display (code file: Main.vb):

    Private Function GenerateXml(ByVal fileName As String) As String
        ' fileName is not used in this sample; passing it to doc.Save(fileName)
        ' would write the document to disk instead of returning it as a string.
        Dim result As String
        Dim doc As New XmlDocument
        Dim elem As XmlElement

        'create root node
        Dim root As XmlElement = doc.CreateElement("FilmOrderList")
        doc.AppendChild(root)
        'this data would likely come from elsewhere
        For i As Integer = 1 To 5
            elem = CreateFilmOrder(doc, i)
            root.AppendChild(elem)
        Next
        result = doc.OuterXml
        Return result
    End Function

A common error when writing an XML document using the DOM is to create the elements but forget to append them to the document. This step is done here with the AppendChild method, but other methods can also place nodes in the tree, in particular InsertBefore, InsertAfter, and PrependChild; RemoveChild does the reverse and takes a node out of the tree.
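A minimal sketch of the other placement methods, assuming a list that already holds one FilmOrder. It uses XmlElement.SetAttribute as a shortcut for CreateAttribute plus Attributes.Append, and the names PlacementSketch and OrderIds are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module PlacementSketch
    ' Returns the id of each FilmOrder, in document order, after several placement calls.
    Function OrderIds() As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml("<FilmOrderList><FilmOrder id=""102"" /></FilmOrderList>")
        Dim root As XmlElement = doc.DocumentElement
        Dim existing As XmlNode = root.FirstChild

        Dim first As XmlElement = doc.CreateElement("FilmOrder")
        first.SetAttribute("id", "100")
        root.PrependChild(first)              ' becomes the first child

        Dim middle As XmlElement = doc.CreateElement("FilmOrder")
        middle.SetAttribute("id", "101")
        root.InsertBefore(middle, existing)   ' just before id 102

        Dim last As XmlElement = doc.CreateElement("FilmOrder")
        last.SetAttribute("id", "103")
        root.InsertAfter(last, existing)      ' just after id 102

        root.RemoveChild(first)               ' takes id 100 back out of the tree

        Dim ids As New List(Of String)
        For Each order As XmlElement In root.ChildNodes
            ids.Add(order.GetAttribute("id"))
        Next
        Return ids
    End Function

    Sub Main()
        Console.WriteLine(String.Join(", ", OrderIds().ToArray()))
    End Sub
End Module
```

OrderIds returns 101, 102, and 103: PrependChild and RemoveChild cancel out for id 100, while InsertBefore and InsertAfter position the other two relative to the existing node.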

Creating the individual FilmOrder nodes uses a similar CreateElement/AppendChild strategy. In addition, each attribute is created with CreateAttribute and then attached using the Append method of the element's Attributes collection. The following shows the CreateFilmOrder method (code file: Main.vb):

    Private Function CreateFilmOrder(ByVal parent As XmlDocument,
       ByVal count As Integer) As XmlElement
        Dim result As XmlElement
        Dim id As XmlAttribute
        Dim title As XmlElement
        Dim quantity As XmlElement

        result = parent.CreateElement("FilmOrder")
        id = parent.CreateAttribute("id")
        ' Value is a String, so convert explicitly (required under Option Strict On)
        id.Value = (100 + count).ToString()

        title = parent.CreateElement("title")
        title.InnerText = "Some title here"

        quantity = parent.CreateElement("quantity")
        quantity.InnerText = "10"

        ' Nothing appears in the output until it is attached to result
        result.Attributes.Append(id)
        result.AppendChild(title)
        result.AppendChild(quantity)
        Return result
    End Function

This generates the following XML, although it will all be on one line in the output:

<FilmOrderList>
  <FilmOrder id="101">
    <title>Some title here</title>
    <quantity>10</quantity>
  </FilmOrder>
  <FilmOrder id="102">
    <title>Some title here</title>
    <quantity>10</quantity>
  </FilmOrder>
  <FilmOrder id="103">
    <title>Some title here</title>
    <quantity>10</quantity>
  </FilmOrder>
  <FilmOrder id="104">
    <title>Some title here</title>
    <quantity>10</quantity>
  </FilmOrder>
  <FilmOrder id="105">
    <title>Some title here</title>
    <quantity>10</quantity>
  </FilmOrder>
</FilmOrderList>
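If you want the output laid out across lines rather than on one line, one option is to save the document through an XmlWriter configured for indentation. This is a sketch under that assumption; IndentedOutputSketch, ParseXml, and ToIndentedXml are hypothetical names. In GenerateXml, the same settings could be passed to XmlWriter.Create(fileName, settings) to put the unused fileName parameter to work.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module IndentedOutputSketch
    ' Helper for the demo: loads a document from a string.
    Function ParseXml(xml As String) As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)
        Return doc
    End Function

    ' Saves any XmlDocument through an indenting XmlWriter instead of using OuterXml.
    Function ToIndentedXml(doc As XmlDocument) As String
        Dim settings As New XmlWriterSettings With {
            .Indent = True,
            .IndentChars = "  ",
            .OmitXmlDeclaration = True
        }
        Using sw As New StringWriter()
            Using writer As XmlWriter = XmlWriter.Create(sw, settings)
                doc.Save(writer)
            End Using   ' disposing the writer flushes it into sw
            Return sw.ToString()
        End Using
    End Function

    Sub Main()
        Dim doc As XmlDocument = ParseXml(
            "<FilmOrderList><FilmOrder id=""101""><title>Some title here</title>" &
            "<quantity>10</quantity></FilmOrder></FilmOrderList>")
        Console.WriteLine(ToIndentedXml(doc))
    End Sub
End Module
```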

Once you get the hang of creating XML with the DOM (and forget to add the new nodes a few dozen times), it is quite a handy method for writing XML. If the XML you need to create can all be created at once, in document order, it is probably better to use the XmlWriter class instead. Writing XML with the DOM is best left for those situations when you need to either edit an existing XML document or move backward through the document as you are writing. In addition, because the DOM is a W3C standard, code written against it carries over readily to other languages that also provide a DOM.
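For comparison, here is a hedged sketch of the same FilmOrderList structure produced with XmlWriter, which writes each node once, in document order, without building a tree in memory. GenerateXmlWithWriter is a hypothetical name, and the output omits the XML declaration to match what OuterXml returned above.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module XmlWriterSketch
    ' Produces the FilmOrderList structure that GenerateXml builds with the DOM,
    ' but streams it out with XmlWriter instead.
    Function GenerateXmlWithWriter(count As Integer) As String
        Using sw As New StringWriter()
            Using writer As XmlWriter = XmlWriter.Create(sw, New XmlWriterSettings With {.OmitXmlDeclaration = True})
                writer.WriteStartElement("FilmOrderList")
                For i As Integer = 1 To count
                    writer.WriteStartElement("FilmOrder")
                    writer.WriteAttributeString("id", (100 + i).ToString())
                    writer.WriteElementString("title", "Some title here")
                    writer.WriteElementString("quantity", "10")
                    writer.WriteEndElement()   ' </FilmOrder>
                Next
                writer.WriteEndElement()       ' </FilmOrderList>
            End Using
            Return sw.ToString()
        End Using
    End Function

    Sub Main()
        Console.WriteLine(GenerateXmlWithWriter(5))
    End Sub
End Module
```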

In addition to the XmlWriter, the XElement class, shown later in this chapter, provides yet another way to read and write XML.
