Chapter 10. Working with XML in .NET

Topics in This Chapter

  • Introduction to Using XML: Introduces some of the basic concepts of working with XML. These include XML validation and the use of an XML style sheet.

  • Reading XML Data: Explains how to use the .NET XML stack to access XML data. The XmlReader, XmlNodeReader, and XmlTextReader classes are examined.

  • Writing XML Data: The easiest way to create XML data is to use the .NET XmlSerializer to serialize data into the XML format. When the data is not in a format that can be serialized, an alternative is the XmlWriter class.

  • Searching and Updating XML Documents: XPath is a query language to search XML documents. Examples illustrate how to use it to search an XmlDocument, XmlDataDocument, and XPathDocument.

Extensible Markup Language (XML) plays a key role in the .NET universe. Configuration files that govern an application or Web page's behavior are deployed in XML; objects are stored or streamed across the Internet by serializing them into an XML representation; Web Services intercommunication is based on XML; and as we see in Chapter 11, “ADO.NET,” .NET methods support the interchange of data between an XML and relational data table format.

XML describes data as a combination of markup language and content that is analogous to the way HTML describes a Web page. Its flexibility permits it to easily represent flat, relational, or hierarchical data. To support one of its design goals—that it “should be human-legible and reasonably clear”[1]—it is represented in a text-only format. This gives it the significant advantage of being platform independent, which has made it the de facto standard for transmitting data over the Internet.

This chapter focuses on pure XML and the classes that reside in the System.Xml namespace hierarchy. It begins with basic background information on XML: how schemas are used to validate XML data and how style sheets are used to alter the way XML is displayed. The remaining sections present the .NET classes that are used to read, write, update, and search XML documents. If you are unfamiliar with .NET XML, you may be surprised how quickly you become comfortable with reading and searching XML data. Extracting information from even a complex XML structure is refreshingly easy with the XPath query language, and far less tedious than the original search techniques that required traversing each node of an XML tree. In many ways, it is now as easy to work with XML as it is to work with relational data.

Working with XML

Being literate in one's spoken language is defined as having the basic ability to read and write that language. In XML, functional literacy embraces more than reading and writing XML data. In addition to the XML data document, there is an XML Schema document (.xsd) that is used to validate the content and structure of an XML document. If the XML data is to be displayed or transformed, one or more XML style sheets (.xsl) can be used to define the transformation. Thus, we can define our own form of XML literacy as the ability to do five things:

  1. Create an XML file.

  2. Read and query an XML file.

  3. Create an XML Schema document.

  4. Use an XML Schema document to validate XML data.

  5. Create and use an XML style sheet to transform XML data.

The purpose of this section is to introduce XML concepts and terminology, as well as some .NET techniques for performing the preceding tasks. Of the five tasks, all are covered in this section, with the exception of reading and querying XML data, which is presented in later sections.

Using XML Serialization to Create XML Data

As discussed in Chapter 4, “Working with Objects in C#,” serialization is a convenient way to store objects so they can later be deserialized into the original objects. If the natural state of your data allows it to be represented as objects, or if your application already has it represented as objects, XML serialization often offers a good choice for converting it into an XML format. However, there are some restrictions to keep in mind when applying XML serialization to a class:

  • The class must contain a public default (parameterless) constructor.

  • Only a public property or field can be serialized.

  • A read-only property cannot be serialized.

  • To serialize the objects in a custom collection class, the class must derive from the System.Collections.CollectionBase class and include an indexer. The easiest way to serialize multiple objects is usually to place them in a strongly typed array.
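The restrictions above can be seen in a small experiment. The following is a minimal sketch, not part of the chapter's movie example: the Film class and the SerializationRules helper are hypothetical names introduced only for illustration. It serializes an object to a string so that you can inspect which members appear in the XML.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class used only to illustrate the serialization restrictions
public class Film
{
   public Film() { }                 // public parameterless constructor
   public string Title;              // public field: serialized
   public int Year { get; set; }     // public read/write property: serialized
   public string Label               // read-only property: not serialized
   {
      get { return Title + " (" + Year + ")"; }
   }
}

public static class SerializationRules
{
   // Serializes one Film to a string so the result can be inspected
   public static string ToXml(Film film)
   {
      XmlSerializer serializer = new XmlSerializer(typeof(Film));
      using (StringWriter writer = new StringWriter())
      {
         serializer.Serialize(writer, film);
         return writer.ToString();
      }
   }
}
```

Calling SerializationRules.ToXml on a Film should yield Title and Year elements but no Label element, because Label has no set accessor.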

An Example Using the XmlSerializer Class

Listing 10-1 shows the XML file that we're going to use for further examples in this section. It was created by serializing instances of the class shown in Listing 10-2.

Example 10-1. Sample XML File

<?xml version="1.0" standalone="yes"?>
   <films>
      <movies>
         <movie_ID>5</movie_ID>
         <movie_Title>Citizen Kane </movie_Title>
         <movie_Year>1941</movie_Year>
         <movie_Director>Orson Welles</movie_Director>
         <bestPicture>Y</bestPicture>
         <AFIRank>1</AFIRank>
      </movies>
      <movies>
         <movie_ID>6</movie_ID>
         <movie_Title>Casablanca </movie_Title>
         <movie_Year>1942</movie_Year>
         <movie_Director>Michael Curtiz</movie_Director>
         <bestPicture>Y</bestPicture>
         <AFIRank>2</AFIRank>
      </movies>
   </films>

In comparing Listings 10-1 and 10-2, it should be obvious that the XML elements are a direct rendering of the public properties defined for the movies class. The only exceptional feature in the code is the XmlElement attribute, which will be discussed shortly.

Example 10-2. Using XmlSerializer to Create an XML File

using System.IO;   // TextWriter and StreamWriter used when serializing
using System.Xml;
using System.Xml.Serialization;
// other code here ...
public class movies
{
   public movies()  // Parameterless constructor is required
   {   }
   public movies(int ID, string title, string dir,string pic,
                 int yr, int movierank)
   {
      movie_ID = ID;
      movie_Director = dir;
      bestPicture = pic;
      rank = movierank;
      movie_Title = title;
      movie_Year = yr;
   }
   // Public properties that are serialized
   public int movie_ID
   {
      get { return mID; }
      set { mID = value; }
   }
   public string movie_Title
   {
      get { return mTitle; }
      set { mTitle = value; }
   }
   public int movie_Year
   {
      get { return mYear; }
      set { mYear = value; }
   }
   public string movie_Director
   {
      get { return mDirector; }
      set { mDirector = value; }
   }
   public string bestPicture
   {
      get { return mbestPicture; }
      set { mbestPicture = value; }
   }
   [XmlElement("AFIRank")]
   public int rank
   {
      get { return mAFIRank; }
      set { mAFIRank = value; }
   }
   private int mID;
   private string mTitle;
   private int mYear;
   private string mDirector;
   private string mbestPicture;
   private int mAFIRank;
}

To transform the class in Listing 10-2 to the XML in Listing 10-1, we follow the three steps shown in the code that follows. First, the objects to be serialized are created and stored in an array. Second, an XmlSerializer object is created. Its constructor (one of many constructor overloads) takes the object type it is serializing as the first parameter and an attribute as the second. The attribute enables us to assign “films” as the name of the root element in the XML output. The final step is to execute the XmlSerializer.Serialize method to send the serialized output to a selected stream—a file in this case.

// (1) Create array of objects to be serialized
movies[] films = {new movies(5,"Citizen Kane","Orson Welles",
                             "Y", 1941,1 ),
                  new movies(6,"Casablanca","Michael Curtiz",
                             "Y", 1942,2)};
// (2) Create serializer
//     This attribute is used to assign name to XML root element 
XmlRootAttribute xRoot = new XmlRootAttribute();
xRoot.ElementName = "films";
xRoot.Namespace = "http://www.corecsharp.net";
xRoot.IsNullable = true;
// Specify that an array of movies types is to be serialized
XmlSerializer xSerial = new XmlSerializer(typeof(movies[]),
                                          xRoot); 
string filename = @"c:\oscarwinners.xml";
// (3) Stream to write XML into
TextWriter writer = new StreamWriter(filename);
xSerial.Serialize(writer, films);
writer.Close();   // flush the output and release the file
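Serialization is only half of the story: the same XmlSerializer can rebuild the objects from the XML. The sketch below shows a round trip through a string rather than a file. It assumes a trimmed-down, hypothetical Movie class (not the movies class of Listing 10-2) and a hypothetical RoundTrip helper.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical, trimmed-down stand-in for the movies class
public class Movie
{
   public int movie_ID;
   public string movie_Title;
   [XmlElement("AFIRank")]
   public int rank;
}

public static class RoundTrip
{
   // The same root attribute must be used for writing and reading
   private static XmlSerializer CreateSerializer()
   {
      XmlRootAttribute root = new XmlRootAttribute("films");
      return new XmlSerializer(typeof(Movie[]), root);
   }

   public static string Serialize(Movie[] films)
   {
      using (StringWriter writer = new StringWriter())
      {
         CreateSerializer().Serialize(writer, films);
         return writer.ToString();
      }
   }

   public static Movie[] Deserialize(string xml)
   {
      using (StringReader reader = new StringReader(xml))
      {
         return (Movie[])CreateSerializer().Deserialize(reader);
      }
   }
}
```

Because the serializer is created with the same type and root element in both directions, the array returned by Deserialize should contain objects whose fields match the originals.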

Serialization Attributes

By default, the elements created from a class take the name of the property they represent. For example, the movie_Title property is serialized as a <movie_Title> element. However, there is a set of serialization attributes that can be used to override the default serialization results. Listing 10-2 includes an XmlElement attribute whose purpose is to assign a name to the XML element that is different than that of the corresponding property or field. In this case, the rank property name is replaced with AFIRank in the XML.

There are more than a dozen serialization attributes. Here are some other commonly used ones:

XmlAttribute

Is attached to a property or field and causes it to be rendered as an attribute within an element.

Example: [XmlAttribute("movieID")]
Result:  <movies movieID="5">

XmlIgnore

Causes the field or property to be excluded from the XML.

XmlText

Causes the value of the field or property to be rendered as text. No elements are created for the member name.

Example: [XmlText]
         public string movie_Title {
Result:  <movies movieID="5">Citizen Kane
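The three attributes can be combined on one class. The following sketch is an illustration under stated assumptions: MovieEntry and AttributeDemo are hypothetical names, and an empty XmlSerializerNamespaces entry is used to keep the default xsi and xsd namespace declarations out of the output.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical class that combines the attributes described above
[XmlRoot("movies")]
public class MovieEntry
{
   [XmlAttribute("movieID")]
   public int Id;            // rendered as an attribute of <movies>
   [XmlIgnore]
   public string Notes;      // left out of the XML entirely
   [XmlText]
   public string Title;      // rendered as the text of <movies>
}

public static class AttributeDemo
{
   public static string ToXml(MovieEntry entry)
   {
      XmlSerializer serializer = new XmlSerializer(typeof(MovieEntry));
      // An empty prefix/namespace pair suppresses the default declarations
      XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
      ns.Add("", "");
      using (StringWriter writer = new StringWriter())
      {
         serializer.Serialize(writer, entry, ns);
         return writer.ToString();
      }
   }
}
```

The resulting element carries movieID as an attribute and the title as its text content, while the Notes field does not appear at all.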

XML Schema Definition (XSD)

The XML Schema Definition document is an XML file that is used to validate the contents of another XML document. The schema is essentially a template that defines in detail what is permitted in an associated XML document. Its role is similar to that of the BNF (Backus-Naur Form) notation that defines a language's syntax for a compiler.

.NET provides several ways (others are included in Chapter 11, “ADO.NET”) to create a schema from an XML data document. One of the easiest ways is to use the XML Schema Definition tool (Xsd.exe). Simply run it from a command line and specify the XML file for which it is to produce a schema:

C:\> xsd.exe oscarwinners.xml

The output, oscarwinners.xsd, is shown in Listing 10-3.

Example 10-3. XML Schema to Apply Against XML in Listing 10-1

<xs:schema id="films" xmlns="" 
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="films" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="movies">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="movie_ID" type="xs:int"
                     minOccurs="0" />
              <xs:element name="movie_Title" type="xs:string"
                     minOccurs="0" />
              <xs:element name="movie_Year" type="xs:int"
                     minOccurs="0" />
              <xs:element name="movie_Director" type="xs:string"
                     minOccurs="0" />
              <xs:element name="bestPicture" type="xs:string"
                     minOccurs="0" />
              <xs:element name="AFIRank" type="xs:int"
                     minOccurs="0" 
              />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

As should be evident from this small sample, the XML Schema language has a rather complex syntax. Those interested in all of its details can find them at the URL shown in the first line of the schema. For those with a more casual interest, the most important thing to note is that the heart of the document is a description of the valid types that may be contained in the XML data that the schema describes. In addition to the string and int types shown here, other supported types include boolean, double, float, dateTime, and hexBinary.

The types specified in the schema are designated as simple or complex. The complexType element defines any node that has children or an attribute; a simpleType has no attribute or child. You'll encounter many schemas where the simple types are defined at the beginning of the schema, and complex types are later defined as a combination of simple types.

XML Schema Validation

A schema is used by a validator to check an XML document for conformance to the layout and content defined by the schema. .NET implements validation as a read and check process. As a class iterates through each node in an XML tree, the node is validated. Listing 10-4 illustrates how the XmlValidatingReader class performs this operation.

First, an XmlTextReader is created to stream through the nodes in the data document. It is passed as an argument to the constructor for the XmlValidatingReader. Then, the ValidationType property is set to indicate a schema will be used for validation. This property can also be set to XDR or DTD to support older validation schemas.

The next step is to add the schema that will be used for validating to the reader's schema collection. Finally, the XmlValidatingReader is used to read the stream of XML nodes. Exception handling is used to display any validation error that occurs.

Example 10-4. XML Schema Validation

// Requires: using System; using System.Xml;
private static bool ValidateSchema(string xml, string xsd)
{
   // Parameters: XML document and schema file names
   // (1) Create a validating reader
   XmlTextReader tr = new XmlTextReader(xml);
   XmlValidatingReader xvr = new XmlValidatingReader(tr);
   // (2) Indicate schema validation 
   xvr.ValidationType = ValidationType.Schema;
   // (3) Add schema to be used for validation
   xvr.Schemas.Add(null, xsd);
   try
   {
      Console.WriteLine("Validating: ");
      // Loop through all nodes in XML document
      while (xvr.Read())
      {
         Console.Write(".");
      }
   }
   catch (Exception ex)
   {
      Console.WriteLine("\n{0}", ex.Message);
      return false;
   }
   finally
   {
      xvr.Close();   // also closes the underlying XmlTextReader
   }
   return true;
}

Note that the XmlValidatingReader class is derived from the XmlReader class. We'll demonstrate using XmlReader to perform validation in the next section. In fact, in most cases, XmlReader (.NET 2.0 implementation) now makes XmlValidatingReader obsolete.

Using an XML Style Sheet

A style sheet is a document that describes how to transform raw XML data into a different format. The mechanism that performs the transformation is referred to as an XSLT (Extensible Stylesheet Language Transformations) processor. Figure 10-1 illustrates the process: The XSLT processor takes as input the XML document to be transformed and the XSL document that defines the transformation to be applied. This approach permits output to be generated dynamically in a variety of formats. These include XML, HTML or ASPX for a Web page, and a PDF document.

Figure 10-1. Publishing documents with XSLT

The XslTransform Class

The .NET version of the XSLT processor is the XslTransform class found in the System.Xml.Xsl namespace. To demonstrate its use, we'll transform our XML movie data into an HTML file for display by a browser (see Figure 10-2).

Figure 10-2. XML data is transformed into this HTML output

Before the XslTransform class can be applied, an XSLT style sheet that describes the transformation must be created. Listing 10-5 contains the style sheet that will be used. As you can see, it is a mixture of HTML markup, XSL elements, and XSL commands that displays rows of movie information with four columns. The XSL elements and functions are the key to the transformation. When the XSL style sheet is processed, the XSL elements are replaced with the data from the original XML document.

Example 10-5. XML Style Sheet to Create HTML Output

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">  
   <HTML>
     <TITLE>Movies</TITLE>
     <Table border="0" cellpadding="0" cellspacing="1">
     <THEAD>  
       <TH>Movie Title</TH>
       <TH>Movie Year </TH>
       <TH>AFI Rank   </TH>
       <TH>Director   </TH>
     </THEAD>
     <xsl:for-each select="//movies">
         <xsl:sort select="movie_Title" />
       <tr>
         <td><xsl:value-of select="movie_Title"/> </td>
         <td align="center"><xsl:value-of select=
               "movie_Year"/></td>
         <td align="center"><xsl:value-of select=
               "AFIRank" /></td>
      <td><xsl:value-of select="movie_Director" /></td>
        </tr>
      </xsl:for-each>
      </Table>
    </HTML>
  </xsl:template>
</xsl:stylesheet>

Some points of interest:

  • The URL in the namespace of the <xsl:stylesheet> element must be exactly as shown here.

  • The match attribute is set to an XPath expression that indicates which nodes in the XML file the template applies to. Setting match="/" matches the document root, so the template is applied to the entire document.

  • The for-each construct loops through a group of selected nodes specified by an XPath expression following the select attribute. XPath is discussed in Section 10.4, “Using XPath to Search XML.”

  • The value-of element extracts a selected value from the XML document and inserts it into the output.

  • The <xsl:sort> element is used to sort the incoming data and is used in conjunction with the for-each construct. Here is its syntax:

select = XPath expression
order = {"ascending" | "descending"}
data-type = {"text" | "number"}
case-order = {"upper-first" | "lower-first"}

After a style sheet is created, using it to transform a document is straightforward, as the following code shows. After creating an instance of the XslTransform class, you use its Load method to specify the file containing the style sheet. The XslTransform.Transform method performs the transformation. This method has several overloads. The version used here takes an XPathDocument object that represents the XML document and an XmlWriter that designates where the output is written (an HTML file in this case).

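// Transform XML into HTML and store in movies.htm
// Requires: using System.Text; using System.Xml;
//           using System.Xml.XPath; using System.Xml.Xsl;
XmlWriter writer = new 
      XmlTextWriter(@"c:\movies.htm", Encoding.UTF8);
XslTransform xslt = new XslTransform();
XPathDocument xpd = new 
      XPathDocument(@"c:\oscarwinners.xml");
xslt.Load("movies.xsl");
xslt.Transform(xpd, null, writer, null);
writer.Close();   // flush the HTML to disk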
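In .NET Framework 2.0 and later, XslTransform is marked obsolete in favor of the XslCompiledTransform class, which is used in much the same way. The following is a minimal sketch of that alternative, assuming the XML and the style sheet are supplied as strings; TransformSketch is a hypothetical helper name, not part of the chapter's example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class TransformSketch
{
   // Applies an XSLT style sheet (as a string) to XML (as a string)
   public static string Transform(string xml, string stylesheet)
   {
      XslCompiledTransform xslt = new XslCompiledTransform();
      using (XmlReader styleReader =
                XmlReader.Create(new StringReader(stylesheet)))
      {
         xslt.Load(styleReader);   // compile the style sheet
      }
      XPathDocument doc = new XPathDocument(new StringReader(xml));
      using (StringWriter output = new StringWriter())
      {
         // Writing to a TextWriter lets the style sheet's own
         // xsl:output settings control the result format
         xslt.Transform(doc, null, output);
         return output.ToString();
      }
   }
}
```

To work with files instead, pass file names to Load and to the XPathDocument constructor, as in the XslTransform example above.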

Core Note

You can link a style sheet to an XML document by placing an xml-stylesheet processing instruction in the XML document on the line preceding the root element definition:

<?xml-stylesheet type="text/xsl" href="movies.xsl" ?>

If a document is linked to a style sheet that converts XML to HTML, most browsers automatically perform the transformation and display the HTML. This can be a quick way to perform trial-and-error testing when developing a style sheet.

It takes only a small leap from this simple XSLT example to appreciate the potential of being able to transform XML documents dynamically. It is a natural area of growth for Web Services and Web pages that accept input in one format, transform it on demand, and serve the output in a different format.

Techniques for Reading XML Data

XML can be represented in two basic ways: as the familiar external document containing embedded data, or as an in-memory tree structure known as a Document Object Model (DOM). In the former case, XML can be read in a forward-only manner as a stream of tokens representing the file's content. The object that performs the reading stays connected to the data source during the read operations. The XmlReader and XmlTextReader shown in Figure 10-3 operate in this manner.

Figure 10-3. Classes to read XML data

More options are available for processing the DOM because it is stored in memory and can be traversed randomly. For simply reading a tree, the XmlNodeReader class offers an efficient way to traverse a tree in a forward, read-only manner. Other more sophisticated approaches that also permit tree modification are covered later in this section.

XmlReader Class

XmlReader is an abstract class possessing methods and properties that enable an application to pull data from an XML file one node at a time in a forward-only, read-only manner. A depth-first search is performed, beginning with the root node in the document. Nodes are inspected using the Name, NodeType, and Value properties.

XmlReader serves as a base class for the concrete classes XmlTextReader and XmlNodeReader. As an abstract class, XmlReader cannot be directly instantiated; however, it has a static Create method that can return an instance of the XmlReader class. This feature became available with the release of .NET Framework 2.0 and is recommended over the XmlTextReader class for reading XML streams.

Listing 10-6 illustrates how to create an XmlReader object and use it to read the contents of a short XML document file. The code is also useful for illustrating how .NET converts the content of the file into a stream of node objects. It's important to understand the concept of nodes because an XML or HTML document is defined (by the official W3C Document Object Model (DOM) specification[2] ) as a hierarchy of node objects.

Example 10-6. Using XmlReader to Read an XML Document

// Include these namespaces:
// using System.Xml;
// using System.Xml.XPath;
public void ShowNodes()
{
   //(1) Settings object enables/disables features on XmlReader 
   XmlReaderSettings settings = new XmlReaderSettings();
   settings.ConformanceLevel = ConformanceLevel.Fragment;
   settings.IgnoreWhitespace = true;  
   try
   {
      //(2) Create XmlReader object 
      XmlReader rdr = XmlReader.Create(@"c:\oscarsshort.xml", 
                                       settings);
      while (rdr.Read())
      {
         Format(rdr);
      }
      rdr.Close();
   }
   catch (Exception e)
   {
       Console.WriteLine ("Exception: {0}", e.ToString());
   }
}
private static void Format(XmlReader reader)
{
   //(3) Print Current node properties
   Console.Write( reader.NodeType+ "<" + reader.Name + ">" + 
                  reader.Value);
   Console.WriteLine();
}

Before creating the XmlReader, the code first creates an XmlReaderSettings object. This object sets features that define how the XmlReader object processes the input stream. For example, the ConformanceLevel property specifies how the input is checked. The statement

settings.ConformanceLevel = ConformanceLevel.Fragment;

specifies that the input must conform to the standards that define an XML 1.0 document fragment—an XML document that does not necessarily have a root node.

This object and the name of the XML document file are then passed to the Create method that returns an XmlReader instance:

XmlReader rdr = XmlReader.Create(@"c:\oscarsshort.xml", settings);

The file's content is read a node at a time by the XmlReader.Read method, and the Format method prints the NodeType, Name, and Value of each node. Listing 10-7 shows the input file and a portion of the generated output. Line numbers have been added so that an input line and its corresponding node information can be compared.

Example 10-7. XML Input and Corresponding Nodes

Input File: oscarsshort.xml

(1) <?xml version="1.0" standalone="yes"?>
(2) <films>
(3)  <movies>
(4)    <!-- Selected by AFI as best movie -->
(5)    <movie_ID>5</movie_ID>
(6)    <![CDATA[<a href="http://www.imdb.com/tt0467/">Kane</a>]]>
(7)    <movie_Title>Citizen Kane </movie_Title>
(8)    <movie_Year>1941</movie_Year>
(9)    <movie_Director>Orson Welles</movie_Director>
(10)   <bestPicture>Y</bestPicture>
(11) </movies>  
(12)</films>

Program Output (NodeType, <Name>, Value):

(1) XmlDeclaration<xml>version="1.0" standalone="yes"
(2) Element<films>
(3) Element<movies>
(4) Comment<> Selected by AFI as best movie
(5) Element<movie_ID>
      Text<>5
    EndElement<movie_ID>
(6) CDATA<><a href="http://www.imdb.com/tt0467/">Kane</a>
(7) Element<movie_Title>
      Text<>Citizen Kane 
    EndElement<movie_Title>
       ...
(12)EndElement<films>
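The effect of the ConformanceLevel setting used in Listing 10-6 is easiest to see with input that has more than one top-level element. The sketch below is an illustration only; FragmentSketch is a hypothetical helper, and the input is read from a string rather than a file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentSketch
{
   // Returns the names of the top-level elements found in the input
   public static List<string> TopLevelElements(string xml,
                                               ConformanceLevel level)
   {
      XmlReaderSettings settings = new XmlReaderSettings();
      settings.ConformanceLevel = level;
      settings.IgnoreWhitespace = true;
      List<string> names = new List<string>();
      using (XmlReader reader =
                XmlReader.Create(new StringReader(xml), settings))
      {
         while (reader.Read())
         {
            // Depth 0 means the element has no parent element
            if (reader.NodeType == XmlNodeType.Element &&
                reader.Depth == 0)
            {
               names.Add(reader.Name);
            }
         }
      }
      return names;
   }
}
```

With ConformanceLevel.Fragment, input such as two sibling elements with no enclosing root is read normally. With ConformanceLevel.Document, the same input causes the reader to throw an XmlException when it reaches the second top-level element.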

Programs that use XmlReader typically implement a logic pattern that consists of an outer loop that reads nodes and an inner switch statement that identifies the node using an XmlNodeType enumeration. The logic to process the node information is handled in the case blocks:

while (reader.Read())
{
   switch (reader.NodeType)
   {
      case XmlNodeType.Element:
      // Attributes are contained in elements
         while(reader.MoveToNextAttribute())
         {
            Console.WriteLine(reader.Name+reader.Value);
         }
      break;
      case XmlNodeType.Text:
      // Process ..
      break;
      case XmlNodeType.EndElement:
      // Process ..
      break;
   }
}

The Element, Text, and Attribute nodes mark most of the data content in an XML document. Note that the Attribute node is regarded as metadata attached to an element and is the only one not exposed directly by the XmlReader.Read method. As shown in the preceding code segment, the attributes in an Element can be accessed using the MoveToNextAttribute method.
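The pattern can be tried on a single element without any file. The following sketch assumes a hypothetical NodeWalk helper that records one line per node instead of writing to the console; the names are for illustration only.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeWalk
{
   // Returns one descriptive line for each node of interest
   public static List<string> Describe(string xml)
   {
      List<string> lines = new List<string>();
      using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
      {
         while (reader.Read())
         {
            switch (reader.NodeType)
            {
               case XmlNodeType.Element:
                  lines.Add("Element " + reader.Name);
                  // Attributes are reached from the element node
                  while (reader.MoveToNextAttribute())
                  {
                     lines.Add("Attribute " + reader.Name +
                               "=" + reader.Value);
                  }
                  break;
               case XmlNodeType.Text:
                  lines.Add("Text " + reader.Value);
                  break;
               case XmlNodeType.EndElement:
                  lines.Add("EndElement " + reader.Name);
                  break;
            }
         }
      }
      return lines;
   }
}
```

Note that the attribute line is produced inside the Element case: Read never returns an Attribute node on its own.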

Table 10-1 summarizes the node types. It is worth noting that these types are not an arbitrary .NET implementation. With the exception of Whitespace and XmlDeclaration, they conform to the DOM Structure Model recommendation.

Table 10-1. XmlNodeType Enumeration

Option

Description and Use

Attribute

An attribute or value contained within an element. Example:

<movie_title genre="comedy">The Lady Eve
</movie_title>

The attribute is genre="comedy". Attributes must be located within an element:

if(reader.NodeType==XmlNodeType.Element){
  while(reader.MoveToNextAttribute())
   {
    Console.WriteLine(reader.Name+reader.Value);
   }
}

CData

Designates a section of text that is not to be parsed. Markup characters are treated as text:

<![CDATA[<ELEMENT>
<a href="http://www.imdb.com">movies</a>
</ELEMENT>]]>

Comment

To make a comment:

<!-- comment -->

To have comments ignored:

XmlReaderSettings.IgnoreComments = true;

Document

A document root object that provides access to the entire XML document.

DocumentFragment

A document fragment. This is a node or subtree within a document. It provides a way to work with part of a document.

DocumentType

Document type declaration indicated by <!DOCTYPE … >. Can refer to an external Document Type Definition (DTD) file or be an inline block containing Entity and Notation declarations.

Element

An XML element. Designated by the < > brackets: <movie_Title>

EndElement

An XML end element tag. Marks the end of an element: </movie_Title>

EndEntity

End of an Entity declaration.

Entity

Defines text or a resource to replace the entity name in the XML. An entity is defined as a child of a document type node:

<!DOCTYPE movies[
   <!ENTITY leadingactress "stanwyck">
]>
XML would then reference this as: <actress>&leadingactress;</actress>

EntityReference

A reference to the entity. In the preceding example, &leadingactress; is an EntityReference.

Notation

A notation that is declared within a DocumentType declaration. Primary use is to pass information to the XML processor. Example:

<!NOTATION homepage SYSTEM "www.sci.com">

ProcessingInstruction

Useful for providing information about how the data was generated or how to process it. Example:

<?pi1 Requires IE 5.0 and above ?>

Text

The text content of a node.

Whitespace

Whitespace refers to formatting characters such as tabs, line feeds, returns, and spaces that exist between the markup and affect the layout of a document.

XmlDeclaration

The first node in the document. It provides version information.

<?xml version="1.0" standalone="yes"?>

XmlNodeReader Class

The XmlNodeReader is another forward-only reader that processes XML as a stream of nodes. It differs from the XmlReader class in two significant ways:

  • It processes nodes from an in-memory DOM tree structure rather than a text file.

  • It can begin reading at any subtree node in the structure—not just at the root node (beginning of the document).

In Listing 10-8, an XmlNodeReader object is used to list the movie title and year from the XML-formatted movies database. The code contains an interesting twist: The XmlNodeReader object is not used directly, but instead is passed as a parameter to the XmlReader.Create method. The XmlReader object that is returned serves as a wrapper that performs the actual reading. This approach has the advantage of allowing XmlReaderSettings values to be assigned to the reader.

Example 10-8. Using XmlNodeReader to Read an XML Document

private void ListMovies()
{
   // (1) Specify XML file to be loaded as a DOM
   XmlDocument doc = new XmlDocument();
   doc.Load(@"c:\oscarwinners.xml");
   // (2) Settings for use with XmlNodeReader object
   XmlReaderSettings settings = new XmlReaderSettings();
   settings.ConformanceLevel = ConformanceLevel.Fragment;
   settings.IgnoreWhitespace = true;
   settings.IgnoreComments = true;
   // (3) Create a nodereader object
   XmlNodeReader noderdr = new XmlNodeReader(doc);
   // (4) Create an XmlReader as a wrapper around node reader
   XmlReader reader = XmlReader.Create(noderdr, settings);
   while (reader.Read())
   {
      if(reader.NodeType==XmlNodeType.Element){
         if (reader.Name == "movie_Title")
         {
            reader.Read();  // next node is text for title
            Console.Write(reader.Value);    // Movie Title
         }
         if (reader.Name == "movie_Year")
         {
            reader.Read();  // next node is text for year
            Console.WriteLine(reader.Value); // year
         }
      }
   }
}

The parameter passed to the XmlNodeReader constructor determines the first node in the tree to be read. When the entire document is passed—as in this example—reading begins with the top node in the tree. To select a specific node, use the XmlDocument.SelectSingleNode method as illustrated in this segment:

XmlDocument doc = new XmlDocument();
doc.Load(@"c:\oscarwinners.xml");  // Build tree in memory
XmlNodeReader noderdr = new 
      XmlNodeReader(doc.SelectSingleNode("films/movies[2]"));

Refer to Listing 10-1 and you can see that this selects the second movies element group, which contains information on Casablanca.
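To see that the reader really is confined to the selected subtree, the following sketch loads XML from a string and collects the titles found below a node chosen by an XPath expression. SubtreeSketch and its method are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class SubtreeSketch
{
   // Collects the movie titles found at or below the selected node
   public static List<string> TitlesUnder(string xml, string xpath)
   {
      XmlDocument doc = new XmlDocument();
      doc.LoadXml(xml);                        // build tree in memory
      List<string> titles = new List<string>();
      XmlNode start = doc.SelectSingleNode(xpath);
      if (start == null)
      {
         return titles;                        // nothing matched
      }
      using (XmlNodeReader reader = new XmlNodeReader(start))
      {
         while (reader.Read())
         {
            if (reader.NodeType == XmlNodeType.Element &&
                reader.Name == "movie_Title")
            {
               // ReadString returns the element's text and leaves
               // the reader on the element's end tag
               titles.Add(reader.ReadString());
            }
         }
      }
      return titles;
   }
}
```

Passing "films/movies[2]" as the XPath expression should return only the second movie's title, whereas passing "films" returns every title in the document.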

If your application requires read-only access to XML data and the capability to read selected subtrees, the XmlNodeReader is an efficient solution. When updating, writing, and searching become requirements, a more sophisticated approach is required; we'll look at those techniques later in this section.

The XmlReaderSettings Class

A significant advantage of using an XmlReader object—directly or as a wrapper—is the presence of the XmlReaderSettings class as a way to define the behavior of the XmlReader object. Its most useful properties specify which node types in the input stream are ignored and whether XML validation is performed. Table 10-2 lists the XmlReaderSettings properties.

Table 10-2. Properties of the XmlReaderSettings Class

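| Property | Default Value | Description |
| --- | --- | --- |
| CheckCharacters | true | Indicates whether characters and XML names are checked for illegal XML characters. An exception is thrown if one is encountered. |
| CloseInput | false | An XmlReader object may be created by passing a stream to it. This property indicates whether the stream is closed when the reader object is closed. |
| ConformanceLevel | Document | Indicates whether the XML should conform to the standards for a Document or DocumentFragment. |
| DtdValidate | false | Indicates whether to perform DTD validation. |
| IgnoreComments | false | Specifies whether this item is processed or ignored by the XmlReader.Read method. |
| IgnoreInlineSchema | true | Specifies whether this item is processed or ignored by the XmlReader.Read method. |
| IgnoreProcessingInstructions | false | Specifies whether this item is processed or ignored by the XmlReader.Read method. |
| IgnoreSchemaLocation | true | Specifies whether this item is processed or ignored by the XmlReader.Read method. |
| IgnoreValidationWarnings | true | Specifies whether this item is processed or ignored by the XmlReader.Read method. |
| IgnoreWhitespace | false | Specifies whether this item is processed or ignored by the XmlReader.Read method. |
| LineNumberOffset | 0 | XmlReader numbers lines in the XML document beginning with 0. Set this property to change the beginning line number. |
| LinePositionOffset | 0 | Set this property to change the beginning line position value. |
| Schemas | empty | Contains the XmlSchemaSet to be used for XML Schema Definition Language (XSD) validation. |
| XsdValidate | false | Indicates whether XSD validation is performed. |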
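
To make the effect of the Ignore properties concrete, here is a small sketch that reads the same XML twice, once with default settings and once with comments and whitespace ignored, and counts the nodes that Read reports. The sample XML and the CountNodes helper are illustrative assumptions, not part of the chapter's listings.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReaderSettingsSketch
{
    // Counts the nodes the reader reports under the given settings
    public static int CountNodes(string xml, XmlReaderSettings settings)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) count++;
        }
        return count;
    }

    public const string Sample =
        "<films>\n  <!-- sample -->\n  <movie>Casablanca</movie>\n</films>";

    public static void Main()
    {
        XmlReaderSettings all = new XmlReaderSettings();       // defaults
        XmlReaderSettings filtered = new XmlReaderSettings();
        filtered.IgnoreComments = true;
        filtered.IgnoreWhitespace = true;

        Console.WriteLine("Default settings:  {0} nodes", CountNodes(Sample, all));
        Console.WriteLine("Filtered settings: {0} nodes", CountNodes(Sample, filtered));
    }
}
```

With the filters on, the reader reports only the element, text, and end-element nodes, which simplifies any processing loop that does not care about layout or comments.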

Using an XML Schema to Validate XML Data

The final two properties listed in Table 10-2, Schemas and XsdValidate, are used to validate XML data against a schema. Recall that a schema is a template that describes the permissible content in an XML file or stream. Validation can be, and should be, used to ensure that data being read conforms to the rules of the schema. Requesting validation takes three steps: add the validating schema to the XmlSchemaSet collection of the Schemas property; set XsdValidate to true; and define an event handler to be called if a validation error occurs. The following code fragment shows the code used with the schema and XML data in Listings 10-1 and 10-3:

XmlReaderSettings settings = new XmlReaderSettings();
// (1) Specify schema to be used for validation
settings.Schemas.Add(null, @"c:\oscarwinners.xsd");
// (2) Must set this to true
settings.XsdValidate = true;
// (3) Delegate to handle validation error event
settings.ValidationEventHandler += new 
      System.Xml.Schema.ValidationEventHandler(SchemaValidation);
// (4) Create reader and pass settings to it
XmlReader rdr = XmlReader.Create(@"c:\oscarwinners.xml",
      settings);
// process XML data ...
...
// Method to handle errors detected during schema validation
private void SchemaValidation(object sender, System.Xml.Schema.ValidationEventArgs e)
{
   MessageBox.Show(e.Message);
}

Note that a detected error does not stop processing. This means that all the XML data can be checked in one pass without restarting the program.
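
In the released .NET Framework 2.0 and later, XmlReaderSettings has no XsdValidate property; schema validation is requested by setting the ValidationType property to ValidationType.Schema. The following sketch repeats the same steps with that property. The inline schema and XML are hypothetical and much simpler than Listings 10-1 and 10-3, and errors are collected in a list rather than displayed, so the fact that processing continues after an error is easy to observe.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationSketch
{
    // Hypothetical schema: films holds movie elements with a title and an integer year
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='films'><xs:complexType><xs:sequence>" +
        "<xs:element name='movie' maxOccurs='unbounded'><xs:complexType><xs:sequence>" +
        "<xs:element name='title' type='xs:string'/>" +
        "<xs:element name='year' type='xs:int'/>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:schema>";

    // Returns the validation messages raised while reading the XML
    public static List<string> Validate(string xml)
    {
        List<string> errors = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        // (1) Specify schema to be used for validation
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        // (2) Request XSD validation
        settings.ValidationType = ValidationType.Schema;
        // (3) Handler is called for each validation error
        settings.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { errors.Add(e.Message); };
        // (4) Create reader; reading the data drives validation
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        string bad =
            "<films><movie><title>Casablanca</title><year>nineteen</year></movie>" +
            "<movie><title>Rocky</title><year>oops</year></movie></films>";
        foreach (string msg in Validate(bad))
            Console.WriteLine(msg);
    }
}
```

Both invalid year values are reported in a single pass because the handler returns control to the reader instead of throwing.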

Options for Reading XML Data

All the preceding examples that read XML data share two characteristics: data is read a node at a time, and a node's value is extracted as a string using the XmlReader.Value property. This keeps things simple, but ignores the underlying data type of the content. For example, XML often contains numeric data or data that is the product of serializing a class. Both cases can be handled more efficiently using other XmlReader methods.

XmlReader has a suite of ReadValueAsxxx methods that can read the contents of a node in its native form. These include ReadValueAsBoolean, ReadValueAsDateTime, ReadValueAsDecimal, ReadValueAsDouble, ReadValueAsInt32, ReadValueAsInt64, and ReadValueAsSingle. Here's an example:

int age;
if(reader.Name == "Age") age= reader.ReadValueAsInt32();
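
In the released .NET Framework 2.0 and later, the typed read methods on XmlReader are named ReadContentAsxxx and ReadElementContentAsxxx (for example, ReadElementContentAsInt). The sketch below uses that naming to read an Age element as an int; the sample XML and the ReadAge helper are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TypedReadSketch
{
    // Returns the content of the first Age element as an int, or -1 if none exists
    public static int ReadAge(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Age")
                    return reader.ReadElementContentAsInt();  // No string conversion needed
            }
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(ReadAge("<person><Name>Ann</Name><Age>42</Age></person>"));
    }
}
```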

XML that corresponds to the public properties or fields of a class can be read directly into an instance of the class with the ReadAsObject method. This fragment reads the XML data shown in Listing 10-1 into an instance of the movies class. Note that the name of the field or property must match an element name in the XML data.

// Deserialize XML into a movies object
if (rdr.NodeType == XmlNodeType.Element && rdr.Name == "movies")
{
   movies m = (movies)rdr.ReadAsObject(typeof(movies));
   // Do something with object
}
// XML data is read directly into this class
public class movies
{
   public int movie_ID;
   public string movie_Title;
   public string movie_Year;
   private string director;
   public string bestPicture;
   public string movie_Director 
   {
      set { director = value; }
      get { return (director); }
   }
}
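
The released XmlReader class does not expose a ReadAsObject method. A comparable result can be obtained by handing the positioned reader to XmlSerializer.Deserialize, as in the following sketch. The movies class here is a trimmed version of the one above, and the XML is a hypothetical fragment shaped like the movies elements used in this chapter.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Public fields whose names match the element names in the XML
public class movies
{
    public int movie_ID;
    public string movie_Title;
    public string movie_Year;
}

public static class DeserializeSketch
{
    // Deserializes the first movies element found in the XML, or returns null
    public static movies ReadFirstMovie(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(movies));
        using (XmlReader rdr = XmlReader.Create(new StringReader(xml)))
        {
            while (rdr.Read())
            {
                if (rdr.NodeType == XmlNodeType.Element && rdr.Name == "movies")
                    return (movies)serializer.Deserialize(rdr);
            }
        }
        return null;
    }

    public static void Main()
    {
        string xml =
            "<films><movies><movie_ID>5</movie_ID>" +
            "<movie_Title>Casablanca</movie_Title>" +
            "<movie_Year>1942</movie_Year></movies></films>";
        movies m = ReadFirstMovie(xml);
        Console.WriteLine("{0}: {1} ({2})", m.movie_ID, m.movie_Title, m.movie_Year);
    }
}
```

The serializer matches the root element to the class name and each child element to a public field or property of the same name, so the naming rule described above still applies.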

Techniques for Writing XML Data

In many cases, the easiest way to present data in an XML format is to use .NET serialization. As demonstrated in Section 10.1, if the data is in a collection class, it can be serialized using the XmlSerializer class; as we see in the next chapter, if it's in a DataSet, the DataSet.WriteXml method can be applied. The advantages of serialization are that it is easy to use, generates well-formed XML, and is symmetrical—the XML that is written can be read back to create the original data objects.

For cases where serialization is not an option (a comma-delimited file, for instance) or where more control over the XML layout is needed, the XmlWriter class is the best .NET solution.

Writing XML with the XmlWriter Class

The XmlWriter class offers precise control over each character written to an XML stream or file. However, this flexibility does require a general knowledge of XML and can be tedious to code, because a distinct Writexxx method is used to generate each node type. On the positive side, it offers several compliance checking features, and the ability to write CLR typed data directly to the XML stream:

  • The XmlWriterSettings.CheckCharacters property configures the XmlWriter to check for illegal characters in text nodes and XML names, as well as check the validity of XML names. An exception is thrown if an invalid character is detected.

  • The XmlWriterSettings.ConformanceLevel property configures the XmlWriter to guarantee that the stream complies with the conformance level that is specified. For example, the XML may be set to conform to a document or document fragment.

  • The XmlWriter.WriteValue method is used to write data to the XML stream as a CLR type (int, double, and so on) without having to first convert it to a string.

Listing 10-9 illustrates the basic principles involved in using the XmlWriter class. Not surprisingly, there are a lot of similarities to the closely related XmlReader class. Both use the static Create method to create an object instance, and both have Create overloads that accept a settings object (XmlWriterSettings, in this case) to define the behavior of the reader or writer. The most important of these setting properties is the conformance level that specifies either document or fragment (a subtree) conformance.

A series of self-describing methods, which support all the node types listed in Table 10-1, generate the XML. Note that exception handling should always be enabled to trap any attempt to write an invalid name or character.

Example 10-9. Write XML Using XmlWriter Class

private void WriteMovie()
{
   string[,] movieList = { { "Annie Hall", "Woody Allen" },
                       { "Lawrence of Arabia", "David Lean" } };
   // (1) Define settings to govern writer actions
   XmlWriterSettings settings = new XmlWriterSettings();
   settings.Indent = true;
   settings.IndentChars = ("    ");
   settings.ConformanceLevel = ConformanceLevel.Document;
   settings.CloseOutput = false;
   settings.OmitXmlDeclaration = false;
   // (2) Create XmlWriter object
   XmlWriter writer = XmlWriter.Create(@"c:\mymovies.xml", 
                                       settings);
   writer.WriteStartDocument();
   writer.WriteComment("Output from xmlwriter class");
   writer.WriteStartElement("films");
   for (int i = 0; i <= movieList.GetUpperBound(0) ; i++)
   {
      try
      {
         writer.WriteStartElement("movie");
         writer.WriteElementString("Title", movieList[i, 0]);
         writer.WriteElementString("Director", movieList[i, 1]);
         writer.WriteStartElement("Movie_ID");
         writer.WriteValue(i); // No need to convert to string
         writer.WriteEndElement();
         writer.WriteEndElement();
      }
      catch (Exception ex)
      {
         MessageBox.Show(ex.Message);
      }
   }
   writer.WriteEndElement();
   writer.Flush();  // Flush any remaining content to XML stream
   writer.Close();
   /*
      Output:
      <?xml version="1.0" encoding="utf-8"?>
      <!--Output from xmlwriter class-->
      <films>
         <movie>
            <Title>Annie Hall</Title>
            <Director>Woody Allen</Director>
            <Movie_ID>0</Movie_ID>
         </movie>
         <movie>
            <Title>Lawrence of Arabia</Title>
            <Director>David Lean</Director>
            <Movie_ID>1</Movie_ID>
         </movie>
      </films>
   */ 
}
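
The listing writes a complete document to a file. As a complementary sketch, the following code writes a single movie element as a fragment into an in-memory StringBuilder, which is handy for testing. It also shows an attribute being written and a using block that closes the writer automatically. The WriteMovie helper and the id attribute are illustrative assumptions.

```csharp
using System;
using System.Text;
using System.Xml;

public static class WriterSketch
{
    // Returns a movie element as an XML fragment string
    public static string WriteMovie(string title, int year)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("movie");
            writer.WriteAttributeString("id", "1");
            writer.WriteElementString("Title", title);
            writer.WriteStartElement("Year");
            writer.WriteValue(year);       // Written as a CLR int
            writer.WriteEndElement();      // </Year>
            writer.WriteEndElement();      // </movie>
        }                                  // Writer is flushed and closed here
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteMovie("Annie Hall", 1977));
    }
}
```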

Before leaving the topic of XML writing, note that .NET also provides XmlTextWriter and XmlNodeWriter classes as concrete implementations of the abstract XmlWriter class. The former does not offer any significant advantages over the XmlWriter. The node writer is a bit more useful. It creates a DOM tree in memory that can be processed using the many classes and methods designed for that task. Refer to .NET documentation for XmlNodeWriter details.

Using XPath to Search XML

A significant benefit of representing XML in a tree model—as opposed to a data stream—is the capability to query and locate the tree's content using XML Path Language (XPath). This technique is similar to using a SQL command on relational data. An XPath expression (query) is created and passed to an engine that evaluates it. The expression is parsed and executed against a data store. The returned value(s) may be a set of nodes or a scalar value.

XPath is a formal query language defined by the W3C XML Path Language specification (www.w3.org/TR/xpath); the System.Xml classes implement version 1.0 of the language. Syntactically, its most commonly used expressions resemble a file system path and may represent either the absolute or relative position of nodes in the tree.

In the .NET Framework, XPath evaluation is exposed through the XPathNavigator abstract class. The navigator is an XPath processor that works on top of any XML data source that exposes the IXPathNavigable interface. The most important member of this interface is the CreateNavigator method, which returns an XPathNavigator object. Figure 10-4 shows three classes that implement this interface. Of these, XmlDocument and XmlDataDocument are members of the System.Xml namespace; XPathDocument (as well as the XPathNavigator class) resides in the System.Xml.XPath namespace.

  • XmlDocument: Implements the W3C Document Object Model (DOM) and supports XPath queries, navigation, and editing.

  • XmlDataDocument: In addition to the features it inherits from XmlDocument, it provides the capability to map XML data to a DataSet. Any changes to the DataSet are reflected in the XML tree and vice versa.

  • XPathDocument: This class is optimized to perform XPath queries and represents XML in a tree of read-only nodes that is more streamlined than the DOM.

Figure 10-4. XML classes that support XPath navigation

Constructing XPath Queries

Queries can be executed against each of these classes using an XPathNavigator object; XmlDocument and XmlDataDocument also implement the SelectNodes method. Generic code looks like this:

// XPATHEXPRESSION is the XPath query applied to the data
// (1) Return a list of nodes
XmlDocument doc = new XmlDocument();
doc.Load("movies.xml");
XmlNodeList selection = doc.SelectNodes(XPATHEXPRESSION);
// (2) Create a navigator and execute the query
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator iterator = nav.Select(XPATHEXPRESSION);

The XPathNodeIterator class encapsulates a list of nodes and provides a way to iterate over the list.
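
A query can return a scalar as well as a set of nodes. The following sketch, which assumes a tiny inline document rather than movies.xml, uses the navigator's Select method to iterate over nodes and its Evaluate method to obtain a count. The class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NavigatorSketch
{
    // Hypothetical data for illustration
    const string Xml =
        "<films><movie>Taxi Driver</movie><movie>Raging Bull</movie></films>";

    static XPathNavigator CreateNavigator()
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        return doc.CreateNavigator();
    }

    // Node-set result: iterate with XPathNodeIterator
    public static List<string> Titles()
    {
        List<string> titles = new List<string>();
        XPathNodeIterator iterator = CreateNavigator().Select("/films/movie");
        while (iterator.MoveNext())
            titles.Add(iterator.Current.Value);
        return titles;
    }

    // Scalar result: XPath numbers are returned as double
    public static double CountMovies()
    {
        return (double)CreateNavigator().Evaluate("count(/films/movie)");
    }

    public static void Main()
    {
        foreach (string t in Titles())
            Console.WriteLine(t);
        Console.WriteLine(CountMovies());
    }
}
```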

As with regular expressions (refer to Chapter 5, “C# Text Manipulation and File I/O”), an XPath query has its own syntax and operators that must be mastered in order to efficiently query an XML document. To demonstrate some of the fundamental XPath operators, we'll create queries against the data in Listing 10-10.

Example 10-10. XML Representation of Directors/Movies Relationship

<films>
  <directors>
    <director_id>54</director_id>
    <first_name>Martin</first_name>
    <last_name>Scorsese</last_name>
    <movies>
      <movie_ID>30</movie_ID>
      <movie_Title>Taxi Driver</movie_Title>
      <movie_DirectorID>54</movie_DirectorID>
      <movie_Year>1976</movie_Year>
    </movies>
    <movies>
      <movie_ID>28</movie_ID>
      <movie_Title>Raging Bull</movie_Title>
      <movie_DirectorID>54</movie_DirectorID>
      <movie_Year>1980</movie_Year>
    </movies>
  </directors>
</films>

Table 10-3 summarizes commonly used XPath operators and provides an example of using each.

Table 10-3. XPath Operators

Operator

Description

Child operator (/)

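Separates the steps of a path; each step selects children of the nodes matched by the step before it. A leading / references the root of the XML document, where the expression begins searching. The following expression returns the last_name node for each director in the document: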

/films/directors/last_name

Recursive descendant operator (//)

This operator indicates that the search should include all descendants along the specified path, not just children. The following both return the same set of last_name nodes. The difference is that the first begins searching at the root, and the second at each directors node:

//last_name
//directors//last_name

Wildcard operator (*)

Matches any element at the specified path location. The following returns all child elements of each movies node:

//movies/*

Current operator (.)

Refers to the currently selected node in the tree, when navigating through a tree node-by-node. It effectively becomes the root node when the operator is applied. In this example, if the current node is a directors node, this will find any last_name child nodes:

.//last_name

Parent operator (..)

Used to represent the node that is the parent of the current node. If the current node were a movies node, this would use the directors node as the start of the path:

../last_name

Attribute operator (@)

Returns any attributes specified. The following example would return the movie's runtime assuming there were attributes such as <movie_ID time="98"> included in the XML.

//movies//@time

Filter operator ([ ])

Allows nodes to be filtered based on matching criteria. The following example is used to retrieve all movie titles directed by Martin Scorsese:

//directors[last_name='Scorsese']
   /movies/movie_Title

Collection operator ([ ])

Uses brackets just as the filter does, but selects a node by its ordinal position among sibling nodes with the same name. This example returns the node for the second movie, Raging Bull:

//movies[2] (Index is not 0 based.)

Union operator (|)

Returns the union of nodes found on specified paths. This example returns the first and last name of each director:

//last_name | //first_name
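
The operators in Table 10-3 can be tried out with a few lines of code. The following sketch loads data modeled on Listing 10-10 from a string and reports how many nodes each expression selects; the parent operator is evaluated relative to a movies node. The class and helper names are illustrative assumptions.

```csharp
using System;
using System.Xml;

public static class OperatorSketch
{
    // Data modeled on Listing 10-10
    const string Xml =
        "<films><directors>" +
        "<director_id>54</director_id>" +
        "<first_name>Martin</first_name><last_name>Scorsese</last_name>" +
        "<movies><movie_ID>30</movie_ID><movie_Title>Taxi Driver</movie_Title>" +
        "<movie_DirectorID>54</movie_DirectorID><movie_Year>1976</movie_Year></movies>" +
        "<movies><movie_ID>28</movie_ID><movie_Title>Raging Bull</movie_Title>" +
        "<movie_DirectorID>54</movie_DirectorID><movie_Year>1980</movie_Year></movies>" +
        "</directors></films>";

    public static XmlDocument LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc;
    }

    // Number of nodes the expression selects, starting at the context node
    public static int Count(XmlNode context, string xpath)
    {
        return context.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        XmlDocument doc = LoadSample();
        string[] queries = {
            "/films/directors/last_name",   // child operator
            "//last_name",                  // recursive descendant operator
            "//movies/*",                   // wildcard operator
            "//movies[2]",                  // collection operator (1-based)
            "//last_name | //first_name"    // union operator
        };
        foreach (string q in queries)
            Console.WriteLine("{0} -> {1} node(s)", q, Count(doc, q));

        // Parent operator, evaluated relative to a movies node
        XmlNode movie = doc.SelectSingleNode("//movies[1]");
        Console.WriteLine(movie.SelectSingleNode("../last_name").InnerText);
    }
}
```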

Note that the filter operator permits nodes to be selected by their content. There are a number of functions and operators that can be used to specify the matching criteria. Table 10-4 lists some of these.

Table 10-4. Functions and Operators used to Create an XPath Filter

| Function/Operator | Description | Example |
| --- | --- | --- |
| and, or | Logical operators. | "directors[last_name='Scorsese' and first_name='Martin']" |
| position() | Selects node(s) at specified position. | "//movies[position()=2]" |
| contains(node,string) | Matches if node value contains specified string. | "//movies[contains(movie_Title,'Tax')]" |
| starts-with(node,string) | Matches if node value begins with specified string. | "//movies[starts-with(movie_Title,'A')]" |
| substring-after(string,string) | Extracts substring from the first string that follows occurrence of second string. | "//movies[substring-after('The Graduate','The ')='Graduate']" |
| substring(string,pos,length) | Extracts substring from node value. | "//movies[substring(movie_Title,2,1)='a']" |

Refer to the XPath standard (http://www.w3.org/TR/xpath) for a comprehensive list of operators and functions.
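
To see how these filter functions behave, the following sketch applies several of them to two movies modeled on Listing 10-10 and returns the titles that survive each filter. The Titles helper, which splices a filter into a fixed path, is a hypothetical convenience for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class FilterSketch
{
    // Two movies modeled on Listing 10-10
    const string Xml =
        "<films><directors><last_name>Scorsese</last_name>" +
        "<movies><movie_Title>Taxi Driver</movie_Title><movie_Year>1976</movie_Year></movies>" +
        "<movies><movie_Title>Raging Bull</movie_Title><movie_Year>1980</movie_Year></movies>" +
        "</directors></films>";

    // Returns the titles of the movies nodes that satisfy the filter
    public static List<string> Titles(string filter)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        List<string> titles = new List<string>();
        foreach (XmlNode n in doc.SelectNodes("//movies[" + filter + "]/movie_Title"))
            titles.Add(n.InnerText);
        return titles;
    }

    public static void Main()
    {
        string[] filters = {
            "contains(movie_Title,'Tax')",
            "starts-with(movie_Title,'R')",
            "position()=2",
            "movie_Year > 1978 and substring(movie_Title,2,1)='a'"
        };
        foreach (string f in filters)
            Console.WriteLine("{0} -> {1}", f, string.Join(", ", Titles(f).ToArray()));
    }
}
```

Note that both titles have an 'a' in the second position, so the last filter relies on the year comparison to narrow the result to one movie.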

Let's now look at examples of using XPath queries to search, delete, and add data to an XML tree. Our source XML file is shown in Listing 10-10. For demonstration purposes, examples are included that represent the XML data as an XmlDocument, XPathDocument, and XmlDataDocument.

XmlDocument and XPath

The expression in this example extracts the set of last_name nodes. It then prints the associated text. Note that underneath, SelectNodes uses a navigator to evaluate the expression.

string exp = "/films/directors/last_name";
XmlDocument doc = new XmlDocument();
doc.Load("directormovies.xml");  // Build DOM tree
XmlNodeList directors = doc.SelectNodes(exp);
foreach(XmlNode n in directors)
   Console.WriteLine(n.InnerText);  // Last name of director 

The XmlNode.InnerText property concatenates the values of a node and all its child nodes and returns them as a single text string. This is a convenient way to display tree contents during application testing.
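
The concatenation is easy to observe on a node that has more than one child. In this small sketch, which uses a hypothetical director element, InnerText on the parent joins the child values with no separator:

```csharp
using System;
using System.Xml;

public static class InnerTextSketch
{
    // Returns the InnerText of the document's root element
    public static string RootText(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    public static void Main()
    {
        string xml =
            "<director><first_name>Martin</first_name>" +
            "<last_name>Scorsese</last_name></director>";
        Console.WriteLine(RootText(xml));   // Child values run together
    }
}
```

When a readable display is needed, it is usually better to select the individual child nodes and format their values explicitly.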

XPathDocument and XPath

For applications that only need to query an XML document, the XPathDocument is the recommended class. It is free of the overhead required for updating a tree and runs 20 to 30 percent faster than XmlDocument. In addition, it can be created using an XmlReader to load all or part of a document into it. This is done by creating the reader, positioning it to a desired subtree, and then passing it to the XPathDocument constructor. In this example, the XmlReader is positioned at the root node, so the entire tree is read in:

string exp = "/films/directors/last_name";
// Create method was added with .NET 2.0
XmlReader rdr = XmlReader.Create(@"c:\directormovies.xml");
// Pass XmlReader to the constructor
XPathDocument xDoc = new XPathDocument(rdr);
XPathNavigator nav= xDoc.CreateNavigator();
XPathNodeIterator iterator;
iterator = nav.Select(exp);
// List last name of each director
while (iterator.MoveNext())
   Console.WriteLine(iterator.Current.Value);
// Now, list only movies for Martin Scorsese
string exp2 = 
   "//directors[last_name='Scorsese']/movies/movie_Title";
iterator = nav.Select(exp2);
while (iterator.MoveNext())
   Console.WriteLine(iterator.Current.Value);

Core Note

Unlike the SelectNodes method, the navigator's Select method accepts XPath expressions as both plain text and precompiled objects. The following statements demonstrate how a compiled expression could be used in the preceding example:

string exp = "/films/directors/last_name";
// use XPathNavigator to create XPathExpression object
XPathExpression compExp = nav.Compile(exp);
iterator = nav.Select(compExp);  

Compiling an expression improves performance when the expression (query) is used more than once.
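
A sketch of that reuse follows. It compiles the expression once with the static XPathExpression.Compile method and then runs it against two separate documents. The two inline documents and the helper names are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class CompiledSketch
{
    // Creates a navigator over an in-memory XML string
    public static XPathNavigator Load(string xml)
    {
        return new XPathDocument(new StringReader(xml)).CreateNavigator();
    }

    // Runs a precompiled expression and collects the selected values
    public static List<string> Run(XPathNavigator nav, XPathExpression expr)
    {
        List<string> values = new List<string>();
        XPathNodeIterator iterator = nav.Select(expr);
        while (iterator.MoveNext())
            values.Add(iterator.Current.Value);
        return values;
    }

    public static void Main()
    {
        // Compile once
        XPathExpression expr =
            XPathExpression.Compile("/films/directors/last_name");

        // Hypothetical documents that share the same layout
        XPathNavigator first = Load(
            "<films><directors><last_name>Scorsese</last_name></directors></films>");
        XPathNavigator second = Load(
            "<films><directors><last_name>Lean</last_name></directors>" +
            "<directors><last_name>Allen</last_name></directors></films>");

        // Use many times
        foreach (XPathNavigator nav in new XPathNavigator[] { first, second })
            foreach (string name in Run(nav, expr))
                Console.WriteLine(name);
    }
}
```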

XmlDataDocument and XPath

The XmlDataDocument class allows you to take a DataSet (an object containing rows of data) and create a replica of it as a tree structure. The tree not only represents the DataSet, but is synchronized with it. This means that changes made to the DOM or DataSet are automatically reflected in the other.

Because XmlDataDocument is derived from XmlDocument, it supports the basic methods and properties used to manipulate XML data. To these, it adds methods specifically related to working with a DataSet. The most interesting of these is the GetRowFromElement method that takes an XmlElement and converts it to a corresponding DataRow.

A short example illustrates how XPath is used to retrieve the set of nodes representing the movies associated with a selected director. The nodes are then converted to a DataRow, which is used to print data from a column in the row.

// Create document by passing in associated DataSet
XmlDataDocument xmlDoc = new XmlDataDocument(ds);
string exp = "//directors[last_name='Scorsese']/movies";
XmlNodeList nodeList =    
      xmlDoc.DocumentElement.SelectNodes(exp);
DataRow myRow;
foreach (XmlNode myNode in nodeList)
{
   myRow = xmlDoc.GetRowFromElement((XmlElement)myNode);
   if (myRow != null){
      // Print Movie Title from a DataRow
      Console.WriteLine(myRow["movie_Title"].ToString());
   }
}

This class should be used only when its hybrid features add value to an application. Otherwise, use XmlDocument if updates are required or XPathDocument if the data is read-only.

Adding and Removing Nodes on a Tree

Besides locating and reading data, many applications need to add, edit, and delete information in an XML document tree. This is done using methods that edit the content of a node and add or delete nodes. After the changes have been made to the tree, the updated DOM is saved to a file.

To demonstrate how to add and remove nodes, we'll operate on the subtree presented as text in Listing 10-10 and as a graphical tree in Figure 10-5.

Figure 10-5. Subtree used to add and remove nodes

This example uses the XmlDocument class to represent the tree for which we will remove one movies element and add another one. XPath is used to locate the movies node for Raging Bull along the path containing Scorsese as the director:

"//directors[last_name='Scorsese']/movies[movie_Title=
      'Raging Bull']"

This node is deleted by locating its parent node, which is on the level directly above it, and executing its RemoveChild method.

Example 10-11. Using XmlDocument and XPath to Add and Remove Nodes

public void UseXPath()
{
   XmlDocument doc = new XmlDocument();
   doc.Load(@"c:\directormovies.xml");
   // (1) Locate movie to remove
   string exp = "//directors[last_name='Scorsese']/" +
         "movies[movie_Title='Raging Bull']";
   XmlNode movieNode = doc.SelectSingleNode(exp);
   // (2) Delete node and child nodes for movie
   XmlNode directorNode = movieNode.ParentNode;
   directorNode.RemoveChild(movieNode);
   // (3) Add new movie for this director
   //     First, get and save director's ID
   string directorID = 
         directorNode.SelectSingleNode("director_id").InnerText;
   // XmlElement is derived from XmlNode and adds members
   XmlElement movieEl = doc.CreateElement("movies");
   directorNode.AppendChild(movieEl);
   // (4) Add Movie Description
   AppendChildElement(movieEl, "movie_ID", "94");
   AppendChildElement(movieEl, "movie_Title", "Goodfellas");
   AppendChildElement(movieEl, "movie_Year", "1990");
   AppendChildElement(movieEl, "movie_DirectorID", 
                               directorID);
   // (5) Save updated XML Document
   doc.Save(@"c:\directormovies2.xml");
}
// Create node and append to parent
public void AppendChildElement(XmlNode parent, string elName, 
                               string elValue)
{
   XmlElement newEl = 
         parent.OwnerDocument.CreateElement(elName);
   newEl.InnerText = elValue;
   parent.AppendChild(newEl);
}

Adding a node requires first locating the node that will be used to attach the new node. Then, the document's Createxxx method is used to generate an XmlNode or XmlNode-derived object that will be added to the tree. The node is attached using the current node's AppendChild, InsertAfter, or InsertBefore method to position the new node in the tree. In this example, we add a movies element that contains information for the movie Goodfellas.
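
Listing 10-11 uses AppendChild, which always places the new node last. When position matters, InsertAfter or InsertBefore can be used instead. The following sketch inserts a movies element directly after the first one; the inline XML is a hypothetical, trimmed-down version of the directors subtree, and the helper name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class InsertSketch
{
    // Inserts a new movies element after the first one and returns all titles in order
    public static List<string> InsertSecondMovie(string newTitle)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<directors>" +
            "<movies><movie_Title>Taxi Driver</movie_Title></movies>" +
            "<movies><movie_Title>Goodfellas</movie_Title></movies>" +
            "</directors>");

        // Locate the reference node
        XmlNode first = doc.SelectSingleNode("/directors/movies[1]");

        // Build the new subtree with the document's Create methods
        XmlElement movieEl = doc.CreateElement("movies");
        XmlElement titleEl = doc.CreateElement("movie_Title");
        titleEl.InnerText = newTitle;
        movieEl.AppendChild(titleEl);

        // Attach it to the parent, immediately after the reference node
        first.ParentNode.InsertAfter(movieEl, first);

        List<string> titles = new List<string>();
        foreach (XmlNode n in doc.SelectNodes("//movie_Title"))
            titles.Add(n.InnerText);
        return titles;
    }

    public static void Main()
    {
        foreach (string t in InsertSecondMovie("Raging Bull"))
            Console.WriteLine(t);
    }
}
```

InsertBefore takes the same two arguments and places the new node ahead of the reference node instead.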

Summary

To work with XML, a basic understanding of the XML document, schema, and style sheet is required. An XML document, which is a representation of information based on XML guidelines, can be created in numerous ways. The XmlSerializer class can be used when the data takes the form of an object or objects within a program. After the XML document is created, a schema can be derived from it using the XML Schema Definition (XSD) tool. Several classes use the schema to provide automatic document validation. The usefulness of XML data is extended by the capability to transform it into virtually any other format using an XML style sheet. The style sheet defines a set of rules that are applied during an Extensible Stylesheet Language Transformation (XSLT).

XML data can be processed as a stream of nodes or an in-memory tree known as a Document Object Model (DOM). The XmlReader and XmlNodeReader classes provide an efficient way to process XML as a read-only, forward-only stream. The XmlDocument, XPathDocument, and XmlDataDocument classes offer methods for processing nodes in the tree structure.

In many cases, data extraction from an XML tree can be best achieved using a query, rather than traversing the tree nodes. The XPath expression presents a rich, standardized syntax that is easily used to specify criteria for extracting a node, or multiple nodes, from an XML tree.

Test Your Understanding

1:

XmlReader is an abstract class. How do you create an instance of it to read an XML document?

2:

What is the purpose of the XmlReaderSettings class?

3:

Which of these classes cannot be used to update an XML document?

  1. XmlDocument
    
  2. XmlDataDocument
    
  3. XPathDocument
    

4:

Using the XML data from Listing 10-10, show the node values returned by the following XPath expressions:

  1. //movies[substring( movie_Title,2,1)='a']
    
  2. //movies[2]
    
  3. //movies[movie_Year >= 1978]
    
  4. //directors[last_name='Scorsese']
          /movies/movie_Title
    

5:

Describe two ways to perform schema validation on an XML document.



[1] W3C Extensible Markup Language (XML), 1.0 (Third Edition), http://www.w3.org/TR/REC-xml/

[2] W3C Document Object Model (DOM) Level 3 Core Specification, April, 2004, http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html
