Chapter 13. LINQ to XML: Creation

LINQ to XML provides support for querying, creating, and transforming XML documents. XML namespaces are included in the API, as well as support for XML schemas. LINQ to XML stands on its own as a compelling alternative to technologies such as XPath, XQuery, XSLT, and the XML DOM.

Although LINQ to XML is not a small subject, the learning curve is nonetheless gentle. You learned earlier in this book that LINQ has a unified querying model. The skills you learned reading about LINQ to objects and LINQ to SQL also apply to LINQ to XML. This can seem like a minor point at first. It is not. The unified model that allows you to apply a single set of rules to a wide variety of data sources is one of the most valuable benefits of the LINQ programming model.

The practical examples in this chapter illustrate the key themes underlying LINQ to XML development. The focus is on learning the basics and then gradually introducing more complex subjects over the course of three chapters:

• This chapter focuses on creating XML.

• Chapter 14, “Querying and Editing XML,” shows you how to query XML.

• Chapter 15, “XML Namespaces, Transformations, and Schema Validation,” the final chapter in the series, covers more advanced topics:

• XML namespaces

• XML transforms

• XML schemas

I like XML because it provides a simple, humble solution for use with a range of advanced technologies. For instance, XML can help you

• Transfer data across the Web.

• Create and call web services (SOAP).

• Create RSS documents that form a simple subscription model for information of all types.

• Define the object model for WPF (XAML).

• Define a host of other services too numerous to mention.

I’ve tried to make this chapter fit the subject matter by keeping the text easy to read. Hopefully you will be able to relax while reading, finding that each subject unfolds in a logical manner. When you are done, you will have learned about a relatively simple technology that is in wide use throughout many areas of both desktop and Internet-based computing.

XML Fundamentals

This chapter discusses several important features of XML with which you may already be conversant. Let’s take a moment, however, to make sure that you understand the fundamentals. I think I can safely assume that you probably know the basics of XML, but I want to pin down some nomenclature.

Consider the simple XML document shown in Listing 13.1.

Listing 13.1. A Simple XML Document Containing Two Planets and One Moon

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--The planets Venus and Earth-->
<Planets>
  <Planet>
    <Name>Venus</Name>
  </Planet>
  <Planet Id="3">
    <Name>Earth</Name>
    <Moons>
      <Moon>Moon</Moon>
    </Moons>
  </Planet>
</Planets>

The data captured in this document enumerates the second and third planets in our solar system. The markup easily captures the fact that the Earth has a satellite that we call the moon. An Id attribute, set to the number 3, is also associated with the planet Earth.

This document contains four distinct pieces of XML syntax called a declaration, comment, element, and attribute. The top element, called Planets, is called a root node. These bits of syntax are illustrated in Figure 13.1.

Figure 13.1. This simple XML file includes a declaration, a comment, multiple elements, an attribute, and a root node.

image

At the top you see an XML declaration, containing information about the version, the encoding, and whether the file depends on any other files. In this case, it does not; it can stand on its own. After the declaration you see an XML comment.

The root node, called Planets, is the beginning of an XML tree. This tree has various nodes, most of which are elements. These elements include the nodes called Planets, Planet, Name, Moons, and Moon.

The Planets element is the parent of the Planet element. The Planet element is a child of the Planets element. The Planet, Name, Moons, and Moon elements are all descendants of the Planets element. The Planet element is an ancestor of the Name, Moons, and Moon elements.
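The family relationships described above map directly onto members of the XElement class. The following is a minimal sketch, not code from the book's samples. It assumes C# 9 or later so that it can be written as top-level statements, and it rebuilds the elements from Listing 13.1 without the declaration and comment. The variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Rebuild the elements from Listing 13.1 (declaration and comment omitted).
var planets = new XElement("Planets",
    new XElement("Planet",
        new XElement("Name", "Venus")),
    new XElement("Planet", new XAttribute("Id", 3),
        new XElement("Name", "Earth"),
        new XElement("Moons",
            new XElement("Moon", "Moon"))));

// The Moon element sits three levels below the root.
XElement moon = planets.Descendants("Moon").First();

// Parent is the immediately enclosing element.
Console.WriteLine(moon.Parent.Name);   // Moons

// Ancestors walks outward from the parent to the root.
string ancestors = string.Join(", ", moon.Ancestors().Select(e => e.Name.LocalName));
Console.WriteLine(ancestors);          // Moons, Planet, Planets

// Elements returns direct children only; Descendants returns every nested element.
int childCount = planets.Elements().Count();
int descendantCount = planets.Descendants().Count();
Console.WriteLine(childCount);         // 2
Console.WriteLine(descendantCount);    // 6
```

These query members are covered properly in Chapter 14; they appear here only to make the vocabulary concrete.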

Figure 13.2 shows a simple XML element consisting of some text and an opening and closing tag. The text field is also frequently called content. We call this an XML element because it has an opening tag delineated by brackets and a closing tag delineated by brackets and a slash. The actual text inside the brackets is arbitrary.

Figure 13.2. Elements typically consist of tags and content. In LINQ to XML, the text or content field is called Value.

image

Some of the XML elements in the sample document are nested:

<Moons>
   <Moon>Moon</Moon>
</Moons>

In this case both the Moons and Moon nodes are elements and can be treated as single entities. In other words, LINQ to XML allows you to address Moons as a single element, even though it contains a nested element.

The document shown in Listing 13.1 has a root node called Planets. The root node, or root element, is the outermost node in the portion of an XML file that contains its data. The declaration and some comments are located outside the root node, but they do not contain the data, which is the primary payload of an XML document. Locating the root can help you get your bearings in even the largest and most complex XML file.

The Planet element has an XML attribute, as shown in Figure 13.3. Attributes are nested inside an element tag, and they have a name, an equals sign, and a value in quotation marks.

Figure 13.3. This entire XML node is an element. It contains an attribute called Id with a value of 3.

image


Elements Versus Attributes

XML has no clearly defined rules about when data should be placed in an attribute and when it should reside in an element. I tend to follow the common practice of placing data that I want to display to a user in elements, and placing housekeeping data such as an Id in attributes. In general, I find elements easier to read than attributes, so I tend to favor them. However, these are simply my prejudices; opinions on this subject differ. Where necessary, I will happily suffer minor inconsistencies in my style.


XML declarations are optional in XML 1.0 and mandatory in XML 1.1. Therefore, any document without a declaration is assumed to be an XML 1.0 document. XML documents support the Unicode standard, and UTF-8 is a common way to implement that standard. A document is stand-alone if it does not rely on an external DTD file or other entities.

Only the version is required in an XML declaration. Most parsers can automatically determine if a document is UTF-8 or UTF-16, so specifying the encoding usually is unnecessary unless you are using some other format. By default, XML documents are not considered to be stand-alone, but it is not an error to omit external references from such a document.

My goal in this introductory section has been to provide the minimum information you need to follow the discussion in the rest of the chapter. As mentioned earlier, Chapter 15 describes XML namespaces and schemas. But this is all I’ll say about the basics. If you want more information, feel free to read any of the excellent books on XML that currently crowd bookstore shelves.

Understanding the LINQ to XML API

This section introduces the LINQ to XML API, which supplements the standard LINQ query operators with a set of XML-specific methods. Our exploration of LINQ to XML begins with samples of how to create, save, and read XML documents using the LINQ to XML API.

These opening sections focus on the objects shown in Figure 13.4. The XDocument, XElement, and XAttribute classes play central roles in this chapter. Other classes, such as XComment and XDeclaration, will be included in the discussion, but they have secondary importance. Most of your work with LINQ to XML will involve just a handful of classes, each of which has only a small number of important methods that you will use repeatedly. Many of the most important of those methods will be introduced during the discussion of querying XML data.

Figure 13.4. This hierarchy contains most of the important classes found in the LINQ to XML API.

image

Figure 13.4 does not show the complete LINQ to XML hierarchy of objects, because I have omitted classes that are not particularly important. So as not to leave gaps in the hierarchy, I’ve included in Figure 13.4 supporting classes such as XObject, XNode, and XText. You will rarely encounter these classes in your day-to-day programming work, but knowing of their existence can help inform your decision-making process.


The Role of Nodes

The hierarchy shown in Figure 13.4 correctly suggests that the term node is a general way of talking about virtually any entity found in an XML document. Comments and elements are both nodes. Even the content, or text, inside an element, such as the one shown in Figure 13.2, is considered to be a node in an XML document. As the hierarchy shown in Figure 13.4 suggests, LINQ does not regard attributes as nodes, although some developers may disagree.
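The distinction between nodes and attributes can be checked in code. The sketch below is illustrative rather than taken from the book's samples. It assumes C# 9 or later (top-level statements), and the element and comment text are made up for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var planet = new XElement("Planet",
    new XAttribute("Id", 3),
    new XComment("Third from the Sun"),
    new XElement("Name", "Earth"));

// Nodes() returns the child nodes: the comment and the Name element.
// The Id attribute is not among them; attributes are reached through Attributes().
var nodeTypes = planet.Nodes().Select(n => n.GetType().Name).ToList();
Console.WriteLine(string.Join(", ", nodeTypes));   // XComment, XElement

// The text inside Name is itself a node of type XText.
XNode content = planet.Element("Name").FirstNode;
Console.WriteLine(content is XText);               // True

// XAttribute derives from XObject but not from XNode.
bool attributeIsNode = typeof(XNode).IsAssignableFrom(typeof(XAttribute));
Console.WriteLine(attributeIsNode);                // False
```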


Creating XML Elements

A “Hello World” program for the LINQ to XML API might look something like this:

using System.Xml.Linq;

var x = new XElement("Planet", "Earth");
Console.WriteLine(x);

These few lines of code create the following simple XML element:

<Planet>Earth</Planet>

Note the presence of the System.Xml.Linq using directive. Unlike System.Linq, this directive is not automatically added to your new source files when you are working in Visual Studio. You must add it yourself.


Inserting using Directives

When working in Visual Studio, the simplest way to add a using directive to your program is to type in a member of a namespace not included in your using directives. In this case you might type XElement. Notice the red Smart Tag under the last letter of the word. This lets you know that Visual Studio thinks it knows a way to help you. Hold down the Ctrl key and press the period key. A window appears that allows you to automatically insert the appropriate using directive at the top of your current file.


You can access the name of an XML element through the Name property and access its content through the Value property. Consider the following code:

XElement element = new XElement("Planet", "Earth");
Console.WriteLine(element.Name);
Console.WriteLine(element.Value);

This code writes the words Planet and Earth.

The constructor for the XElement class shown here allows you to pass in the name and content for a single XML element. Here is the complete list of overloads for the XElement constructor:

public XElement(XElement other);
public XElement(XName name);
public XElement(XStreamingElement other);
public XElement(XName name, object content);
public XElement(XName name, params object[] content);

We are currently using the fourth overload. This is probably the most commonly used overload. The fifth overload is also very important, but I will delay showing it to you until we reach the section “Creating an XML Declaration.”

Here are examples of using the first and second overloads:

var y = new XElement("Planets");
var z = new XElement(y);
Console.WriteLine(y);
Console.WriteLine(z);

The output from this code looks like this:

<Planets />
<Planets />

This syntax specifies that these elements do not have any value; they are empty.
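The first overload, which takes another XElement, produces an independent copy rather than a second reference to the same element. Here is a small sketch that demonstrates this, assuming C# 9 or later (top-level statements). The variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var original = new XElement("Planets", new XElement("Planet", "Earth"));

// First overload: XElement(XElement other) copies the element and its contents.
var copy = new XElement(original);

// Changing the original afterward does not affect the copy.
original.Add(new XElement("Planet", "Mars"));

int originalCount = original.Elements().Count();
int copyCount = copy.Elements().Count();
Console.WriteLine(originalCount);   // 2
Console.WriteLine(copyCount);       // 1
```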

Creating XML Attributes

LINQ to XML uses the XAttribute class to encapsulate the idea of an XML attribute. Here is how to create an XAttribute:

var xml = new XElement("Planet", new XAttribute("Id", 3));

If written to the console, this XElement produces the following code:

<Planet Id="3" />

XAttribute has only two constructors:

public XAttribute(XAttribute other);
public XAttribute(XName name, object value);

The second of these constructors is used in the previous example and in the majority of cases.

If you want to add two or more XAttributes to an XML element, you can write the following lines of code:

var xml = new XElement("Planet",
    new XAttribute("Id", 3),
    new XAttribute("ModelColor", "blue"));

When this code is written to the console, the output from this simple statement looks like this:

<Planet Id="3" ModelColor="blue" />

Most of the important properties and methods of XElement and XAttribute are used primarily in the context of querying data, so I will show you how to use them in Chapter 14. For now, it is important only that you understand how to use their constructors to create XML nodes.
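Even before the querying material, it can help to see how an attribute's value is read back once it has been created. The following sketch assumes C# 9 or later (top-level statements). The Radius attribute is hypothetical and is used only to show what happens when an attribute is absent.

```csharp
using System;
using System.Xml.Linq;

var xml = new XElement("Planet",
    new XAttribute("Id", 3),
    new XAttribute("ModelColor", "blue"));

// An explicit cast converts the attribute's text to the requested type.
int id = (int)xml.Attribute("Id");

// Value always returns the attribute's content as a string.
string color = xml.Attribute("ModelColor").Value;

// Attribute() returns null for a missing name; a nullable cast passes the null through.
int? radius = (int?)xml.Attribute("Radius");

Console.WriteLine(id);               // 3
Console.WriteLine(color);            // blue
Console.WriteLine(radius == null);   // True
```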

Creating an XML Document

The XElement class is remarkably flexible, and it will fit your needs in many cases. A second, similar class called XDocument represents an entire XML document rather than a single element. There is no reason to create an XDocument unless you have a use for one. Typically, those uses include explicitly accessing the Root element in your XML tree or including an XML declaration in a document you are creating.

Here is code that creates a simple XML document:

var xml = new XDocument(new XElement("Planets",
             new XElement("Planet", "Earth")));

You can print the output from this document to the console with the following line of code:

Console.WriteLine(xml);

The output looks like this:

<Planets>
  <Planet>Earth</Planet>
</Planets>
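The two uses of XDocument mentioned above, the Root property and the declaration, can be inspected directly. This is a minimal sketch assuming C# 9 or later (top-level statements). The Stars element is hypothetical and exists only to show that a document accepts a single root element.

```csharp
using System;
using System.Xml.Linq;

var xml = new XDocument(new XElement("Planets",
    new XElement("Planet", "Earth")));

// Root is the single top-level element of the document.
Console.WriteLine(xml.Root.Name);            // Planets

// No XDeclaration was supplied, so the property is null.
Console.WriteLine(xml.Declaration == null);  // True

// A document may hold only one root element; adding a second is rejected.
bool secondRootRejected = false;
try
{
    xml.Add(new XElement("Stars"));
}
catch (InvalidOperationException)
{
    secondRootRejected = true;
}
Console.WriteLine(secondRootRejected);       // True
```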

Creating an XML Declaration

An XML declaration is found on the first line of this simple XML document:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Planets>
  <Planet>Earth</Planet>
</Planets>

To add this node to your XML file, you must use an XDocument. XElement cannot handle declarations. As shown in Listing 13.2, LINQ to XML makes it easy for you to create and configure the various sections of an XML declaration. The code shown in Listing 13.2 includes an XML declaration, an XML comment, and an XML attribute. Listing 13.3 shows the simple XML file produced by this code.

Listing 13.2. Using a Single Statement to Create an XML Document That Includes a Declaration, Comment, Elements, and Attributes

var xml = new XDocument(new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("The planets Venus and Earth"),
    new XElement("Planets",
        new XElement("Planet",
            new XElement("Name", "Venus")),
        new XElement("Planet", new XAttribute("Id", 3),
            new XElement("Name", "Earth"),
            new XElement("Moons",
                new XElement("Moon", "Moon")))));

Console.WriteLine(xml.Declaration);
Console.WriteLine(xml);

Listing 13.3. The Output from Listing 13.2

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--The planets Venus and Earth-->
<Planets>
  <Planet>
    <Name>Venus</Name>
  </Planet>
  <Planet Id="3">
    <Name>Earth</Name>
    <Moons>
      <Moon>Moon</Moon>
    </Moons>
  </Planet>
</Planets>

A single nested statement, written in the declarative style, is used to create this XML document. If you indent your code properly, this kind of statement is easy to use, because it mirrors the structure of the document you want to create. Later in this chapter, I will show you how to create a similar document from a series of discrete statements. However, the declarative style shown in this example is preferred and is generally held up as one of the attractions of the LINQ to XML API.

Note that you need to use two WriteLine statements to display the output from the code in Listing 13.2 in its entirety. The first statement writes out the declaration, and the second writes out the body of the XML, including the comment:

Console.WriteLine(xml.Declaration);
Console.WriteLine(xml);
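If you would rather get the declaration and the body in one string, one option is to save the document to a StringWriter. The sketch below is an alternative I am suggesting, not something from the book's samples, and it assumes C# 9 or later (top-level statements). Be aware that when saving to a StringWriter, the encoding named in the resulting declaration typically reflects the writer (utf-16) rather than the value passed to XDeclaration.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

var xml = new XDocument(new XDeclaration("1.0", "utf-8", "yes"),
    new XElement("Planets",
        new XElement("Planet", "Earth")));

// Save writes the declaration as well as the body, unlike ToString().
string text;
using (var writer = new StringWriter())
{
    xml.Save(writer);
    text = writer.ToString();
}

Console.WriteLine(text);
```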

Five LINQ to XML classes are used in this example. You have already seen three of these classes: XDocument, XAttribute, and XElement. Two more classes are introduced in Listing 13.2:

• An XDeclaration is used to adorn our XML with some metadata that includes the version, the file encoding, and whether the document is stand-alone.

• XComment creates an XML comment.

I can’t think of anything useful to say about the constructors for these simple classes other than they are easy to use and have obvious utilitarian value. Simply lift the code directly from Listing 13.2, and insert it into your own programs.

This excerpt from Listing 13.2 includes examples of how to use the fifth overload of the XElement constructor:

new XElement("Planet", new XAttribute("Id", 3),
    new XElement("Name", "Earth"),
    new XElement("Moons",
        new XElement("Moon", "Moon")))

Recall that this fifth overload of the XElement constructor looks like this:

public XElement(XName name, params object[] content);

This is a deceptively powerful line of code. The params object[] parameter allows you to pass zero or more arguments, each of which can be of any type that derives from object. This means, in effect, that you can pass any number of objects of any type in this parameter. In particular, you can pass in a lengthy sequence of XAttribute and XElement constructors like those shown in this example. This is a form of compiler magic that enables LINQ to support the declarative style of programming.
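Because the content parameters are typed as object, they can also hold a sequence, and LINQ to XML adds each item of the sequence as a separate child. The following sketch shows a standard query operator feeding the constructor. It assumes C# 9 or later (top-level statements), and the array of names is made up for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string[] names = { "Mercury", "Venus", "Earth" };

// The second content argument is an IEnumerable<XElement>;
// each item in the sequence becomes a child of Planets.
var planets = new XElement("Planets",
    new XComment("Built from an array"),
    names.Select(n => new XElement("Planet", n)));

int planetCount = planets.Elements("Planet").Count();
Console.WriteLine(planetCount);   // 3
Console.WriteLine(planets);
```

Mixing fixed nodes with projected sequences in this way keeps the declarative shape of the statement even when the data comes from a collection.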

Included with the programs that accompany this book is a sample called CreatePlanets. It shows you how to write a single declarative statement that generates a document listing all the planets and all the moons in our solar system. That sample includes the following constructor for the planet Jupiter, the body of which is nested inside a much larger declaration for all the planets and their moons:

public const string planet = "Planet";
public const string moon = "Moon";
public const string moons = "Moons";

... Code omitted here ...

new XElement(planet,
   new XElement(name, "Jupiter"),
   new XElement(moons,
      new XElement(moon, "Io"),
      new XElement(moon, "Europa"),
      new XElement(moon, "Ganymede"),
      new XElement(moon, "Callisto"),
      new XElement(moon, "Leda"),
      new XElement(moon, "Himalia"),
      new XElement(moon, "Lysithea"),
      new XElement(moon, "Elara"),
      new XElement(moon, "Ananke"),
      new XElement(moon, "Carme"),
      new XElement(moon, "Pasiphae"),
      new XElement(moon, "Sinope"),
      new XElement(moon, "Metis"),
      new XElement(moon, "Adrastea"),
      new XElement(moon, "Amalthea"),
      new XElement(moon, "Thebe"))),

The fifth overload of the XElement constructor accepts this code with nary a blink.

Designing and implementing code like this clearly requires an advanced degree in compiler magic. Nevertheless, the code itself is easy to use. This is declarative code at its best, allowing us to write a constructor that closely mirrors the shape of the complex XML documents that many developers frequently create.

Creating a Document from Raw Text

Here is an alternative means of creating an XML document:

string str = @"<?xml version=""1.0"" encoding=""utf-8""
  standalone=""yes""?>
    <!--The first three planets-->
  <Planets>
    <Planet>Mercury</Planet>
    <Planet>Venus</Planet>
    <Planet Moon=""Moon"">Earth</Planet>
  </Planets>";

XDocument doc = XDocument.Parse(str);

As you can see, the Parse method of the XDocument class allows you to pass in raw XML directly as a string literal. Sometimes this is the fastest and easiest way to create an XML document in your code.
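XElement offers a Parse method as well, and both versions throw an XmlException when the text is not well-formed. The sketch below is illustrative only and assumes C# 9 or later (top-level statements). The malformed string is deliberately missing a closing tag.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// XElement.Parse is convenient when the text holds a single element.
XElement planet = XElement.Parse(@"<Planet Moon=""Moon"">Earth</Planet>");
Console.WriteLine(planet.Value);   // Earth

// Malformed XML is rejected with an XmlException.
bool rejected = false;
try
{
    XDocument.Parse("<Planets><Planet>Earth</Planets>");
}
catch (XmlException ex)
{
    rejected = true;
    Console.WriteLine(ex.Message);
}
Console.WriteLine(rejected);       // True
```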

Building a Document One Node at a Time

Although it is usually simplest to create an XML document with a single statement in the declarative style, it is possible to take other approaches. Listing 13.4 shows how to build a document one node at a time with a series of Add statements. See the program that accompanies this book called GettingStartedWithLinqToXml.

Listing 13.4. Creating an XML Document One Node at a Time Using Add Statements

public void BuildDocument()
{
     var xml = new XDocument();

     xml.Add(new XComment("Some of the Solar System"));
     xml.Add(new XElement("Sun"));
     XElement temp = new XElement("Planet");
     temp.Add(new XAttribute("Name", "Earth"));
     xml.Root.Add(temp);
     temp = new XElement("Planet");
     temp.Add(new XAttribute("Name", "Mars"));
     temp.Add(new XElement("Moon", "Phobos"));
     temp.Add(new XElement("Moon", "Deimos"));
     xml.Root.Add(temp);

     Console.WriteLine(xml);
}

The Add method shown here is found in both the XDocument and XElement classes. I recommend using this technique primarily when you need to edit an existing document. An XML document is a single, heavily nested hierarchy of nodes, but the code shown in Listing 13.4 gives the impression that the document consists of separable, discrete pieces. As a result, many developers prefer to use the declarative style shown in Listing 13.2. Note also that the code shown in Listing 13.4 gives you no sense of the shape of the document you are creating. The failure of this imperative code to give you a sense of the shape of the document highlights one of the virtues of declarative code.
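For comparison, here is a sketch of the same document written as a single declarative statement, followed by a check that the two approaches produce equivalent trees. The declarative version is my own restatement of Listing 13.4, not code from the accompanying samples, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Xml.Linq;

// Declarative: the nesting of the code mirrors the nesting of the document.
var declarative = new XDocument(
    new XComment("Some of the Solar System"),
    new XElement("Sun",
        new XElement("Planet", new XAttribute("Name", "Earth")),
        new XElement("Planet", new XAttribute("Name", "Mars"),
            new XElement("Moon", "Phobos"),
            new XElement("Moon", "Deimos"))));

// Imperative: the same nodes added one at a time, as in Listing 13.4.
var imperative = new XDocument();
imperative.Add(new XComment("Some of the Solar System"));
imperative.Add(new XElement("Sun"));
var temp = new XElement("Planet");
temp.Add(new XAttribute("Name", "Earth"));
imperative.Root.Add(temp);
temp = new XElement("Planet");
temp.Add(new XAttribute("Name", "Mars"));
temp.Add(new XElement("Moon", "Phobos"));
temp.Add(new XElement("Moon", "Deimos"));
imperative.Root.Add(temp);

// DeepEquals compares the two trees node by node.
bool sameTree = XNode.DeepEquals(declarative, imperative);
Console.WriteLine(sameTree);   // True
```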


Declarative Versus Imperative Revisited

It is my belief that declarative code is better than imperative code when it is used at the right time and place. Both methods of programming have advantages, and it is important to learn how to get the best from both styles. It just happens that the declarative style lends itself well to the act of creating XML documents, just as it suits the act of querying data. This doesn’t mean that it is the best tool to use in all cases, however.


Reading and Writing XML

Listing 13.5 shows how to create an XML document and then save it to disk.

Listing 13.5. Saving a File to Disk

var xml = new XDocument(new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("The planet earth"),
    new XElement("Planets",
        new XElement("Planet",
            new XElement("Name", "Venus")),
        new XElement("Planet", new XAttribute("Id", 3),
            new XElement("Name", "Earth"),
            new XElement("Moons",
                new XElement("Moon", "Moon")))));

xml.Save("Planets.xml");

Figure 13.5 shows the document created by Listing 13.5. It appears as it would if you typed it from the command prompt. Note the small set of unreadable characters at the start of the second line. This is the UTF-8 header, also known as the byte order mark. The header becomes visible at the command prompt, but most editors do not show it.

Figure 13.5. The UTF-8 document created by the code shown in Listing 13.5.

image

This is not a reference book, so I won’t discuss each part in depth, but here are the overloads for the XElement and XDocument Save method:

public void Save(string fileName);
public void Save(TextWriter textWriter);
public void Save(XmlWriter writer);
public void Save(string fileName, SaveOptions options);
public void Save(TextWriter textWriter, SaveOptions options);

XDocument.Save saves declarations and similar information that appear before the root node, but XElement does not. The SaveOptions enumeration allows you to decide how to treat white space.
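The effect of SaveOptions is easiest to see with the ToString overload that accepts the same enumeration, because nothing needs to be written to disk. A minimal sketch, assuming C# 9 or later (top-level statements):

```csharp
using System;
using System.Xml.Linq;

var xml = new XElement("Planets", new XElement("Planet", "Earth"));

// Default: child elements are indented on separate lines.
string indented = xml.ToString();

// DisableFormatting: no indentation or line breaks are added.
string compact = xml.ToString(SaveOptions.DisableFormatting);

Console.WriteLine(indented);
Console.WriteLine(compact);   // <Planets><Planet>Earth</Planet></Planets>
```

Passing SaveOptions.DisableFormatting to one of the Save overloads listed above should have the same effect on the file that is written.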

Both XDocument and XElement provide a Load method:

var xml = XDocument.Load(fileName);
var xml = XElement.Load(fileName);

XElement does not load information such as a declaration that appears before the root node. XDocument reads in that kind of information.

As mentioned, if you load the document this way and then try to write it to the console, you will discover that the default ToString() method for the XDocument class does not write out the XML declaration. If you want to see the entire XML document, you need to write two lines of code:

Console.WriteLine(xml.Declaration);
Console.WriteLine(xml);

Alternatively, you can use the File object to read the text back in so that you can see how it appears on disk:

string data = File.ReadAllText(tempxml);
Console.WriteLine(data);

The Load method has six overloads. LoadOptions allows you to preserve white space and capture line number information:

public static XDocument Load(string uri);
public static XDocument Load(TextReader textReader);
public static XDocument Load(XmlReader reader);
public static XDocument Load(string uri, LoadOptions options);
public static XDocument Load(TextReader textReader, LoadOptions options);
public static XDocument Load(XmlReader reader, LoadOptions options);
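The sketch below shows what the two LoadOptions values mentioned above change. To stay self-contained it loads from a StringReader through the TextReader overload rather than from a file. The LoadFrom helper is a hypothetical convenience function written for this example, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

string text = "<Planets>\n  <Planet>Earth</Planet>\n</Planets>";

// Hypothetical helper: load a document from a string with the given options.
XDocument LoadFrom(string xmlText, LoadOptions options)
{
    using var reader = new StringReader(xmlText);
    return XDocument.Load(reader, options);
}

// By default, insignificant white space between elements is discarded.
XDocument plain = LoadFrom(text, LoadOptions.None);
int plainNodes = plain.Root.Nodes().Count();
Console.WriteLine(plainNodes);       // 1

// PreserveWhitespace keeps the white space as text nodes around the Planet element.
XDocument preserved = LoadFrom(text, LoadOptions.PreserveWhitespace);
int preservedNodes = preserved.Root.Nodes().Count();
Console.WriteLine(preservedNodes);   // 3

// SetLineInfo records where each node appeared in the source text.
XDocument withLines = LoadFrom(text, LoadOptions.SetLineInfo);
IXmlLineInfo info = withLines.Root.Element("Planet");
Console.WriteLine(info.HasLineInfo());   // True
Console.WriteLine(info.LineNumber);      // 2
```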

For example, here is how to load an RSS feed from the Internet into an XDocument:

XDocument xml = XDocument.Load(@"http://blogs.msdn.com/charlie/rss.xml");
Console.WriteLine(xml.Declaration);
Console.WriteLine(xml.FirstNode);
Console.WriteLine(xml);

If this were a call to XElement instead of XDocument, the attempt to write out the Declaration would be a compile-time error, and the call to write out the FirstNode would dump the entire document, minus the declaration and other header information. As it is, the first two WriteLine statements print the following:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl"
   href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl"
   media="screen"?>

You must use XDocument if you want the declaration, the doctype, and related information. If you have no need for that information, call XElement.Load.

Summary

This chapter began with a brief overview of key features of the XML standard. With the preliminaries out of the way, the text moved on to explain how LINQ to XML provides the tools you need to create, read, and write XML documents.

All the code shown in this chapter is also found on the book’s web site. If you haven’t done so already, download these programs and run them. There is nothing like working with live code to increase your understanding of a subject.

In the next chapter, you will learn how to query an XML document and how to edit an existing XML document. The final chapter on LINQ to XML covers XML namespaces, transformations, and schemas.
