Chapter 19
XML and JSON

Wrox.com Code Downloads for this Chapter

You can find the wrox.com code downloads for this chapter at www.wrox.com/go/beginningvisualc#2015programming on the Download Code tab. The code is in the Chapter 19 download and individually named according to the names throughout the chapter.

Just as programming languages like C# describe computer logic in a format that is readable by both machines and humans, XML and JSON are data languages, which are used for storing data in a simple text format that can be read by both humans and nearly any computer.

Most C# .NET applications use XML in some form for storing data, such as .config files for storing configuration details and XAML files used in WPF and Windows Store applications. Because of this important fact, we'll spend the most time in this chapter on XML, with just a short look at JSON on the side.

During this chapter you will learn the basics of XML and JSON and then learn how to create XML documents and schemas. You will learn the basics of the XmlDocument class, how to read and write XML, how to insert and delete nodes, how to convert XML to JSON format, and finally how to search for data in XML documents using XPath.

XML Basics

Extensible Markup Language (XML) is a data language, which is a way of storing data in a simple text format that can be read by both humans and nearly any computer. It is a W3C standard format like HTML (www.w3.org/XML). It has been fully adopted by Microsoft in the .NET Framework and other Microsoft products. Even the document formats introduced with the newer versions of Microsoft Office are based on XML, although the Office applications themselves are not .NET applications.

The ins and outs of XML can be very complicated, so you won't look at every single detail here. Luckily, most tasks don't require a detailed knowledge of XML because Visual Studio typically takes care of most of the work — you will rarely have to write an XML document by hand. If you want to learn about XML in more depth, read a book such as Beginning XML by Joe Fawcett, Danny Ayers, and Liam Quin (Wrox, 2012) or one of the many online tutorials such as www.xmlnews.org/docs/xml-basics.html or http://www.w3schools.com/xml/.

The basic format is very simple, as you can see in the following example that shows an XML format for sharing data about books.

  <book>
    <title>Beginning Visual C# 2015</title>
    <author>Benjamin Perkins et al</author>
    <code>096689</code>
  </book>

In this example each book has a title, an author, and a unique code identifying the book. Each book's data is contained in a book element beginning with a <book> tag and ending with the </book> end tag. The title, author, and code values are stored in nested elements inside the book element.

Optionally, an element may also have attributes inside the tag itself. If the book code were an attribute of the book element instead of its own element, you'd see the book element beginning with something like this: <book code="096689">. Note that attribute values must always be enclosed in quotes. To keep it simple we'll stick with elements in this chapter's examples. Generically both attributes and elements are called nodes, like the nodes of a graph.
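
To make the difference between the two styles concrete, the following minimal sketch reads the book code once from a nested element and once from an attribute. It relies on the XmlDocument class covered later in this chapter; the XML strings and variable names are illustrative assumptions rather than part of the chapter's download code.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.Xml;

// Illustrative documents: the same book code stored two different ways.
string asElement = "<book><title>Beginning Visual C# 2015</title><code>096689</code></book>";
string asAttribute = "<book code=\"096689\"><title>Beginning Visual C# 2015</title></book>";

XmlDocument elementDoc = new XmlDocument();
elementDoc.LoadXml(asElement);
// Here the code is the text inside the nested <code> element.
string codeFromElement = elementDoc.DocumentElement["code"].InnerText;

XmlDocument attributeDoc = new XmlDocument();
attributeDoc.LoadXml(asAttribute);
// Here the code is an attribute on the <book> tag itself.
string codeFromAttribute = attributeDoc.DocumentElement.GetAttribute("code");

Console.WriteLine(codeFromElement);
Console.WriteLine(codeFromAttribute);
```

Both reads yield the same string; only the location of the data inside the document differs.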

JSON Basics

Another data language you may encounter when developing C# applications is JSON. JSON stands for JavaScript Object Notation. Like XML, it is also a standard (www.json.org), though as you can tell from the name it is derived from the JavaScript language rather than C#. While not used throughout .NET like XML, it is a common format for transferring data from web services and web browsers.

JSON also has a very simple format. The same book data we showed previously in XML is presented here in JSON:

{"book":[{"title":"Beginning Visual C# 2015",
          "author":"Benjamin Perkins et al",
          "code":"096689"}]}

As with the previous XML example, we see the same book with title, author, and a unique code. JSON uses curly braces ({}) to delimit blocks of data and square brackets ([]) to delimit arrays similar to the way C#, JavaScript, and other C-like languages use curly braces for blocks of code and square brackets for arrays.

JSON is a more compact format than XML, but it is much harder for humans to read, especially as the curly braces and brackets become deeply nested in complex data.

XML Schemas

An XML document may be described by a schema, which is another XML file describing what elements and attributes are allowed in a particular document. You can validate an XML document against a schema, ensuring that your program doesn't encounter data it isn't prepared to handle. The standard schema XML format used with C# is XSD (for XML Schema Definition).
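
As a rough illustration of validation, the sketch below checks two small documents against an inline XSD. The schema itself, the CountValidationErrors helper, and the sample data are assumptions made up for this example; only the System.Xml and System.Xml.Schema calls are standard .NET.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical schema: a <book> must contain title, author, and code, in that order.
string xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='book'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='title' type='xs:string'/>
        <xs:element name='author' type='xs:string'/>
        <xs:element name='code' type='xs:string'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

// Hypothetical helper: counts the validation problems reported for one XML string.
int CountValidationErrors(string xml)
{
    XmlDocument document = new XmlDocument();
    document.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
    document.LoadXml(xml);

    int errors = 0;
    // The handler runs once for every problem the validator reports.
    document.Validate((sender, e) => errors++);
    return errors;
}

int validErrors = CountValidationErrors(
    "<book><title>Beginning XML</title><author>Joe Fawcett et al</author><code>162132</code></book>");
int invalidErrors = CountValidationErrors(
    "<book><title>Beginning XML</title></book>");   // author and code are missing

Console.WriteLine(validErrors);
Console.WriteLine(invalidErrors);
```

The first document satisfies the schema, whereas the second is reported as invalid because required elements are missing.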

Figure 19.1 includes a long list of schemas recognized by Visual Studio, but Visual Studio will not automatically remember schemas you've used. If you are using a schema repeatedly and don't want to browse for it every time you need it, you can copy it to the following location: C:\Program Files\Microsoft Visual Studio 14.0\Xml\Schemas. Any schema copied to that location will show up in the XML Schemas dialog box.

Figure 19.1

XML Document Object Model

The XML Document Object Model (XML DOM) is a set of classes used to access and manipulate XML in a very intuitive way. The DOM is perhaps not the quickest way to read XML data, but as soon as you understand the relationship between the classes and the elements of an XML document, you will find it very easy to use.

The classes that make up the DOM can be found in the namespace System.Xml. There are several classes and namespaces in this namespace, but this chapter focuses on only a few of the classes that enable you to easily manipulate XML. These classes are described in Table 19.1.

Table 19.1 Common DOM Classes

Class Description
XmlNode Represents a single node in a document tree. It is the base of many of the classes shown in this chapter. If this node represents the root of an XML document, you can navigate to any position in the document from it.
XmlDocument Extends the XmlNode class, but is often the first object you use when using XML. That's because this class is used to load and save data from disk or elsewhere.
XmlElement Represents a single element in the XML document. XmlElement is derived from XmlLinkedNode, which in turn is derived from XmlNode.
XmlAttribute Represents a single attribute. Like the XmlDocument class, it is derived from the XmlNode class.
XmlText Represents the text between a starting tag and a closing tag.
XmlComment Represents a special kind of node that is not regarded as part of the document other than to provide information to the reader about parts of the document.
XmlNodeList Represents a collection of nodes.

The XmlDocument Class

Usually, the first thing your application will want to do with XML is read it from disk. As described in Table 19.1, this is the domain of the XmlDocument class. You can think of the XmlDocument as an in-memory representation of the file on disk. Once you have used the XmlDocument class to load a file into memory, you can obtain the root node of the document from it and start reading and manipulating the XML:

using System.Xml;
.
.
.
XmlDocument document = new XmlDocument();
document.Load(@"C:\BegVCSharp\Chapter19\XML and Schema\books.xml");

The two lines of code create a new instance of the XmlDocument class and load the file books.xml into it.

Remember that the XmlDocument class is located in the System.Xml namespace, and you should insert a using System.Xml; in the using section at the beginning of the code.

In addition to loading and saving the XML, the XmlDocument class is also responsible for maintaining the XML structure itself. Therefore, you will find numerous methods on this class that are used to create, alter, and delete nodes in the tree. You will look at some of those methods shortly, but to present the methods properly, you need to know a bit more about another class: XmlElement.
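
If you want to experiment without a file on disk, a minimal sketch along the following lines loads XML from a string with LoadXml and writes it back out with Save. The XML content here is an illustrative stand-in for books.xml, not the file from the download code.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.IO;
using System.Xml;

// Illustrative XML standing in for the contents of books.xml.
string xml = "<books><book><title>Beginning Visual C# 2015</title></book></books>";

XmlDocument document = new XmlDocument();
document.LoadXml(xml);                  // parse from a string instead of a file path

// Save accepts a file name, a Stream, or a TextWriter; a StringWriter keeps it in memory.
StringWriter writer = new StringWriter();
document.Save(writer);
string saved = writer.ToString();

Console.WriteLine(saved);
```

Load and LoadXml fill the same in-memory tree; they differ only in where the XML text comes from.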

The XmlElement Class

Now that the document has been loaded into memory, you want to do something with it. The DocumentElement property of the XmlDocument instance you created in the preceding code returns an instance of an XmlElement that represents the root element of the XmlDocument. This element is important because it gives you access to every bit of information in the document:

XmlDocument document = new XmlDocument();
document.Load(@"C:\BegVCSharp\Chapter19\XML and Schema\books.xml");
XmlElement element = document.DocumentElement;

After you have the root element of the document, you are ready to use the information. The XmlElement class contains methods and properties for manipulating the nodes and attributes of the tree. Let's examine the properties for navigating the XML elements first, shown in Table 19.2.

Table 19.2 XmlElement Properties

Property Description
FirstChild Returns the first child node of the current node. If you recall the books.xml file from earlier in the chapter, the root node of the document was called “books” and the next node after that was “book.” In that document, then, the first child of the root node “books” is “book.”
<books>      Root node
  <book>     FirstChild
FirstChild returns an XmlNode object, and you should test for the type of the returned node because it will not always be an XmlElement instance. In the books example, the child of the title element is, in fact, an XmlText node that represents the text Beginning Visual C# 2015.
LastChild Operates exactly like the FirstChild property except that it returns the last child of the current node. In the case of the books example, the last child of the “books” node will still be a “book” node, but it will be the node representing the “Beginning XML” book.
<books>                                        Root node
  <book>                                       FirstChild
    <title>Beginning Visual C# 2015</title>
    <author>Benjamin Perkins et al</author>
    <code>096689</code>
  </book>
  <book>                                       LastChild
    <title>Beginning XML</title>
    <author>Joe Fawcett et al</author>
    <code>162132</code>
  </book>
</books>
ParentNode Returns the parent of the current node. In the books example, the “books” node is the parent of both of the “book” nodes.
NextSibling Where the FirstChild and LastChild properties return child nodes of the current node, the NextSibling property returns the next node that has the same parent node. In the case of the books example, that means getting the NextSibling of the title element will return the author element, and getting NextSibling on that will return the code element.
HasChildNodes Enables you to check whether the current node has child nodes without actually getting the value from FirstChild and comparing it against null.

Using the five properties from Table 19.2, it is possible to run through an entire XmlDocument, as shown in the following Try It Out.
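
A minimal sketch of such a traversal, assuming the two-book document shown in Table 19.2, might look like the following. The Walk helper and its indentation scheme are hypothetical and are not the FormatText method from the chapter's download code.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.Text;
using System.Xml;

string xml = @"<books>
  <book><title>Beginning Visual C# 2015</title><author>Benjamin Perkins et al</author><code>096689</code></book>
  <book><title>Beginning XML</title><author>Joe Fawcett et al</author><code>162132</code></book>
</books>";

XmlDocument document = new XmlDocument();
document.LoadXml(xml);

StringBuilder output = new StringBuilder();

// Hypothetical helper: records a node, then visits each of its children in document order.
void Walk(XmlNode node, int depth)
{
    string indent = new string(' ', depth * 2);
    if (node is XmlText)
        output.AppendLine(indent + node.Value);            // text nodes hold the actual value
    else if (node is XmlElement)
        output.AppendLine(indent + "<" + node.Name + ">");

    if (!node.HasChildNodes) return;
    for (XmlNode child = node.FirstChild; child != null; child = child.NextSibling)
        Walk(child, depth + 1);
}

Walk(document.DocumentElement, 0);
string result = output.ToString();
Console.Write(result);
```

The loop starts at FirstChild and follows NextSibling until it returns null, which is enough to reach every node below the root.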

Changing the Values of Nodes

Before you examine how to change the value of a node, it is important to realize that very rarely is the value of a node a simple thing. In fact, you will find that although all of the classes that derive from XmlNode include a property called Value, it very rarely returns anything useful to you. Although this can feel like a bit of a letdown at first, you'll find it is actually quite logical. Examine the books example from earlier:

<books>
  <book>
    <title>Beginning Visual C# 2015</title>
    <author>Benjamin Perkins et al</author>
    <code>096689</code>
  </book>
</books>

Every single tag pair in the document resolves into a node in the DOM. Remember that when you looped through all the nodes in the document, you encountered a number of XmlElement nodes and three XmlText nodes. The XmlElement nodes in this XML are <books>, <book>, <title>, <author>, and <code>. The XmlText nodes are the text between the starting and closing tags of title, author, and code. Although it could be argued that the value of title, author, and code is the text between the tags, that text is itself a node; and it is that node that actually holds the value. The other tags clearly have no value associated with them other than other nodes.

The following line is in the if block near the top of the code in the earlier FormatText method. It executes when the current node is an XmlText node.

text += node.Value;

You can see that the Value property of the XmlText node instance is used to get the value of the node.

Nodes of the type XmlElement return null if you use their Value property, but it is possible to get the information between the starting and closing tags of an XmlElement if you use one of two other properties: InnerText and InnerXml. That means you are able to manipulate the value of nodes using three properties, as described in Table 19.3.

Table 19.3 Three Ways to Get the Value of a Node

Property Description
InnerText Gets the text of all the child nodes of the current node and returns it as a single concatenated string. This means if you get the value of InnerText from the book node in the preceding XML, the string Beginning Visual C# 2015Benjamin Perkins et al096689 is returned. If you get the InnerText of the title node, only "Beginning Visual C# 2015" is returned. You can set the text using this property, but be careful if you do so because setting InnerText replaces all the child nodes of the node, so setting it on the wrong node may overwrite information you did not want to change.
InnerXml Returns the text like InnerText, but it also returns all of the tags. Therefore, if you get the value of InnerXml on the book node, the result is the following string:
<title>Beginning Visual C# 2015</title><author>Benjamin Perkins et al</author><code>096689</code>
As you can see, this can be quite useful if you have a string containing XML that you want to inject directly into your XML document. However, you are entirely responsible for the string yourself, and if you insert badly formed XML, the application will generate an exception.
Value The “cleanest” way to manipulate information in the document, but as mentioned earlier, only a few of the classes actually return anything useful when you get the value. The classes that will return the desired text are as follows:
XmlText
XmlComment
XmlAttribute
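
The three properties in Table 19.3 can be compared side by side with a short sketch such as this one. It uses the single-book XML from this section; the replacement title text is a made-up value for illustration.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.Xml;

XmlDocument document = new XmlDocument();
document.LoadXml("<book><title>Beginning Visual C# 2015</title>" +
                 "<author>Benjamin Perkins et al</author><code>096689</code></book>");

XmlElement book = document.DocumentElement;
XmlNode title = book.FirstChild;             // the <title> element
XmlNode titleTextNode = title.FirstChild;    // the XmlText node inside it

string bookInnerText = book.InnerText;       // all descendant text, concatenated
string bookInnerXml = book.InnerXml;         // the same content with its tags
string elementValue = title.Value;           // null, because elements have no Value
string textValue = titleTextNode.Value;      // the text itself

// Setting Value on the text node changes only that piece of text (illustrative value).
titleTextNode.Value = "Beginning Visual C# 2015 (updated)";
string titleAfterEdit = title.InnerText;

Console.WriteLine(bookInnerText);
Console.WriteLine(bookInnerXml);
Console.WriteLine(elementValue == null);
Console.WriteLine(textValue);
Console.WriteLine(titleAfterEdit);
```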

Inserting New Nodes

Now that you've seen that you can move around in the XML document and even get the values of the elements, let's examine how to change the structure of the document by adding nodes to the books document you've been using.

To insert new elements in the list, you need to examine the new methods that are placed on the XmlDocument and XmlNode classes, shown in Table 19.4. The XmlDocument class has methods that enable you to create new XmlNode and XmlElement instances, which is nice because both of these classes have only a protected constructor, which means you cannot create an instance of either directly with new.

Table 19.4 Methods for Creating Nodes

Method Description
CreateNode Creates any kind of node. There are three overloads of the method, two of which enable you to create nodes of the type found in the XmlNodeType enumeration and one that enables you to specify the type of node to use as a string. Unless you are quite sure about specifying a node type other than those in the enumeration, use the two overloads that use the enumeration. The method returns an instance of XmlNode that can then be cast to the appropriate type explicitly.
CreateElement A version of CreateNode that creates only nodes of the XmlElement variety.
CreateAttribute A version of CreateNode that creates only nodes of the XmlAttribute variety.
CreateTextNode Creates nodes of the type XmlText.
CreateComment This method is included here to highlight the diversity of node types that can be created. This method doesn't create a node that is actually part of the data represented by the XML document, but rather is a comment meant for any human eyes that might have to read the data. You can pick up comments when reading the document in your applications as well.

The methods in Table 19.4 are all used to create the nodes themselves, but after calling any of them you have to do something with them before they become interesting. Immediately after creation, the nodes contain no additional information, and they are not yet inserted into the document. To do either, you should use methods that are found on any class derived from XmlNode (including XmlDocument and XmlElement), described in Table 19.5.

Table 19.5 Methods for Inserting Nodes

Method Description
AppendChild Appends a child node to a node of type XmlNode or a derived type. Remember that the node you append appears at the bottom of the list of children of the node on which the method is called. If you don't care about the order of the children, there's no problem; if you do care, remember to append the nodes in the correct sequence.
InsertAfter Controls exactly where you want to insert the new node. The method takes two parameters — the first is the new node and the second is the node after which the new node should be inserted.
InsertBefore Works exactly like InsertAfter, except that the new node is inserted before the node you supply as a reference.

In the following Try It Out, you build on the previous example and insert a book node in the books.xml document. There is no code in the example to clean up the document (yet), so if you run it several times you will probably end up with a lot of identical nodes.
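
Independently of that example, the create-then-insert pattern from Tables 19.4 and 19.5 can be sketched in a few lines. The starting document and the comment text are assumptions chosen for illustration.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.Xml;

XmlDocument document = new XmlDocument();
document.LoadXml("<books><book><title>Beginning Visual C# 2015</title></book></books>");
XmlElement root = document.DocumentElement;

// Create the nodes; they belong to the document but are not yet part of the tree.
XmlElement newBook = document.CreateElement("book");
XmlElement newTitle = document.CreateElement("title");
XmlText titleText = document.CreateTextNode("Beginning XML");
XmlComment note = document.CreateComment("Added by the insert sketch");

// Build the new subtree from the inside out, then attach it to the root.
newTitle.AppendChild(titleText);
newBook.AppendChild(newTitle);
root.AppendChild(newBook);           // becomes the last child of <books>
root.InsertBefore(note, newBook);    // the comment lands just before the new book

int childCount = root.ChildNodes.Count;
string lastTitle = root.LastChild.InnerText;

Console.WriteLine(childCount);
Console.WriteLine(root.OuterXml);
```

Because AppendChild always adds at the end, the comment is positioned with InsertBefore rather than by appending it first.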

Deleting Nodes

Now that you've seen how to create new nodes, all that is left is to learn how to delete them again. All classes derived from XmlNode include two methods, shown in Table 19.6, that enable you to remove nodes from the document.

Table 19.6 Methods for Removing Nodes

Method Description
RemoveAll Removes all child nodes in the node on which it is called. What is slightly less obvious is that it also removes all attributes on the node because they are regarded as child nodes as well.
RemoveChild Removes a single child in the node on which it is called. The method returns the node that has been removed from the document, so you can reinsert it if you change your mind.

The following short Try It Out extends the application you've been creating over the past two examples to include the capability to delete nodes. For now, it finds only the last instance of the book node and removes it.
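
As a standalone illustration of the two methods in Table 19.6, the following sketch removes the last book, puts it back, and then clears the root. The two-book document is an assumed stand-in for books.xml.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.Xml;

XmlDocument document = new XmlDocument();
document.LoadXml("<books><book><title>Beginning Visual C# 2015</title></book>" +
                 "<book><title>Beginning XML</title></book></books>");
XmlElement root = document.DocumentElement;

// Remove the last book; RemoveChild hands back the detached node.
XmlNode removed = root.RemoveChild(root.LastChild);
int countAfterRemove = root.ChildNodes.Count;

// The detached node is still usable, so it can be reinserted.
root.AppendChild(removed);
int countAfterReinsert = root.ChildNodes.Count;

// RemoveAll clears every child (and attribute) of the node on which it is called.
root.RemoveAll();
int countAfterRemoveAll = root.ChildNodes.Count;

Console.WriteLine(removed.InnerText);
Console.WriteLine(countAfterRemove);
Console.WriteLine(countAfterReinsert);
Console.WriteLine(countAfterRemoveAll);
```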

Selecting Nodes

You now know how to move back and forth in an XML document, how to manipulate the values of the document, how to create new nodes, and how to delete them again. Only one thing remains in this section: how to select nodes without having to traverse the entire tree.

The XmlNode class includes two methods, described in Table 19.7, commonly used to select nodes from the document without running through every node in it: SelectSingleNode and SelectNodes, both of which use a special query language, called XPath, to select the nodes. You learn about that shortly.

Table 19.7 Methods for Selecting Nodes

Method Description
SelectSingleNode Selects a single node. If you create a query that fetches more than one node, only the first node will be returned.
SelectNodes Returns a node collection in the form of an XmlNodeList class.
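
The difference between the two methods shows up as soon as a query matches more than one node. The sketch below runs the same query through both; the document and the query book/title are illustrative assumptions.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.Xml;

XmlDocument document = new XmlDocument();
document.LoadXml("<books><book><title>Beginning Visual C# 2015</title></book>" +
                 "<book><title>Beginning XML</title></book></books>");
XmlElement root = document.DocumentElement;

// Both calls use the same query, which matches the title of every book.
XmlNode firstTitle = root.SelectSingleNode("book/title");   // first match only
XmlNodeList allTitles = root.SelectNodes("book/title");     // every match

Console.WriteLine(firstTitle.InnerText);
foreach (XmlNode title in allTitles)
    Console.WriteLine(title.InnerText);
```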

Converting XML to JSON

We mentioned the JSON data language in the introduction to this chapter. There is limited support for JSON in the C# system libraries, but you can use a free third-party JSON library to work with JSON to convert XML to JSON and vice versa, and to do other manipulations with JSON similar to the .NET classes for XML. One such library available via the NuGet Package Manager in Visual Studio is the Newtonsoft JSON.NET package. Help and a full tutorial for this package are available at www.json.net.

The following short Try It Out extends the application you've been creating over the previous examples in the chapter to include the capability to convert XML to JSON.

Searching XML with XPath

XPath is a query language for XML documents, much as SQL is for relational databases. It is used by the two methods described in Table 19.7 that enable you to avoid the hassle of walking the entire tree of an XML document. It does take a little getting used to, however, because the syntax is nothing like SQL or C#.

To properly see XPath in action, you are going to use an XML file called Elements.xml, which contains a partial list of the chemical elements of the periodic table. You will find a subset of that XML listed in the “Selecting Nodes” Try It Out example later in the chapter, and it can be found in the download code for this chapter on this book's website as Elements.xml.

Table 19.8 lists some of the most common operations you can perform with XPath. If nothing else is stated, the XPath query example makes a selection that is relative to the node on which it is performed. Where it is necessary to have a node name, you can assume the current node is the <element> node in the XML document.

Table 19.8 Common XPath Operations

Purpose XPath Query Example
Select the current node. .
Select the parent of the current node. ..
Select all child nodes of the current node. *
Select all child nodes with a specific name, in this case title (names are case-sensitive). title
Select an attribute of the current node. @Type
Select all attributes of the current node. @*
Select a child node by index — in this case, the second element node. element[2]
Select all the text nodes of the current node. text()
Select one or more grandchildren of the current node. element/text()
Select all nodes in the document with a particular name — in this case, all mass nodes. //mass
Select all nodes in the document with a particular name and a particular parent name — in this case, the parent name is element and the node name is name. //element/name
Select a node where a value criterion is met — in this case, the element for which the name of the element is Hydrogen. //element[name='Hydrogen']
Select a node where an attribute value criterion is met — in this case, the Type attribute is Noble Gas. //element[@Type='Noble Gas']

In the following Try It Out, you'll create a small application that enables you to execute and see the results of a number of predefined queries, as well as enter your own queries.
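
Several of the queries from Table 19.8 can also be tried in a few lines of code. The XML below is a hypothetical three-element stand-in for Elements.xml: its structure is inferred from the queries in this chapter, and the Type value given to Hydrogen is an assumption, so the real file may differ.

```csharp
// Top-level statements (C# 9 or later); in an older project, place this code inside Main.
using System;
using System.Xml;

// Hypothetical sample; the real Elements.xml may use different values and more nodes.
string xml = @"<elements>
  <element Type='Nonmetal'><name>Hydrogen</name><specification><mass>1.00794</mass></specification></element>
  <element Type='Noble Gas'><name>Helium</name><specification><mass>4.002602</mass></specification></element>
  <element Type='Noble Gas'><name>Neon</name><specification><mass>20.1797</mass></specification></element>
</elements>";

XmlDocument document = new XmlDocument();
document.LoadXml(xml);
XmlElement root = document.DocumentElement;   // the <elements> node

// Hypothetical helper: number of nodes a query selects, relative to the root.
int Count(string xpath) => root.SelectNodes(xpath).Count;

int nobleGases = Count("//element[@Type='Noble Gas']");    // attribute criterion
int hydrogen = Count("//element[name='Hydrogen']");        // child value criterion
int masses = Count("//mass");                              // every mass node in the document
string secondName = root.SelectSingleNode("element[2]/name").InnerText;   // XPath indexes start at 1

Console.WriteLine(nobleGases);
Console.WriteLine(hydrogen);
Console.WriteLine(masses);
Console.WriteLine(secondName);
```

Note that element[2] selects the second element child because XPath positions are one-based, unlike C# array indexes.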


EXERCISES

  19.1 Change the Insert example in the “Creating Nodes” Try It Out section to insert an attribute called Pages with the value 1000+ on the book node.

  19.2 Determine the outcome of the following XPath queries and then verify your results by typing the queries into the XPathQuery application from the “Selecting Nodes” Try It Out. Remember that all of your queries are being executed on the DocumentElement, which is the elements node.

    //elements
    element
    element[@Type='Noble Gas']
    //mass
    //mass/..
    element/specification[mass='20.1797']
    element/name[text()='Neon']

  19.3 On many Windows systems the default viewer of XML is a web browser. If you are using Internet Explorer you will see a nicely formatted view of the XML when you load the Elements.xml file into it. Why would it not be ideal to display the XML from our queries in a browser control instead of a text box?

  19.4 Add a button that uses the Newtonsoft library to convert JSON to XML as well (the reverse of the example shown in the chapter).

    Answers to the exercises can be found in Appendix A.


What You Learned in This Chapter

Topic Key Concepts
XML basics XML documents are created from an XML declaration, XML namespaces, XML elements, and attributes. The XML declaration defines the XML version. XML namespaces are used to define vocabularies and XML elements and attributes are used to define the XML document content.
JSON basics JSON is a data language used when transferring data between JavaScript and web services. JSON is more compact than XML but harder to read.
XML schema XML schemas are used to define the structure of XML documents. Schemas are especially useful when you need to exchange information with third parties. By agreeing on a schema for the data that is exchanged, you and the third party will be able to check that the documents are valid.
XML DOM The Document Object Model (XML DOM) is the basis for .NET Framework classes provided for creating and manipulating XML.
JSON packages You can use a JSON package such as Newtonsoft to convert XML to JSON and vice versa, and do other manipulations with JSON similar to the .NET classes for XML.
XPath XPath is one of the possible ways to query data in XML documents. To use XPath, you must be familiar with the structure of the XML document in order to be able to select individual elements from it. Although XPath can be used on any well-formed XML document, the fact that you must know the structure of the document when you create the query means that ensuring that the document is valid also ensures that the query will work from document to document, as long as the documents are valid against the same schema.