CDATA Sections

An XML document may contain information expressed in a non-XML syntax. The XML mechanism for indicating that such content is not to be parsed as XML is the CDATA section.

The starting delimiter of a CDATA section is the character sequence <![CDATA[. The ending delimiter is the character sequence ]]>. Because the first occurrence of ]]> always ends the section, that sequence cannot appear anywhere within the content of a CDATA section.

CDATA sections can be used to store XML code, such as code snippets used in this book, without having to escape all characters that an XML processor would recognize as markup. For example, if the text of this chapter were written and stored in XML, you would not want example code to be parsed. Thus, if you wanted to create in XML the text for a section of this book that referred to example text expressed as XML, you could write something like this:

<example> 
<![CDATA[ 
<book> 
<title>Sams Teach Yourself XML in 10 Minutes</title> 
<author>Andrew Watt</author> 
</book> 
]]> 
</example> 

If you didn’t use a CDATA section, you would have to write this:

<example> 
&lt;book&gt; 
&lt;title&gt;Sams Teach Yourself XML in 10 Minutes&lt;/title&gt; 
&lt;author&gt;Andrew Watt&lt;/author&gt; 
&lt;/book&gt; 
</example> 

It very quickly becomes tedious having to write &lt; for each < character and &gt; for each > character in a section of example code. The CDATA section is more convenient.
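As a rough check that the two forms carry the same content, the following sketch parses a shortened version of each listing with Python's standard xml.etree.ElementTree module. The shortened documents and the variable names are illustrative assumptions, not part of the original example.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical versions of the two listings above.
with_cdata = "<example><![CDATA[<book><title>T</title></book>]]></example>"
escaped = (
    "<example>&lt;book&gt;&lt;title&gt;T&lt;/title&gt;"
    "&lt;/book&gt;</example>"
)

cdata_root = ET.fromstring(with_cdata)
escaped_root = ET.fromstring(escaped)

# Both forms deliver the same character data to the application.
print(cdata_root.text == escaped_root.text)

# The <book> text was not parsed as markup, so <example> has no child elements.
print(len(cdata_root), len(escaped_root))
```

In both cases the parser hands the application the literal text of the book element, so the choice between the two forms is a matter of authoring convenience.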

One common use of CDATA sections appears in Scalable Vector Graphics (SVG) documents. SVG is an XML application language for two-dimensional graphics, and an SVG document can contain scripting code written, for example, in JavaScript. The general structure using the SVG script element would look as follows:

<script type="text/javascript" > 
<![CDATA[ 

//JavaScript code goes here 
]]> 
</script> 
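To see why the CDATA section matters here, consider script code that contains < and & characters. The sketch below, which uses Python's standard xml.etree.ElementTree module, parses a small SVG document and reads the script text back unchanged. The JavaScript line and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical SVG document; the script line is illustrative only.
svg_source = """<svg xmlns="http://www.w3.org/2000/svg">
<script type="text/javascript">
<![CDATA[
if (a < b && b > 0) { total = a + b; }
]]>
</script>
</svg>"""

root = ET.fromstring(svg_source)
script = root.find("{%s}script" % SVG_NS)

# The < and && inside the CDATA section reach the application as plain text.
js_code = script.text.strip()
print(js_code)
```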

CDATA sections cannot be nested within each other. Inside a CDATA section, the character sequence <![CDATA[ is treated only as a sequence of characters, not as the starting delimiter of a nested CDATA section. Only the ending delimiter character sequence, ]]>, is recognized as markup within a CDATA section.
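The behavior can be observed with a small experiment. This sketch assumes Python's standard xml.etree.ElementTree module, and the document string is a hypothetical example: the second <![CDATA[ comes back as ordinary text, and the first ]]> closes the section.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a second <![CDATA[ inside a CDATA section.
doc = "<example><![CDATA[outer <![CDATA[ inner ]]></example>"

text = ET.fromstring(doc).text

# The inner <![CDATA[ is part of the character data, not a nested section.
print(repr(text))
```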

Text Content

Text, which is basically a sequence of characters, may occur between the start tag and end tag of an element; it is said to be (or to form part of) the element’s content.

Most English alphabetic or numeric characters can simply be typed as normal. Certain characters must not be used in text content, however. The following simple description of an arithmetic axiom in XML generates an error in an XML processor:

<axiom> 1 < 2 </axiom> 

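An XML processor recognizes the less than sign between 1 and 2 as the starting angle bracket of a new tag. An error results because the next character is a space, which is not allowed to start an XML name (nor is the digit that follows it). XML predefines escapes for the following five characters. In text content, < and & must always be escaped; the other three may be escaped and must be in the situations noted:

  • < (The less than symbol): Must be written as &lt;

  • > (The greater than symbol): May be written as &gt; and must be when it directly follows ]] in text content

  • ' (The single quotation mark): May be written as &apos; and must be inside an attribute value delimited by single quotation marks

  • " (The double quotation mark): May be written as &quot; and must be inside an attribute value delimited by double quotation marks

  • & (The ampersand): Must be written as &amp;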
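The failure and its fix can be sketched with Python's standard library. The example assumes xml.etree.ElementTree for parsing and the escape helper from xml.sax.saxutils, which replaces &, < and > with their escaped forms. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = " 1 < 2 "

# The unescaped less than sign makes the document ill-formed.
try:
    ET.fromstring("<axiom>" + raw_text + "</axiom>")
    unescaped_parses = True
except ET.ParseError:
    unescaped_parses = False

# escape() rewrites the less than sign as &lt; before the text is embedded.
safe_text = escape(raw_text)
axiom = ET.fromstring("<axiom>" + safe_text + "</axiom>")

print(unescaped_parses)
print(safe_text)
print(axiom.text == raw_text)
```

After parsing, the application sees the original text again, because the processor converts &lt; back into a less than sign.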

The alternative is to write these characters literally (that is, not escaped) within a CDATA section. The choice of whether to escape characters or to enclose them inside a CDATA section often depends on how many characters in a particular section of text require escaping. The more characters that need escaping, the more likely it is that a CDATA section is the more convenient solution.
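To compare the two options side by side, the following sketch serializes the same text once as an escaped text node and once as a CDATA section, using Python's standard xml.dom.minidom module. The snippet string and the element name are assumptions chosen for illustration.

```python
from xml.dom.minidom import Document

# Hypothetical text with several characters that need escaping.
snippet = "if (a < b && b > c) { swap(a, c); }"

doc = Document()

# Option 1: an ordinary text node; the serializer escapes the markup characters.
escaped_el = doc.createElement("example")
escaped_el.appendChild(doc.createTextNode(snippet))

# Option 2: a CDATA section; the characters are written literally.
cdata_el = doc.createElement("example")
cdata_el.appendChild(doc.createCDATASection(snippet))

escaped_xml = escaped_el.toxml()
cdata_xml = cdata_el.toxml()

print(escaped_xml)
print(cdata_xml)
```

The CDATA form stays readable as the number of special characters grows, while the escaped form gains one entity reference for each of them.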
