What Is an Entity?

An entity is an expression of the physical, rather than logical, structure of an XML document: it is a physical data object, a unit of storage. When your XML documents are short and simple, you will seldom use entities other than the built-in ones. As you create longer and more complex XML documents, the usefulness of entities becomes more apparent.

One situation in which entities are sometimes used in relatively short documents is in Scalable Vector Graphics (SVG), an XML application language that you will meet in Chapter 15, “Presenting XML Graphically—SVG.” In SVG, for example, an entity can be used to define a particular style. If you had an entity called BlackAndRed, you could declare it like this:

<!ENTITY BlackAndRed "fill:black;stroke:red"> 

Then you could reuse the entity many times in attribute values in the document, like this:

<rect style="&BlackAndRed;" .... /> 
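Put together, a complete SVG document using this entity might look like the following sketch. The declaration goes in the internal subset of the DTD. The shapes, their coordinates, and the document size are assumptions made up for illustration and are not taken from Chapter 15.

```xml
<?xml version="1.0"?>
<!DOCTYPE svg [
  <!-- Declare the style once in the internal DTD subset. -->
  <!ENTITY BlackAndRed "fill:black;stroke:red">
]>
<svg xmlns="http://www.w3.org/2000/svg" width="220" height="100">
  <!-- Both shapes pick up the same style through the entity reference. -->
  <rect style="&BlackAndRed;" x="10" y="10" width="80" height="80"/>
  <circle style="&BlackAndRed;" cx="160" cy="50" r="40"/>
</svg>
```

Changing the replacement text in the one declaration would restyle every shape that references the entity.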

Even when creating the simplest XML documents, you are using at least one entity, although you might not be aware of it. Each XML document has at least one physical entity: the document entity.

An XML document can be viewed as being contained within the document entity. The document entity is not expressed within the syntax of an XML document; instead, it is the container for the syntax that makes up the document.

Note

Most XML entities have a name, which is used to reference the entity. The exceptions are the document entity and the external subset of the DTD; these have no name, although both have filenames.



For example, the description of this book used in earlier examples has one logical structure but could be expressed by either of the physical structures shown in the following examples. The simplest is a single document entity, with the whole description contained in one XML file, as shown in Listing 5.1.

Listing 5.1. SingleEntity.xml: A Description of This Book in a Single Document Entity
<?xml version="1.0" ?> 
<book> 
<title>Sams Teach Yourself XML in 10 Minutes</title> 
<author>Andrew Watt</author> 
<publisher>Sams Publishing</publisher> 
</book> 

Alternatively, the same logical structure could be expressed in several different physical structures. One possibility is to use an external parsed entity to express title information, as in Listing 5.2. For a document as simple as this, there is little practical point in splitting it this way, but the example serves to illustrate the principle.

Listing 5.2. SplitEntities.xml: A Description of the Book with an External Parsed Entity
<?xml version="1.0" ?> 
<!DOCTYPE book [ 
<!ENTITY bookTitle SYSTEM "title.xml"> 
]> 
<book> 
<title>&bookTitle;</title> 
<author>Andrew Watt</author> 
<publisher>Sams Publishing</publisher> 
</book> 

The file title.xml specified in the entity declaration is shown in Listing 5.3. In a typical external parsed entity in real-life use, the content would be much more extensive.

Listing 5.3. title.xml: A Brief External Entity Referenced in Listing 5.2
Sams Teach Yourself XML in 10 Minutes 

In Listing 5.2, the XML processor uses the entity reference &bookTitle; together with the corresponding entity declaration, repeated here, to find the file title.xml and insert its content between the start tag and end tag of the title element:

<!ENTITY bookTitle SYSTEM "title.xml"> 

So, after the external parsed entity has been retrieved, the title element is processed as if it read as follows:

<title>Sams Teach Yourself XML in 10 Minutes</title> 

Caution

If the parsed entity is declared in the external subset of the DTD, some nonvalidating XML parsers might not read that subset. In that case, the parser never sees the entity declaration and cannot resolve references to the entity.



In lengthy, complex XML documents, it can be very useful to split documents into entities. One use of external entities is to centralize frequently referenced information used by multiple files: a change made in an external parsed entity is reflected at each place where the entity reference occurs, both in the XML document and in other documents that reference the same external parsed entity.

You might structure financial results using separate XML files for each quarter’s figures. For example, the sales figures for Quarter 1 2003 might be represented as follows:

<Q12003> 
 <TotalSales>$74,300,000</TotalSales> 
 <GrossProfit>$2,900,000</GrossProfit> 
 <NetProfit>$1,500,000</NetProfit> 
</Q12003> 

If you stored that content in a file named Q12003.xml, you could reference that data in several places after declaring an entity:

<!ENTITY Q12003Sales SYSTEM "Q12003.xml"> 

It makes sense to store the data once rather than risk keeping several copies that become inconsistent. If the figures are needed in several places, such as department reports and company reports, each report can reference the single stored copy.
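As a rough sketch, a report that pulls in the quarterly figures might look like the following. The CompanyReport, Summary, and Appendix element names are hypothetical. Only the Q12003Sales entity and the Q12003.xml file come from the text above.

```xml
<?xml version="1.0"?>
<!DOCTYPE CompanyReport [
  <!-- One declaration; the figures themselves live only in Q12003.xml. -->
  <!ENTITY Q12003Sales SYSTEM "Q12003.xml">
]>
<CompanyReport>
  <!-- Each reference is replaced by the content of Q12003.xml. -->
  <Summary>&Q12003Sales;</Summary>
  <Appendix>&Q12003Sales;</Appendix>
</CompanyReport>
```

A department report could carry the same declaration and reference, so a correction made in Q12003.xml would reach both reports.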

Entities and Entity References

An entity is a data object, and an entity reference points to one. The entity referenced may be either a parsed entity or a parameter entity, and the syntax for referencing these two types of entities differs.

Note

A parsed entity can be internal, with its content given directly in the entity declaration, or external, with its content held in a file (entity) physically separate from the document entity. A parameter entity is declared and referenced within the DTD, in either the internal or the external subset.



Parsed entities are referenced by an initial & character followed immediately by the entity’s name and a semicolon. If you had declared an internal parsed entity called myEntity

<!ENTITY myEntity "This is my own entity"> 

you would reference it as follows:

&myEntity; 

You can, of course, choose any name that makes sense in your context.
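Put together in one file, the declaration and the reference might look like this sketch. The note element is a hypothetical name chosen only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- Internal parsed entity: the replacement text is given in the declaration. -->
  <!ENTITY myEntity "This is my own entity">
]>
<note>&myEntity;</note>
```

A parser that expands the entity treats the note element as if it contained the text This is my own entity.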

Parameter entities, which are used only in the DTD, use a different syntax, both in declaration

<!ENTITY % myParameterEntity "Class"> 

and in references to them:

%myParameterEntity; 
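As a minimal sketch of how the two fit together, the following hypothetical external DTD subset declares myParameterEntity and references it inside element declarations. The Course element and the file name course.dtd are assumptions made for illustration. The references here sit inside other declarations, which XML 1.0 permits in the external subset but not in the internal subset.

```xml
<!-- course.dtd: hypothetical external DTD subset -->
<!ENTITY % myParameterEntity "Class">

<!-- After replacement, Course consists of one or more Class elements. -->
<!ELEMENT Course (%myParameterEntity;)+>
<!-- After replacement, this declares the Class element itself. -->
<!ELEMENT %myParameterEntity; (#PCDATA)>
```

Renaming the element then means changing only the replacement text in the parameter entity declaration.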

Unparsed entities are referenced by names contained in attribute values declared to be of type ENTITY or ENTITIES.
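A sketch of an unparsed entity in use follows. The cover element, its image attribute, the coverImage entity, the cover.jpg file, and the jpeg notation are all hypothetical names chosen for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (cover)>
  <!ELEMENT cover EMPTY>
  <!-- The attribute type ENTITY means the value must name an unparsed entity. -->
  <!ATTLIST cover image ENTITY #REQUIRED>
  <!-- The notation identifies the format of the non-XML data. -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- NDATA marks the entity as unparsed. -->
  <!ENTITY coverImage SYSTEM "cover.jpg" NDATA jpeg>
]>
<book>
  <!-- The entity is referenced by name only, with no & or ; characters. -->
  <cover image="coverImage"/>
</book>
```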

Predefined Entities

XML processors recognize five predefined entity references, each standing for a character that has special meaning when used in XML documents. This means that a character being used literally in content can be distinguished from its use as part of markup. The five entity references and the characters that they represent are listed here:

  • amp— Represents the ampersand character (&) in parsed character data

  • apos— Represents the apostrophe (') in parsed character data

  • gt— Represents the right angle bracket (>) in parsed character data

  • lt— Represents the left angle bracket (<) in parsed character data

  • quot— Represents a single double quotation mark (") in parsed character data
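The short document below sketches all five references in use. The note, text, and quote element names and the said attribute are hypothetical.

```xml
<?xml version="1.0"?>
<note>
  <!-- Read as: Sales & marketing: profit < forecast, costs > budget -->
  <text>Sales &amp; marketing: profit &lt; forecast, costs &gt; budget</text>
  <!-- Read as: She said "yes" to Watt's plan -->
  <quote said="She said &quot;yes&quot; to Watt&apos;s plan"/>
</note>
```

No declarations are needed for these five references, because every XML processor recognizes them.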

Let’s consider parsed entities and parameter entities in more detail.
