XML gives you considerable power to choose your own element types and invent your own grammars to create custom-made markup languages. But this flexibility can be dangerous for XML parsers unless some minimal rules protect them. A parser dedicated to a single markup language, such as the one in an HTML browser, can accept some sloppiness in markup, because the set of tags is small and a web page is not very complex. An XML processor has to be prepared for any kind of markup language, so a set of ground rules is necessary.
These rules are very simple syntax constraints. All tags must use the proper delimiters; an end tag must follow a start tag; elements can't overlap; and so on. Documents that satisfy these rules are said to be well-formed. Some of these rules are listed here.
The first rule is that an element containing text or elements must have start and end tags.
Good | Bad |
---|---|
<list> <listitem>soupcan</listitem> <listitem>alligator</listitem> <listitem>tree</listitem> </list> | <list> <listitem>soupcan <listitem>alligator <listitem>tree </list> |
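
One way to see this rule in action is to hand both versions to a parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is an illustrative choice, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # The parser rejects the document at the first violation it finds.
        return False
    return True


good = ("<list> <listitem>soupcan</listitem> <listitem>alligator</listitem> "
        "<listitem>tree</listitem> </list>")
bad = "<list> <listitem>soupcan <listitem>alligator <listitem>tree </list>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

The second document fails because `</list>` arrives while `listitem` elements are still open.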
An empty element's tag must have a slash (/) before the end bracket.
Good | Bad |
---|---|
<graphic filename="icon.png"/> | <graphic filename="icon.png"> |
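
A serializer usually takes care of the closing slash for you. As a small sketch, again assuming Python's standard `xml.etree.ElementTree`, an element with no content is written out in the empty-element form, while the same tag without the slash is rejected as an unclosed element.

```python
import xml.etree.ElementTree as ET

# Build an element with one attribute and no content.
graphic = ET.Element("graphic", {"filename": "icon.png"})

# Serialize to text; an element without content is emitted as an
# empty-element tag ending in "/>".
serialized = ET.tostring(graphic, encoding="unicode")
print(serialized)

# Without the slash, the parser treats the tag as a start tag and
# reaches the end of input still waiting for </graphic>.
try:
    ET.fromstring('<graphic filename="icon.png">')
    unclosed_accepted = True
except ET.ParseError:
    unclosed_accepted = False
print(unclosed_accepted)  # False
```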
All attribute values must be in quotes.
Good | Bad |
---|---|
<figure filename="icon.png"/> | <figure filename=icon.png/> |
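
When markup is assembled by hand, a helper can supply the quotes. The following sketch relies on `quoteattr` from Python's standard `xml.sax.saxutils` module, which wraps a value in quotes and escapes characters that would break the attribute; the function `figure_tag` is a hypothetical name used only for illustration.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET


def figure_tag(filename: str) -> str:
    """Build an empty <figure> tag with a safely quoted attribute value."""
    return f"<figure filename={quoteattr(filename)}/>"


tag = figure_tag("icon.png")
print(tag)  # <figure filename="icon.png"/>

# Values containing quotes or ampersands still survive a round trip.
tricky = 'a "b" & c'
print(ET.fromstring(figure_tag(tricky)).get("filename") == tricky)  # True
```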
Elements may not overlap.
Good | Bad |
---|---|
<a>A good <b>nesting</b> example.</a> | <a>This is <b>a poor</a> nesting scheme.</b> |
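
A parser can also tell you what went wrong and where. The sketch below uses Python's standard `xml.parsers.expat` module to report the first error in a document; `first_error` is an illustrative helper name, and the exact wording of the message comes from the underlying Expat library.

```python
import xml.parsers.expat as expat


def first_error(document: str):
    """Return (message, line, column) for the first error, or None."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks the end of the input
    except expat.ExpatError as err:
        return expat.ErrorString(err.code), err.lineno, err.offset
    return None


nested = "<a>A good <b>nesting</b> example.</a>"
overlapping = "<a>This is <b>a poor</a> nesting scheme.</b>"

print(first_error(nested))       # None
print(first_error(overlapping))  # message, line, and column of the error
```

For the overlapping example, the parser stops at `</a>`, because the `b` element is still open at that point.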
Isolated markup characters may not appear in parsed content. These include <, ]]>, and &. Write them with entity references instead: &lt; for <, &amp; for &, and ]]&gt; for ]]>.
Good | Bad |
---|---|
<equation>5 &lt; 2</equation> | <equation>5 < 2</equation> |
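
Escaping can be automated rather than done by hand. This sketch uses `escape` from Python's standard `xml.sax.saxutils` module, which replaces `&`, `<`, and `>` with entity references; `equation_element` is a hypothetical helper used only for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def equation_element(text: str) -> str:
    """Wrap text in an <equation> element, escaping markup characters."""
    return f"<equation>{escape(text)}</equation>"


markup = equation_element("5 < 2")
print(markup)                      # <equation>5 &lt; 2</equation>

# The parser turns the entity reference back into the original character.
print(ET.fromstring(markup).text)  # 5 < 2

# The same call also handles & and the ]]> sequence.
print(escape("a ]]> b & c"))       # a ]]&gt; b &amp; c
```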
A final rule stipulates that element names may start only with a letter or an underscore, and may contain only letters, digits, hyphens, periods, and underscores. Colons are allowed only to separate a namespace prefix from the rest of the name.
Good | Bad |
---|---|
<example-one> <_example2> <Example.Three> | <bad*characters> <illegal space> <99number-start> |
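
The naming rule translates fairly directly into a regular expression. The sketch below is an ASCII-only approximation of the rule as stated here; the XML specification itself also permits many non-ASCII letters, and the helper name `looks_like_valid_name` is an illustrative choice.

```python
import re

# One name part: a letter or underscore, then letters, digits,
# periods, underscores, or hyphens. ASCII only (an assumption).
_NAME_PART = r"[A-Za-z_][A-Za-z0-9._-]*"

# An optional namespace prefix followed by a colon, then the local name.
_NAME = re.compile(rf"(?:{_NAME_PART}:)?{_NAME_PART}")


def looks_like_valid_name(name: str) -> bool:
    """Check an element name against the simplified rule above."""
    return _NAME.fullmatch(name) is not None


for name in ["example-one", "_example2", "Example.Three",
             "bad*characters", "illegal space", "99number-start"]:
    print(name, looks_like_valid_name(name))
```

The first three names pass and the last three fail, matching the table above.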