This chapter describes the functions and constructors that act on namespace-qualified names, Uniform Resource Identifiers (URIs), and IDs. Each of these types has unique properties and complexities that set it apart from simple strings.
The type xs:QName is used to represent qualified names in XQuery. An xs:QName value has three parts: a namespace, a local part, and an associated prefix. The namespace and the prefix are optional. If a QName does not have a namespace associated with it, it is considered to be in “no namespace.”
A prefix may be used to represent a namespace in a qualified name, for example, in an XML document. The prefix is bound to a namespace by using a namespace declaration. The prefix itself has no meaning; it is just a placeholder. Two QNames that have the same local part and namespace are equivalent, regardless of prefix. However, the XQuery processor does keep track of a QName’s prefix. This simplifies certain processes like serializing QNames and casting them to strings.
Most query writers who are working with qualified names are working with the names of elements and attributes. (It is also possible for a qualified name to appear as element content or as an attribute value, but this is less common.) You may want to retrieve all or part of a name if, for example, you want to test to see if it is a particular value, or you want to include the name in the query results. You may want to construct a qualified name for a node if you are dynamically creating the name of a node, using a computed element constructor. These two use cases are discussed in this section.
Four functions retrieve the names, or parts of the names, of element and attribute nodes: node-name, name, local-name, and namespace-uri. They are summarized in Table 21-1.
Function name | Description |
---|---|
node-name | The qualified name of the node as an xs:QName |
name | The qualified name of the node as an xs:string that may be prefixed |
local-name | The local part of the node name as an xs:string |
namespace-uri | The namespace part of a node name (a full namespace name, not a prefix) as an xs:anyURI |
Each of these functions takes as an argument a single (optional) node. Table 21-2 shows examples of all four functions. They use the input document names.xml, shown in Example 21-1.
<noNamespace>
  <pre:prefixed xmlns="http://datypic.com/unpre"
                xmlns:pre="http://datypic.com/pre">
    <unprefixed pre:prefAttr="a" noNSAttr="b">123</unprefixed>
  </pre:prefixed>
</noNamespace>
Note that the original prefixes from the input document (or lack thereof) are taken into account when retrieving the names. For example, calling the name function with the unprefixed element results in the unprefixed string unprefixed. This does not mean that the unprefixed element is not in a namespace; it is in the http://datypic.com/unpre namespace. It simply indicates that the unprefixed element was not prefixed in the input document, because its namespace was the default, and therefore had no prefix as part of its QName. Therefore, if you are testing the name or manipulating it in some way, it is best to use node-name rather than name, because node-name provides a result that includes the namespace.
Node | node-name returns an xs:QName with: | name returns | local-name returns | namespace-uri returns |
---|---|---|---|---|
noNamespace | Namespace: empty; Prefix: empty; Local part: noNamespace | noNamespace | noNamespace | A zero-length string |
pre:prefixed | Namespace: http://datypic.com/pre; Prefix: pre; Local part: prefixed | pre:prefixed | prefixed | http://datypic.com/pre |
unprefixed | Namespace: http://datypic.com/unpre; Prefix: empty; Local part: unprefixed | unprefixed | unprefixed | http://datypic.com/unpre |
@pre:prefAttr | Namespace: http://datypic.com/pre; Prefix: pre; Local part: prefAttr | pre:prefAttr | prefAttr | http://datypic.com/pre |
@noNSAttr | Namespace: empty; Prefix: empty; Local part: noNSAttr | noNSAttr | noNSAttr | A zero-length string |
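The difference between testing a name as a string and testing it as a QName can be sketched against names.xml. This is only an illustration: the prefixes u and x are assumptions chosen for the sketch and do not appear in the input document.

```xquery
declare namespace u = "http://datypic.com/unpre";

let $el := doc("names.xml")//u:unprefixed
return (
  (: string comparison: true only because the input happened to use no prefix :)
  name($el) = "unprefixed",
  (: QName comparison: checks the namespace and the local part :)
  node-name($el) eq QName("http://datypic.com/unpre", "unprefixed"),
  (: still true with a different prefix, because prefixes are ignored when comparing QNames :)
  node-name($el) eq QName("http://datypic.com/unpre", "x:unprefixed")
)
```

For the document in Example 21-1, all three comparisons should be true. If the input document were later changed to bind a prefix to the http://datypic.com/unpre namespace, only the two node-name comparisons would stay true.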
Suppose you want to create a report on the product catalog. You want to list all the properties of each product element in an HTML list. You could accomplish this by using the query shown in Example 21-2. It uses the local-name function to return the names like name, colorChoices, and desc, allowing them to appear as part of the report.
Query
<html>{
  for $prod in doc("catalog.xml")//product
  return (<p>Product # {string($prod/number)}</p>,
          <ul>{
            for $child in $prod/(* except number)
            return <li>{local-name($child)}: {string($child)}</li>
          }</ul>)
}</html>
Results
<html>
  <p>Product # 557</p>
  <ul>
    <li>name: Fleece Pullover</li>
    <li>colorChoices: navy black</li>
  </ul>
  <p>Product # 563</p>
  <ul>
    <li>name: Floppy Sun Hat</li>
  </ul>
  <p>Product # 443</p>
  <ul>
    <li>name: Deluxe Travel Bag</li>
  </ul>
  <p>Product # 784</p>
  <ul>
    <li>name: Cotton Dress Shirt</li>
    <li>colorChoices: white gray</li>
    <li>desc: Our favorite shirt!</li>
  </ul>
</html>
There are several ways to construct qualified names. Qualified names are constructed automatically when you are using direct element and attribute constructors. They can also be constructed directly from strings in certain expressions, such as computed element constructors. In addition, three functions are available to construct QNames: the xs:QName constructor, the QName function, and the resolve-QName function.
The xs:QName type has a constructor just like all other built-in simple types. The argument may be prefixed (e.g., prod:number) or unprefixed (e.g., number).
A function called QName can also be used to construct QNames. Unlike the xs:QName constructor, it can be used to generate names dynamically. It accepts a namespace URI and name (optionally prefixed), and returns a QName. For example:
QName("http://datypic.com/prod", "pre:child")
returns a QName with the namespace http://datypic.com/prod, the local part child, and the prefix pre. As with any function call, the arguments are not required to be literal strings. You could just as easily use an expression such as concat("pre:", $myElName) to express the name.
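The two construction approaches described so far can be compared in a small sketch. The value bound to $myElName is a hypothetical stand-in for a name that would normally come from input data.

```xquery
declare namespace prod = "http://datypic.com/prod";

let $myElName := "child"   (: hypothetical value for illustration :)
return (
  (: the constructor resolves the prefix using the namespace declarations of the query;
     the comparison is true because prefixes are ignored :)
  xs:QName("prod:number") eq QName("http://datypic.com/prod", "pre:number"),
  (: a computed element constructor accepts the dynamically built QName :)
  element { QName("http://datypic.com/prod", concat("pre:", $myElName)) } { "content" }
)
```

The second item should be an element named pre:child in the http://datypic.com/prod namespace.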
A third option is the resolve-QName function, which accepts two arguments: a string and an element. The string represents the name, which may have a prefix. The element is used to determine the appropriate namespace URI for that prefix. Typically, this function is used to resolve a QName appearing in the content of a document against the namespace context of the element where the QName appears. For example, to retrieve all products that carry the attribute xsi:type="prod:ProductType", you can use a path such as:
declare namespace prod = "http://datypic.com/prod";
doc("catalog.xml")//product[resolve-QName(@xsi:type, .) = xs:QName("prod:ProductType")]
This test allows the value of xsi:type in the input document to use any prefix (not just prod) as long as it is bound to the http://datypic.com/prod namespace.
Three functions exist to extract parts of an xs:QName:
local-name-from-QName
prefix-from-QName
namespace-uri-from-QName
The local-name-from-QName and namespace-uri-from-QName functions are similar to the local-name and namespace-uri functions, respectively, except that they take an atomic xs:QName rather than a node as an argument. If you are working with element or attribute names, it is easier to use the functions for retrieving node names, such as local-name and name.
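As a minimal sketch, the three extraction functions can be applied to the QName built in the earlier QName example:

```xquery
let $qn := QName("http://datypic.com/prod", "pre:child")
return (
  local-name-from-QName($qn),      (: child :)
  prefix-from-QName($qn),          (: pre :)
  namespace-uri-from-QName($qn)    (: http://datypic.com/prod :)
)
```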
XQuery also has two other prefix-related functions: in-scope-prefixes and namespace-uri-for-prefix. The in-scope-prefixes function returns a list of all the prefixes that are in scope for a given element, as a sequence of strings. The namespace-uri-for-prefix function retrieves the namespace URI associated with a particular prefix, in the scope of a specified element. Because most processing is based on namespaces rather than prefixes (which are technically irrelevant), these functions are not especially useful to the average query writer.
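For completeness, here is a sketch of the two functions working together on the unprefixed element of names.xml. The wording of the output string is made up for illustration.

```xquery
let $el := doc("names.xml")//*:unprefixed
for $prefix in in-scope-prefixes($el)
return concat("'", $prefix, "' is bound to ", namespace-uri-for-prefix($prefix, $el))
```

The default namespace is reported with a zero-length string as its prefix, and the xml prefix is always in scope, so three entries should be returned for this element. Their order is up to the processor.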
Uniform Resource Identifiers (URIs) are used to uniquely identify resources, and they may be absolute or relative. Absolute URIs provide the entire context for identifying the resources, such as http://datypic.com/prod.html. Relative URI references are specified as the difference from a base URI, such as ../prod.html. A URI reference may also contain a fragment identifier following the # character, such as ../prod.html#shirt.
The three previous examples happen to be HTTP Uniform Resource Locators (URLs), but URIs also encompass URLs of other schemes (e.g., FTP, gopher, telnet), as well as Uniform Resource Names (URNs). URIs are not required to be dereferenceable; that is, it is not necessary for there to be a web page or other resource at http://datypic.com/prod.html in order for this to be a valid URI. Sometimes URIs just serve as names. For example, in XQuery, URIs are used as the names of namespaces and collations.
Internationalized Resource Identifiers (IRIs) are an extension of URIs that allow a wider, more international set of characters to appear without being escaped. Generally, the term URI is used in this book (and in the XQuery specification) to mean “URI or IRI.” There are no functions or operations in XQuery that support URIs without also supporting IRIs.
The built-in type xs:anyURI represents a URI reference. Most XQuery functions that accept URIs as arguments call for xs:string values instead, but an xs:anyURI value is acceptable also. This is because of a special type-promotion rule that allows xs:anyURI values to be automatically promoted to xs:string when a string is expected. Most of the URI-related functions return xs:anyURI values, following the philosophy of being liberal in what they accept and specific in what they produce.
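A short sketch of both halves of that rule, using only functions introduced earlier in this chapter (the choice of starts-with as the string function is arbitrary):

```xquery
let $uri := xs:anyURI("http://datypic.com/prod.html")
return (
  (: starts-with expects strings; the xs:anyURI value is promoted automatically :)
  starts-with($uri, "http:"),
  (: a URI-returning function produces an xs:anyURI, not an xs:string :)
  namespace-uri-from-QName(xs:QName("xs:string")) instance of xs:anyURI
)
```

Both items should evaluate to true.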
Relative URIs are interpreted relative to an absolute URI, known as a base URI. For example, the relative URI prod.html is useless unless interpreted in the context of an absolute URI. In HTML documents, the base URI is often the URI of the document itself. If an HTML document is located at http://datypic.com/order.html, and it contains a link to prod.html, that relative URI is resolved in the context of http://datypic.com/order.html, and the link points to http://datypic.com/prod.html.
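The same resolution can be reproduced in a query with the resolve-uri function, which takes a relative URI and a base URI. In this sketch, the second call uses a made-up base URI with extra path segments to show how a parent-directory step and a fragment identifier are handled.

```xquery
(
  resolve-uri("prod.html", "http://datypic.com/order.html"),
  (: http://datypic.com/prod.html :)
  resolve-uri("../prod.html#shirt", "http://datypic.com/a/b/order.html")
  (: http://datypic.com/a/prod.html#shirt :)
)
```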
In XML documents, you can also explicitly specify a base URI using the xml:base attribute. The scope of each xml:base attribute is the element on which it appears and all its content.
Example 21-3 shows an XML document that uses the xml:base attribute on the catalog elements, with relative URI references (the href attributes) for each product. The href="prod443.html" attribute of the first product element, for example, is resolved relative to the xml:base attribute of the first catalog element, namely http://datypic.com/ACC/.
Example 21-3. Using xml:base (http://datypic.com/input/cats.xml)
<catalogs>
  <catalog name="ACC" xml:base="http://datypic.com/ACC/">
    <product number="443" href="prod443.html"/>
    <product number="563" href="prod563.html"/>
  </catalog>
  <catalog name="WMN" xml:base="http://datypic.com/WMN/">
    <product number="557" href="prod557.html"/>
  </catalog>
</catalogs>
The base-uri function can be used to retrieve the base URI of a node. For document nodes, the base URI is the URI from which the document was retrieved. For example:
base-uri(doc("http://datypic.com/input/cats.xml"))
returns http://datypic.com/input/cats.xml.
For element nodes, the base URI is the value of the element’s own xml:base attribute, if it has one, or otherwise the xml:base attribute of its nearest ancestor that has one. For example, if $prod is bound to the first product element in cats.xml, the function call:
base-uri($prod)
returns http://datypic.com/ACC/, because that is the xml:base value of its nearest ancestor.
If no xml:base attributes appear among its ancestors, it defaults to the base URI of the document node, if one exists.
The base URI of an individual node is set by the xml:base attribute or by the document URI. There is also a separate base URI, known as the static base URI. The static base URI is used in several cases:
When an element is constructed in a query, its base URI is set to the static base URI, if it is not absent. Otherwise, its base URI is the empty sequence.
When relative URI references are used in certain expressions, or in arguments to functions like the doc and collection functions, they are resolved relative to the static base URI.
When a base URI argument is not provided to the resolve-uri function, it resolves the URI relative to the static base URI.
The static base URI can be set in the query prolog, using a base URI declaration. Its syntax is shown in Figure 21-1.
Here’s an example of a base URI declaration:
declare base-uri "http://datypic.com";
The base URI must be a literal value in quotes (not an evaluated expression), and it should be a syntactically valid absolute URI.
It is also possible for the processor to set the static base URI outside the scope of the query. Although it is implementation-defined, it’s reasonable to expect that if the query itself is read from a file, the static base URI will default to the location of that file. The value of the static base URI can be retrieved using the static-base-uri function.
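The cases listed above can be seen together in a small sketch. The base URI value and the element name report are assumptions chosen for illustration.

```xquery
declare base-uri "http://datypic.com/input/";

(
  static-base-uri(),          (: the declared value :)
  resolve-uri("order.xml"),   (: no base argument, so the static base URI is used :)
  base-uri(<report/>)         (: a constructed element takes the static base URI :)
)
```

With this declaration, a conforming processor should return http://datypic.com/input/, then http://datypic.com/input/order.xml, then http://datypic.com/input/ again.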
When accessing an input document using the doc function, a URI is used to specify the document of interest. Processors interpret the URI passed to the doc function in different ways. Some, like Saxon, will dereference the URI, that is, go out to the URL and retrieve the resource at that location. Other implementations, such as those embedded in XML databases, consider the URIs to be just names. The processor might take the name and look it up in an internal catalog to find the document associated with that name.
You can find the absolute URI from which a document node was retrieved by using the document-uri function. This function is basically the inverse of the doc function. Where the doc function accepts a URI and returns a document node, the document-uri function accepts a document node and returns a URI.
For example, if the variable $orderDoc is bound to the result of doc("http://datypic.com/input/order.xml"), then document-uri($orderDoc) returns "http://datypic.com/input/order.xml".
In most cases, this has the same effect as calling the base-uri function on the document node.
Most of the examples of the doc function in this book use a hardcoded URI, as in doc("order.xml"). However, suppose you wanted to open the documents referenced in Example 21-3. For example, you want to open the product information page for product number 443. Its relative URI is prod443.html, and its base URI is http://datypic.com/ACC/. To do this, you could use:
let $prod := doc("cats.xml")/catalogs/catalog[1]/product[1]/@href
let $absoluteURI := resolve-uri($prod, base-uri($prod))
return doc($absoluteURI)
which would open the document described by the URI http://datypic.com/ACC/prod443.html.
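The same technique extends to every product in the document. The following sketch only lists the resolved URIs rather than opening them, and assumes that cats.xml can be retrieved from the URI given in Example 21-3.

```xquery
for $prod in doc("http://datypic.com/input/cats.xml")//product
return concat($prod/@number, ": ", resolve-uri($prod/@href, base-uri($prod)))
```

Each href is resolved against the xml:base of its own catalog, so products 443 and 563 should resolve under http://datypic.com/ACC/ and product 557 under http://datypic.com/WMN/.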
URIs require that some characters be escaped: each such character is replaced by the bytes of its UTF-8 encoding, with each byte written as two hexadecimal digits preceded by the % character. This applies to non-ASCII characters and some ASCII characters, namely control characters, spaces, and several others. In addition, certain characters in URIs are separators that are intended to delimit parts of URIs, namely the characters ; , / ? : @ & = + $ [ ] and %. If these delimiter characters must be used in a URI with a meaning other than as a delimiter, they too must be escaped.
Three functions are available for escaping URI values: iri-to-uri, escape-html-uri, and encode-for-uri. All three replace each special character with an escape sequence in the form %xx (possibly repeating), where xx is two hexadecimal digits (in uppercase) that represent the character in UTF-8. For example, ../édition.html is changed to ../%C3%A9dition.html, with the é escaped as %C3%A9.
The three escape functions vary in which characters they escape:
iri-to-uri
Escapes only those characters that are not allowed in URIs, but not the delimiters ; , / ? : @ & = + $ [ ] or %. It is appropriate for escaping entire URIs.
escape-html-uri
Escapes characters as required by HTML agents. Specifically, it escapes everything except ASCII characters 32 to 126. It is appropriate for URIs that are to be handled by browsers.
encode-for-uri
Is the most aggressive of the three. It escapes all the characters that are required to be escaped in URIs, plus all the delimiter characters. It is appropriate for escaping pieces of URIs, such as filenames, that cannot contain delimiter characters.
Note that none of these functions check whether the argument provided is a valid URI; they simply act on the argument as if it were any string.
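The differences among the three functions are easiest to see when they are applied to the same string. The path below is a hypothetical relative URI containing a space, a delimiter (/), and a non-ASCII letter.

```xquery
let $path := "../a b/édition.html"   (: hypothetical input for illustration :)
return (
  iri-to-uri($path),       (: ../a%20b/%C3%A9dition.html :)
  escape-html-uri($path),  (: ../a b/%C3%A9dition.html :)
  encode-for-uri($path)    (: ..%2Fa%20b%2F%C3%A9dition.html :)
)
```

Only encode-for-uri escapes the slashes, and escape-html-uri leaves the space alone because it falls within ASCII characters 32 to 126.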
IDs and IDREFs are used in XML to uniquely identify elements within a single document and to create references to those elements. This is useful, for example, to create footnotes and references to them, or to create hyperlinks to specific sections of HTML documents.
Typically, an attribute is used as an ID to uniquely represent the element that carries it. It is also technically possible to use a child element as an ID, but it is discouraged for reasons of compatibility with XML DTDs. The value of an ID must be a valid NCName (an XML name with no colon), which means that it must follow certain rules like starting with a letter or underscore, and not containing spaces.
Attributes named xml:id (in the http://www.w3.org/XML/1998/namespace namespace) are always considered to be IDs. Attributes with other names can also be considered IDs if they are declared to have the built-in type xs:ID in a schema, or ID in a DTD.
Example 21-4 shows an XML document that contains some ID attributes, namely the id attribute of the section element, and the fnid attribute of the fn element. Each section and fn element is uniquely identified by an ID value, such as fn1, preface, or context.
The example assumes that this document was validated with a schema that declares these attributes to be of type xs:ID. The id attributes are not automatically considered to be IDs because they are not in the appropriate namespace. In fact, the name is irrelevant if it is not xml:id; an attribute named foo can have the type xs:ID, and an attribute named id can have the type xs:integer.
The type xs:IDREF is used for an attribute that references an xs:ID. All attributes of type xs:IDREF must reference an ID in the same XML document. A common use case for xs:IDREF is to create a cross-reference to a particular section of a document. The ref attribute of the fnref element in Example 21-4 contains an xs:IDREF value (again, assuming it is validated with a schema or DTD). Its value, fn1, matches the value of the fnid attribute of the fn element.
The type xs:IDREFS represents a whitespace-separated list of one or more xs:IDREF values. In Example 21-4, the refs attribute of secRef is assumed to be of type xs:IDREFS. The first refs attribute contains only one xs:IDREF (context), while the second contains two xs:IDREF values (context and language).
The id and idref functions allow you to reference elements based on the ID/IDREF relationship.
Given a sequence of IDs, the id function returns the elements whose xs:ID attributes match them. For example, the function call:
doc("book.xml")/id( ("preface", "context") )
returns the first two section elements, because their ID attributes have the values preface and context, respectively.
The idref function returns the nodes that refer to specified IDs, that is, the nodes of type xs:IDREF or xs:IDREFS (typically attributes) whose values contain one of those IDs. For example, the function call:
doc("book.xml")/idref( ("context", "language") )
returns the refs attributes of the two secRef elements, because each of these attributes is of type xs:IDREFS and contains either context or language or both.
The previous examples used literal strings for the argument. These two functions become even more useful when they are used to link referring elements to referred elements. For example, the expression:
for $child in (doc("book.xml")//section[1]/node())
return if (name($child) = "fnref")
       then concat("[", string(doc("book.xml")/id($child/@ref)), "]")
       else string($child)
uses the id function to resolve the footnote reference in the first section. It returns:
This book introduces XQuery... The examples are downloadable [See http://datypic.com.]...
The text that was contained in the fn element now appears where it was referenced using fnref.
Another ID-related function is element-with-id, which behaves identically to the id function when IDs are only contained in attributes, as in the previous examples. However, when an ID is contained in element content, the id function returns that element itself, while the element-with-id function returns its parent.
You can create result elements with IDs by using the xml:id attribute in your element constructors. For example, the constructor:
<prod xml:id="{concat('P', $prodNum)}"/>
will create a prod element with an ID attribute that is equal to the letter P concatenated with the value of the $prodNum variable. The value of an attribute named xml:id must be a valid NCName (an XML name with no colon). Any whitespace in its value will be normalized automatically.
You can generate unique identifiers for nodes using the generate-id function, which accepts a node and returns a unique identifier for that node. For example, the constructor:
for $prod in doc("catalog.xml")//product
return <div id="{generate-id($prod)}"> ... </div>
will create a div with a unique identifier for each product. This is useful whenever you need to create unique identifiers in your output. The exact value of the ID is implementation-dependent, but it is guaranteed to be a syntactically valid xs:ID value, unique for each node in an input document, and consistent if the function is called multiple times within the execution of a query.
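Because the identifier is consistent within a single query execution, it can be generated once for a link and again for the link target. The following is a sketch of that idea; the HTML structure is an assumption chosen for illustration.

```xquery
let $products := doc("catalog.xml")//product
return
  <html>
    <ul>{
      (: table of contents: each link points at the identifier of its product :)
      for $prod in $products
      return <li><a href="#{generate-id($prod)}">{string($prod/name)}</a></li>
    }</ul>
    {
      (: calling generate-id again on the same node yields the same identifier :)
      for $prod in $products
      return <div id="{generate-id($prod)}">{string($prod/name)}</div>
    }
  </html>
```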