XQuery can be used for a wide variety of XML processing needs. As such, different XQuery implementations provide customized functions and settings for specific use cases. This chapter looks at some of the implementation-specific aspects of XQuery.
The XQuery specification consists of a core set of features that all implementations are required to support. Supporting these features is known as minimal conformance. In addition, there are a handful of optional features that are clearly defined and scoped. These optional features are listed in Table 25-1.
Feature | Description | Chapter/Section |
---|---|---|
Module | Support for library modules and module imports | “Assembling Queries from Multiple Modules” |
Schema Aware | Support for schema imports in the prolog and validate expressions | Chapter 14 |
Typed Data | Support for typed element and attribute nodes | Chapter 14 |
Static Typing | Detection of all static type errors in the analysis phase | Chapter 15 |
Higher-Order Function | Support for function items, inline function expressions, and dynamic function calls | Chapter 23 |
Serialization | Ability to serialize query results | “Serializing Output” |
In addition to these six features, some aspects of the core language are either implementation-dependent or implementation-defined. Implementation-defined features are those where the implementer is required to document the choices she has made. For example, the list of supported collations or additional built-in functions is implementation-defined. Implementation-dependent behavior may vary by implementation but does not have to be explicitly stated in the documentation and cannot necessarily be predicted. For example, when an unordered expression is used, the order of the results is implementation-dependent.
Any XQuery processor is associated with a particular version of XQuery. It is either an “XQuery 1.0 processor,” an “XQuery 3.0 processor,” or an “XQuery 3.1 processor.” If an XQuery 3.1 processor processes an XQuery module that does not have a version declaration, it will assume that the query is version 3.1. If it encounters a version declaration that says that it is an earlier version, it may do one of two things:
- Process the module by using the (prior) version that is specified in the version declaration.
- Process the module as version 3.1. Because XQuery is generally backward compatible, the result should be the same.
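As a brief illustration, a module states its version in a version declaration on its first line. The query body in this sketch is arbitrary and chosen only for illustration; it uses only XQuery 1.0 syntax, so either of the two processing choices should produce the same result.

```xquery
xquery version "1.0";

(: Only XQuery 1.0 syntax is used here, so a 3.1 processor can treat
   the module as version 1.0 or as version 3.1 without any difference :)
for $n in (1, 2, 3)
return $n * 2
```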
An XQuery implementation may support either XML 1.0 and Namespaces 1.0, or XML 1.1 and Namespaces 1.1. This is an implementation-defined choice that should be clearly documented. XML 1.1 allows a much wider set of characters in XML names, and adds two line-end characters to the set of characters that are considered to be whitespace. The main changes in Namespaces 1.1 are the ability to undeclare prefixes, and support for Internationalized Resource Identifiers (IRIs) rather than just URIs.
An XQuery implementer can also choose whether to support XML Schema 1.0 or XML Schema 1.1, and choose what version of Unicode to support for the functions that rely on Unicode definitions, such as normalization and case mapping.
The following significant features were added in XQuery 3.0:
- Higher-order functions, dynamic functions, and function items, described in Chapter 23
- More flexibility in the order of clauses in a FLWOR expression, and three new clauses:
  - group by for grouping, described in “Grouping Using the group by Clause”
  - window for windowing, described in “Windowing”
  - count to provide the position of the current iteration, described in “Using the count Clause”
- allowing empty in for clause, which allows more straightforward outer joins, described in “Outer joins with allowing empty”
- Try/catch expressions to improve error handling, described in “Try/Catch Expressions”
- Output declarations to specify serialization parameters, described in “Specifying Serialization Parameters by Using Option Declarations”
- Annotations on functions and variables, described in “Annotations”, including the ability to indicate that functions and variables are private, described in “Private Functions and Variables”
- The switch expression, described in “Switch Expressions”
- The string concatenation operator (||), described in “Concatenating Strings”
- The simple map operator (!), described in “The Simple Map Operator”
- New functions, described in Appendix A. Each function that is new in XQuery 3.0 is marked as such in the appendix.
  - Math and trigonometric functions (math:*)
  - Functions for formatting dates, times, and numbers (format-*)
  - A function that returns matching and non-matching parts of a string (analyze-string)
  - Functions for parsing a string as XML, and serializing XML to a string (parse-xml, parse-xml-fragment, serialize)
  - Functions for parsing non-XML text files (unparsed-text, unparsed-text-lines)
  - Generic higher-order functions (for-each, for-each-pair, filter, fold-*)
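Several of these 3.0 additions can be combined in a single FLWOR expression. The following is a minimal sketch rather than an example from any particular product: the inline item elements, their dept and price attributes, and the variable names are all hypothetical. It uses group by, count, the string concatenation operator, and the simple map operator.

```xquery
xquery version "3.0";

(: Hypothetical inline data, used only for illustration :)
let $items := (
  <item dept="ACC" price="10"/>,
  <item dept="WMN" price="25"/>,
  <item dept="ACC" price="5"/>
)
for $item in $items
let $dept := $item/@dept
group by $dept        (: one iteration per distinct dept value :)
order by $dept
count $pos            (: position of each group after sorting :)
return
  (: "!" casts each price in the group; "||" builds the output string :)
  $pos || ". " || $dept || ": " || sum($item/@price ! xs:decimal(.))
```

After the group by clause, $item is bound to all of the items in the current group, which is why the sum covers every price for that department.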
The following significant features were added in XQuery 3.1:
- Maps, described in “Maps”
- Arrays, described in “Arrays”
- Parsing and serializing JSON, described in “JSON”
- The arrow operator (=>) for chaining function calls, described in “Calling Functions with the Arrow Operator”
- String constructors (using ``[ and ]``) for constructing structured strings, described in “String Constructors”
- New functions, described in Appendix A. Each function that is new in XQuery 3.1 is marked as such in the appendix.
  - Functions for querying and manipulating maps (map:*)
  - Functions for querying and manipulating arrays (array:*)
  - Functions for converting JSON to and from strings and XML (parse-json, json-to-xml, xml-to-json)
  - A function that sorts items based on a supplied function (sort)
  - A function that generates random numbers (random-number-generator)
  - A function to call an XSLT stylesheet (transform)
  - A function to dynamically load an XQuery module (load-xquery-module)
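A few of the 3.1 additions are sketched below. The data values and variable names are hypothetical. The map and array prefixes are declared explicitly here as a precaution; many processors predeclare them, in which case the two namespace declarations are redundant but harmless.

```xquery
xquery version "3.1";

declare namespace map = "http://www.w3.org/2005/xpath-functions/map";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical data, used only for illustration :)
let $product := map { "name": "Fleece Pullover", "sizes": ["S", "M", "L"] }
let $json := '{"dept": "WMN", "numbers": [557, 563]}'
return (
  $product?name => upper-case(),        (: arrow operator applied to a map lookup :)
  map:size($product),                   (: number of entries in the map :)
  array:size($product?sizes),           (: number of members in the array :)
  parse-json($json)?numbers?* => sum(), (: JSON parsed into maps and arrays :)
  sort((3, 1, 2))                       (: sort using the default ordering :)
)
```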
Every query is analyzed and evaluated within a context that is defined by the implementation. This context includes settings like implicit time zone, context item, and default collation. In some cases, the settings of the context can be overridden by prolog declarations in the query, but sometimes they cannot. It is useful to know what defaults and choices your implementation supports for these settings.
Your implementation may do any of the following to augment the built-in functions and features of the XQuery language:
- Add predeclared namespaces (including a default element and function namespace)
- Add built-in schemas, whose type names can be used in queries and whose element and attribute declarations can be used in validation
In addition, your implementation may set default values for any of the prolog “setters,” namely:
- Boundary-space policy
- Ordering mode
- Empty-order specification
- Copy-namespaces mode
- Construction mode
- Default collation
- Static base URI
- Serialization parameters
The implementation may or may not allow you the option of overriding these settings outside the scope of the query. For example, you may be allowed to enter them into a dialog box in a user interface, specify them at a command-line prompt, or set them programmatically. However, any settings specified in the query prolog take priority.
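To make the query independent of the implementation's defaults, the setters can be stated explicitly in the prolog. The following sketch sets most of them; the particular values and the base URI are assumptions chosen only for illustration, not recommendations.

```xquery
xquery version "3.1";

(: Each setter overrides whatever default the implementation supplies :)
declare boundary-space preserve;
declare ordering unordered;
declare default order empty greatest;
declare copy-namespaces no-preserve, inherit;
declare construction strip;
declare default collation "http://www.w3.org/2005/xpath-functions/collation/codepoint";
declare base-uri "http://example.com/queries/";  (: hypothetical URI :)

<result>{ 1 to 3 }</result>
```

Serialization parameters are not set with a dedicated setter; they are specified by using option declarations.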
Option declarations can be used to specify an implementation-defined setting in the query prolog. For example, the Saxon implementation allows for several different types of options. Example 25-1 shows one of them.
declare namespace saxon = "http://saxon.sf.net/";
declare option saxon:output "saxon:indent-spaces=5";
The option declaration, for saxon:output, is used to specify values for additional serialization parameters, in this case, the number of spaces to indent when serializing the results. Incidentally, option declarations are also used for the built-in serialization parameters, which are discussed in “Specifying Serialization Parameters by Using Option Declarations”.
An option declaration may apply to the whole query, or just the subsequent prolog declaration, or any other scope defined by the implementation.
Options have namespace-qualified names, which means that the prefixes used must be declared, and processors recognize them by their namespace. If an option belongs to a namespace that is not supported by the implementation, it is ignored. If a processor recognizes the option but determines that the content is invalid, the behavior is implementation-dependent. It may raise an error, or it may ignore it.
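For comparison with the implementation-specific option, the built-in serialization parameters use the standard serialization namespace. This is a minimal sketch: the parameter values and the query body are arbitrary, and whether a processor serializes its output at all depends on its support for the optional Serialization feature.

```xquery
xquery version "3.0";

(: Standard namespace for the built-in serialization parameters :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method "xml";
declare option output:indent "yes";

<catalog><product/></catalog>
```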
Queries can also contain implementation-specific extension expressions that may be used to specify additional parameters to a query. Extensions are similar to options, except that they can appear anywhere that an expression is allowed in the query (not just the prolog) and they apply to an individual expression.
For example, the extension:
(# dty:timeOut 200 #) { count($doc//author) }
might be used to tell the processor to time out after 200 seconds. The syntax of an extension is shown in Figure 25-1.
Extension expressions consist of one or more pragmas, each delimited by (# and #), followed by the affected expression in curly braces. A pragma has two parts: a qualified name, and optional content, which can be any string of characters that does not contain the closing delimiter #).
Extensions can be used in a number of ways. Examples include:
- Providing hints to the processor regarding how best to evaluate the expression, such as what index to use or how long to wait before timing out.
- Allowing non-standard interpretation of XQuery syntax, for example, allowing the comparison of xs:gDay values by using the < operator, which is normally not permitted. However, the expression in curly braces still must use valid XQuery syntax.
- Specifying an alternate proprietary syntax in the pragma content that may be more efficient or otherwise preferable to the expression in curly braces.
Using options and pragmas that affect the result of an expression makes queries non-interoperable across implementations. Use such extensions only when absolutely necessary.
Like options, pragmas are recognized by their namespace, and a processor will ignore any pragmas in namespaces it doesn’t recognize. If all the pragmas associated with an expression are ignored, the expression is evaluated normally, as if no pragmas were specified.
If a processor recognizes the namespace used in a pragma, but not the local name, it may either raise an error or ignore it. If it recognizes the pragma and determines that it is invalid, it will raise error XQST0013.
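The fallback behavior can be sketched with an expression that carries two pragmas. The ext namespace, the pragma names use-index and timeOut, and their content are all hypothetical and not taken from any real processor. A processor that does not recognize the namespace ignores both pragmas and simply evaluates the expression in curly braces, which returns 2 for this inline data.

```xquery
(: Hypothetical extension namespace; no real processor is assumed to support it :)
declare namespace ext = "http://example.com/hypothetical-extensions";

let $doc := <catalog><author>A</author><author>B</author></catalog>
return
  (# ext:use-index author #)
  (# ext:timeOut 200 #)
  { count($doc//author) }
```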
Starting in version 3.0, annotations provide another way to specify implementation-defined properties, specifically relating to user-defined functions and global variables. Annotations are specified after the keyword declare in function and variable declarations and they have names preceded by percent signs. For example, a hypothetical annotation named stable might be specified as follows:
xquery version "3.0";
declare namespace dty="http://datypic.com";
declare %dty:stable("true") function dty:myFunction () {
  "function body here"
};
declare %dty:stable("true") variable $dty:myVariable external;
The syntax of an annotation is shown in Figure 25-2.
It is possible to omit the parentheses and value if the name of the annotation alone conveys all the necessary information. It is also possible to specify multiple values for a single annotation by separating them with commas within the parentheses. Multiple annotations (with different names) can also be specified. For example, XQSuite, a test framework for XQuery, allows the insertion of test cases and expected results by using annotations, as shown in Example 25-2.
xquery version "3.0";
declare namespace test = "http://exist-db.org/xquery/xqsuite";
declare
  %test:args("Hello", "world")
  %test:assertEquals("Hello world")
function local:hello($greet as xs:string, $user as xs:string) {
  $greet || " " || $user
};
Like options and pragmas, annotations are recognized by their namespace, and a processor will ignore any annotations in namespaces it doesn’t recognize.
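Not every annotation is implementation-specific. The %private and %public annotations are defined by the language itself and need no namespace declaration. The following library module is a minimal sketch; the module namespace and all function and variable names are hypothetical.

```xquery
xquery version "3.0";
module namespace util = "http://example.com/hypothetical/util";

(: Private declarations are visible only inside this module :)
declare %private variable $util:separator := ", ";

declare %private function util:join-names($names as xs:string*) as xs:string {
  string-join($names, $util:separator)
};

(: Public is the default; it is stated here only for contrast :)
declare %public function util:list($names as xs:string*) as xs:string {
  "Names: " || util:join-names($names)
};
```

A query that imports this module can call util:list but not util:join-names, and it cannot reference $util:separator.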