Chapter 12. Prologs, Modules, and Variables

This chapter covers the structure of queries in more detail. It discusses the query prolog and its various declarations. It then describes how to assemble queries from multiple modules, declare global variables, and define external functions.

Structure of a Query: Prolog and Body

An XQuery query is made up of two parts: a prolog and a body. The query prolog is an optional section that appears at the beginning of a query. The prolog can contain various declarations that affect settings used in evaluating the query. These include namespace declarations, schema imports, variable declarations, function declarations, and other settings. In a query module of any size, the prolog is likely to be much larger than the body.

Example 12-1 shows a query with a prolog containing several different types of declarations.

Example 12-1. A query prolog
xquery version "3.0";
declare default element namespace "http://datypic.com/cat";
declare boundary-space preserve;
declare namespace ord = "http://datypic.com/ord";
import schema namespace prod="http://datypic.com/prod"
                        at "prod.xsd";
declare function local:getProdNums
  ($catalog as element()) as xs:integer*
  {for $prod in $catalog/product
   return xs:integer($prod/number)};

The query body is a single expression, but that expression can consist of a sequence of one or more expressions that are separated by commas. Example 12-2 shows a query body that contains a sequence of two expressions: a constructed element and a FLWOR. The comma after the h1 element separates the two expressions in the query body.

Example 12-2. A query body
<h1>Order Report</h1>,
for $item in doc("order.xml")//item
order by $item/@num
return $item
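
To see how the two parts fit together, the following is a minimal sketch of a complete main module. So that it is self-contained, it binds a small inline order element to a global variable instead of calling the doc function; the $order variable, its content, and the local:sorted-items function are made up for illustration.

```xquery
xquery version "3.1";

(: Prolog: a series of declarations, each terminated by a semicolon :)
declare variable $order :=
  <order>
    <item num="2"/>
    <item num="1"/>
  </order>;

declare function local:sorted-items($ord as element(order)) as element(item)* {
  for $item in $ord/item
  order by xs:integer($item/@num)
  return $item
};

(: Body: a single expression, here a sequence of two expressions
   separated by a comma :)
<h1>Order Report</h1>,
local:sorted-items($order)
```

Everything before the h1 element constructor belongs to the prolog; the last two lines are the query body.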

Prolog Declarations

The prolog consists of a series of declarations terminated by semicolon (;) characters. There are three distinct sections of the prolog.

The first declaration to appear in the query prolog is a version declaration, if it exists.

The second prolog section consists of setters, imports, and namespace declarations. Setters are the declarations listed in Table 12-1, along with a link to where they are covered fully in the book. Each kind of setter can only appear once.

Table 12-1. Query prolog setters
Declaration | Description | Chapter/Section
Boundary-space | How to process boundary whitespace in element constructors | “The boundary-space declaration”
Ordering mode | Whether the default order is document order or some implementation-dependent order | “The ordering mode declaration”
Empty order | Whether empty sequences should come first or last when ordered | “Empty order”
Copy-namespaces | Whether nodes copied in constructors should copy namespaces from their parents | “Controlling the Copying of Namespace Declarations”
Construction | Whether nodes copied in constructors should be typed | “Types and Newly Constructed Elements and Attributes”
Decimal format | A decimal format used by the format-number function | “The Decimal Format Declaration”
Default collation | The default collation for string comparison | “Specifying a collation”
Base URI | The static base URI | “Static base URI”

Imports and namespace declarations, listed in Table 12-2, can appear multiple times (with different values), intermingled with setters in any order.

Table 12-2. Query prolog imports and namespace declarations
Declaration | Description | Chapter/Section
Default namespace declaration | Binds unprefixed names to a namespace for the entire scope of the query | “Default namespace declarations in the prolog”
Namespace declaration | Binds a prefix to a namespace for the entire scope of the query | “Prolog Namespace Declarations”
Module import | Imports a library module from a specified location | “Importing a Library Module”
Schema import | Imports a schema definition from a specified location | “Schema Imports”

The last section of the prolog consists of function, variable, context item, and option declarations, listed in Table 12-3. They must appear after all the setters, imports, and namespace declarations.

Table 12-3. Query prolog variable and function declarations
Declaration | Description | Chapter/Section
Function declaration | Declares a user-defined function | “Function Declarations”
Variable declaration | Declares global variables | “Variable Declarations”
Option declaration | Declares implementation-specific parameters | “The Option Declaration”
Output declaration | Declares serialization parameters; a special kind of option declaration | “Specifying Serialization Parameters by Using Option Declarations”
Context item declaration | Declares the context item for the query | “Setting the Context in the Prolog”

It is important to note that your processor might also be setting these values. For example, different XQuery implementations can choose to build in different default collations or different sets of predefined functions. In addition, an implementation might allow the user to specify these values outside the query—for example, using a command-line interface. Prolog declarations override or augment the default settings defined outside the scope of the query.
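
The ordering of the three prolog sections can be sketched as follows. This is only an illustration: the $ord:label variable and the ord:heading function are invented names, and the collation shown is the standard Unicode codepoint collation, chosen because every processor supports it.

```xquery
xquery version "3.1";   (: 1. the version declaration comes first :)

(: 2. setters, imports, and namespace declarations, in any order :)
declare boundary-space preserve;
declare namespace ord = "http://datypic.com/ord";
declare default collation
  "http://www.w3.org/2005/xpath-functions/collation/codepoint";

(: 3. variable, function, and option declarations come last :)
declare variable $ord:label := "Order Report";
declare function ord:heading($text as xs:string) as element(h1) {
  <h1>{$text}</h1>
};

(: the query body follows the prolog :)
ord:heading($ord:label)
```

Moving the boundary-space declaration below the variable declaration would make the query invalid, because setters must precede all variable and function declarations.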

The Version Declaration

The first of the declarations in Example 12-1 is a version declaration, whose syntax is shown in Figure 12-1. The version declaration indicates the version of the XQuery language used by the query. Valid values are 1.0, 3.0, and 3.1. If the version is not specified, the default depends on which version the processor supports; an XQuery 3.1 processor, for example, assumes that the version is 3.1. If a version declaration does appear, it must appear first in the query, even before any comments.

Figure 12-1. Syntax of a version declaration

The version declaration also allows you to specify a character encoding for the query itself by using the encoding keyword and a literal string. For example, the following version declaration specifies an encoding of UTF-8:

xquery version "3.1" encoding "UTF-8";

Other example values for the encoding include UTF-16, ISO-8859-1, and US-ASCII. The way encoding is handled is somewhat implementation-dependent, in that processors are allowed to ignore the encoding value specified in the query if they have other knowledge about the encoding.

Because the encoding of a file can easily change unintentionally—for example, when you save it by using a text editor—it’s safest to stick to using ASCII characters in the query, using numeric character references for any non-ASCII characters. This does not prevent you from specifying the encoding as UTF-8, because ASCII is a subset of UTF-8.
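
As a small sketch of this advice, the following query declares UTF-8 but contains only ASCII characters. The accented letter is written as the numeric character reference &#xE9; (e with an acute accent, code point 233); the product and matches element names are made up for illustration.

```xquery
xquery version "3.1" encoding "UTF-8";

(: Character references work in element content and in string literals,
   so the query file itself never contains a non-ASCII character :)
<product>
  <name>Caf&#xE9; Apron</name>
  <matches>{ "caf&#xE9;" = ("caf" || codepoints-to-string(233)) }</matches>
</product>
```

Because the file contains no non-ASCII bytes, it reads the same whether an editor saves it as US-ASCII, ISO-8859-1, or UTF-8.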

Assembling Queries from Multiple Modules

So far, all of this book’s example queries were contained in one module, known as the main module. However, you can declare functions and variables in other modules and import them into the main module of the query. This is a very useful feature for:

  • Reusing functions among many queries

  • Creating standardized libraries that can be distributed to a variety of query users

  • Organizing and reducing the size of query modules

The main module contains a query prolog followed by a query body, which is the main expression to be evaluated. In its prolog, the main module can import other modules known as library modules.

Not all implementations support the use of library modules; it is an optional feature. If this feature is used but is not supported by the implementation, error XQST0016 is raised.

Library Modules

Library modules differ from main modules in that they cannot have a query body, only a prolog. They also differ in that they must start with a module declaration, whose syntax is shown in Figure 12-2.

Figure 12-2. Syntax of a module declaration

The module declaration identifies the module as a library module. It also declares the target namespace of the module and binds it to a prefix. For example, the declaration:

module namespace strings = "http://datypic.com/strings";

declares the target namespace of the module to be http://datypic.com/strings and binds that namespace to the prefix strings.

The target namespace must be a literal value in quotes, not an evaluated expression. It should be a syntactically valid absolute URI, and cannot be a zero-length string.

The names of all the functions and variables declared in that library module must be qualified with that same target namespace. This differs from main modules, which do not have target namespaces and allow you to declare variables and functions in a variety of namespaces.

A library module is shown in Example 12-3, where the variable and function names are prefixed with strings.

Example 12-3. A library module (strings.xqm)
module namespace strings = "http://datypic.com/strings";
declare variable $strings:maxStringLength := 32;
declare function strings:trim($arg as xs:string?) as xs:string? {
  replace(replace($arg,'^\s+',''),'\s+$','')
};

Importing a Library Module

Both main modules and library modules can import other library modules. Importing a library module allows its variables and functions to be referenced from the importing module. Only library modules can be imported; a main module can never be imported by another module.

A module import, which appears in the prolog, specifies the target namespace and location (URI) of the library module to be imported. Multiple module imports can appear in the query prolog. The syntax of a module import is shown in Figure 12-3.

Figure 12-3. Syntax of a module import

For example, the declaration:

import module "http://datypic.com/strings"
           at "http://datypic.com/input/strings.xqm";

imports the module from http://datypic.com/input/strings.xqm whose target namespace is http://datypic.com/strings. The target namespace specified in the module import must match the target namespace of the library module that is being imported. The module location and namespace must be literal values in quotes (not evaluated expressions), and they should be syntactically valid URIs. If the location is a relative URI, it is resolved relative to the static base URI, for example:

declare base-uri "http://datypic.com/";
import module "http://datypic.com/strings"
              at "strings.xqm";

Imported modules can have the same target namespace as the importing module, or they can have a different one. For convenience, it is also possible to bind a namespace prefix directly in the module import. For example:

import module namespace strings = "http://datypic.com/strings"
                        at "http://datypic.com/input/strings.xqm";

binds the prefix strings to the namespace, in addition to importing the module.
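
A main module that uses such an import might look like the following sketch. It assumes that the library module from Example 12-3 is saved as strings.xqm in a location that the processor can resolve from the relative URI; how that resolution works is implementation-dependent. The result element is made up for illustration.

```xquery
xquery version "3.1";

(: Assumption: strings.xqm (Example 12-3) can be found at this location :)
import module namespace strings = "http://datypic.com/strings"
                        at "strings.xqm";

(: Both the variable and the function declared in the library module
   can now be referenced by using the strings prefix :)
<result maxLength="{$strings:maxStringLength}">{
  strings:trim("   Fleece Pullover   ")
}</result>
```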

The at keyword and location are optional. If the processor has some other way to locate the module based on its target namespace, they can be omitted. In fact, the processor is not required to use the location even if it is provided. The processor also has some leeway in how it interprets the location. One implementation might treat the location like a filename and look for the module on the filesystem, while another might consider it an identifier that allows it to retrieve the module from an XML database.

Multiple module imports

It is possible to specify multiple module locations for the same target namespace in a single import, using commas to separate them, as in:

import module "http://datypic.com/strings"
               at "http://datypic.com/input/strings.xqm",
                  "http://datypic.com/input/strings2.xqm";

This syntax is the only way to specify multiple imports for the same target namespace. If two separate module imports specify the same target namespace, error XQST0047 is raised.

Functions and variables must be unique across all modules that are used together (main or imported). Declaring two functions with the same qualified name and the same arity raises an error, as does declaring two variables with the same qualified name. These errors are raised even if the two duplicate declarations are exactly the same.

Multi-level imports

It is important to understand that a module import only imports the function and variable declarations of the library module. It does not import any schemas or other modules that are imported in the prolog of the library module.

For example, suppose the strings.xqm library module contains an import of a third module, called characters.xqm. If the main module imports strings.xqm, that does not mean that characters.xqm is also imported into the main module. If the main module refers to the functions and variables of characters.xqm directly, it needs to have a separate module import for characters.xqm.

Likewise, if strings.xqm imports a schema named stringtypes.xsd, the main module that imports strings.xqm must also separately import the schema if it refers to any types from that schema. This is described further in “Schema Imports”.

In complex import scenarios, a circular reference may arise; for example, strings.xqm imports characters.xqm, which in turn imports strings.xqm. Starting in version 3.0, this is not an error.

It is also not an error if the same module is imported multiple times, as long as it is imported from the same location each time. For example, if strings.xqm imports two separate modules, each of which imports common.xqm, the common.xqm module functions and variables are not considered duplicates of themselves.

Loading a Library Module Dynamically

Starting in version 3.1, a built-in function load-xquery-module can also be used to gain access to the functions and variables in a library module. Calling this function has some benefits over using a module import declaration in the prolog. The function allows you to dynamically pass a context item, and values for external variables. It also allows modules to be loaded dynamically, only if they are needed. Detailed information on this function can be found in Appendix A, in the “load-xquery-module” section.
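
As a rough sketch, a dynamic load of the library module from Example 12-3 might look like the following. It assumes that the processor supports load-xquery-module and can find the module by using the location hint strings.xqm; both depend on the implementation.

```xquery
xquery version "3.1";

(: Assumption: the hint below is enough for the processor to find Example 12-3 :)
let $module := load-xquery-module(
  "http://datypic.com/strings",
  map { "location-hints": "strings.xqm" }
)
(: The result is a map with "functions" and "variables" entries keyed by QName;
   each function entry is itself a map keyed by arity :)
let $trim := $module?functions(QName("http://datypic.com/strings", "trim"))?1
let $max := $module?variables(QName("http://datypic.com/strings", "maxStringLength"))
return ($trim("   abc   "), $max)
```

Unlike a module import, the namespace URI and the options here are ordinary expressions, so they can be computed when the query runs.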

Variable Declarations

Variables can optionally be declared (and bound) in the query prolog. These variables are sometimes referred to as global variables in order to easily distinguish them from other variables bound within an expression in the query body, such as a let clause. If a variable is bound within an expression in the query body, it does not have to be declared in the prolog as well. For example, you can use the expression let $myInt := 2 in the query body without declaring $myInt in advance. $myInt is bound when the let expression is evaluated.

However, it is sometimes necessary to declare variables in the prolog, such as when:

  • They are referenced in a function that is declared in that module

  • They are referenced in other modules that import the module

  • Their value is set externally by the processor outside the scope of the query

Declaring variables in the prolog can also be a useful way to define constants, or values that can be calculated up front and used throughout the query. It’s important to remember that global variables (prolog-declared variables) are immutable, just like other XQuery variables.

Variable Declaration Syntax

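The syntax of a variable declaration is shown in Figure 12-4. A variable declaration consists of the keyword declare, any annotations, the keyword variable, the variable name, an optional type declaration, and either an initializing expression or the keyword external.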

Figure 12-4. Syntax of a variable declaration

For example, the declaration:

declare variable $maxItems := 12;

binds the value 12 to the variable $maxItems.
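
The following minimal sketch puts such a declaration to use. It adds an optional type (as xs:integer) to the declaration and references the global variable from inside a function, which is one of the cases in which a prolog declaration is required. The local:first-items function is a made-up name.

```xquery
xquery version "3.1";

(: A global variable with an optional declared type :)
declare variable $maxItems as xs:integer := 12;

(: The function body can reference the global variable directly :)
declare function local:first-items($items as item()*) as item()* {
  subsequence($items, 1, $maxItems)
};

(: Returns the integers 1 through 12 :)
local:first-items(1 to 20)
```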

The Scope of Variables

The processor evaluates all variable declarations before it evaluates the query body. When a variable declaration is evaluated, the variable is bound to a specific value, and you can reference that variable anywhere in the query. When variable declarations appear in library modules, those variables can also be referenced in any modules that import that library module.

As with all other XQuery variables, a variable can only be bound once. Therefore, you cannot declare a variable in the prolog and then set its value later, for example, in a let expression. If you attempt to do this, it will not raise an error, but it will be considered an entirely new variable with the same name and a smaller scope (the FLWOR).
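
This shadowing behavior can be seen in a short sketch like the following, which reuses the $maxItems name from the earlier declaration.

```xquery
xquery version "3.1";

declare variable $maxItems := 12;

(
  (: This let binds a new, separate $maxItems that is visible
     only within its own return clause :)
  (let $maxItems := 5 return $maxItems),
  (: Outside the FLWOR, the global variable still has its original value :)
  $maxItems
)
```

The query returns the sequence (5, 12): the let clause did not change the global variable, it only hid it temporarily.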

Variable Names

Each variable declaration specifies a unique variable name. Variables have qualified names, meaning that they can be associated with a namespace. In the main module, variable declarations can have either unprefixed or prefixed names. If they have unprefixed names, they are not in any namespace, because variable names are not affected by default namespace declarations. If they are prefixed, they can be associated with any namespace that was also declared in the prolog.

In library modules, on the other hand, the names of declared variables must be in the target namespace of the module. Because default namespace declarations do not apply to variable names, they must be prefixed with a prefix bound to the target namespace. This applies only to the variables that are declared in the prolog. If other variables are bound inside a function body, for example in let clauses, they can be in no namespace (unprefixed) or associated with a namespace other than the target namespace.
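
For a main module, the naming rules can be sketched as follows. The cfg prefix and its namespace are hypothetical, chosen only to show a prefixed variable name alongside an unprefixed one.

```xquery
xquery version "3.1";

declare default element namespace "http://datypic.com/cat";
declare namespace cfg = "http://datypic.com/config";  (: hypothetical namespace :)

(: Unprefixed: in no namespace, despite the default element namespace :)
declare variable $maxItems := 12;

(: Prefixed: a different variable, in the namespace bound to cfg :)
declare variable $cfg:maxItems := 50;

($maxItems, $cfg:maxItems)
```

The two declarations do not conflict, because the qualified names differ; the query returns the sequence (12, 50).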

Initializing Expressions

The initializing expression appears after := and specifies the value of the variable. It can be any XQuery expression. For example:

declare variable $firstNum := doc("catalog.xml")//product[1]/number;

binds the value of the first product number in the catalog to the $firstNum variable.

The initializing expression can call any function, or reference any other global variable, that is declared anywhere in the module or in an imported module. However, variable declarations cannot be circular; that is, they cannot depend on each other, directly or indirectly. For example, the following is invalid because $varA depends on $varB, $varB depends on $varC, and $varC in turn depends on $varA:

declare variable $varA := $varB + 1;
declare variable $varB := $varC + 1;
declare variable $varC := $varA + 1;
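
By contrast, the following sketch is valid in version 3.0 and later, because the chain of dependencies ends at $varC, which has a literal value. Note that $varA can reference $varB even though $varB is declared later in the prolog.

```xquery
xquery version "3.1";

declare variable $varA := $varB + 1;
declare variable $varB := $varC + 1;
declare variable $varC := 1;   (: no dependency on another variable :)

(: Returns 3 :)
$varA
```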

External Variables

External variables are variables whose values are passed to the query by the external environment. This is useful for parameterizing queries. For example, an external variable might allow a query user to specify the maximum number of items she wants returned from the query.

To declare an external variable, you use the keyword external instead of an initializing expression, as in:

declare variable $maxItems external;

In this case, the processor must pass a $maxItems value to the query, or error XPDY0002 is raised. Starting in version 3.0, it is also possible to specify a default value for an external variable. It is used in case a value is not passed to the query. This is done with an initializing expression after the keyword external, for example:

declare variable $maxItems external := 100;

In this case, if the processor passes a value to the query, it is used, otherwise the value 100 is used. Consult the documentation of your XQuery implementation to determine exactly how you can specify the values of external variables outside the query. It may allow them to be specified on a command-line interface or set programmatically.
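
A complete query that uses an external variable with a default might look like this minimal sketch. The type declaration and the list of product names are added for illustration.

```xquery
xquery version "3.1";

(: Used as is when the environment supplies no value for $maxItems :)
declare variable $maxItems as xs:integer external := 3;

subsequence(("shirt", "hat", "suit", "belt", "scarf"), 1, $maxItems)
```

If the query is run without a value for $maxItems, it returns the first three names; if the environment passes 2, it returns only the first two.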

Note that external variables are not the same thing as variables that are imported from other XQuery modules. Variables from imported modules do not need to be redeclared in the importing module.

Private Functions and Variables

Starting in version 3.0, it is possible to declare that a function or variable in a library module is private. This means that it can only be used from within the library module in which it is defined, and is not available to other modules that import it. This is useful for isolating or “hiding” functionality that is internal to the module and not intended to be part of the services that module provides to other modules.

To make a function or variable private, use the annotation %private after the declare keyword. Example 12-4 shows a library module in which the strings:get-norm-length function and the $strings:trim-length variable are private because they are intended to be used only within the library module.

Example 12-4. Private function
module namespace strings = "http://datypic.com/strings";
declare function strings:trim($arg as xs:string?) as xs:string? {
  if (strings:get-norm-length($arg) > $strings:trim-length)
  then substring($arg, 1, $strings:trim-length)
  else $arg
};
declare %private function strings:get-norm-length
  ($arg as xs:integer?) as xs:integer {
  string-length(normalize-space($arg))
};
declare %private variable $strings:trim-length := 32;

It is also possible to specify the opposite annotation, %public, but since that is the default, it is not necessary. The %private and %public keywords are examples of annotations, which are described further in “Annotations”. It is also possible to use the %private annotation on functions and variables in main modules, but it has no effect because main modules cannot be imported.
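
From the point of view of an importing module, the effect of %private can be sketched as follows. The sketch assumes that the library module from Example 12-4 is saved as strings.xqm in a location that the processor can resolve.

```xquery
xquery version "3.1";

(: Assumption: strings.xqm holds the library module from Example 12-4 :)
import module namespace strings = "http://datypic.com/strings"
                        at "strings.xqm";

(: The public function is available to the importing module :)
strings:trim("a string that may be longer than the private limit allows")

(: By contrast, a reference to $strings:trim-length or a call to
   strings:get-norm-length("abc") would be a static error here,
   because private declarations are not visible to importing modules :)
```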

Declaring External Functions

External functions can be provided by a particular XQuery implementation. They may be unique to that implementation or be part of a standard set of extensions defined by a user community. They may be implemented in XQuery or in another language; they simply need to be able to interface with a query using an XQuery function signature.

External functions may be declared with signatures in the query prolog. Their syntax starts out similar to a function declaration, but instead of a function body in curly braces, they use the keyword external. For example:

declare namespace strings = "http://datypic.com/strings";
declare function strings:trim ($arg as xs:string?) as xs:string? external;

declares an external function named trim. Like other function names, the names of external functions must be prefixed, and those prefixes must be declared using a namespace declaration.

Note that external functions are not the same thing as user-defined functions that are imported from other modules. Functions declared in imported modules do not need to be redeclared in the importing module.

You should consult the documentation for your XQuery implementation to determine whether there are libraries of external functions that you can call, or whether you are able to write external functions of your own.
