The basic unit of evaluation in the XQuery language is the expression. A query contains expressions that can be made up of a number of sub-expressions, which may themselves be composed from other sub-expressions. This chapter explains the XQuery syntax, and covers the most basic types of expressions that can be used in queries: literals, variables, function calls, and comments.
A query can range in complexity from a single expression, such as 2+3, to a complex composite expression like a FLWOR. Within a FLWOR, there may be other expressions, such as $prodDept = "ACC", which is a comparison expression, and doc("catalog.xml")/catalog/product, which is a path expression. Within these expressions are further expressions, such as "ACC", which is a literal, and $prodDept, which is a variable reference. Every expression evaluates to a sequence, which may be a single item (for example, an atomic value or node), the empty sequence, or multiple items.
The categories of expressions available are summarized in Table 3-1, along with a reference to the chapter or section that covers them.
Category | Description | Operators or keywords | Chapter/Section |
---|---|---|---|
Primary | The basics: literals, variables, function calls, and parenthesized expressions | | Chapter 3 |
Comparison | Comparison based on value, node identity, or document order | =, !=, <, <=, >, >=, eq, ne, lt, le, gt, ge, is, <<, >> | “Comparison Expressions” |
String concatenation | Concatenating two strings | \|\| | “Concatenating Strings” |
String construction | Interspersing strings with expressions | ``[ ]`` | “String Constructors” |
Conditional | If-then-else expressions | if, then, else | “Conditional (if-then-else) Expressions” |
Switch | Switch expressions | switch, case | “Switch Expressions” |
Logical | Boolean and/or operators | or, and | “Logical (and/or) Expressions” |
Path | Selecting nodes from XML documents | /, //, .., ., child::, etc. | Chapter 4 |
Simple map | Iterating through items | ! | “The Simple Map Operator” |
Constructor | Adding XML to the results | <, >, element, attribute | Chapter 5 |
FLWOR | Controlling the selection and processing of nodes | for, let, where, order by, group by, count, return | “FLWOR Expressions” |
Quantified | Determining whether sequences fulfill specific conditions | some, every, in, satisfies | “Quantified Expressions” |
Sequence-related | Creating and combining sequences | to, union (\|), intersect, except | Chapter 9 |
Type-related | Casting and validating values based on type | instance of, typeswitch, cast as, castable as, treat as, validate | Chapter 11, Chapter 15 |
Arithmetic | Adding, subtracting, multiplying, and dividing | +, -, *, div, idiv, mod | Chapter 17 |
The XQuery language uses a number of keywords and symbols in its expressions. All of the keywords are case-sensitive, and they are generally lowercase. In some cases, a symbol (such as *) or keyword (such as in) has several meanings, depending on where it appears. The XQuery grammar is defined in such a way that these multi-use operators are never ambiguous.
Names are used in XQuery to identify elements, attributes, types, variables, and functions. These names must conform to the rules for XML qualified names, meaning that they can start with a letter or underscore and contain letters, digits, underscores, hyphens, and periods. Like the keywords, they are also case-sensitive. Because there are no reserved words in the XQuery language, a name (for example, a variable or function name) used in a query may be the same as any of the keywords, without any ambiguity arising.
All names used in XQuery are namespace-qualified names. This means that they can be prefixed in order to associate them with a namespace name, and they may be affected by default namespace declarations.
Whitespace (spaces, tabs, and line breaks) is allowed almost anywhere in a query to break up expressions and make queries more readable. You are required to use whitespace to separate keywords from each other; for example, order by cannot be written as orderby. Extra whitespace is acceptable, as in order    by. By contrast, you are not required to use whitespace as a separator when using non-word symbols such as = and (. For example, you can use a=b or a = b.
In most cases, whitespace used in queries has no significance. Whitespace is significant in quoted strings, e.g., in the expression "contains spaces", and in constructed elements and attributes when it’s combined with other characters.
No special end-of-line characters are required in the XQuery language as they might be in some programming languages. Line-feed and carriage return characters are treated like any other whitespace.
Literals are simply constant values that are directly represented in a query, such as "ACC" and 29.99. They can be used in expressions anywhere a constant value is needed, for example the strings in the conditional expression:
if ($department = "ACC") then "accessories" else "other"
or the numbers 1 and 30 in the function call:
substring($name, 1, 30)
There are two kinds of literals: string literals, which must be enclosed in single or double quotes, and numeric literals, which must not. Numeric literals can take the form of simple integers, such as 1, decimal numbers, such as 1.5, or floating-point numbers, such as 1.5E2. The processor makes assumptions about the type of a numeric literal based on its format.
You can also use type constructors to convert your literal values to the desired type. For example, to include a literal date in an expression, you can use xs:date("2015-05-03"). For literal Boolean values, you can use the function calls true() and false().
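The way a processor types literals can be checked directly with instance of. The following is a minimal sketch, not taken from the original text; each line should evaluate to true, based on the format rules described above.

```xquery
(: Each test checks the type assigned to a literal or constructed value. :)
(
  1 instance of xs:integer,                   (: no decimal point :)
  1.5 instance of xs:decimal,                 (: decimal point, no exponent :)
  1.5E2 instance of xs:double,                (: exponent present :)
  xs:date("2015-05-03") instance of xs:date,  (: type constructor :)
  true() instance of xs:boolean               (: function call, not a literal :)
)
```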
Variables in XQuery are identified by names that are preceded by a dollar sign ($). The names (not including the dollar sign) must conform to the definition of an XML qualified name. This means that they can be prefixed, in which case they are associated with the namespace bound to that prefix. If they are not prefixed, they are not associated with any namespace.
When a query is evaluated, a variable is bound to a particular value. That value may be any sequence, including a single item such as a node or atomic value, the empty sequence, or multiple items. Once the variable is bound to a value, its value does not change. One consequence of this is that you cannot bind a new value to the variable as you can in most procedural languages. Instead, you must use a new variable.
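As a small illustration of binding a new variable instead of reassigning an existing one, consider the sketch below. The variable names $price and $discountedPrice are hypothetical and chosen only for this example.

```xquery
let $price := 29.99
(: $price keeps its value; the derived value gets its own variable :)
let $discountedPrice := $price * 0.9
return ($price, $discountedPrice)
```

Both values are returned, which shows that the original binding is still intact after the second let clause.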
Variables can be bound in several kinds of expressions, including global variable declarations, for or let clauses of a FLWOR, quantified expressions, or typeswitch expressions. For example, evaluation of the FLWOR:
for $prod in doc("catalog.xml")/catalog/product return $prod/number
binds the $prod variable to each product element node in turn. The variable is then referenced in the return clause. Function declarations also bind variables to values. For example, the function declaration:
declare function local:addTwo ($value as xs:integer) as xs:integer { $value + 2 };
binds the $value variable to the value of the argument passed to it. In this case, the $value variable is referenced in the function body.
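To see the parameter binding in action, the declaration can be placed in a query prolog and followed by a call. This is a minimal sketch; the argument 3 is an arbitrary choice.

```xquery
declare function local:addTwo ($value as xs:integer) as xs:integer {
  $value + 2
};

(: $value is bound to 3 for the duration of this call, so the result is 5 :)
local:addTwo(3)
```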
Function calls are another building block of queries. A typical function call might look like:
substring($prodName, 1, 5)
where the name of the function is substring and there are three arguments, separated by commas and surrounded by parentheses. The first argument is a variable reference, whereas the other two are numeric literals.
The XQuery language has almost 200 built-in functions, detailed in Appendix A. Chapter 8 explains the details of the rules for calling functions based on their signatures. It also explains how to define your own functions.
Starting in version 3.0 there are additional options for calling functions, including partial function application and dynamic function calls. These are discussed in Chapter 23.
XQuery comments, delimited by (: and :), can be added to any query to provide more information about the query itself. These comments are ignored during processing. XQuery comments can contain any text, including XML markup. For example:
(: This query returns the <number> children :)
XQuery comments can appear anywhere insignificant whitespace is allowed in a query. If they appear within quoted strings, or directly in the content of element constructors, they are not interpreted as comments. XQuery comments can be nested within other XQuery comments.
You can also include XML comments, delimited by <!-- and -->, in your queries. Unlike XQuery comments, these comments appear in the result document. When written with a computed constructor (the comment keyword), they can include expressions that are evaluated, making them a useful debugging tool. XML comments are discussed further in “XML Comments”.
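The sketch below puts the different kinds of comments side by side. It assumes the catalog.xml document used elsewhere in this chapter; the results element name is hypothetical. Note that the evaluated expression appears in a computed comment constructor, since the content of a direct <!-- --> comment is copied literally.

```xquery
(: Outer comment (: nested comment :) still inside the outer comment :)
<results>
  <!-- this XML comment is copied to the output as is -->
  {
    (: inside an enclosed expression, so this is an XQuery comment :)
    comment { concat("product count: ", count(doc("catalog.xml")//product)) }
  }
</results>
```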
A query can contain many nested expressions that are not necessarily delimited by parentheses. Therefore, it is important to understand which expressions are evaluated first. In most cases, the precedence (also known as the evaluation order) of expressions is straightforward. For example, in the expression:
if ($x < 12 and $y > 0) then $x + $y else $x - $y
it is easy to see that the if, then, and else keywords are all parts of the same expression that should be evaluated as a whole after all the sub-expressions have been evaluated. In the cases where it is not obvious, this book explains the precedence of that type of expression. For example, any and operators are evaluated before or operators, so that:
true() and true() or false() and false()
is the same as:
(true() and true()) or (false() and false())
If there is doubt in your mind regarding which expression is evaluated first, it is likely that others reading your query will be uncertain too. In this case, it is best to surround the expressions in question with parentheses. For example, you can change the previous if-then-else expression to:
if (($x < 12) and ($y > 0)) then ($x + $y) else ($x - $y)
The meaning is exactly the same, but the precedence is clearer. Parentheses can also be used to change the precedence. For example, if you change the true/false example to:
true() and (true() or false()) and false()
it now has a different value (false) because the or expression is evaluated first.
Comparison expressions are used to compare values. There are three kinds of comparison expressions: general, value, and node.
General comparisons are used for comparing atomic values or nodes that contain atomic values. Table 3-2 shows some examples of general comparisons. They use the operators = (equal to), != (not equal to), < (less than), <= (less than or equal to), > (greater than), and >= (greater than or equal to). Unlike in XSLT, you don’t need to escape the < operator as &lt;. In fact, it won’t be recognized if you do.
Example | Value |
---|---|
doc("catalog.xml")/catalog/product[2]/name = 'Floppy Sun Hat' | true |
doc("catalog.xml")/catalog/product[4]/number < 500 | false |
1 > 2 | false |
() = (1, 2) | false |
(2, 5) > (1, 3) | true |
1 = "2" | Error XPTY0004 |
(1, "a") = (2, "b") | Error XPTY0004 |
If either operand is the empty sequence, the expression evaluates to false.
General comparisons can operate on sequences of more than one item, as well as empty sequences. If one or both of the operands is a sequence of more than one item, the expression evaluates to true if the corresponding value comparison is true for any combination of two items from the two sequences. For example, the expression (2, 5) < (1, 3) returns true if one or more of the following conditions is true:
2 is less than 1
2 is less than 3
5 is less than 1
5 is less than 3
This example returns true because 2 is less than 3. The expression (2, 5) > (1, 3) also returns true because there are values in the first sequence that are greater than values in the second sequence.
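The "any combination" rule can be spelled out with a quantified expression. The sketch below is an illustration of the semantics rather than a rewrite a processor necessarily performs; both items of the result should be true.

```xquery
(
  (: General comparison: true if any pair of items satisfies the test :)
  (2, 5) < (1, 3),

  (: Roughly the same test written as a quantified expression over all pairs :)
  some $a in (2, 5), $b in (1, 3) satisfies $a lt $b
)
```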
General comparisons are useful for determining if any values in a sequence meet a particular criterion. For example, if you want to return all the products that are in either the ACC or the WMN department, you can use the expression:
doc("catalog.xml")/catalog/product[@dept = ("ACC", "WMN")]
This expression is true if the dept attribute is equal to at least one of the two values.
When comparing two values, their types are taken into account. Values of like types (e.g., both numeric or both strings) can always be tested for equality using the = and != operators. Usually, values of like types can also be compared using less than or greater than operators (<, <=, >, >=), although there are a few less common types (such as xs:QName and xs:gYear) that do not support less than or greater than comparisons. If the values have different types that cannot be compared to each other, the processor may raise type error XPTY0004, as shown in the last two rows of Table 3-2.
When comparing any two of the atomic values in each operand, if one value is typed, and the other is untyped, the untyped value is cast to the other value’s type (or to xs:double if the specific type is numeric). For example, you can compare the untyped value of a number element with the xs:integer 500, as long as the number element’s content can be cast to xs:double. If both operands are untyped, they are compared as strings.
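The following sketch shows how these casting rules play out. It assumes that the constructed elements are untyped (no schema validation) and that the default collation compares strings by code point; the element names a and b are hypothetical.

```xquery
let $a := <a>10</a>   (: untyped content :)
let $b := <b>9</b>    (: untyped content :)
return (
  $a > 5,                           (: untyped vs. number: "10" is cast to xs:double :)
  $a > $b,                          (: both untyped: compared as the strings "10" and "9" :)
  xs:integer($a) > xs:integer($b)   (: explicit casts: compared as the integers 10 and 9 :)
)
```

Under those assumptions, the second comparison is false even though 10 is numerically greater than 9, which is why explicit casts matter when both operands are untyped.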
Value comparisons differ fundamentally from general comparisons in that they can only operate on single atomic values. They use the operators eq (equal to), ne (not equal to), lt (less than), le (less than or equal to), gt (greater than), and ge (greater than or equal to). Table 3-3 shows some examples.
Example | Value |
---|---|
3 gt 4 | false |
"abc" lt "def" | true |
doc("catalog.xml")/catalog/product[4]/number lt 500 | Error XPTY0004, if number is untyped or non-numeric |
<a>3</a> gt <z>2</z> | true |
<a>03</a> gt <z>2</z> | false, since a and z are untyped and treated like strings |
() eq 1 | () |
1 eq "2" | Error XPTY0004 |
(1, 2) eq (1, 2) | Error XPTY0004 |
Unlike general comparisons, if either operand is the empty sequence, the empty sequence is returned. In this respect, the empty sequence behaves like null in SQL.
Each operand of a value comparison must be either a single atomic value, a single node that contains a single atomic value, or the empty sequence. If either operand is a sequence of more than one item, type error XPTY0004 is raised. For example, the expression:
doc("catalog.xml")/catalog/product/@dept eq "ACC"
raises an error, because the path expression on the left side of the operator returns more than one dept attribute. The difference between general and value comparisons is especially important in the predicates of path expressions.
When comparing typed values, value comparisons have similar restrictions to general comparisons. The two operands must have comparable types. For example, you cannot compare the string "4" with the integer 3. In this case, one value must be explicitly cast to the other’s type, as in:
xs:integer("4") gt 3
However, value comparisons treat untyped data differently from general comparisons. Untyped values are always treated like strings by value comparisons. This means that if you have two untyped elements that contain numbers, they will be compared as strings unless you explicitly cast them to numbers. For example, the expression:
xs:integer($prodNum1) gt xs:integer($prodNum2)
explicitly casts the two variables to the type xs:integer.
You also must perform an explicit cast if you are comparing the value of an untyped element to a numeric literal. For example, the expression:
doc("catalog.xml")/catalog/product[1]/number gt 1
will raise type error XPTY0004 if the number element is untyped, because you are essentially comparing a string to a number. Because of these complexities, you may prefer to use general comparisons if you are using untyped data.
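Two ways around this error are sketched below, assuming that catalog.xml is not validated against a schema and that the first product's number element contains an integer. The variable name $number is hypothetical.

```xquery
let $number := doc("catalog.xml")/catalog/product[1]/number
return (
  xs:integer($number) gt 1,   (: explicit cast, then a value comparison :)
  $number > 1                 (: general comparison casts the untyped value itself :)
)
```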
Another type of comparison is the node comparison. To determine whether two operands are actually the same node, you can use the is operator. Each of the operands must be a single node or the empty sequence. If one of the operands is the empty sequence, the result is the empty sequence.
The is operator compares the nodes based on their identity rather than by any value they may contain. To compare the contents and attributes of two nodes, you can use the deep-equal built-in function instead.
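The difference between identity and content can be seen with two separately constructed elements that look the same. This is a minimal sketch; the variable names and element content are made up for the example.

```xquery
let $first := <product dept="ACC"><number>563</number></product>
let $second := <product dept="ACC"><number>563</number></product>
return (
  $first is $second,            (: false: two separately constructed nodes :)
  $first is $first,             (: true: the same node :)
  deep-equal($first, $second)   (: true: same name, attributes, and content :)
)
```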
Table 3-4 shows some examples of node comparisons. They assume that the variables $n1 and $n2 are bound to two different nodes, as shown in the following variable declarations:
declare variable $n1 := doc("catalog.xml")/catalog/product[2];
declare variable $n2 := doc("catalog.xml")/catalog/product[3];
Example | Value |
---|---|
$n1 is $n2 | false |
$n1 is $n1 | true |
$n1 is doc("catalog.xml")//product[number = 563] | true |
$n1/@dept is $n2/@dept | false |
In the last example of the table, even though the second and third products have the same value for their dept attributes, they are two distinct attribute nodes.
Conditional (if-then-else) Expressions
XQuery allows conditional expressions using the keywords if, then, and else. The syntax of a conditional expression is shown in Figure 3-1.
The expression after the if keyword is known as the test expression. It must be enclosed in parentheses. If the test expression evaluates to true, the value of the entire conditional expression is the value of the then-expression. Otherwise, it is the value of the else-expression.
Example 3-1 shows a conditional expression (embedded in a FLWOR).
Query
for $prod in (doc("catalog.xml")/catalog/product)
return if ($prod/@dept = "ACC")
       then <accessoryNum>{data($prod/number)}</accessoryNum>
       else <otherNum>{data($prod/number)}</otherNum>
Results
<otherNum>557</otherNum>
<accessoryNum>563</accessoryNum>
<accessoryNum>443</accessoryNum>
<otherNum>784</otherNum>
If the then-expression and else-expression are single expressions, they are not required to be in parentheses. However, to return the results of multiple expressions, they need to be concatenated together using a sequence constructor. For example, if in Example 3-1 you wanted to return an accessoryName element in addition to accessoryNum, you would be required to separate the two elements by commas and surround them with parentheses, effectively constructing a sequence of two elements. This is shown in Example 3-2.
Query
for $prod in (doc("catalog.xml")/catalog/product)
return if ($prod/@dept = "ACC")
       then (<accessoryNum>{data($prod/number)}</accessoryNum>,
             <accessoryName>{data($prod/name)}</accessoryName>)
       else <otherNum>{data($prod/number)}</otherNum>
Results
<otherNum>557</otherNum>
<accessoryNum>563</accessoryNum>
<accessoryName>Floppy Sun Hat</accessoryName>
<accessoryNum>443</accessoryNum>
<accessoryName>Deluxe Travel Bag</accessoryName>
<otherNum>784</otherNum>
The else keyword and the else-expression are required. However, if you want the else-expression to evaluate to nothing, it can simply be () (the empty sequence).
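As a sketch of this pattern, the query from Example 3-1 can be adapted so that products outside the ACC department contribute nothing to the result. It assumes the same catalog.xml document.

```xquery
for $prod in doc("catalog.xml")/catalog/product
return if ($prod/@dept = "ACC")
       then <accessoryNum>{data($prod/number)}</accessoryNum>
       else ()   (: the empty sequence adds no items to the result :)
```

Because the empty sequence disappears when sequences are combined, the result should contain only accessoryNum elements.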
The test expression is interpreted as an xs:boolean value by calculating its effective boolean value. This means that if it evaluates to the xs:boolean value false, the number 0 or NaN (i.e., not a number), a zero-length string, or the empty sequence, it is considered false. Otherwise, it is generally considered true. For example, the expression:
if (doc("order.xml")//item) then "Item List: " else ""
returns the string "Item List: " if there are any item elements in the order document. The test expression doc("order.xml")//item returns a sequence of element nodes rather than a Boolean value, but its effective boolean value is true. Effective boolean value is discussed in more detail in “Effective Boolean Value”.
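The built-in boolean function returns the effective boolean value of its argument, so it offers a quick way to try out these rules. The following sketch is illustrative only; the item element is made up for the example.

```xquery
(
  boolean(0),                  (: false: the number zero :)
  boolean(xs:double("NaN")),   (: false: NaN :)
  boolean(""),                 (: false: zero-length string :)
  boolean(()),                 (: false: empty sequence :)
  boolean("false"),            (: true: a non-empty string, whatever its content :)
  boolean(<item/>)             (: true: a sequence that starts with a node :)
)
```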
You can also nest conditional expressions, as shown in Example 3-3. This provides an “else if” construct.
Query
for $prod in (doc("catalog.xml")/catalog/product)
return if ($prod/@dept = "ACC")
       then <accessory>{data($prod/number)}</accessory>
       else if ($prod/@dept = "WMN")
            then <womens>{data($prod/number)}</womens>
            else if ($prod/@dept = "MEN")
                 then <mens>{data($prod/number)}</mens>
                 else <other>{data($prod/number)}</other>
Results
<womens>557</womens>
<accessory>563</accessory>
<accessory>443</accessory>
<mens>784</mens>
Switch expressions, new in version 3.0, are used to branch to one of several expressions based on a particular value. For example, assuming you have a variable named $department that was previously bound to a value, you can return one of several options depending on the value of $department:
switch ($department)
  case "ACC" return "Accessories"
  case "MEN" return "Men's"
  case "WMN" return "Women's"
  default return "Other"
The processor proceeds through the case clauses and compares $department (known as the switch operand expression) to the expression after the case keyword (known as the case operand expression). If they are equal, it returns the value of the expression after the return keyword. It chooses only the first case clause that applies; if multiple case clauses apply, the later ones are ignored.
The default return keywords are required and are used to specify what to return if none of the case clauses apply. If nothing should be returned in that case, the empty sequence can be specified by using the clause default return (). The syntax of a switch expression is shown in Figure 3-2.
You can have multiple case keywords for a single return, meaning that if any of those cases apply, that return clause is used. In the following example, if $department is equal to either MEN or WMN, the string Clothing is returned.
switch ($department)
  case "ACC" return "Accessories"
  case "MEN" case "WMN" return "Clothing"
  default return "Other"
The switch operand expression (in parentheses) must evaluate to either the empty sequence or a single value, not a sequence of multiple values. If it evaluates to a node, for example an element or attribute, it is atomized, meaning that an atomic value is extracted from its contents. Likewise, the case operand expression (after case) must also evaluate to zero or one values, with atomization occurring if necessary. The two atomic values are then compared, taking into account their data types.
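Atomization of the switch operand can be seen in a self-contained sketch where the operand is an attribute node. The constructed product element is made up for the example, and it is assumed to be untyped so that its attribute value is compared as a string.

```xquery
xquery version "3.0";
let $prod := <product dept="ACC"><number>563</number></product>
return switch ($prod/@dept)   (: the attribute node is atomized to "ACC" :)
         case "ACC" return "Accessories"
         case "MEN" case "WMN" return "Clothing"
         default return "Other"
```

Under those assumptions, the first case clause matches and the query returns the string Accessories.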
A switch expression is similar to a nested conditional if-then-else expression except that it is based on a particular value rather than a Boolean. For example, you could rewrite the conditional expression in Example 3-3 by using a switch expression, as shown in Example 3-4.
Query
xquery version "3.0";
for $prod in (doc("catalog.xml")/catalog/product)
return switch ($prod/@dept)
         case "ACC" return <accessory>{data($prod/number)}</accessory>
         case "WMN" return <womens>{data($prod/number)}</womens>
         case "MEN" return <mens>{data($prod/number)}</mens>
         default return <other>{data($prod/number)}</other>
Results
<womens>557</womens>
<accessory>563</accessory>
<accessory>443</accessory>
<mens>784</mens>
Example 3-4 exhibits the fact that although the switch operand expression and the case operand expression have to evaluate to atomic values, the return expression can return any number of any items of any kind, including an element node in this case.
Logical (and/or) Expressions
Logical expressions combine Boolean values by using the operators and and or. They are most often used in conditional (if-then-else) expressions, where clauses of FLWORs, and path expression predicates. However, they can be used anywhere a Boolean value is expected.
For example, when used in a conditional expression:
if ($isDiscounted and $discount > 10) then 10 else $discount
an and expression returns true if both of its operands are true. An or expression evaluates to true if one or both of its operands is true.
As with conditional test expressions, the effective boolean value of each of the operands is evaluated. This means that if the operand expression evaluates to a Boolean false value, the number 0 or NaN, a zero-length string, or the empty sequence, it is considered false; otherwise, it is generally considered true. For example:
$order/item and $numItems
returns true if there is at least one item child of $order, and $numItems (assuming it is numeric) is not equal to 0 or NaN.
The logical operators have lower precedence than comparison operators do, so you can use:
$x < 12 and $y > 15
without parenthesizing the two comparison expressions.
You can also chain multiple and and or expressions together. The and operator takes precedence over the or operator. Therefore:
true() and true() or false() and false()
is the same as:
(true() and true()) or (false() and false())
and evaluates to true. It is not equal to:
true() and (true() or false()) and false()
which evaluates to false.
You can negate any Boolean value by using the not function, which turns false to true and true to false. Because not is a function rather than a keyword, you are required to use parentheses around the value that you are negating.
The not function accepts a sequence of items, from which it calculates the effective boolean value before negating it. This means that if the argument evaluates to the xs:boolean value false, the number 0 or NaN, a zero-length string, or the empty sequence, the not function returns true. In most other cases, it returns false.
Table 3-5 shows some examples of the not function.
Example | Return value |
---|---|
not(true()) | false |
not(12 > 0) | false |
not(doc("catalog.xml")/catalog/product) | false if there is at least one product child of catalog in catalog.xml |
not( () ) | true |
not("") | true |
There is a subtle but important difference between using the != operator and calling the not function with an expression that uses the = operator. For example, the expression $prod/@dept != "ACC" returns:
true if the $prod element has a dept attribute that is not equal to ACC
false if it has a dept attribute that is equal to ACC
false if it does not have a dept attribute
On the other hand, not($prod/@dept = "ACC") will return true in the third case, that is, if the $prod element does not have a dept attribute. This is because the $prod/@dept expression returns the empty sequence, which results in the comparison evaluating to false. The not function will negate this and return true.
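The three cases can be reproduced with two constructed elements, one with a dept attribute and one without. This is a minimal sketch; the variable names $withDept and $noDept are hypothetical.

```xquery
let $withDept := <product dept="WMN"/>
let $noDept := <product/>
return (
  $withDept/@dept != "ACC",    (: true: attribute present and different :)
  $noDept/@dept != "ACC",      (: false: the left operand is the empty sequence :)
  not($noDept/@dept = "ACC")   (: true: the comparison is false, then negated :)
)
```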