Chapter 3. Expressions: XQuery Building Blocks

The basic unit of evaluation in the XQuery language is the expression. A query contains expressions that can be made up of a number of sub-expressions, which may themselves be composed from other sub-expressions. This chapter explains the XQuery syntax, and covers the most basic types of expressions that can be used in queries: literals, variables, function calls, and comments.

Categories of Expressions

A query can range in complexity from a single expression such as 2+3, to a complex composite expression like a FLWOR. Within a FLWOR, there may be other expressions, such as $prodDept = "ACC", which is a comparison expression, and doc("catalog.xml")/catalog/product, which is a path expression. Within these expressions are further expressions, such as "ACC", which is a literal, and $prodDept, which is a variable reference. Every expression evaluates to a sequence, which may be a single item (for example, an atomic value or node), the empty sequence, or multiple items.

The categories of expressions available are summarized in Table 3-1, along with a reference to the chapter or section that covers them.

Table 3-1. Categories of expressions
Category | Description | Operators or keywords | Chapter/Section
Primary | The basics: literals, variables, function calls, and parenthesized expressions | | Chapter 3
Comparison | Comparison based on value, node identity, or document order | =, !=, <, <=, >, >=, eq, ne, lt, le, gt, ge, is, <<, >> | “Comparison Expressions”
String concatenation | Concatenating two strings | || | “Concatenating Strings”
String construction | Interspersing strings with expressions | ``[ ]`` | “String Constructors”
Conditional | If-then-else expressions | if, then, else | “Conditional (if-then-else) Expressions”
Switch | Switch expressions | switch, case | “Switch Expressions”
Logical | Boolean and/or operators | or, and | “Logical (and/or) Expressions”
Path | Selecting nodes from XML documents | /, //, .., ., child::, etc. | Chapter 4
Simple map | Iterating through items | ! | “The Simple Map Operator”
Constructor | Adding XML to the results | <, >, element, attribute | Chapter 5
FLWOR | Controlling the selection and processing of nodes | for, let, where, order by, group by, count, return | “FLWOR Expressions”
Quantified | Determining whether sequences fulfill specific conditions | some, every, in, satisfies | “Quantified Expressions”
Sequence-related | Creating and combining sequences | to, union (|), intersect, except | Chapter 9
Type-related | Casting and validating values based on type | instance of, typeswitch, cast as, castable as, treat as, validate | Chapter 11, Chapter 15
Arithmetic | Adding, subtracting, multiplying, and dividing | +, -, *, div, idiv, mod | Chapter 17

Keywords and Names

The XQuery language uses a number of keywords and symbols in its expressions. All of the keywords are case-sensitive, and they are generally lowercase. In some cases, a symbol (such as *) or keyword (such as in) has several meanings, depending on where it appears. The XQuery grammar is defined in such a way that these multi-use operators are never ambiguous.

Names are used in XQuery to identify elements, attributes, types, variables, and functions. These names must conform to the rules for XML qualified names, meaning that they can start with a letter or underscore and contain letters, digits, underscores, hyphens, and periods. Like the keywords, they are also case-sensitive. Because there are no reserved words in the XQuery language, a name (for example, a variable or function name) used in a query may be the same as any of the keywords, without any ambiguity arising.

All names used in XQuery are namespace-qualified names. This means that they can be prefixed in order to associate them with a namespace name, and they may be affected by default namespace declarations.

Whitespace in Queries

Whitespace (spaces, tabs, and line breaks) is allowed almost anywhere in a query to break up expressions and make queries more readable. You are required to use whitespace to separate keywords from each other—for example, order by cannot be written as orderby. Extra whitespace is acceptable, as in order   by. By contrast, you are not required to use whitespace as a separator when using non-word symbols such as = and (. For example, you can use a=b or a = b.

In most cases, whitespace used in queries has no significance. Whitespace is significant in quoted strings, e.g., in the expression "contains   spaces", and in constructed elements and attributes when it’s combined with other characters.

No special end-of-line characters are required in the XQuery language as they might be in some programming languages. Line-feed and carriage return characters are treated like any other whitespace.

Literals

Literals are simply constant values that are directly represented in a query, such as "ACC" and 29.99. They can be used in expressions anywhere a constant value is needed, for example the strings in the conditional expression:

if ($department = "ACC") then "accessories" else "other"

or the numbers 1 and 30 in the function call:

substring($name, 1, 30)

There are two kinds of literals: string literals, which must be enclosed in single or double quotes, and numeric literals, which must not. Numeric literals can take the form of simple integers, such as 1, decimal numbers, such as 1.5, or floating-point numbers, such as 1.5E2. The processor infers the type of a numeric literal from its format: a literal with no decimal point or exponent is an xs:integer, one with a decimal point but no exponent is an xs:decimal, and one with an exponent is an xs:double.

You can also use type constructors to convert your literal values to the desired type. For example, to include a literal date in an expression, you can use xs:date("2015-05-03"). For literal Boolean values, you can use the function calls true() and false().
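As a quick check of how literal formats map to types, the following sketch tests each kind of literal with instance of. It is only an illustration; under the standard typing rules, every item in the resulting sequence should be true.

```xquery
(
  1 instance of xs:integer,                    (: no decimal point or exponent :)
  1.5 instance of xs:decimal,                  (: decimal point, no exponent :)
  1.5E2 instance of xs:double,                 (: exponent present :)
  "ACC" instance of xs:string,                 (: quoted string literal :)
  xs:date("2015-05-03") instance of xs:date,   (: constructor applied to a string literal :)
  true() instance of xs:boolean                (: a function call, not a keyword :)
)
```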

Variables

Variables in XQuery are identified by names that are preceded by a dollar sign ($). The names (not including the dollar sign) must conform to the definition of an XML-qualified name. This means that they can be prefixed, in which case they are associated with the namespace bound to that prefix. If they are not prefixed, they are not associated with any namespace.

When a query is evaluated, a variable is bound to a particular value. That value may be any sequence, including a single item such as a node or atomic value, the empty sequence, or multiple items. Once the variable is bound to a value, its value does not change. One consequence of this is that you cannot bind a new value to the variable as you can in most procedural languages. Instead, you must use a new variable.
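A minimal sketch of this rule follows. The variable names $price and $discounted are made up for illustration. Rather than updating $price, the query binds the derived value to a second variable and returns both.

```xquery
let $price := 29.99
let $discounted := $price * 0.9   (: a new variable; $price itself is unchanged :)
return ($price, $discounted)
```

Both values are still available in the return clause, because binding $discounted did not alter $price.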

Variables can be bound in several kinds of expressions, including global variable declarations, for or let clauses of a FLWOR, quantified expressions, and typeswitch expressions. For example, evaluation of the FLWOR:

for $prod in doc("catalog.xml")/catalog/product
return $prod/number

binds the $prod variable to each product element node in turn. The variable is then referenced in the return clause. Function declarations also bind variables to values. For example, the function declaration:

declare function local:addTwo ($value as xs:integer) as xs:integer
   { $value + 2 };

binds the $value variable to the value of the argument passed to it. In this case, the $value variable is referenced in the function body.
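To see the binding in action, the declaration can be followed by a call in the query body. This is a small sketch that repeats the declaration so that it can be run on its own.

```xquery
declare function local:addTwo ($value as xs:integer) as xs:integer
   { $value + 2 };

local:addTwo(3)   (: $value is bound to 3, so the result is 5 :)
```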

Function Calls

Function calls are another building block of queries. A typical function call might look like:

substring($prodName, 1, 5)

where the name of the function is substring and there are three arguments, separated by commas and surrounded by parentheses. The first argument is a variable reference, whereas the other two are numeric literals.

The XQuery language has almost 200 built-in functions, detailed in Appendix A. Chapter 8 explains the details of the rules for calling functions based on their signatures. It also explains how to define your own functions.

Starting in version 3.0 there are additional options for calling functions, including partial function application and dynamic function calls. These are discussed in Chapter 23.
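As a brief preview, and only as a sketch, the following query partially applies the built-in substring function and then calls the resulting function item dynamically. The variable name $firstFive is made up for illustration.

```xquery
xquery version "3.0";

(: partial function application: ? marks the argument left open :)
let $firstFive := substring(?, 1, 5)
(: dynamic function call: the function to call is the value of a variable :)
return $firstFive("Floppy Sun Hat")   (: "Flopp" :)
```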

Comments

XQuery comments, delimited by (: and :), can be added to any query to provide more information about the query itself. These comments are ignored during processing. XQuery comments can contain any text, including XML markup. For example:

(: This query returns the <number> children :)

XQuery comments can appear anywhere insignificant whitespace is allowed in a query. If they appear within quoted strings, or directly in the content of element constructors, they are not interpreted as comments. XQuery comments can be nested within other XQuery comments.

You can also include XML comments, delimited by <!-- and -->, in your queries. Unlike XQuery comments, these comments appear in the result document. The text between <!-- and --> is copied literally, but a computed comment constructor (comment { ... }) can include expressions that are evaluated, making XML comments a useful debugging tool. XML comments are discussed further in “XML Comments”.
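The following sketch puts the different kinds of comments side by side. The element names result and text are made up for illustration. The first line is a nested XQuery comment and leaves no trace in the output. The two XML comments become comment nodes in the result, and the parenthesized text inside the text element is ordinary character data, because it appears directly in element content.

```xquery
(: outer comment (: nested comment :) still inside the outer comment :)
<result>
  <!-- this XML comment is copied to the output as written -->
  {comment { concat("computed at query time: ", 1 + 1) }}
  <text>(: inside element content, so this is literal text :)</text>
</result>
```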

Precedence and Parentheses

A query can contain many nested expressions that are not necessarily delimited by parentheses. Therefore, it is important to understand which expressions are evaluated first. In most cases, the precedence (also known as the evaluation order) of expressions is straightforward. For example, in the expression:

if ($x < 12 and $y > 0)
then $x + $y
else $x - $y

it is easy to see that the if, then, and else keywords are all parts of the same expression that should be evaluated as a whole after all the sub-expressions have been evaluated. In the cases where it is not obvious, this book explains the precedence of that type of expression. For example, any and operators are evaluated before or operators, so that:

true() and true() or false() and false()

is the same as:

(true() and true()) or (false() and false())

If there is doubt in your mind regarding which expression is evaluated first, it is likely that others reading your query will be uncertain too. In this case, it is best to surround the expressions in question with parentheses. For example, you can change the previous if-then-else expression to:

if (($x < 12) and ($y > 0))
then ($x + $y)
else ($x - $y)

The meaning is exactly the same, but the precedence is clearer. Parentheses can also be used to change the precedence. For example, if you change the true/false example to:

true() and (true() or false()) and false()

it now has a different value (false) because the or expression is evaluated first.

Comparison Expressions

Comparison expressions are used to compare values. There are three kinds of comparison expressions: general, value, and node.

General Comparisons

General comparisons are used for comparing atomic values or nodes that contain atomic values. Table 3-2 shows some examples of general comparisons. They use the operators = (equal to), != (not equal to), < (less than), <= (less than or equal to), > (greater than), and >= (greater than or equal to). Unlike in XSLT, you don’t need to escape the < operator as &lt;. In fact, it won’t be recognized if you do.

Table 3-2. General comparisons
Example | Value
doc("catalog.xml")/catalog/product[2]/name = 'Floppy Sun Hat' | true
doc("catalog.xml")/catalog/product[4]/number < 500 | false
1 > 2 | false
() = (1, 2) | false
(2, 5) > (1, 3) | true
1 = "2" | Error XPTY0004
(1, "a") = (2, "b") | Error XPTY0004

If either operand is the empty sequence, the expression evaluates to false.

General comparisons on multi-item sequences

General comparisons can operate on sequences of more than one item, as well as empty sequences. If one or both of the operands is a sequence of more than one item, the expression evaluates to true if the corresponding value comparison is true for any combination of two items from the two sequences. For example, the expression (2, 5) < (1, 3) returns true if one or more of the following conditions is true:

  • 2 is less than 1

  • 2 is less than 3

  • 5 is less than 1

  • 5 is less than 3

This example returns true because 2 is less than 3. The expression (2, 5) > (1, 3) also returns true because there are values in the first sequence that are greater than values in the second sequence.

General comparisons are useful for determining if any values in a sequence meet a particular criterion. For example, if you want to return all the products that are in either the ACC or the WMN department, you can use the expression:

doc("catalog.xml")/catalog/product[@dept = ("ACC", "WMN")]

This expression is true if the dept attribute is equal to at least one of the two values.
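The behavior described in this section can be tried without an external document. The sketch below binds $catalog to a small inline element that stands in for catalog.xml. Its contents are an assumption made for illustration, using product numbers and departments consistent with the examples in this chapter.

```xquery
let $catalog :=
  <catalog>
    <product dept="WMN"><number>557</number></product>
    <product dept="ACC"><number>563</number></product>
    <product dept="ACC"><number>443</number></product>
    <product dept="MEN"><number>784</number></product>
  </catalog>
return (
  (2, 5) < (1, 3),     (: true, because 2 is less than 3 :)
  (2, 5) > (1, 3),     (: true, because 2 is greater than 1 :)
  () = (1, 2),         (: false, because one operand is the empty sequence :)
  (: numbers of the products in either department: "557", "563", "443" :)
  $catalog/product[@dept = ("ACC", "WMN")]/number/string()
)
```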

General comparisons and types

When comparing two values, their types are taken into account. Values of like types (e.g., both numeric or both strings) can always be tested for equality using the = and != operators. Usually, values of like types can also be compared using less than or greater than operators (<, <=, >, >=), although there are a few less common types (such as xs:QName and xs:gYear) that do not support less than or greater than comparisons. If the values have different types that cannot be compared to each other, the processor may raise type error XPTY0004, as shown in the last two rows of Table 3-2.

When comparing any two of the atomic values in each operand, if one value is typed, and the other is untyped, the untyped value is cast to the other value’s type (or to xs:double if the specific type is numeric). For example, you can compare the untyped value of a number element with the xs:integer 500, as long as the number element’s content can be cast to xs:double. If both operands are untyped, they are compared as strings.
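These casting rules can be illustrated with untyped elements constructed inline. This is a sketch; the element names are made up, and it assumes the elements have not been validated against a schema.

```xquery
let $number := <number>557</number>   (: untyped element content :)
return (
  $number > 500,            (: the untyped value is cast to xs:double, so true :)
  $number = "557",          (: the untyped value is cast to xs:string, so true :)
  <a>10</a> = <b>10.0</b>   (: both untyped, so compared as strings: false :)
)
```

The last comparison is false even though the two numbers are equal, because neither operand supplies a numeric type.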

Value Comparisons

Value comparisons differ fundamentally from general comparisons in that they can only operate on single atomic values. They use the operators eq (equal to), ne (not equal to), lt (less than), le (less than or equal to), gt (greater than), and ge (greater than or equal to). Table 3-3 shows some examples.

Table 3-3. Value comparisons
Example | Value
3 gt 4 | false
"abc" lt "def" | true
doc("catalog.xml")/catalog/product[4]/number lt 500 | Error XPTY0004, if number is untyped or non-numeric
<a>3</a> gt <z>2</z> | true
<a>03</a> gt <z>2</z> | false, since a and z are untyped and treated like strings
() eq 1 | ()
1 eq "2" | Error XPTY0004
(1, 2) eq (1, 2) | Error XPTY0004

Unlike general comparisons, if either operand is the empty sequence, the empty sequence is returned. In this respect, the empty sequence behaves like null in SQL.

Each operand of a value comparison must be either a single atomic value, a single node that contains a single atomic value, or the empty sequence. If either operand is a sequence of more than one item, type error XPTY0004 is raised. For example, the expression:

doc("catalog.xml")/catalog/product/@dept eq "ACC"

raises an error, because the path expression on the left side of the operator returns more than one dept attribute. The difference between general and value comparisons is especially important in the predicates of path expressions.

When comparing typed values, value comparisons have similar restrictions to general comparisons. The two operands must have comparable types. For example, you cannot compare the string "4" with the integer 3. In this case, one value must be explicitly cast to the other’s type, as in:

xs:integer("4") gt 3

However, value comparisons treat untyped data differently from general comparisons. Untyped values are always treated like strings by value comparisons. This means that if you have two untyped elements that contain numbers, they will be compared as strings unless you explicitly cast them to numbers. For example, the expression:

xs:integer($prodNum1) gt xs:integer($prodNum2)

explicitly casts the two variables to the type xs:integer.

You also must perform an explicit cast if you are comparing the value of an untyped element to a numeric literal. For example, the expression:

doc("catalog.xml")/catalog/product[1]/number gt 1

will raise type error XPTY0004 if the number element is untyped, because you are essentially comparing a string to a number. Because of these complexities, you may prefer to use general comparisons if you are using untyped data.
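The following sketch contrasts these cases by using an untyped element constructed inline (an assumption made so that the example does not depend on catalog.xml). The comparison that would raise the error is left in a comment so that the query still runs.

```xquery
let $number := <number>557</number>   (: untyped element content :)
return (
  xs:integer($number) gt 500,   (: explicit cast to a number, so true :)
  $number gt "500",             (: untyped is treated as a string: "557" gt "500" is true :)
  () eq 1                       (: the empty sequence, so it adds nothing to the result :)
  (: $number gt 500 would raise XPTY0004: a string compared with an integer :)
)
```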

Node Comparisons

Another type of comparison is the node comparison. To determine whether two operands are actually the same node, you can use the is operator. Each of the operands must be a single node or the empty sequence. If one of the operands is the empty sequence, the result is the empty sequence.

The is operator compares the nodes based on their identity rather than by any value they may contain. To compare the contents and attributes of two nodes, you can use the deep-equal built-in function instead.

Table 3-4 shows some examples of node comparisons. They assume that the variables $n1 and $n2 are bound to two different nodes, as shown in the following variable declarations:

declare variable $n1 := doc("catalog.xml")/catalog/product[2];
declare variable $n2 := doc("catalog.xml")/catalog/product[3];

Table 3-4. Node comparisons
Example | Value
$n1 is $n2 | false
$n1 is $n1 | true
$n1 is doc("catalog.xml")//product[number = 563] | true
$n1/@dept is $n2/@dept | false

In the last example of the table, even though the second and third products have the same value for their dept attributes, they are two distinct attribute nodes.
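The difference between identity and content can be sketched with two products that look alike. The inline $catalog element below is a made-up stand-in for catalog.xml; it deliberately contains two products with identical content.

```xquery
let $catalog :=
  <catalog>
    <product dept="ACC"><number>563</number></product>
    <product dept="ACC"><number>563</number></product>
  </catalog>
let $n1 := $catalog/product[1]
let $n2 := $catalog/product[2]
return (
  $n1 is $n2,                   (: false: two distinct nodes :)
  $n1 is $catalog/product[1],   (: true: the same node reached twice :)
  deep-equal($n1, $n2),         (: true: same name, attributes, and content :)
  $n1/@dept is $n2/@dept        (: false: equal values, distinct attribute nodes :)
)
```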

Conditional (if-then-else) Expressions

XQuery allows conditional expressions using the keywords if, then, and else. The syntax of a conditional expression is shown in Figure 3-1.

Figure 3-1. Syntax of a conditional expression

The expression after the if keyword is known as the test expression. It must be enclosed in parentheses. If the test expression evaluates to true, the value of the entire conditional expression is the value of the then-expression. Otherwise, it is the value of the else-expression.

Example 3-1 shows a conditional expression (embedded in a FLWOR).

Example 3-1. Conditional expression

Query

for $prod in (doc("catalog.xml")/catalog/product)
return if ($prod/@dept = "ACC")
       then <accessoryNum>{data($prod/number)}</accessoryNum>
       else <otherNum>{data($prod/number)}</otherNum>

Results

<otherNum>557</otherNum>
<accessoryNum>563</accessoryNum>
<accessoryNum>443</accessoryNum>
<otherNum>784</otherNum>

If the then-expression and else-expression are single expressions, they are not required to be in parentheses. However, to return the results of multiple expressions, they need to be concatenated together using a sequence constructor. For example, if in Example 3-1 you wanted to return an accessoryName element in addition to accessoryNum, you would be required to separate the two elements by commas and surround them with parentheses, effectively constructing a sequence of two elements. This is shown in Example 3-2.

Example 3-2. Conditional expression returning multiple expressions

Query

for $prod in (doc("catalog.xml")/catalog/product)
return if ($prod/@dept = "ACC")
       then (<accessoryNum>{data($prod/number)}</accessoryNum>,
            <accessoryName>{data($prod/name)}</accessoryName>)
       else <otherNum>{data($prod/number)}</otherNum>

Results

<otherNum>557</otherNum>
<accessoryNum>563</accessoryNum>
<accessoryName>Floppy Sun Hat</accessoryName>
<accessoryNum>443</accessoryNum>
<accessoryName>Deluxe Travel Bag</accessoryName>
<otherNum>784</otherNum>

The else keyword and the else-expression are required. However, if you want the else-expression to evaluate to nothing, it can simply be () (the empty sequence).
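A minimal sketch of an empty else-expression follows. It iterates over three department codes given as plain strings (an assumption that keeps the example self-contained) and returns an element only for ACC.

```xquery
for $dept in ("WMN", "ACC", "MEN")
return if ($dept = "ACC")
       then <accessory>{$dept}</accessory>
       else ()   (: the else branch is mandatory; () contributes no items :)
```

The result is a single accessory element, because the other two iterations return the empty sequence.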

Conditional Expressions and Effective Boolean Value

The test expression is interpreted as an xs:boolean value by calculating its effective boolean value. This means that if it evaluates to the xs:boolean value false, the number 0 or NaN (i.e., not a number), a zero-length string, or the empty sequence, it is considered false. Otherwise, it is generally considered true. For example, the expression:

if (doc("order.xml")//item) then "Item List: " else ""

returns the string Item List: if there are any item elements in the order document. The test expression doc("order.xml")//item returns a sequence of element nodes rather than a Boolean value, but its effective boolean value is true. Effective boolean value is discussed in more detail in “Effective Boolean Value”.
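The rules can be tried out with a small helper function. The function local:test is hypothetical and exists only for this sketch; it uses its argument directly as a test expression and reports which branch was taken.

```xquery
declare function local:test ($value as item()*) as xs:string
   { if ($value) then "true" else "false" };

(
  local:test(false()),            (: "false" :)
  local:test(0),                  (: "false" :)
  local:test(xs:double("NaN")),   (: "false" :)
  local:test(""),                 (: "false": a zero-length string :)
  local:test(()),                 (: "false": the empty sequence :)
  local:test("false"),            (: "true": a non-empty string :)
  local:test(<item/>),            (: "true": a non-empty sequence of nodes :)
  local:test(42)                  (: "true" :)
)
```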

Nesting Conditional Expressions

You can also nest conditional expressions, as shown in Example 3-3. This provides an “else if” construct.

Example 3-3. Nested conditional expressions

Query

for $prod in (doc("catalog.xml")/catalog/product)
return if ($prod/@dept = "ACC")
       then <accessory>{data($prod/number)}</accessory>
       else if ($prod/@dept = "WMN")
            then <womens>{data($prod/number)}</womens>
            else if ($prod/@dept = "MEN")
                 then <mens>{data($prod/number)}</mens>
                 else <other>{data($prod/number)}</other>

Results

<womens>557</womens>
<accessory>563</accessory>
<accessory>443</accessory>
<mens>784</mens>

Switch Expressions

Switch expressions, new in version 3.0, are used to branch to one of several expressions based on a particular value. For example, assuming you have a variable named $department that was previously bound to a value, you can return one of several options depending on the value of $department:

switch ($department)
  case "ACC" return "Accessories"
  case "MEN" return "Men's"
  case "WMN" return "Women's"
  default return "Other"

The processor proceeds through the case clauses and compares $department (known as the switch operand expression) to the expression after the case keyword (known as the case operand expression). If they are equal, it returns the expression after the return keyword. It chooses only the first case clause that applies; if multiple case clauses apply, the later ones are ignored.

The default return keywords are required and are used to specify what to return if none of the case clauses apply. If nothing should be returned in that case, the empty sequence can be specified by using the clause default return (). The syntax of a switch expression is shown in Figure 3-2.

Figure 3-2. Syntax of a switch expression

You can have multiple case keywords for a single return, meaning that if any of those cases apply, that return clause is used. In the following example, if $department is equal to either MEN or WMN, the string Clothing is returned.

switch ($department)
  case "ACC" return "Accessories"
  case "MEN"
  case "WMN" return "Clothing"
  default return "Other"

The switch operand expression (in parentheses) must evaluate to either the empty sequence or a single value, not a sequence of multiple values. If it evaluates to a node, for example an element or attribute, it is atomized, meaning that an atomic value is extracted from its contents. Likewise, the case operand expression (after case) must also evaluate to zero or one values, with atomization occurring if necessary. The two atomic values are then compared, taking into account their data types.
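To exercise the grouped case clauses, the switch expression can be wrapped in a function and called with several values. The function name local:label is made up for this sketch, and XYZ is an arbitrary code that matches none of the cases.

```xquery
xquery version "3.0";

declare function local:label ($department as xs:string?) as xs:string {
  switch ($department)
    case "ACC" return "Accessories"
    case "MEN"
    case "WMN" return "Clothing"   (: shared by the MEN and WMN cases :)
    default return "Other"
};

for $dept in ("ACC", "WMN", "XYZ")
return local:label($dept)   (: "Accessories", "Clothing", "Other" :)
```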

A switch expression is similar to a nested conditional if-then-else expression except that it is based on a particular value rather than a Boolean. For example, you could rewrite the conditional expression in Example 3-3 by using a switch expression, as shown in Example 3-4.

Example 3-4. Switch expression similar to nested conditional expressions

Query

xquery version "3.0";
for $prod in (doc("catalog.xml")/catalog/product)
return switch($prod/@dept)
         case "ACC" return <accessory>{data($prod/number)}</accessory>
         case "WMN" return <womens>{data($prod/number)}</womens>
         case "MEN" return <mens>{data($prod/number)}</mens>
         default return <other>{data($prod/number)}</other>

Results

<womens>557</womens>
<accessory>563</accessory>
<accessory>443</accessory>
<mens>784</mens>

Example 3-4 shows that although the switch operand expression and the case operand expression have to evaluate to atomic values, the return expression can return any number of items of any kind, such as the element nodes returned here.

Logical (and/or) Expressions

Logical expressions combine Boolean values by using the operators and and or. They are most often used in conditional (if-then-else) expressions, in the where clauses of FLWORs, and in path expression predicates. However, they can be used anywhere a Boolean value is expected.

For example, when used in a conditional expression:

if ($isDiscounted and $discount > 10) then 10 else $discount

an and expression returns true if both of its operands are true. An or expression evaluates to true if one or both of its operands is true.

As with conditional test expressions, the effective boolean value of each of the operands is evaluated. This means that if the operand expression evaluates to a Boolean false value, the number 0 or NaN, a zero-length string, or the empty sequence, it is considered false; otherwise, it is generally considered true. For example:

$order/item and $numItems

returns true if there is at least one item child of $order, and $numItems (assuming it is numeric) is not equal to 0 or NaN.
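A self-contained version of this example might look like the following sketch. The inline order element and the way $numItems is computed are assumptions made for illustration.

```xquery
let $order := <order><item/><item/></order>
let $numItems := count($order/item)
return (
  $order/item and $numItems,   (: true: a non-empty node sequence and the number 2 :)
  $order/missing or 0          (: false: the empty sequence or the number 0 :)
)
```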

Precedence of Logical Expressions

The logical operators have lower precedence than comparison operators do, so you can use:

$x < 12 and $y > 15

without parenthesizing the two comparison expressions.

You can also chain multiple and and or expressions together. The and operator takes precedence over the or operator. Therefore:

true() and true() or false() and false()

is the same as:

(true() and true()) or (false() and false())

and evaluates to true. It is not equal to:

true() and (true() or false()) and false()

which evaluates to false.

Negating a Boolean Value

You can negate any Boolean value by using the not function, which turns false to true and true to false. Because not is a function rather than a keyword, you are required to use parentheses around the value that you are negating.

The not function accepts a sequence of items, from which it calculates the effective boolean value before negating it. This means that if the argument evaluates to the xs:boolean value false, the number 0 or NaN, a zero-length string, or the empty sequence, the not function returns true. In most other cases, it returns false.

Table 3-5 shows some examples of the not function.

Table 3-5. Examples of the not function
Example | Return value
not(true()) | false
not(12 > 0) | false
not(doc("catalog.xml")/catalog/product) | false if there is at least one product child of catalog in catalog.xml
not( () ) | true
not("") | true

There is a subtle but important difference between using the != operator and calling the not function with an expression that uses the = operator. For example, the expression $prod/@dept != "ACC" returns:

  • true if the $prod element has a dept attribute that is not equal to ACC

  • false if it has a dept attribute that is equal to ACC

  • false if it does not have a dept attribute

On the other hand, not($prod/@dept = "ACC") will return true in the third case—that is, if the $prod element does not have a dept attribute. This is because the $prod/@dept expression returns the empty sequence, which results in the comparison evaluating to false. The not function will negate this and return true.
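The three cases can be checked side by side with two inline product elements, one with a dept attribute and one without. Both elements are made up for this sketch.

```xquery
let $withDept := <product dept="WMN"/>
let $noDept := <product/>
return (
  $withDept/@dept != "ACC",       (: true: the attribute exists and differs :)
  not($withDept/@dept = "ACC"),   (: true :)
  $noDept/@dept != "ACC",         (: false: there is no attribute to compare :)
  not($noDept/@dept = "ACC")      (: true: the inner comparison is false :)
)
```

The two forms agree when the attribute is present and disagree only when it is missing.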
