This chapter describes the facilities in XQuery for selecting, filtering, and joining data from one or more input documents. It covers the syntax of FLWOR expressions and quantified expressions.
Chapter 4 described how to use path expressions to select elements from input documents. For example, the expression:
doc("catalog.xml")//product[@dept = "ACC"]/name
can be used to select the names of all the products in the ACC department. You can add multiple predicates (expressions in square brackets) to filter the results based on more than one criterion. You can even add logical and other expressions to predicates, as in:
doc("catalog.xml")//product[@dept = "ACC" or @dept = "WMN"]/name
A path expression can be the entire content of a query; there is no requirement that there be a FLWOR expression in every query. Path expressions are useful for queries where no new elements and attributes are being constructed and the results don’t need to be sorted. A path expression can be preferable to a FLWOR because it is more compact and some implementations will be able to evaluate it faster.
FLWOR expressions, also known simply as FLWORs, are used for queries that are more complex. In addition to allowing more readable and structured selections, they allow functionality such as joining data from multiple sources, constructing new elements and attributes, evaluating functions on intermediate values, and sorting results.
FLWOR (pronounced “flower”), stands for “for, let, where, order by, return,” the keywords that are used in the expression. Example 6-1 shows a FLWOR that is equivalent to the second path expression from the previous section.
for
$
prod
in
doc
(
"catalog.xml"
)//
product
let
$
prodDept
:=
$
prod
/
@dept
where
$
prodDept
=
"ACC"
or
$
prodDept
=
"WMN"
return
$
prod
/
name
Of course, this is far more verbose, and for such a simple example, the path expression is preferable. However, this example is useful as an illustration before moving on to examples that are more complex. As you can see, the FLWOR is made up of several parts:
for
This clause sets up an iteration through the product
elements returned by the path expression. The variable $prod
is bound, in turn, to each product
in the sequence. The rest of the FLWOR is evaluated once for each product
, in this case, four times.
let
This clause binds the $prodDept
variable to the value of the dept
attribute.
where
This clause selects elements whose dept
attribute is equal to ACC
or WMN
.
return
This clause returns the name
child of each of the three product
elements that pass the where
clause.
The overall syntax of a FLWOR is shown in Figure 6-1.
The first clause in a FLWOR must be a for
, let
, or window
clause. After that, any number of any of the clauses listed may appear, in any order. The final clause is the required return
clause. The clauses of a FLWOR are listed in Table 6-1, along with a link to where they are covered fully in the book.
Keyword | Description | Chapter/Section |
---|---|---|
for | Sets up an iteration |
“The for Clause”
|
let | Binds a variable |
“The let Clause”
|
for tumbling window , for sliding window | Creates windows | “Windowing” |
where | Filters out elements |
“The where Clause”
|
group by | Groups results |
“Grouping Using the group by Clause”
|
order by | Sorts the results |
“The order by Clause”
|
count | Binds the position to a variable |
“Using the count Clause”
|
return | Specifies what to return |
“The return Clause”
|
FLWORs can be the whole query, or they can appear in other expressions such as in the return
clause of another FLWOR or even in a function call, as in:
max(for $prod in doc("catalog.xml")//product return xs:integer($prod/number))
The for
and return
keywords are aligned vertically here to make the structure of the FLWOR more obvious. This is generally good practice, although not always possible.
Let’s take a closer look at the main clauses that make up the FLWOR.
for
ClauseA for
clause, whose syntax is shown in Figure 6-2, sets up an iteration that allows the rest of the FLWOR to be evaluated multiple times, once for each item in the sequence returned by the expression after the in
keyword. This sequence, also known as the binding sequence, can evaluate to any sequence of zero, one, or more items. In the previous example, it was a sequence of product
elements, but it could also be atomic values, attribute nodes, or indeed items of any kind, or a mixture of items. If the binding sequence is the empty sequence, the rest of the FLWOR is simply not evaluated (it iterates zero times).
Additional features of the for
clause are described elsewhere in the book. The at
clause, which allows for positional variables, is described in “Working with Positions and Sequence Numbers”. The as
clause, which declares the type of the variable, is described in “Type Declarations in FLWORs”. The allowing empty
keywords, which allow outer joins, are described in “Outer joins with allowing empty
”.
The FLWOR expression with its for
clause is similar to loops in procedural languages such as C. However, one key difference is that in XQuery, because it is a functional language, the iterations are considered to be in no particular order. They do not necessarily occur sequentially, one after the other. One manifestation of this is that you cannot keep variable counters that are incremented with each iteration, or continuously append to the end of a string variable with each iteration. “Working with Positions and Sequence Numbers” provides more information about simulating counters.
Another useful technique is to supply a sequence of integers in the for
clause in order to specify the number of times to iterate. This can be accomplished through a range expression, which creates a sequence of consecutive integers. For example, the range expression 1 to 3
evaluates to a sequence of integers (1, 2, 3)
. The FLWOR shown in Example 6-2 iterates three times and returns three oneEval
elements.
Query
for
$
i
in
1
to
3
return
<oneEval>
{
$
i
}
</oneEval>
Results
<oneEval>
1</oneEval>
<oneEval>
2</oneEval>
<oneEval>
3</oneEval>
Range expressions can be included within parenthesized expressions, as in (1 to 3, 6, 8 to 10)
. They can also use variables, as in 1 to $prodCount
. Each of the expressions before and after the to
keyword must evaluate to an integer.
If the first integer is greater than the second, as in 3 to 1
, or if either operand is the empty sequence, the expression evaluates to the empty sequence. The reason for this is to ensure that for $i in 1 to count($seq)
does the expected thing even if $seq
is an empty sequence.
You can use the reverse
function if you want to descend in value, as in:
for $i in reverse(1 to 3)
You can also increment by some value other than 1 by using an expression like:
for $i in (1 to 100)[. mod 2 = 0]
which gives you every other number (2, 4, 6, etc.) up to 100.
for
clausesYou can use multiple for
clauses in a FLWOR, which is similar to nested loops in a programming language. The result is that the rest of the FLWOR is evaluated for every combination of the values of the variables. Example 6-3 shows a query with two for
clauses, and demonstrates the order of the results.
for
clausesQuery
for
$
i
in
(
1
,
2
)
for
$
j
in
(
"a"
,
"b"
)
return
<oneEval>
i is
{
$
i
}
and j is
{
$
j
}
</oneEval>
Results
<oneEval>
i is 1 and j is a</oneEval>
<oneEval>
i is 1 and j is b</oneEval>
<oneEval>
i is 2 and j is a</oneEval>
<oneEval>
i is 2 and j is b</oneEval>
The order is significant; it uses the first value of the first variable ($i
) and iterates over the values of the second variable ($j
), then takes the second value of $i
and iterates over the values of $j
.
Also, multiple variables can be bound in a single for
clause, separated by commas. This has the same effect as using multiple for
clauses. The example shown in Example 6-4 returns the same results as Example 6-3. This syntax is shorter but can be less clear in the case of complex expressions.
for
clausefor
$
i
in
(
1
,
2
),
$
j
in
(
"a"
,
"b"
)
return
<oneEval>
i is
{
$
i
}
and j is
{
$
j
}
</oneEval>
Specifying multiple variable bindings (or multiple for
clauses) is especially useful for joining data. This is described further in “Joins”.
let
ClauseA let
clause is a convenient way to bind a variable to a value. Unlike a for
clause, a let
clause does not result in iteration; it binds the whole sequence to the variable rather than binding each item in turn. The let
clause serves as a programmatic convenience that avoids repeating the same expression multiple times. With some implementations, it may improve performance, because the expression is evaluated only once instead of each time it is needed.
The syntax of a let
clause is shown in Figure 6-3. (The as
clause, which declares the type of the variable, is described in “Type Declarations in FLWORs”.)
To illustrate the difference between for
and let
clauses, compare Example 6-5 with Example 6-2.
let
clause with a range expressionQuery
let
$
i
:=
(
1
to
3
)
return
<oneEval>
{
$
i
}
</oneEval>
Results
<oneEval>
1 2 3</oneEval>
The FLWOR with the let
clause returns only a single oneEval
element, because no iteration takes place and the return
clause is evaluated only once.
One or more let
clauses can be intermingled with one or more for
clauses. Each of the let
and for
clauses may reference a variable bound in any previous clause. Example 6-6 shows such a FLWOR.
for
and let
clauseslet
$
doc
:=
doc
(
"catalog.xml"
)
for
$
prod
in
$
doc
//
product
let
$
prodDept
:=
$
prod
/
@dept
let
$
prodName
:=
$
prod
/
name
where
$
prodDept
=
"ACC"
or
$
prodDept
=
"WMN"
return
$
prodName
As with for
clauses, adjacent let
clauses can be represented using a slightly shortened syntax that replaces the let
keyword with a comma, as in:
let $prodDept := $prod/@dept, $prodName := $prod/name
Another handy use for the let
clause is to perform several functions or operations in order. For example, suppose you want to take a string and replace all instances of at
with @
, replace all instances of dot
with a period (.
), and remove any remaining spaces. You could write the expression:
replace(replace(replace($myString, 'at', '@'), 'dot', '.'), ' ', '')
but that is difficult to read and debug, especially as more functions are added. An alternative is the expression:
let $myString2 := replace($myString, 'at', '@') let $myString3 := replace($myString2, 'dot', '.') let $myString4 := replace($myString3, ' ', '') return $myString4
which makes the query clearer.
where
ClauseThe where
clause is used to specify criteria that filter the results of the FLWOR. Its syntax is shown in Figure 6-4.
The where
clause can reference variables that were bound by a for
or let
clause. For example:
where $prodDept = "ACC" or $prodDept = "WMN"
references the $prodDept
variable. In addition to expressing complex filters, the where
clause is also very useful for joins.
A where
clause can be composed of many expressions joined by and
and or
keywords, as shown in Example 6-7.
where
clause with multiple expressionsfor
$
prod
in
doc
(
"catalog.xml"
)//
product
let
$
prodDept
:=
$
prod
/
@dept
where
$
prod
/
number
>
100
and
starts-with
(
$
prod
/
name
,
"F"
)
and
exists
(
$
prod
/
colorChoices
)
and
(
$
prodDept
=
"ACC"
or
$
prodDept
=
"WMN"
)
return
$
prod
Starting in version 3.0, it is also possible to have multiple where
clauses in the same FLWOR. In previous versions, only one was allowed.
Note that when using paths within the where
clause, they need to start with an expression that sets the context. For example, it has to say $prod/number > 100
rather than just number > 100
. Otherwise, the processor does not know where to look for the number
child.
The effective boolean value of the where
clause is calculated. This means that if the where
clause evaluates to a Boolean value false
, a zero-length string, the number 0 or NaN
, or the empty sequence, it is considered false
, and the return
clause of the FLWOR is not evaluated for that iteration. If the effective boolean value is true
, the return
clause is evaluated. For example, you could use:
where $prod/name
which returns true
if $prod
has a name
child, and false
if it does not. As another example, you could use:
where $numProds
which returns true
if $numProds
is a numeric value that is not zero (and not NaN
). However, these types of expressions are somewhat cryptic, and it is preferable to use clearer expressions, such as:
where exists($prod/name) and $numProds > 0
return
ClauseThe return
clause consists of the return
keyword followed by the single expression that is to be returned. It is evaluated once for each iteration, assuming the where
clause evaluated to true
. The result value of the entire FLWOR is a sequence of items returned by each evaluation of the return
clause. For example, the value of the entire FLWOR:
for $i in (1 to 3) return <oneEval>{$i}</oneEval>
is a sequence of three oneEval
elements, one for each time the return
clause was evaluated.
If more than one expression is to be included in the return
clause, they can be combined in a sequence. For example, the FLWOR:
for $i in (1 to 3) return(
<one>{$i}</one>,
<two>{$i}</two>)
returns a sequence of six elements, two for each time the return
clause is evaluated. The parentheses and comma are used in the return
clause to indicate that a sequence of the two elements should be returned. If no parentheses or comma were used, the two
element constructor would not be considered part of the FLWOR.
When a variable is bound in a for
or let
clause, it can be referenced anywhere in that FLWOR after the clause that binds it. This includes, for example, other subsequent let
or for
clauses, the where
clause, or the return
clause. It cannot be referenced in a preceding clause, and it cannot be referenced in the clause itself, as in:
let $count := 0 for $prod in doc("catalog.xml")//product let $count := $count + 1
This does not raise an error, but it is actually declaring a new variable with the same name, so it will have unexpected results, as described in “Adding Sequence Numbers to Results”.
If you bind two variables with the same name with the same containing expression, such as two for
or let
clauses that are part of the same FLWOR, you may again get unexpected results. It will create two separate variables with the same name, where the second masks the first and makes it inaccessible within the scope of the second variable.
A quantified expression determines whether some or all of the items in a sequence meet a particular condition. For example, if you want to know whether any of the items in an order are from the accessory department, you can use the expression shown in Example 6-8. This expression will return true
.
some
keywordsome
$
dept
in
doc
(
"catalog.xml"
)//
product
/
@dept
satisfies
(
$
dept
=
"ACC"
)
Alternatively, if you want to know if every item in an order is from the accessory department, you can simply change the word some
to every
, as shown in Example 6-9. This expression will return false
.
every
keywordevery
$
dept
in
doc
(
"catalog.xml"
)//
product
/
@dept
satisfies
(
$
dept
=
"ACC"
)
A quantified expression always evaluates to a Boolean value (true
or false
). As such, it is not useful for selecting the elements or attributes that meet certain criteria, but rather for simply determining whether any exist. Quantified expressions can generally be easily rewritten as FLWORs or even as simple path expressions. However, the quantified expression can be more compact and easier for implementations to optimize.
A quantified expression is made of several parts:
The syntax of a quantified expression is shown in Figure 6-5. (The as
clause, which specifies the type of the variable, is described in “Type Declarations in Quantified Expressions”.)
The processor tests the satisfies
expression (using its effective boolean value) for every item in the sequence. If the quantifier is some
, it returns true
if the satisfies
expression is true
for any of the items. If the quantifier is every
, it returns true
only if the satisfies
expression is true
for all items. If there are no items in the sequence, an expression with some
always returns false
, while an expression with every
always returns true
.
You can use the not
function with a quantified expression to express “not any” (none), and “not every.” Example 6-10 returns true
if none of the product
elements have a dept
attribute equal to ACC
. For our particular catalog, this returns false
.
not
function with a quantified expressionnot
(
some
$
dept
in
doc
(
"catalog.xml"
)//
product
/
@dept
satisfies
(
$
dept
=
"ACC"
))
You can bind multiple variables in a quantified expression by separating the clauses with commas. As with the for
clauses of FLWORs, the result is that every combination of the items in the sequences is taken. Example 6-11 returns true
because there is a combination of values (where $i
is 3 and $j
is 10) where the satisfies
expression is true
.
some
$
i
in
(
1
to
3
),
$
j
in
(
10
,
11
)
satisfies
$
j
-
$
i
=
7
The distinct-values
function selects distinct atomic values from a sequence. For example, the function call:
distinct-values(doc("catalog.xml")//product/@dept)
returns all the distinct values of the dept
attribute, namely ("WMN", "ACC", "MEN")
. This function determines whether two values are distinct based on their value equality by using the eq
operator.
It is also common to select a distinct set of combinations of values. For example, you might want to select all the distinct department/product number combinations from the product catalog. You cannot use the distinct-values
function directly for this, because it accepts only one sequence of atomic values, not multiple sequences of multiple values. Instead, you could use the expression shown in Example 6-12.
Query
let
$
prods
:=
doc
(
"catalog.xml"
)//
product
for
$
d
in
distinct-values
(
$
prods
/
@dept
),
$
n
in
distinct-values
(
$
prods
[
@dept
=
$
d
]/
number
)
return
<result
dept
=
"{
$
d
}"
number
=
"{
$
n
}"
/>
Results
<result
dept=
"WMN"
number=
"557"
/>
<result
dept=
"ACC"
number=
"563"
/>
<result
dept=
"ACC"
number=
"443"
/>
<result
dept=
"MEN"
number=
"784"
/>
For each distinct department, bound to $d
, it generates a list of distinct product numbers within that department by using the predicate [@dept = $d]
. It then returns the resulting combination of values as a result
element. The order in which the values are returned is implementation-dependent, so it can be unpredictable.
Additional data items can be added by adding for
clauses with the appropriate predicates.
One of the major benefits of FLWORs is that they can easily join data from multiple sources. For example, suppose you want to join information from your product catalog (catalog.xml) and your order (order.xml). You want a list of all the items in the order, along with their number, name, and quantity. Example 6-13 shows a FLWOR that performs this join.
Query
for
$
item
in
doc
(
"order.xml"
)//
item
,
$
prod
in
doc
(
"catalog.xml"
)//
product
[
number
=
$
item
/
@num
]
return
<item
num
=
"{
$
item
/
@num
}"
name
=
"{
$
prod
/
name
}"
quan
=
"{
$
item
/
@quantity
}"
/>
Results
<item
num=
"557"
name=
"Fleece Pullover"
quan=
"1"
/>
<item
num=
"563"
name=
"Floppy Sun Hat"
quan=
"1"
/>
<item
num=
"443"
name=
"Deluxe Travel Bag"
quan=
"2"
/>
<item
num=
"784"
name=
"Cotton Dress Shirt"
quan=
"1"
/>
<item
num=
"784"
name=
"Cotton Dress Shirt"
quan=
"1"
/>
<item
num=
"557"
name=
"Fleece Pullover"
quan=
"1"
/>
The first part of the for
clause selects each item
from the order, and the second part selects the matching product
element from the catalog.xml document, using a predicate to identify the one whose number
matches the item’s num
attribute. Another way to accomplish the same thing is by using a where
clause instead of a predicate, as shown in Example 6-14. This query yields the same results.
where
clausefor
$
item
in
doc
(
"order.xml"
)//
item
,
$
prod
in
doc
(
"catalog.xml"
)//
product
where
$
item
/
@num
=
$
prod
/
number
return
<item
num
=
"{
$
item
/
@num
}"
name
=
"{
$
prod
/
name
}"
quan
=
"{
$
item
/
@quantity
}"
/>
Whether to use a predicate or a where
clause is a matter of personal preference. When many conditions apply, a where
clause can be more readable. However, for simple conditions, a predicate may be preferable because it is less verbose. In some implementations, predicates perform faster than where
clauses.
Joins can be extended to allow more than two sources to be joined together. For example, suppose that, along with catalog.xml and order.xml, you also want to join the prices.xml document, which contains current pricing information for each product.
The query shown in Example 6-15 joins the prices.xml document with the others to provide pricing information in the results. It uses two expressions in the where
clause to implement the two joins.
where
clauseQuery
for
$
item
in
doc
(
"order.xml"
)//
item
,
$
prod
in
doc
(
"catalog.xml"
)//
product
,
$
price
in
doc
(
"prices.xml"
)//
prod
where
$
item
/
@num
=
$
prod
/
number
and
$
prod
/
number
=
$
price
/
@num
return
<item
num
=
"{
$
item
/
@num
}"
name
=
"{
$
prod
/
name
}"
price
=
"{
$
price
/
price
}"
/>
Results
<item
num=
"557"
name=
"Fleece Pullover"
price=
"29.99"
/>
<item
num=
"563"
name=
"Floppy Sun Hat"
price=
"69.99"
/>
<item
num=
"443"
name=
"Deluxe Travel Bag"
price=
"39.99"
/>
<item
num=
"557"
name=
"Fleece Pullover"
price=
"29.99"
/>
The previous join examples in this section are known as inner joins; the results do not include items without matching products or products without matching items. Suppose you want to create a list of products and join it with the price information. Even if there is no price, you still want to include the product in the list. This is known in relational databases as an outer join.
The query in Example 6-16 performs an outer join. It uses two FLWORs, one embedded in the return
clause of the other. The outer FLWOR returns the list of products, regardless of the availability of price information. The inner FLWOR selects the price, if it is available.
Query
for
$
prod
in
doc
(
"catalog.xml"
)//
product
return
<product
number
=
"{
$
prod
/
number
}"
>
{
attribute
price
{
for
$
price
in
doc
(
"prices.xml"
)//
prod
where
$
prod
/
number
=
$
price
/
@num
return
$
price
/
price
}
}
</product>
Results
<product
number=
"557"
price=
"29.99"
/>
<product
number=
"563"
price=
"69.99"
/>
<product
number=
"443"
price=
"39.99"
/>
<product
number=
"784"
price=
""
/>
Product 784 doesn’t have a corresponding price in the prices.xml document, so the price
attribute has an empty value for that product.
allowing empty
Version 3.0 added a new feature to the for
clause that makes outer joins simpler and more explicit. Adding the keywords allowing empty
into the for
clause means that an outer join should be performed that allows that value to be absent. Example 6-17 shows an example equivalent to Example 6-16 (with the same results) that takes advantage of this new syntax.
allowing empty
xquery
version
"3.0"
;
for
$
prod
in
doc
(
"catalog.xml"
)//
product
for
$
price
allowing
empty
in
doc
(
"prices.xml"
)//
prices
/
priceList
/
prod
[
@num
=
$
prod
/
number
]
return
<product
number
=
"{
$
prod
/
number
}"
price
=
"{
$
price
/
price
}"
/>
In this example, because allowing empty
is specified in the for
clause for $price
, the return
clause is evaluated once in the case where $price
is the empty sequence. Without allowing empty
, the return
clause would have been evaluated zero times for that particular $prod
.
The where
clauses in the join examples use the =
operator to determine whether two values are equal. Keep in mind that XQuery considers type when determining whether two values are equal. If schemas are not used with these documents, both values are untyped, and the join shown in Example 6-16 compares the values as strings. Unless they are cast to numeric types, the join does not consider different representations of the same number equal, for example 0557
and 557
.
On the other hand, if number
in catalog.xml is declared as an xs:integer
, and the num
attribute in prices.xml is declared as an xs:string
, the join will not work. One value would have to be explicitly cast to the other’s type, as in:
where $prod/number = xs:integer($price/@num)