This book describes the core features of the XQuery language and its associated built-in functions and data model. There are several peripheral standards that complement, but are not central to, the XQuery language. These standards include XQuery Update Facility, XQuery and XPath Full Text (search), XQueryX, RESTXQ, and XQJ.
The XQuery language provides expressions for querying input documents and adding to the results. However, the main XQuery specification does not have any specific expressions for inserting, updating, or deleting XML data. As we saw in “Copying Input Elements with Modifications”, it is possible to modify input elements and attributes by using user-defined functions, but this is somewhat cumbersome. In addition, this only transforms input documents to be returned as the query results; it offers no ability to specify that XML data should be permanently changed in a database.
The XQuery Update Facility is a separate W3C specification, an extension to the XQuery language, that provides specialized expressions for updates. It provides the following capabilities:
Nodes can be inserted in specified positions using the insert node or insert nodes keywords. For example, the following will insert a new product in the second position:
insert node <product><number>345</number></product> after doc("catalog.xml")//product[1]
Nodes can be deleted using the delete node or delete nodes keywords. For example, the following will delete all the products in the ACC department:
delete nodes doc("catalog.xml")//product[@dept='ACC']
Nodes can be replaced using the replace node keywords. For example, the following will replace the second product with a new product:
replace node doc("catalog.xml")//product[2] with <product><number>345</number></product>
Element content and attribute values can be changed using the replace value of node keywords. For example, the following will replace the number child of the second product with a new number:
replace value of node doc("catalog.xml")//product[2]/number with 345
Element and attribute names can be changed using the rename node keywords. For example, the following will rename the name elements to product-name:
for $name in doc("catalog.xml")//name return rename node $name as "product-name"
To create modified copies of nodes, you can use a transform expression, whose keywords are copy, modify, and return. For example, the following will return copies of all the product elements, but with their name children deleted:
for $prod in doc("catalog.xml")//product
return
  copy $prodnew := $prod
  modify delete node $prodnew/name
  return $prodnew
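The insert node example above uses the after keyword, but the Update Facility also defines the positions before, into, as first into, and as last into. The following is a minimal sketch of those forms, assuming the same catalog.xml document as the earlier examples and assuming that the product elements are children of its root element. The product numbers 901 through 903 and the reviewed attribute are made up for illustration. Because the target of an insert must be a single node, the last expression loops over the matching products rather than passing them all at once.

```xquery
(
  (: insert as the first child of the root element :)
  insert node <product><number>901</number></product>
    as first into doc("catalog.xml")/*,

  (: insert as the last child of the root element :)
  insert node <product><number>902</number></product>
    as last into doc("catalog.xml")/*,

  (: insert immediately before the first existing product :)
  insert node <product><number>903</number></product>
    before doc("catalog.xml")//product[1],

  (: add a (hypothetical) attribute to each product in the ACC department :)
  for $prod in doc("catalog.xml")//product[@dept = 'ACC']
  return insert node attribute reviewed { "no" } into $prod
)
```

All of the updates requested by a query are collected and applied together when the query finishes, so none of these expressions sees the effects of the others.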
Not all XQuery implementations support the XQuery Update Facility. Most XML database implementations also provide their own custom functions for inserting and updating documents in the database. For more information on the XQuery Update Facility, see http://www.w3.org/TR/xquery-update-30/.
Search facilities have become an increasingly important (and complex) tool to locate relevant information in the vast amount of data that is now structured as XML, whether it is in large text databases or on the Web itself. Searching is a natural use case for XQuery because of its built-in knowledge of XML structures and its syntax, which can be written by reasonably non-technical users.
XQuery contains some limited functionality for searching text. For example, you can use the contains or matches function to search for specific strings inside element content. However, the current features are quite limited, especially for textual XML documents.
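As a brief illustration of these built-in functions, the sketch below assumes the catalog.xml document used elsewhere in this chapter, with one name child per product. The search strings are arbitrary examples.

```xquery
(: products whose name contains the literal substring "Shirt" :)
doc("catalog.xml")//product[contains(name, "Shirt")],

(: products whose name starts with "fleece", ignoring case;
   the second argument is a regular expression and "i" is the flags argument :)
doc("catalog.xml")//product[matches(name, "^fleece", "i")]
```

Both functions compare characters only. Neither knows about word boundaries, stemming, or relevance, which is the gap that full-text search is intended to fill.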
There is a separate recommendation entitled XQuery and XPath Full Text 3.0 that provides specialized operators for full-text searching. These operators are extensions to the XQuery syntax, and they are not supported by all XQuery implementations.
The Full-Text recommendation supports the following search functionality:
Combining search terms by using ftand (and), ftor (or), ftnot (not), and not in (mild not)
Finding words with the same linguistic stem, for example, finding both “mouse” and “mice” when the search term is “mouse”
Specifying how far apart the search terms may be, and in what order
Searching for multiple terms within the same sentence or paragraph
Determining how relevant the results are to the terms searched
Restricting results to search terms that appear a specific number of times
Considering uppercase versus lowercase letters either relevant or irrelevant
Considering, for example, accents on characters either relevant or irrelevant
Specifying wildcards in search terms, such as run.* to match all words that start with “run”
Specifying common words to exclude from searches, such as “a” and “the”
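Several of the features listed above can be sketched as follows. Each expression is an independent search, written against the same /books/book/content structure that Example 28-1 queries. The search terms are illustrative only, and a given processor may not support every option shown.

```xquery
(: boolean operators, including ftnot and the "mild not" :)
/books/book[content contains text "web site" ftor "website"],
/books/book[content contains text "usability" ftand ftnot "testing"],
/books/book[content contains text "usability" not in "usability testing"],

(: stemming: also match other words with the same stem :)
/books/book[content contains text "mouse" using stemming],

(: order and distance: both terms, in this order, at most 5 words apart :)
/books/book[content contains text "web" ftand "usability"
            ordered distance at most 5 words],

(: scope: both terms within the same sentence :)
/books/book[content contains text "web" ftand "usability" same sentence],

(: number of occurrences :)
/books/book[content contains text "usability" occurs at least 2 times],

(: case and diacritics :)
/books/book[content contains text "XQuery" using case sensitive],
/books/book[content contains text "resume" using diacritics insensitive],

(: wildcards: words that start with "run" :)
/books/book[content contains text "run.*" using wildcards],

(: stop words: ignore "a" and "the" when matching the phrase :)
/books/book[content contains text "planning a web site"
            using stop words ("a", "the")]
```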
An example of a full-text query is shown in Example 28-1.
for $b in /books/book
let score $s := $b/content contains text ("web site" weight {0.5})
                                   ftand ("usability" weight {2})
where $s > 0.5
order by $s descending
return <result score="{$s}">{$b}</result>
This example uses a familiar FLWOR syntax, but with some additional operators and clauses:
The score $s in the let clause is used to specify that the variable $s should contain the relevance score of the results. This variable is then used to constrain the results to those where the score is greater than 0.5, and also to sort the results, with the most relevant appearing first.
The contains text operator is used to find text containing the specific search terms “web site” and “usability.”
The ftand operator is used to combine the two search terms, returning only documents that contain both terms.
The weight keyword is used to weight the individual search terms.
For more information on the XQuery Full-Text recommendation, see http://www.w3.org/TR/xpath-full-text-30/. Some XQuery implementations, such as MarkLogic and eXist, also provide special built-in functions and operators to address these full-text requirements.
XQueryX is an alternate, XML syntax to represent XQuery queries. It is not covered in detail in this book because it is unlikely that most query authors will want to write XQueryX by hand. However, it may be useful as a syntax used by processors for storing and/or transferring queries because XML is easier to parse and/or transform than a non-XML syntax. It can also be useful for embedding queries in XML documents.
A simple FLWOR is shown in Example 28-2.
for $prod in doc("catalog.xml")//product
order by $prod/name
return $prod/number
The equivalent XQueryX is shown in Example 28-3. As you can see, the XQueryX syntax is far more verbose, and breaks the query down to a very granular level, with at least one element for every expression.
<?xml version="1.0"?>
<xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xqx:mainModule>
    <xqx:queryBody>
      <xqx:flworExpr>
        <xqx:forClause>
          <xqx:forClauseItem>
            <xqx:typedVariableBinding>
              <xqx:varName>prod</xqx:varName>
            </xqx:typedVariableBinding>
            <xqx:forExpr>
              <xqx:pathExpr>
                <xqx:argExpr>
                  <xqx:functionCallExpr>
                    <xqx:functionName>doc</xqx:functionName>
                    <xqx:arguments>
                      <xqx:stringConstantExpr>
                        <xqx:value>catalog.xml</xqx:value>
                      </xqx:stringConstantExpr>
                    </xqx:arguments>
                  </xqx:functionCallExpr>
                </xqx:argExpr>
                <xqx:stepExpr>
                  <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis>
                  <xqx:anyKindTest/>
                </xqx:stepExpr>
                <xqx:stepExpr>
                  <xqx:xpathAxis>child</xqx:xpathAxis>
                  <xqx:nameTest>product</xqx:nameTest>
                </xqx:stepExpr>
              </xqx:pathExpr>
            </xqx:forExpr>
          </xqx:forClauseItem>
        </xqx:forClause>
        <!-- ... -->
For more information on XQueryX, see the recommendation at http://www.w3.org/TR/xqueryx-30/.
RESTXQ allows you to declare interactions between HTTP requests and XQuery functions by using annotations. This allows you to define a REST API to build elegant Web applications, with XQuery functions exposing the resources. It is supported by eXist, BaseX, and MarkLogic.
An example is shown in Example 28-4, where the function prod:product-info returns an HTML page with detailed product information for a particular product, based on product number.
xquery version "3.0";
module namespace prod = "http://datypic.com/prod";
declare namespace rest = "http://exquery.org/ns/restxq";

declare
  %rest:GET
  %rest:path("/product/{$num}")
function prod:product-info($num) {
  let $prod := doc("catalog.xml")//product[number = $num]
  return
    <html>
      <body>
        <p><b>Number</b>: {data($prod/number)}</p>
        <p><b>Name</b>: {data($prod/name)}</p>
        <p><b>Department</b>: {data($prod/@dept)}</p>
      </body>
    </html>
};
The %rest:GET annotation indicates that this function supports only the HTTP GET method. It is possible to support other HTTP methods such as HEAD, POST, PUT, and DELETE, and a single function can support more than one HTTP method.
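As a sketch of a function that supports more than one method, the following module answers both GET and HEAD requests by listing two method annotations. The path /product/{$num}/name and the function name prod:product-name are hypothetical and chosen only for this illustration; the namespaces and catalog.xml follow Example 28-4.

```xquery
xquery version "3.0";
module namespace prod = "http://datypic.com/prod";
declare namespace rest = "http://exquery.org/ns/restxq";

(: Both method annotations apply to the same function, so a GET or a HEAD
   request to the (hypothetical) path below is routed here. :)
declare
  %rest:GET
  %rest:HEAD
  %rest:path("/product/{$num}/name")
function prod:product-name($num) {
  <name>{data(doc("catalog.xml")//product[number = $num]/name)}</name>
};
```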
The %rest:path annotation maps the path /product/{$num} to this XQuery function. The {$num} segment of the path indicates a parameter name that must match a parameter name of the function. If the Web application is at http://www.datypic.com, performing an HTTP GET on the following URL:
http://www.datypic.com/product/563
will return a resource that is an html element showing the number, name, and department of the product with number 563.
It is also possible for RESTXQ functions to accept parameters. Example 28-5 shows a function that returns a list of products for a particular department. The %rest:query-param annotation is used to show that the department can be included in the HTTP request URI query string, using the name dept. It lists the default value as *, which will return all products regardless of department.
xquery version "3.0";
module namespace prod = "http://datypic.com/prod";
declare namespace rest = "http://exquery.org/ns/restxq";

declare
  %rest:GET
  %rest:path("/products")
  %rest:query-param("dept", "{$dept}", "*")
function prod:prods-by-dept($dept) {
  <html>
    <body>{
      for $prod in doc("catalog.xml")//product[@dept = $dept or $dept = '*']
      let $num := data($prod/number)
      return <p><a href="product/{$num}">{$num}</a></p>
    }</body>
  </html>
};
This function might be called using the URI:
http://www.datypic.com/products?dept=ACC
which would return an html element containing a paragraph for each product in the ACC department, with a link to the product detail, for example:
<html>
  <body>
    <p><a href="product/563">563</a></p>
    <p><a href="product/443">443</a></p>
  </body>
</html>
Parameters can also be supplied using HTML forms (via the %rest:form-param annotation), HTTP headers (via the %rest:header-param annotation), and cookies (via the %rest:cookie-param annotation).
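These three annotations take the same arguments as %rest:query-param: the external name, the function parameter it binds to, and an optional default value. The module below is a minimal sketch rather than a definitive implementation. The path /products/search, the function name prod:search, the form field dept, and the cookie name session are all assumptions made for illustration.

```xquery
xquery version "3.0";
module namespace prod = "http://datypic.com/prod";
declare namespace rest = "http://exquery.org/ns/restxq";

declare
  %rest:POST
  %rest:path("/products/search")
  (: value of the "dept" form field, defaulting to "*" :)
  %rest:form-param("dept", "{$dept}", "*")
  (: value of the Accept-Language request header, defaulting to "en" :)
  %rest:header-param("Accept-Language", "{$lang}", "en")
  (: value of the "session" cookie; the empty sequence if it is absent :)
  %rest:cookie-param("session", "{$session}")
function prod:search($dept, $lang, $session) {
  <html>
    <body>
      <p>Department: {$dept}</p>
      <p>Language: {$lang}</p>
      <p>Session: {if (empty($session)) then "none" else $session}</p>
    </body>
  </html>
};
```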
In addition to (or instead of) returning a resource, a RESTXQ function can return HTTP header information by wrapping the results in a rest:response element, as shown in Example 28-6.
xquery version "3.0";
module namespace prod = "http://datypic.com/prod";
declare namespace rest = "http://exquery.org/ns/restxq";
declare namespace http = "http://expath.org/ns/http-client";

declare
  %rest:GET
  %rest:HEAD
  %rest:POST
  %rest:PUT
  %rest:DELETE
function prod:not-found() {
  <rest:response>
    <http:response status="404" message="I was not found.">
      <http:header name="Content-Language" value="en"/>
      <http:header name="Content-Type" value="text/html; charset=utf-8"/>
    </http:response>
  </rest:response>
};
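To return header information in addition to a resource, the rest:response element can be returned as the first item of a sequence, followed by the response body. The sketch below assumes that behavior as described in the RESTXQ specification and reuses the product lookup from Example 28-4. The function name prod:product-or-404 and the status message text are made up for illustration.

```xquery
xquery version "3.0";
module namespace prod = "http://datypic.com/prod";
declare namespace rest = "http://exquery.org/ns/restxq";
declare namespace http = "http://expath.org/ns/http-client";

declare
  %rest:GET
  %rest:path("/product/{$num}")
function prod:product-or-404($num) {
  let $prod := doc("catalog.xml")//product[number = $num]
  return
    if (exists($prod)) then (
      (: header information first, then the resource itself :)
      <rest:response>
        <http:response status="200">
          <http:header name="Content-Language" value="en"/>
        </http:response>
      </rest:response>,
      <html>
        <body>
          <p><b>Name</b>: {data($prod/name)}</p>
        </body>
      </html>
    ) else
      (: no matching product: header information only :)
      <rest:response>
        <http:response status="404" message="No such product."/>
      </rest:response>
};
```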
The specification for RESTXQ can be found at http://exquery.github.io/exquery/exquery-restxq-specification/restxq-1.0-specification.html.
XQJ is a standard for calling XQuery from Java. XQJ is to XML data sources what JDBC is to relational data sources. It provides a standard set of classes for connecting to a data source, executing a query, and traversing through the result set.
Example 28-7 shows Java code that connects to an XML data source, executes a query, and iterates through the results.
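// xqds is assumed to be an already configured javax.xml.xquery.XQDataSource
// connect to the data source
XQConnection conn = xqds.getConnection();
// create a new expression object
XQExpression expr = conn.createExpression();
// execute the query; each fragment ends with a space so that the
// concatenated clauses do not run together
XQResultSequence result = expr.executeQuery(
    "for $prod in doc('catalog.xml')//product " +
    "order by $prod/number " +
    "return data($prod/name)");
// iterate through the result sequence
while (result.next()) {
    // retrieve the atomic value of the current item; the query returns
    // data($prod/name) because getAtomicValue() requires an atomic value
    String prodName = result.getAtomicValue();
    System.out.println("Product name: " + prodName);
}
// release the connection when finished
conn.close();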
For more information on XQJ, see the specification at http://jcp.org/en/jsr/detail?id=225.