Cypher is a whiteboard-friendly language. Like the data it operates on, a Cypher query follows a diagrammatic approach in its syntax. This makes graph databases accessible to a wider audience, including database administrators, developers, business professionals, and even non-technical users. Let's take a look at some Cypher queries before diving into the best practices and optimizations for Cypher.
The following pattern depicts three entities related to one another through the NEEDS dependency. It is written as ASCII art:
(A)-[:NEEDS]->(B)-[:NEEDS]->(C), (A)-[:NEEDS]->(C)
The previous statement takes the form of a path that links entity A to B, then B to C, and finally A to C. A directed relationship is denoted with the -> operator. As is evident, patterns in Cypher mirror the way graphs are drawn on a whiteboard. Note that although a graph can have edges running in any direction, the text of a query is linear and is read in one direction, from left to right in the preceding case. A pattern that cannot be written as a single chain is therefore split into a list of patterns separated by commas. Cypher queries fundamentally rely on such ASCII art patterns. A Cypher query holds on to some starting part of the graph with one section of its pattern and then uses the remaining parts of the pattern to search for matching entities in the surrounding graph.
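As a rough illustration, the dependency pattern above could be created and then queried as follows. The Component label and the name property are assumptions made for this sketch; they are not part of the original pattern.

```cypher
// Hypothetical data: three nodes labeled Component (label and property are assumed).
CREATE (a:Component {name: 'A'}),
       (b:Component {name: 'B'}),
       (c:Component {name: 'C'}),
       (a)-[:NEEDS]->(b),
       (b)-[:NEEDS]->(c),
       (a)-[:NEEDS]->(c);

// The chain A -> B -> C and the direct edge A -> C are two patterns
// written as one comma-separated list.
MATCH (a)-[:NEEDS]->(b)-[:NEEDS]->(c), (a)-[:NEEDS]->(c)
RETURN a.name, b.name, c.name;
```

Because both patterns share the identifiers a and c, the second pattern constrains the first rather than starting an unrelated search.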
As a language for querying data, Cypher consists of several clauses that perform different tasks. A basic Cypher operation uses the START clause to anchor the query to a source, followed by a MATCH clause that traverses the desired nodes in the graph according to a pattern, and finally a RETURN clause that outputs the matching values or the result of some computation. In the following query, we use Cypher to find connecting flight paths from the location stored under the name 'Alabama':
START city1=node:location(name='Alabama')
MATCH (city1)-[:CONNECTS]->(city2)-[:CONNECTS]->(city3), (city1)-[:CONNECTS]->(city3)
RETURN city2, city3
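START with an index lookup of this kind was deprecated and eventually removed in later Neo4j releases. As a sketch only, the same anchoring can be expressed by matching on a label and a property. The Location label is an assumption here, standing in for the location index of the original query.

```cypher
// Assumes the places are stored as nodes labeled Location (hypothetical label).
// The property map {name: 'Alabama'} anchors city1, as the START lookup did.
MATCH (city1:Location {name: 'Alabama'})-[:CONNECTS]->(city2)-[:CONNECTS]->(city3),
      (city1)-[:CONNECTS]->(city3)
RETURN city2, city3
```

The clause-by-clause discussion that follows refers to the START form shown first.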
The START-based query above contains the following three clauses:
START: This clause anchors the query to one or more starting points in the graph. In the previous query, the starting node is looked up through an index named location that is asked to locate a place stored with the name property set to 'Alabama'. This statement returns a reference that we bind to an identifier called city1 in the previous example.
MATCH: This clause describes the pattern to search for. Nodes are written in parentheses, and relationships are drawn with the --> and <-- symbols, which also indicate the direction in which the relationship exists. Between the dashes of these symbols, we can insert a pair of square brackets [ … ] that holds details of the relationship, with the type of the connecting relationship, such as CONNECTS, indicated after a colon. Because the pattern in the MATCH clause can occur in many places in the graph, a large dataset would produce a very large set of matched results. To avoid this, we anchor part of the pattern with the help of the START clause. The Cypher engine can then match the rest of the pattern in the graph surrounding the starting nodes.
RETURN: This clause specifies which of the nodes, relationships, and properties that matched the pattern are returned, referring to them by identifier. In the previous example, these are the matched instances of city2 and city3. Identifiers are bound lazily to the matching nodes as the traversals take place in the graph.
Some other essential clauses that Cypher supports for the construction of complex queries are listed as follows:
CREATE: You can use this clause to define a new node or a new relationship. If you want only unique occurrences of nodes or relationships in the graph, you can use the CREATE UNIQUE clause to avoid the creation of duplicate entities.
MERGE: This clause behaves like a combination of MATCH and CREATE: it matches an existing pattern or, if none is found, creates it. It can also be used together with indexes and unique constraints to find an existing entity or otherwise create a new one.
WHERE: This clause specifies conditions that filter nodes and relationships based on their stored properties.
SET: This clause is used to assign values to properties of nodes or relationships.
WITH: This clause pipes the output of one part of a query into the next part as input, thereby making the chaining of queries possible.
UNION: This clause combines the results of multiple queries into a single final result.
DELETE: This clause is used to remove entities from the graph, be it nodes or relationships. Individual properties are removed with the separate REMOVE clause.
FOREACH: This is an action clause that can be used to sequentially update the elements in a collection of entities.
Some of these query clauses are strikingly similar to those in SQL. Cypher is intended to be simple enough that developers can grasp it easily and quickly. Its clauses make it clear that the operations are applied to graphs instead of relational data stores. We'll deal with some more clause-based examples in due course in the chapter.
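To make the clauses listed above a little more concrete, here is a minimal sketch of how several of them might be used on the flight data. The Location label, the 'Georgia' node, and the created, visited, and checked properties are all assumptions introduced for illustration; none of them comes from the original example.

```cypher
// MERGE: find the node named 'Alabama', or create it if it is missing.
MERGE (a:Location {name: 'Alabama'})
ON CREATE SET a.created = timestamp();

// CREATE: add a new node and a relationship leading to it.
MATCH (a:Location {name: 'Alabama'})
CREATE (a)-[:CONNECTS]->(g:Location {name: 'Georgia'});

// WHERE + SET: filter on a stored property, then assign a new property.
MATCH (c:Location)
WHERE c.name = 'Georgia'
SET c.visited = true;

// WITH: pipe an aggregate from one query part into the next.
MATCH (c:Location)-[:CONNECTS]->(d)
WITH c, count(d) AS outgoing
WHERE outgoing >= 1
RETURN c.name, outgoing;

// UNION: combine two queries that return the same column name.
MATCH (c:Location {name: 'Alabama'}) RETURN c.name AS name
UNION
MATCH (c:Location {name: 'Georgia'}) RETURN c.name AS name;

// FOREACH: update every node along a matched path.
MATCH p = (:Location {name: 'Alabama'})-[:CONNECTS]->(:Location {name: 'Georgia'})
FOREACH (n IN nodes(p) | SET n.checked = true);

// DELETE: remove a relationship (a property would be removed with REMOVE).
MATCH (:Location {name: 'Alabama'})-[r:CONNECTS]->(:Location {name: 'Georgia'})
DELETE r;
```

Each statement is independent and ends with a semicolon, so the statements can be run one at a time. MERGE is used for the first node so that running the sketch twice does not duplicate it, whereas the plain CREATE would add a second 'Georgia' node on a rerun.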