Neo4j is one of the most popular graph databases today. It was developed by Neo Technology, Inc., a company based in the San Francisco Bay Area in the U.S. It is written in Java and is available as open source software. Neo4j is an embedded, disk-based, fully transactional Java persistence engine that stores data structured as graphs rather than in tables. Most of the graph databases available today use one of two types of storage format:
In the following sections, we give an overview of Neo4j fundamentals and basic CRUD operations, along with the installation and configuration of Neo4j in different environments.
Contrary to popular belief, ACID does not contradict or negate the concept of NoSQL. NoSQL fundamentally provides a direct alternative to the explicit schema of classical RDBMSes. It allows developers to treat data asymmetrically, whereas traditional engines enforce rigid sameness across the data model. This matters because it offers a different way to deal with change, and for larger datasets it opens up new ways to handle volume and performance. In other words, the transition shifts the handling of complexity from the database administrators to the database itself.
Transaction management has been a talking point of NoSQL technologies since they started to gain popularity. Trading off transactional attributes for performance and scalability has been a common theme in the nonrelational technologies that targeted big data. Some databases (for example, BigTable, Cassandra, and CouchDB) opted to trade off consistency, allowing clients to read stale data in some cases in a distributed system (eventual consistency). In key-value stores that concentrated on read performance (for example, Memcached), durability of the data was not of much interest. Atomicity only at the level of a single operation, without the possibility of wrapping multiple database operations in a single transaction, is typical of document-oriented databases. Although transaction attributes were devised a long time ago for relational databases, they are still important in most practical use cases. Neo4j has taken a different approach here. Neo4j's goal is to be a graph database, with the emphasis on database. This means that you get full ACID support from the Neo4j database:
Atomicity (A): all the operations wrapped in a transaction either succeed together or are rolled back together.
Consistency (C): a transaction takes the database from one consistent state to another.
Isolation (I): changes made inside a transaction are not visible to other transactions until it commits.
Durability (D): once a transaction is committed, its changes survive a database restart or a server crash.
The ACID transactional support provides a seamless transition to Neo4j for anyone used to relational databases and offers safety and convenience when working with graph data.
Transactional support is one of the strong points of Neo4j, which differentiates it from the majority of NoSQL solutions and makes it a good option not only for NoSQL enthusiasts but also in enterprise environments. It is also one of the reasons for its popularity in big data scenarios.
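To make atomicity and isolation concrete, here is a minimal Python sketch of all-or-nothing commits over a toy in-memory graph. The Graph and Transaction classes are hypothetical and use a simple shadow-copy approach (work on a private copy, publish it only on success); this is not how Neo4j implements transactions internally, and it is not the Neo4j API.

```python
import copy


class Graph:
    """Toy in-memory graph, used only to illustrate all-or-nothing commits."""

    def __init__(self):
        self.nodes = {}  # node id -> property dict
        self.rels = []   # (start id, type, end id)

    def transaction(self):
        return Transaction(self)


class Transaction:
    """Works on a private copy; the copy replaces the live state only on success.

    Single-threaded illustration: concurrent transactions are not handled.
    """

    def __init__(self, graph):
        self.graph = graph

    def __enter__(self):
        # Isolation (simplified): changes go to a snapshot nobody else can see.
        self.nodes = copy.deepcopy(self.graph.nodes)
        self.rels = list(self.graph.rels)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Atomicity: publish every change at once on commit.
            self.graph.nodes, self.graph.rels = self.nodes, self.rels
        # On an exception nothing is published, so the work is rolled back.
        return False  # let any exception propagate to the caller


g = Graph()
with g.transaction() as tx:
    tx.nodes[1] = {"firstname": "Bill", "lastname": "Gates"}
    tx.nodes[2] = {"firstname": "Larry", "lastname": "Page"}
    tx.rels.append((2, "WORKS_WITH", 1))

try:
    with g.transaction() as tx:
        tx.nodes[3] = {"firstname": "Ada"}
        raise RuntimeError("simulated failure")  # aborts this transaction
except RuntimeError:
    pass

print(len(g.nodes), len(g.rels))  # 2 1
```

The second transaction fails part-way through, so the node it added never becomes visible in g.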
Graph databases are built with the objective of optimizing transactional performance and are engineered to preserve transactional integrity and operational availability. Two properties are useful to understand when investigating graph database technologies:
Graph databases, in particular native ones such as Neo4j, don't depend heavily on indexes, because the graph itself provides a natural adjacency index. In a native graph database, the relationships attached to a node provide a direct connection to other related nodes of interest. Graph queries largely use this locality to traverse the graph, literally chasing pointers. These operations can be carried out very efficiently, traversing millions of nodes per second, whereas joining data through a global index is many orders of magnitude slower. There are several different graph data models, including property graphs, hypergraphs, and triples. Let's take a brief look at them:
Property graphs: data is stored in nodes and relationships, and both can hold properties as key-value pairs. Relationships are named and directed. This is the model Neo4j uses.
Hypergraphs: a relationship (called a hyperedge) can connect any number of nodes rather than exactly two.
Triples: data is stored as subject-predicate-object statements, as in RDF triple stores.
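The difference between chasing pointers and going through a global index can be sketched in a few lines of Python. This is a toy, in-memory illustration with hypothetical sample data, not Neo4j's storage format: in the first version each Node object holds direct references to its neighbours, while the second looks up every hop in one shared relationship list (a linear scan here; a real index would be faster per lookup, but its cost still depends on the size of the whole table rather than on the local neighbourhood).

```python
# Hypothetical sample data: who KNOWS whom (directed).
edges = [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("carol", "erin")]


class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbouring Node objects


# Native-graph style: every node keeps references to its neighbours,
# so one hop is a walk over that node's own list.
nodes = {}
for a, b in edges:
    nodes.setdefault(a, Node(a))
    nodes.setdefault(b, Node(b))
    nodes[a].out.append(nodes[b])


def two_hops_adjacency(start):
    # Pointer chasing: only the visited neighbourhood is touched.
    return {fof.name for friend in start.out for fof in friend.out}


# Index-based style: every hop goes back to one global relationship table.
def neighbours_from_table(name):
    return [b for a, b in edges if a == name]  # scans the whole table


def two_hops_table(name):
    return {fof for friend in neighbours_from_table(name)
            for fof in neighbours_from_table(friend)}


print(sorted(two_hops_adjacency(nodes["alice"])))  # ['carol', 'dave']
print(sorted(two_hops_table("alice")))             # ['carol', 'dave']
```

Both functions return the same answer; the point is that the adjacency version never looks at relationships outside the neighbourhood it visits.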
Some essential characteristics of the Neo4j graph database are as follows:
Neo4j stores data in entities called nodes. Nodes are connected to each other by relationships. Both nodes and relationships can store properties, or metadata, in the form of key-value pairs. The data is therefore stored in the database as a graph. In this section, we look at the basic CRUD operations used when working with Neo4j:
CREATE (gates {firstname: 'Bill', lastname: 'Gates'}) CREATE (page {firstname: 'Larry', lastname: 'Page'}), (page)-[r:WORKS_WITH]->(gates) RETURN gates, page, r
This statement contains two CREATE clauses. The first creates a node with two properties. The second creates another node in the same way and also creates a WORKS_WITH relationship from page to gates.
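The data model behind the CREATE statement (nodes and relationships that each carry a key-value property map, with relationships typed and directed) can be sketched as plain Python data classes. This is only an illustration: the Node and Relationship classes and the numeric ids are hypothetical, and Neo4j assigns its own internal ids.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    start: Node  # direction runs from start to end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)  # relationships carry properties too


# Mirrors the CREATE statement: two nodes, then WORKS_WITH from page to gates.
gates = Node(1, {"firstname": "Bill", "lastname": "Gates"})
page = Node(2, {"firstname": "Larry", "lastname": "Page"})
r = Relationship(page, "WORKS_WITH", gates)

nodes = [gates, page]
relationships = [r]
print(r.start.properties["lastname"], r.type, r.end.properties["lastname"])
# Page WORKS_WITH Gates
```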
START n=node(*) RETURN "The node count of the graph is "+count(*)+"!" as ncount;
A column named ncount is returned with the value The node count of the graph is 2!; it's basically the same as SELECT COUNT(*) in SQL.
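What the counting query computes can be mirrored in Python over a toy dictionary of nodes (the dictionary and the node_count_message helper are hypothetical): count every node and concatenate the result into a string.

```python
# Toy node store holding the two nodes created earlier (ids are illustrative).
nodes = {
    1: {"firstname": "Bill", "lastname": "Gates"},
    2: {"firstname": "Larry", "lastname": "Page"},
}


def node_count_message(nodes):
    # node(*) plus count(*): one row per node, then count the rows.
    count = len(nodes)
    # Cypher's + joins a string and a number directly; Python needs str().
    return "The node count of the graph is " + str(count) + "!"


print(node_count_message(nodes))  # The node count of the graph is 2!
```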
START self=node(1) MATCH self<--friend RETURN friend
Assuming that node 1 is the gates node in this simple database, this query returns the page node: the pattern self<--friend only matches relationships that point into self, and the WORKS_WITH relationship runs from page to gates.
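The effect of the arrow direction in self<--friend can be sketched in Python, assuming relationships are stored as hypothetical (start id, type, end id) tuples and that node 1 is gates and node 2 is page.

```python
# Relationships as (start id, type, end id); ids are illustrative.
rels = [(2, "WORKS_WITH", 1)]  # page (2) -> gates (1)


def incoming_neighbours(node_id, rels):
    # self<--friend: keep only relationships whose END node is self.
    return [start for start, _type, end in rels if end == node_id]


def outgoing_neighbours(node_id, rels):
    # self-->friend: the opposite direction, START node is self.
    return [end for start, _type, end in rels if start == node_id]


print(incoming_neighbours(1, rels))  # [2]  (the page node)
print(outgoing_neighbours(1, rels))  # []   (gates has no outgoing relationship)
```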
START person=node(*) WHERE person.firstname! = 'Bill' RETURN person
This query searches through all nodes and matches the ones whose firstname property is equal to Bill. The ! suffix makes sure that only nodes that have the property are taken into consideration; without it, a node lacking firstname would cause an error.
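The role of the ! suffix can be imitated in Python over a toy node dictionary: strict key access raises an error for a node without firstname, whereas the filtered version simply skips such nodes. The third node and both helper functions are hypothetical, added only for illustration.

```python
# Toy node store; node 3 is hypothetical and has no firstname property.
nodes = {
    1: {"firstname": "Bill", "lastname": "Gates"},
    2: {"firstname": "Larry", "lastname": "Page"},
    3: {"lastname": "Lovelace"},
}


def strict_firstname(props):
    # Plain access: raises KeyError when the property is missing, much like
    # early Cypher failed on a node without the requested property.
    return props["firstname"]


def find_by_firstname(nodes, value):
    # Like person.firstname! = value: nodes without the property never match.
    return [node_id for node_id, props in nodes.items()
            if "firstname" in props and props["firstname"] == value]


print(find_by_firstname(nodes, "Bill"))  # [1]

try:
    [node_id for node_id, props in nodes.items() if strict_firstname(props) == "Bill"]
except KeyError:
    print("strict access failed on a node without firstname")
```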
START person=node(*) WHERE person.firstname! = 'Bill' SET person.age = 60 RETURN person
The query finds the node whose firstname property is Bill and adds another property called age with the value 60.
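A minimal Python analogue of the update, assuming the same toy dictionary of nodes: find the matching nodes, then add (or overwrite) a key in their property maps. The set_property helper is hypothetical, and age is stored as the integer 60.

```python
# Toy node store (ids are illustrative).
nodes = {
    1: {"firstname": "Bill", "lastname": "Gates"},
    2: {"firstname": "Larry", "lastname": "Page"},
}


def set_property(nodes, match_key, match_value, key, value):
    updated = []
    for node_id, props in nodes.items():
        # WHERE ...! = ...: a missing key gives None, so the node is skipped.
        if props.get(match_key) == match_value:
            props[key] = value  # SET adds the property or overwrites it
            updated.append(node_id)
    return updated  # ids of the nodes RETURN would yield


print(set_property(nodes, "firstname", "Bill", "age", 60))  # [1]
print(nodes[1])
```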
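START person = node(*) MATCH person-[r?]-() WHERE person.firstname! = "Larry" DELETE r, person
In this query, we match all nodes whose firstname is Larry, together with any relationships attached to them, and delete both. Neo4j does not allow a node to be deleted while it still has relationships, so they have to be removed in the same query. The ? makes the relationship optional, so a matching node without relationships is deleted as well.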
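Neo4j will not delete a node that still has relationships attached, so a delete has to remove those relationships as well. The rule can be sketched in Python over a toy graph; the delete_node and delete_with_relationships helpers are hypothetical and only imitate the constraint.

```python
# Toy graph (ids are illustrative); relationships are (start id, type, end id).
nodes = {
    1: {"firstname": "Bill", "lastname": "Gates"},
    2: {"firstname": "Larry", "lastname": "Page"},
}
rels = [(2, "WORKS_WITH", 1)]


def delete_node(nodes, rels, node_id):
    # Same rule as Neo4j: refuse while any relationship still touches the node.
    if any(node_id in (start, end) for start, _type, end in rels):
        raise ValueError(f"node {node_id} still has relationships")
    del nodes[node_id]


def delete_with_relationships(nodes, rels, node_id):
    # Remove the attached relationships first, then the node itself.
    rels[:] = [r for r in rels if node_id not in (r[0], r[2])]
    delete_node(nodes, rels, node_id)


larry = [nid for nid, props in nodes.items() if props.get("firstname") == "Larry"]
for nid in larry:
    delete_with_relationships(nodes, rels, nid)
print(nodes, rels)
```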
START n = node(*) MATCH n-[r?]-() DELETE n, r
This query fetches all nodes together with any relationships attached to them and deletes them all. Because the relationship is optional, nodes that have no relationships are deleted as well.
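A pattern that requires a relationship only binds nodes that have at least one, so a query meant to empty the database must also reach isolated nodes. Here is a toy Python sketch of the difference, with a hypothetical isolated node 3 and hypothetical helper functions.

```python
# Toy graph: node 3 is hypothetical and has no relationships.
nodes = {
    1: {"firstname": "Bill", "lastname": "Gates"},
    2: {"firstname": "Larry", "lastname": "Page"},
    3: {"firstname": "Ada"},
}
rels = [(2, "WORKS_WITH", 1)]  # (start id, type, end id)


def nodes_with_relationships(rels):
    # What a mandatory pattern such as n-[r]-() can bind n to: only nodes
    # that sit at one end of at least one relationship.
    return {node_id for start, _type, end in rels for node_id in (start, end)}


def delete_all(nodes, rels):
    # Relationships first, then every node, isolated ones included.
    removed = (len(nodes), len(rels))
    rels.clear()
    nodes.clear()
    return removed


print(sorted(nodes_with_relationships(rels)))  # [1, 2]  (node 3 would survive)
print(delete_all(nodes, rels))                 # (3, 1)
```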
You now know how to perform basic CRUD operations on a Neo4j graph. We will encounter these queries in more complex forms in later chapters of the book.