Day 1: Graphs, Cypher, and CRUD

Today we’re really going to jump in with both feet. In addition to exploring the Neo4j web interface, we’ll get deep into graph database terminology and CRUD. Much of today will be learning how to query a graph using a querying language called Cypher. The concepts here differ significantly from other databases we’ve looked at so far, which have largely taken a document- or record-based view of the world. In Neo4j, nodes inside of graphs act like documents because they store properties, but what makes Neo4j special is that the relationship between those nodes takes center stage.

But before we get to all that, let’s start with the web interface to see how Neo4j represents data in graph form and how to navigate that graph. After you’ve downloaded and unzipped the Neo4j package, cd into the Neo4j directory and start up the server like this:

 $ bin/neo4j start

To make sure you’re up and running, try curling this URL:

 $ curl http://localhost:7474/db/data/

Like CouchDB, the default Neo4j package comes equipped with a fully featured web administration tool and data browser, which is excellent for experimentation. Even better, it has one of the coolest graph data browsers we’ve ever seen. This is perfect for getting started because graph traversal can feel very awkward at first try.

Neo4j’s Web Interface

Launch a web browser and navigate to the administration page.[38]

You’ll be greeted by a colorful dashboard like the one in the figure.

images/neo4j-wa-dashboard.png

In the Connect to Neo4j component, sign in using the default username and password (enter neo4j for both). That will open up a command-line-style interface at the top of the page (distinguished by the $ on the far left). Type in :server connect to connect to the database.

You can enter :help commands at any time for an in-depth explanation of the existing commands. :help cypher will bring up a help page with instructions for specific Cypher commands (more on Cypher, the querying language we’ll be using through this web interface, in a moment).

Neo4j via Cypher

There are several ways that you can interact with Neo4j. In addition to client libraries in a wide variety of programming languages (as with the other databases in this book), you can also interact with Neo4j via a REST API (more on this in Day 2), and via two querying languages created with Neo4j exclusively in mind: Gremlin and Cypher. While Gremlin has some interesting properties, Cypher is now considered standard.

Cypher is a rich, Neo4j-specific graph traversal language. Its vocabulary differs slightly from that of mathematical graph theory: graph data points are called nodes (rather than vertices) and the connections between nodes are called relationships (rather than edges). Statements used to query Neo4j graphs in Cypher typically look something like this:

 $ MATCH [some set of nodes and/or relationships]
  WHERE [some set of properties holds]
  RETURN [some set of results captured by the MATCH and WHERE clauses]

In addition to querying the graph using MATCH, you can create new nodes and relationships using CREATE, update the values associated with nodes and relationships using SET, and much more. That’s fairly abstract, but don’t worry; you’ll get the hang of it via examples over the course of the next few sections.

At the moment, our not-so-exciting Neo4j graph consists of no nodes and no relationships. Let’s get our hands dirty and change that by adding a node for a specific wine to our graph. That node will have a few properties: a name property with a value of Prancing Wolf, a style property of ice wine, and a vintage property of 2015. To create this node, enter this Cypher statement into the console:

 $ CREATE (w:Wine {name:"Prancing Wolf", style: "ice wine", vintage: 2015})

In the section of the web UI immediately below the console, you should see output like that in the figure that follows.

images/neo4j-new-node.png

At the top, you’ll see the Cypher statement you just ran. The Rows tile shows you the nodes and/or relationships that you created in the last Cypher statement, and the Code tile provides in-depth information about the action you just completed (mostly info about the transaction that was made via Neo4j’s REST API).

At any time, we can access all nodes in the graph, kind of like a SELECT * FROM entire_graph statement:

 $ MATCH (n)
  RETURN n;

At this point, that will return just one solitary node. Let’s add some others. Remember that we also want to keep track of wine-reviewing publications in our graph. So let’s create a node representing the publication Wine Expert Monthly:

 $ CREATE (p:Publication {name: "Wine Expert Monthly"})

In the last two statements, Wine and Publication were labels applied to the nodes, not types. We could create a node with the label Wine that had a completely different set of properties. Labels are extremely useful for querying purposes, as you’ll see in a bit, but Neo4j doesn’t require you to have predefined types. If you do want to enforce types, you’ll have to do that at the application level.
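To see for yourself that labels are tags rather than enforced types, you can ask Cypher which labels and property keys each node carries. This is a small read-only sketch using the built-in labels() and keys() functions; it doesn’t change the graph:

 $ MATCH (n)
  RETURN labels(n), keys(n);

Two Wine nodes with entirely different property keys would both show up here under the same label.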

So now we have a graph containing two nodes but they currently have no relationship with one another. Because Wine Expert Monthly reports on this Prancing Wolf wine, let’s create a reported_on relationship that connects the two nodes:

 $ MATCH (p:Publication {name: "Wine Expert Monthly"}),
  (w:Wine {name: "Prancing Wolf", vintage: 2015})
  CREATE (p)-[r:reported_on]->(w)

In this statement, we’ve MATCHed the two nodes that we want to connect via their labels (Wine and Publication) and their properties (name and, for the wine, vintage), then created a reported_on relationship from the publication to the wine, bound to the variable r. You can see the end result in the figure that follows.

images/neo4j-new-relationship.png

If you click on the relationship between the nodes in the web UI, you can see the ID of the relationship is 0. You can use Neo4j’s REST interface to access information about the relationship at http://localhost:7474/db/data/relationship/0 or via Cypher by running:

 $ MATCH ()-[r]-()
  WHERE id(r) = 0
  RETURN r

Relationships, like nodes, can contain properties and can be thought of as objects in their own right. After all, we don’t want to know simply that a relationship exists; we want to know what constitutes that relationship. Let’s say that we want to specify which score Wine Expert Monthly gave the Prancing Wolf wine. We can do that by adding a rating property to the relationship that we just created.

 $ MATCH ()-[r]-()
  WHERE id(r) = 0
  SET r.rating = 97
  RETURN r

We also could’ve specified the rating when creating the relationship, like this:

 $ MATCH (p:Publication {name: "Wine Expert Monthly"}),
  (w:Wine {name: "Prancing Wolf"})
  CREATE (p)-[r:reported_on {rating: 97}]->(w)
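Relationship properties can also be read back in a query, just like node properties. As a quick read-only sketch, the following returns the publication, the score, and the wine for every reported_on relationship:

 $ MATCH (p:Publication)-[r:reported_on]->(w:Wine)
  RETURN p.name, r.rating, w.name;

If you ran both the SET statement and the second CREATE above, you’ll see two rows, because each CREATE adds a new relationship even when an identical one already exists.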

At this point, if you display the entire graph again using MATCH (n) RETURN n; and click on the relationship, you’ll see that rating: 97 is now a property of the reported_on relationship. Another bit of info that we want to note is that the Prancing Wolf wine is made from the Riesling grape. We could insert this info by adding a grape_type: Riesling property to the Prancing Wolf node, but let’s do things in a more Neo4j-native fashion instead by creating a new node for the Riesling grape type and adding relationships to wines of that type:

 $ CREATE (g:GrapeType {name: "Riesling"})

Let’s add a relationship between the Riesling node and the Prancing Wolf node using the same method:

 $ MATCH (w:Wine {name: "Prancing Wolf"}),(g:GrapeType {name: "Riesling"})
  CREATE (w)-[r:grape_type]->(g)

Now we have a three-node graph: a wine, a type of grape, and a publication.

images/neo4j-three-node-graph.png
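If you’d rather check the structure in tabular form than in the graph view, one way (a read-only sketch) is to list every relationship along with the nodes at each end. The built-in type() function returns the name of a relationship’s type:

 $ MATCH (a)-[r]->(b)
  RETURN a.name, type(r), b.name;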

So far, we’ve created and updated both nodes and relationships. You can also delete both from a graph. The following are four Cypher statements that will create a new node, establish a relationship between that node and one of our existing nodes, delete the relationship, and then delete the node (you can’t delete a node that still has relationships associated with it):

 $ CREATE (e: EphemeralNode {name: "short lived"})
 $ MATCH (w:Wine {name: "Prancing Wolf"}),
  (e:EphemeralNode {name: "short lived"})
  CREATE (w)-[r:short_lived_relationship]->(e)
 $ MATCH ()-[r:short_lived_relationship]-()
  DELETE r
 $ MATCH (e:EphemeralNode)
  DELETE e
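As an alternative sketch, DETACH DELETE (available since Neo4j 2.3) can collapse the last two statements into one. DETACH removes any relationships attached to the matched node before deleting the node itself:

 $ MATCH (e:EphemeralNode)
  DETACH DELETE e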

Our wine graph is now back to where it was before creating the short lived node. Speaking of deletion, if you ever want to burn it all down and start from scratch with an empty graph, you can use the following command at any time to delete all nodes and relationships. But beware! This command will delete the entire graph that you’re working with, so run it only if you’re sure that you’re ready to move on from a graph’s worth of data for good.

 $ MATCH (n)
  OPTIONAL MATCH (n)-[r]-()
  DELETE n, r

Now that you know how to start from scratch, let’s continue building out our wine graph. Wineries typically produce more than one wine. To express that relationship in an RDBMS, we might create a separate table for each winery and store wines that they produce as rows. The most natural way to express this in Neo4j would be—you guessed it—to represent wineries as nodes in the graph and create relationships between wineries and wines. Let’s create a node for Prancing Wolf Winery and add a relationship with the Prancing Wolf wine node that we created earlier:

 $ CREATE (wr:Winery {name: "Prancing Wolf Winery"})
 $ MATCH (w:Wine {name: "Prancing Wolf"}),
  (wr:Winery {name: "Prancing Wolf Winery"})
  CREATE (wr)-[r:produced]->(w)

We’ll also add two more wines produced by Prancing Wolf Winery—a Kabinett and a Spätlese—and also create produced relationships and specify that all of the Prancing Wolf wines are Rieslings.

 $ CREATE (w:Wine {name:"Prancing Wolf", style: "Kabinett", vintage: 2002})
 $ CREATE (w:Wine {name: "Prancing Wolf", style: "Spätlese", vintage: 2010})
 $ MATCH (wr:Winery {name: "Prancing Wolf Winery"}),(w:Wine {name: "Prancing Wolf"})
  CREATE (wr)-[r:produced]->(w)
 $ MATCH (w:Wine),(g:GrapeType {name: "Riesling"})
  CREATE (w)-[r:grape_type]->(g)

This will result in a graph that’s fully fleshed out, like the one shown in the figure that follows.

images/neo4j-wa-graph3.png
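One caveat: CREATE always adds a new relationship. If the ice wine already had a grape_type relationship from earlier, the last statement above gives it a second one. If that matters for your data, a possible alternative is MERGE, which creates the pattern only when it doesn’t already exist. A sketch for the grape_type relationships:

 $ MATCH (w:Wine),(g:GrapeType {name: "Riesling"})
  MERGE (w)-[r:grape_type]->(g)

Running this repeatedly won’t add further relationships, though it also won’t remove duplicates that already exist.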

Schemaless Social

In addition to knowing about wines, wineries, and publications, we want our wine graph to have a social component. That is, we want to know about the people affiliated with these wines and their relationships with one another. To do that, we just need to add more nodes. Suppose that you want to add three people: two who don’t know each other and a third who is friends with both, each with their own wine preferences.

Alice has a bit of a sweet tooth so she’s a big fan of ice wine.

 $ CREATE (p:Person {name: "Alice"})
 $ MATCH (p:Person {name: "Alice"}),
  (w:Wine {name: "Prancing Wolf", style: "ice wine"})
  CREATE (p)-[r:likes]->(w)

Tom likes Kabinett and ice wine and trusts anything written by Wine Expert Monthly.

 $ CREATE (p: Person {name: "Tom"})
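 $ MATCH (p:Person {name: "Tom"}),
  (w:Wine {name: "Prancing Wolf", style: "Kabinett"})
  CREATE (p)-[r:likes]->(w)
 $ MATCH (p:Person {name: "Tom"}),
  (w:Wine {name: "Prancing Wolf", style: "ice wine"})
  CREATE (p)-[r:likes]->(w)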
 $ MATCH (p:Person {name: "Tom"}),
  (pub:Publication {name: "Wine Expert Monthly"})
  CREATE (p)-[r:trusts]->(pub)

Patty is friends with both Tom and Alice but is new to wine and has yet to choose any favorites.

 $ CREATE (p:Person {name: "Patty"})
 $ MATCH (p1:Person {name: "Patty"}),
  (p2:Person {name: "Tom"})
  CREATE (p1)-[r:friends]->(p2)
 $ MATCH (p1:Person {name: "Patty"}),
  (p2:Person {name: "Alice"})
  CREATE (p1)-[r:friends]->(p2)

Note that without changing any fundamental structure of our existing graph, we were able to superimpose behavior beyond our original intent. The new nodes are related, as you can see in the following figure.

images/neo4j-wa-graph4.png
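The same goes for queries: nothing special is needed to ask about the new nodes. As a small read-only sketch, this lists who likes which wine style:

 $ MATCH (p:Person)-[:likes]->(w:Wine)
  RETURN p.name, w.style;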

Stepping Stones

Thus far, we’ve mostly been performing simple, almost CRUD-like operations using Cypher. You can do a lot with these simple commands, but let’s dive in and see what else Cypher has to offer. First, let’s explore Cypher’s syntax for querying all relationships that a node has with a specific type of node. The --> operator lets us do that. First, let’s see all nodes associated with Alice:

 $ MATCH (p:Person {name: "Alice"})-->(n)
  RETURN n;

Now let’s see all of the people that Alice is friends with, except let’s return only the name property of those nodes. Because the friends relationship was created from Patty to Alice, we match it without regard to direction by using -[:friends]- rather than -->:

 $ MATCH (p:Person {name: "Alice"})-[:friends]-(other:Person)
  RETURN other.name;

That should result in one returned value: Patty. Now let’s say that we want to see which nodes with the label Person are in the graph, but excluding Patty (boo, Patty!). Note the <> operator, which is used instead of != in Cypher:

 $ MATCH (p:Person)
  WHERE p.name <> 'Patty'
  RETURN p;
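WHERE clauses can combine several conditions with AND, OR, and NOT, along with the usual comparison operators. As an illustrative sketch (the cutoff year is arbitrary), this finds wines from 2010 or later that aren’t Kabinetts:

 $ MATCH (w:Wine)
  WHERE w.vintage >= 2010 AND w.style <> "Kabinett"
  RETURN w.name, w.style, w.vintage;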

Thus far, all of our queries have sought out nodes adjacent to one another. But we also said at the beginning of the chapter that Neo4j is an extremely scalable database capable of storing tons of nodes and relationships. Cypher is absolutely up to the task of dealing with far more complex relationships than the ones we’ve seen thus far. Let’s add some nodes that aren’t directly related to Patty (for Alice’s friend Ahmed and Tom’s friend Kofi) and then query for a relationship.

 $ CREATE (p1:Person {name: "Ahmed"}), (p2:Person {name: "Kofi"});
 $ MATCH (p1:Person {name: "Ahmed"}),(p2:Person {name: "Alice"})
  CREATE (p1)-[r:friends]->(p2);
 $ MATCH (p1:Person {name: "Kofi"}),(p2:Person {name: "Tom"})
  CREATE (p1)-[r:friends]->(p2);

Cypher lets us query for friends of friends of Patty like this:

 $ MATCH
  (fof:Person)-[:friends]-(f:Person)-[:friends]-(p:Person {name: "Patty"})
  RETURN fof.name;

As expected, this returns two values: Ahmed and Kofi.
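For longer chains, spelling out every hop gets unwieldy. Cypher also supports variable-length patterns, where *2 after the relationship type means exactly two hops. A sketch of the same friends-of-friends query in that form:

 $ MATCH (p:Person {name: "Patty"})-[:friends*2]-(fof:Person)
  RETURN DISTINCT fof.name;

A range such as *1..3 would match anything from one to three hops away.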

Indexes, Constraints, and "Schemas" in Cypher

Neo4j doesn’t enable you to enforce hard schemas the way that relational databases do, but it does enable you to provide some structure to nodes in your graphs by creating indexes and constraints for specified labels.

As with many other databases in this book, you can provide a nice speed-up for computationally expensive queries by creating indexes on labels and properties associated with that label. Remember that each Wine in our graph has a name property. You can create an index on that label/property combination like this:

 $ CREATE INDEX ON :Wine(name);

You can easily remove indexes at any time:

 $ DROP INDEX ON :Wine(name);

Indexes are super easy to use in Neo4j because you don’t really have to do much to use them. Once you’ve established an index for nodes with a specific label and property, you can continue to query those nodes as you did before, and Neo4j will figure out the rest. This query, which returns all Wine nodes with a given name, would look exactly the same before and after creating an index on Wine/name:

 $ MATCH (w:Wine {name: 'Some Name'})
  RETURN w;
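If you’re curious whether a query will actually use an index, one option is to prefix it with EXPLAIN, which shows the query plan without running the query. This is a sketch; the exact operators shown in the plan vary by Neo4j version:

 $ EXPLAIN MATCH (w:Wine {name: 'Some Name'})
  RETURN w;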

While indexes can help speed up queries, constraints can help you sanitize your data inputs by preventing writes that don’t satisfy criteria that you specify. If you wanted to ensure that every Wine node in your graph had a unique name, for example, you could create this constraint:

 $ CREATE CONSTRAINT ON (w:Wine) ASSERT w.name IS UNIQUE;

Now, if you try to create two Wine nodes with the same name, you’ll get an error:

 $ CREATE (w:Wine {name: "Daring Goat", style: "Spätlese", vintage: 2008});
 $ CREATE (w:Wine {name: "Daring Goat", style: "Riesling", vintage: 2006});
 WARNING: Node 219904 already exists...

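Even better, when you create a constraint, Neo4j will automatically check your existing data to make sure that all nodes with the given label conform to the constraint; if any don’t, the constraint isn’t created. (In our wine graph, the three Prancing Wolf wines share a name, so you’d have to resolve that before this particular constraint could be applied.) Like indexes, constraints can be removed using a DROP statement, though make sure to include the entire constraint statement: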

 $ DROP CONSTRAINT ON (w:Wine) ASSERT w.name IS UNIQUE;

Keep in mind that you cannot apply a constraint to a label/property pair that already has an index, and if you do create a constraint on a specific label/property pair, an index will be created automatically. So you’ll usually need to explicitly create either a constraint or an index, not both.
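Because a uniqueness constraint can only be created when the existing data already satisfies it, it can be useful to look for duplicates first. A read-only sketch that lists any Wine names used by more than one node:

 $ MATCH (w:Wine)
  WITH w.name AS name, count(*) AS occurrences
  WHERE occurrences > 1
  RETURN name, occurrences;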

If you want to see the status of a label’s “schema,” you can see that information in the shell:

 $ schema ls -l :Wine
 Indexes
  ON :Wine(name) ONLINE (for uniqueness constraint)
 
 Constraints
  ON (wine:Wine) ASSERT wine.name IS UNIQUE

Although Neo4j isn’t fundamentally schema-driven the way that relational databases are, indexes and constraints will help keep your queries nice and fast and your graph sane. They are an absolute must if you want to run Neo4j in production.

Day 1 Wrap-Up

Today we began digging into the graph database Neo4j and the Cypher querying language, and what a different beast we’ve encountered! Although we didn’t cover specific design patterns per se, our brains are now buzzing with the strange and beautiful possibilities opened up by the graph database worldview. Remember that if you can draw it on a whiteboard, you can store it in a graph database.

Day 1 Homework

Find

  1. Browse through the Neo4j docs at https://neo4j.com/docs and read more about Cypher syntax. Find some Cypher features mentioned in those docs that we didn’t have a chance to use here and pick your favorite.
  2. Experiment with an example graph consisting of movie-related data by going back to the browser console at http://localhost:7474/browser, typing :play movie-graph into the web console, and following the instructions.

Do

  1. Create a simple graph describing some of your closest friends, your relationships with them, and even some relationships between your friends. Start with three nodes, including one for yourself, and create five relationships.