Gremlin – an overview

Gremlin is a graph traversal language built as a domain-specific layer on top of Groovy. It provides constructs that make graph traversals concise to write. It is an expressive language written by Marko Rodriguez, and it describes a traversal as a chain of connected operations over a graph. Gremlin can be considered Turing complete and has a simple, easy-to-understand syntax.

Note

Groovy is a powerful, optionally typed, and dynamic language, with static typing and static compilation capabilities for the Java platform aimed at multiplying developers' productivity thanks to a concise, familiar, and easy-to-learn syntax. It integrates smoothly with any Java program and immediately delivers to your application powerful features, including scripting capabilities, domain-specific language authoring, runtime and compile-time meta-programming, and functional programming. Check it out at http://groovy-lang.org/.

Gremlin integrates well with Neo4j since it was mainly designed for use with property graphs. Earlier versions of Neo4j sported a Gremlin console in the web interface shell, but the latest version does away with it. Gremlin is generally used from a REPL or a command line to run traversals on a graph interactively.

Let's browse through some useful queries in Gremlin for graph traversals.

You can set up the Gremlin REPL to test out the queries. Download the latest build from https://github.com/tinkerpop/gremlin/wiki/Downloads and follow the setup instructions given on the official website. To configure Gremlin for your Neo4j installation, first create a neo4j.groovy file that points to your neo4j/data/graph.db directory by adding the following lines:

// neo4j.groovy
import org.neo4j.kernel.EmbeddedReadOnlyGraphDatabase
db = new EmbeddedReadOnlyGraphDatabase('/path/to/neo4j/data/graph.db')
g = new Neo4jGraph(db)

When you start a new Gremlin REPL, you will need to load this file in order to use Gremlin commands with your Neo4j database:

$ cd /path/to/gremlin
$ ./gremlin.sh

         ,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> load neo4j.groovy
gremlin>

You can now try out some of the Gremlin commands described in the following points:

  • You can connect to an existing instance of a graph database such as Neo4j with the help of the following command at the Gremlin prompt:
    gremlin> g = new Neo4jGraph("/path/to/database")
    
  • If you want to view all the nodes or vertices and edges in the graph, you can use the following commands:
    gremlin>  g.V
    gremlin>  g.E
    
  • To get a particular vertex that has been indexed, type the following command. It returns the vertex whose name property is "Bill Gates". Since the index lookup returns an iterator, the >> operator is used to pop the next item from the iterator so that it can be assigned to the variable v:
    gremlin> v =  g.idx(T.v)[[name: "Bill Gates"]] >> 1
    ==>v[165]
    
  • To look at the properties of that vertex, use the following command:
    gremlin> v.map
    ==> name = Bill Gates
    ==> age = 60
    ==> designation = CEO
    ==> company = Microsoft
    

  • To view the outgoing edges from that vertex, use the following command. It prints every outbound edge of the vertex in a format that shows the edge ID, the IDs of the two vertices it connects, and the edge label:
    gremlin> v.outE
    ==> e[212][165-knows->180]
    
  • You can also write very simple queries to retrieve the node at the other end of a relationship based on its label in the following manner:
    gremlin> v.outE[[label:'knows']].inV.name
    ==> Steve Jobs
    
  • Gremlin also allows you to trace the path it takes to reach a particular result with the help of a built-in step. All you need to do is append .path to the end of the query whose path you want to view:
    gremlin> v.outE[[label:'knows']].inV.name.path
    ==> [v[165], e[212][165-knows->180], v[180], Steve Jobs]
    
  • Suppose we need the names of all the vertices that are known by the vertex with the ID 165 and that are more than 30 years old. Note that conditions in Gremlin statements are written as closures inside a pair of {}, as in Groovy:
    gremlin>  v.outE{it.label=='knows'}.inV{it.age > 30}.name
    
  • Finally, let's see how we can use collaborative filters on the vertex with the ID 165 to make calculations:
    gremlin> m = [:]
    gremlin> v.outE.inV.name.groupCount(m).back(2).loop(3){it.loops<4}
    gremlin> m.sort{a,b -> a.value <=> b.value}
    

The preceding statements first create a Groovy map called m. Next, we follow all the outgoing edges from v to the vertices at the other end of those edges and read their name property, which groupCount(m) tallies in m. Since a name has no outgoing edges, back(2) returns to the actual vertex, and loop(3) then steps back to repeat the traversal from that vertex for as long as the condition it.loops<4 holds. As a result, m maps each name reached during the looping to the number of times it was reached. The final statement sorts the entries of the map by that count.

So, Gremlin is quite interesting for quick tinkering with graph data and for constructing small but complex queries for analysis. However, since it is a Groovy wrapper for the Pipes framework, it leaves little scope for optimizations or abstractions.
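To see what the filtering and counting traversals in the list above compute, they can be mimicked in plain Python. This is only a sketch under stated assumptions: it is not Gremlin code and does not reproduce how Gremlin evaluates a pipeline. The toy graph, the helper names (out_edges, known_names, count_names_within), and the max_hops parameter are all hypothetical. Only the IDs 165, 180, and 212 and the two names are taken from the examples above, and the exact number of hops performed by the Gremlin loop is not modeled.

```python
from collections import Counter

# Hypothetical toy graph. Vertex 165, vertex 180 and edge 212 echo the text;
# every other vertex, edge and age value is made up for illustration.
vertices = {
    165: {"name": "Bill Gates", "age": 60},
    180: {"name": "Steve Jobs", "age": 61},
    201: {"name": "Alice", "age": 25},
    202: {"name": "Bob", "age": 41},
}

# Directed, labeled edges stored as (edge id, source, label, target).
edges = [
    (212, 165, "knows", 180),
    (213, 165, "knows", 201),
    (214, 180, "knows", 202),
    (215, 201, "knows", 202),
    (216, 165, "funds", 202),
]


def out_edges(vertex_id, label=None):
    """Edges leaving vertex_id, optionally restricted to a single label."""
    return [e for e in edges
            if e[1] == vertex_id and (label is None or e[2] == label)]


def known_names(vertex_id, min_age):
    """Names of vertices reached over 'knows' edges whose age exceeds min_age."""
    return [vertices[target]["name"]
            for (_eid, _src, _label, target) in out_edges(vertex_id, "knows")
            if vertices[target]["age"] > min_age]


def count_names_within(vertex_id, max_hops):
    """Count how often each name is reached when following out-edges
    for 1..max_hops hops, starting from vertex_id."""
    counts = Counter()
    frontier = [vertex_id]
    for _ in range(max_hops):
        # Move every traverser one hop along all of its outgoing edges.
        frontier = [e[3] for v in frontier for e in out_edges(v)]
        counts.update(vertices[v]["name"] for v in frontier)
    return counts


print(known_names(165, 30))  # ['Steve Jobs']

# Sort ascending by count, as the final Groovy statement does with the map m.
ranked = sorted(count_names_within(165, 3).items(), key=lambda kv: kv[1])
print(ranked)  # [('Steve Jobs', 1), ('Alice', 1), ('Bob', 3)]
```

In this toy graph, Bob ends up with the highest count because he is reachable both directly and through two different intermediate vertices. This is the kind of signal a collaborative filter ranks by.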
