Useful code snippets

Storing data in Neo4j and operating on it are well defined and well documented. When it comes to analyzing that data, however, it is often much easier for data scientists to get it out of the database in a raw format, such as CSV or JSON, so that it can be viewed and analyzed in batches or as a whole.

Importing data to Neo4j

Cypher can be used to create graphs, or to add data to your existing graphs, from common data formats such as CSV. Cypher uses the LOAD CSV command to parse CSV data into a form that can be incorporated into a Neo4j graph. In this section, we demonstrate this functionality with the help of an example.

We have three CSV files: one contains players, the second has a list of games, and the third has a list of which of these players played in each game. You can access the CSV files by keeping them on the Neo4j server and using file://, or by using FTP, HTTP, or HTTPS for remote access to the data.

Let's consider sample data about cricketers (players) and the matches (games) that they played in. The players.csv file would look like this:

id,name
1,Adam Gilchrist
2,Sachin Tendulkar
3,Jonty Rhodes
4,Don Bradman
5,Brian Lara

You can now load the CSV data into Neo4j and create nodes from it using the following commands. Because of WITH HEADERS, the header row supplies the field names (LineOfCsv.id, LineOfCsv.name), and every subsequent line becomes one node with the Person label:

LOAD CSV WITH HEADERS FROM "http://192.168.0.1/data/players.csv" AS LineOfCsv 
CREATE (p:Person { id: toInt(LineOfCsv.id), name: LineOfCsv.name })
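
To make the role of the header row concrete, here is a rough Python sketch of what the query above does conceptually. It is an illustration only, not how Neo4j implements LOAD CSV; the function name load_players and the dictionary layout are assumptions made for this example.

```python
import csv
import io

# Sample data copied from the players file shown above.
PLAYERS_CSV = """id,name
1,Adam Gilchrist
2,Sachin Tendulkar
3,Jonty Rhodes
4,Don Bradman
5,Brian Lara
"""

def load_players(text):
    """Mimic LOAD CSV WITH HEADERS: one record per data line, keyed by header."""
    nodes = []
    for line_of_csv in csv.DictReader(io.StringIO(text)):
        # Every CSV value arrives as a string, hence the explicit int()
        # conversion, which plays the role of toInt() in the Cypher query.
        nodes.append({
            "label": "Person",
            "id": int(line_of_csv["id"]),
            "name": line_of_csv["name"],
        })
    return nodes

players = load_players(PLAYERS_CSV)
```

The header names never become labels; they only name the fields that the query reads from each line.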

Now, let's load the games.csv file. Each line of the game data holds the ID, the name of the game, the nation it was played in, and the year of the game:

id,game,nation,year
1,Ashes,Australia,1987
2,Asia Cup,India,1999
3,World Cup,London,2000

The query to import this data also creates a nation node (if one does not already exist) and relates the game to that nation:

LOAD CSV WITH HEADERS FROM "http://192.168.0.1/data/games.csv" AS LineOfCsv
MERGE (nation:Nation { name: LineOfCsv.nation })
CREATE (game:Game { id: toInt(LineOfCsv.id), game: LineOfCsv.game, year:toInt(LineOfCsv.year)})
CREATE (game)-[:PLAYED_IN]->(nation)
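
The query above mixes MERGE and CREATE, and the difference matters when a nation appears more than once or the import is repeated. The following Python sketch imitates that difference with plain dictionaries and lists; the function import_games and its data structures are assumptions for illustration, not Neo4j internals.

```python
import csv
import io

# Sample data copied from the games file shown above.
GAMES_CSV = """id,game,nation,year
1,Ashes,Australia,1987
2,Asia Cup,India,1999
3,World Cup,London,2000
"""

def import_games(text, nations, games, played_in):
    for row in csv.DictReader(io.StringIO(text)):
        # MERGE: look the nation up by name and create it only if missing.
        nation = nations.setdefault(row["nation"], {"name": row["nation"]})
        # CREATE: always adds a new game node, even if an equal one exists.
        game = {
            "id": int(row["id"]),
            "game": row["game"],
            "year": int(row["year"]),
        }
        games.append(game)
        # CREATE (game)-[:PLAYED_IN]->(nation)
        played_in.append((game["id"], nation["name"]))

nations, games, played_in = {}, [], []
import_games(GAMES_CSV, nations, games, played_in)
import_games(GAMES_CSV, nations, games, played_in)  # second run on purpose

print(len(nations), len(games))  # nations are reused, games are duplicated
```

Running the import twice in this sketch leaves three nations but six games, which is why repeated CREATE imports are usually guarded by a uniqueness constraint.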

Next, we import the relationship data between the players and the games to complete the graph. The association is many-to-many, since a game involves many players and a player has played in many games; hence, the relationship data is stored separately. The user-defined id property must be unique among the players and among the games, so that each line of the relationship file matches exactly one node of each kind. A uniqueness constraint enforces this and also creates an index, which speeds up the lookups. Hence, we add a constraint on the id property for both of the previous imports:

CREATE CONSTRAINT ON (person:Person) ASSERT person.id IS UNIQUE
CREATE CONSTRAINT ON (game:Game) ASSERT game.id IS UNIQUE

To import the relationships, we read a line from the CSV file, find the IDs in players and games, and create a relationship between them:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "http://path/to/your/csv/file.csv" AS csvLine
MATCH (player:Person { id: toInt(csvLine.playerId)}), (game:Game { id: toInt(csvLine.gameId)})
CREATE (player)-[:PLAYED {role: csvLine.role }]->(game)

The CSV file used by a snippet such as the previous one will vary according to the dataset and the operations at hand. A basic version is shown here:

playerId,gameId,role
1,1,Batsman
4,1,WicketKeeper
2,1,Batsman
4,2,Bowler
2,2,Bowler
5,3,All-Rounder
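
The relationship import is essentially a join: each line is resolved against the players and the games by id, and a relationship is created between the two matches. A minimal Python sketch of that idea follows, using the sample data from this section; the dictionaries keyed by id stand in for the unique id lookup, and the function name link_players_to_games is an assumption for this example.

```python
import csv
import io

# Sample data copied from the relationship file shown above.
ROLES_CSV = """playerId,gameId,role
1,1,Batsman
4,1,WicketKeeper
2,1,Batsman
4,2,Bowler
2,2,Bowler
5,3,All-Rounder
"""

# Nodes keyed by their id; uniqueness of the id is what makes this lookup safe.
players = {
    1: "Adam Gilchrist",
    2: "Sachin Tendulkar",
    3: "Jonty Rhodes",
    4: "Don Bradman",
    5: "Brian Lara",
}
games = {1: "Ashes", 2: "Asia Cup", 3: "World Cup"}

def link_players_to_games(text, players, games):
    relationships = []
    for csv_line in csv.DictReader(io.StringIO(text)):
        player_id = int(csv_line["playerId"])
        game_id = int(csv_line["gameId"])
        # A MATCH that finds no nodes produces no rows, so lines that refer
        # to a missing player or game are skipped rather than raising an error.
        if player_id in players and game_id in games:
            relationships.append(
                (players[player_id], "PLAYED", games[game_id], csv_line["role"])
            )
    return relationships

played = link_players_to_games(ROLES_CSV, players, games)
```

Note that a player who appears in no line of the file, such as Jonty Rhodes here, simply ends up without any PLAYED relationship.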

In the preceding query, PERIODIC COMMIT tells Neo4j that the query may build up an inordinate amount of transaction state and should therefore be committed to the database periodically instead of once at the end. Your graph is now ready. To improve efficiency, you can drop the constraints on the id fields and also remove the fields themselves from the nodes, since they were only needed for the creation of the graph.
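
The effect of committing periodically can be pictured as processing the input in fixed-size batches and committing after each one, so that pending state never grows beyond one batch. The Python sketch below shows only that batching idea; the batch size of 1000 and the function names are illustrative assumptions, not a description of Neo4j's internals.

```python
from itertools import islice

def in_batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def import_with_periodic_commit(rows, batch_size):
    committed = []  # stands in for the database
    commits = 0
    for batch in in_batches(rows, batch_size):
        committed.extend(batch)  # the work held in the current transaction
        commits += 1             # one commit per batch, not one at the end
    return committed, commits

rows = list(range(2500))
stored, commits = import_with_periodic_commit(rows, 1000)
print(commits)  # prints 3
```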

Exporting data from Neo4j

Neo4j has no built-in format for exporting data. For relocation or backup, you can copy the database store itself (graph.db), which is located under the data directory of your Neo4j installation.

Cypher query results are returned as JSON documents, so we can export them directly by using curl to query Neo4j with Cypher. A sample query format is as follows:

curl -o output.json -H accept:application/json -H content-type:application/json --data '{"query" : "your_query_here" }' http://127.0.0.1:7474/db/data/cypher
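
When the Cypher query itself contains quotes, hand-writing the JSON body for curl becomes error-prone. As a hedged alternative, the same request can be assembled with Python's standard library, which takes care of the escaping. The sketch below only builds the request against the endpoint used in the curl example; the function name build_cypher_request and the sample query are assumptions, and sending the request needs a running server.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
CYPHER_ENDPOINT = "http://127.0.0.1:7474/db/data/cypher"

def build_cypher_request(query, url=CYPHER_ENDPOINT):
    # json.dumps escapes any quotes inside the query string.
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )

query = 'MATCH (p:Person { name: "Don Bradman" }) RETURN p'
request = build_cypher_request(query)

# To send it and save the result (requires a reachable Neo4j server):
# with urllib.request.urlopen(request) as response:
#     with open("output.json", "wb") as out:
#         out.write(response.read())
```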

You can also use the structr graph application platform (http://structr.org) to export data in the CSV format. The following curl format is used to export all the nodes in the graph:

curl http://127.0.0.1:7474/structr/csv/node_interfaces/export

To export relationships using the structr interface, the following commands are used:

curl http://127.0.0.1:7474/structr/csv/node_interfaces/out
curl http://127.0.0.1:7474/structr/csv/node_interfaces/in

These export the outgoing and incoming relationships in the graph. Although the two result sets overlap, both are needed, along with the nodes, in order to reconstruct the graph. A lot more can be done with structr than exporting data; you can find the details on its official website. Apart from the previously mentioned techniques, you can always use the Java API for retrieval: read the data entity by entity, transform it into your required format (CSV or JSON), and write it to a file.
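
The transform step of that last approach is independent of how the entities were read. The following Python sketch shows one way it might look once the entities are available as plain records; it does not use the Neo4j Java API, and the sample records and function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical entities as they might look after being read from the graph.
entities = [
    {"id": 1, "name": "Adam Gilchrist"},
    {"id": 2, "name": "Sachin Tendulkar"},
]

def to_csv(records, fieldnames):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def to_json(records):
    """Serialize records to a JSON array."""
    return json.dumps(records, indent=2)

csv_text = to_csv(entities, ["id", "name"])
json_text = to_json(entities)

# Either string can then be written out, for example:
# with open("players_export.csv", "w", newline="") as out:
#     out.write(csv_text)
```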
