Getting ready

Let's build a small dataset of followers and the people they follow:

Follower    Followee
John        Barack
Pat         Barack
Gary        Barack
Chris       Mitt
Rob         Mitt

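Our goal is to find out how many followers each node has. Let's store this data in two files: nodes.csv, which assigns a numeric ID to each person, and edges.csv, which lists one relationship per line as follower ID, followee ID, and a label.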

The following is the content of nodes.csv:

1,Barack
2,John
3,Pat
4,Gary
5,Mitt
6,Chris
7,Rob

The following is the content of edges.csv:

2,1,follows
3,1,follows
4,1,follows
6,5,follows
7,5,follows
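
Before loading anything into a cluster, it can help to know what answer to expect. The following is a minimal local sketch in plain Python, not the method this recipe goes on to use; the function name follower_counts and the inline copies of the two files are assumptions made for illustration. It counts, for each node, how many edges point at it:

```python
import csv
import io
from collections import Counter

# Inline copies of nodes.csv and edges.csv as listed above.
# No HDFS or Spark is involved; this is only a local sanity check.
NODES_CSV = """1,Barack
2,John
3,Pat
4,Gary
5,Mitt
6,Chris
7,Rob
"""

EDGES_CSV = """2,1,follows
3,1,follows
4,1,follows
6,5,follows
7,5,follows
"""


def follower_counts(nodes_text, edges_text):
    """Return {name: number of followers} for every node, zeros included."""
    # Map each numeric ID to its name; strip() guards against stray spaces.
    names = {}
    for row in csv.reader(io.StringIO(nodes_text)):
        if row:
            names[int(row[0])] = row[1].strip()

    # Each edge is (follower ID, followee ID, label); count by followee.
    counts = Counter()
    for row in csv.reader(io.StringIO(edges_text)):
        if not row:
            continue
        _follower, followee, label = row
        if label.strip() == "follows":
            counts[int(followee)] += 1

    # A Counter returns 0 for IDs that never appear as a followee.
    return {name: counts[node_id] for node_id, name in names.items()}


counts = follower_counts(NODES_CSV, EDGES_CSV)
for name, n in counts.items():
    print(name, n)
```

For this dataset, Barack should end up with 3 followers and Mitt with 2, while every other node has none.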

You can load the files into HDFS using the following commands:

$ hdfs dfs -mkdir -p data/na
$ hdfs dfs -put nodes.csv data/na/nodes.csv
$ hdfs dfs -put edges.csv data/na/edges.csv