As seen earlier, the VertexRDD is an RDD containing the vertices and their associated attributes. Each element in the RDD represents a vertex or node in the graph. To keep each vertex unique, we need a way of assigning a unique ID to each of the vertices. For this purpose, GraphX defines a very important identifier known as VertexId.
VertexId is declared simply as an alias for a 64-bit Long number:
type VertexId = Long
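Because VertexId is only a type alias, a plain Long can be used wherever a VertexId is expected, with no conversion. The following minimal sketch illustrates this; it redeclares the alias locally (mirroring the GraphX declaration) so that it runs without Spark on the classpath, and the object name and the describe helper are hypothetical names introduced only for illustration.

```scala
object VertexIdAliasSketch {
  // Local stand-in for org.apache.spark.graphx.VertexId, declared the same way.
  type VertexId = Long

  // Hypothetical helper: a function declared over VertexId.
  def describe(id: VertexId): String = s"vertex $id"

  def main(args: Array[String]): Unit = {
    val raw: Long = 42L
    val id: VertexId = raw // no conversion needed, the two types are identical
    println(describe(id))
    println(describe(7L)) // a Long literal is accepted directly
  }
}
```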
The VertexRDD extends an RDD of pairs of VertexId and vertex attribute, represented by RDD[(VertexId, VD)]. It also ensures that there is only one entry for each vertex and pre-indexes the entries for fast, efficient joins. Two VertexRDDs with the same index can be joined efficiently.
class VertexRDD[VD]() extends RDD[(VertexId, VD)]
VertexRDD also implements many functions that support graph operations, such as filter, mapValues, and innerJoin. Each function typically takes vertices represented as a VertexRDD as input.
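As a rough illustration of the shared index and the join functions described above, the following sketch builds a small VertexRDD, derives a second one from it with mapValues (which keeps the same index), and joins the two with innerJoin. It is a minimal sketch, not a definitive implementation: the object name, the sample data, and the local[*] master are assumptions chosen so that it can run as a standalone program outside the Spark shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object VertexRddJoinSketch {
  def main(args: Array[String]): Unit = {
    // Assumed local setup; in the Spark shell, sc already exists.
    val sc = new SparkContext(
      new SparkConf().setAppName("VertexRddJoinSketch").setMaster("local[*]"))
    try {
      // Illustrative sample data, not taken from users.txt.
      val names: RDD[(VertexId, String)] =
        sc.parallelize(Seq((1L, "John"), (2L, "Mark"), (3L, "Sam")))

      // Build a standalone VertexRDD from the pair RDD.
      val nameVertices: VertexRDD[String] = VertexRDD(names)

      // mapValues transforms the attributes and preserves the index.
      val nameLengths: VertexRDD[Int] =
        nameVertices.mapValues((name: String) => name.length)

      // innerJoin combines the attributes of vertices present in both inputs.
      val joined: VertexRDD[String] =
        nameVertices.innerJoin(nameLengths) { (id, name, len) => s"$name ($len)" }

      joined.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```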
Let's load vertices into a VertexRDD of users. For this, we shall first declare a case class User as shown here:
case class User(name: String, occupation: String)
Now, using the file users.txt, we create the RDD of vertices. The file contains the following records:
VertexID | Name | Occupation |
1 | John | Accountant |
2 | Mark | Doctor |
3 | Sam | Lawyer |
4 | Liz | Doctor |
5 | Eric | Accountant |
6 | Beth | Accountant |
7 | Larry | Engineer |
8 | Mary | Cashier |
9 | Dan | Doctor |
10 | Ken | Librarian |
Each line of the file users.txt contains the VertexId, the Name, and the Occupation, so we can use the String split function to separate the fields (the code here splits on a comma):
scala> val users = sc.textFile("users.txt").map{ line =>
  val fields = line.split(",")
  (fields(0).toLong, User(fields(1), fields(2)))
}
users: org.apache.spark.rdd.RDD[(Long, User)] = MapPartitionsRDD[2645] at map at <console>:127
scala> users.take(10)
res103: Array[(Long, User)] = Array((1,User(John,Accountant)), (2,User(Mark,Doctor)), (3,User(Sam,Lawyer)), (4,User(Liz,Doctor)), (5,User(Eric,Accountant)), (6,User(Beth,Accountant)), (7,User(Larry,Engineer)), (8,User(Mary,Cashier)), (9,User(Dan,Doctor)), (10,User(Ken,Librarian)))
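Note that the result above is a plain RDD[(Long, User)]. To obtain an actual VertexRDD, the pair RDD can be wrapped with the VertexRDD companion object. The following is a minimal sketch under a few assumptions: the records are comma-separated, a few sample lines are inlined in place of reading users.txt, and parseUser and the object name are hypothetical names introduced for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

case class User(name: String, occupation: String)

object UsersVertexRddSketch {
  // Hypothetical helper: parses one comma-separated record, trimming stray spaces.
  def parseUser(line: String): (VertexId, User) = {
    val fields = line.split(",").map(_.trim)
    (fields(0).toLong, User(fields(1), fields(2)))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup; in the Spark shell, sc already exists.
    val sc = new SparkContext(
      new SparkConf().setAppName("UsersVertexRddSketch").setMaster("local[*]"))
    try {
      // Stand-in for sc.textFile("users.txt"), assuming comma-separated records.
      val lines = sc.parallelize(
        Seq("1,John,Accountant", "2,Mark,Doctor", "3,Sam,Lawyer"))
      val users: RDD[(VertexId, User)] = lines.map(parseUser)

      // Wrap the pair RDD; duplicate vertex IDs, if any, are dropped arbitrarily.
      val userVertices: VertexRDD[User] = VertexRDD(users)

      // filter on a VertexRDD returns another VertexRDD.
      val doctors: VertexRDD[User] =
        userVertices.filter { case (_, user) => user.occupation == "Doctor" }
      doctors.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In the Spark shell session above, the equivalent step would simply be VertexRDD(users), since users already has the required pair type.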