Computing the degrees of the network nodes

We now explore the three graphs and introduce an important property of a network node: its degree.

The degree of a node is the number of links it has to other nodes. In a directed graph, we distinguish between the in-degree of a node, which is the number of its incoming links, and its out-degree, which is the number of its outgoing links (the nodes that it points to). In the following sections, we will explore the degree distributions of the three example networks.
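To make the two definitions concrete, here is a minimal sketch on a plain Scala collection rather than a GraphX graph. The edge list is hypothetical toy data, and the names `edges`, `outDegree`, and `inDegree` are chosen only for illustration.

```scala
// Toy directed edge list (hypothetical data): each pair is (source, destination).
val edges: Seq[(Long, Long)] = Seq((1L, 2L), (1L, 3L), (2L, 3L), (3L, 1L))

// Out-degree: count the edges leaving each node.
val outDegree: Map[Long, Int] =
  edges.groupBy(_._1).map { case (node, es) => (node, es.size) }

// In-degree: count the edges arriving at each node.
val inDegree: Map[Long, Int] =
  edges.groupBy(_._2).map { case (node, es) => (node, es.size) }
```

In this toy graph, node 1 has an out-degree of 2 and node 3 has an in-degree of 2. A node with no links in a given direction would simply be absent from the corresponding map.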

In-degree and out-degree of the Enron email network

For the Enron email network, we can confirm that there are roughly ten times more links than nodes:

scala> emailGraph.numEdges
res: Long = 367662

scala> emailGraph.numVertices
res: Long = 36692

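Because the email graph is bi-directed, the in-degree and out-degree of each employee are exactly the same in this example. The average degrees are consistent with this and match the ratio of roughly ten links per node. Equal averages are only a sanity check, though, since the total in-degree and the total out-degree both equal the number of edges in any directed graph: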

scala> emailGraph.inDegrees.map(_._2).sum / emailGraph.numVertices
res: Double = 10.020222391802028

scala> emailGraph.outDegrees.map(_._2).sum / emailGraph.numVertices
res: Double = 10.020222391802028
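As a quick cross-check, both averages should equal the number of edges divided by the number of vertices, because every edge adds one to some node's in-degree and one to some node's out-degree. The sketch below redoes that arithmetic with the counts reported above; the variable names are chosen for illustration.

```scala
// Counts reported above for the Enron email graph.
val numEdges: Long = 367662L
val numVertices: Long = 36692L

// Each edge adds 1 to one in-degree and 1 to one out-degree,
// so the average in-degree and out-degree both equal this ratio.
val averageDegree: Double = numEdges.toDouble / numVertices
```

The result is about 10.02, in line with the averages computed from `inDegrees` and `outDegrees`.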

If we want to find the person who has e-mailed the largest number of people, we can define and use the following max function:

def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
  if (a._2 > b._2) a else b
}

Let's see the output:

scala> emailGraph.outDegrees.reduce(max)
res: (org.apache.spark.graphx.VertexId, Int) = (5038,1383)

This person could be an executive or an employee acting as a hub in the organization. Similarly, we can define a min function to find the people with the fewest connections. Now, let's check whether some employees at Enron are nearly isolated, using the following code:

scala> emailGraph.outDegrees.filter(_._2 <= 1).count
res83: Long = 11211

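It seems that there are many employees who exchange e-mails with only one other employee, perhaps their boss or the human resources department. Because the graph is bi-directed, the out-degree counted here equals the in-degree. Note also that outDegrees only contains vertices with at least one outgoing link, so the filter effectively selects the employees whose out-degree is exactly 1.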

Degrees in the bipartite food network

For the bipartite ingredient-compound graph, we can also explore which food has the largest number of compounds, or which compound is the most prevalent in our list of ingredients:

scala> foodNetwork.outDegrees.reduce(max)
res: (org.apache.spark.graphx.VertexId, Int) = (908,239)

scala> foodNetwork.vertices.filter(_._1 == 908).collect()
res: Array[(org.apache.spark.graphx.VertexId, FNNode)] = Array((908,Ingredient(black_tea,plant derivative)))

scala> foodNetwork.inDegrees.reduce(max)
res: (org.apache.spark.graphx.VertexId, Int) = (10292,299)

scala> foodNetwork.vertices.filter(_._1 == 10292).collect()
res: Array[(org.apache.spark.graphx.VertexId, FNNode)] = Array((10292,Compound(1-octanol,111-87-5)))

The answers to these two questions turn out to be black tea and the compound 1-octanol.

Degree histogram of the social ego networks

Similarly, we can compute the degrees of the nodes in the ego network. Let's look at the maximum and minimum degrees in the network:

scala> egoNetwork.degrees.reduce(max)
res91: (org.apache.spark.graphx.VertexId, Int) = (1643293729,1084)

scala> egoNetwork.degrees.reduce(min)
res92: (org.apache.spark.graphx.VertexId, Int) = (550756674,1)
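The min function used here is not shown in the text. A minimal sketch, assuming it simply mirrors the max function defined earlier, could look like this. The `sample` data is hypothetical, and the `VertexId` alias is redefined only so that the snippet runs without Spark.

```scala
// In GraphX, VertexId is an alias for Long; it is redefined here only so the
// sketch runs without Spark. In spark-shell, import
// org.apache.spark.graphx.VertexId instead of declaring this alias.
type VertexId = Long

// Return whichever (vertex, degree) pair has the smaller degree.
def min(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
  if (a._2 < b._2) a else b
}

// Hypothetical (vertex, degree) pairs, reduced the same way an RDD would be.
val sample: Seq[(VertexId, Int)] = Seq((10L, 4), (11L, 1), (12L, 7))
val smallest = sample.reduce(min)
```

When several vertices share the minimum degree, which one is returned depends on the order in which pairs are combined, so on a distributed RDD the reported vertex may vary.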

Suppose that we now want the histogram data of the degrees. We can write the following code to do just that:

egoNetwork.degrees.
  map(t => (t._2,t._1)).
  groupByKey.map(t => (t._1,t._2.size)).
  sortBy(_._1).collect()

res: Array[(Int, Int)] = Array((1,15), (2,19), (3,12), (4,17), (5,11), (6,19), (7,14), (8,9), (9,8), (10,10), (11,1), (12,9), (13,6), (14,7), (15,8), (16,6), (17,5), (18,5), (19,7), (20,6), (21,8), (22,5), (23,8), (24,1), (25,2), (26,5), (27,8), (28,4), (29,6), (30,7), (31,5), (32,10), (33,6), (34,10), (35,5), (36,9), (37,7), (38,8), (39,5), (40,4), (41,3), (42,1), (43,3), (44,5), (45,7), (46,6), (47,3), (48,6), (49,1), (50,9), (51,5),...
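Each pair in this output reads as (degree, number of vertices with that degree). The same grouping logic can be sketched on a plain Scala collection, which also makes it easy to normalize the counts into a degree distribution. The `degrees` data below is hypothetical and merely stands in for `egoNetwork.degrees`.

```scala
// Hypothetical (vertexId, degree) pairs standing in for egoNetwork.degrees.
val degrees: Seq[(Long, Int)] =
  Seq((1L, 2), (2L, 1), (3L, 2), (4L, 3), (5L, 1), (6L, 2))

// Group vertices by degree, count each group, and sort by degree.
val histogram: Seq[(Int, Int)] =
  degrees.groupBy(_._2).map { case (degree, vs) => (degree, vs.size) }.toSeq.sortBy(_._1)

// Normalize counts into fractions of all vertices (the degree distribution).
val distribution: Seq[(Int, Double)] =
  histogram.map { case (degree, count) => (degree, count.toDouble / degrees.size) }
```

On this toy input, two vertices have degree 1, three have degree 2, and one has degree 3, so half of the vertices have degree 2.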