Community detection through label propagation

In the following section, we are going to implement a community detection algorithm using the Pregel interface. Label Propagation Algorithm (LPA) is a simple and fast method for detecting communities within graphs. By construction, the communities obtained by the label propagation process require each node to have at least as many neighbors within its community as it has with each of the other communities.

Let's quickly describe how the LPA works. First, each node is initially given its vertex ID as its label. At the subsequent iterations, each node determines its community, based on the labels of its neighbors. Specifically, the node chooses to join the community to which the maximum number of its neighbors belong to. If there is a tie, one of the majority labels is picked randomly. As we propagate the labels in this way across the graph, most labels will disappear, whereas the remaining ones define the communities. Ideally, this iterative algorithm converges when no node in the network changes its label. As a result, nodes having the same labels are grouped together as one community.

By implementing this algorithm in Pregel, we want to obtain a graph in which the vertex attributes are the labels of the community affiliations. Hence, we'll first initialize the LPA graph by setting the label of each vertex to its identifier:

val lpaGraph = graph.mapVertices { case (vid, _) => vid }

Next, we'll define the type of message to Map[Label, Long], which associates a community label to the number of neighbors that have this label. The initial message that will be sent to each node is simply an empty map:

type Label = VertexId
val initialMessage = Map[Label, Long]()

Following the Pregel programming model, we define a sendMsg function, which is used by each node to inform its neighbors of its current label. For each triplet, the source node will receive the destination node's label, and vice versa:

def sendMsg(e: EdgeTriplet[Label, ED]): Iterator[(VertexId, Map[Label, Long])] = 
    Iterator((e.srcId, Map(e.dstAttr -> 1L)), (e.dstId, Map(e.srcAttr -> 1L)))

After receiving the messages from its neighbors, a node determines its community label as the one to which the majority of its neighbors currently belong to. Hence, each node will use the following vertex program function to do so:

def vprog(vid: VertexId, attr: Long, message: Map[Label, Long]): VertexId = if (message.isEmpty) attr else message.maxBy(_._2)._1

The previous function returns, in each iteration, the label (that is, a VertexId attribute) of the community to which the majority of its neighbors currently belong to.

We also need a mergeMsg function to combine all the messages, received by a node from its neighbors into a single map. If both the messages contain the same label, we simply sum up the corresponding number of neighbors for this label:

def mergeMsg(count1: Map[Label, Long], count2: Map[Label, Long])
  : Map[VertexId, Long] = {
  (count1.keySet ++ count2.keySet).map { i =>
    val count1Val = count1.getOrElse(i, 0L)
    val count2Val = count2.getOrElse(i, 0L)
    i -> (count1Val + count2Val)
  }.toMap
}

Finally, we can run the LPA algorithm as we did for equalizing the social wealth by calling the pregel method on the graph:

lpaGraph.pregel(initialMessage, 50)(vprog, sendMsg, mergeMsg)

Note

The main benefits of LPA are its simplicity and time efficiency. In fact, the number of iterations to convergence has been observed to be independent of the graph size whereas each iteration has a linear time complexity. Despite its advantages, the label propagation algorithm may not necessarily converge and it may also result in uninteresting solutions, such as each node being identified as a single community. Actually, the algorithm may oscillate for graphs that are bipartite or have a nearly bipartite structure.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset