Minimum-cost spanning tree

A Minimum Spanning Tree (MST) works on graphs with directed and weighted (non-negative costs) edges. Consider a graph G with n vertices. The spanning tree is a subgraph of graph G with all its n vertices connected to each other using n-1 edges. Thus, there is no possibility of a cycle with the subgraph. If the spanning tree does have a cycle, then it is advisable to remove any one edge, probably the one with the highest cost. The spanning tree with the least sum of edge weights is termed as a MST. It is widely used in applications such as laying of power cables across the city, connecting all houses using the least length of power cables. Here, the weight of each edge is the length of the cable, and the vertices are houses in the city. The most common algorithms to find the minimum cost spanning tree are Prim's algorithm and Kruskal's algorithm. Figure 8.11 shows the minimum cost spanning tree for an undirected-weighted graph.

Minimum-cost spanning tree

Figure 8.11: Illustration of an undirected graph (left) and its minimum-cost spanning tree (right)

Prim's algorithm

Prim's algorithm works on lines similar to Dijkstra's algorithm to find the least cost edges connecting all the vertices in the graph. In case of Dijkstra's algorithm, the selection of least cost edges depends primarily on the source vertex, whereas in case of Prim's algorithm, the least cost edge does not depend on any source vertex. Now let us understand the working of Prim's algorithm in detail. Consider a graph G with n vertices, all edges weighted with non-negative costs. Initially, select any vertex V from the graph to begin the traverse. Then, look for all its connected edges, and select the one with the least cost. Let us assume the next selected vertex based on least cost is W. Now, again look for all connected edges of V and W, and select the edge with the least cost. Now, add this new vertex with the V and W. Continue searching for least cost edges until all the vertices are traversed across the graph. The selected edges will then form a MST of the graph such that one can traverse from any vertex to any other vertex. Prim's algorithm works greedily, because at each step, the algorithm tries to select the new (unmarked) vertex, which has the least edge cost compared to all the edges connected to a certain marked vertex.

The algorithm's key purpose is to select least cost edges, but it does not care whether the selected vertices form a minimum-cost spanning tree or not. This trickles down to a question of whether the output of Prim's algorithm is actually a minimum-cost spanning tree? The proof for the output being a MST is an exercise question for the readers.

Now, let us understand the working of Prim's algorithm with an example graph shown in Figure 8.12. Initialize the algorithm with vertex A, and scan for all its connected edges. It leads to vertex D, as the connecting edge (A, D) has the least cost, 8. Now, assign edge (A, D) to MST, and scan for all other edges connected to vertices A and D. It leads to vertex E, as the connected edge (D, E) has the least cost, 10. Now, assign edge (D, E) to MST, and scan for all other edges connected to vertices A, D, and E. This leads to vertex C, as the connecting edge (A, C) has the least cost, 12. Now, assign edge (A, C) to MST, and scan for all other edges connected to vertices A, C, D, and E. This leads to vertex B, as the connecting edge (C, B) has the least cost, 15. Now, assign edge (C, B) to MST. Thus, a minimum-cost spanning tree is obtained with all the vertices connected using least cost edges.

The key difference in the implementation of Prim's algorithm and Dijkstra's algorithm is in the way the value of a vertex is updated over an extraction. In Dijkstra's algorithm, the distance value of each vertex is updated based on the value of the current vertex and the connecting edge value with the current vertex. However, in case of Prim's algorithm, the distance value of each vertex depends only on the edge value of the current connecting vertex. In the former approach, each vertex seeks closeness towards the source vertex, whereas in the latter approach, each vertex can seek closeness towards any vertex in the graph.

The following R code implements Prim's algorithm. It uses the same priority queue as that of Dijkstra's algorithm:

primMST <- function(Graph_ADT,vertices,n) 
{ 
  library(hashmap)  ## To create new hashmap instances 
  ## Initiate a new priority queue 
  priorityQueue <- PriorityQueueInit$new() 
   
  ## Initiate a hashmap to store shortest distance from source  
  ## vertex to every vertex 
  distanceMap <- hashmap(keys=vertices, values = rep(0,n)) 
   
  ## Initialise a list to store final MST result 
  MSTResult <- list() 
   
  # initialize priority queue with value of all vertices to infinity 
  for( i in vertices) priorityQueue$push(vertices[i],Inf) 
   
  ## begin with a random vertex 
  startVertex <<- vertices[sample(1:n, 1)] 
   
  ## Set the distance of startVertex as zero 
  priorityQueue$values[which(priorityQueue$keys==startVertex)] <- 0 
   
  ## Begin iteration till the all the vertices from priorityQueue  
  ## becomes empty 
  while(length(priorityQueue$keys) != 0){ 
     
    ## Extract vertex with minimum value from priority queue along  
    ## with its value 
    headVertex <- priorityQueue$extractMinVertex() 
     
    ## Assign the key of the head vertex as current vertex 
    currentVertex <- headVertex$key 
     
    ## Append distancemap with current key and its value 
    distanceMap[[currentVertex]] <- headVertex$value 
     
    # Check for all directly connected vertices for current vertex 
    for(conVert in getConVertex(graph,currentvertex)){ 
      ## get all the corresponding edge value 
      edgeValue <- getEdgeValue(graph,currentvertex,conVert) 
       
      ## Check priority queue contains the adjacent connected vertex     
      ## (conVert) or not 
      if(!priorityQueue$keys %in% conVert){ next } 
       
      ## Update the distance with the edge value 
      updDistance <- edgeValue 
       
      ## Check whether the value of the adjacent vertex 
      if(priorityQueue$values[which(priorityQueue$keys==conVert)]  
                                               > updDistance){ 
        priorityQueue$values[which(priorityQueue$keys==conVert)]  
                                               <- updDistance 
        MSTResult[[currentVertex]] <- conVert 
      } 
    } 
  } 
}

Kruskal's algorithm

Similar to Prim's algorithm, Kruskal's algorithm is also a greedy algorithm in which the algorithm greedily selects edges based on edge value to generate a MST. Initially, partition all the vertices into an equivalent number of |V| sets, each with an individual vertex. Then, select an edge with the least cost and combine the equivalent sets of the from and to vertices into a single set. Also add the edge into the MST. Continue the process of selecting the minimum edge until all the vertices combine into a single set. In case of combining two inequivalent sets, first find the sets containing the from and to vertices, and accordingly merge those two sets. If both, from and to vertices for a particular edge, lie in the same set, then ignore the edge and proceed ahead.

Let us understand the implementation of Kruskal's algorithm for the graph in Figure 8.9. First, split the five vertices into five different sets. Then, select edge (A, D), as it has the least weight among all the other edges. As A and D are in two different sets, combine them into a single set, and add the edge (A, D) into MST. Then, select edge (D, E), as it has the second least edge weight. As D and E are in two different sets, combine them into a single set, and add edge (D, E) into MST. Then, select edge (A, C), as it has the third least edge weight. As A and C are in two different sets, combine them into a single set, and add edge (A, C) into MST. Then, select edge (C, B), as it has the fourth least edge weight. As C and B are in two different sets, combine them into a single set, and add edge (C, B) into MST. Thus, all the vertices, A, B, C, D, and E, are present in a single set and the selected edges form a MST. Figure 8.12 illustrates the steps involved in Kruskal's algorithm for the graph in Figure 8.9.

Kruskal's algorithm

Figure 8.12: Illustration of Kruskal's algorithm on an example graph

The edges are processed in the order of edge weights using a priority queue reference class (kruskalArray). Here, pre-sorting of edge weights is not required, hence it reduces system runtime. In the kruskalArray reference class, the edges, along with their from and to vertices can be appended using the push function, and the edge with minimum weight can be extracted using the extractMinEdge function. Once the edge is extracted out, it is then removed from the array. The kruskalArray function is implemented using the R5 class, and it is as follows:

kruskalArray <- setRefClass("kruskalArray", 
  fields = list(fromVertex = "numeric", 
  toVertex = "numeric", 
  weight = "numeric"), 
  methods = list( 
    ## insert new from and to vertices along with edge 
    push = function(f, t, w){ 
      fromVertex <<- append(fromVertex,f) 
      toVertex <<- append(toVertex,t) 
      weight <<- append(weight,w) 
    }, 
    ## extract from and to vertices having minimum edge value 
    ## also remove from, to and edge value from the array 
    extractMinEdge = function() { 
      minPos <- which(weight==min(weight)) 
      from <- fromVertex[[minPos]] 
      to <- toVertex[[minPos]] 
      fromVertex <<- fromVertex[[-minPos]] 
      toVertex <<- toVertex[[-minPos]] 
      weight <<- weight[[-minPos]] 
      return(list(from=from,to=to)) 
    } 
))

Using the disjoinSetPointer function, the operations of union, differ, and find are undertaken. Two different sets of vertices are combined using the union operation, and the differ operation is used to check whether two sets are disjoint or not. In case of a set with more than one vertex, the find operation is used to check whether a vertex belongs to that set or not. The disjoinSetPointer function is implemented using the R5 class, and it is given as follows:

disjoinSetPointer <- setRefClass("disjoinSetPointer", 
  fields = list(vertex = "vector", 
  set1 = "vector", 
  set2 = "vector", 
  currentVertex = "integer"), 
  methods = list( 
    ## merge two sets 
    union = function(set1,set2){ 
      return(c(set1,set2)) 
    }, 
    ## check whether set1 and set 2 are disjoint 
    ## return TRUE if they are disjoint 
    differ = function(set1,set2){ 
    if(sum(set1 %in% set1) ==0){ 
      return(TRUE)} else(return(FALSE)) 
    }, 
    ## Find whether a vertex is in a set or not 
    ## returns root of the currentVertex 
    ## function ROOT returns root of the vector 
    find = function(currentVertex){ 
      return(ROOT(vertex[currentvertex])) 
    } 
))

The following R code implements Kruskal's algorithm using the preceding two reference classes. The inputs are Graph_ADT, the number of vertices, n, and the number of edges, e, in the graph:

kruskalMST <- function(Graph_ADT,n,e) 
{ 
  ## initialize reference classes disjoinSetPointer and    kruskalArray 
  vertexArray <- disjoinSetPointer$new() 
  edgeArray <- kruskalArray$new() 
 
  ## Initialise a list to store final MST result 
  MSTResult <- list() 
 
  ##  Put all the edges in the edgeArray 
  for(i in 1:n){ 
    j <- firstVertex(i) 
    while(i <= n){ 
      edgeArray$push(i,j, Graph_ADT$weightEdge(i,j)) 
    } 
  } 
 
  ## Initialise n equivalent sets 
  numMST <- n 
 
  ## Iteratively combine equivalent sets based on edge weights 
  ## edges are extracted based on their value. Smallest edges are    
  extracted first 
  for(i in 1:e){ 
    while(numMST >= 1){ 
      # get the from and to vertices having minimum edge value 
      temp <- edgeArray$extractMinEdge() 
      fromVertex <- temp$from 
      toVertex <- temp$to 
 
      ## Check whether two vertices are in different sets 
      if(vertexArray$differ(fromvertex,toVertex)){ 
        ## if yes, then combine from and to vertices into one set 
        vertexArray$union(fromvertex,toVertex) 
        ## add this edge to MST 
        MSTResult[[i]] <- c(fromVertex,toVertex) 
        ## decrease the sets by 1 
        numMST <- numMST - 1 
      } 
    } 
  } 
  return(MSTResult) 
}

The asymptote of Kruskal's algorithm, based on system runtime in a worst-case scenario is θ(|E| log|E|), as all the edges need to be processed before the completion of generating a minimum-cost spanning tree. However, quite often, the number of minimum-value edge extractions is equivalent to the number of vertices in the graph (as shown in the preceding example). This makes the algorithm have a system runtime asymptote of θ(|V| log|E|), generally observed in average-and best-case scenarios.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset