Finding the minimum spanning tree

With the properties we just discussed, we can now define an algorithm for finding the minimum spanning tree of a graph G. Suppose a set of edges F is already known to belong to the minimum spanning tree, and we are trying to find another edge that also belongs to it. First, we choose an edge e whose cost is minimum among the remaining edges, that is, the edges not yet considered. Since some edges have already been chosen, some of the vertices are already connected. If the chosen edge e is between two vertices that are already connected, we simply reject it and pick the next edge of minimum cost. We repeat this until we find an edge f between two vertices that are not already connected. Our claim is that f is a new member of the minimum spanning tree.

To confirm this, assume that f is between the vertices A and B. By the description of our procedure, A and B are not connected. Let's partition the vertices into two sets, H and J: put every vertex connected to A, including A itself, in H, and every vertex connected to B, including B itself, in J; the remaining vertices also go into H. Since H and J partition the vertices of G, there is a cut X in G that splits the graph so that all the vertices of H lie in one subgraph and all the vertices of J in the other. We know that the minimum-cost member of X belongs to the minimum spanning tree. Now, f is of course a member of X, as it connects A to B. It is also the edge with the minimum cost among all the edges in X: every edge in X is among the remaining edges (an already chosen or rejected edge joins vertices that are connected, and hence lie in the same set, which cannot happen for an edge of X, given the way the two sets were built), and f has the minimum cost among all the remaining edges. This means f is a new member of the minimum spanning tree.
Therefore, we use the following steps to build the minimum spanning tree:

  1. Start with an empty set of edges as the spanning tree.
  2. If edges remain and not all the vertices are connected, choose the remaining edge with the minimum cost.
  3. If the edge is between two already connected vertices, discard it and go back to step 2.
  4. Otherwise, add the edge to the set of edges of the minimum spanning tree.
  5. Repeat from step 2.
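The steps above can be sketched using only standard-library types. The Edge record, the parent-array connectivity check, and the sample graph here are assumptions made for illustration; the chapter builds its full implementation on top of its own union find later:

```java
import java.util.*;

public class KruskalSketch {
    // Assumed edge type for this sketch only
    record Edge(int source, int target, int cost) {}

    // A naive connectivity check: follow parent pointers to the root
    static int root(int[] parent, int v) {
        while (parent[v] != v) v = parent[v];
        return v;
    }

    static List<Edge> minimumSpanningTree(int vertexCount, List<Edge> edges) {
        // Step 2: a priority queue hands us the cheapest remaining edge
        PriorityQueue<Edge> queue =
            new PriorityQueue<>(Comparator.comparingInt(Edge::cost));
        queue.addAll(edges);
        int[] parent = new int[vertexCount];
        for (int v = 0; v < vertexCount; v++) parent[v] = v;
        List<Edge> tree = new ArrayList<>();
        while (!queue.isEmpty() && tree.size() < vertexCount - 1) {
            Edge e = queue.poll();
            int rs = root(parent, e.source()), rt = root(parent, e.target());
            if (rs != rt) {        // step 3: reject edges inside one partition
                tree.add(e);       // step 4: keep the edge
                parent[rs] = rt;   // join the two partitions
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge(0, 1, 4), new Edge(0, 2, 1),
            new Edge(1, 2, 2), new Edge(1, 3, 5), new Edge(2, 3, 8));
        List<Edge> tree = minimumSpanningTree(4, edges);
        int total = tree.stream().mapToInt(Edge::cost).sum();
        System.out.println(tree.size() + " " + total); // prints "3 8"
    }
}
```

On this sample graph, the sketch keeps the edges of costs 1, 2, and 5 and rejects the cost-4 edge, which would close a cycle.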

The problem now is how to efficiently determine whether two vertices are connected. The solution is a data structure called a union find forest, also known as a disjoint-set forest.

Union find

The purpose of union find is to tell whether two given objects are members of the same set. This data structure lets you first specify all the members of a universal set and then declare, pair by pair, which members belong together, joining their two partitions into a single partition. It represents a collection of partitions of the universal set and lets us query whether two members of the universal set belong to the same partition.

A tree is kept in a reverse-pointer form; that is, each child knows its parent, but a parent holds no pointers to its children. The idea is to keep connected values in the same tree. Each tree in the forest has a representative node, which is its root. If two nodes have the same representative root, they are in the same partition; otherwise, they are not.

This data structure has three important operations:

  • Add a new object to the universal set.
  • Union two objects: This joins the partitions those objects belong to into a single partition.
  • Find: This returns the representative object of the partition that the given object belongs to. If the find operation on two different objects returns the same result, the objects belong to the same partition.
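The three operations can be illustrated with a minimal HashMap-backed sketch. The SimpleUnionFind class here is an assumption for illustration only; the implementation below stores its nodes in a red-black tree and balances the forest with ranks:

```java
import java.util.*;

// Illustrative sketch of the union find interface (assumed class, not the
// chapter's implementation).
public class SimpleUnionFind<E> {
    private final Map<E, E> parent = new HashMap<>();

    // Add: a new object starts as the root of its own one-element partition
    public void add(E object) { parent.put(object, object); }

    // Find: follow parent pointers until the representative root is reached
    public E find(E object) {
        E p = parent.get(object);
        return p.equals(object) ? p : find(p);
    }

    // Union: attach one root under the other, merging the two partitions
    public void union(E o1, E o2) {
        E r1 = find(o1), r2 = find(o2);
        if (!r1.equals(r2)) parent.put(r1, r2);
    }

    public static void main(String[] args) {
        SimpleUnionFind<String> uf = new SimpleUnionFind<>();
        for (String s : List.of("a", "b", "c", "d")) uf.add(s);
        uf.union("a", "b");
        uf.union("c", "d");
        System.out.println(uf.find("a").equals(uf.find("b"))); // true
        System.out.println(uf.find("a").equals(uf.find("c"))); // false
        uf.union("b", "c");
        System.out.println(uf.find("a").equals(uf.find("d"))); // true
    }
}
```

In this sketch, find can degrade to linear time because nothing keeps the trees shallow; the rank mechanism described below addresses exactly that.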

The following is an implementation of a Union find that can hold comparable objects:

public class UnionFind<E extends Comparable<E>> {

Each node holds a reference to its parent; if there is no parent, that is, if the node is the root of its tree, the parent is null. Each node also stores its rank, which is the height of the tree rooted at it:

    private class Node implements Comparable<Node>{
        Node parent;
        E object;
        int rank;

        public Node(E object) {
            this.object = object;
            rank = 0;
        }


        @Override
        public int compareTo(Node o) {
            return object.compareTo(o.object);
        }
    }

All the nodes are stored in a red-black tree, so they can be searched in logarithmic time:

    BinarySearchTree<Node> allNodes = new RedBlackTree<>();

Additionally, we want to keep a count of the number of partitions, along with a getter that the spanning tree implementation will call:

    int partitionCount;

    public int getPartitionCount() {
        return partitionCount;
    }

The add operation adds a new object to the universal set, which is implemented using a red black tree:

    public void add(E object){
        Node n = new Node(object);
        allNodes.insertValue(n);
        partitionCount++;
    }

This is an internal method that traverses the parents one by one until it finds the root of the tree that the object passed belongs to:

    Node findRoot(Node n){
        if(n.parent==null){
            return n;
        }else{
            return findRoot(n.parent);
        }
    }

Figure 6. Union operation in a union find forest. The partitions it represents are shown on the side.

The union operation merges two trees. This is achieved by setting the root of one tree as the parent of the root of the other. When merging two trees of unequal height, the root of the taller tree is made the parent of the root of the shorter tree; otherwise, either of the two roots is chosen as the new root. The rank of the root increases only when two trees of equal height are merged; when unequal trees are merged, the height of the merged tree is the same as the height of the taller tree. Figure 6 shows the union operation:

    public void union(E o1, E o2){
        BinaryTree.Node<Node> node1 = allNodes.searchValue(new Node(o1));
        BinaryTree.Node<Node> node2 = allNodes.searchValue(new Node(o2));
        if(node1==null || node2==null){
            throw new IllegalArgumentException("Objects not found");
        }
        Node n1 = node1.getValue();
        Node n2 = node2.getValue();
        Node p1 = findRoot(n1);
        Node p2 = findRoot(n2);
        if(p1==p2){
            return;
        }
        int r1 = p1.rank;
        int r2 = p2.rank;
        if(r1>r2){
            p2.parent = p1;
        }else if(r2>r1){
            p1.parent = p2;
        }else{
            p2.parent = p1;
            p1.rank++;
        }
        partitionCount--;
    }

The find operation involves looking up the node related to the object first and then finding the root node. The object contained in the root node is returned:

    public E find(E object){
        BinaryTree.Node<Node> node1 = allNodes.searchValue(new Node(object));
        if(node1==null){
            throw new IllegalArgumentException("Object not found");
        }
        Node n = node1.getValue();
        return findRoot(n).object;
    }
}

Complexity of operations in UnionFind

First, let's consider the complexity of finding the root node of any node; this is an internal operation. The complexity of this operation is θ(h), where h is the height of the tree. What, then, is the upper bound on the height of the tree? Let f(h) be the minimum number of nodes in a tree of height h. Trees are always created by merging two smaller trees. If the trees being merged have unequal heights, the height of the merged tree equals that of the taller input tree, which now has more nodes than before; this cannot produce the worst tree, that is, the tree with the minimum number of nodes for its height. The worst tree must instead be created by merging two equal trees, both of which are worst trees themselves. After the merge, the height of the merged tree is one more than the height of either input tree. So, to create a worst tree of height h+1, we must merge two worst trees of height h, which gives the recurrence f(h+1) = 2f(h). If f(0) = C, where C is some constant, then f(1) = 2C, f(2) = 4C, …, f(h) = 2^h C. Therefore, if the number of nodes is n, then n ≥ f(h) = 2^h C ⇒ lg n ≥ lg(2^h C) = h + lg C ⇒ h ≤ lg n − lg C ⇒ h = O(lg n). This means the complexity of finding the root of a given node is also O(lg n).
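The bound can be checked empirically. The following standalone snippet (an assumption for illustration, separate from the chapter's class) merges equal-rank trees in rounds and confirms that reaching rank h requires 2^h nodes:

```java
// Illustrative check of union by rank: doubling the node count is the only
// way to grow the rank, matching the recurrence f(h+1) = 2 f(h).
public class RankBoundCheck {
    static int[] parent, rank;

    static int root(int v) { return parent[v] == v ? v : root(parent[v]); }

    static void union(int a, int b) {
        int r1 = root(a), r2 = root(b);
        if (r1 == r2) return;
        if (rank[r1] > rank[r2]) parent[r2] = r1;
        else if (rank[r2] > rank[r1]) parent[r1] = r2;
        else { parent[r2] = r1; rank[r1]++; } // equal heights: rank grows
    }

    public static void main(String[] args) {
        int h = 10, n = 1 << h;               // 2^10 = 1024 nodes
        parent = new int[n]; rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        // Merge in rounds of equal-rank pairs: singletons, then pairs, ...
        for (int step = 1; step < n; step *= 2)
            for (int i = 0; i + step < n; i += 2 * step)
                union(i, i + step);
        System.out.println(rank[root(0)]);    // prints 10 = lg 1024
    }
}
```

Every round of merges raises the rank by one while doubling the tree size, so 1024 nodes end up under a root of rank exactly 10.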

  • Adding a new object involves inserting a new node in the red black tree, so the complexity is θ(lg n).
  • Find: This operation involves looking up the node that corresponds to the object first. This is a search operation in the red black tree; hence it is θ(lg n). After this, it involves looking up the root of this node, which is O(lg n). And at the end, we return the object in the node, which is constant time. Hence, the complexity of the entire find operation is θ(lg n).
  • Union involves three operations. The first is to search the nodes for the objects, which is θ(lg n). The second is to find the roots of each of the trees associated with the nodes, which is also O(lg n). And finally, it involves the merging of the trees, which is a constant time operation. So the complexity of the entire operation is θ(lg n).

Implementation of the minimum spanning tree algorithm

Now we can implement our minimum spanning tree algorithm. First, we create a class that we will use for the implementation. The CostEdge class represents an edge along with its cost. The compareTo method is overridden to compare the costs instead of the IDs:

class CostEdge extends Edge{
        Integer cost;

        public CostEdge(int source, int target, int cost) {
            super(source, target);
            this.cost = cost;
        }

        @Override
        public int compareTo(Edge o) {
            // Integer.compare avoids overflow that plain subtraction can cause
            return Integer.compare(cost, ((CostEdge)o).cost);
        }
    }

The argument costFinder is a lambda that returns the cost of an edge from the value stored in it. edgeQueue is a priority queue that lets us consider the edges in the order of their costs: we can dequeue the edge with the minimum cost every time, as our algorithm requires. The purpose of unionFind is to keep track of which vertices are connected once some edges have been chosen.

First, we traverse all the edges and enqueue them in the priority queue; then we traverse all the vertices and add them to unionFind. After this, as described in our algorithm, we pick the edges in the order of their costs and add an edge only when it is not between vertices that are already connected, which unionFind tells us. The edges of the spanning tree are returned in a linked list:

default LinkedList<Edge> minimumSpanningTree(OneArgumentExpression<E,Integer> costFinder){
        if(!isUndirected()){
            throw new IllegalStateException(
              "Spanning tree only applicable to undirected graphs");
        }
        LinkedList<Edge> subGraph = new LinkedList<>();

        PriorityQueue<CostEdge> edgeQueue = new LinkedHeap<>((x, y)->x.compareTo(y));

        UnionFind<Integer> unionFind = new UnionFind<>();

        this.visitAllConnectedEdges(getAllVertices().getRoot().getValue(), (s,t,v)-> edgeQueue.enqueue(
            new CostEdge(s,t,costFinder.compute(v))), TraversalType.DFT);

        this.getAllVertices().traverseDepthFirstNonRecursive(
          (x)->unionFind.add(x),
                BinaryTree.DepthFirstTraversalType.PREORDER);

        while(unionFind.getPartitionCount()>1
                               && edgeQueue.checkMinimum()!=null){
            Edge e = edgeQueue.dequeueMinimum();
            int sGroup = unionFind.find(e.source);
            int tGroup = unionFind.find(e.target);
            if(sGroup!=tGroup){
                subGraph.appendLast(e);
                unionFind.union(e.source, e.target);
            }
        }
        return subGraph;
    }

Complexity of the minimum spanning tree algorithm

Visiting all the edges and adding them to the priority queue is Ө(|V| + |E| + |E| lg |E|): traversing the edges is Ө(|V| + |E|) and adding all of them to the priority queue is Ө(|E| lg |E|). Since a connected graph has |E| ≥ |V| − 1, Ө(|V| + |E| + |E| lg |E|) = Ө(|E| lg |E|). Inserting all the vertices into the union find takes Ө(|V| lg |V|) because adding each vertex has complexity Ө(lg |V|).

Now let's consider the core of the algorithm. For each edge, dequeueing the minimum edge is Ө(lg |E|), finding each of the source and the target in unionFind is Ө(lg |V|), appending to the linked list is constant time, and performing a union is Ө(lg |V|). So, for each edge, the complexity is Ө(lg |V| + lg |E|), which is Ө(lg |E|), as |E| ≥ |V| − 1. Therefore, the complexity of the core part is Ө(|V| lg |E|) because we stop once |V| − 1 edges have been added and all the vertices are connected.

Adding up the complexities of all the preceding steps, the total complexity of the minimum spanning tree algorithm is Ө(|E| lg |E|) + Ө(|V| lg |V|) + Ө(|V| lg |E|) = Ө(|E| lg |E| + |V| lg |V|) = Ө(|E| lg |E|), as |E| ≥ |V| − 1.

This algorithm is called Kruskal's algorithm, after Joseph Kruskal, who invented it. Kruskal's algorithm runs with complexity Ө(|V| lg |E|) if a sorted list of edges is already available. Since we continue until all the edges are processed, if the graph passed in is not connected, the algorithm returns a set of minimum spanning trees, one for each connected subgraph.
