The concept of traversing across various nodes (or vertices) in a graph along connected edges is termed as graph traversal. The traversing across nodes is typically organized, but can sometimes be random. However, in both the scenarios, traversing begins from a specified start node and ends at a specified final node. Usually, the start and final nodes are not directly connected. In order to establish an indirect connection, a selective organized search is instantiated across the various connected paths. Graph traversal algorithms are generally designed to begin at a given start node and then search for subsequent connected nodes before terminating at the given final node.
Additionally, these traversing algorithms also need to factor in some plausible issues such as the following:
These issues are typically addressed by keeping track of the nodes encountered along the paths traversed. The nodes once marked are generally not visited again, unless otherwise specified. Thus, infinite looping can be prevented from happening. In addition, if all the nodes are not visited in a traverse, then a new traversal is initiated from an unmarked node, which continues until all the nodes are visited at least once.
The following R code shows the structure of a graph traverse algorithm. It takes in Graph_ADT
, number of vertices n
, and a vector of vertices
as inputs:
graph_Traverse <- function(Graph_ADT,n,vertices)
{
## Initialise marks to zero
verticesMarks <- list()
for( i in 1:n)
verticesMarks[[i]] <- Graph_ADT$initMark(vertices[i],0) ## 0 means
not visited
## Initiate traversing upon checking for unmarked nodes
for(i in 1:n)
if(Graph_ADT$getMark(vertices[i])==0)
initTraverse(verticesMarks,vertices[i])
}
The following are some approaches to implement graph traversal algorithms (the initTraverse
function):
DFS is a recursive implementation of the graph traversal algorithm, applicable to both directed and undirected graphs. At each step during the traverse, DFS recursively checks and visits all the unvisited nodes, which are directly connected with the node under consideration. Simultaneously, all the visited nodes along the path are pushed into a stack in the order of the traverse. During the traverse, if nodes with no unvisited nodes directly connected to them are encountered, then such nodes are popped out of the stack leaving behind the ones with directly connected unvisited nodes. This keeps track of those nodes that determine the forward traverse path such that all the nodes are visited prior to culmination. The following R code implements the DFS algorithm with three inputs. The inputs are Graph_ADT
, n
(number of nodes in the graph), and v
(node under consideration to perform DFS):
DepthFirstSearch <- function(Graph_ADT, n, v)
{
## Ensure all nodes are visited and processed prior node v
preVisit(v)
## mark the node v under consideration as 1 (i.e. visited)
VerticesMarks <- list()
VerticesMarks[[v]] <- Graph_ADT$initMark(v,1)
## Recursively visit all connected nodes of v till all are marked
as 1
## get the first vertex
node <- Graph_ADT$firstVertex(v)
## check node belongs to neighboring nodes using conVert function
while(node %in% conVert(v)){
## check if the node is unvisited
if(Graph_ADT$getMark(VerticesMarks[[node]] == 0))
## recursively run DFS
DepthFirstSearch(Graph_ADT,n, node)
## assign next neighbouring vertex
node <- Graph_ADT$nextVertex(v,node)
}
## Run post processing remaining un-visited nodes
postVisit(v)
}
An example of a DFS algorithm on an undirected graph is shown in Figure 8.4.
Figure 8.4: Represents an undirected graph (left) along with its final search path (right) using a DFS algorithm
Figure 8.4 illustrates an initial undirected graph and its corresponding final search path obtained using the DFS algorithm. Figure 8.5 illustrates the working of DFS in detail using stacks. The sub-function nextVertex
selects the node with the lowest index value among all the directly connected unvisited nodes. As the graph is undirected, the DFS algorithm can move in either of the directions as against the directed graph, where the DFS algorithm can move only in a single direction.
The asymptote of the DFS algorithm is θ(|E|+|V|). |E| represents a visit to each node (traversing all edges only once) and |V| represents visiting each node (only once).
Fig 8.5: Illustration of recursive processing using the DFS algorithm and using A as a start vertex
BFS works on a principle similar to the DFS algorithm except the following:
The following R code implements the BFS algorithm using four inputs. The inputs are Graph_ADT
, startVertex
(starting nodes of the graph to begin the traverse), queue
(an empty queue to keep track of connected nodes in the order of visit), and n
(total number of nodes in the graph):
BreadthFirstSearch <- function(Graph_ADT,startVertex, queue, n)
{
## initialise an empty queue with a start vertex
queue <- initQueue(startVertex)
## Initialise first vertex by marking it as 1 (visited)
VerticesMarks <- list()
VerticesMarks[[v]] <- Graph_ADT$initMark(v,1)
## Subsequently start processing in queues
while(length(queue) != 0){
## extract first element in the queue
v <- extQueue(queue)
## Pre-Process all directly connected nodes of v
preVisit(v)
## Mark visited nodes with 1 and accordingly queue the nodes
node <- firstVertex(v)
while(node %in% conVert(v)){
if(getMark(graph[node] == 0)){
graph <- Graph_ADT$initMark(node,1)
queue <- initQueue(node)
}
node <- Graph_ADT$nextVertex(startVertex,node)
}
}
}
Figure 8.6 illustrates an initial undirected graph and its corresponding final search path obtained using the BFS algorithm.
Figure 8.6 : Represents an undirected graph (left) along with its final search path (right) using the BFS algorithm
Application of the BFS algorithm on queues is illustrated in Figure 8.7.
Figure 8.7 : Illustration of processing using the BFS algorithm and using 1 as a start vertex
The topological sort algorithm is primarily used in scenarios where nodes are conditionally dependent on previous nodes. In other words, the graph traversal can only happen if all its predecessor-connected nodes are visited (or processed). It is generally used in jobs where each stage is scheduled one after the other. For example, during construction of a tower, the columns cannot be raised till the foundation is complete, and roofing cannot be done till the columns are erected. Here, laying the foundation is followed by the erection of columns, which is followed by laying of the roof.
DAG form the basis of topological sort algorithms. In DAG, all the nodes are directionally connected, which takes care of order, and none of the nodes form a cycle, which ensures no conflict with any predecessor nodes (already visited and marked). Thus, DAG safeguards the linearity order among interconnected nodes, thereby being suitable for implementation of the topological sort algorithm. Figure 8.8 illustrates an example of a DAG acceptable for implementing the topological sort algorithm. The topological sort of this graph is 1, 2, 3, 4, 5, 6, 7, 8.
Figure 8.8: An example graph to perform topological sort
The topological sort algorithm is performed using both the DFS and BFS algorithm on DAGs.
In the case of a DFS approach, when a node is visited, no pre-processing is performed (using the preVisit
function), whereas during recursive implementation, if the same node is revisited, then that node is returned as an output (using the postVisit
function). Thus, the order of output-returned nodes is a reverse sort. The output for the preceding example DAG using the DFS algorithm is 8, 7, 6, 4, 5, 2, 3, 1. Thus, the topological sort is reverse of the output, that is-1, 3, 2, 5, 4, 6, 7, 8. This is also called In-order search (pre-order and post-order). Also, in-order search leads to a sorted output.
The following R code implements a topological sort using the recursive DFS algorithm. The inputs are Graph_ADT
, n
(total number of nodes in the graph), and vertices
(a vector of vertices of the graph):
## Main function to perform topological sort
Topological_DFS_sort <- function(Graph_ADT, n, vertices)
{
## initialise all nodes with 0 (unvisited)
verticesMarks <- list()
for(i in 1:n)
verticesMarks[[i]] <<- Graph_ADT$initMark(vertices[i],0)
## Process all nodes by recursive traversing
for(i in 1:n)
if(Graph_ADT$getMark(vertices[i]) == 0)
topological_secondary(Graph_ADT,i)
}
## recursive secondary function to help main function
topological_secondary <- function(Graph_ADT,i)
{
## Mark the node as 1 (visited)
verticesMarks[[i]] <<- Graph_ADT$initMark(vertices[i], 1)
## Perform traversing across connected nodes
v <- Graph_ADT$firstVertex(vertices[i])
while(v %in% conVert(vertices[i])){
if(Graph_ADT$getMark(vertices[i] == 0))
topological_secondary(vertices,v)
v <- Graph_ADT$nextVertex(vertices[i],v)
}
return(v)
}
In the case of the BFS approach, the topological sort algorithm is implemented using queuing logic. Here, the nodes are inserted into the queue, not only purely based on their index value (as described in the previous section), but also taking the account of each node's pre-requisites. One of the widely used pre-requisites is the count of inward edges for each node. These counts determine the constraints for each node. Once each node is assigned its respective counts, the nodes with zero count are considered as starting nodes, and are placed in the queue in a predefined order (for example, based on their index value). Then the queuing process begins, where each node is pushed out of the queue, and all its relevant connected nodes are pushed into the queue. Once a node is pushed out, the counts of its directly connected nodes are decreased by a value of one, and the nodes, which have their current count reduced to zero, are pushed into the queue. The order in which the nodes are pushed out of the queue determines the output of the topological sort. Sometimes, the queue becomes empty and not all the nodes are visited.
These situations arise because of some cyclicity present in the graph or on violation of any of the node's prerequisites. The output of the preceding example using the BFS algorithm in topological sort is 1, 2, 3, 4, 5, 6, 7, 8, 9.
The following R code shows an implementation of BFS in topological sort. The inputs are Graph_ADT
, x
(total number of nodes in the graph), vertices
(a vector of vertices of the graph), and queue
(an empty queue to keep track of connected nodes in the order of visit):
Topological_BFS_sort <- function(Graph_ADT, queue, n, vertices)
{
## Initialise a list to track count of inwards edges for each node
countEdge <- list()
## initialise count of each node to 0
for (i in vertices)
countEdge[[i]] <- 0
## Assign count (inward nodes) prerequisite to each node
for(i in vertices){
v <- Graph_ADT$firstVertex(vertices[i])
while(v %in% conVert(vertices[i])){
countEdge[[v]] <- countEdge[[v]] + 1
v <- Graph_ADT$nextVertex(vertices[[i]],v)
}
}
## Initialize queue with nodes which have zero count of inward
edges
for(i in vertices)
if(countEdge[[i]] == 0)
queue <- Graph_ADT$initQueue(i)
## Process the nodes which are in the queue
while(length(queue) != 0){
v <- extQueue(queue)
print(v)
w <- Graph_ADT$firstVertex(v)
while(w %in% conVert(vertices[v])){
## Decrease the count prerequisite by 1
countEdge[[w]] <- countEdge[[w]] - 1
if(countEdge[[w]] == 0) ## no prerequisites
queue <- initQueue(w)
w <- Graph_ADT$nextVertex(vertices[v],w)
}
}
}