Data structures form an integral part of any algorithm. Heaps, queues, stacks, trees, and hash tables are various forms of data structures widely used across programming languages. Some of them are primarily used for look-ups, such as trees and hash tables, while others, such as heaps, queues, and stacks, are used for update modifications, such as insertions and deletions. The current chapter will build foundation to extend conventional data structure to functional data structure. In this chapter, you will learn the following concepts:
Functional data structures are special forms of data structure, which are implemented primarily in functional programming languages. R supports functional programming by providing tools for creation and manipulation of functions. For example, R support assigning functions to variables and passing them as an argument within a function. The R support generating the function dynamically and returning them as a result of the function is also known as a closure function. For example, the function which takes a function as an argument is shown as follows:
arg_function <- function(g) g(seq(1, 100, by=1))
The function arg_function
can take functions as an argument, such as mean
or sd
as shown in the following code snippet:
> arg_function(mean) [1] 50.5 > arg_function(sd) [1] 29.01149
The functional data structure is also known as persistent data structure as they are immutable in the sense that any operation performed on function data structure will create a new copy of data structure with an updated operation. The original functional data structure will always remain intact. The functional data structure possesses different characteristics compared to traditional data structures. These are highly flexible in implementation and support the properties of immutability and persistency. In addition, they are thread-safe; that is, functional data structure ensures safe execution even in multithreaded environment during data structure manipulation.
The following are some benefits of immutability:
The following are some benefits of persistency:
The following are some benefits of thread safety:
Functional data structures used in functional programming languages support lazy evaluation. In lazy implementation, the arguments of a function are evaluated if and only if the computational results are used further within the function. Furthermore, once the arguments are evaluated, the computational results are cached and can be reused later, if needed again, using look-up instead of recomputation. This kind of caching (also termed as memoization) makes it highly difficult to estimate the asymptotic complexity of the algorithm as it is not straightforward to determine when a given argument (or a subexpression) will be evaluated.
Lazy evaluation plays a key role in the implementation of purely functional amortized data structures. It is extremely difficult to analyze the asymptotic performance of algorithms involving lazy evaluation. However, the following framework provides a basic support to calculate the asymptotic performances of algorithms involving lazy evaluation. Firstly, the costs of any given operation are classified, as follows:
Thus, a deferred operation can only be forced and memoized once the accumulated debt is completely paid off. As the total amount of accumulated cost is capped at realized shared costs, the amortized cost cannot increase beyond the total actual cost.
A stack is a First In Last Out (FILO) form of a data structure, wherein both insertions and deletions usually happen at the beginning (or top) of the stack. These forms are widely used in Depth-first search (DFS) algorithms. In ideal scenarios, the insertion, removal, and peeking operations require very little time generally with an asymptote of O(1) for the worst case scenario. Thus, during a worst case scenario, n insertions or removals would have an asymptote of O(n), provided average asymptote of each operation of O(1).
Now, let's understand the implementation of fully-persistent stacks in R. The implementation of a fully-persistent stack data structure is available in an rstackdeque
cran package contributed by Shawn T. O'Neil. In R, mutability can be ensured using environment variables using side-effect-free interface functions.
In this package, stacks are implemented using unordered linked lists, wherein each node (list) consists of both data elements and reference to the next node. These stacks are S3 objects, accessible using the head stack node.
Let's consider a stack with five character elements:
a <- as.rstack(c("p", "q", "r","s","t"))
Analogous to a traditional push
function, insert_top
is used to return a stack with new elements at the top. Here, instead of creating a new stack, the head element of b
is pointed towards the o
element, which is in-turn pointed toward stack a
, as shown in Figure 10.1:
b <- insert_top(a, c("o"))
In case of withdrawal of top element from stack (pop
), the without_top
function is used. Similar to insert_top
, the primary a
stack is not destructively updated, but the pointer of stack c
is shifted towards right by one element, as shown in Figure 10.1:
c <- without_top(a)
The peek_top
function is used to return the data element present at the top of the stack:
d <- peek_top(a)
The following Figure 10.1 illustrates the implementation of fully-persistent stacks in R:
Figure 10.1: Working of fully-persistent stacks based on different types of insertion or deletion operations. (a) represents a stack of five characters, each character linked to another character as in case of linked lists. (b) represents a stack after inserting a new element o on top of the stack (a). (c) represents a stack after removing the top element of the stack (a). (d) represents a character vector with top element of stack (a).
A queue is a First In First Out (FIFO) form of a data structure, wherein deletions happen in the same order of insertion. One way of implementing queue is to insert elements from end (top of the queue) and delete elements from opposite end (bottom of the queue). Some queues support insertions and deletions from both the ends. These are termed as deques or double-ended queues. Queues and deques are used in Breadth-first search (BFS) algorithms. Similar to stacks, ideally the insertion, removal, and peeking operations require very little time with an asymptote of O(1) for the worst case scenario. Thus, during a worst case scenario, n insertions or removals would have an asymptote of O(n), provided average asymptote of each operation of O(1).
Now, let's understand the implementation of fast, fully-persistent and slowly-persistent queues and deques in R using the rstackdeque
cran package contributed by Shawn T. O'Neil.
Fast and fully-persistent queues are governed primarily by recursively-defined operations and delayed evaluations. In other words, these queues are implemented using lazy lists wherein the first elements are immediately accessible and rest of the elements are evaluated with a delay. This is helpful, in the case of recursive large lists where elements are only evaluated whenever accessed.
The R function to implement fast persistent queues is rpqueue
. It comprises of three last lists: l
, r
, lhat
. Here, the elements are inserted at the back of the queue and removed or deleted from the front of the queue. In other words, the insertion of new elements occurs at the top of the r
list and the deletion of existing elements occurs from front of the l
list. These lazy lists are implemented as rstacks
and every node's nextnode
elements are assigned using the delayedAssign
function. These are subsequently memoized on first evaluation.
Traditionally, the queue tries to ensure that the length of l
stack is at least equal to the length of the r
stack. However, during any insertion or deletion operation, the lengths are disturbed. The length of r
stack increases upon insertion and the length of l
stack decreases upon deletion. Post these operations, the l
and r
stacks are readjusted by appending the last elements of r
stack with l
stack. This readjustment requires an asymptote of O(1) for any sort of insertion or deletion happening within l
and r
stacks. Upon readjustment, lhat
is assigned with data of stack l
. Then, for each subsequent insertion or deletion, the elements of lhat
are removed one by one until the lhat
stack becomes empty. Once lhat
becomes empty, the lengths of l
and r
stacks are evaluated and the elements are readjusted, again delaying the time of iteration.
Let's understand the working of fully-persistent queues using an example. Consider a persistent queue of length four:
a <- as.rpqueue(c("p","q","r","s"))
The "p"
, "q"
, and "r"
elements are assigned to left stack l
, element "s"
is assigned to right stack "r"
, and, elements "q"
and "r"
are assigned to stack lhat
. The following Figure 10.2 illustrates working of persistent queues based on insertion and deletion operations. The insertion operation is performed using the insert_back
function and deletion operation is performed using the without_front
function.
Figure 10.2: Working of fully-persistent queues based on different types of insertion or deletion operations. (a) represents a new persistent queue. (b) represents a queue after inserting a new element "t" in the back of queue (a). (c) represents a queue after deleting the front element from queue (b). (d) represents queue after inserting a new element "v" at the back of queue (c). (e) represents queue after inserting a new element "w" at the back of queue (d). (f) represents a queue after deleting an element from front of queue (e).
The following are R codes, which are used in the preceding illustration:
b <- insert_back(a, "t") # insert a new element "t" c <- without_front(b) # remove front element "b" d <- insert_back(c,"v") # insert a new element "v" e <- insert_back(d,"w") # insert a new element "w" f <- without_front(e) # remove front element "e"
The queues are implemented as two stacks, that is, the left stack and the right stack. The left stack holds the first set of elements of the queue, which is used for deletion operations, and the right stack holds the last set of elements of the queue, which is used for insertion operations. On the other hand, the left stack can also be used for insertion and the right stack for deletion. The right stack holds elements in the reverse order as shown in Figure 10.3.
Let's consider an a queue with seven character elements:
a <- as.rdeque(c("p", "q", "r","s","t","u","v"))
The a
queue is split into left and right queues, as illustrated in Figure 10.3 and Figure 10.4.
Using the insert_front
or insert_back
functions, the elements are inserted at the front or back of the queue respectively:
b <- insert_front(a, c("o")) c <- insert_back(a, c("w"))
Similarly, elements can also be deleted from the front or back of the queue using the without_front
or without_back
functions:
d <- without_front(a) e <- without_back(a)
The following Figure 10.3 and Figure 10.4 describes implementation of slowly-persistent queues in R:
Figure 10.3: Working of slowly-persistent queues and dequeus based on different types of insertion or deletion operations. (a) represents a queue with left and right stacks. (b) represents queue after inserting a new element "o" at the front of queue (a).
The Figure 10.3 presents how slowly-persistent queues stores initial seven character element passed to a
. The left side of the queue can be extracted using a$l
and similarly right side of queue can be extracted using a$r
. The b
part of image demonstrate how functional queue a
will be updated when an element o
is added using function insert_front(a, c("o"))
.
Figure 10.4: (c) represents queue after inserting a new element w in the back of queue (a). (d) represents queue after deleting the front element from queue (a). (e) represents queue after deleting the last element from queue (a).
In the current implementation, double-ended queues, which rebalance after every insertion or deletion, are used as shown in Figure 10.4. If both left and right stacks become highly unbalanced, then they are both first decomposed into a list and then recomposed back into two nearly balanced stacks.