A 2-3 tree is a type of tree-based indexing where each internal node in the tree has either two nodes with one key or three nodes with two keys thus, it is classified as a balanced tree. Also, all the nodes in a 2-3 tree are at the same level of tree height. An example of a two-node structure is shown in Figure 7.10:
Figure 7.10: Example of a two-node tree structure
A two-node structure consists of one key and two children/subtrees. All the keys on the left side are smaller, and all the keys on the right subtree are bigger than the key. Similarly, three-node structure has two keys with three children/subtrees, as shown in Figure 7.11:
Figure 7.11: Example of a three-node tree structure
In the three-node structure, the keys are arranged in a sorted order, the first key being the smallest. All keys on the left of the subtree are smaller than the first key. All the keys in the middle of the subtree are greater than the first key and smaller than the second key. All the keys on the right of the subtree are greater than the second key. The node structure for a 2-3 tree can be represented as follows:
tttnode <- function(lkey=NULL, lvalue=NULL, rkey=NULL, rvalue=NULL) {
node <- new.env(hash = FALSE, parent = emptyenv())
node$lkey <- lkey # left Node key
node$lvalue <- lvalue # Node Value
node$rkey <- rkey # right Node key
node$rvalue <- rvalue # right Node Value
node$left <- NULL # left children key
node$center <- NULL # left children key
node$right <- NULL # Right children key
class(node) <- "tttnode"
return(node)
}
The node structure for a 2-3 tree consists of two keys and a value pair with three children. The insertion in 2-3 nodes can be performed using steps listed next:
The insertion, as shown in the preceding image, can be obtained by creating a node using tttnode
, with the following line of code:
extttree <- tttnode(70, 70)
extttree
will check whether it's vacant, and add the element accordingly:
Figure 7.12(b): Insert second value to extttree
# Function to check if node is empty
check_empty<-function(node){
ifelse((is.null(node$lkey) & is.null(node$rkey)), T, F)
}
The insertion script at the root is as follows:
# Function to insert if the node has empty space
leaf_insert<-function(node, key, val){
if(check_empty(node)) return(tttnode(lkey=key, lvalue=val))
if(is.null(node$rkey)){
if(key>node$lkey){
node$rkey<-key
node$rvalue<-val
} else
{
node$rkey<-node$lkey
node$rvalue<-node$lvalue
node$lkey<-key
node$lvalue<-val
}
} else
{
node$left<-tttnode(key, val)
}
return(node)
}
extttree
, then the tree is updated as shown in Figure 7.12(c):
Figure 7.12(c): Insert element 80 to extttree
The preceding step will be repeated to generate the tree. The generalized pseudocode for element insertion in a 2-3 tree is as follows:
ttinsert<-function(node=NULL, key, val){
if(check_empty(node)) return(tttnode(lkey=key, lvalue=val))
if(is.null(node$left)) node<-leaf_insert(node, key, val)
## Add element to internal nodes
if(key<node$lkey){
subtree = ttinsert(node$left, key, val)
if (identical(subtree, node$left))
{
return(node);
} else
{
assign("left", subtree, envir = node)
return(node)
}
} else if(ifelse(is.null(node$rkey), T, key<node$rkey)){
subtree = ttinsert(node$center, key, val)
if(identical(subtree, node$center))
{
return(node)
} else
{
assign("center", subtree, envir = node)
return(node)
}
} else
{
subtree = ttinsert(node$right, key, val)
if(identical(subtree, node$right)) {
return(node)
} else
{
assign("right", subtree, envir = node)
return(node)
}
}
}
The implementation involves recursively determining the location where the insertion has to be made. There are two major parts to the process of insertion:
key
and val
are keys and the values/records to be inserted to the tree respectively
Figure 7.12(d): Insert element 95 to extttree
Figure 7.12(e): Insert element 99 to extttree
search_keys<-function(node, key){
if (is.null(node)) return(NULL) # empty node
if (node$lkey== key) return(node$lvalue)
if(!is.null(node$rkey) & node$rkey==key) return(node$rvalue)
if(key<node$lkey) {
sort_keys(node$left, key)
} else if(is.null(node$rkey)){
sort_keys(node$center, key)
} else if(key<node$rkey) {
sort_keys(node$center, key)
} else
{
sort_keys(node$right, key)
}
}
Another important aspect is deletion. Deleting a key in 2-3 tree (pretty much in all tree-based data structures) requires deleting the key only in the leaf; thus, the easiest scenario for deletion is when the leaf only contains two keys. However, if deletion is required in an internal node, then deletion occurs at the leaf, and then the deletion effect is possibly propagated up the tree. The details of deletion in tree-based data structure will be covered in Chapter 8, Graphs, while discussing the generalization of B-trees.
The B-tree is the other most widely used data structure within databases. B-trees are a generalization of the 2-3 trees which address the data retrieval issue from storage devices, as discussed in the next section.