2-3 trees

A 2-3 tree is a type of tree-based indexing where each internal node in the tree has either two nodes with one key or three nodes with two keys thus, it is classified as a balanced tree. Also, all the nodes in a 2-3 tree are at the same level of tree height. An example of a two-node structure is shown in Figure 7.10:

2-3 trees

Figure 7.10: Example of a two-node tree structure

A two-node structure consists of one key and two children/subtrees. All the keys on the left side are smaller, and all the keys on the right subtree are bigger than the key. Similarly, three-node structure has two keys with three children/subtrees, as shown in Figure 7.11:

2-3 trees

Figure 7.11: Example of a three-node tree structure

In the three-node structure, the keys are arranged in a sorted order, the first key being the smallest. All keys on the left of the subtree are smaller than the first key. All the keys in the middle of the subtree are greater than the first key and smaller than the second key. All the keys on the right of the subtree are greater than the second key. The node structure for a 2-3 tree can be represented as follows:

tttnode <- function(lkey=NULL, lvalue=NULL, rkey=NULL, rvalue=NULL) { 
  node <- new.env(hash = FALSE, parent = emptyenv()) 
  node$lkey <- lkey              # left Node key 
  node$lvalue <- lvalue          # Node Value 
  node$rkey <- rkey              # right Node key 
  node$rvalue <- rvalue          # right Node Value 
  node$left <- NULL              # left children key 
  node$center <- NULL            # left children key 
  node$right <- NULL             # Right children key 
  class(node) <- "tttnode" 
  return(node) 
}

The node structure for a 2-3 tree consists of two keys and a value pair with three children. The insertion in 2-3 nodes can be performed using steps listed next:

  • If the tree is empty, create a new node otherwise find a leaf where the value belongs. For example, let's try to add 70 to a new tree:
    2-3 trees

The insertion, as shown in the preceding image, can be obtained by creating a node using tttnode, with the following line of code:

      extttree <- tttnode(70, 70)
  • If another value is to be inserted, it checks whether the root node has space for insertion, else finds the leaf node where the value is to be inserted. For example, adding a new value to extttree will check whether it's vacant, and add the element accordingly:

    2-3 trees

    Figure 7.12(b): Insert second value to extttree

  • The empty tree can be checked by evaluating both key values, and if there is empty space, then the value will be inserted in the leaf. Before insertion, the value may need to be sorted:
          # Function to check if node is empty 
          check_empty<-function(node){ 
            ifelse((is.null(node$lkey) & is.null(node$rkey)), T, F) 
          }
    

    The insertion script at the root is as follows:

          # Function to insert if the node has empty space 
          leaf_insert<-function(node, key, val){ 
            if(check_empty(node)) return(tttnode(lkey=key, lvalue=val)) 
            if(is.null(node$rkey)){ 
              if(key>node$lkey){ 
                node$rkey<-key 
                node$rvalue<-val 
              } else 
              { 
                node$rkey<-node$lkey 
                node$rvalue<-node$lvalue 
                node$lkey<-key 
                node$lvalue<-val 
              } 
            } else 
            { 
              node$left<-tttnode(key, val) 
            } 
            return(node) 
          }
    
  • If there are three elements in the node, then the median of the node is promoted. For example, if element 80 is added to extttree, then the tree is updated as shown in Figure 7.12(c):

    2-3 trees

    Figure 7.12(c): Insert element 80 to extttree

The preceding step will be repeated to generate the tree. The generalized pseudocode for element insertion in a 2-3 tree is as follows:

      ttinsert<-function(node=NULL, key, val){ 
        if(check_empty(node)) return(tttnode(lkey=key, lvalue=val)) 
        if(is.null(node$left)) node<-leaf_insert(node, key, val) 
        ## Add element to internal nodes 
        if(key<node$lkey){ 
          subtree = ttinsert(node$left, key, val) 
          if (identical(subtree, node$left)) 
          { 
            return(node); 
          } else 
          { 
            assign("left", subtree, envir = node) 
            return(node) 
          } 
        } else if(ifelse(is.null(node$rkey), T,      key<node$rkey)){ 
          subtree = ttinsert(node$center, key, val) 
          if(identical(subtree, node$center))  
          { 
            return(node) 
          } else 
          { 
            assign("center", subtree, envir = node) 
            return(node) 
          } 
        } else 
        { 
          subtree = ttinsert(node$right, key, val) 
          if(identical(subtree, node$right)) { 
            return(node) 
          } else 
          { 
            assign("right", subtree, envir = node) 
            return(node) 
          } 
        } 
      }

The implementation involves recursively determining the location where the insertion has to be made. There are two major parts to the process of insertion:

  • Inserting the key and value in the root node
  • Inserting the key and value in the internal node; the node is the current tree or subtree where insertion will be performed, and key and val are keys and the values/records to be inserted to the tree respectively
  • The aforementioned insertion process is repeated, and nodes are splitted and promoted as required. For example, let's try and add the values 95 and 99 in two steps. The tree will be updated as shown in Figure 7.12(d) and Figure 7.12(e):

    2-3 trees

    Figure 7.12(d): Insert element 95 to extttree

    2-3 trees

    Figure 7.12(e): Insert element 99 to extttree

  • Another critical aspect is searching for values using the key in a 2-3 tree. The search pseudocode within a 2-3 tree is shown as the following R function:
          search_keys<-function(node, key){ 
            if (is.null(node)) return(NULL) # empty node 
            if (node$lkey== key) return(node$lvalue) 
            if(!is.null(node$rkey) & node$rkey==key) return(node$rvalue) 
            if(key<node$lkey) { 
              sort_keys(node$left, key) 
            } else if(is.null(node$rkey)){ 
              sort_keys(node$center, key) 
            } else if(key<node$rkey) { 
              sort_keys(node$center, key) 
            } else 
            { 
              sort_keys(node$right, key) 
            } 
          }
    

Another important aspect is deletion. Deleting a key in 2-3 tree (pretty much in all tree-based data structures) requires deleting the key only in the leaf; thus, the easiest scenario for deletion is when the leaf only contains two keys. However, if deletion is required in an internal node, then deletion occurs at the leaf, and then the deletion effect is possibly propagated up the tree. The details of deletion in tree-based data structure will be covered in Chapter 8, Graphs, while discussing the generalization of B-trees.

The B-tree is the other most widely used data structure within databases. B-trees are a generalization of the 2-3 trees which address the data retrieval issue from storage devices, as discussed in the next section.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset