Binary Trees

Binary trees, or more precisely binary search trees (BSTs), are popular storage structures. You can search through trees much faster than through linked lists. A well-balanced tree can be searched as efficiently as a sorted array: O(log2 n). This is because tree searching has the same properties as a binary search algorithm for an array (see the binsearch function in the section on arrays). However, the overhead per stored element is higher with trees than with a singly linked list. A binary tree has references to (at most) two child nodes so, just as with doubly linked lists, two pointer fields need to be added to each stored element. As with linked lists, the elements of a tree are stored in memory separately, causing the same danger of memory fragmentation. Still, the advantages of binary trees have earned them a place in the standard toolset of every programmer. What binary trees look like and how they can be used were introduced in Chapter 10 in the section on heaps. This section looks at the positive and negative characteristics of binary trees.

The file 11Source01.cpp that can be found on the Web site compares the performance of balanced and unbalanced trees against that of other data structures. There is a section at the end of this chapter that presents the timing results of these tests.
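
The code listings in this section operate on nodes that carry a character-string key, an information member called info (with a PrintInfo function), and left and right child pointers; the string keys are compared with strcmp from string.h. The following declarations are only a minimal sketch of what the listings assume; the Data class and the root member are illustrative, and the actual definitions can be found in 11Source01.cpp.

class Data
{
public:
    void PrintInfo();          // prints the stored information
    // ... application data ...
};

class TreeNode
{
public:
    char     *key;             // search key of this element
    Data      info;            // stored information
    TreeNode *left;            // child with a key <= this key, or NULL
    TreeNode *right;           // child with a key > this key, or NULL
};

class Tree
{
public:
    TreeNode *GetNode(TreeNode *current, char *key);
    void      PreorderTraverse(TreeNode *current);
    void      InorderTraverse(TreeNode *current);
    void      PostorderTraverse(TreeNode *current);
    // Insert and delete operations are omitted from this sketch.

private:
    TreeNode *root;            // illustrative; points at the top of the tree
};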

Inserting

The function used for inserting elements into a binary tree in effect also takes care of sorting the tree. This is because the placement of each element in the tree has to adhere to the rules of the internal tree structure. The structure of a tree is only valid when each child on the left of its parent has a key value that is equal to or lower than the key value of the parent, and each child on the right of its parent has a key value that is larger than that of its parent. Sadly, it is still possible to create a very inefficient tree even when your insert function observes the structuring rules perfectly. The following example will demonstrate this.

Assume we have an empty tree in which we want to insert the following set of values: 50, 25, 80, 32, 81, 12, 63. The first element, 50, is added and becomes the root. The second element, 25, is lower than 50 and placed left of the root. The third element, 80, is higher than the root and placed on its right. This goes on until all elements have been added and a tree is created as shown in Figure 11.7.

Figure 11.7. A tree fed with random ordered data.


When we add the same elements to an empty tree but in sorted order, we get an entirely different picture. The first element, 12, is added as the root. The second element, 25, is higher than the root and is placed to its right. The third element, 32, is higher than the root and higher than the child of the root and is placed on the right of the child of the root. The fourth element, 50, is higher again than the last element inserted and is placed to its right. This goes on until all elements have been added and a tree is created as shown in Figure 11.8.

Figure 11.8. A tree fed with ordered data.


The storage structures of Figure 11.7 and Figure 11.8 both have a valid tree structure. However, the structure in Figure 11.8 is totally unbalanced and is actually nothing more than a singly linked list with the added overhead of a NULL pointer for each element. The efficiency of the unbalanced tree is therefore reduced to that of a linked list. Instead of O(log2 n), tree access is now an O(n) operation. This means that if searching for or adding an element in a balanced binary tree with 50,000 elements takes 50 ms, for instance, that same operation can take minutes for an unbalanced tree. The better a tree is balanced, the more efficient it will be, which means that the order in which elements are fed to a binary tree directly influences its efficiency. It is of course possible to write a function that periodically attempts to rebalance the tree, but executing such a function is time-consuming and can easily result in O(n*n) behavior. (See Chapter 5, "Measuring Time and Complexity.") A later section, "Red/Black Trees," shows an example of a tree modified for balanced inserting and deleting.
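
For reference, a recursive insert that observes the structuring rules described at the start of this section might look as follows. This is only a sketch: an Insert method does not appear in the chapter's listings, and the use of strcmp on string keys mirrors the search function of Listing 11.5. Note that, as just discussed, this function does nothing to keep the tree balanced.

TreeNode *Tree::Insert(TreeNode *current, TreeNode *newNode)
{
    // An empty (sub)tree; the new element becomes its root.
    if (current == NULL)
        return(newNode);

    // Keys equal to or lower than the current key belong in the
    // left subtree, higher keys belong in the right subtree.
    if (strcmp(newNode->key, current->key) <= 0)
        current->left  = Insert(current->left, newNode);
    else
        current->right = Insert(current->right, newNode);

    return(current);
}

A new element, with both of its child pointers set to NULL, is then added with a call such as root = Insert(root, newNode).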

Deleting

Deleting elements from a binary tree is more involved than inserting elements. When the element to delete is a leaf, it can be removed without much difficulty. When the element to delete has only one child, that child can simply replace the deleted element. However, the element to delete can also have two children, each of which is the root of a subtree. These subtrees must be reintegrated into the remaining tree after the parent element is deleted. The most straightforward way of doing this is by simply feeding the elements of the subtrees back into the tree using the insert function of the tree. This can be time-consuming, however, and needs to be done with care because feeding elements to a tree in sorted order can result in a very slow tree structure. Another way to reintegrate subtrees that are left over after a delete action is to find the in-order successor of the deleted element and have it take the place of the deleted element. An in-order successor is explained in the section on traversing trees. Figure 11.9 demonstrates this deletion strategy.

Figure 11.9. Deleting an element from a binary tree.


In Figure 11.9, the element with key value 50 is deleted. The delete action creates two subtrees: one headed by the key value 25 and another headed by the key value 80. The in-order successor of key value 50 is key value 63. It can be found by traversing the subtree of the right child of the deleted element. As you will recall, in-order traversal first moves down a (sub)tree by passing along the left child of each node it encounters. When no further nodes can be found to the left, the in-order successor is reached. This in-order successor is moved to replace the deleted element, creating a valid tree structure once more. Yet again, as with inserting elements into a tree, the balance of the tree can be upset, causing slower subsequent search results. In order to keep trees nicely balanced, a modification to the standard structure is needed. A later section, "Red/Black Trees," shows an example of a tree modified for balanced inserting and deleting.
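
In code, locating the in-order successor used in this strategy comes down to taking the right child of the deleted element and then following left-child links as far as possible. The helper below is only a sketch of that single step (the name InorderSuccessor is not part of the chapter's listings); unlinking the successor from its old position and handling the leaf and single-child cases are left out.

// Find the in-order successor of 'node': the leftmost node
// in the subtree headed by its right child.
TreeNode *Tree::InorderSuccessor(TreeNode *node)
{
    TreeNode *successor = node->right;

    while (successor->left != NULL)
        successor = successor->left;

    return(successor);
}

Applied to Figure 11.9, calling this helper on the node with key value 50 first moves to 80 and then follows the left link to 63, which is indeed the element that replaces 50.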

Searching

The speed with which your algorithm can search a tree very much depends on the shape of the tree. This is because the height of the tree (the number of nodes between the root and a leaf) determines the number of search steps your algorithm may have to take before finding an item or deciding that an item is not present in the tree. A balanced binary tree has left and right branches for all nodes that are not leaves. This means that each step of your searching algorithm can choose a branch based on a comparison of the item key with the node key. Listing 11.5 gives an example of a simple search algorithm for finding a node in a binary tree based on the value of its key.

Code Listing 11.5. Searching for a Node in a Binary Tree
TreeNode *Tree::GetNode(TreeNode *current, char *key)
{
    // The key is not present in this (sub)tree.
    if (current == NULL)
        return(NULL);

    int result = strcmp(key, current->key);

    // The key of the current node matches.
    if (result == 0)
        return(current);

    // Lower keys are stored in the left subtree,
    // higher keys in the right subtree.
    if (result < 0)
        return(GetNode(current->left, key));
    else
        return(GetNode(current->right, key));
}
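
A search is started at the root of the tree. Assuming the Tree class keeps a private root pointer (an assumption for illustration; it is not shown in Listing 11.5), a lookup can be wrapped as follows:

TreeNode *Tree::Find(char *key)
{
    // Start the recursive search of Listing 11.5 at the top of the tree.
    return(GetNode(root, key));
}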

As with a binary search, in the best case each test splits the data set to search in half. Searching a balanced binary tree is then an O(log2 n) operation. However, not all trees are balanced. The worst case of an unbalanced tree is a tree in which each node has only one child, making it impossible to choose between branches during a search. This type of tree is nothing more than a linked list, and searching it is therefore an O(n) operation. Generally speaking, building a tree from randomly ordered data produces a more or less balanced tree; however, when elements are added to a tree in sorted order, a linked list is the result. This implies that in order to obtain good searching times (O(log2 n)), more work needs to be done during the inserting (and deleting) operations on trees in order to keep the structure balanced. Sadly, the standard tree structure does not lend itself well to easy balancing during these operations. Because the shape of the tree has such an impact on searching times, it is well worth considering modifications that guarantee better performance for these kinds of operations. Later sections present different kinds of trees that take this into account.

Sorting

Doing external sorting on a binary tree structure should not be necessary because tree properties exist only due to the relations between the elements within the tree. The tree setup is therefore already sorted, and the sorted order is maintained during addition (and deletion) of elements. An unsorted tree would be a collection of elements with seemingly random connections. Finding an element in that kind of tree would mean traversing all elements, resulting in an O(n) operation. This in fact implies that you are dealing with linked list properties with the additional overhead of useless links.

Traversing Fundamentals

There are three different ways of traversing binary trees: Pre-order, In-order, and Post-order.

Pre-order

A pre-order traversal visits the root, then traverses the left subtree of the root, and finally traverses the right subtree of the root. Listing 11.6 shows an implementation of a pre-order function that traverses a tree to print information from each node.

Code Listing 11.6. Pre-order Traverse Function for a Binary Tree
void Tree::PreorderTraverse(TreeNode *current)
{
    if(current != NULL)
    {
        current->info.PrintInfo();
        PreorderTraverse(current->left);
        PreorderTraverse(current->right);
    }
}

The order in which the keys of the balanced tree at the beginning of this section would appear through the traverse function of Listing 11.6 is 50, 25, 12, 32, 80, 63, 81.

In-order

An in-order traversal traverses the left subtree of the root, then visits the root, and finally traverses the right subtree of the root. Listing 11.7 shows an implementation of an in-order function that traverses a tree to print information from each node.

Code Listing 11.7. In-order Traverse Function for a Binary Tree
void Tree::InorderTraverse(TreeNode *current)
{
    if(current != NULL)
    {
        InorderTraverse(current->left);
        current->info.PrintInfo();
        InorderTraverse(current->right);
    }
}

The order in which the keys of the balanced tree at the beginning of this section would appear through the traverse function of Listing 11.7 is 12, 25, 32, 50, 63, 80, 81.

Post-order

A post-order traversal traverses the left subtree of the root, then traverses the right subtree of the root, and finally visits the root. Listing 11.8 shows an implementation of a post-order function that traverses a tree to print information from each node.

Code Listing 11.8. Post-order Traverse Function for a Binary Tree
void Tree::PostorderTraverse(TreeNode *current)
{
    if(current != NULL)
    {
        PostorderTraverse(current->left);
        PostorderTraverse(current->right);
        current->info.PrintInfo();
    }
}

The order in which the keys of the balanced tree at the beginning of this section would appear through the traverse function of Listing 11.8 is 12, 32, 25, 63, 81, 80, 50.

Traversal Continued

Looking at the order in which the example keys emerge from the different traverse methods, you can see that in-order traversal visits all tree elements in sorted order. However, recursion is needed to visit all elements, which means that the amount of stack used (see Chapter 8) increases with the depth of the tree. Not only can this cause memory problems when traversing large amounts of data, it is also inefficient because function call after function call is made. Luckily, it is possible to rewrite the traversal routine as an iterative function; however, changes to the standard tree structure are needed. The changes consist of replacing the NULL pointers of the leaf nodes with something useful. For instance, the left child pointer of a leaf node is made to point at the node's in-order predecessor, and the right child pointer is made to point at the node's in-order successor. Figure 11.10 shows what the balanced tree from the beginning of this section would look like after this change.

The added references are depicted in Figure 11.10 by curved arrows. Note that two NULL pointers still remain. These denote the first and last elements of the tree. This kind of binary tree is called a threaded tree. By following the links of a threaded tree, a traversal function can access the tree as if it were a linked list, while other operations (such as a search) still retain the positive characteristics of the tree structure. For each link to a child, however, some extra administration is needed. A traversal routine has to know whether a child link is a normal link or a link to a threaded successor/predecessor. Also, creating the threaded links slows down the insert and delete operations, so there is some overhead to consider when using threaded trees. Listing 11.9 shows what an in-order traversal function for a threaded binary tree can look like.

Figure 11.10. A threaded binary tree.


Code Listing 11.9. In-order Traverse Function for a Threaded Binary Tree
void InorderTraverse(TreeNode *current)
{
    while (current != NULL)
    {
        if (current->ltype == 1 && current->left != NULL)
        {
            // Normal link to left child.
            current = current->left;
        }
        else
        {
            // Nothing on the left so process current node.
            current->info.PrintInfo();

            if (current->rtype == 1)
            {
                // Normal link to right child or last element.
                current = current->right;
            }
            else
            {
                // Right child threads to successor.
                // Keep following threaded links till right is normal.
                do
                {
                    current = current->right;

                    current->info.PrintInfo();
                }
                while (current->rtype == 0);
                current = current->right;
            }
        }
    }
}

In addition to the member variables left and right, the nodes of the threaded tree in Listing 11.9 have two new member variables ltype and rtype. These variables denote the type of connection pointed to by left and right. When left points to a threaded link, ltype will be 0; when left points to a normal child on its left, ltype will be 1. The same is true for rtype and right.
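
Putting this together, the node type used by Listing 11.9 can be thought of as the earlier TreeNode sketch extended with the two type fields. Again, this is only an illustrative declaration; the actual definition can be found in 11Source01.cpp.

class TreeNode
{
public:
    char     *key;      // search key of this element
    Data      info;     // stored information
    TreeNode *left;     // left child or NULL when ltype == 1,
                        // thread to in-order predecessor when ltype == 0
    TreeNode *right;    // right child or NULL when rtype == 1,
                        // thread to in-order successor when rtype == 0
    int       ltype;    // type of the left link
    int       rtype;    // type of the right link
};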
