Summary

This section summarizes the chapter by discussing the test results of the files 11Source01.cpp and 11Source02.cpp. Note that timing values may differ somewhat between systems, but the relationships between the values within a table should remain fairly constant.

The first test performed by 11Source01.cpp places initial data into the various data structures; Table 11.3 shows the results.

Table 11.3. Timing Results for Storing Initial Data in Storage Structures (Filling Data Structures)

Array    List    Hash    UnbalTree    BalTree
60       100     60      102710       60

Arrays are the fastest structures here. Note that in the 11Source01.cpp file the array is treated as a non-dynamic structure; that is, no reallocation of the structure is done to accommodate new elements. Memory is allocated once and only field values are set during the filling of the array. This stands in contrast to the other structures where a new element must be created each time data is added to the structure.
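This difference can be made concrete with a minimal sketch (our own illustration, not taken from 11Source01.cpp): filling a preallocated array only assigns field values, while a linked list must allocate a new node on the heap for every element added.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct Record { int key; int value; };

// Array: memory is allocated once up front; filling it is
// nothing more than plain field assignments.
void fillArray(Record* arr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        arr[i].key   = static_cast<int>(i);
        arr[i].value = static_cast<int>(i) * 2;
    }
}

// List: every push_back performs a heap allocation for the new node,
// which is the extra cost the other structures in Table 11.3 pay.
void fillList(std::list<Record>& lst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        lst.push_back(Record{static_cast<int>(i), static_cast<int>(i) * 2});
}
```

The per-element allocation is what separates the array's timing from the list's in Table 11.3; the structure names and record layout above are illustrative only.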

The next test performed by 11Source01.cpp searches for an element with a certain key value; Table 11.4 shows the results.

Table 11.4. Timing Results for Searching Elements in Storage Structures (Looking Up the First and Last 250 Entries)

Array1    Array2    Array3    List    Hash    UnbalTree    BalTree
0         0         540       3630    0       3570         0

Note that three different timing values are included for arrays. Array1 reflects the time needed to find a specific element when its key can be used directly to access the element. Array2 reflects the time needed to find a specific element in a sorted array using a binary search algorithm. Array3 reflects the time needed to find a specific element in an unsorted array. This means simply traversing the array from the beginning until the correct key value is found or the end of the array is reached.

You may find it interesting to play around a bit with the number of buckets used by the hash table. You can do this by changing the value of the HASHTABLESIZE definition. You should find that as the number of buckets decreases, and thus the list size per bucket increases, the hash table becomes less and less efficient.
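The three array lookups in Table 11.4 can be sketched as follows (the function names are illustrative, not taken from 11Source01.cpp): direct indexing when the key doubles as the element's position, binary search on a sorted array, and a linear scan of an unsorted array.

```cpp
#include <algorithm>
#include <cassert>

// Array1: the key is the index, so the lookup is a single access.
int directLookup(const int* arr, int key) {
    return arr[key];
}

// Array2: binary search on a sorted array, O(log n) comparisons.
const int* binaryLookup(const int* first, const int* last, int key) {
    const int* pos = std::lower_bound(first, last, key);
    return (pos != last && *pos == key) ? pos : nullptr;
}

// Array3: linear scan of an unsorted array; traverses from the
// beginning until the key is found or the end is reached, O(n).
const int* linearLookup(const int* first, const int* last, int key) {
    const int* pos = std::find(first, last, key);
    return pos != last ? pos : nullptr;
}
```

The jump from 0 to 540 between Array2 and Array3 in Table 11.4 is exactly the jump from logarithmic to linear work per lookup.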

The first test done by 11Source02.cpp is timing the filling of a normal tree against a Red/Black tree:

Tree    RBTree
220     270

We see that filling the Red/Black tree is slightly slower. This is due to the extra bookkeeping the Red/Black tree must perform, and to the fact that the normal tree is filled in an order that always results in a perfect tree.

The next test done by 11Source02.cpp is looking up and checking all entries:

Tree    RBTree
110     110

As expected, looking up an element is as fast in a Red/Black tree as in a normal (balanced) tree. But because the Red/Black tree is guaranteed to always be balanced, good lookup times are guaranteed as well. This cannot be said of a normal tree, where lookup times increase as the shape of the tree becomes more and more unbalanced!

The last test done by 11Source02.cpp is removing all entries one by one:

Tree    RBTree
49870   110

Deleting nodes shows a tremendous time difference between a normal tree and a Red/Black tree. The reason is that the normal tree has no rebalancing mechanism: when a node is removed, its entire subtree must be reinserted, which is very time-consuming. Because the Red/Black tree does have a delete-and-rebalance mechanism, no such problems occur.

In conclusion, we can say the following about the data storage structures discussed in this chapter:

Arrays

An array is the best structure for static data because it has no overhead and there is no danger of fragmentation within the bounds of the array. Dynamic data is more complicated because the array must be reallocated and its data copied to accommodate growth, which can also temporarily require a larger memory footprint.

Lists

A list is a good structure for storing small data sets with a highly dynamic nature. It has more overhead per element than the array, and fragmentation is a danger. However, dynamically adding and deleting elements is fast and does not require extra footprint beyond the memory needed for storing elements. Searching for an element can be quite time-consuming, depending on the type of list (double- or single-linked).

Hash Tables

A hash table is a good structure for larger sets of data for which a unique or distributing key function (hash function) can be designed. The hash table implementation can make use of as many or as few other data structures as necessary. Depending on this, the overhead per element can be as small as that of the array or as large as other structures combined.

Binary Trees

A binary tree is a good structure for larger amounts of highly dynamic data, as long as the tree structure can remain relatively balanced. This means data must be added and deleted in an unsorted order. The binary tree has more overhead per element than the list and becomes as inefficient as the list when it is unbalanced.
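Why insertion order matters can be shown with a minimal sketch (our own, not 11Source02.cpp): inserting keys in sorted order degenerates a plain binary tree into a list, while mixed-order insertion keeps it shallow.

```cpp
#include <algorithm>
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Plain binary-tree insert with no rebalancing.
Node* insert(Node* root, int key) {
    if (!root) return new Node{key};
    if (key < root->key) root->left = insert(root->left, key);
    else                 root->right = insert(root->right, key);
    return root;
}

// Depth of the tree: the worst-case number of comparisons per lookup.
int depth(const Node* root) {
    if (!root) return 0;
    return 1 + std::max(depth(root->left), depth(root->right));
}
```

With 15 keys, sorted insertion yields a chain of depth 15 (list-like lookups), while inserting the same keys in a balanced order yields depth 4; this is exactly the UnbalTree-versus-BalTree gap seen in Table 11.4.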

Red/Black Trees

A Red/Black tree is a good structure for larger amounts of highly dynamic data. The Red/Black tree has more overhead per element than a normal binary tree, but it stays balanced regardless of the order in which elements are added and deleted. The danger of memory fragmentation is as high as with the other structures (excluding the array), and inserting and deleting elements is slower than with a normal binary tree. For larger amounts of data, this time is won back through faster traversal and searching, because good balance is guaranteed.
