Profiling data structures

The algorithm scalability choices also often manifest themselves in the choice of data structures. This table gives the time and space complexity of common data structures and their operations:

Data structure

Time complexity

Space complexity

Average

Worst

Worst case

Search

Insert

Delete

Search

Insert

Delete

Array

O(n)

O(n)

O(n)

O(n)

O(n)

O(n)

O(n)

Linked list

O(n)

O(1)

O(1)

O(n)

O(1)

O(1)

O(n)

Skip list

O(logn)

O(logn)

O(logn)

O(n)

O(n)

O(n)

O(nlogn)

Hash table

O(1)

O(1)

O(1)

O(n)

O(n)

O(n)

O(n)

Binary search tree

O(logn)

O(logn)

O(logn)

O(n)

O(n)

O(n)

O(n)

Red black tree

O(logn)

O(logn)

O(logn)

O(logn)

O(logn)

O(logn)

O(n)

 

Just as an example, to clarify the worst-case scenario performance, consider inserting the following numbers:

3, 4, 5, 6, 7, 8, 9

For an empty Binary Search Tree (BST) and a Red-Black Tree.

In the case of a BST, this is the worst case, and it would degenerate to a linked list as shown:

This is because there is no re-balancing in the plain BST. However, for a Red-Black Tree, there are periodic rotations to keep the invariant.

Red-Black Tree is a self-balancing BST, where the following invariants have to be maintained at every stage at every node:

  • Every node has a color; either red or black.
  • The root of tree is always black.
  • There are no two adjacent red nodes (a red node cannot have a red parent or a red child).
  • Every path from the root to the leaves has same number of black nodes.

At every insertion, the initial procedure for insertion is the same as the BST, but if the invariants change, there is rotation so that the self-balancing occurs. For example, for the same insertion, the Red-Black Tree looks like this:

To summarize, based on what operation you want to do, how often, with a data structure, you can choose the right one for a job.

Another aspect to consider in data structure is how space allocation plays out with an increase in data. For example, arrays are fixed length, and thus the data structure scalability is limited to the size at the time of allocation. In contrast, linked lists need no upfront total capacity allocation, but they do not offer the O(1) access time characteristics of arrays. That said, there are some hybrid data structures such as array lists that offer the best of both worlds.

Also, for a large amount of data, it is useful to think about how efficiently we can store the information. For example, if we have to store a Boolean flag for all users on a website, then there are two options:

  • Boolean array: one byte per flag/user
  • Bitset: a bit for each user/flag

The first option is slightly easier to code, but 50 M users will need 47 MB for the first option verses about 6 MB for the second. If there are multiple such flags for different use cases, you can imagine that bit sets will allow us to store more data in RAM, leading to better performance.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset