Chapter III.5. Graphs and Trees

Most data structures (such as arrays, dictionaries, and queues) store data in a linear format with one chunk of data neatly sandwiched in between exactly two other chunks of data. Linear data structures can be fine for just storing data, but what if you want a data structure that can model a real-life problem?

Picking the right data structure to model a real-life problem can greatly simplify programming. One advantage of a queue is that it closely mimics a line of people (or orders on a Web site) that need to be handled one at a time, starting with the oldest item. Using a queue data structure to handle incoming orders from a Web site makes logical sense but using a dictionary or even an array makes less sense because dictionaries require keys assigned to each data (which isn't needed) and arrays need to be resized constantly.

So if you have a problem that doesn't mimic a linear data structure, using a linear data structure can just make programming harder, much like trying to use a screwdriver to pound in a nail when you really need a hammer.

For example, suppose you're creating a chess game. You could use a collection data structure to represent each space on the chessboard, but a more intuitive data structure would just be a two-dimensional array.

Modeling a chessboard with a two-dimensional array works because both a two-dimensional array and a chessboard are a regular shaped grid. Now what if you need to model a real-life problem that doesn't neatly match a uniformly shaped grid?

Suppose you need to keep track of trucks traveling to different cities so you can route them to the shortest distance between two cities, as shown in Figure 5-1.

Modeling a map of different cities.

Figure III.5-1. Modeling a map of different cities.

You could model this problem as a two-dimensional array like this:

 

Los Angeles

San Diego

Las Vegas

San Francisco

Salt Lake City

Los Angeles

0

153

287

387

X

San Diego

153

0

325

509

X

Las Vegas

287

325

0

504

406

San Francisco

387

X

504

0

457

Salt Lake City

X

X

406

457

0

Although this two-dimensional array accurately models the map of different city distances, it's not easy to understand what this data represents. A better data structure would be a graph.

Understanding Graphs

A graph is typically created by using a linked list that can point to multiple nodes. As a result, a graph doesn't follow a linear structure but a more haphazard appearance, which makes it perfect for modeling non-linear data, such as the map of different cities and distances, as shown in Figure 5-2.

A graph data structure can model the mapping problem better than an array.

Figure III.5-2. A graph data structure can model the mapping problem better than an array.

The two parts of every graph are

  • Nodes (or vertices): Contain the actual data.

  • Connections (or edges): Represent some relationship between two or more nodes.

In the map example, nodes represent cities and connections represent distances.

Note

An entire branch of mathematics is dedicated to studying graphs (graph theory).

Graph data structures are used less for storing and retrieving data and more for understanding the relationship between data.

Types of graphs

Three types of graphs are shown in Figure 5-3:

  • Undirected graph: Connects data in different ways to show a relationship, such as modeling all the links connecting the pages of a Web site.

  • Directed graph: Adds arrows to show you the direction of a relationship between data. For example, a directed graph could model the flow of messages passed between computers in a network.

  • Weighted graph: Labels each link with a value or weight; each weight might measure distances, resistance, or cost between nodes. In the example of the graph in Figure 5-3, each weight represents a distance measured in miles. The higher the weight, the farther the distance.

The three types of graphs.

Figure III.5-3. The three types of graphs.

Warning

You can also combine weights with direction and create a directed, weighted graph.

Uses for graphs

Graphs are used to model a variety of real-world problems, such as finding the most efficient way to route e-mail through a computer network to finding the shortest way for airplanes to travel to different cities. Molecular biologists even use graphs to model the structure of molecules.

Designing a single path with the Seven Bridges of Königsberg

One of the first uses for graphs appeared in 1736 in a problem known as the Seven Bridges of Königsberg. The question was whether it was possible to walk across all seven bridges of Königsberg exactly once, as shown in Figure 5-4.

The Seven Bridges of Königsberg represented as a graph.

Figure III.5-4. The Seven Bridges of Königsberg represented as a graph.

A mathematician named Leonhard Euler used a graph to prove that this was impossible. In the seven bridges problem, one node has five bridges leading to it (5 degrees) while the other three nodes only have three bridges leading to it (3 degrees). Euler proved that the only way you could cross every bridge exactly once was if a graph had, at the most, two nodes with an odd number of bridges (degrees), and each odd-numbered degree node had to be the starting and ending point.

Although solving a problem of walking across a bridge exactly once might seem trivial, knowing how to solve this problem can help design truck or airplane routes as efficiently as possible.

Finding the shortest path through a graph

After using Euler's proof to design a graph that can be traversed in a single path, graph theory can now help you find the shortest path through that graph.

This problem, dubbed the Chinese Postman problem after the Chinese mathematician Mei-Ko Kuan, involves finding the shortest route a postman can travel to visit every node in a graph.

The answer to this problem can help airlines choose the shortest routes for each airplane route to visit multiple cities. The shorter the route, the less fuel needed and the lower the operating costs. Another use for the Chinese Postman problem is finding the most efficient way to route e-mail from one computer to another. In both cases, one node in the graph is the starting point and a different node is the ending point.

A related problem, dubbed the Traveling Salesman problem, takes this one step further by finding the shortest round-trip route through a graph where the same node is both the starting and ending point. In this case, the shortest route through a graph may not necessarily be the shortest round-trip route to return to the same starting point.

Both the Traveling Salesman and the Chinese Postman problem can get more complicated with a weighted or directed graph. A directed graph may restrict movement in one direction, such as traveling through one-way streets in a city, whereas a weighted graph can make one route longer than two shorter routes combined. Adding in directed and weighted graphs can alter the best solution.

Note

If you ever looked up directions on a map Web site such as MapQuest, you've used a graph to find the most efficient way from one location to another.

Connecting nodes in a graph

Another use for graphs involves topological graph theory. This problem is highlighted by the Three Cottage problem with three cottages needing to connect to the gas, water, and electricity companies, but their lines can't cross each other. (It's impossible, by the way.)

Connecting lines in a graph without crossing is a problem that circuit board designers face in the placement of chips. Another example of eliminating intersections involves transportation designs, such as the design of highways or railroad tracks.

Creating Trees

Graphs typically represent a chaotic arrangement of data with little or no structure. To give graphs some form of organization, computer scientists have created special graphs dubbed trees. Like a graph, a tree consists of nodes and edges, but unlike a graph, a tree organizes data in a hierarchy, as shown in Figure 5-5.

A tree arranges a graph in a hierarchy with a single node appearing at the top (or the root node) and additional nodes appearing connected underneath (or internal nodes). If a node has no additional nodes connected underneath, those nodes are leaf nodes, as shown in Figure 5-6.

A tree is a hierarchical graph.

Figure III.5-5. A tree is a hierarchical graph.

A tree consists of one root node and multiple leaf and internal nodes.

Figure III.5-6. A tree consists of one root node and multiple leaf and internal nodes.

Ordered trees

A tree can store information at random in its different nodes, which is dubbed an unordered tree. However, the tree is already in the form of a hierarchy, so it only makes sense to take advantage of this built-in structure and create an ordered tree.

An ordered tree provides a distinct beginning node (the root node) with additional nodes organized in a hierarchy. Such a hierarchy can store and show relationships of a corporate management team or the spread of a flu epidemic through different cities. As a result, ordered trees are a common data structure used to both model and organize data.

Tip

One common use for ordered trees involves storing data. Under each root node, you can have 26 internal nodes that each represent a single letter of the alphabet from A to Z. Under each of these letter nodes, you can have multiple nodes that contain the actual data, as shown in Figure 5-7.

A tree can organize names alphabetically by last name.

Figure III.5-7. A tree can organize names alphabetically by last name.

To save the name David Bally, the computer stores the name under the B node. To save the name John Burkins, the computer also stores this name under the B node. To determine whether to store this new name before or after any existing data, the computer examines the data and sorts it alphabetically. In this way, the tree data structure not only stores data, but sorts and organizes it as well.

If you had a long list of names stored in an array or a collection, finding a name would require searching through the entire array or collection. However, if that same list of names is stored in a tree, a name would be much simpler to find because you'd only need to look at the first letter of a person's last name to find where it might be stored.

So if you want to find the name John Bally, start at the B node and ignore any data stored under the other nodes. This makes searching and retrieving data much faster than other types of data structures, which is why trees are so commonly used in databases.

Binary trees

A variation of an ordered tree is a binary tree. Unlike an ordinary tree, every node in a binary tree has at the most two nodes connected underneath. To sort data, the left node contains values less than its parent node whereas the right node contains values greater than its parent node, as shown in Figure 5-8. By limiting each node to a maximum of two connected nodes, binary trees make searching and sorting data fast and efficient.

Binary trees store and sort data by value.

Figure III.5-8. Binary trees store and sort data by value.

For example, an ordinary tree allows each node to have multiple nodes underneath. As a result, the more data an ordinary tree stores, the more nodes that can appear directly underneath a single node, as shown in Figure 5-9.

An ordinary tree is more difficult to search than a binary tree.

Figure III.5-9. An ordinary tree is more difficult to search than a binary tree.

To find the number 11 in an ordered binary tree is simple. Start with the root (top) node and compare the number 11 to the root node value (10). Because the number you want to find is greater than the root node, you'd branch to the right. At this next node (12), the computer repeats the process and determines that 11 is less than 12, so it branches to the left node, which is where it finds the number 11.

Searching through a sorted binary tree is simple, especially when compared to an unordered tree. Because an unordered tree can scatter data anywhere, searching an unordered tree means methodically searching each node, one by one, until you find the data you want. In a large unordered tree, this search time can be slow and inefficient.

Warning

Ordered binary trees are one of the most common tree data structures used in many types of programs, such as databases.

B-trees

Another variation of a tree is a B-tree. The two main features of a B-tree are

  • All nodes can only have a fixed number of other nodes underneath it (such as three nodes compared to a binary tree, which has a maximum of two nodes).

  • All leaf nodes remain at the same level or depth of the tree, as shown in Figure 5-10.

In a B-tree, all leaf nodes appear at the same level.

Figure III.5-10. In a B-tree, all leaf nodes appear at the same level.

When you add or subtract data, the binary tree constantly adjusts to keep all leaf nodes at the same level. Keeping all leaf nodes at the same level ensures that searching for some data (stored farther down a tree) won't take a long time compared to searching for other data (stored closer to the root node of the tree).

A variation of a B-tree is a B+ tree. The main difference is that a B-tree can store data in all its nodes. Because some nodes appear closer to the root node than others, this makes searching for data closer to the root node fast but searching for data farther from the root node slower in comparison.

A B+ tree only stores data in its leaf nodes and connects all leaf nodes together as a linked list. Because all leaf nodes are the same distance from the root node, this makes searching for any type of data equal.

Note

Microsoft Windows and Linux are two operating systems that use B+ trees for keeping track of files on a disk.

Taking Action on Trees

Trees are flexible data structures because they organize data and allow fast retrieval of that data. Some of the different actions you can perform on trees include

  • Searching for a specific item

  • Adding a new node or a sub-tree

  • Deleting data or a sub-tree

Traversing a tree

When you store data in an array or a collection, that data is stored in a line so you can search the entire data structure by starting at one end and examining each data chunk one by one until you get to the end. However, trees are different because they offer multiple branches.

To search a tree, the computer must examine multiple nodes exactly once, which is known as traversing a tree. Four popular ways to search a tree, as shown in Figure 5-11, include

The four different ways to traverse a tree.

Figure III.5-11. The four different ways to traverse a tree.

  • Preorder

  • In-order

  • Postorder

  • Level order

Preorder traversal

Preorder traversal starts at the top of a tree (the root node) and then traverses the left nodes. When it reaches a leaf node, it backtracks and goes down the right nodes, as follows:

  1. Visit the root node.

  2. Traverse the left sub-tree in preorder.

  3. Traverse the right sub-tree in preorder.

In-order traversal

When traversing an ordered binary tree, the in-order traversal retrieves data in order by following these steps:

  1. Traverse the left sub-tree using in-order.

  2. Visit the root node.

  3. Traverse the right sub-tree by using in-order.

Postorder traversal

Postorder traversal traverses the left and right sub-trees first and then visits the root node, as follows:

  1. Traverse the left sub-tree in postorder.

  2. Traverse the right sub-tree in postorder.

  3. Visit the root node.

Level order traversal

Level order traversal starts at the top level of a tree and traverses the row of nodes on the same level from left to right. Then it drops to the next lower level and repeats the process all over again.

Warning

When writing actual code to traverse a tree, it's often easier to write a recursive subprogram that calls itself and traverses a successively smaller part of the tree (sub-tree) until it finally stops.

Adding new data

Adding data to a linear structure, like a collection or a stack, is straightforward because the data structure simply gets bigger. Adding data to a tree is slightly more complicated because you can add new data at the end of a tree (on one of its leaf nodes) or anywhere in the middle of the tree.

In an unordered tree, you can insert data anywhere, but in an ordered binary tree, inserting new data means sorting the data at the same time, as shown in Figure 5-12.

Warning

The order that you store data determines the position of that data in a tree because newly added data gets added based on the existing data's values.

Deleting data

Deleting data from a tree can cause special problems because after deleting data, you may need to rearrange any nodes that were underneath the deleted data, as shown in Figure 5-13.

Inserting new data in an ordered binary tree.

Figure III.5-12. Inserting new data in an ordered binary tree.

Warning

If you delete data and immediately add it back to the tree again, the tree looks different because reinserting the data sorts and organizes it based on the existing data. So if you delete the 12 node and immediately add it back again, it now appears as a left node under the 19 node.

Pruning and grafting sub-trees

A sub-tree is a smaller tree, such as part of an existing tree. Rather than delete a single node, it's possible to delete an entire sub-tree, which is known as pruning a tree, as shown in Figure 5-14.

After deleting data from a tree, you may need to rearrange the remaining data to keep the tree sorted.

Figure III.5-13. After deleting data from a tree, you may need to rearrange the remaining data to keep the tree sorted.

Warning

After pruning a sub-tree, there may be more than one way to rearrange the remaining nodes.

Adding or grafting a sub-tree to an existing tree can cause problems if the sub-tree data contains different values than the original tree. In that case, you can't just graft the sub-tree to the existing tree; you have to rearrange the data in both the tree and the grafted sub-tree to sort data once more, as shown in Figure 5-15.

Tree data structures are most useful for storing and sorting data, such as for databases. However, tree data structures are also handy for creating artificial intelligence in games, such as chess.

Pruning a tree removes two or more nodes from a tree.

Figure III.5-14. Pruning a tree removes two or more nodes from a tree.

The computer might use a tree data structure to represent all possible moves when the root node represents one possible move. Then each alternating level in the tree represents the possible human responses and the best possible computer responses, as shown in Figure 5-16.

By organizing possible moves in a tree, a computer can determine the best possible move to make that will give its human opponent the worst possible moves later. This strategy is min-max — the computer minimizes the human's best possible moves and maximizes its own best moves.

Despite having to create trees out of other data structures (such as linked lists) and being more complicated to create and manage than other data structures, trees are one of the most useful and flexible data structures available.

Grafting a sub-tree can require rearranging the entire modified tree.

Figure III.5-15. Grafting a sub-tree can require rearranging the entire modified tree.

A tree can help a computer plan its next move.

Figure III.5-16. A tree can help a computer plan its next move.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset