Chapter 3
The directed graph is another term from graph theory. A directed graph is a set of nodes and a set of directed edges. Each directed edge originates at a source node and terminates at a target node (which may be the same as the source node). The nodes of a directed graph can have any number of edges. Directed graphs arise for applications with important topology or connectivity. For example, a directed graph is a natural representation for airline flights between airports.
Figure 3.1 shows two examples of directed graphs.
There are six templates for directed graphs.
There appears to be no directed graph counterpart to the overlapping trees template.
In the simple directed graph template (Figure 3.2) all nodes have the same kind of data. Figure 3.2 corresponds to the simple tree template and relaxes the multiplicity constraints for the parent and root nodes. A directed graph may have multiple roots and each node may have multiple parents.
A DG (directed graph) is a set of nodes and a set of directed edges that connect nodes. (Note: in a directed graph all the nodes do not have to be connected.) You need not include DG in a use of the template. A Node is an entity type whose records are organized as a directed graph. The distinction between parent and child causes the sense of direction that effects directed edges. There can be at most one coupling between a pair of nodes.
Figure 3.2 adds the constraint that each node has a parent except for root nodes. (The template itself is more permissive and lacks this constraint.) In general, a directed graph may have cycles but this template disallows them. (A cycle starts at a node and after traversing a series of edges reaches the starting node.) Since this template is based on a tree template, it doesn't make sense to permit cycles.
Each node may have a name. As Figure 3.3 shows, node names can be globally unique (left template) or unique within a context (right template).
Figure 3.4 restates Figure 3.3 with the IDEF1X notation. The following are foreign keys: dgID references DG, childID references Node, parentID references Node, and rootID references Node. For brevity, Figure 3.4 omits DG data.
Figure 3.4b defines parentID + nodeName as the Node_Node primary key. (This is an arbitrary choice and parentID + childID could be the primary key instead.) Node_Node defines node names for all nodes except for the roots. The root nodes obtain their name from the combination of dgID + nodeName. Once again the choice of primary key is arbitrary for DG_Node and could have been the alternate key instead.
Figure 3.5 and Figure 3.6 show SQL queries for common traversals of the template. The colon prefix denotes variable values that must be provided for each query.
Figure 3.7 shows simple directed graph tables populated with data. The ID values are arbitrary, but internally consistent. In accordance with the template, there are no explicit edges; edges are represented via the coupling between nodes. The populated tables use different data than Figure 3.1 because the premise of the template is a tree that is generalized to having multiple parents.
Simple directed graphs occasionally arise.
Figure 3.8 revisits the management hierarchy model of Figure 2.16. Some organizations do not limit their reporting structure to a strict hierarchy and can have subordinates reporting to more than one manager.
Figure 3.9 shows the UML template for structured directed graphs when there is a need to differentiate leaf nodes from branch nodes. This template is based on the structured tree and relaxes the multiplicity constraints for the parent and root nodes. A directed graph may have multiple roots and each node may have multiple parents.
A DG (directed graph) is a set of nodes and a set of directed edges that connect nodes. (Note: in a directed graph all the nodes do not have to be connected.) You need not include DG in a use of the template. A Node is either a leaf node or a branch node. A Leaf node (such as M, X, and N in Figure 3.7a) terminates the recursion. A Branch node (such as J, K, and L in Figure 3.7a) can have child nodes each of which in turn can be a leaf node or a further branch node. The distinction between parent and child causes the sense of direction that effects directed edges. Note that with this template there can be at most one coupling between a pair of nodes.
Figure 3.9 adds the constraint that each node has a parent except for root nodes. (The template itself is more permissive and lacks this constraint.) In general, a directed graph may have cycles but this template disallows them. (A cycle starts at a node and after traversing a series of edges reaches the starting node.) Since this template is based on a tree template, it doesn't make sense to permit cycles.
Each node may have a name. As Figure 3.10 shows, node names can be globally unique (left template) or unique within a context (right template).
Figure 3.11 restates Figure 3.10 with the IDEF1X notation. The following are foreign keys: dgID references DG, rootID references Node, parentID references Branch, childID references Node, leafID references Node, and branchID references Node. The generalization is exhaustive—every Node record must have a corresponding Leaf record or Branch record. The nodeDiscrim field is an enumeration with values “Leaf” and “Branch” indicating the subtype record.
Figure 3.11b defines parentID + nodeName as the Node_Branch primary key, but parentID + childID could be the primary key instead. Node_Branch defines node names for all nodes except for the roots. The root nodes obtain their name from the combination of dgID + nodeName. Once again the choice of primary key is arbitrary for DG_Node and could have been the alternate key instead.
Figure 3.12 and Figure 3.13 show SQL queries for common traversals of the template. The colon prefix denotes variable values that must be provided for each query.
Figure 3.14 shows structured directed graph tables populated with data. The ID values are arbitrary, but internally consistent. In accordance with the template, there are no explicit edges; edges are represented via the coupling between nodes. The populated tables use different data than Figure 3.1 because the premise of the template is a tree that is generalized to having multiple parents.
Directed graphs often arise in applications and sometimes the structured directed graph is a good choice.
Figure 3.15 revisits the file directory model of Figure 2.26. With a UNIX™ file directory a hierarchy no longer suffices because a file can belong to multiple directories via symbolic links. A file may have a different name in each directory where it is referenced.
In Figure 3.15 a File may be a DataFile or a DirectoryFile. Directories contain multiple files, some or all of which may be subdirectories. The combination of a DirectoryFile and a fileName yields a specific File—file names are unique within the context of a directory. All Files belong to one or more directories except for the root File. Directories can be nested to an arbitrary depth, with DataFiles and empty DirectoryFiles terminating the recursion. Files require an acyclic graph, so the model has an added constraint. (The term “acyclic” means that you cannot start with a file and traverse some sequence of files and reach the starting file.)
Web email services (such as Yahoo, Google, and Hotmail) let users define lists that group together email addresses. Although most services don't support it, there is no reason why a list could not contain lesser lists. In Figure 3.16 the friends list contains the family list, the colleagues list, and additional email addresses. Bill P and Paul B belong to both the friends and colleagues lists. Figure 3.17 models nested email address lists. For this example, names are globally unique and do not vary by context.
A third template for directed graphs has explicit edges (Figure 3.18). Nodes and the edges that connect them can both bear information.
Figure 3.2 and Figure 3.9 describe directed graphs with at most one edge between nodes; edges are implicitly represented by parent-child relationships. Figure 3.18 is more powerful and can describe any directed graph.
A DG (directed graph) is a set of nodes and a set of directed edges that connect nodes. (Note: in a directed graph all nodes do not have to be connected.) You need not show DG in a use of the template. A Node is an entity type whose records are organized as a directed graph. An Edge is a coupling from a source Node to a sink Node. Coupled nodes and edges must belong to the same directed graph.
Note that this template permits cycles. In practice, some applications of the template permit cycles and others do not.
With this template the names of nodes and edges are globally unique. There is no context to provide an alternative approach to naming.
Figure 3.19 restates Figure 3.18 with the IDEF1X notation. The following are foreign keys: dgID references DG, sourceNodeID references Node, and sinkNodeID references Node.
Figure 3.20, Figure 3.21, and Figure 3.22 show SQL queries for common traversals of the template. The colon prefix denotes variable values that must be provided for each query.
Figure 3.23 and Figure 3.24 show node and edge directed graph tables populated with data. The values of the IDs are arbitrary, but internally consistent.
The node and edge directed graph is the most common representation. Figure 3.25 shows an example for published flights.
Airlines operate flights between airports. A PublishedFlight refers to the published description of air travel between two airports. The frequency indicates the days of the week for which the PublishedFlight applies. A PublishedFlight consists of a sequence of PublishedFlightLegs that describe the travel from airport to airport.
Figure 3.26 shows an excerpt of a model for supply chain tracing for food manufacture. The application traces foodstuffs (MaterialLots) as they proceed from the farm to manufacturers, distributors, and eventually the marketplace (various SupplyChainStages). Using the node and edge template, intervening MaterialLots connect a network of SupplyChainStages.
A SupplyChainStage may have any number of MaterialLots as input and any number as output. A MaterialLot may enter and exit at most one SupplyChainStage. Although the model does not enforce it (application code must enforce it), the entering and exiting MaterialLot for a SupplyChainStage must be different.
Figure 3.27 elaborates the node and edge template, promoting the connection between nodes and edges to an entity type. Figure 3.27, as stated, does not permit unconnected Nodes. You could add a relationship between DG and Node if unconnected Nodes were important.
A DG (directed graph) is a set of nodes and a set of directed edges that connect nodes. You need not show DG in a use of the template. A Node is an entity type whose records are organized as a directed graph. An Edge is a coupling from a source Node to a sink Node. A Connection is the linking between a Node and an Edge. Each Connection may be a source or a sink.
With this template the names of nodes and edges are globally unique. There is no context to provide an alternative approach to naming.
Figure 3.27 lacks the constraint that nodes and edges may only have connections for one directed graph. The template also lacks the constraint that an edge must have one source node and one sink node.
Figure 3.28 restates Figure 3.27 with the IDEF1X notation. The following are foreign keys: dgID references DG, nodeID references Node, and edgeID references Edge.
Figure 3.29, Figure 3.30, and Figure 3.31 show SQL queries for common traversals of the template. The colon prefix denotes variable values that must be provided for each query.
Figure 3.32 and Figure 3.33 show connection directed graph tables populated with data. The values of the IDs are arbitrary, but internally consistent. For brevity, the Connection tables omit the dgID column.
The connection template is sometimes helpful for representing directed graphs as Figure 3.34 illustrates. A manufacturing plant can have a variety of equipment that is connected by piping. Nozzles connect piping to equipment and have a direction of normal fluid flow.
There are two templates for directed graphs that can change over time. The first (this section) is based on the simple directed graph. The second (next section) is based on the node and edge directed graph. It is also possible to extend the connection template for variations over time, but I have never seen it used in practice.
The template for simple directed graphs that change over time (Figure 3.35) is similar to the tree template in Section 2.5. Figure 3.35 separates an entity from its directed graph position (node) because the timeline for an entity can differ from that of its involvement in a graph.
A DG (directed graph) is a set of nodes and a set of directed edges that connect nodes. (Note: in a directed graph all the nodes do not have to be connected.) You need not show DG in a use of the template. A Node is a position within a directed graph. The distinction between parent and child causes the sense of direction that effects directed edges.
Figure 3.35 adds the constraint that each node has at least one parent except for root nodes. (The template itself is more permissive and lacks this constraint.) In general, a directed graph may have cycles but this template disallows them. (A cycle starts at a node and after traversing a series of edges reaches the starting node.) Since this template is based on a tree template, it doesn't make sense to permit cycles.
This template is already complex, so it is best to handle node names in a simple manner. Each node has a globally unique name and there is no provision to vary node name by context. The effective and expiration dates permit Node and Entity data to vary over time.
Figure 3.36 restates Figure 3.35 with the IDEF1X notation. The following are foreign keys: dgID references DG, nodeID references Node, entityID references Entity, parentNodeID references Node, and childNodeID references Node.
In Figure 3.36 the node name can change over time (three part candidate key—nodeName + effectiveDate + expirationDate), but the node name could also be invariant over time (candidate key of nodeName alone). Note that the handling of time reflects a limitation of relational DBMSs. It would be better to use time intervals but most relational DBMSs only support points in time.
Figure 3.37 and Figure 3.38 show SQL queries for common traversals of the template. The colon prefix denotes variable values that must be provided for each query.
Figure 3.39 shows tables populated with data for the simple directed graph changing over time. The values of the IDs are arbitrary, but internally consistent. A null effectiveDate means that a Node applies indefinitely from the past. A null expirationDate means that a Node applies indefinitely into the future. In accordance with the template, there are no explicit edges and edges are represented via the coupling between nodes.
As in Section 3.1, the template cannot express directed graphs with multiple edges between a pair of nodes. For brevity, the Node table omits the dgID column. This simple example does not populate Binding and Entity.
Figure 3.40 revisits the example from Section 2.5. The management structure can be a matrix, instead of a simple hierarchy; then a person can report to more than one manager at the same time. For example, a chief information officer may report to both the chief operating officer and chief financial officer. The model can store the current reporting structure, reporting structures of the past, and planned structures of the future. The structure changes as persons join and leave a company. The structure also changes due to promotions and demotions and management changes.
This template adds time intervals to the Node and Edge entity types from Section 3.3.
Figure 3.41 shows the template for node and edge directed graphs that change over time. Unlike the simple directed graph changing over time, Figure 3.41 does not separate an entity from its position in a directed graph. It is not clear how to make such a distinction with nodes and edges as peer concepts; I have not needed such a distinction in practice.
A DG (directed graph) is a set of nodes and a set of directed edges that connect nodes. (Note: in a directed graph all the nodes need not be connected.) You need not show DG in a use of the template. A Node is an entity type whose records are organized as a directed graph. An Edge is a coupling from a source Node to a sink Node.
With this template the names of nodes and edges are globally unique. There is no context to provide an alternative approach to naming.
Figure 3.42 restates Figure 3.41 with the IDEF1X notation. The following are foreign keys: dgID references DG, sourceNodeID references Node, and sinkNodeID references Node. Similar to Section 3.5, we allow node and edge names to change over time.
Figure 3.43, Figure 3.44, and Figure 3.45 show SQL queries for common traversals of the template. The colon prefix denotes variable values that must be provided for each query.
Figure 3.46 shows tables populated with data for the node and edge directed graph changing over time. The values of the IDs are arbitrary, but internally consistent. A null effectiveDate means that a Node applies indefinitely from the past. A null expirationDate means that a Node applies indefinitely into the future. For brevity, the Node and Edge tables omit the dgID column.
This template is often the best way to handle a directed graph that changes over time. Figure 3.47 takes the airline flight model from Figure 3.25 and extends it for time. The revised model can store the scheduled flights of the past as well as those planned for the future. The example adds time variance to the edges (PublishedFlight + PublishedFlightLeg) but the nodes (Airports) can also vary by time if that is desirable.
Figure 3.48 shows a simple model for CurrencyConversion. Examples of Currency include U.S. dollars, Euros, and Japanese Yen. There is an exchangeRate for a source Currency to a target Currency in effect for some time interval.
Directed graphs occur in many application models and are often a critical issue for representation. There are six templates for directed graphs with different trade-offs.
Table 3.1 summarizes the directed graph templates.
Summary of the Directed Graph Templates
Template name |
Synopsis |
UML template |
Use when |
Frequency |
---|---|---|---|---|
Simple DG |
Treats all nodes the same. |
Edges are unimportant; nodes have the same kind of data. The DG is acyclic. |
Occasional |
|
Structured DG |
Differentiates leaf nodes from branch nodes. |
Edges are unimportant; branch nodes and leaf nodes have different data. The DG is acyclic. |
Occasional |
|
Node and edge DG |
Treats nodes and edges as peers. |
Nodes and edges can both have data; there can be multiple edges between a pair of nodes. |
Common |
|
Connection DG |
Promotes the connection between a node and an edge to an entity type. |
There is data for the connection itself as well as for nodes and edges. |
Occasional |
|
Simple DG changing over time |
Stores multiple variants of a DG. Extract a particular DG by specifying a time. |
A DG changes over time; edges are unimportant. The DG is acyclic. |
Seldom |
|
Node and edge DG changing over time |
Stores multiple variants of a DG. Extract a particular DG by specifying a time. |
A DG changes over time; edges are important. |
Occasional |
Note: This table can help you choose among the directed graph templates.
Page 89 of [Hay-1996] has an example of projects that involve the node and edge directed graph.
[Hay-1996] David C. Hay. Data Model Patterns: Conventions of Thought. New York, New York: Dorsett House, 1996.