Graph modeling techniques

Graph databases such as Neo4j are versatile: they can model and store almost any form of data, including data that would traditionally be kept in an RDBMS or a document database. Neo4j in particular is designed to be a high-performance store for day-to-day transactional data while also supporting some level of analytics. Almost all domains, including social, medical, and financial ones, raise problems that can be handled naturally by modeling the data as graphs.

Aggregation in graphs

Aggregation is the technique of modeling a tree, or any other arbitrary graph structure, by denormalizing it into a single record or document entity.

  • This technique is most efficient when the aggregated tree is accessed in a single read (for example, the complete comment hierarchy of a post is read when the page with the post is loaded)
  • Random access to individual entries, or searching on them, can cause problems
  • Aggregated nodes can lead to inefficient updates compared with independent nodes
[Figure: Aggregation of entities in a blog post tree]
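
The trade-offs listed above can be sketched in a few lines of Python. This is only an illustration: the `post` document, its field names, and the helper functions are hypothetical and are not taken from Neo4j or any particular document store.

```python
# Hypothetical blog post stored as one aggregated document: the whole
# comment tree is nested inside the post record.
post = {
    "id": "post-1",
    "title": "Graph modeling techniques",
    "comments": [
        {
            "id": "c1",
            "text": "Nice overview",
            "replies": [
                {"id": "c2", "text": "Agreed", "replies": []},
            ],
        },
        {"id": "c3", "text": "What about updates?", "replies": []},
    ],
}


def count_comments(comments):
    """Count every comment in the nested tree."""
    return sum(1 + count_comments(c["replies"]) for c in comments)


def find_comment(comments, comment_id):
    """Locate one comment; this has to walk the nested tree."""
    for comment in comments:
        if comment["id"] == comment_id:
            return comment
        found = find_comment(comment["replies"], comment_id)
        if found is not None:
            return found
    return None
```

Rendering the page needs only the one `post` record, since the comment tree travels with it. Changing a single reply, on the other hand, means searching inside the aggregate with `find_comment` and then writing the whole document back.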

Graphs for adjacency lists

The simplest method of graph modeling is the adjacency list, in which every node is modeled as an isolated record containing an array of its immediate descendants or ancestors. This makes it easy to find nodes by the identifiers or keys of their parents or ancestors, and to traverse the graph one hop per query. However, the technique is usually inefficient for retrieving the complete tree under a given node and for depth-first or breadth-first traversals.
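
As a minimal sketch of this layout, assume a hypothetical category hierarchy held in a Python dictionary, where each record stores its parent and its immediate children. The names `nodes`, `children_of`, and `subtree` are illustrative only.

```python
from collections import deque

# Hypothetical category records; each one lists only its immediate children.
nodes = {
    "electronics": {"parent": None, "children": ["phones", "laptops"]},
    "phones": {"parent": "electronics", "children": ["smartphones"]},
    "laptops": {"parent": "electronics", "children": []},
    "smartphones": {"parent": "phones", "children": []},
}


def children_of(node_id):
    """One lookup by key returns the immediate descendants."""
    return nodes[node_id]["children"]


def subtree(node_id):
    """Collect all descendants breadth-first; every hop is another lookup."""
    found, lookups = [], 0
    queue = deque([node_id])
    while queue:
        current = queue.popleft()
        lookups += 1  # stands in for one query against the store
        for child in nodes[current]["children"]:
            found.append(child)
            queue.append(child)
    return found, lookups
```

Finding the children of one node is a single lookup, while `subtree` needs one lookup per visited node, which is why complete-tree retrieval is the weak spot of this model.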

Materialized paths

Traversing tree-like hierarchical structures often requires recursive traversal. This can be avoided with materialized paths, a form of denormalization in which the identifying keys of a node's parents and children are stored as attributes or properties of the node itself. Direct references to predecessors and descendants then keep traversals to a minimum.

[Figure: Materialized paths]

Since the technique converts graph-like structures into flat documents, it can also be used for full-text search. In the scenario shown above, the product list, or even the subcategories, can be retrieved by using the category name in the query.

You can store a materialized path as a set of IDs, or you can concatenate the IDs into a single string. Storing the path as a string lets you use regular expressions to search for nodes by complete or partial criteria. This is shown in the following diagram (the node itself is included in the path):

[Figure: Materialized paths]
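
The string form of a materialized path can be sketched as follows. The catalog, the `/`-separated path format, and the helper names are assumptions made for illustration. Each path includes the node itself, as described above.

```python
import re

# Hypothetical catalog: each node stores the IDs of its ancestors and of
# itself, concatenated into a single string.
paths = {
    "electronics": "/electronics/",
    "phones": "/electronics/phones/",
    "smartphones": "/electronics/phones/smartphones/",
    "laptops": "/electronics/laptops/",
    "books": "/books/",
}


def descendants_of(node_id):
    """Nodes whose path passes through node_id, excluding the node itself."""
    pattern = re.compile(r"/" + re.escape(node_id) + r"/.+")
    return sorted(n for n, p in paths.items() if pattern.search(p))


def ancestors_of(node_id):
    """Ancestors are read straight from the stored path, with no traversal."""
    return paths[node_id].strip("/").split("/")[:-1]
```

Both questions are answered from the stored strings alone: `descendants_of` is a partial match on the path, and `ancestors_of` simply splits it.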

Modeling with nested sets

We can also model graph-based structures with the help of nested sets. Although this technique is commonly used with relational database systems, it is also applicable to NoSQL data stores. We store the leaf nodes of the tree in an array and then map each intermediate node to the range of leaf nodes beneath it by using start and end indexes. This is illustrated in the following diagram:

[Figure: Modeling with nested sets]

For data that is not modified, this structure is quite efficient: it has a comparatively small memory footprint and it fetches all the leaf nodes under a node without traversals. For frequently changing data it is less effective, since insertions and updates lead to extensive index updates and are therefore costly.
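
A minimal sketch of the leaf-array variant described above, using a hypothetical category tree. The names `leaves`, `ranges`, `leaves_under`, and `insert_leaf` are illustrative, and the insert logic assumes that no node has exactly the same range as its parent.

```python
# Leaves of a hypothetical category tree, stored in tree order.
leaves = ["smartphones", "feature phones", "ultrabooks", "gaming laptops", "novels"]

# Each intermediate node maps to the inclusive (first, last) leaf indexes
# that it covers.
ranges = {
    "catalog": (0, 4),
    "electronics": (0, 3),
    "phones": (0, 1),
    "laptops": (2, 3),
    "books": (4, 4),
}


def leaves_under(node):
    """Fetch all leaves of a node with one slice and no traversal."""
    start, end = ranges[node]
    return leaves[start:end + 1]


def insert_leaf(parent, name):
    """Add a leaf at the end of parent's range and repair the other ranges.

    Returns how many ranges had to be rewritten.
    """
    start, end = ranges[parent]
    position = end + 1
    leaves.insert(position, name)
    touched = 0
    for node, (s, e) in list(ranges.items()):
        if s >= position:
            # Range lies entirely after the new leaf: shift it right.
            ranges[node] = (s + 1, e + 1)
            touched += 1
        elif s <= start and e >= end:
            # The parent or one of its ancestors: grow to cover the new leaf.
            ranges[node] = (s, e + 1)
            touched += 1
    return touched
```

Reading is a single slice, but in this small example inserting one leaf under `phones` rewrites all five ranges, which illustrates why the structure suits data that rarely changes.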

Flattening with ordered field names

Search engines operate on flattened documents of fields and values. When modeling datasets for such applications, the goal is to map existing entities to plain documents or unified nodes, which can be challenging when the structure of the graph is complex. We can combine multiple related nodes or relationships into single entities based on how they are used. For example, you can combine nodes. This technique does not scale well, since query complexity grows rapidly with the number of structures that are combined.
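
The text does not spell out a naming scheme, so the following sketch assumes one plausible reading of "ordered field names": each nested item is flattened into fields that carry a numeric suffix. The `person` record and the `flatten` and `matches` helpers are hypothetical.

```python
# Hypothetical nested record: a person with a list of (skill, level) pairs.
person = {
    "name": "Alice",
    "skills": [
        {"skill": "math", "level": "excellent"},
        {"skill": "poetry", "level": "poor"},
    ],
}


def flatten(record, list_field):
    """Copy scalar fields and number the fields of each nested item."""
    flat = {k: v for k, v in record.items() if k != list_field}
    for index, item in enumerate(record[list_field], start=1):
        for key, value in item.items():
            flat[f"{key}_{index}"] = value
    return flat


def matches(flat, slots, **criteria):
    """True if some numbered slot satisfies all criteria together."""
    return any(
        all(flat.get(f"{key}_{i}") == value for key, value in criteria.items())
        for i in range(1, slots + 1)
    )
```

The numbering keeps each skill paired with its own level, so a query for `skill="math"` with `level="poor"` does not match by accident. The cost is visible in `matches`: the query has to test every numbered slot, so it grows with the number of structures folded into the document.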
