This chapter provides an overview from DAMA-DMBOK2 on data modeling and design (excerpted from pages 123-148), and then covers the additional data modeling and design responsibilities needed for blockchain to work well within our organizations.

Overview from DAMA-DMBOK2

Data modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model.

There are a number of different schemes used to represent data. The six most commonly used schemes are: Relational, Dimensional, Object-Oriented, Fact-Based, Time-Based, and NoSQL. Models of these schemes exist at three levels of detail: conceptual, logical, and physical. Each model contains a set of components. Examples of components are entities, relationships, facts, keys, and attributes. Once a model is built, it needs to be reviewed; once approved, it must be maintained.

The goal of data modeling is to confirm and document understanding of different perspectives. This leads to applications that more closely align with current and future business requirements, and creates a foundation for broad-scoped initiatives such as Master Data Management and data governance programs. Proper data modeling leads to lower support costs and increases reusability opportunities for future initiatives, thereby reducing the costs of building new applications.

A conceptual data model captures the high-level data requirements as a collection of related concepts. It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationships between entities.

For example, if we were to create a relational conceptual data model showing the relationship between students and a school, it might look like:

Each School may contain one or many Students, and each Student must come from one School. In addition, each Student may submit one or many Applications, and each Application must be submitted by one Student.

The relationship lines capture business rules on a relational data model. For example, Bob the student can attend County High School or Queens College, but cannot attend both when applying to this particular university. In addition, an application must be submitted by a single student—not two and not zero.
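The business rules read from the relationship lines can also be expressed in code. The Python sketch below is an illustrative assumption, not part of DMBOK. The class and field names are hypothetical, and each relationship rule is checked when an object is created.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class School:
    name: str


@dataclass
class Student:
    name: str
    # Exactly one School: a single reference (rather than a list) means a
    # Student cannot point at both County High School and Queens College.
    school: School

    def __post_init__(self) -> None:
        if not isinstance(self.school, School):
            raise ValueError(f"Student {self.name} must come from exactly one School")


@dataclass
class Application:
    application_number: int
    # Exactly one Student: not two and not zero.
    student: Student

    def __post_init__(self) -> None:
        if not isinstance(self.student, Student):
            raise ValueError("An Application must be submitted by exactly one Student")


def students_of(school: School, students: List[Student]) -> List[Student]:
    # "May contain one or many": an empty result is allowed on the School side.
    return [s for s in students if s.school is school]


county = School("County High School")
bob = Student("Bob", county)
first_application = Application(1, bob)
print(students_of(county, [bob]) == [bob])  # True
```

Modeling the "one" side as a single reference, rather than validating a list, makes the "cannot attend both" rule impossible to violate in the structure itself.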

A logical data model is a detailed representation of data requirements, usually in support of a specific usage context, such as application requirements. Logical data models are still independent of any technology or specific implementation constraints. A logical data model often begins as an extension of a conceptual data model.

In a relational logical data model, the conceptual data model is extended by adding attributes. Attributes are assigned to entities by applying the technique of normalization, which places each attribute in the entity whose key it depends on.
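The idea behind normalization can be illustrated with a toy example. The sketch below is a simplification, not a full treatment of the normal forms. It assumes hypothetical attribute names (SchoolCode, SchoolName, StudentFirstName) and splits flat rows so that each fact is stored once, in the entity whose key it depends on.

```python
from typing import Any, Dict, List, Tuple

Table = Dict[Any, Dict[str, Any]]

# Hypothetical unnormalized records: one row per application, with student
# and school facts repeated on every row.
flat_rows: List[Dict[str, Any]] = [
    {"ApplicationNumber": 1, "StudentNumber": 123, "StudentFirstName": "Bob",
     "SchoolCode": "CHS", "SchoolName": "County High School"},
    {"ApplicationNumber": 2, "StudentNumber": 123, "StudentFirstName": "Bob",
     "SchoolCode": "CHS", "SchoolName": "County High School"},
]


def put(table: Table, key: Any, values: Dict[str, Any]) -> None:
    # Store each fact once per key; a second, different value is a conflict.
    existing = table.setdefault(key, values)
    if existing != values:
        raise ValueError(f"Conflicting facts for key {key!r}: {existing} vs {values}")


def normalize(rows: List[Dict[str, Any]]) -> Tuple[Table, Table, Table]:
    schools: Table = {}
    students: Table = {}
    applications: Table = {}
    for r in rows:
        # Each attribute goes to the entity whose key it depends on;
        # SchoolCode stays on Student as the foreign key back to School.
        put(schools, r["SchoolCode"], {"SchoolName": r["SchoolName"]})
        put(students, r["StudentNumber"],
            {"StudentFirstName": r["StudentFirstName"], "SchoolCode": r["SchoolCode"]})
        put(applications, r["ApplicationNumber"], {"StudentNumber": r["StudentNumber"]})
    return schools, students, applications


schools, students, applications = normalize(flat_rows)
print(schools)   # {'CHS': {'SchoolName': 'County High School'}}
print(students)  # {123: {'StudentFirstName': 'Bob', 'SchoolCode': 'CHS'}}
```

The conflict check in put shows one practical benefit: once a fact lives in exactly one place, contradictory copies can no longer go unnoticed.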

A physical data model (PDM) represents a detailed technical solution. The logical data model is typically used as a starting point, and then adapted to work within a specific set of hardware, software, and network tools. A physical data model is built for a particular technology.

The following figure illustrates a relational physical data model. In this data model, School has been denormalized into the Student entity to accommodate a particular technology. Perhaps whenever a Student is accessed, that student's school information is accessed as well. If so, storing school information with Student performs better than keeping two separate structures.
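As a rough illustration, such a denormalized table might be declared as follows. This sketch assumes SQLite as the target technology, and the table, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENT (
        STUDENT_NUMBER     INTEGER PRIMARY KEY,
        STUDENT_FIRST_NAME TEXT NOT NULL,
        -- School attributes denormalized into STUDENT (no separate SCHOOL table)
        SCHOOL_CODE        TEXT NOT NULL,
        SCHOOL_NAME        TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [(123, "Bob", "CHS", "County High School"),
     (456, "Mary", "CHS", "County High School")],  # school name repeated per student
)

# A single-table read returns the student together with the school; no join needed.
row = conn.execute(
    "SELECT STUDENT_FIRST_NAME, SCHOOL_NAME FROM STUDENT WHERE STUDENT_NUMBER = ?",
    (123,),
).fetchone()
print(row)  # ('Bob', 'County High School')
```

The trade-off is visible in the sample data: "County High School" is stored once per student, so renaming a school means updating every matching STUDENT row.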

Additional responsibilities due to blockchain

“Data modelers are responsible for discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model.”

These data requirements can be represented at a conceptual, logical, or physical level. The conceptual and logical levels are independent of technology, but the physical model depends on technology. The conceptual and logical models, therefore, capture requirements and terminology that hold true regardless of whether the data is stored in Oracle, Teradata, Hadoop, or a blockchain ledger. As such, the conceptual and logical models must be built, regardless of how the data will be stored.

Managing keys

A data modeler is responsible for capturing and documenting candidate keys. A candidate key is an attribute (or a set of attributes) that can be used to identify an entity instance. For example, the candidate key Customer Number can be used to identify customers, such as Bob and Mary.

A candidate key must be unique, stable, and mandatory. Unique means there can never be duplicate values, such as two values of “123” for Student Number. Stable means that once assigned, values can never be updated. Mandatory means a value must always be present: there can never be null or empty values in any attribute that is part of the candidate key.
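These three properties can be checked mechanically. The sketch below uses hypothetical helper functions, not a feature of any modeling tool. Uniqueness and mandatoriness are tested against the current data, which can reveal a violation but cannot prove that no future value will ever duplicate. Stability is enforced by rejecting any update that touches a key attribute.

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def is_unique(rows: Sequence[Row], key: Sequence[str]) -> bool:
    # No two rows may share the same combination of key values.
    values = [tuple(r.get(attr) for attr in key) for r in rows]
    return len(values) == len(set(values))


def is_mandatory(rows: Iterable[Row], key: Sequence[str]) -> bool:
    # Every key attribute must be present and neither null nor empty.
    return all(r.get(attr) not in (None, "") for r in rows for attr in key)


def apply_update(row: Row, changes: Row, key: Sequence[str]) -> Row:
    # Stable: once assigned, key values can never be updated.
    touched = sorted(set(changes) & set(key))
    if touched:
        raise ValueError(f"Candidate key attributes cannot be updated: {touched}")
    return {**row, **changes}


students: List[Row] = [
    {"StudentNumber": "123", "StudentFirstName": "Bob"},
    {"StudentNumber": "456", "StudentFirstName": "Mary"},
]
key = ["StudentNumber"]
print(is_unique(students, key), is_mandatory(students, key))  # True True
```

Because key is a list of attribute names, the same helpers work unchanged for composite keys.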

One of the candidate keys in each entity must be designated as a primary key. The primary key is the candidate key that represents the entity in its relationships with other entities.

When a one-to-many relationship is drawn between two entities, the primary key of the entity on the “one” side is copied as a foreign key into the entity on the “many” side. For instance, Student Number in the Application entity is a foreign key back to Student. This allows us to join tables in a database.
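In a relational database, this foreign key is what the join uses. The sketch below assumes SQLite as the target and hypothetical table and column names. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection, so an application pointing at a nonexistent student is rejected only with that setting enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript(
    """
    CREATE TABLE STUDENT (
        STUDENT_NUMBER     INTEGER PRIMARY KEY,
        STUDENT_FIRST_NAME TEXT NOT NULL
    );
    CREATE TABLE APPLICATION (
        APPLICATION_NUMBER INTEGER PRIMARY KEY,
        -- Primary key of the "one" side, copied as a foreign key to the "many" side
        STUDENT_NUMBER     INTEGER NOT NULL REFERENCES STUDENT (STUDENT_NUMBER)
    );
    """
)
conn.execute("INSERT INTO STUDENT VALUES (123, 'Bob')")
conn.executemany("INSERT INTO APPLICATION VALUES (?, ?)", [(1, 123), (2, 123)])

# The foreign key is what lets the two tables be joined.
joined = conn.execute(
    """
    SELECT s.STUDENT_FIRST_NAME, a.APPLICATION_NUMBER
    FROM APPLICATION AS a
    JOIN STUDENT AS s ON s.STUDENT_NUMBER = a.STUDENT_NUMBER
    ORDER BY a.APPLICATION_NUMBER
    """
).fetchall()
print(joined)  # [('Bob', 1), ('Bob', 2)]
```

The NOT NULL on APPLICATION.STUDENT_NUMBER carries the "must be submitted by one Student" rule into the physical design. The REFERENCES clause alone would still allow an application with no student.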

The candidate keys that are not chosen as the primary key are called alternate keys. Primary keys and alternate keys have the same properties of being unique, stable, and mandatory.

We can add alternate keys to our existing logical data model:

Here we have an alternate key on three attributes: StudentFirstName, StudentLastName, and StudentBirthDate. A candidate key that contains more than one attribute is called a “composite” candidate key, so this is a composite alternate key.
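Physically, a composite alternate key is commonly enforced with a unique constraint over all of its columns. The following sketch assumes SQLite and uses hypothetical column names and sample values. Each column is declared NOT NULL because SQLite's UNIQUE constraint does not treat two NULLs as duplicates, so the mandatory property has to be enforced separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENT (
        STUDENT_NUMBER     INTEGER PRIMARY KEY,
        STUDENT_FIRST_NAME TEXT NOT NULL,
        STUDENT_LAST_NAME  TEXT NOT NULL,
        STUDENT_BIRTH_DATE TEXT NOT NULL,
        -- Composite alternate key: unique as a combination of all three columns
        CONSTRAINT STUDENT_AK1 UNIQUE (STUDENT_FIRST_NAME, STUDENT_LAST_NAME, STUDENT_BIRTH_DATE)
    )
    """
)
conn.execute("INSERT INTO STUDENT VALUES (123, 'Bob', 'Jones', '2006-04-01')")
# Allowed: two of the three values match, but the combination differs.
conn.execute("INSERT INTO STUDENT VALUES (456, 'Bob', 'Smith', '2006-04-01')")

try:
    conn.execute("INSERT INTO STUDENT VALUES (789, 'Bob', 'Jones', '2006-04-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # this combination already belongs to student 123
print(duplicate_rejected)  # True
```

A unique constraint covers only the unique and mandatory properties. Stability, meaning no updates to key values once assigned, would need an additional mechanism such as a trigger or application logic.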

Returning to the prior physical data model and adding the alternate key, we get this model:

The data modeler must denote the blockchain-specific keys on this model. STUDENT and APPLICATION will have private keys, public keys, and possibly several blockchain addresses based on public keys.

Each entity will therefore have many additional keys for the modeler to manage and denote. Because all of these additional keys appear only on the physical model, they also complicate the already challenging task of maintaining the mapping between the logical and physical data models.
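One way to keep the logical-to-physical mapping manageable is to record, for each physical column, which logical attribute it implements. The sketch below assumes hypothetical column names (for example, STUDENT_PUBLIC_KEY and STUDENT_ADDRESS_1) and an invented role vocabulary. It lists the physical-only blockchain keys and flags any logical attribute left without a physical column.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    role: str  # "PK", "AK", "FK", or a blockchain role such as "PRIVATE_KEY"
    logical: Optional[Tuple[str, str]]  # (entity, attribute) on the logical model, or None


LOGICAL: Dict[str, Set[str]] = {
    "Student": {"StudentNumber", "StudentFirstName", "StudentLastName", "StudentBirthDate"},
    "Application": {"ApplicationNumber", "StudentNumber"},
}

PHYSICAL: Dict[str, List[Column]] = {
    "STUDENT": [
        Column("STUDENT_NUMBER", "PK", ("Student", "StudentNumber")),
        Column("STUDENT_FIRST_NAME", "AK", ("Student", "StudentFirstName")),
        Column("STUDENT_LAST_NAME", "AK", ("Student", "StudentLastName")),
        Column("STUDENT_BIRTH_DATE", "AK", ("Student", "StudentBirthDate")),
        # Blockchain-specific keys: physical only, no logical counterpart.
        Column("STUDENT_PRIVATE_KEY", "PRIVATE_KEY", None),
        Column("STUDENT_PUBLIC_KEY", "PUBLIC_KEY", None),
        Column("STUDENT_ADDRESS_1", "ADDRESS", None),
        Column("STUDENT_ADDRESS_2", "ADDRESS", None),
    ],
    "APPLICATION": [
        Column("APPLICATION_NUMBER", "PK", ("Application", "ApplicationNumber")),
        Column("STUDENT_NUMBER", "FK", ("Application", "StudentNumber")),
        Column("APPLICATION_PRIVATE_KEY", "PRIVATE_KEY", None),
        Column("APPLICATION_PUBLIC_KEY", "PUBLIC_KEY", None),
        Column("APPLICATION_ADDRESS_1", "ADDRESS", None),
    ],
}


def physical_only(physical: Dict[str, List[Column]]) -> Dict[str, List[str]]:
    # Columns that cannot be traced back to the logical model.
    return {table: [c.name for c in cols if c.logical is None] for table, cols in physical.items()}


def unmapped_logical(logical: Dict[str, Set[str]],
                     physical: Dict[str, List[Column]]) -> List[Tuple[str, str]]:
    # Logical attributes that no physical column implements.
    mapped = {c.logical for cols in physical.values() for c in cols if c.logical is not None}
    return sorted((entity, attr) for entity, attrs in logical.items() for attr in attrs
                  if (entity, attr) not in mapped)


print(physical_only(PHYSICAL)["APPLICATION"])
# ['APPLICATION_PRIVATE_KEY', 'APPLICATION_PUBLIC_KEY', 'APPLICATION_ADDRESS_1']
print(unmapped_logical(LOGICAL, PHYSICAL))  # []
```

Keeping the blockchain roles in the same structure as PK, AK, and FK is one way to denote them on the model until modeling notations offer dedicated symbols.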

Skipping conceptual and logical modeling

Data modelers are often under pressure to deliver a physical data model quickly (and cheaply). Furthermore, project managers and Agile teams rarely see the value of conceptual and logical data models—until it’s time to support the application.

This pressure to skip two essential layers of modeling is even more apparent when the underlying database is not relational, such as a graph or document database.

Blockchain also falls into this non-relational category. Therefore, the modeler will likely face demands to design only the ledger (and hope that the requirements typically documented at the conceptual and logical levels will magically appear in the physical model).

Forward and reverse engineering

It is fairly straightforward to build a relational database design from a physical data model; this process is called “forward engineering.” Going from a relational database design to a physical data model, called “reverse engineering,” is similarly simple. Often just a few button clicks in our data modeling tools can create a database structure or create a model.
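For a relational target, both directions are mechanical enough to sketch in a few lines. The example below assumes SQLite as the target and a hypothetical three-column model format. It does not show how any particular modeling tool works. Forward engineering generates CREATE TABLE statements, and reverse engineering reads the catalog back into the same format, so a round trip should reproduce the original model.

```python
import sqlite3
from typing import Dict, List, Tuple

# A hypothetical, tool-neutral description of physical tables:
# (column name, declared type, rule), where rule is "PK", "NOT NULL", or "".
Model = Dict[str, List[Tuple[str, str, str]]]

MODEL: Model = {
    "STUDENT": [
        ("STUDENT_NUMBER", "INTEGER", "PK"),
        ("STUDENT_FIRST_NAME", "TEXT", "NOT NULL"),
        ("SCHOOL_NAME", "TEXT", "NOT NULL"),
    ]
}


def forward_engineer(model: Model) -> List[str]:
    # Model -> CREATE TABLE statements.
    statements = []
    for table, columns in model.items():
        definitions = []
        for name, sql_type, rule in columns:
            if rule == "PK":
                definitions.append(f"{name} {sql_type} PRIMARY KEY")
            elif rule:
                definitions.append(f"{name} {sql_type} {rule}")
            else:
                definitions.append(f"{name} {sql_type}")
        statements.append(f"CREATE TABLE {table} ({', '.join(definitions)})")
    return statements


def reverse_engineer(conn: sqlite3.Connection) -> Model:
    # Database catalog -> model, using sqlite_master and PRAGMA table_info.
    model: Model = {}
    tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    for (table,) in tables.fetchall():
        columns = []
        for _cid, name, sql_type, notnull, _default, pk in conn.execute(f"PRAGMA table_info({table})"):
            rule = "PK" if pk else ("NOT NULL" if notnull else "")
            columns.append((name, sql_type, rule))
        model[table] = columns
    return model


conn = sqlite3.connect(":memory:")
for ddl in forward_engineer(MODEL):
    conn.execute(ddl)
print(reverse_engineer(conn) == MODEL)  # True
```

The round trip works only because every concept in this small vocabulary has a direct counterpart in the database catalog.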

When the database is non-relational, however, it can be difficult or impossible to automate the forward or reverse engineering process. This difficulty is largely due to concepts in a non-relational database that do not have corresponding symbols in our modeling toolkit. For example, nested arrays in document-based databases had no symbol in our data modeling palette until very recently.

Blockchain also introduces new concepts that do not yet exist in our data modeling notations or, therefore, in our data modeling tools. For example, is there a data model symbol to denote a public or private key, similar to notating a primary or alternate key?

Over time, data modelers will need to expand the data modeling toolset once again to automate the processes of building or reverse-engineering a blockchain database.

Emphasizing the logical

The logical data model is a detailed representation of data requirements, and is independent of any technology. This makes the logical model a very powerful communication tool for everyone involved in blockchain development, because it can be referenced to confirm requirements or business rules.

For example, one use of blockchain in insurance is to ensure that claims are paid only if the insurance company actually holds the corresponding policy. These rules are shown on the logical data model. The developer can use these rules and enforce them in the blockchain application, just as in any other application. The difference is that the model cannot yet be forward engineered into a blockchain, so it serves communication purposes only and cannot be used to generate a database structure automatically.
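Whatever platform runs the application, the rule itself is simple to state in code. The Python sketch below is illustrative only. ClaimPayer, Claim, and the policy numbers are hypothetical, and in a blockchain setting the equivalent check would be written by hand into the blockchain application from the logical model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Claim:
    claim_number: str
    policy_number: str  # must reference an existing Policy (rule from the logical model)


@dataclass
class ClaimPayer:
    # Hypothetical stand-in for the application logic that pays claims;
    # it is not a blockchain implementation.
    policy_numbers: Set[str]
    paid: List[Claim] = field(default_factory=list)

    def pay(self, claim: Claim) -> None:
        if claim.policy_number not in self.policy_numbers:
            raise ValueError(
                f"Claim {claim.claim_number} rejected: no policy {claim.policy_number}"
            )
        self.paid.append(claim)


payer = ClaimPayer(policy_numbers={"P-100"})
payer.pay(Claim("C-1", "P-100"))  # paid: the policy exists
try:
    payer.pay(Claim("C-2", "P-999"))  # rejected: no such policy
except ValueError as err:
    print(err)  # Claim C-2 rejected: no policy P-999
print(len(payer.paid))  # 1
```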
