Chapter 7.3

Data Modeling for the Structured Environment

Abstract

The genesis of much data is the operational environment. As transactions are executed, data are generated as a by-product. Response time is essential for online transaction systems. Good response time is achieved by adherence to the “standard work unit.” The swu allows corporations to run many transactions and maintain consistent, good response time. The structure of operational data is achieved by the creation of a data model. The data model is built at several levels—the high level is the ERD, the midlevel is the dis, and the low level is the physical model. An important part of the operational environment is the component known as metadata. It is metadata that describe the structure and content of operational data.

Keywords

Operational environment; Standard work unit (swu); Response time; ERD; Dis; Physical model; Data model; Metadata

The structured environment contains a great deal of complex data, and there are many possible ways to organize and arrange that data. In the structured environment, the analyst has the opportunity to shape the data according to his or her needs. Given the many ways that data can be shaped, the organization needs a “road map” to guide its efforts. That road map is the data model.

The Purpose of the Road Map

The road map serves several important purposes:

  • The road map sets a direction for the organization.
  • The road map serves as a guide to different people with different agendas who still must build a collaborative effort.
  • The road map allows a large effort to be sustained over time.
  • The road map serves as a guide to end users who ultimately must navigate the final product.

There are many reasons, then, why large, complex organizations need a data model.

The data that are modeled are the data that sit at the heart of the business of the company. The data model is shaped around whatever is at the core of the business of the organization.

Granular Data Only

The data model is shaped around ONLY the granular, detailed data of the organization. Bad things happen when the data modeler allows summarized or aggregated data to enter the data model. When summarized or aggregated data are allowed to enter the data model:

  • There is a HUGE amount of data to be modeled.
  • The formula for calculating the summarized data changes faster than the modeler can create and change the model.
  • Different people have different formulas for the same or similar calculations.

The first step in building the data model is to remove all derived data—summarized or aggregated data—from the data model.

Fig. 7.3.1 shows that detailed granular data are separated from summarized or aggregated data when building the data model.

Fig. 7.3.1 Separating detailed and summarized data.
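The separation step can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the `DataElement` class, its `derived` flag, and the element names are all assumptions made for the example. The sketch simply partitions candidate data elements so that only granular ones go forward into the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    derived: bool  # hypothetical flag: True for summarized or aggregated values


def split_granular(elements):
    """Return (granular, derived) lists, preserving the input order."""
    granular = [e for e in elements if not e.derived]
    derived = [e for e in elements if e.derived]
    return granular, derived


# Hypothetical candidate elements for an order-processing system.
candidates = [
    DataElement("order_amount", derived=False),
    DataElement("order_date", derived=False),
    DataElement("monthly_revenue", derived=True),
    DataElement("average_order_size", derived=True),
]

granular, derived = split_granular(candidates)
granular_names = [e.name for e in granular]
derived_names = [e.name for e in derived]
print("model these:", granular_names)
print("leave out:", derived_names)
```

Deciding which elements are derived remains a judgment made by the modeler; the code only records that decision and applies it.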

After the granular data are identified, the next step is to “abstract” the data. The data are abstracted to their highest meaningful level. The highest meaningful level is called an entity.

As a simple example of abstraction, suppose a corporation has female customers, male customers, foreign customers, corporate customers, and governmental customers. The data model creates the entity known as “customer” and wraps all of the different types of customer together.

Or suppose the company produces sports cars, sedans, SUVs, and trucks. The data model abstracts the data into the entity—vehicle.
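One way to picture abstraction is as a mapping from each specific kind of thing to the entity that covers it. The sketch below uses the customer and vehicle examples from the text; the `ENTITY_OF` table and the `abstract` function are hypothetical names introduced only for illustration.

```python
# Each specific kind of data is mapped to its highest meaningful level, the entity.
ENTITY_OF = {
    "female customer": "customer",
    "male customer": "customer",
    "foreign customer": "customer",
    "corporate customer": "customer",
    "governmental customer": "customer",
    "sports car": "vehicle",
    "sedan": "vehicle",
    "SUV": "vehicle",
    "truck": "vehicle",
}


def abstract(kinds):
    """Return the sorted, distinct entities that cover the given kinds."""
    return sorted({ENTITY_OF[kind] for kind in kinds})


entities = abstract(["sedan", "truck", "corporate customer", "male customer"])
print(entities)
```

Four specific kinds of data collapse into two entities, which is the effect abstraction is meant to have.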

The ERD

The highest level of abstraction for the data model is called the entity relationship diagram (ERD). The ERD reflects data at the highest level of meaningful abstraction and shows how the abstractions relate to each other. The entities of the organization are identified, as well as the relationships between those entities.

Fig. 7.3.2 shows the symbols that identify the entities and relationships in an ERD.

Fig. 7.3.2 The high-level data model.

For a manufacturing company, the ERD might look like the one seen in Fig. 7.3.3.

Fig. 7.3.3 Some simple entities and their relationships.

The ERD is important as a high-level statement of what the data model is all about. But—of necessity—there is very little detail found at the ERD level.
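An ERD can be held in code as nothing more than a set of entities and a list of relationships between them. The sketch below uses the customer, order, product, and shipment entities of the manufacturing example. The specific relationships listed are assumptions made for illustration, since the lines drawn in the figure are not reproduced in the text, and `related_to` is a hypothetical helper.

```python
# Entities of the manufacturing example.
entities = {"customer", "order", "product", "shipment"}

# Relationships are ASSUMED for illustration; each pair links two entities.
relationships = [
    ("customer", "order"),
    ("order", "product"),
    ("order", "shipment"),
]

# Every relationship must connect two known entities.
for left, right in relationships:
    if left not in entities or right not in entities:
        raise ValueError(f"unknown entity in relationship: {left} / {right}")


def related_to(entity):
    """Return the sorted names of entities directly related to `entity`."""
    return sorted(
        right if left == entity else left
        for left, right in relationships
        if entity in (left, right)
    )


order_neighbors = related_to("order")
print(order_neighbors)
```

Note how little the structure says: there are no keys and no attributes, only names and connections. That is the level of detail the ERD is meant to carry.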

The Dis

The next level of the data model is the place where much detail is found. This level of the data model is called the “data item set” (dis).

Each entity identified in the ERD has its own dis. Using the simple example shown in Fig. 7.3.3, there would be one dis for customer, another dis for order, another dis for product, and yet another dis for shipment.

The dis contains keys and attributes, and the dis shows the organization of the data.

The symbol for a simple dis is seen in Fig. 7.3.4.

Fig. 7.3.4 A data item set—dis.

The basic construct of a dis is a box. In the box are the elements of data that are closely related and that belong together. The different lines between the groupings of data have meaning. A downward-pointing line indicates multiple occurrences of data. A line to the right indicates a different type of data.

As a simple example of a dis, consider the dis shown in Fig. 7.3.5.

Fig. 7.3.5 A simple dis.

The anchor or primary data are indicated by the box of data that is at the top left of the diagram. The anchor box indicates that the data that relate directly to the key of the box are description, unit of measure, unit manufacturing cost, packaged size, and packaged weight. These elements of data exist once and only once for each product.

Data that can occur multiple times are shown beneath the anchor box of data. One such grouping of data is component id. There can exist multiple components for each product. Another grouping of data that is independent of component id is inventory date and location. The product may have been inventoried in multiple places on different dates.

The lines going to the right of the anchor box indicate types of data. In this case, a product may be used in flight or in ground support.

The dis indicates the keys, attributes, and relationships for an entity.
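The product dis described above can be approximated with Python dataclasses. This is a sketch under stated assumptions: the key name `product_id`, the field types, and the sample values are invented for the example, and the two types of product are reduced to a single optional field. Attributes in the anchor box appear once per product, groupings beneath the anchor become lists, and the line to the right becomes a type indicator.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Component:
    component_id: str  # repeating group: many components per product


@dataclass
class InventoryRecord:
    inventory_date: date  # repeating group, independent of components
    location: str


@dataclass
class Product:
    # Anchor box: the key plus attributes that exist once per product.
    product_id: str  # assumed key name
    description: str
    unit_of_measure: str
    unit_manufacturing_cost: float
    packaged_size: str
    packaged_weight: float
    # Downward lines: data that can occur multiple times.
    components: List[Component] = field(default_factory=list)
    inventory: List[InventoryRecord] = field(default_factory=list)
    # Line to the right: type of data ("flight" or "ground support").
    usage_type: Optional[str] = None


# Hypothetical sample product.
widget = Product(
    product_id="P-100",
    description="hydraulic valve",
    unit_of_measure="each",
    unit_manufacturing_cost=42.50,
    packaged_size="10x10x5 cm",
    packaged_weight=0.8,
    usage_type="flight",
)
widget.components.append(Component("C-1"))
widget.components.append(Component("C-2"))
widget.inventory.append(InventoryRecord(date(2024, 1, 15), "Dallas"))
print(len(widget.components), len(widget.inventory), widget.usage_type)
```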

Physical Database Design

Once the dis is created, the physical design is derived from it. Each grouping of data in the dis results in a separate database design.

Fig. 7.3.6 shows the database design that results from a grouping of data found in the dis.

Fig. 7.3.6 The physical model.

The physical database design takes into account the physical structure of the data, the physical characteristics of the data, the specification of keys, the specification of indexes, and so forth.

The result of the physical specification of the data is a database design, as shown in Fig. 7.3.7.

Fig. 7.3.7 The elements of a database.

The elements of the database design include keys, attributes, records, and indexes.
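As a rough illustration of the step from dis to physical design, the sketch below turns each grouping of the product dis into its own table, using Python's built-in `sqlite3` module and an in-memory database. The table names, column names, column types, and the choice of index are all assumptions made for the example; a real design would reflect the chosen database and its workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per grouping of data in the (assumed) product dis.
conn.executescript(
    """
    CREATE TABLE product (
        product_id              TEXT PRIMARY KEY,
        description             TEXT NOT NULL,
        unit_of_measure         TEXT,
        unit_manufacturing_cost REAL,
        packaged_size           TEXT,
        packaged_weight         REAL
    );
    CREATE TABLE product_component (
        product_id   TEXT NOT NULL REFERENCES product(product_id),
        component_id TEXT NOT NULL,
        PRIMARY KEY (product_id, component_id)
    );
    CREATE TABLE product_inventory (
        product_id     TEXT NOT NULL REFERENCES product(product_id),
        inventory_date TEXT NOT NULL,
        location       TEXT NOT NULL,
        PRIMARY KEY (product_id, inventory_date, location)
    );
    -- An explicit index, assuming lookups by location are common.
    CREATE INDEX idx_inventory_location ON product_inventory(location);
    """
)

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
indexes = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name NOT LIKE 'sqlite_autoindex%'"
    )
)
print(tables)
print(indexes)
```

The keys, attributes, records, and indexes named in the text all appear here: primary keys identify each record, the remaining columns are attributes, each row is a record, and the index supports access by something other than the key.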

Relating the Different Levels of the Data Model

The different levels of the data model are akin to the different levels of mapping that exist in the world. Fig. 7.3.8 shows how the different levels of mapping relate to each other.

Fig. 7.3.8 Different levels of modeling.

In Fig. 7.3.8, it is seen that the ERD is the equivalent of a globe of the world, the dis is the equivalent of a map of Texas, and the physical database design is the equivalent of a city map of Dallas, Texas. The globe (the ERD) is complete but not detailed. The map of Texas (the dis) is incomplete in that you can’t find your way to and from Chicago with a map of Texas. But the map of Texas has a great deal more detail than the globe. The city map of Dallas (the physical data model) is even less complete. You cannot find your way from El Paso to Midland with a city map of Dallas. But you have even more detail in the city map of Dallas than you do in the state map of Texas.

An Example of the Linkage

The complete linkage of the different forms of data modeling to each other is shown in Fig. 7.3.9.

Fig. 7.3.9 The sequence of the steps in doing database design.

Generic Data Models

It has been noticed that when a data model is created, it oftentimes applies very nicely to other companies in the same industry. For example, a bank, ABC, creates a data model. Then one day, it is discovered that the data model for bank ABC is very similar to the data models for banks BCD, CDE, and DEF.

Because of the great similarity of data models within the same industry, there are models called “generic data models.” The idea behind a generic data model is that it is much less expensive and much faster to acquire a generic data model than it is to build a data model from scratch. It is true that any generic data model is going to need customization. But even with customization, using a generic data model is much preferable to building the data model from scratch.

Operational Data Models/Data Warehouse Data Models

There are different types of data models, including operational data models and data warehouse data models. An operational data model is one that models the day-to-day operations of the company. The data warehouse data model is one that is based on the informational needs of the organization. The operational data model includes some information that is needed for operational processing only, such as a specific telephone number. The data warehouse data model does not contain data that are specific to operational processing, and it does not contain any summarized data. The data warehouse data model does contain a time stamp for every record in the model.
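The difference between the two kinds of record can be sketched as a small transformation. The sketch is an illustration only: the field names, the `OPERATIONAL_ONLY` set, and the `snapshot_time` column are hypothetical. It drops the data needed only for operational processing and adds a time stamp to the record.

```python
from datetime import datetime, timezone

# Hypothetical set of fields needed only for operational processing.
OPERATIONAL_ONLY = {"telephone_number"}


def to_warehouse_record(operational_record, snapshot_time):
    """Copy a record, dropping operational-only fields and adding a time stamp."""
    record = {
        name: value
        for name, value in operational_record.items()
        if name not in OPERATIONAL_ONLY
    }
    record["snapshot_time"] = snapshot_time.isoformat()
    return record


# Hypothetical operational record.
operational = {
    "customer_id": "C-42",
    "name": "Acme Supply",
    "telephone_number": "555-0100",
}

warehouse = to_warehouse_record(
    operational, datetime(2024, 1, 1, tzinfo=timezone.utc)
)
print(warehouse)
```

The original operational record is left unchanged, so the same source can continue to serve day-to-day processing.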
