The genesis of much data is the operational environment. As transactions are executed, data are generated as a by-product. Response time is essential for online transaction systems. Good response time is achieved by adherence to the “standard work unit.” The swu allows corporations to run many transactions and maintain consistent, good response time. The structure of operational data is achieved by the creation of a data model. The data model is built at several levels—the high level is the ERD, the midlevel is the dis, and the low level is the physical model. An important part of the operational environment is the component known as metadata. It is metadata that describe the structure and content of operational data.
Operational environment; Standard work unit (swu); Response time; ERD; Dis; Physical model; Data model; Metadata
The structured environment contains a large amount of complex data, and there are many possible ways to organize and arrange that data. In the structured environment, the analyst has the opportunity to shape the data according to his or her needs. Given the many ways that data can be shaped, the organization needs a “road map” to guide its efforts.
The road map serves several important purposes:
There are many reasons, then, why large, complex organizations need a data model.
The data that are modeled are the data that sit at the heart of the business of the company. The data model is shaped around whatever is at the core of the business of the organization.
The data model is shaped around ONLY the granular, detailed data of the organization. Bad things happen when the data modeler allows summarized or aggregated data to enter the data model. When summarized or aggregated data are allowed to enter the data model:
The first step in building the data model is to remove all derived data—summarized or aggregated data—from the data model.
Fig. 7.3.1 shows that detailed granular data are separated from summarized or aggregated data when building the data model.
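As a minimal sketch of this separation, assume the modeler has already tagged each candidate data element as derived or not. The element names and the `derived` flag below are hypothetical and are not taken from Fig. 7.3.1; only the elements that are not derived would go forward into the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    derived: bool  # True for summarized or aggregated data (tag assumed to be set by the modeler)


def split_granular(elements):
    """Return (granular, derived) lists, preserving the input order."""
    granular = [e for e in elements if not e.derived]
    derived = [e for e in elements if e.derived]
    return granular, derived


# Hypothetical candidate elements, for illustration only.
candidates = [
    DataElement("order_id", derived=False),
    DataElement("order_amount", derived=False),
    DataElement("monthly_revenue", derived=True),
    DataElement("average_order_size", derived=True),
]

granular, derived = split_granular(candidates)
print([e.name for e in granular])  # elements that stay in the data model
print([e.name for e in derived])   # elements that are set aside
```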
After the granular data are identified, the next step is to “abstract” the data. The data are abstracted to their highest meaningful level. The highest meaningful level is called an entity.
As a simple example of abstraction, suppose a corporation has female customers, male customers, foreign customers, corporate customers, and governmental customers. The data model creates the entity known as “customer” and wraps all of the different types of customer together.
Or suppose the company produces sports cars, sedans, SUVs, and trucks. The data model abstracts the data into the entity—vehicle.
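The customer example can be sketched in Python as one way to picture the abstraction. The field names (`customer_id`, `name`, `customer_types`) are assumptions made for illustration. Because a single customer could fall into more than one category (for example, both female and foreign), the sketch stores a set of types instead of a single value. That is a design choice of this sketch, not something the text prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class CustomerType(Enum):
    FEMALE = "female"
    MALE = "male"
    FOREIGN = "foreign"
    CORPORATE = "corporate"
    GOVERNMENTAL = "governmental"


@dataclass(frozen=True)
class Customer:
    """The single abstracted entity that wraps every kind of customer."""
    customer_id: int                         # hypothetical key
    name: str                                # hypothetical attribute
    customer_types: FrozenSet[CustomerType]  # the kinds of customer this one is


# Hypothetical customers, for illustration only.
customers = [
    Customer(1, "Jane Doe", frozenset({CustomerType.FEMALE, CustomerType.FOREIGN})),
    Customer(2, "Example Holdings", frozenset({CustomerType.CORPORATE})),
]

# All customers live in one entity; the type only narrows the selection.
corporate = [c.name for c in customers if CustomerType.CORPORATE in c.customer_types]
print(corporate)
```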
The highest level of abstraction for the data model is called the entity relationship diagram (ERD). The ERD identifies the entities of the organization, which are the data at their highest level of meaningful abstraction, and the relationships between those entities.
Fig. 7.3.2 shows the symbol that identifies the entities and relationships in an ERD.
As an example of the ERD for a manufacturing company, the ERD might look like that seen in Fig. 7.3.3.
The ERD is important as a high-level statement of what the data model is all about. But—of necessity—there is very little detail found at the ERD level.
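To make the idea concrete, an ERD can be held as a set of entities plus a list of relationships between them. The sketch below uses the four entities of the manufacturing example (customer, order, product, and shipment). The relationship names and directions are illustrative assumptions and are not a transcription of Fig. 7.3.3.

```python
# Entities of the manufacturing example.
ENTITIES = {"customer", "order", "product", "shipment"}

# (source entity, relationship name, target entity).
# The relationship names and directions are assumed for illustration.
RELATIONSHIPS = [
    ("customer", "places", "order"),
    ("order", "contains", "product"),
    ("order", "is fulfilled by", "shipment"),
]

# Every relationship must connect two known entities.
for source, _, target in RELATIONSHIPS:
    if source not in ENTITIES or target not in ENTITIES:
        raise ValueError(f"unknown entity in relationship: {source} -> {target}")


def related_entities(entity):
    """Return the entities directly related to `entity`, in either direction."""
    related = set()
    for source, _, target in RELATIONSHIPS:
        if source == entity:
            related.add(target)
        elif target == entity:
            related.add(source)
    return related


print(sorted(related_entities("order")))
```

Note that nothing at this level says which attributes a customer or an order has. That detail belongs to the next level of the model.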
The next level of the data model is the place where much detail is found. This level of the data model is called the “data item set” (dis).
Each entity identified in the ERD has its own dis. Using the simple example shown in Fig. 7.3.3, there would be one dis for customer, another dis for order, another dis for product, and yet another dis for shipment.
The dis contains keys and attributes, and the dis shows the organization of the data.
The symbol for a simple dis is seen in Fig. 7.3.4.
The basic construct of a dis is a box. In the box are the elements of data that are closely related and that belong together. The different lines between the groupings of data have meaning. A downward-pointing line indicates multiple occurrences of data. A line to the right indicates a different type of data.
As a simple example of a dis, consider the dis shown in Fig. 7.3.5.
The anchor or primary data are indicated by the box of data that is at the top left of the diagram. The anchor box indicates that the data that relate directly to the key of the box are description, unit of measure, unit manufacturing cost, packaged size, and packaged weight. These elements of data exist once and only once for each product.
Data that can occur multiple times are shown beneath the anchor box of data. One such grouping of data is component id. There can exist multiple components for each product. Another grouping of data that is independent of component id is inventory date and location. The product may have been inventoried in multiple places on different dates.
The lines going to the right of the anchor box indicate types of data. In this case, a product may be used in flight or in ground support.
The dis indicates the keys, attributes, and relationships for an entity.
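The product dis described above can be approximated with Python dataclasses. This is a sketch under assumptions: the key name `product_id`, the field types, and the sample values are hypothetical. Only the attribute names, the two repeating groups, and the two types of product come from the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class InventoryRecord:
    """Repeating group: a product may be inventoried in many places on many dates."""
    inventory_date: date
    location: str


@dataclass
class Product:
    # Anchor box: these elements exist once and only once per product.
    product_id: str                  # key (name assumed)
    description: str
    unit_of_measure: str
    unit_manufacturing_cost: float
    packaged_size: str
    packaged_weight: float
    # Downward lines: groupings that can occur multiple times.
    component_ids: List[str] = field(default_factory=list)
    inventory: List[InventoryRecord] = field(default_factory=list)
    # Line to the right: the type of data, "flight" or "ground support".
    usage_type: Optional[str] = None


# Hypothetical product, for illustration only.
widget = Product(
    product_id="P-100",
    description="Hydraulic actuator",
    unit_of_measure="each",
    unit_manufacturing_cost=412.50,
    packaged_size="30x20x15 cm",
    packaged_weight=4.2,
    component_ids=["C-1", "C-2"],
    inventory=[
        InventoryRecord(date(2020, 1, 15), "Dallas"),
        InventoryRecord(date(2020, 2, 1), "El Paso"),
    ],
    usage_type="flight",
)
print(len(widget.component_ids), len(widget.inventory), widget.usage_type)
```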
Once the dis is created, the physical design is derived from it. Each grouping of data in the dis results in a separate database design.
Fig. 7.3.6 shows the database design that has resulted from the design of the grouping of data found in the dis.
The physical database design takes into account the physical structure of the data, the physical characteristics of the data, the specification of keys, the specification of indexes, and so forth.
The result of the physical specification of the data is a database design, as shown in Fig. 7.3.7.
The elements of the database design include keys, attributes, records, and indexes.
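As a hedged illustration of how groupings of data might become separate physical designs, the sketch below turns the product groupings described in the text (the anchor data, the components, and the inventory dates and locations) into three SQLite tables with keys and one index. The table names, column types, and choice of index are assumptions. The actual designs in Figs. 7.3.6 and 7.3.7 may differ.

```python
import sqlite3

# One table per grouping of data; names and types are assumed.
DDL = """
CREATE TABLE product (
    product_id              TEXT PRIMARY KEY,
    description             TEXT,
    unit_of_measure         TEXT,
    unit_manufacturing_cost REAL,
    packaged_size           TEXT,
    packaged_weight         REAL
);
CREATE TABLE product_component (
    product_id   TEXT NOT NULL REFERENCES product(product_id),
    component_id TEXT NOT NULL,
    PRIMARY KEY (product_id, component_id)
);
CREATE TABLE product_inventory (
    product_id     TEXT NOT NULL REFERENCES product(product_id),
    inventory_date TEXT NOT NULL,
    location       TEXT NOT NULL,
    PRIMARY KEY (product_id, inventory_date, location)
);
CREATE INDEX idx_inventory_location ON product_inventory(location);
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(DDL)

# List the tables that the design produced.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
conn.close()
```

The repeating groups carry the product key as part of their own key, which is how a "multiple occurrences" line in the dis is commonly expressed in a relational design.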
The different levels of the data model are akin to the different levels of mapping that exist in the world. Fig. 7.3.8 shows how the different levels of mapping relate to each other.
In Fig. 7.3.8, it is seen that the ERD is the equivalent of a globe of the world. The dis is the equivalent of the map of Texas. And the physical database design is the equivalent of the city map of Dallas, Texas. The globe (the ERD) is complete but not detailed. The map of Texas (the dis) is incomplete in that you can’t find your way to and from Chicago with a map of Texas. But the map of Texas has a great deal more detail than the globe. The city map of Dallas (the physical data model) is even less complete. You cannot find your way from El Paso to Midland with a city map of Dallas. But you have even more detail in the city map of Dallas than you do in the state map of Texas.
The complete linkage of the different forms of data modeling to each other is shown in Fig. 7.3.9.
It has been noticed that when a data model is created, it oftentimes applies very nicely to other companies in the same industry. For example, a bank, ABC, creates a data model. Then one day, it is discovered that the data model for bank ABC is very similar to the data models for banks BCD, CDE, and DEF.
Because of the great similarity of data models within the same industry, there are models called “generic data models.” The idea behind a generic data model is that it is much less expensive and much faster to acquire a generic data model than it is to build a data model from scratch. It is true that any generic data model is going to need customization. But even with customization, using a generic data model is far preferable to building the data model from scratch.
There are different types of data models. There are operational data models and data warehouse data models. An operational data model is one that models the day-to-day operations of the company. The data warehouse data model is one that is based on the informational needs of the organization. The operational data model includes some information that is needed for operational processing only, such as a specific telephone number. The data warehouse data model does not contain data that are specific to operational processing. The data warehouse data model does not contain any summarized data. The data warehouse data model does contain a time stamp for every record in the model.
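The difference between the two models can be sketched as a small transformation. This is a sketch under assumptions: the field names, the set of operational-only fields, and the name `snapshot_timestamp` are hypothetical. An operational record loses its operational-only data and gains a time stamp on its way into the data warehouse model.

```python
from datetime import datetime, timezone

# Fields assumed to be needed for operational processing only.
OPERATIONAL_ONLY_FIELDS = {"telephone_number"}


def to_warehouse_record(operational_record, snapshot_time):
    """Drop operational-only fields and attach a time stamp to the record."""
    record = {
        key: value
        for key, value in operational_record.items()
        if key not in OPERATIONAL_ONLY_FIELDS
    }
    record["snapshot_timestamp"] = snapshot_time.isoformat()
    return record


# Hypothetical operational record, for illustration only.
operational = {
    "customer_id": 42,
    "name": "Example Holdings",
    "telephone_number": "555-0100",
}

warehouse = to_warehouse_record(
    operational, datetime(2020, 1, 1, tzinfo=timezone.utc)
)
print(warehouse)
```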