Chapter 12.1

Operational Analytics

Abstract

Performance is of the essence in the operational environment. In order to enhance performance, data are denormalized. However, when data are denormalized, data become disintegrated. In order to achieve integration, data are integrated as they enter the data warehouse. Data in the warehouse are at a granular level. Because data in the data warehouse are granular, the data can be shaped and reshaped into many different forms, serving the needs of many different communities. Supplementing the data warehouse are data marts. Data marts primarily support KPIs.

Keywords

Denormalization; Integration; ETL; Data warehouse; Granularity; Data marts; KPIs; Historical data

The operational environment is one where the day-to-day activities of the corporation take place. Sales are made. Bank deposits are made. Insurance policies are sold. Grocery shelves are stocked.

In short, the corporation operates in a modern, efficient manner when the operational world runs properly.

The data that are generated by operational processing are of enormous value to the world.

Fig. 12.1.1 depicts the operational environment.

Fig. 12.1.1 Operational environment.

Operational analytics consist of the decisions that are made as a result of the execution of transactions in the operational environment. Those points of data that are at the heart of operational analytics are generated by operational systems. Operational systems are those systems that run transactions and manage data inside a database management system.

There are many characteristics of operational systems. The essence of operational applications is depicted in Fig. 12.1.2.

Fig. 12.1.2 The essence of operational processing.

Operational systems have as their mission speedy execution, operation against data at a detailed level, and being bound together in applications.

Because of the need for speedy execution of transactions, data are often denormalized. Denormalization is the design technique that the designer uses in order to enhance performance. But because data are denormalized, they are “pulled apart.” One unit of data is found in one database, and the same unit of data is found in another database. The fragmentation of data into separate databases is a natural result of the need for denormalization of data in a high-performance environment. Denormalization of data in the high-performance transaction processing environment is a normal, natural phenomenon.
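The denormalization trade-off can be sketched in a few lines of Python. The tables, names, and values below are entirely hypothetical; the point is that copying customer data into each application's own records makes a single-read lookup possible while allowing the copies to drift apart.

```python
# Hypothetical illustration: each application keeps its own copy of the
# customer's attributes (denormalized) so a transaction needs one read.

# Order-entry application keeps customer data inline with each order.
orders = [
    {"order_id": 1, "cust_id": "C1", "cust_name": "Jones", "amount": 100.0},
    {"order_id": 2, "cust_id": "C1", "cust_name": "Jones", "amount": 250.0},
]

# Shipping application keeps its own copy of the same customer data.
shipments = [
    {"ship_id": 9, "cust_id": "C1", "cust_name": "JONES", "city": "Austin"},
]

def order_lookup(order_id):
    # One read, no join: fast, which is the point of denormalization.
    return next(o for o in orders if o["order_id"] == order_id)

# The side effect: the copies of the customer's name can disagree.
names = {o["cust_name"] for o in orders} | {s["cust_name"] for s in shipments}
print(sorted(names))  # ['JONES', 'Jones']: two renditions of one customer
```

The fast lookup and the duplicated data are two sides of the same design decision.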

But there is a side effect of denormalization of data. Because data are denormalized in the operational environment, data are not integrated. The same unit of data often exists in several places. (Or in the worst case, the same unit of data exists in many, many, many places.) The net effect of the same data existing in many places is that the data lose their integrity. One user accesses the data in one place and gets one value. Another user accesses the same data in another place and gets a very different value.

Both users think they have the correct value of data. And both users have very different values.

This lack of integrity of data is seen in Fig. 12.1.3.

Fig. 12.1.3 The lack of integrity of data.

One can see the frustration across the organization. How in the world can decisions be made when no one knows what the correct value of data is?

But the lack of integrity is not the only problem with operational applications. Another problem with operational applications is that there is only a minimal amount of historical data to be found in operational applications.

There is a good reason for there being minimal history in operational applications: the need for high performance trumps all other operational objectives. System tuners long ago discovered that the more data there are in a system, the slower the system runs. Therefore, in order to have optimal performance, system tuners jettisoned historical data. Because operational systems need high performance, of necessity there is little historical data found in the operational environment.

Fig. 12.1.4 shows that there is minimal historical data found in the operational environment.

Fig. 12.1.4 Very little historical data found in the operational environment.

But there is a problem with jettisoning historical data. That problem is that historical data are useful for many purposes. Historical data are useful in the following:

  • Spotting and measuring trends
  • Understanding the long-term habits of customers
  • Looking at developing patterns
  • And so forth

Because of the lack of integrity of data and because of the need to have a place to house historical data, there arose a need for a different kind of architectural structure than the operational application. Because of the need to do analytic processing (as opposed to transactional processing), there appeared in the world a structure called the “data warehouse.”

Fig. 12.1.5 shows the emergence of the data warehouse.

Fig. 12.1.5 The data warehouse—the single version of the truth.

The definition of a data warehouse has been around since the beginning of data warehousing. A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. A data warehouse contains detailed, integrated data that are historical.

Another way of thinking about a data warehouse is that a data warehouse is a “single version of the truth.” The data warehouse is the detailed, integrated bedrock data that can be used for decision-making purposes throughout the organization.

The data model that best serves as a basis for the data warehouse is the relational model. The relational model is normalized data and is good for representing data at its most granular level.

Fig. 12.1.6 shows the relational model that serves as a design foundation for the data warehouse.

Fig. 12.1.6 The relational model and the data warehouse.

Data are loaded into the data warehouse from the operational applications. Data in the operational applications reside in a denormalized state. Data are loaded into the data warehouse through technology known as “ETL” (“extract/transform/load”) technology.

Fig. 12.1.7 shows the loading of data into the data warehouse from the operational environment, passing through technology known as ETL technology.

Fig. 12.1.7 Transforming application data into corporate data.

In fact, data are not “loaded” into the data warehouse at all. The reality is that data are transformed as they pass from the operational environment to the data warehouse environment. In the operational environment, data are designed into a denormalized state. In the data warehouse, data are designed into the normalized state. The purpose of ETL processing is to transform application data into corporate data. To the uninitiated, this transformation doesn’t seem to be a difficult process. But in fact, it is.

In order to understand the transformation accomplished by ETL, refer to the transformation depicted in Fig. 12.1.8.

Fig. 12.1.8 Transformations.

In Fig. 12.1.8, application data hold different renditions of data for gender and measurement. In one application, gender is indicated by the values—male and female. In another application, gender is indicated by 1 and 0. In one application, measurement is made in inches. In another application, measurement is made in centimeters.

In the data warehouse, there is one indicator for gender—m and f. In the data warehouse, there is one unit of measurement—centimeters. The transformation from application data to corporate data is made during the ETL process.

The diagram in Fig. 12.1.8 is a good illustration of what is meant by the difference between application data and corporate data.
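A minimal sketch of this kind of ETL transformation, assuming hypothetical field names and mapping rules (the book does not prescribe an implementation):

```python
# Two applications encode gender and measurement differently; ETL maps
# both into a single corporate representation.

GENDER_MAP = {"male": "m", "female": "f", "1": "m", "0": "f"}
INCH_TO_CM = 2.54

def transform(record):
    """Transform one application record into corporate form."""
    out = dict(record)
    out["gender"] = GENDER_MAP[str(record["gender"]).lower()]
    if record.get("unit") == "in":
        # Corporate standard is centimeters.
        out["height"] = round(record["height"] * INCH_TO_CM, 1)
        out["unit"] = "cm"
    return out

# Application A: gender as words, measurement in inches.
a = transform({"gender": "male", "height": 70, "unit": "in"})
# Application B: gender as 1/0, measurement already in centimeters.
b = transform({"gender": "0", "height": 165, "unit": "cm"})

print(a["gender"], a["height"], a["unit"])  # m 177.8 cm
print(b["gender"], b["height"], b["unit"])  # f 165 cm
```

Real ETL tools add error handling, auditing, and scheduling, but the essence is exactly this mapping from application values to corporate values.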

A fundamental concept to the integrity of data and the establishment of corporate data is the “system of record.” The system of record is the definitive data of the corporation. In the operational environment, the system of record is the data that feed values to the data warehouse.

Fig. 12.1.9 illustrates the system of record in the operational environment.

Fig. 12.1.9 The system of record in the operational environment.

It is worthwhile noting that the system of record moves from one environment to the next. The system of record for operational data resides in the operational environment. But as data pass into the data warehouse, the system of record also passes into the data warehouse. The difference is the timeliness of the data. Data in the operational environment are accurate as of the moment of access. Stated differently, data in the operational environment are up-to-the-second accurate. But when the system of record data move to the data warehouse, the system of record data become accurate as of the moment in history that is reflected in the data warehouse. The system of record is historically accurate in the data warehouse.

Different Perspectives of Data

One of the most important functions of the data warehouse is the ability to serve as a foundation for different organizations to look at the same data differently and still have the same foundation of data.

Fig. 12.1.10 shows this capability.

Fig. 12.1.10 Different departments look at the same data differently.

The reason the data warehouse can serve as a foundation of data for different organizations is that the data in the data warehouse are granular and integrated. You can think of the data in the data warehouse as grains of sand. Sand can be shaped into many different final goods—silicon chips, wine glasses, automobile headlights, body parts, and so forth. And by the same token, marketing can look at data in the data warehouse one way, finance can look at data in the data warehouse another way, and sales can look at data in the data warehouse yet another way. And yet, all of the organizations are looking at the same data, and there is reconcilability of data.
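The "grains of sand" idea can be made concrete with a small sketch (the sales rows and departmental views are hypothetical): the same granular rows roll up into different shapes, and because every shape is built from the same grains, the shapes reconcile.

```python
# One set of granular, integrated sales rows in the warehouse.
sales = [
    {"date": "2024-01-15", "region": "east", "amount": 100.0},
    {"date": "2024-01-20", "region": "west", "amount": 200.0},
    {"date": "2024-02-03", "region": "east", "amount": 150.0},
]

def rollup(rows, key):
    """Aggregate the same granular rows along a chosen dimension."""
    totals = {}
    for r in rows:
        totals[key(r)] = totals.get(key(r), 0.0) + r["amount"]
    return totals

by_region = rollup(sales, lambda r: r["region"])   # marketing's view
by_month = rollup(sales, lambda r: r["date"][:7])  # finance's view

# Different shapes of the same data reconcile to the same total.
assert sum(by_region.values()) == sum(by_month.values()) == 450.0
```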

The ability to serve different communities is one of the most important characteristics of the data warehouse.

Data Marts

The way that the data warehouse serves the different communities is through the creation of data marts. Fig. 12.1.11 shows that the data warehouse serves as a basis for data in the data marts.

Fig. 12.1.11 Data marts are fed from the data warehouse.

In Fig. 12.1.11, it is seen that there are different data marts for different organizations. The data warehouse and its granular data serve as a basis for the data found in the data marts. The granular data in the data warehouse are summarized and otherwise aggregated into the form that each data mart requires. Note that each data mart and each organization will have their own way of summarizing and aggregating data. Stated differently, the data mart for finance will be different from the data mart for marketing.

Data marts are best based on the dimensional model, as seen in Fig. 12.1.12.

Fig. 12.1.12 Star joins.

In the dimensional model are found fact tables and dimension tables. Fact tables and dimension tables are attached together to form what is known as the “star” join. The star join is designed to be optimal for the informational needs of a department.
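A star join can be sketched in miniature, using a hypothetical fact table of sales attached to two hypothetical dimension tables:

```python
# Dimension tables: descriptive attributes, keyed by surrogate id.
product_dim = {"P1": {"name": "widget", "category": "hardware"}}
store_dim = {"S1": {"city": "Denver"}, "S2": {"city": "Austin"}}

# Fact table: measures plus foreign keys into the dimensions.
fact_sales = [
    {"product_id": "P1", "store_id": "S1", "qty": 3, "revenue": 30.0},
    {"product_id": "P1", "store_id": "S2", "qty": 5, "revenue": 50.0},
]

# The "star" join: attach dimension attributes to each fact row.
joined = [
    {**f, **product_dim[f["product_id"]], **store_dim[f["store_id"]]}
    for f in fact_sales
]
print(joined[0]["city"], joined[0]["revenue"])  # Denver 30.0
```

In a real data mart the same join is expressed in SQL against dimensional tables, but the shape, one central fact table surrounded by dimension tables, is the same.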

The data marts and the data warehouse combine to form an architecture, as seen in Fig. 12.1.13.

Fig. 12.1.13 Data marts and the dimensional model.

In Fig. 12.1.13, it is seen that the integration of data occurs as data are placed in an integrated, historical fashion in the data warehouse. Once the foundation of data is built, the data are passed into the different data marts. As data are passed into the data marts, data are summarized or otherwise aggregated.

The Operational Data Store—ODS

There is another data structure that sometimes appears in data architecture, and that structure is one known as the ODS, or “operational data store.”

Fig. 12.1.14 depicts an ODS.

Fig. 12.1.14 The ODS.

The ODS has some characteristics of the data warehouse and some characteristics of the operational environment. The ODS can be updated in real time, and the ODS can support high-performance transaction processing. But the ODS also contains integrated data.

In many ways, the ODS is a “halfway” store for data.

Fig. 12.1.15 shows the ODS.

Fig. 12.1.15 The ODS.

The ODS is an optional data structure for corporations. Some corporations have need of an ODS; other corporations do not need the ODS. As a rule, if an organization does significant amounts of transaction processing, it will need an ODS.

The type of data found in the data marts usually includes what is known as a KPI, a “key performance indicator.”

Fig. 12.1.16 illustrates that data marts usually contain one or more KPIs.

Fig. 12.1.16 Data marts and KPIs.

Every corporation has its own set of KPIs. Some typical KPIs might include the following:

  • Cash on hand
  • Number of employees
  • Product order backlog
  • The sales pipeline
  • New product acceptance
  • Inventory for sale

KPIs are typically measured on a monthly basis. Fig. 12.1.17 shows such a periodic measurement of KPIs.

Fig. 12.1.17 Data marts typically contain multiple KPIs.

There are many reasons to measure KPIs on a monthly basis. One is the ability to spot trends as the trends are happening.

There is a problem with spotting trends in KPIs on a monthly basis: many KPIs are seasonal, so a month-by-month trend line may be misleading. To spot seasonal trends, it is necessary to measure KPIs over multiple years, as seen in Fig. 12.1.18.

Fig. 12.1.18 Many KPIs are seasonal.
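The seasonality problem can be illustrated with a small sketch (the KPI values are hypothetical): a month-over-month comparison exaggerates a seasonal spike, while a year-over-year comparison of the same data shows the underlying trend.

```python
# A monthly sales-pipeline KPI, keyed by (year, month). December always
# spikes, so the series is seasonal.
kpi = {
    (2023, 11): 90, (2023, 12): 140,
    (2024, 11): 99, (2024, 12): 154,
}

def month_over_month(year, month):
    prev = (year, month - 1) if month > 1 else (year - 1, 12)
    return kpi[(year, month)] / kpi[prev] - 1

def year_over_year(year, month):
    return kpi[(year, month)] / kpi[(year - 1, month)] - 1

# Month-over-month makes December look like explosive growth...
print(round(month_over_month(2024, 12), 2))  # 0.56
# ...while comparing against the same month a year earlier shows the
# real underlying trend.
print(round(year_over_year(2024, 12), 2))    # 0.1
```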

In addition to containing KPIs, data marts are oftentimes housed in what are termed “cubes.” Fig. 12.1.19 shows that cubes often appear in data marts or in conjunction with data marts.

Fig. 12.1.19 OLAP technology.

A cube is an arrangement of data that allows data to be examined from different perspectives.
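A cube can be sketched in miniature as measures indexed by several dimensions (the data below are hypothetical): the same cells can be sliced by product, by region, or by month.

```python
# A tiny three-dimensional cube: (product, region, month) -> revenue.
cells = {
    ("widget", "east", "Jan"): 10.0,
    ("widget", "west", "Jan"): 20.0,
    ("gadget", "east", "Feb"): 30.0,
}

def slice_by(dim_index, value):
    """Sum revenue over all cells matching one dimension value."""
    return sum(v for k, v in cells.items() if k[dim_index] == value)

# The same cells, examined from three different perspectives.
print(slice_by(0, "widget"))  # by product: 30.0
print(slice_by(1, "east"))    # by region:  40.0
print(slice_by(2, "Jan"))     # by month:   30.0
```

OLAP engines precompute and index these slices for speed, but the conceptual structure is just this multidimensional indexing.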

One of the characteristics of data marts is that they are relatively easy and fast to create. Because of this ease of creation, most organizations build new data marts rather than maintain existing data marts.

Fig. 12.1.20 shows that new data marts are created rather than maintaining older data marts that need to reflect new requirements.

Fig. 12.1.20 Throwing data marts away.

The long-term effect of constantly creating new data marts is that after a while, the organization is supporting many data marts that aren’t being used.

Because data marts contain KPIs, there is great propensity for change. That is because KPIs are constantly changing. Every time the focus of a business changes, so do its KPIs. One day, the business is interested in profitability. In this case, KPIs focus on revenue and expenses. The next day, the business focuses on market share. The KPIs now relate to new customers and customer retention. The next day, the focus changes to meeting competition. The KPIs now change to looking at product acceptance and product differentiation.

As long as the business changes (and business change is simply a fact of life), KPIs also change. And as long as KPIs change, data marts change.

The generic architecture for the data architecture component of the operational environment is depicted in Fig. 12.1.21.

Fig. 12.1.21 Modern operational architecture.