
Chapter 7
Application Data Pond

The application data pond is where application-related data is placed. Much (but not all) application data is transaction-related. A transaction occurs and an electronic record is made of the transaction. The electronic record is stored and used in the operational systems of the corporation, where it is used to conduct current business. After the electronic record has fulfilled its active life in the operational environment, the record of the transaction finds its way into the application data pond.

Another form of operational application data may find its way into the application data pond. There may be customer lists, product catalogs, packing lists, shipment schedules, delivery schedules, phone call records, and so forth that are all captured as operational application data.

DNA of Data

One of the shaping factors in application data is the infrastructure of the operational system. The original recording of the data, as it is captured and settles into the operational application, has a profound effect on the storage and organization of the data that arrives in the application data pond. In many ways the original operational application that captures and stores the data becomes the DNA of the application data. The DNA of the application data is as profound as the ancestry that each person on Earth carries. Each person has their own origin, and a person's DNA affects them all their life. It affects their health, their height and weight, and many other aspects of life. DNA is one of the defining characteristics of application data, just as it is with life.

Operational application data has the same profound DNA origins. The operational processing of data determines the level of data granularity, data organization, contents of the data, the business events which are noteworthy, the timing of the events, the way data is shaped and stored, and so forth. Fig 7.1 shows that the application infrastructure has a profound influence on data as it enters the application data pond.

Image250628.jpg

Fig 7.1 Influencing the data as it arrives in the data pond

Descriptors

The descriptors of the application data include such items as the source of the application data, the approximate volume of the application data, the frequency with which the application data is harvested, and other related information. The descriptor information is useful to the analyst in the application data pond in determining how to create and accurately analyze application data.

It’s normal for the application data pond to contain data from many applications. For a large corporation that is almost always true. It’s also possible, but quite rare, that all of the application data residing in the data pond comes from a single application. Nearly all large corporations run on a multiplicity of applications, both in-house and vendor solutions. Fig 7.2 depicts the descriptors of the application data pond.

Image250639.jpg

Fig 7.2 Depicting the descriptors of the application data pond
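
As a rough illustration, the descriptors named above (source, approximate volume, and harvest frequency) could be kept as one small record per contributing application. The sketch below is a minimal example under assumptions: the class name `PondDescriptor`, its field names, and the two sample applications are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PondDescriptor:
    """Descriptor for one application feeding the application data pond."""
    source: str               # name of the originating application
    approx_volume_rows: int   # approximate number of records
    harvest_frequency: str    # how often data is harvested, e.g. "daily"


# Hypothetical entries; a pond usually holds data from many applications.
descriptors = [
    PondDescriptor("order_entry", 12_000_000, "daily"),
    PondDescriptor("shipping", 3_500_000, "weekly"),
]


def sources(entries):
    """Return the distinct source applications described in the pond."""
    return sorted({d.source for d in entries})


print(sources(descriptors))
```

An analyst could consult a list like this before starting work to see which applications contributed data and how fresh that data is likely to be.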

Standard Database Format

It is normal for application-based data to be entered into the application data pond in a standard relational database format. Most applications have data stored in a row and column format. So application data will usually be stored and transported into the application data pond in this standard database format.

Note that this assumption about the application data pond is very different from the assumptions made about the analog data pond. In the analog data pond, information arrives in a raw state, usually as a long list of measurements. In the application data pond, it is common for the data to arrive in a database format.

Interestingly, the fact that data arrives in the application data pond in a database format does not mean the advantages of a database are carried along with it. Data that was created in a relational database format does not bring the discipline and rigor of a database into the application data pond. Once the application data is in the data pond, it is governed by whatever technology is used to manage the application data pond, which most likely is not a standard database management system.

Basic Organization of Data

Because of the application origin of the data, details in the application data pond typically are divided into records. Records have attributes and some attributes can be keys, while other attributes can be indexed. Fig 7.3 shows the basic organization of data inside the application data pond.

Image250647.jpg

Fig 7.3 Organizing data inside the application data pond
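
The organization described above (records made of attributes, with some attributes acting as keys and others indexed) can be sketched in a few lines. This is only an illustration: the order records, the choice of `order_id` as the key, and the helper `build_index` are all assumptions made for the example.

```python
from collections import defaultdict

# Each record is a set of named attributes; "order_id" acts as the key.
# The records themselves are hypothetical sample data.
records = [
    {"order_id": 1001, "customer": "Bill Inmon", "status": "shipped"},
    {"order_id": 1002, "customer": "Carol Renne", "status": "open"},
    {"order_id": 1003, "customer": "Bill Inmon", "status": "open"},
]

# Key lookup: a key value identifies exactly one record.
by_key = {r["order_id"]: r for r in records}


def build_index(rows, attribute):
    """Index a non-key attribute: one value may point to many records."""
    index = defaultdict(list)
    for row in rows:
        index[row[attribute]].append(row["order_id"])
    return dict(index)


customer_index = build_index(records, "customer")
print(by_key[1002]["status"])
print(customer_index["Bill Inmon"])
```

The key gives direct access to a single record, while the index answers questions such as "which orders belong to this customer" without scanning every record.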

Integration of Data

When data arrives in the application data pond, it may or may not have a business-related structure. If the data has been integrated before being passed to the application data pond, then it may have a structure inherently embedded in it. But if the data has not been integrated along the lines of the business before entering the application data pond, then the data will not magically become integrated.

Having an integrated business orientation means the data is organized along the lines of the major subject areas of the organization. Typical corporate subjects are customer, product, shipment, order, delivery, and so forth.

It is mandatory that the data have an integrated alignment with the business if the analyst is to make any sense of the data. The biggest impediment to the effective analysis of data in the application data pond is the lack of integration.

Data Model

In order to achieve integration of the data in the application data pond, it is necessary to have a data model in place. Usually there is a corporate data model. If there is no corporate data model, then there are generic business models which are available.

Care must be taken in selecting the data model for the application data pond. Business operations and warehouse operations, for example, need different data models. In most cases, the corporate data warehouse data model is the appropriate model for the application data pond. Fig 7.4 shows the data model which becomes the “target” for the application data pond.

Image250655.jpg

Fig 7.4 Creating the data model “target” of the application data pond

There are many advantages to the data model. One advantage is that the data model provides high-level guidance as to how data should be related. This high-level perspective is through entities and relationships or subject areas. But there is a lower level perspective that accompanies the data model. At the more detailed level, the data model provides a guide to such important elements as metadata. The metadata gives a detailed description of the data, such as defining records and their meaning, attributes and their meaning, keys, indexes, data relationships and so forth.

The analyst preparing to use the application data pond finds the metadata definitions very useful in preparing an analysis of the data in the application data pond. But the data model for the application pond has one complication that classical data models do not have. The application data pond holds data over a lengthy period of time, but the data model itself changes over time. As a result, the data model for the application data pond needs to be quite flexible.

The analyst needs to know what changes have been made to the metadata over time, since those changes have to be factored into the analysis of the data found in the application data pond. So the data model for the application data pond is a very sophisticated model.
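
One simple way to picture metadata that changes over time is a dated history for each attribute, from which the analyst can look up the definition in effect when a given record was written. The sketch below assumes a hypothetical history for a gender attribute; the dates, the wording of the definitions, and the function name `definition_as_of` are illustrative only.

```python
from datetime import date

# Hypothetical history of one attribute's definition: (effective date, meaning).
# Entries are kept in ascending date order.
gender_metadata_history = [
    (date(2010, 1, 1), "gender coded as 1/0"),
    (date(2015, 6, 1), "gender coded as m/f"),
]


def definition_as_of(history, when):
    """Return the definition in effect on `when`, or None if none applied yet."""
    current = None
    for effective, meaning in history:
        if effective <= when:
            current = meaning
        else:
            break
    return current


print(definition_as_of(gender_metadata_history, date(2012, 3, 1)))
```

With a lookup like this, records captured in different years can each be interpreted under the definition that applied when they were created.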

Necessity of Integration

If data finds its way into the application data pond in an integrated state, the organization is lucky. If data finds its way into the application data pond in an unintegrated state (which is the normal case) the organization must transform the data after it has entered the application data pond. This transformation step is very similar to conditioning for the analog data pond.

If data is to be meaningfully used for analysis in the application data pond, the transformation of data into an integrated state is absolutely necessary. There are many reasons for the transformation and integration of application data pond data. Consider the following set of transformations, as seen in Fig 7.5.

Different applications encode gender in different ways. In order to make the analysis consistent, the application data needs to be transformed into a single, consistent encoding of gender. The same considerations hold true for the measurement of distance. Inches, feet, and yards need to be converted to centimeters if consistent and meaningful analysis is to be done.

Image250663.jpg

Fig 7.5 Transforming data in the application data pond

The same sort of conversion must be done to put Australian and Canadian dollars in a consistent currency, for example.
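
The gender and distance conversions described above can be sketched as two small functions. The gender codes in `GENDER_MAP` are assumed for illustration, since the actual encodings depend on each source application; the length factors are the exact definitions of the inch, foot, and yard in centimeters. Currency is left out of the sketch because exchange rates change over time and none are given here.

```python
# Assumed source encodings for gender; real mappings depend on each application.
GENDER_MAP = {
    "m": "male", "f": "female",
    "1": "male", "0": "female",
    "male": "male", "female": "female",
}

# Exact conversion factors to centimeters.
CM_PER_UNIT = {"in": 2.54, "ft": 30.48, "yd": 91.44, "cm": 1.0}


def normalize_gender(value):
    """Map an application-specific gender code to one consistent value."""
    key = str(value).strip().lower()
    if key not in GENDER_MAP:
        raise ValueError(f"unknown gender encoding: {value!r}")
    return GENDER_MAP[key]


def to_centimeters(amount, unit):
    """Convert a length in inches, feet, yards, or centimeters to centimeters."""
    return amount * CM_PER_UNIT[unit]


print(normalize_gender("M"), normalize_gender(0))
print(to_centimeters(2, "ft"))
```

Raising an error on an unknown code, rather than guessing, keeps unrecognized encodings visible so they can be added to the mapping deliberately.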

Unfortunately the conversions for integration that need to be made in the figure are only the tip of the iceberg. There are many, many other conversions required in order to transform the data into an integrated state. And the analyst cannot do meaningful analysis unless the data has been converted. Fig 7.6 shows that a fundamental transformation of data is needed within the application data pond from the time data enters until such time as data is usable within the pond.

Image250672.jpg

Fig 7.6 Transforming until the data is useable

Pointing From One Application to the Next

In some cases when two applications are merged, the result is a pointer from one to the next. This is a simple relationship.

As an example, consider the business activity of placing an order for tickets for a Saturday night performance. There is a customer application with its database and a ticket order application with its database. In this case, there might be a simple structuring of the customer application that looks like:

Bill Inmon

John Williams

Carol Renne

Georgia Burleson

Jeanne Friedman

The ticket database might look like:

Sat night 7:15 seat A12

Sat night 7:15 seat A13

Sat night 7:15 seat A14

Sat night 7:15 seat A16

Once the data is integrated, the result might look like:

Bill Inmon Seat A12, seat A13

John Williams

Carol Renne Seat A16

Georgia Burleson Seat A14

Jeanne Friedman

Fig 7.7 shows the integration of a simple pointer relationship between applications.

Image250680.jpg

Fig 7.7 Integrating data within the application data pond
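
The pointer relationship can be sketched by attaching to each customer the seats whose ticket records point back to that customer. The customer names and the seats come from the listings above; the `customer` field on each ticket, which records who holds the seat, is an assumption added for this sketch.

```python
customers = ["Bill Inmon", "John Williams", "Carol Renne",
             "Georgia Burleson", "Jeanne Friedman"]

# Ticket records for the Saturday 7:15 performance. The "customer" pointer
# on each ticket is an assumption added for this sketch.
tickets = [
    {"seat": "A12", "customer": "Bill Inmon"},
    {"seat": "A13", "customer": "Bill Inmon"},
    {"seat": "A14", "customer": "Georgia Burleson"},
    {"seat": "A16", "customer": "Carol Renne"},
]


def integrate(customer_names, ticket_rows):
    """Attach each customer's seats; customers without tickets keep an empty list."""
    seats = {name: [] for name in customer_names}
    for row in ticket_rows:
        seats[row["customer"]].append(row["seat"])
    return seats


for name, held in integrate(customers, tickets).items():
    print(name, ", ".join(held))
```

Customers with no tickets still appear in the integrated result, which matches the listing in which some names carry no seats.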

Intersecting Applications

A more complex relationship is that of two applications intersecting. When two applications intersect, data is created independently as a result of the crossover, and that data forms its own collection. As an example, suppose there is an oil company and a gasoline distribution company. On Sept 2, the distribution company makes a delivery of gasoline. The databases might look like:

Oil Company: Standard Oil, Conoco, Texaco

Distribution Company: Flying Horse Shipping, Akers Distributing

Now suppose a set of deliveries is made. One delivery might look like:

Delivery AS15-YR

From Standard Oil

To 6534 Wolfensberger Road

Castle Rock, CO

By Flying Horse Shipping

Amount: 2000 gallons

Date Sept 2

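The intersection data is consistent with the existing application data: the delivery refers to an oil company and a distribution company that already exist in their own applications. Fig 7.8 shows that there can be intersection data in the application data pond as well as other types of data.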

Image250687.jpg

Fig 7.8 Different types of data in the application data pond
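
The delivery example can be sketched as a record that belongs to neither application alone but refers to both. The values come from the example above; the field names and the check `is_consistent` are assumptions made for illustration.

```python
oil_companies = {"Standard Oil", "Conoco", "Texaco"}
distributors = {"Flying Horse Shipping", "Akers Distributing"}

# The delivery exists only because the two applications cross over.
delivery = {
    "delivery_id": "AS15-YR",
    "from": "Standard Oil",
    "to": "6534 Wolfensberger Road, Castle Rock, CO",
    "by": "Flying Horse Shipping",
    "gallons": 2000,
    "date": "Sept 2",
}


def is_consistent(record, producers, carriers):
    """Check that the intersection record points at entries in both applications."""
    return record["from"] in producers and record["by"] in carriers


print(is_consistent(delivery, oil_companies, distributors))
```

A check of this kind is one way to confirm that intersection data lines up with the application data it depends on before the analyst relies on it.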

Subsets of Data in the Application Data Pond

On occasion, the analyst may wish to select data from application data that have already been integrated. This too is a possibility. Fig 7.9 shows that a subset of data from an application can be selected and stored in the application data pond.

Image250694.jpg

Fig 7.9 Choosing an application subset to store in the application data pond

As an example of data that can be selected, suppose the application database contains all telephone calls made in the month of May. The analyst may wish to select all phone calls longer than three minutes made on May 15. In doing so, the analyst greatly narrows down the work the system has to do in order to find the data the analyst is looking for.
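
The selection of a subset can be sketched as a simple filter. The call records below are hypothetical, and the year is assumed because the example gives only the month and day.

```python
from datetime import date

# Hypothetical call records: (caller, call date, duration in minutes).
# The year is assumed for the sake of the example.
calls = [
    ("555-0101", date(2016, 5, 15), 4.5),
    ("555-0102", date(2016, 5, 15), 1.0),
    ("555-0103", date(2016, 5, 16), 7.2),
    ("555-0104", date(2016, 5, 15), 3.0),
]


def select_calls(rows, on_day, longer_than_minutes):
    """Keep only calls made on `on_day` that lasted longer than the threshold."""
    return [r for r in rows if r[1] == on_day and r[2] > longer_than_minutes]


subset = select_calls(calls, date(2016, 5, 15), 3)
print(len(subset), "of", len(calls), "calls selected")
```

Note that a call of exactly three minutes is excluded, since the selection asks for calls longer than three minutes.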

In Summary

Once data has been integrated, it is fit for analysis. Fig 7.10 shows that analysis can be done on data in an integrated application data pond.

Image250703.jpg

Fig 7.10 Analyzing data in the application data pond

The DNA of an application is the infrastructure of the data as it exists in the operational environment. The infrastructure of operational data extends well into the application data pond. The data descriptors found in the application data pond are of great use to the analyst.

The normal case is for data in the operational environment to have been stored in a relational format. The relational format has records, attributes, keys, indexes and so forth. When the data is placed in an application data pond, the data reflects its relational origins even though the data management of the application data pond is not a relational DBMS.

However it occurs, data in the application data pond must be integrated. Integration is a necessity for the business analyst who will be using the data.

Data in the application data pond goes through a conditioning process, just as data in the analog data pond must be conditioned. However, the conditioning that occurs in the application data pond is very different from the conditioning that occurs in the analog data pond.
