Chapter 3.1

Transformations in the End-State Architecture

Abstract

An essential characteristic of the end-state architecture is that data are transformed repeatedly throughout the architecture. Although data may appear static at any one moment in time, over a long span of time they are anything but static. There are many reasons for the transformations of data that occur.

Keywords

ETL; End state architecture; Transformation; Redundant data; Textual ETL; Dimensional modelling; Corporate data; Application data; Bulk storage

When you take your first glance at the end-state data architecture, several things jump out at you. One of those things is the need for transformation processes. There are a variety of transformations that occur. There is textual ETL. There is ETL. There are data marts that are created by using the techniques of dimensional data modeling. There is refinement of bulk data, and so forth.

Redundant Data

One of the apparent by-products of this transformation process is the creation (or proliferation) of redundant data. A superficial glance at the end-state architecture suggests that redundant data are to be found everywhere in the architecture.

Fig. 3.1.1 shows the apparent proliferation of redundant data in the end-state data architecture.

When you look at the simple example shown in Fig. 3.1.1, it is hard to argue that there is no redundancy of data found in the end-state architecture. The example proves that—in fact—there is redundancy of data. However, there is more to the example than meets the eye. The redundancy of data found in the end-state architecture deserves a more careful scrutiny.

While it is true that there is redundancy of data in the end-state data architecture, there are some very good and very powerful reasons for the redundancy.

Transformations

In order to understand the role that redundancy of data plays, it is necessary to understand the transformations of data found in the end-state data architecture. There are several major transformations of data found in the end-state data architecture. Those transformations are the following:

The transformation of text into a database format—textual ETL
The transformation of application data into corporate data—ETL
The transformation of corporate data into customized analytic data—dimensional modeling
The transformation of corporate data into bulk corporate data
The transformation of automatically generated data into a data lake
The refinement of bulk data into corporate analytic data

There is a good reason for each of these transformations.

When you look at the larger picture of what is going on, the creation and proliferation of redundancy is not nearly as simple and straightforward as it at first seems.

Consider the transformation shown in Fig. 3.1.2.

Fig. 3.1.2 shows that application data are transformed into corporate data. As a simple example of the transformation, all designations of gender are converted to either male or female in the corporate data. It so happens that in the application, the gender value for Mary Smith is already female, so no conversion is done for Mary's record. Indeed, the application record for Mary Smith is redundant with the corporate record for Mary Smith. But conversions are made for other application records. Looking at just one record may therefore lead to an incorrect conclusion about the redundancy of data.

Customizing Data

There are many reasons why data need to be transformed. Turning data into corporate data is only one of the many reasons. Another reason data need to be transformed is to customize data for the purpose of analytic processing.

Fig. 3.1.3 shows that records are edited and collected so that a customized analysis can be done.

In order to do customized analysis, it is necessary to use data collectively. And data need to be integrated before they can be used collectively.

Transforming Text

One of the most obvious transformations is that of the reading of raw text and the conversion of the raw text into a standard database format. A lot of work goes into the creation of the database because of the need to determine both the value of text and the context of text. The database that is created contains both the text and the context of the text.

Fig. 3.1.4 shows the transformation of text into the format of a database.

Once you see the transformation found in Fig. 3.1.4, it is obvious why there is value in transformation. You cannot effectively do analytic processing on raw text. Instead, the raw text must be read; the text must be analyzed and converted into the form of a database. Once the text is converted into the form of a database, it can then be used for analytic processing. As long as the text is still in the form of text, it cannot be meaningfully used as part of analytic processing.
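The idea of storing both the text and its context can be illustrated with a minimal sketch. It is only an illustration, not the textual ETL process itself: the taxonomy, the `TextRow` layout, and the function name `text_to_rows` are all hypothetical, and real textual ETL resolves context in far more sophisticated ways than a word lookup.

```python
import re
from typing import List, NamedTuple

# Hypothetical taxonomy: maps a word found in the text to its context.
TAXONOMY = {
    "aspirin": "medication",
    "headache": "symptom",
    "fever": "symptom",
}


class TextRow(NamedTuple):
    """One database row produced from raw text (assumed layout)."""
    doc_id: str
    offset: int   # character offset of the word in the raw text
    value: str    # the word as found, lowercased
    context: str  # the context assigned by the taxonomy


def text_to_rows(doc_id: str, raw_text: str) -> List[TextRow]:
    """Read raw text and emit one row per word that the taxonomy recognizes."""
    rows = []
    for match in re.finditer(r"[A-Za-z]+", raw_text):
        word = match.group().lower()
        context = TAXONOMY.get(word)
        if context is not None:
            rows.append(TextRow(doc_id, match.start(), word, context))
    return rows


rows = text_to_rows("note-1", "Patient reports headache; given aspirin.")
for row in rows:
    print(row)
```

Each resulting row carries the value and its context together, so it can be queried like any other database record.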

Transforming Application Data

Another common form of transformation is that of converting data from application data into corporate data.

Fig. 3.1.5 shows this transformation.

In Fig. 3.1.5, application data are seen in their raw state. The data are unintegrated. In one record, male is designated as 1, and female is designated as Y. In another record, male is X, and female is 0. Other elements of data are similarly unintegrated.

Trying to look at the application data from a corporate perspective is very difficult to do. A transformation of data occurs as data are put into a corporate format. Data are converted into a common format and placed into a data warehouse. Now, the data warehouse can be read and analyzed corporately.
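The gender conversion described above might be sketched as follows. The application names `app_a` and `app_b`, the record layout, and the function `to_corporate` are assumptions made for illustration; only the code values (1/Y and X/0) come from the example.

```python
# Per-application code tables. The application names are hypothetical;
# the codes mirror the example: 1/Y in one source, X/0 in another.
GENDER_CODES = {
    "app_a": {"1": "male", "Y": "female"},
    "app_b": {"X": "male", "0": "female"},
}


def to_corporate(source: str, record: dict) -> dict:
    """Return a copy of an application record with gender in corporate form."""
    mapping = GENDER_CODES[source]
    raw_code = str(record["gender"])
    if raw_code not in mapping:
        raise ValueError(f"unknown gender code {raw_code!r} from {source}")
    corporate = dict(record)  # leave the application record untouched
    corporate["gender"] = mapping[raw_code]
    return corporate


print(to_corporate("app_a", {"customer": "C-100", "gender": "Y"}))
print(to_corporate("app_b", {"customer": "C-200", "gender": "X"}))
```

Once every source passes through a mapping of this kind, records from different applications can be compared directly.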

Transforming Data Into a Customized State

Yet another type of transformation occurs when data need to be customized for the purpose of analytic processing.

Fig. 3.1.6 shows this kind of transformation.

In this transformation, raw, detailed data are read. The data are then summarized and put into a data mart for further analysis.

The customization is typically done for marketing, sales, or finance. However, other parts of the organization occasionally need such a transformation as well.

Typically, the analysis done on the customized data takes the form of establishing and measuring key performance indicators (KPIs). KPIs are usually calculated on a periodic basis: weekly, monthly, quarterly, and so on.
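As a rough sketch of this kind of summarization, the following rolls detailed sales rows up into a monthly total that could serve as a KPI in a data mart. The field names `sale_date` and `amount`, and the choice of monthly sales as the KPI, are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_sales_kpi(detail_rows):
    """Summarize detailed rows into total sales per (year, month)."""
    totals = defaultdict(float)
    for row in detail_rows:
        period = (row["sale_date"].year, row["sale_date"].month)
        totals[period] += row["amount"]
    return dict(totals)


detail = [
    {"sale_date": date(2024, 1, 5), "amount": 100.0},
    {"sale_date": date(2024, 1, 20), "amount": 50.0},
    {"sale_date": date(2024, 2, 3), "amount": 75.0},
]
print(monthly_sales_kpi(detail))
```

The detail rows remain available in the warehouse; the data mart holds only the summarized, customized view.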

Transforming Data Into Bulk Storage

Another form of transformation is the movement of data from an active component to a less active component. The movement is done when the probability of access for a given unit of data drops. A typical strategy is to move the data as they age based on the assumption that older data are accessed less frequently than current data.

There are, however, other occasions on which the probability of access drops for reasons other than aging.

The movement of data from active storage to less active storage is seen in Fig. 3.1.7.
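An age-based movement strategy could be sketched as below. The one-year cutoff, the `created` field, and the function name `split_by_age` are hypothetical; the sketch only covers the aging case, not the other reasons the probability of access may drop.

```python
from datetime import date, timedelta


def split_by_age(records, today, max_age_days=365):
    """Partition records into (active, bulk) by creation date.

    Assumes older data are accessed less often, so anything created
    before the cutoff is moved to less active (bulk) storage.
    """
    cutoff = today - timedelta(days=max_age_days)
    active, bulk = [], []
    for record in records:
        if record["created"] < cutoff:
            bulk.append(record)
        else:
            active.append(record)
    return active, bulk


records = [
    {"id": 1, "created": date(2022, 6, 1)},
    {"id": 2, "created": date(2023, 12, 1)},
]
active, bulk = split_by_age(records, today=date(2024, 1, 1))
print([r["id"] for r in active], [r["id"] for r in bulk])
```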

Transforming Data Generated Automatically

Another important transformation of data occurs as automatically generated data enter the data lake.

Fig. 3.1.8 shows that data are automatically generated (often by a machine).

In Fig. 3.1.8, data are generated quickly and in great volumes. Several things happen to the data that are generated automatically. Not all data are selected for movement into the data lake. Some data are selected randomly. Other data are selected because they are outside a preset threshold of boundaries. Other data are selected because of the time of day they were generated. There are many criteria that can be applied to the selection of data that have been automatically generated.

After the data for movement are selected, other data are typically added. Typical additions are the date and time of generation, the location where the data were generated, the identification of the machine that generated them, and so forth.

After the data have been selected and modified, they are placed in the data lake.
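The selection and enrichment steps might look something like the sketch below. All names and parameters (`select_and_enrich`, `low`, `high`, `window`, `sample_rate`) are hypothetical, and combining the three criteria with a logical OR is one possible policy among many.

```python
import random
from datetime import datetime, time


def select_and_enrich(value, generated_at, *, machine_id, location,
                      low, high, window, sample_rate, rng):
    """Return an enriched record for the data lake, or None if not selected."""
    out_of_bounds = not (low <= value <= high)         # outside preset thresholds
    start, end = window
    in_window = start <= generated_at.time() <= end    # time-of-day criterion
    sampled = rng.random() < sample_rate               # random selection
    if not (out_of_bounds or in_window or sampled):
        return None
    # Add the descriptive data that typically accompany a selected reading.
    return {
        "value": value,
        "generated_at": generated_at.isoformat(),
        "machine_id": machine_id,
        "location": location,
    }


settings = dict(
    machine_id="M-17", location="plant-3",
    low=0, high=100,
    window=(time(2, 0), time(3, 0)),
    sample_rate=0.0,            # random sampling disabled for this example
    rng=random.Random(0),
)
noon = datetime(2024, 1, 1, 12, 0)
print(select_and_enrich(150, noon, **settings))  # kept: outside the threshold
print(select_and_enrich(50, noon, **settings))   # not selected
```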

Transforming Bulk Data

One of the more interesting transformations occurs when data go from the data lake back to the corporate data warehouse. In this case, mass amounts of data are read and filtered. The results of the filtering are sent to the data warehouse where the data can be actively analyzed. In addition, the filtered data can be combined with existing active data.

Fig. 3.1.9 shows the refinement of bulk data.
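The refinement of bulk data can be pictured as a filter followed by a combination with existing active data. The sketch below assumes in-memory records and a shared key named `machine_id`; the function names and record layouts are hypothetical.

```python
def refine_bulk(lake_records, is_relevant):
    """Lazily yield only the data lake records that pass the filter."""
    return (record for record in lake_records if is_relevant(record))


def combine_with_active(refined, active_by_key, key):
    """Merge each refined record with the active warehouse row sharing its key."""
    combined = []
    for record in refined:
        active_row = active_by_key.get(record[key])
        if active_row is not None:
            combined.append({**active_row, **record})
    return combined


lake = [
    {"machine_id": "M-17", "value": 150},
    {"machine_id": "M-17", "value": 40},
    {"machine_id": "M-99", "value": 300},
]
warehouse = {"M-17": {"machine_id": "M-17", "plant": "plant-3"}}

refined = refine_bulk(lake, lambda r: r["value"] > 100)
print(combine_with_active(refined, warehouse, "machine_id"))
```

Using a generator for the filter step reflects that mass amounts of data are read but only a small fraction is passed on to the warehouse.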

Transformation and Redundancy

There are, then, a number of good reasons for the transformation of data throughout the end-state architecture. There is no question that some degree of redundancy of data occurs. But as data move across the architecture, they move for very valid reasons.

Fig. 3.1.10 shows some of the major reasons why there is transformation inside the end-state architecture.
