Chapter 8.4

Data Architecture: A High-Level Perspective

Abstract

Data architecture began with simple storage devices such as paper tape and punched cards. But soon the need to store lots of data and to access it quickly caused these early devices to disappear. In their place came disk storage, with which data could be accessed directly. Eventually, however, the volumes of data to be managed outgrew disk storage. Then big data appeared, and with it the ability to store effectively unlimited amounts of data. But as big data grew, the older day-to-day systems did not go away, and a need arose for a rational way to interface legacy systems with big data.

Keywords

Storage device; Paper tape; Punched cards; Disk storage; Direct access of data; Big data; Interfacing corporate data and big data

One purpose of architecture is to provide a high-level perspective. From a high-level perspective, data architecture looks like the diagram in Fig. 8.4.1.

Fig. 8.4.1 A high-level architecture.

A High-Level Perspective

Fig. 8.4.1 shows representative components. For example, the left-hand side, where cathode ray tubes (CRTs) emanate from an application, represents online transaction processing systems. In reality, the single application, database, and set of CRTs shown there stand for MANY applications and MANY databases.

The diagram shows that there are two major types of big data: repetitive data and nonrepetitive data. Repetitive data, in turn, is divided into simple repetitive data and context-enriched repetitive data.

The typical sources of the different types of big data are shown as well.

The diagram shows that repetitive data are distilled into a form that can be placed in the analytic data warehouse environment. In addition, nonrepetitive data can be disambiguated and placed either in the data warehouse or back into big data as context-enriched repetitive data.
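The two flows in the diagram can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the chapter's method: the `MeterReading` record, the `TAXONOMY` dictionary, and the `distill` and `disambiguate` functions are all assumptions made for the example. Distillation here means summarizing many repetitive records into one warehouse row per meter per day; disambiguation here means attaching business context to words found in free text, so that a nonrepetitive document becomes a set of context-enriched repetitive records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterReading:
    """Hypothetical repetitive record: one meter reading per interval."""
    meter_id: str
    day: str  # ISO date such as "2024-03-01"
    kwh: float


def distill(readings):
    """Distill many repetitive readings into one warehouse row per meter per day."""
    totals = defaultdict(float)
    for r in readings:
        totals[(r.meter_id, r.day)] += r.kwh
    return [
        {"meter_id": meter, "day": day, "total_kwh": round(total, 3)}
        for (meter, day), total in sorted(totals.items())
    ]


# Hypothetical taxonomy that gives raw words a business context.
TAXONOMY = {
    "outage": "service_issue",
    "flicker": "service_issue",
    "bill": "billing",
    "refund": "billing",
}


def disambiguate(doc_id, text):
    """Turn one nonrepetitive document into context-enriched, uniform records."""
    records = []
    for position, raw in enumerate(text.lower().split()):
        word = raw.strip(".,;:!?")  # drop trailing punctuation before lookup
        if word in TAXONOMY:
            records.append({"doc_id": doc_id, "term": word,
                            "context": TAXONOMY[word], "position": position})
    return records


readings = [
    MeterReading("m1", "2024-03-01", 1.5),
    MeterReading("m1", "2024-03-01", 2.0),
    MeterReading("m2", "2024-03-01", 0.5),
]
print([row["total_kwh"] for row in distill(readings)])  # [3.5, 0.5]
notes = disambiguate("call-17", "Customer reported an outage and asked for a refund.")
print([(r["term"], r["context"]) for r in notes])  # [('outage', 'service_issue'), ('refund', 'billing')]
```

Real disambiguation involves much more than a dictionary lookup; the sketch only shows the shape of the result, which is uniform records that carry their own context.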

Redundancy

The diagram raises many issues. One of them is redundant data: at first glance, it appears that there is redundant data everywhere.

In fact, much of what appears to be redundant data is data that has been transformed. If a data value remains the same after transformation, you may want to consider it redundant. Then again, you may not.

Consider redundancy in the real world. Take the time of day. You can find the time of day on the Internet, on the telephone, on the radio, on television, and in many other places. Is the fact that the time of day appears redundantly in so many places a bother? It becomes a bother only if there is no way to determine the accurate time. If there were no definitive source of time, then having the time appear redundantly would be a problem. But as long as there is a definitive source somewhere, and as long as most redundant sources adhere to it, there is no problem. In fact, having redundant sources of time is quite helpful, provided the integrity of the time is maintained.

Likewise, the redundant data seen across the enterprise in Fig. 8.4.1 is not a problem as long as the integrity of the data is maintained.
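The point about redundancy and integrity can be made concrete with a small audit. In this hypothetical sketch, each redundant copy of a value is stored together with a function that converts it back to canonical form (reflecting the remark that data are often transformed as they move), and a copy is flagged only if its canonical value disagrees with the definitive source. The location names, the `identity` helper, and the `audit_copies` function are illustrative assumptions.

```python
def identity(value):
    """Canonical form for copies that were stored unchanged."""
    return value


def audit_copies(definitive, copies):
    """Return the locations whose copy disagrees with the definitive value.

    `copies` maps a location name to (stored_value, to_canonical), where
    to_canonical undoes whatever transformation the copy went through.
    """
    return sorted(
        location
        for location, (stored, to_canonical) in copies.items()
        if to_canonical(stored) != definitive
    )


# Hypothetical example: one customer's address held redundantly in three places.
definitive_address = "12 Elm St"
copies = {
    "billing_db": ("12 Elm St", identity),
    "mailing_list": ("12 ELM ST", str.title),  # stored upper-cased, still consistent
    "crm_extract": ("9 Oak Ave", identity),    # out of step with the source
}
print(audit_copies(definitive_address, copies))  # ['crm_extract']
```

The transformed copy in `mailing_list` is not flagged: redundancy alone is harmless, and only the copy that has lost integrity with the definitive source is reported.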

The System of Record

The integrity of data in the data architecture is established by what can be called the “system of record.” The system of record is the one place where the value of data is definitively established. Note that the system of record applies only to detailed, granular data; it does not apply to summarized or derived data.

To understand the system of record, think of a bank and your bank account balance. For every account in every bank, there is a single system of record for the account balance: one and only one place where the balance is established and managed. Your account balance may appear in many places throughout the bank, but there is only one place where its system of record is kept.
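The bank example suggests a simple discipline: a balance changes in exactly one place, and every other appearance of that balance is a copy refreshed from that place. The following Python sketch assumes hypothetical `SystemOfRecord` and `ReadOnlyCopy` classes; it is not a description of how any particular bank implements this.

```python
from decimal import Decimal


class SystemOfRecord:
    """The one place where each account balance is established and managed."""

    def __init__(self):
        self._balances = {}

    def post(self, account, amount):
        """Apply a transaction; this is the only code path that changes a balance."""
        current = self._balances.get(account, Decimal("0"))
        self._balances[account] = current + Decimal(amount)

    def balance(self, account):
        return self._balances.get(account, Decimal("0"))


class ReadOnlyCopy:
    """A redundant copy (e.g. a report) that is refreshed, never updated directly."""

    def __init__(self, source):
        self._source = source
        self._snapshot = {}

    def refresh(self, account):
        """Pull the current value from the system of record."""
        self._snapshot[account] = self._source.balance(account)

    def balance(self, account):
        return self._snapshot.get(account)


sor = SystemOfRecord()
sor.post("acct-1", "100.00")
sor.post("acct-1", "-25.50")
report = ReadOnlyCopy(sor)
report.refresh("acct-1")
print(report.balance("acct-1"))  # 74.50

sor.post("acct-1", "10")
print(report.balance("acct-1"))  # 74.50 (stale until refreshed)
report.refresh("acct-1")
print(report.balance("acct-1"))  # 84.50
```

Because `ReadOnlyCopy` has no method that changes a balance, a disagreement between a copy and the system of record can only mean the copy is stale, which is resolved by refreshing it.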

The system of record moves through the data architecture that has been described.

Fig. 8.4.2 depicts the movement of the system of record.

Fig. 8.4.2 The system of record.

Fig. 8.4.2 shows that when data are captured, especially in the online environment, that is where the system of record for the data first occurs. Location 1 shows that the system of record for current-value data is found in the online environment. Think of calling the bank and asking for your account balance as of right now: the bank looks in its online transaction processing environment to find your balance at that moment.

Then one day, you have an issue with a bank transaction that occurred 2 years ago. Your lawyer requires you to prove that you made a payment 2 years ago. You can’t go to the online transaction processing environment for that. Instead, you go to your record in the data warehouse. As data age, the system of record for older data moves to the data warehouse. That is location 2 in the diagram.

Time passes, and you get audited by the IRS. This time, you have to go back 10 years to prove what financial activity you had a decade ago. Now you go to the archival store in big data. That is location 3 in the diagram.

So, as time passes, the system of record for data changes in data architecture.
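The movement of the system of record can be expressed as a function of the age of the data. In this Python sketch, the `Location` enum, the `system_of_record_for` function, and especially the cutoffs (90 days online, roughly 5 years in the data warehouse) are hypothetical assumptions; the chapter gives no exact ages. The cutoffs were chosen only so that the chapter's examples land in the right place: a 2-year-old payment resolves to the data warehouse and a 10-year-old one to the big data archive.

```python
from datetime import date, timedelta
from enum import Enum


class Location(Enum):
    """Where the system of record lives, numbered as in Fig. 8.4.2."""
    ONLINE = 1            # current-value data in the online environment
    DATA_WAREHOUSE = 2    # older historical data
    BIG_DATA_ARCHIVE = 3  # deep history in the big data archival store


# Hypothetical cutoffs; the chapter does not specify exact ages.
ONLINE_WINDOW = timedelta(days=90)
WAREHOUSE_WINDOW = timedelta(days=5 * 365)


def system_of_record_for(transaction_date: date, today: date) -> Location:
    """Return the location holding the system of record for data of a given date."""
    age = today - transaction_date
    if age <= ONLINE_WINDOW:
        return Location.ONLINE
    if age <= WAREHOUSE_WINDOW:
        return Location.DATA_WAREHOUSE
    return Location.BIG_DATA_ARCHIVE


today = date(2024, 6, 1)
print(system_of_record_for(today, today))             # Location.ONLINE
print(system_of_record_for(date(2022, 6, 1), today))  # Location.DATA_WAREHOUSE
print(system_of_record_for(date(2014, 6, 1), today))  # Location.BIG_DATA_ARCHIVE
```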

Different Types of Questions

Another way to look at the data found in data architecture is in terms of the types of questions that are answered in its different parts. Fig. 8.4.3 shows where each type of question is answered.

Fig. 8.4.3 Answering different questions throughout the architecture.

Fig. 8.4.3 shows that location 1 answers detailed, up-to-the-second questions. This is where you ask for account balance information that is accurate up to the second. Location 2 indicates the data warehouse, where you look at the historical activity that has passed through your bank account.

Location 3 is the operational data store (ODS). In the ODS, you find integrated information that is accurate up to the second. Here you look across ALL your account information: your loans, your savings accounts, your checking account, your IRA, and so forth.

Location 4 holds the data marts. In the data marts, bank management combines your account information with that of thousands of other accounts and looks at it from the perspective of a department. One department looks at the data from an accounting perspective, another from a marketing perspective, and so forth.

Location 5 affords yet another perspective. Location 5 is where big data is found. It holds deep history and a variety of other data, and the kinds of analysis that can be done there are many and diverse.

Of course, the data and the types of analysis that can be done differ from industry to industry. A bank has been used here to make the example clear; other industries have other patterns of usage.
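For the bank example, the mapping from kinds of questions to locations in Fig. 8.4.3 can be written as an explicit lookup table. The `QuestionKind` categories, the `ROUTES` table, and the `route` function are hypothetical names for this sketch; the location numbers and stores follow the text above.

```python
from enum import Enum


class QuestionKind(Enum):
    """Hypothetical categories of questions, following the bank example."""
    CURRENT_DETAIL = "current detail"              # e.g. my balance right now
    HISTORICAL_ACTIVITY = "historical activity"    # e.g. a payment 2 years ago
    INTEGRATED_CURRENT = "integrated current view" # e.g. all my accounts at once
    DEPARTMENTAL = "departmental analysis"         # e.g. accounting or marketing
    DEEP_HISTORY_OR_OTHER = "deep history / other" # diverse, miscellaneous analysis


# Location numbers and stores as described for Fig. 8.4.3.
ROUTES = {
    QuestionKind.CURRENT_DETAIL: (1, "online transaction processing"),
    QuestionKind.HISTORICAL_ACTIVITY: (2, "data warehouse"),
    QuestionKind.INTEGRATED_CURRENT: (3, "ODS"),
    QuestionKind.DEPARTMENTAL: (4, "data marts"),
    QuestionKind.DEEP_HISTORY_OR_OTHER: (5, "big data"),
}


def route(kind: QuestionKind) -> str:
    """Name the location that answers a given kind of question."""
    location, store = ROUTES[kind]
    return f"location {location}: {store}"


print(route(QuestionKind.INTEGRATED_CURRENT))  # location 3: ODS
```

Keeping the routing in a single table makes it easy to adapt for another industry, where the kinds of questions differ.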

Different Communities

Different communities use the information found in data architecture. In general, the clerical community uses the information found in locations 1 and 2. Everyone uses the data found in location 3. The data warehouse serves as a crossroads for information throughout the organization. Different functional departments use the information found in location 4. And location 5 serves as an omnibus for the entire organization.
