Chapter 4.1

A Brief History of Big Data

Abstract

There are different definitions of big data. The definition used here is that big data encompasses a lot of data, is based on inexpensive storage, manages data by the “Roman census” method, and stores data in an unstructured format. There are two major types of big data—repetitive big data and nonrepetitive big data. Only a small fraction of repetitive big data has business value, whereas almost all of nonrepetitive big data has business value. In order to achieve business value, the context of data in big data must be determined. Contextualization of repetitive big data is easily achieved. But contextualization of nonrepetitive data is done by means of textual disambiguation.

Keywords

Big data; Roman census method; Unstructured data; Repetitive data; Nonrepetitive data; Contextualization; Textual disambiguation

There are many ways to describe history. When it comes to describing parts of the history of computer science, one way to describe it is in terms of technology. Another way to describe it is in terms of organizations.

The way that we will describe a brief history of big data is from a marketing standpoint.

An Analogy—Taking the High Ground

Using an analogy to describe the history of big data and how things came to be is useful. The analogy that will be used is the military tactic of taking the high ground.

Fig. 4.1.1 shows that military tacticians have long known that taking the high ground was important in any military conflict.

Fig. 4.1.1 The battlefield.

In Fig. 4.1.1, we see that an army has placed a cannon on top of a ridge, thus taking a position of command.

In many ways, the maneuvering of database technology has been the moral equivalent of taking the high ground. Whatever company has the DBMS that serves the largest amount of data is the company that enjoys a commanding advantage in the battlefield. In this case, the battlefield is the database marketplace, and the battle is over market share. How many customers have signed up for and are using the DBMS is the measurement of success in the battlefield.

There are other DBMSs that do not use the volume of data that can be managed as their distinguishing criterion. These DBMSs have their own battlefield and their own criteria of success. The battlefield for big data, however, is one whose hallmark is the management of the largest amount of data.

Taking the High Ground

The progression of events that has led up to big data is seen in Fig. 4.1.2.

Fig. 4.1.2 A brief marketing history of big data.

In the early dawn of the computer industry, there were many computer systems, many applications, and many operating systems. There were many vendors, and choosing technology was a risky and painful task. There were many problems with the early systems. One of the primary problems was that there was no standardization—no standardization of languages, no standardization of operating systems, and no standardization of applications. Because there was no standardization of anything, everything had to be made on a customized basis. Furthermore, all that custom code had to be maintained on a custom code basis.

In short, in the early days, there was chaos.

Standardization With the 360

Then, IBM introduced the 360 line of processors. The IBM 360 was the first broad-scale successful attempt at standardization. Code written for one IBM 360 processor could be moved to a larger processor in the 360 line with little or no alteration. Today, we take the interchangeability of software and systems for granted. But there once was a day when upgrading software and systems was a real headache.

Shortly after the IBM 360 was introduced, IBM introduced the Information Management System (IMS). IMS ran on the IBM 360 line of products. IMS was not the first DBMS. But IMS was the first DBMS that could run on standardized software. In addition, IMS was able to manage a large amount of data. (Note: large is an entirely relative term. The amount of data that IMS could process in its early years is minuscule compared to what can be processed today. But the volume of data that IMS could handle was significant for the day and age.)

IBM had recognized and had taken the high ground for large-scale, standardized database management with IMS. From a military standpoint, IBM enjoyed the high ground.

Online Transaction Processing

But in short order, it was discovered that IMS could be used for more than database management. When IMS was coupled with a data communications (DC) monitor, the combination could do what is termed online transaction processing.

Now, IBM and IMS were positioned to do something dramatic: engage in online transaction processing.

The dramatic thing about online transaction processing was that it allowed the computer to be ingrained very deeply into the fabric of the business. Prior to online transaction processing, the computer was able to enhance many business processes. But with the advent of online transaction processing, the computer could be woven into the day-to-day operations of the corporation. Never before had the computer been an essential ingredient in the running of the business. With online transaction processing, the computer took on a role of importance never before envisioned.

With online transaction processing, the organization was able to build reservation systems—airline, car rental, and other reservation systems. With online transaction processing, there appeared online bank teller systems and ATMs. In a word, online transaction processing systems enabled a business to do what had heretofore been impossible.

At this point, IBM had a firm grip on the high ground of corporate processing.

Enter Teradata and MPP Processing

Into the mix came a company called Teradata. Teradata featured a database technology called massively parallel processing (MPP). With MPP database technology, Teradata could process significantly more data than IBM. The architecture of MPP technology was such that IBM's IMS-based technology simply could not keep pace when it came to processing volumes of data. Suddenly, Teradata took the high ground.

But Teradata's entrance into the marketplace was not an immediate and resounding success. IBM had very good account control and was able to resist Teradata's intrusion for a long time. But Teradata persevered, and after much marketing, much sales effort, and much technology advancement, Teradata began to win over clients. Now, Teradata was beginning to capitalize on holding the high ground.

Then Came Hadoop and Big Data

Almost innocently into the fray came Hadoop technology. Hadoop was a response to the need to handle even more data than Teradata. In actuality, the limit to Teradata's management of data was an economic limitation more than a technological limitation. But Hadoop was addressing the problem of optimizing a database management system for the management of volumes of data, not for the ability to manage every field of data. The emphasis shifted from managing individual units of data within the environment to managing volumes of data.

Hadoop was the heart of big data. With Hadoop's technology, big data went from a dream to a reality.

Hadoop catered to just a few large-scale clients with specialized needs. Hadoop and its associated vendors were satisfied with being niche players in the marketplace, even though Hadoop had taken even higher ground than Teradata.

IBM and Hadoop

After Hadoop proved that it was a viable commodity, IBM recognized that by partnering with Hadoop, it could “piggyback” its way back to the higher ground. With the advent of big data, IBM has once again achieved the high ground of large-scale database management systems.

Holding the High Ground

The advantage of holding the high ground is of inestimable importance. So many opportunities fall open when the vendor has the high ground. The vendor is free to exploit hardware, software, consulting opportunities, and more.
