Chapter 1.4

Demographics of Corporate Data

Abstract

Corporate data include all of the data found in a corporation. The most basic division of corporate data is into structured data and unstructured data. As a rule, there are far more unstructured data than structured data. Unstructured data in turn have two basic divisions: repetitive data and nonrepetitive data. Big data is made up of unstructured data. Nonrepetitive big data has a fundamentally different form than repetitive big data. In fact, the differences between nonrepetitive big data and repetitive big data are so large that the boundary between them can be called the “great divide.” Large as the divide is, many professionals are not even aware that it exists. As a rule, nonrepetitive big data has MUCH greater business value than repetitive big data.

Keywords

Structured data; Unstructured data; Corporate data; Repetitive data; Nonrepetitive data; Business value; The great divide of data; Big data

It is one thing to understand that corporate data can be divided into different categories. It is another thing to understand those categories in depth.

Fig. 1.4.1 shows one way to divide corporate data.

Fig. 1.4.1 One way to look at corporate data.

Fig. 1.4.1 shows that all big data are unstructured and that big data can be divided into two major categories: repetitive unstructured data and nonrepetitive unstructured data. Although the diagram shows the major categorization of corporate data, it can be very misleading about proportions. Some corporations have a tremendous amount of repetitive unstructured data, and other corporations have no repetitive unstructured data at all.

A more realistic representation of the demographics of repetitive unstructured data is shown in Fig. 1.4.2.

Fig. 1.4.2 The spectrum of ratios of data types.

Fig. 1.4.2 shows that there is a wide spectrum of ratios of repetitive data to other types of data. From a demographic standpoint, some corporations have a preponderance of repetitive unstructured data, other corporations have no repetitive unstructured data whatsoever, and still others fall somewhere between the two extremes.
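To make the idea of a ratio concrete, the sketch below expresses it as the share of a corporation's total data volume that is repetitive unstructured data, which is one reasonable way to place a corporation on the spectrum. The `DataInventory` class, its field names, and the sample volumes are hypothetical and for illustration only; they are not measurements from any real corporation.

```python
from dataclasses import dataclass


@dataclass
class DataInventory:
    """Hypothetical data volumes for one corporation, in any consistent unit."""
    structured: float
    repetitive_unstructured: float
    nonrepetitive_unstructured: float

    def total(self) -> float:
        return self.structured + self.repetitive_unstructured + self.nonrepetitive_unstructured

    def repetitive_ratio(self) -> float:
        """Share of all corporate data that is repetitive unstructured data (0.0 to 1.0)."""
        total = self.total()
        if total == 0:
            return 0.0  # an empty inventory sits at the "no repetitive data" end
        return self.repetitive_unstructured / total


# Hypothetical inventories at opposite ends of the spectrum.
utility = DataInventory(structured=10, repetitive_unstructured=80, nonrepetitive_unstructured=10)
retailer = DataInventory(structured=5, repetitive_unstructured=0, nonrepetitive_unstructured=15)
print(utility.repetitive_ratio())   # 0.8
print(retailer.repetitive_ratio())  # 0.0
```

A share of the total is used rather than a raw repetitive-to-other quotient so that a corporation with no other data does not produce a division by zero and every corporation lands between 0 and 1.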

The type of business has a great deal to do with how much repetitive unstructured data a corporation has (or does not have). A typical scattering of these ratios by type of business is shown in Fig. 1.4.3.

Fig. 1.4.3 Different environments.

Fig. 1.4.3 shows that certain industries have a lot of repetitive unstructured data. Weather services, manufacturing, and public utilities are at the top of the list; their activities generate huge amounts of repetitive unstructured data. Small retailing organizations, on the other hand, may have no repetitive unstructured data at all.

There is, then, a spectrum of ratios of repetitive unstructured data to other types of data, depending on the business.

Another way to look at the same spectrum is by type of data, as seen in Fig. 1.4.4.

Fig. 1.4.4 Some other environments.

Fig. 1.4.4 shows that, for some corporations, repetitive unstructured data include large volumes of meteorological data, analog data, and click stream data.

While the demographics of repetitive unstructured data are one interesting way to view corporate data, there are other perspectives as well. One of them is business relevancy. Business relevancy refers to the usefulness of data in the decision-making process. Some corporate data are highly business-relevant, and other corporate data are not relevant to the corporation's decision-making at all.

How business relevancy relates to corporate data is seen in Fig. 1.4.5.

Fig. 1.4.5 Business value across the different types of data.

Fig. 1.4.5 shows that there are really three classes of business relevancy: business-relevant data, business-irrelevant data, and potentially business-relevant data.

Each of these categories deserves its own explanation.

The first type of data to consider is structured data, which are typically managed by a DBMS. Fig. 1.4.6 shows that all structured data are (at least potentially) business-relevant.

Fig. 1.4.6 Business relevant data.

Much structured data are available for online processing, and every element of data in the structured environment can be located and accessed for processing. For this reason, all structured data are categorized as business-relevant data.

Consider an example. A customer walks into a bank and asks to withdraw $500. The bank teller accesses the customer's account and sees that there is a sufficient balance. The teller then authorizes the withdrawal of $500. The data in the customer's account have been used and are certainly business-relevant.

Now, consider the data in the bank's structured database that are not being accessed by a bank teller. Are these data still business-relevant even though they are not being used? The answer is yes. Data remain business-relevant as long as they might be used.

That is why all structured data are considered to be business-relevant. Their actual usage has little to do with their business value; the data have business value and relevancy even when they are not being actively used.

Now, consider the business relevancy of repetitive unstructured data. Fig. 1.4.7 shows that only a tiny fraction of repetitive unstructured data are business-relevant. A larger percentage of repetitive unstructured data are potentially business-relevant. And a significant portion of repetitive unstructured data are not business-relevant.

Fig. 1.4.7 Business relevancy.

To understand the business relevancy of repetitive unstructured data, consider one of its many examples: log tapes. Nearly all the records on a log tape are meaningless to the business user. Only a few important records on the log tape may have direct business relevancy.

Or consider telephone call detail records. In a single day, millions of records are created. Suppose you are looking for phone calls relating to terrorism. Out of those millions of calls, only a handful will relate to terrorist activities.

The same phenomenon holds for click stream data, analog data, metering data, and so forth. There do exist, however, records that are not directly business-relevant but are potentially business-relevant. These records are not immediately useful to the business but could become useful under other circumstances.
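The three relevancy classes can be illustrated with the call detail record example. The sketch below sorts records into business-relevant, potentially business-relevant, and not business-relevant. The record fields, the watch list, and the "long international call" rule are all hypothetical criteria chosen for illustration; real criteria depend entirely on the business question being asked.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from enum import Enum


class Relevancy(Enum):
    RELEVANT = "business-relevant"
    POTENTIAL = "potentially business-relevant"
    IRRELEVANT = "not business-relevant"


@dataclass(frozen=True)
class CallDetailRecord:
    """Hypothetical, simplified call detail record."""
    caller: str
    callee: str
    duration_seconds: int
    international: bool


# Hypothetical watch list standing in for "the calls we are looking for".
WATCH_LIST = {"+1-555-0100"}


def classify(record: CallDetailRecord) -> Relevancy:
    # Directly relevant: the call involves a number the analysis is looking for.
    if record.caller in WATCH_LIST or record.callee in WATCH_LIST:
        return Relevancy.RELEVANT
    # Potentially relevant (hypothetical rule): not useful today, worth keeping for later.
    if record.international and record.duration_seconds > 600:
        return Relevancy.POTENTIAL
    # Everything else: the bulk of the repetitive records.
    return Relevancy.IRRELEVANT


def tally(records: Iterable[CallDetailRecord]) -> dict[Relevancy, int]:
    """Count how many records fall into each relevancy class."""
    counts = {level: 0 for level in Relevancy}
    for record in records:
        counts[classify(record)] += 1
    return counts
```

Run over a realistic day of records, a tally like this would be expected to show the pattern described above: a very large irrelevant count, a smaller potentially relevant count, and only a handful of relevant records.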

Now, let's consider the business relevancy of nonrepetitive unstructured data. Nonrepetitive unstructured data are made up of records such as e-mail, call center data, conversations, and insurance claims. Fig. 1.4.8 depicts the business relevancy of nonrepetitive unstructured data.

Fig. 1.4.8 Business relevancy.

In nonrepetitive unstructured data, there are data such as spam, blather, and stop words. These types of data are not business-relevant. But much of the data found in the nonrepetitive unstructured category are business-relevant (or are at least potentially business-relevant).
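As a rough illustration of separating the irrelevant parts of nonrepetitive text from the rest, the sketch below discards messages that look like spam and strips stop words from the remainder. The stop-word list and spam markers are tiny hypothetical examples; production systems use far larger lists and more robust spam detection, and this sketch does not attempt to recognize blather at all.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "on", "for", "please"}

# Hypothetical spam markers, used only for illustration.
SPAM_MARKERS = ("unsubscribe", "you have won")


def is_spam(message: str) -> bool:
    """Flag a message as spam if it contains any of the hypothetical markers."""
    lowered = message.lower()
    return any(marker in lowered for marker in SPAM_MARKERS)


def relevant_terms(message: str) -> list[str]:
    """Return the message's words with stop words removed, or [] if it looks like spam."""
    if is_spam(message):
        return []
    words = re.findall(r"[a-z0-9']+", message.lower())
    return [word for word in words if word not in STOP_WORDS]


print(relevant_terms("The claim for the water damage is in review"))
# ['claim', 'water', 'damage', 'review']
print(relevant_terms("You have WON a prize! Unsubscribe here"))
# []
```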

Now, let's stop and take a look at the demographics of business relevancy as they relate to unstructured data (big data). Fig. 1.4.9 shows where business relevancy lies.

Fig. 1.4.9 Business relevancy.

Fig. 1.4.9 shows that the vast majority of the business relevancy of big data lies in the realm of nonrepetitive unstructured data. There simply is relatively little business relevancy found in repetitive unstructured data.

This graphic perhaps explains why the early proponents of big data, who focused almost entirely on repetitive unstructured data, had such a difficult time establishing the business relevancy of big data.
