
Chapter 12
Business Value in the Data Ponds

At the end of the day, if the data lake and data ponds do not provide business value then they will not be supported by the organization for very long. Interestingly, the different data ponds do have potential for providing business value. But the value provided by each data pond and the way that business value is provided are very different.

Business Value in the Analog Data Pond

The analog data pond can provide business value in one of two ways. Either a handful of valuable records are found, or patterns are detected across a vista of many records.

Consider a company that manufactures airbags for cars. If an airbag malfunctions, there can be very serious consequences. Suppose an accident occurs in which an airbag does not go off. The accident investigator identifies the manufacturer of the airbag and then determines that the airbag was manufactured in March 1995 at the Phoenix, Arizona facility. The company now looks back into its analog data, finds all other airbags that were manufactured in March and April of 1995, and alerts the owners of the cars that have these airbags to have them checked, thus avoiding a potentially serious consequence. In this case, the analog data was examined to find a handful of records that had potentially very serious consequences.
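The recall lookup in this example amounts to a filter over manufacturing records. What follows is a minimal sketch in Python, assuming each analog record carries a serial number, a facility, and a manufacture date. The record layout, the field names, the `find_recall_candidates` helper, and the sample rows (including the Detroit facility) are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AirbagRecord:
    # Hypothetical record layout, for illustration only.
    serial: str
    facility: str
    manufactured: date


def find_recall_candidates(records, facility, start, end):
    """Return records built at `facility` with start <= manufactured <= end."""
    return [
        r for r in records
        if r.facility == facility and start <= r.manufactured <= end
    ]


# Made-up sample data; a real analog data pond would hold far more records.
records = [
    AirbagRecord("A-001", "Phoenix", date(1995, 3, 14)),
    AirbagRecord("A-002", "Phoenix", date(1995, 4, 2)),
    AirbagRecord("A-003", "Phoenix", date(1995, 6, 20)),
    AirbagRecord("A-004", "Detroit", date(1995, 3, 30)),
]

candidates = find_recall_candidates(
    records, "Phoenix", date(1995, 3, 1), date(1995, 4, 30)
)
print([r.serial for r in candidates])  # ['A-001', 'A-002']
```

Only two of the four sample records match, which mirrors the point of the example: many records are scanned to surface a few that matter.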

Another business value of analog data is the ability to look across large vistas of data in a hurry. Suppose management wishes to rethink the way an airbag is manufactured because a new technology triggers an airbag more efficiently and safely. The manufacturer looks at a massive amount of analog data to determine just how many airbags have the older firing mechanism. Fig 12.1 shows these two business values of the usage of analog data.

Fig 12.1 Benefiting from analog data

As another example of finding a few valuable records, consider telephone call detail records. Suppose the government is looking for telephone calls between terrorists. There may be millions and millions of telephone call detail records, but only a handful of those are from terrorists. There is no question of the value of being able to identify terrorists and thereby prevent acts of terrorism. In this case, many, many records are examined in the hope of finding just a handful of records.

Looking across vistas of data is a different matter altogether. Instead of looking for a few points out of many, the analyst is looking for patterns that are manifest across many, many records. As an example, the analyst may find that certain equipment starts to malfunction, or to function less accurately, towards the end of the month. Upon further investigation, it is found that maintenance on the equipment is done on the first of the month. By month’s end, the machinery needs to be recalibrated and cleaned. This important pattern is detected not just by scouring records, but by using their metaprocess information in conjunction with the records themselves. Fig 12.2 shows the types of business value that can be derived from the analog data pond.

Fig 12.2 Types of business value derived from the analog data pond
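The month-end equipment pattern described above can be illustrated with a small grouping calculation. The sketch below is in Python and assumes each reading has already been reduced to a day of the month and a measured error. Because maintenance in the example happens on the first of the month, the day of the month stands in for time since maintenance. The function name, the three-way split of the month, and the sample readings are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_error_by_period(readings):
    """Average the error of (day_of_month, error) readings per third of the month."""
    buckets = defaultdict(list)
    for day, error in readings:
        if day <= 10:
            period = "early"
        elif day <= 20:
            period = "mid"
        else:
            period = "late"
        buckets[period].append(error)
    return {period: mean(errors) for period, errors in buckets.items()}


# Made-up readings: (day of month, error in arbitrary integer units).
readings = [(2, 1), (5, 1), (12, 2), (18, 2), (25, 4), (29, 6)]

summary = mean_error_by_period(readings)
print(summary)
```

No single reading reveals the drift. It shows up only when many readings are grouped and compared, which is what distinguishes pattern analysis from a search for a few records.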

Business Value in the Application Data Pond

Finding business value in the application data pond is a different proposition. Typical examples are locating a particular receipt or determining the average cost of shipments for 1999.

Suppose the organization is going through an audit and is looking for documentation from a previous year. The document is needed to substantiate an expense item for the auditor. The operational systems only go back three years, but the audit covers a period five years ago. The organization looks to its application data pond to find the receipt. In this case, many documents are searched in the hope of finding just one. In another circumstance, management thinks that shipment costs are rising too quickly. In order to get a historical perspective on costs, management goes back to 1999 to calculate shipment costs. They find those shipment costs in the application data pond. In order to determine annual shipment costs, a calculation must be done using many, many documents. Fig 12.3 shows the type of business value that can be derived from the application data pond.

Fig 12.3 Types of business value derived from the application data pond
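The 1999 shipment cost calculation mentioned above is an aggregation over many documents rather than a search for one. The following Python sketch assumes the shipment records have already been extracted into simple dictionaries with a year and a cost. The field names, the `shipment_costs` helper, and the amounts are hypothetical.

```python
from decimal import Decimal

# Made-up shipment records; field names are assumptions for illustration.
shipments = [
    {"year": 1999, "cost": Decimal("120.50")},
    {"year": 1999, "cost": Decimal("79.50")},
    {"year": 2000, "cost": Decimal("200.00")},
]


def shipment_costs(shipments, year):
    """Return (total, average) shipment cost for `year`; average is None if no data."""
    costs = [s["cost"] for s in shipments if s["year"] == year]
    total = sum(costs, Decimal("0"))
    average = total / len(costs) if costs else None
    return total, average


total_1999, average_1999 = shipment_costs(shipments, 1999)
print(total_1999, average_1999)
```

`Decimal` is used instead of floating point so that currency amounts add up exactly.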

Business Value in the Textual Data Pond

Yet a third type of business value can be derived from the textual data pond. Suppose that a price has been agreed upon for an order. However, the only documentation is in writing – in a paper letter. The organization searches the entire textual data pond in order to find one document.

Yet another kind of business value that can be derived from the textual data pond is determining customer sentiment. Customer sentiment is expressed in many ways – through tweets, through emails, through other forms of narration.

The organization reads and stores these documents in its textual data pond, then passes them through textual disambiguation to create a database that can be analyzed, making it easy to determine customer sentiment.

Customer sentiment is gauged by looking at many documents, reading and disambiguating the contents of the documents, and placing the results in a database, where analysis can be performed. Knowing customer sentiment is an extremely valuable thing for the business. Fig 12.4 depicts the business value that can be derived from the textual data pond.

Fig 12.4 Types of business value derived from the textual data pond

Percent of Records That Have Business Value

Another interesting way of looking at business value provided by the different data ponds is through the percentage of records that have business value.

In some data ponds, a very high percentage of records have business value. In others, the percentage is very low. Consider telephone calls.

In the US, millions of telephone calls are made each day. If a person were looking for telephone calls made by terrorists, it is safe to say that only a handful of calls are relevant. In fact, on any given day there may be no telephone calls made by terrorists. The percentage of terrorist telephone calls relative to the total number of calls each day is very low, perhaps as low as 0.0000001%. The same very low percentage of records holding business value holds true for log tapes, clickstream records, and many other kinds of data.
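To put the quoted percentage in perspective, the arithmetic can be worked through directly. The Python sketch below uses an assumed ratio of one valuable record in one billion. The text gives no actual call counts, and this ratio is chosen only because it reproduces the quoted figure. The `percent_with_value` helper is hypothetical.

```python
from fractions import Fraction


def percent_with_value(valuable, total):
    """Percentage of records that carry business value, as an exact fraction."""
    return Fraction(valuable, total) * 100


# Assumed ratio: 1 valuable record among 1,000,000,000 records.
pct = percent_with_value(1, 1_000_000_000)
print(float(pct))  # 1e-07, i.e. 0.0000001 percent
```

In other words, a percentage this low corresponds to roughly one record of interest per billion records examined.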

Now consider other types of data, like textual data. Textual data is gathered from places like call center conversations, customer feedback, and so forth. Each phone call represents a customer’s concerns or message. The content of each phone call has real business value.

For most textual data, 100% of the data has business value. Admittedly, some phone conversations have more value than others. But every telephone conversation has some business value.

There is, then, a stark difference among the data ponds in the percentage of records that have business value. Fig 12.5 depicts these value differences.

Fig 12.5 Understanding the value of data in the various types of data ponds

In Summary

There are two types of business value found in the data ponds. One lies in a very small number of high-value records, which are often buried in vast vistas of low-value data. The other lies in patterns that emerge only across many records.

In general, repetitive data has a very low percentage of records that contain business value, whereas non-repetitive data has a very high percentage of records that contain business value.
