Chapter 10.3

Analytics From Nonrepetitive Data

Abstract

Nonrepetitive analytics begins with the contextualization of the nonrepetitive data. Unlike repetitive data, the context of nonrepetitive data is difficult to determine. The context of nonrepetitive big data is determined by textual disambiguation. In textual disambiguation, there are algorithms that relate to stop word resolution, stemming, homographic resolution, in-line contextualization, taxonomy/ontology resolution, custom variable resolution, acronym resolution, and so forth. Nonrepetitive analytics is very relevant to business value. Some typical forms of nonrepetitive analytics include the analysis of medical records, warranty analysis, insurance claim analysis, and call center analysis.

Keywords

Nonrepetitive data; Textual disambiguation; Stemming; Stop word processing; Homographic resolution; Taxonomic resolution; Custom variable resolution; Acronym resolution; Inline contextualization

There is a wealth of information hidden in nonrepetitive data that is unable to be analyzed by traditional means. Only after the nonrepetitive data have been unlocked by textual disambiguation can analysis be done.

There are many examples of rich environments where there is a wealth of information in nonrepetitive data, such as the following:

  • E-mail
  • Call center
  • Corporate contracts
  • Warranty claims
  • Insurance claims
  • Medical records

But talking about the value of analysis of nonrepetitive data and actually showing the value are two different things. The world is not convinced until it sees concrete examples.

Call Center Information

Most corporations have call centers. A call center is a corporate function where the corporation staffs phone operators to have conversations with customers. Through the call center, the consumer has a voice at the corporation with whom a conversation can be held. In many ways, the call center becomes the consumer's most direct interface to the corporation.

The conversations that occur in the call center are many and diverse:

  • Some people want to complain.
  • Some people want to buy something.
  • Some people want product information.
  • Some people just want to talk.

There is, then, a wealth of information in the conversations that corporations have with their customer or prospect base.

So, what does management of the corporation know about what takes place in their call center? The answer is that management knows very little about what transpires in the call center. At best, management knows how many calls occur daily and how long those calls are. But other than that, management knows very little about what is being discussed in their call center.

And why does management know so little about what takes place in the call center? The answer is that management needs to look at the conversations, and conversations are nonrepetitive data. And—prior to textual disambiguation—the computer could not handle nonrepetitive data for the purposes of analytic processing.

However, with textual disambiguation, organizations can now start to understand the content of what is being discussed in call center conversations.

Fig. 10.3.1 shows the first step in doing analytics against telephone conversations.

Fig. 10.3.1
Fig. 10.3.1 Converting conversations into electronic text.

The first step in analyzing conversations is to capture the conversations. Recording conversations is an easy thing to do: you just get a recorder and record (and make sure you are not breaking any law in doing so!).

After the conversation is recorded, the next step is to use voice recognition technology to convert the conversation to electronic form. Voice transcription technology is not perfect. There are accents that need to be accounted for, slurred speech, people who talk very softly, and angry people. Even in the best of circumstances, voice-to-text transcription is an inexact science.

But if enough people speak clearly enough for their words to be understood, then voice transcription works adequately.

Once the voice recordings have been recorded and transcribed, a wealth of information opens up to the analyst.

Fig. 10.3.2 depicts the world that has opened up.

Fig. 10.3.2
Fig. 10.3.2 Wealth of information in electronic text.

The first step in unlocking the information found in the call center conversations is mapping the transcriptions. Mapping is the process of defining to textual disambiguation how to interpret the conversations. Typical mapping activities include the following:

  • Editing of stop words
  • Identification of homographs
  • Identification of taxonomies
  • Acronym resolution

While mapping must be done, it is a one-time activity: the mapping created on day 1 can be used from then on. The analyst only has to do mapping once.
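A rough sketch of what such a mapping might look like in practice (the structures, words, and categories here are invented for illustration; real textual ETL products have their own formats):

```python
# Illustrative sketch of a mapping definition for textual disambiguation.
# All names, words, and categories are hypothetical, not a vendor format.

# Stop words to be edited out of the transcriptions
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "is", "i", "my"}

# Acronyms and their resolutions
ACRONYMS = {"po": "purchase order", "eta": "estimated time of arrival"}

# Homographs: words whose meaning depends on who is speaking
HOMOGRAPHS = {("agent", "charge"): "billing charge",
              ("customer", "charge"): "credit card charge"}

# A simple taxonomy classifying words into business categories
TAXONOMY = {"refund": "billing", "invoice": "billing",
            "broken": "complaint", "defective": "complaint"}


def apply_mapping(speaker, words):
    """Drop stop words, resolve acronyms and homographs, and attach
    a taxonomy class to each surviving word."""
    out = []
    for w in words:
        w = w.lower()
        if w in STOP_WORDS:
            continue
        w = ACRONYMS.get(w, w)                  # acronym resolution
        w = HOMOGRAPHS.get((speaker, w), w)     # homographic resolution
        out.append((w, TAXONOMY.get(w, "uncategorized")))
    return out
```

For example, `apply_mapping("customer", ["The", "invoice", "is", "broken"])` edits out the stop words and classifies the remaining words as billing and complaint terms.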

Fig. 10.3.3 shows that mapping is done from the transcriptions.

Fig. 10.3.3
Fig. 10.3.3 Before text can be processed it must be mapped.

Once mapping is done, textual disambiguation is ready to process the transcriptions. The input to textual disambiguation is the raw text, the mapping, and taxonomies. The output from textual disambiguation is an analytic database. The analytic database is in the form of any standard database that is used for analytic processing. By the time the analyst gets his/her hands on the database, it appears to be just like any other database the analyst has ever processed. The only difference is that the source of data for this database is nonrepetitive text.
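To make the flow concrete, here is a minimal sketch of that step in Python. The function name, sample stop words, and taxonomy are illustrative assumptions, not the actual product logic: raw text goes in, flat rows for an analytic database come out.

```python
# Minimal sketch of textual disambiguation: raw nonrepetitive text plus
# a mapping and taxonomy in, analytic-database rows out.

STOP_WORDS = {"the", "is", "a", "my", "and"}
TAXONOMY = {"refund": "billing issue", "warranty": "product issue",
            "rude": "service issue"}


def textual_etl(doc_id, raw_text):
    """Turn one nonrepetitive document into flat database rows of the
    form (doc_id, word_offset, word, context)."""
    rows = []
    for offset, token in enumerate(raw_text.lower().split()):
        word = token.strip(".,!?")
        if not word or word in STOP_WORDS:
            continue
        context = TAXONOMY.get(word, "uncategorized")
        rows.append((doc_id, offset, word, context))
    return rows
```

Running `textual_etl("call-001", "The warranty is void and my refund is late.")` yields rows such as `("call-001", 1, "warranty", "product issue")` — data any analyst can process like an ordinary table.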

Fig. 10.3.4 shows the processing that occurs inside textual disambiguation.

Fig. 10.3.4
Fig. 10.3.4 Transforming text into a database.

The output of textual disambiguation is a standard database, often thought of as being in the form of relational data. In many ways, the database that has been produced has text that has been “normalized.” There are business relationships that are buried in the database. These business relationships are a result of the mapping and the text that has been interpreted by the mapping.
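As an illustration of what such "normalized" text looks like, the output can be pictured as a plain relational table queried with ordinary SQL. The schema and sample rows below are invented:

```python
import sqlite3

# Sketch: the output of textual ETL pictured as a standard relational
# table. Table and column names are illustrative, not a product schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE text_fact (
                    doc_id TEXT, word_offset INTEGER,
                    word TEXT, context TEXT)""")

rows = [("call-001", 10, "refund", "billing issue"),
        ("call-001", 25, "warranty", "product issue"),
        ("call-002", 5, "refund", "billing issue")]
conn.executemany("INSERT INTO text_fact VALUES (?, ?, ?, ?)", rows)

# Business relationships surface through ordinary SQL:
# how many calls mention each context?
for context, n in conn.execute(
        """SELECT context, COUNT(DISTINCT doc_id) AS calls
           FROM text_fact GROUP BY context ORDER BY calls DESC"""):
    print(context, n)
```

The point is that once the text is in this form, nothing about the table betrays its nonrepetitive origin; any SQL-capable tool can work with it.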

Fig. 10.3.5 shows the database that has been produced.

Fig. 10.3.5
Fig. 10.3.5 Text has been transformed into a standard database.

After the database has been created by textual disambiguation, the next step is the selection of an analytic tool (or tools). Depending on the analysis to be done, it may be necessary to choose more than one analytic tool for analysis.

The analytic tool that is chosen only has to be able to process relational data. That is the only requirement for the analytic tool.

Fig. 10.3.6 shows that an analytic tool needs to be selected.

Fig. 10.3.6
Fig. 10.3.6 An analytical tool needs to be selected.

After the analytic tool has been selected, analysis can commence. The analyst takes the data from the database that was derived from the transcriptions and performs the analysis.

(NOTE: the following analysis was done by Chris Cox, of Boulder Insights, Boulder, Colorado, in Tableau.)

Each analytic tool has its favored method of presenting data. In this case, Tableau was used, and a dashboard was created.

Fig. 10.3.7 shows a dashboard created for analyzing the call center information.

Fig. 10.3.7
Fig. 10.3.7 A dashboard showing what is going on in the call center.

The dashboard reflects the content of the activity that has transpired within the call center. With the dashboard, the analyst can see the following:

  • When activities were processed
  • What kind of activities were processed
  • The actual content of calls
  • The demographics of what was discussed
  • And so forth

The dashboard presents a wealth of information in an organized, graphical form. At a glance, management can see what is transpiring in the call center.

As an example of the information contained in the dashboard, consider Fig. 10.3.8.

Fig. 10.3.8
Fig. 10.3.8 Ranking the calls by call type.

In Fig. 10.3.8, the diagram is a synopsis of the types of calls that have passed through the call center. Each call is categorized by the major purpose of the call. The calls are then ranked by how many of each type occurred during the reporting period. Even if there were no other information on the dashboard, this ranking would be extremely useful by itself.
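The ranking itself is a simple frequency count over the categorized calls. A sketch with invented call types:

```python
from collections import Counter

# Sketch of ranking calls by call type. The call categories below are
# invented sample data, not figures from the dashboard.
calls = ["complaint", "purchase", "complaint", "product info",
         "complaint", "purchase", "chat"]

# Count each category and rank from most to least frequent
ranking = Counter(calls).most_common()
for call_type, count in ranking:
    print(f"{call_type}: {count}")
```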

Another type of information found on the dashboard is information about what time of day the calls came in. Fig. 10.3.9 shows this information.

Fig. 10.3.9
Fig. 10.3.9 Call center activity on an hour by hour basis.

Not only is the hour of day identified, but the calls are also classified by type. It is worth noting that with the dashboard approach, drill-down processing is possible. For each hour and each category of call, the analyst can drill down to investigate more thoroughly each class of call that came in during that hour.
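The hour-by-hour view with drill-down by call type amounts to a two-level grouping. A sketch using invented call records:

```python
from collections import defaultdict

# Sketch of hour-by-hour call center activity with drill-down by call
# type. The (hour, type) records below are invented sample data.
calls = [(9, "complaint"), (9, "purchase"), (10, "complaint"),
         (10, "complaint"), (14, "product info")]

# Group twice: first by hour, then by call type within the hour
by_hour = defaultdict(lambda: defaultdict(int))
for hour, call_type in calls:
    by_hour[hour][call_type] += 1

# Top level: total volume per hour; drill-down: types within an hour
for hour in sorted(by_hour):
    total = sum(by_hour[hour].values())
    print(f"{hour:02d}:00  {total} calls  {dict(by_hour[hour])}")
```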

A related type of information that is available is the type of phone call by day of the week. This type of information is seen in Fig. 10.3.10.

Fig. 10.3.10
Fig. 10.3.10 Call center activity on a day by day basis.

And yet another type of information available on the dashboard is information about the day of the month when calls occurred. Fig. 10.3.11 shows a “heat map” depicting the pattern of calls throughout the month.

Fig. 10.3.11
Fig. 10.3.11 Call center activity on a monthly basis.
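The data behind such a heat map is just a count of calls per day of the month, bucketed into intensity levels. A sketch with invented data:

```python
from collections import Counter

# Sketch of heat-map data for a day-of-the-month view. The call days
# below are invented sample data.
call_days = [1, 1, 1, 2, 15, 15, 28, 28, 28, 28]
per_day = Counter(call_days)


def intensity(count, scale=" .:#"):
    """Map a call count to a crude heat-map intensity character:
    blank for none, '#' for the busiest days."""
    return scale[min(count, len(scale) - 1)]


# One intensity cell per day of the month
heat = {day: intensity(per_day.get(day, 0)) for day in range(1, 32)}
```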

But perhaps the most useful information on the dashboard is that shown in Fig. 10.3.12. In Fig. 10.3.12, the actual subjects discussed during call center activity are shown in the form of a histogram. The most discussed subject has the largest black box; the next most discussed subject has the next largest box.

Fig. 10.3.12
Fig. 10.3.12 A histogram showing the subjects discussed during call center processing.

By looking at the histogram, management has a very good idea of what subjects are on the minds of its customer base.

Looking at the dashboard tells management at a glance what it needs to know about what is going on in the call center.

As impressive as the dashboard is, the dashboard would not be possible without the data being placed in a standard database.

There is a progression of processing and data that makes possible the creation of the dashboard. That progression looks like the following:

  • Nonrepetitive data → mapping → textual ETL → standard database → analytic tool → dashboard
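The progression above can be sketched as a pipeline of functions, where every function is a stand-in stub for a real component (transcription engine, textual ETL, analytic database, analytic tool):

```python
# Sketch of the progression from nonrepetitive data to dashboard.
# Every function here is an illustrative stub, not a real component.

def transcribe(recording):
    # Stand-in for voice-to-text transcription of a recorded call
    return "my warranty claim was denied"

def apply_mapping(text):
    # Stand-in for mapping + textual ETL: edit stop words, emit rows
    words = [w for w in text.split() if w not in {"my", "was"}]
    return [("call-001", i, w) for i, w in enumerate(words)]

def load_database(rows):
    # Stand-in for loading a standard analytic database
    return {"text_fact": rows}

def analyze(db):
    # Stand-in for the analytic tool feeding the dashboard
    return {"total_words": len(db["text_fact"])}

dashboard = analyze(load_database(apply_mapping(transcribe("call.wav"))))
```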

Medical Records

Call center records are important and are at the center of business value. But call center records are hardly the only form of valuable nonrepetitive records. Another form of valuable nonrepetitive data is medical records. Medical records are usually written as a patient goes through a procedure or some other event of medical care. The records—once written—are valuable to many people and organizations: to the physician, to the patient, to the hospital or provider, to research organizations, and more.

The challenge with medical records is that they contain narrative information. Narrative information is necessary and useful to the physician, but it is not useful to the computer. In order to be used in analytic processing, the narrative information must be put into a standard database format.

This is a classical case of nonrepetitive data being placed in the form of a database. What is needed is textual ETL.

In order to see how textual ETL is used, consider a medical record. (NOTE: the medical record being shown is a real record. However, it is from a country other than the United States and is not subject to the regulations of HIPAA.)

When looking at medical records, the records start to take on a recognizable pattern. The first part of the medical record is the identification part. In this part of the record, one or more identifying criteria are found (Fig. 10.3.13).

Fig. 10.3.13
Fig. 10.3.13 A medical record.

In the second part of the medical record, there is narrative information. In the narrative section, a doctor or nurse has written down a characterization of a medical event—a diagnosis, a procedure, an observation, and so forth.

In the third section of the medical record are lab results that are relevant to the reason why the patient is in medical care.
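Because the three parts tend to appear in a recognizable order, a simple sectioning pass can separate them before textual ETL goes to work. A sketch using an invented record and invented section markers:

```python
# Sketch of splitting a medical record into its three recognizable
# parts. The record text and section markers are invented examples.
record = """PATIENT ID: 12345  DOB: 01-02-1960
NARRATIVE: Patient presents with chest pain. EKG ordered.
LAB RESULTS: Troponin 0.02 ng/mL. Glucose 95 mg/dL."""


def split_record(text):
    """Return the identification, narrative, and lab sections."""
    sections = {}
    current = "identification"
    for line in text.splitlines():
        if line.startswith("NARRATIVE:"):
            current = "narrative"
            line = line[len("NARRATIVE:"):].strip()
        elif line.startswith("LAB RESULTS:"):
            current = "lab"
            line = line[len("LAB RESULTS:"):].strip()
        sections.setdefault(current, []).append(line)
    return {k: " ".join(v) for k, v in sections.items()}
```

Each section can then be handled on its own terms: the identification part yields keys, the narrative part goes through textual disambiguation, and the lab part yields measured values.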

Fig. 10.3.14 shows a typical medical record.

Fig. 10.3.14
Fig. 10.3.14 Narrative in the medical record.

In a medical record, there is a narrative every time a medical event occurs. Fig. 10.3.15 shows that there is more than one narrative section relating to a patient's visit in the hospital.

Fig. 10.3.15
Fig. 10.3.15 Each episode of care has its own narrative.

The techniques used in processing the medical record include all the ways that textual ETL can process text.

Fig. 10.3.16 shows some of the ways that medical records are processed.

Fig. 10.3.16
Fig. 10.3.16 Different words are treated differently by textual ETL.

The result of textual ETL processing the medical record is a normalized database.

Fig. 10.3.17 shows the normalized textual-based database that has resulted from textual ETL processing a medical record.

Fig. 10.3.17
Fig. 10.3.17 A word and its context.

Once the text has been placed in a standard relational database, it is useful for analytic processing. Now, millions of medical records can be analyzed.
