Chapter 14
Analytical and Integration Tools

A variety of tools support the data lake/data pond environment, and each provides a different function that the environment needs. Some of the most prominent tools are described here.

Visualization

Visualization is the technology that takes data (usually in a relational format), organizes it, and displays it graphically. When the details in a database are turned into a visualization, the organization can immediately see patterns and trends that would not otherwise be obvious. Visualization is especially useful to non-technical management.

In many cases, management cannot grasp what the data is saying until it is visualized.

Visualization technology can present data in a variety of forms, including Pareto charts, pie charts, and scatter charts.

To be effective, the data going into a visualization must first be organized into a database format; most visualization technology requires that the data it operates on be stored in a relational database. Fig 14.1 shows some visualizations.

Fig 14.1 Visualizing data that is sourced from a relational database
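To make the point about relational input concrete, the following Python sketch prepares the data behind a simple Pareto chart. It assumes a hypothetical in-memory SQLite table named complaints with one row per customer complaint; the aggregation is done in SQL, and the chart is rendered as text bars rather than by a real visualization tool. It is an illustration only, not how any particular visualization product works.

```python
import sqlite3


def pareto_rows(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Return (category, count, cumulative percent) rows, largest category first."""
    # The aggregation happens in SQL: the chart is built from relational data.
    rows = conn.execute(
        "SELECT category, COUNT(*) AS n FROM complaints "
        "GROUP BY category ORDER BY n DESC, category"
    ).fetchall()
    total = sum(n for _, n in rows)
    result, running = [], 0
    for category, n in rows:
        running += n
        result.append((category, n, 100.0 * running / total))
    return result


def render(rows: list[tuple[str, int, float]]) -> None:
    """Print one text bar per category, followed by its cumulative percentage."""
    for category, n, cumulative in rows:
        print(f"{category:<12}{'#' * n:<8}{cumulative:6.1f}%")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE complaints (id INTEGER PRIMARY KEY, category TEXT)")
    # Hypothetical sample data: 10 complaints across 4 categories.
    sample = ["late"] * 5 + ["damaged"] * 3 + ["billing", "wrong item"]
    conn.executemany("INSERT INTO complaints (category) VALUES (?)", [(c,) for c in sample])
    render(pareto_rows(conn))
```

With the sample data, the bars come out in the order late, damaged, billing, wrong item, with cumulative percentages of 50, 80, 90, and 100; that descending order plus the running total is what makes the chart a Pareto chart.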

Search and Qualify

Another useful technology is search and qualify technology. Some search technology is quite simple, whereas some is very sophisticated. Search and qualify technology can perform sophisticated searches against data that is less than optimally organized, such as textual data.

One sophisticated form of search technology is machine learning and concept search. With this technology, textual documents are read and qualified, and the qualification is done in a highly sophisticated manner: a term is judged not only by whether it appears, but by the words that appear around it.

Suppose that a company has an account code named “rawhide.” Search and qualify technology makes the term rawhide stand out because, wherever it is mentioned, none of the terms normally associated with leather appear near it. There is no mention of saddles, ropes, Mexican riatas, or any of the other terms you might expect to find alongside real rawhide. Instead, in this company's documents rawhide is a term that means something unique. Fig 14.2 shows search and qualify technology.

Fig 14.2 Searching and qualifying technology
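The rawhide example can be sketched as a simple context check. The Python below is a minimal, hypothetical illustration: it looks at the words within a fixed window around each occurrence of a term and counts whether any word from an expected-context vocabulary appears nearby. The vocabulary LEATHER_CONTEXT, the function names, and the five-word window are assumptions made for the example; this is a plain co-occurrence heuristic, not the machine learning and concept search technology described above.

```python
import re

# Hypothetical vocabulary of words that normally accompany "rawhide" in its leather sense.
LEATHER_CONTEXT = {"saddle", "saddles", "rope", "ropes", "riata", "riatas", "leather", "hide"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def qualify(term: str, documents: list[str], context: set[str], window: int = 5) -> dict[str, int]:
    """Count occurrences of `term` whose nearby words do or do not match the expected context."""
    counts = {"expected_context": 0, "unexpected_context": 0}
    for doc in documents:
        words = tokenize(doc)
        for i, word in enumerate(words):
            if word != term:
                continue
            # Up to `window` words on each side of this occurrence.
            neighbors = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            key = "expected_context" if context & set(neighbors) else "unexpected_context"
            counts[key] += 1
    return counts


def stands_out(counts: dict[str, int]) -> bool:
    """A term 'stands out' when it occurs but never near its usual companion words."""
    return counts["unexpected_context"] > 0 and counts["expected_context"] == 0


if __name__ == "__main__":
    # Hypothetical accounting notes that use rawhide as an account code.
    notes = [
        "Post the invoice to rawhide before month end.",
        "Rawhide balance was reconciled by finance.",
    ]
    result = qualify("rawhide", notes, LEATHER_CONTEXT)
    print(result, stands_out(result))
```

Against the two sample notes, no leather vocabulary appears near either occurrence, so stands_out reports True; a sentence such as "The saddle was stitched with rawhide and rope." would instead be counted as expected context.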

Textual Disambiguation

One of the most useful technologies in the textual data pond is textual disambiguation. Textual disambiguation reads raw textual narrative and converts it into a standard database format. In addition, it identifies the context of the text and writes that context along with the text. Textual disambiguation is complex technology because it deals with language, and language is inherently complicated. For organizations doing serious textual analysis, textual disambiguation is an absolute necessity. Fig 14.3 shows the role of textual disambiguation.

Fig 14.3 Applying textual disambiguation
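A greatly simplified sketch of the idea follows: raw narrative goes in, and rows in a standard database format, each carrying its context, come out. The TAXONOMY dictionary, the text_context table, and the function names are hypothetical, and a simple word-to-category lookup stands in for the far more complex language processing that real textual disambiguation performs.

```python
import re
import sqlite3

# Hypothetical taxonomy: maps a word to the context (category) it belongs to.
TAXONOMY = {
    "ford": "car manufacturer",
    "honda": "car manufacturer",
    "brakes": "car part",
    "transmission": "car part",
}


def disambiguate(doc_id: int, text: str) -> list[tuple[int, int, str, str]]:
    """Turn raw narrative into (doc_id, position, word, context) rows."""
    rows = []
    for position, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        context = TAXONOMY.get(word)
        if context is not None:  # keep only words whose context is known
            rows.append((doc_id, position, word, context))
    return rows


def load(conn: sqlite3.Connection, rows: list[tuple[int, int, str, str]]) -> None:
    """Write the rows, with their context, into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS text_context "
        "(doc_id INTEGER, position INTEGER, word TEXT, context TEXT)"
    )
    conn.executemany("INSERT INTO text_context VALUES (?, ?, ?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, disambiguate(1, "My Ford had trouble with the brakes."))
    load(conn, disambiguate(2, "The Honda transmission failed twice."))
    # Once the text is in database form, ordinary SQL can analyze it.
    for context, n in conn.execute("SELECT context, COUNT(*) FROM text_context GROUP BY context"):
        print(context, n)
```

Once the narrative is in rows like these, it can be queried with the same SQL and visualization tools used for structured data.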

Statistical Analysis

Statistical analysis is another useful technology. It reads large volumes of data and performs sophisticated statistical analysis on them.

Statistical analysis entails not only calculating analytical figures but also displaying those figures graphically in a meaningful way. Fig 14.4 depicts statistical analysis.

Fig 14.4 Applying statistical analysis
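As a small illustration of the two halves of statistical analysis, calculation and display, the Python sketch below computes summary figures with the standard statistics module and prints a text histogram. The function names, the bin width, and the sample order amounts are hypothetical; a real statistical package offers far more, including proper graphics.

```python
import statistics
from collections import Counter


def summarize(values: list[float]) -> dict[str, float]:
    """Calculate basic summary figures for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation; needs at least 2 values
    }


def histogram(values: list[float], bin_width: int) -> dict[int, int]:
    """Bucket values into bins of `bin_width`; return {bin_start: count} sorted by bin."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def show(hist: dict[int, int]) -> None:
    """Display the histogram as rows of asterisks."""
    for start, n in hist.items():
        print(f"{start:>6} | {'*' * n}")


if __name__ == "__main__":
    # Hypothetical order amounts.
    orders = [12, 15, 22, 25, 27, 31, 48]
    print(summarize(orders))
    show(histogram(orders, 10))
```

For the sample orders the histogram bins are 10, 20, 30, and 40, holding 2, 3, 1, and 1 orders respectively.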

Classical ETL Processing

Classical ETL (extract, transform, load) is useful for reading application data and integrating it through the transformation process. Classical ETL processing reads application-based data and turns it into integrated corporate data. Fig 14.5 shows classical ETL technology.

Fig 14.5 Understanding ETL technology
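The integration that classical ETL performs can be sketched in a few lines of Python. The two application record layouts, their field names, and their encodings (gender as m/f in one application and 1/0 in the other, different date formats, balances in cents versus dollars) are hypothetical. The point is that each application's conventions are transformed into a single corporate convention before loading; a Python list stands in for the corporate table.

```python
from datetime import datetime

# Extract: hypothetical records from two applications with different conventions.
APP_A = [{"cust_id": "C017", "gender": "m", "opened": "2023-04-01", "balance_cents": 125000}]
APP_B = [{"customer": 903, "sex": 1, "open_date": "04/15/2023", "balance": 980.50}]


def transform_a(rec: dict) -> dict:
    """Map application A's conventions onto the corporate convention."""
    return {
        "customer_key": f"A:{rec['cust_id']}",
        "gender": {"m": "M", "f": "F"}[rec["gender"]],  # unknown codes raise KeyError
        "opened": datetime.strptime(rec["opened"], "%Y-%m-%d").date(),
        "balance": rec["balance_cents"] / 100,  # cents -> dollars
    }


def transform_b(rec: dict) -> dict:
    """Map application B's conventions onto the same corporate convention."""
    return {
        "customer_key": f"B:{rec['customer']}",
        "gender": {1: "M", 0: "F"}[rec["sex"]],  # unknown codes raise KeyError
        "opened": datetime.strptime(rec["open_date"], "%m/%d/%Y").date(),
        "balance": float(rec["balance"]),
    }


def run_etl() -> list[dict]:
    """Extract, transform, and load both applications into one integrated list."""
    corporate: list[dict] = []  # Load target: stands in for a corporate table.
    corporate.extend(transform_a(r) for r in APP_A)
    corporate.extend(transform_b(r) for r in APP_B)
    return corporate


if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

Unknown codes raise a KeyError instead of being loaded silently, which is one reasonable design choice for integration logic; a production ETL job would more likely route such records to an error file for review.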

In Summary

Several technologies help build and support the data lake/data pond environment. Among them are:

  • Visualization
  • Search and qualify
  • Textual disambiguation
  • Statistical analysis
  • Classical ETL