A variety of tools support the data lake/data pond environment, and each provides a different function that the environment needs. Some of the most prominent tools are described here.
Visualization
Visualization is the technology that takes data (usually in a relational format), organizes it, and displays it. By turning details in a database into a visualization, the organization can immediately see patterns and trends that would not otherwise be obvious. Visualization is especially useful to non-technical management.
In many cases, management cannot understand what is being said unless the data is visualized.
Visualization technology can present data in a variety of forms, including Pareto charts, pie charts, and scatter charts.
To be effective, the data going into a visualization must first be organized. Most visualization technology requires that the data it operates on be stored in a relational database format. Fig 14.1 shows some visualizations.
Fig 14.1 Visualizing data that is sourced from a relational database
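As a rough illustration of the path from relational data to a Pareto chart, the sketch below stores a few rows in an in-memory SQLite database, aggregates them with SQL, and renders a text-only chart. The `complaints` table, its categories, and the function names are assumptions made for this example; a real visualization tool would draw proper graphics.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory relational table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE complaints (id INTEGER PRIMARY KEY, category TEXT)")
    sample = ["billing"] * 5 + ["delivery"] * 3 + ["quality"] * 2
    conn.executemany(
        "INSERT INTO complaints (category) VALUES (?)", [(c,) for c in sample]
    )
    return conn


def pareto_rows(conn):
    """Return (category, count, cumulative_percent), largest category first."""
    rows = conn.execute(
        "SELECT category, COUNT(*) AS n FROM complaints "
        "GROUP BY category ORDER BY n DESC, category"
    ).fetchall()
    total = sum(n for _, n in rows)
    running = 0
    result = []
    for category, n in rows:
        running += n
        result.append((category, n, round(100.0 * running / total, 1)))
    return result


def render(rows):
    """Draw one bar per category plus the cumulative percentage."""
    return "\n".join(
        f"{category:<10} {'#' * n:<6} {cum:5.1f}%" for category, n, cum in rows
    )


if __name__ == "__main__":
    print(render(pareto_rows(build_sample_db())))
```

The aggregation is done in SQL because the data is already relational; only the final drawing step is specific to the chart type.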
Search and Qualify
Another useful technology is search and qualify technology. Some search technologies are quite simple, whereas others are very sophisticated. Search and qualify technology can perform sophisticated searches against data that is less than optimally organized, such as textual data.
One of the more sophisticated forms of search technology is machine learning and concept search. With this technology, textual documents can be read and qualified, and the qualification is done in an extremely sophisticated manner.
Suppose that a company had an account code named “rawhide.” Search and qualify technology makes the term rawhide stand out because, wherever it is mentioned, the terms normally associated with leather never appear near it. There is no mention of saddles, or ropes, or Mexican riatas, or any of the other terms you might expect to be associated with real rawhide. Instead, rawhide is a term that means something unique. Fig 14.2 shows search and qualify technology.
Fig 14.2 Searching and qualifying technology
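The rawhide example can be approximated with a simple co-occurrence check. The sketch below is not machine learning or concept search; it is a deliberately small stand-in that looks at the words near a target term and reports whether any expected companion terms appear. The term list, the window size, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical list of terms one would expect near the ordinary meaning of "rawhide".
LEATHER_TERMS = {"saddle", "saddles", "rope", "ropes", "riata", "riatas", "leather"}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def neighbors(text, target, window=5):
    """Collect the words within `window` positions of each mention of `target`."""
    words = tokens(text)
    found = set()
    for i, word in enumerate(words):
        if word == target:
            found.update(words[max(0, i - window):i])
            found.update(words[i + 1:i + 1 + window])
    return found


def qualify(documents, target, expected_terms, window=5):
    """Count mentions of `target` and how many have the expected context nearby."""
    mentions = 0
    with_expected = 0
    for doc in documents:
        if target in tokens(doc):
            mentions += 1
            if neighbors(doc, target, window) & expected_terms:
                with_expected += 1
    return {
        "mentions": mentions,
        "with_expected_context": with_expected,
        # The term is flagged when it is used but never in its ordinary context.
        "unusual": mentions > 0 and with_expected == 0,
    }


if __name__ == "__main__":
    memos = [
        "Charge the invoice to rawhide before month end.",
        "Rawhide budget exceeded in the third quarter.",
    ]
    print(qualify(memos, "rawhide", LEATHER_TERMS))
```

A term flagged as unusual this way is only a candidate for review; deciding what it actually means still requires a person or a more capable tool.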
Textual Disambiguation
One of the most useful technologies in the textual data pond is textual disambiguation. In textual disambiguation, raw textual narration is read and converted to a standard database format. In addition, the context of the text is identified and written along with the text. Textual disambiguation is a complex technology because it deals with language, and language is inherently complicated. For organizations doing serious textual analysis, textual disambiguation is an absolute necessity. Fig 14.3 shows the role of textual disambiguation.
Fig 14.3 Applying textual disambiguation
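To make the idea of "text plus context in a database format" concrete, here is a heavily simplified sketch. It assumes a tiny hand-written taxonomy that maps words to a context, and it writes each recognized word, its position, and its context to a relational table. The taxonomy, the `text_facts` table, and the function names are hypothetical; real textual disambiguation handles far more of the complexity of language than a word lookup can.

```python
import re
import sqlite3

# Hypothetical taxonomy: word -> context. A real one would be far larger.
TAXONOMY = {
    "aspirin": "medication",
    "ibuprofen": "medication",
    "headache": "symptom",
    "fever": "symptom",
}


def disambiguate(doc_id, text):
    """Return (doc_id, character position, word, context) for recognized words."""
    rows = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        context = TAXONOMY.get(word)
        if context is not None:
            rows.append((doc_id, match.start(), word, context))
    return rows


def load(conn, rows):
    """Write the rows to a relational table so ordinary SQL tools can read them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS text_facts "
        "(doc_id TEXT, char_pos INTEGER, word TEXT, context TEXT)"
    )
    conn.executemany("INSERT INTO text_facts VALUES (?, ?, ?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, disambiguate("note-1", "Patient reports headache; took aspirin."))
    for row in conn.execute("SELECT * FROM text_facts ORDER BY char_pos"):
        print(row)
```

Once the text is in this form, it can be queried and visualized like any other relational data.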
Statistical Analysis
Statistical analysis technology is useful for reading masses of data and performing sophisticated statistical calculations on that data.
Statistical analysis entails not only the calculation of analytical numbers but also the graphical display of those numbers in a meaningful manner. Fig 14.4 depicts statistical analysis.
Fig 14.4 Applying statistical analysis
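The two halves of statistical analysis, calculation and display, can be sketched with Python's standard `statistics` module and a text histogram. The sample values, the bin width, and the function names are assumptions for illustration; a dedicated statistical package would offer far more.

```python
import statistics
from collections import Counter


def summarize(values):
    """Calculate a few basic analytical numbers for a list of measurements."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "pstdev": statistics.pstdev(values),  # population standard deviation
    }


def histogram(values, bin_width):
    """Group values into bins of `bin_width`, keyed by each bin's lower bound."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def render(hist):
    """Display the histogram as rows of asterisks, one row per bin."""
    return "\n".join(f"{start:>4} | {'*' * n}" for start, n in hist.items())


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(summarize(data))
    print(render(histogram(data, 2)))
```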
Classical ETL Processing
Classical ETL (extract, transform, load) is useful for reading and integrating application data, and therefore for carrying out the transformation process. Classical ETL processing reads application-based data and turns it into integrated corporate data. Fig 14.5 shows classical ETL technology.
Fig 14.5 Understanding ETL technology
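The step from application data to integrated corporate data can be sketched as follows. The example assumes two hypothetical applications, `billing` and `crm`, that encode gender and dates differently; the transformation rewrites both into one corporate convention. All system names, field names, and encodings here are invented for illustration.

```python
from datetime import datetime

# Hypothetical per-application encodings that must be reconciled.
GENDER_MAPS = {
    "billing": {"M": "male", "F": "female"},
    "crm": {"1": "male", "0": "female"},
}
DATE_FORMATS = {"billing": "%m/%d/%Y", "crm": "%Y-%m-%d"}


def transform(source, record):
    """Convert one application record into the common corporate layout."""
    signup = datetime.strptime(record["signup"], DATE_FORMATS[source])
    return {
        "customer_id": str(record["id"]).strip(),
        "gender": GENDER_MAPS[source][str(record["gender"])],
        "signup_date": signup.date().isoformat(),
        "source_system": source,  # keep lineage back to the application
    }


def run_etl(sources):
    """Extract records from every source, transform them, and collect the result."""
    integrated = []
    for source, records in sources.items():
        for record in records:
            integrated.append(transform(source, record))
    return integrated


if __name__ == "__main__":
    sample = {
        "billing": [{"id": 17, "gender": "F", "signup": "03/09/2015"}],
        "crm": [{"id": "A-4 ", "gender": 1, "signup": "2016-11-02"}],
    }
    for row in run_etl(sample):
        print(row)
```

In this sketch the load step is just a Python list; in practice the integrated rows would be written to a database or to the data pond itself.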
In Summary
There are several technologies which are helpful for building and supporting the data lake/data pond environment. Some of these technologies are: