Chapter 4. Pentaho Business Analytics Tools

This chapter gives a quick overview of the business analytics life cycle. We will look at tools such as Pentaho Action Sequence and Pentaho Report Designer, as well as the Community Dashboard Editor (CDE) and Community Dashboard Framework (CDF) plugins and their configuration, and get some hands-on experience with them.

The business analytics life cycle

Performing analytics on Big Data can involve many steps, but they generally fall into three stages, as depicted in the following diagram:

The business analytics life cycle

The following list gives a brief description of the three stages depicted in the preceding diagram:

  • Data Preparation: This stage covers everything from data creation (ETL) to bringing the data onto a common platform. Here you check the quality of the data, cleanse and condition it, and remove unwanted noise. The structure of the data dictates which tools and analytic techniques can be used. For example, textual data may call for sentiment analysis, while structured financial data may be better served by regression on the R analytics platform. Other techniques and processing models used at this stage include MapReduce, natural language processing (NLP), clustering (such as k-means), and graph theory (social network analysis).
  • Data Visualization: This stage follows data preparation. Micro-level analytics takes place here, and the resulting data is fed to a reporting engine that supports various visualization plugins. Visualization is a rapidly expanding discipline that not only supports Big Data but can also enable enterprises to collaborate more effectively, analyze real-time and historical data for faster trading, develop new models and theories, consolidate IT infrastructure, or demonstrate past, current, and future datacenter performance. The benefit is easiest to see when you look at a neatly composed dashboard prepared by a business analyst team.
  • Data Discovery: In this final stage, data miners, statisticians, and data scientists work with the enriched data and use visual analysis to drill into it for greater insight. Visualization techniques such as geo mapping, heat grids, and scatter/bubble charts help to find patterns and anomalies. Predictive analysis based on the Predictive Model Markup Language (PMML) comes in handy here. Using standard analysis and reporting, data scientists and analysts can uncover meaningful patterns and correlations that would otherwise stay hidden. Advanced analytics such as time series forecasting helps plan for future outcomes based on a better understanding of prior business performance.
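To make the Data Preparation stage a little more concrete, the following is a minimal Python sketch of the "cleanse, then analyze" idea, using only the standard library. It is not Pentaho code: the function names cleanse and fit_line, the sample rows, and the outlier threshold are all hypothetical, chosen purely for illustration. In a real project this work would more likely be done in a data integration tool or in R, as mentioned above.

```python
from statistics import mean, median


def cleanse(rows, max_deviations=3.0):
    """Drop incomplete rows, then drop rows whose y value is an outlier.

    Outliers are judged by the median absolute deviation (MAD), an
    assumption made for this sketch; other rules are equally valid.
    """
    complete = [(float(x), float(y)) for x, y in rows
                if x is not None and y is not None]
    if not complete:
        return []
    ys = [y for _, y in complete]
    mid = median(ys)
    mad = median([abs(y - mid) for y in ys])
    if mad == 0:
        # No spread to measure against, so keep every complete row.
        return complete
    return [(x, y) for x, y in complete
            if abs(y - mid) <= max_deviations * mad]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical sample: y = 2x + 1, with one missing value and one noisy spike.
rows = [(1, 3.0), (2, 5.0), (3, None), (4, 9.0), (5, 500.0), (6, 13.0)]
clean = cleanse(rows)
slope, intercept = fit_line(clean)
print(f"kept {len(clean)} of {len(rows)} rows; "
      f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Without the cleansing step, the single noisy value of 500 would dominate the fit, which is why data quality is checked before any analytic technique is applied.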

Pentaho gives you a complete end-to-end solution to execute your analytic plan. It helps you model the data using its rich visual development environment (a data integration platform with drag-and-drop support). It is easy enough that BI experts and traditional IT developers can offer Big Data to their organization almost effortlessly. It runs natively across Hadoop clusters, leveraging their distributed data storage and processing capabilities for scalability.

Pentaho also analyzes data across multiple dimensions and sources. Its rich visualization and data exploration capabilities give business users insight into their data and help them identify patterns and trends.

Tip

About 2.5 quintillion bytes of data is created every day and the count is doubling every year. Yet only 0.5 percent of that data is being analyzed!
