Pentaho, headquartered in Orlando, has a team of BI veterans with an excellent track record. In fact, Pentaho was the first commercial open source BI platform, and it became popular quickly because of its seamless integration with a wide range of third-party software. It can comfortably talk to data sources such as MongoDB, OLAP tools such as Palo, and Big Data frameworks such as Hadoop and Hive.
The Pentaho brand has been built up over the last nine years to unify and manage a suite of open source projects that provide alternatives to proprietary BI software. To name a few, these projects include Kettle, Mondrian, Weka, and JFreeReport. This unification helped grow Pentaho's community and gave the projects a centralized home. Pentaho claims that its community is somewhere between 8,000 and 10,000 members strong, which helps it sustain a business built on technical support, management services, and product enhancements for its growing list of enterprise BI users. In fact, this is how Pentaho generates most of the revenue for its growth.
For research and innovation, Pentaho has its own think tank, Pentaho Labs, whose mission is to drive breakthroughs in Big Data technologies in areas such as predictive and real-time analytics.
The core of the business intelligence domain has always been the underlying data. In fact, the first attempt to quantify the growth rate of data volume came about 70 years ago, with the term "information explosion", first used in 1941 according to the Oxford English Dictionary. By 2010, this industrial revolution of data had gained full momentum, fueled by social media sites, and scientists and computer engineers coined a new term for the phenomenon: "Big Data". Big Data is a collection of data sets so large and complex that they become difficult to process with conventional database management tools. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. As of 2012, the limits on the size of data sets that were feasible to process in a reasonable amount of time were on the order of exabytes (one exabyte is a billion gigabytes).
Data sets grow in size partly because they are increasingly gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies, digital cameras, software logs, microphones, RFID readers, and so on, in addition to scientific research data such as microarray analyses. One EMC-sponsored IDC study projected that the amount of data generated annually would grow nearly 45-fold by 2020!
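A 45-fold increase is easier to grasp as a compound annual rate. As a rough arithmetic sketch (the 2009 baseline year is an assumption made here for illustration, not a figure from the study), the implied yearly growth factor can be computed as:

```python
# Rough sketch: what compound annual growth rate (CAGR) does a
# 45-fold increase over 11 years (assumed 2009-2020) imply?

def implied_cagr(total_growth: float, years: int) -> float:
    """Return the annual growth rate implied by `total_growth`-fold
    growth spread evenly over `years` years."""
    return total_growth ** (1 / years) - 1

rate = implied_cagr(45, 11)
print(f"Implied annual growth: {rate:.1%}")  # roughly 41% per year
```

In other words, a 45-fold increase over eleven years corresponds to data volumes growing by roughly 41 percent every single year.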
Hadoop was born out of this pressing need for software that could store such huge and varied data. To analyze that data, the industry needed an easily manageable, commercially viable solution that integrates with this Big Data software, and Pentaho has come up with a suite of software that addresses the challenges posed by Big Data.
Pentaho is a trailblazer in business intelligence and analytics, offering a full suite of capabilities for ETL (Extract, Transform, and Load) processes, data discovery, predictive analytics, and powerful visualization. It has the flexibility to be deployed on premises, in the cloud, or embedded in custom applications.
Pentaho is a provider of a Big Data analytics solution that spans data integration, interactive data visualization, and predictive analytics. As depicted in the following diagram, this platform contains multiple components, which are divided into three layers: data, server, and presentation:
Let us take a detailed look at each of the components in the previous diagram.
This is one of Pentaho's biggest advantages: it integrates with multiple data sources seamlessly. In fact, Pentaho Data Integration 4.4 Community Edition (referred to as CE hereafter) supports 44 open source and proprietary databases, flat files, spreadsheets, and more third-party software out of the box. Pentaho introduced the Adaptive Big Data Layer as part of the Pentaho Data Integration engine to keep pace with the evolution of Big Data stores. This layer accelerates access and integration to the latest versions and capabilities of those stores. It natively supports third-party Hadoop distributions from MapR, Cloudera, and Hortonworks, as well as popular NoSQL databases such as Cassandra and MongoDB. These Big Data initiatives bring greater adaptability, abstraction from change, and increased competitive advantage to companies facing the never-ceasing evolution of the Big Data ecosystem. Pentaho also supports analytic databases such as Greenplum and Vertica.
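The value of an adaptive layer is that callers code against one stable interface while the stores behind it change. As a conceptual sketch only (every class and function name below is a hypothetical illustration, not part of Pentaho's actual API), the idea looks like this:

```python
# Conceptual sketch of an "adaptive layer": one stable interface in
# front of interchangeable data-store backends. All names here are
# invented for illustration; they are not Pentaho APIs.
from typing import Callable, Dict, List

class AdaptiveDataLayer:
    """Registry that routes reads to whichever backend is configured,
    so callers never depend on a specific store's client library."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], List[dict]]] = {}

    def register(self, name: str, reader: Callable[[str], List[dict]]) -> None:
        # Plugging in a new or upgraded store is just a new registration.
        self._backends[name] = reader

    def read(self, backend: str, query: str) -> List[dict]:
        if backend not in self._backends:
            raise KeyError(f"No backend registered for '{backend}'")
        return self._backends[backend](query)

# Two stand-in backends; real ones would wrap MongoDB, Hive, and so on.
layer = AdaptiveDataLayer()
layer.register("mongodb", lambda q: [{"source": "mongodb", "query": q}])
layer.register("hive", lambda q: [{"source": "hive", "query": q}])

rows = layer.read("hive", "SELECT * FROM sales")
print(rows[0]["source"])  # hive
```

When a new Hadoop distribution or NoSQL store appears, only a new backend registration is needed; existing transformations keep calling the same interface, which is the "abstraction from change" described above.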
The Pentaho Administration Console (PAC) server in CE, or the Pentaho Enterprise Console (PEC) server in EE (Enterprise Edition), is a web interface used to create, view, schedule, and apply permissions to reports and dashboards. It also provides an easy way to manage security, scheduling, and configuration for the Business Application Server and Data Integration Server, along with repository management. The server applications are as follows:
The Thin Client Tools all run inside the Pentaho User Console (PUC) in a web browser (such as Internet Explorer, Chrome, or Firefox). Let's have a look at each of the tools:
Let's take a quick look at each of these tools:
.prpt: Pentaho Reporting uses this template format to process reports. The file is in ZIP format, with XML resources that define the report design.
.xaction: These files drive action sequences and other forms of automation in the platform. Action sequences define a lightweight, result-oriented business flow within the Pentaho BA Server.