Data without analysis has no value, and analysis is impossible without data. The two terms, data and analytics, therefore go together and form the basis of big data and Business Intelligence (BI) analytics. Several factors drive BI, and a few of these are:
Speed: Every organization wants to eliminate delays in getting processed information such as reports and dashboards in order to make early and quick business decisions.
Intelligence data: This means the use of large amounts of data and information for predictive and proactive analysis. This data can be of different data types and can come from multiple sources.
Effectiveness: This helps manage costs by increasing productivity for a business.
Big data is a solution to many BI needs. Hadoop has become synonymous with big data; it is an open source software framework for the distributed storage and processing of large amounts of data across large clusters of computers. Several big data capabilities bring Hadoop closer to BI, such as the following:
High retention of data
Additional data sources, structured or unstructured
Resilience to failure, that is, great fault tolerance
Reduction of the data transfer between data sources
So, in this chapter we will cover the following topics:
An overview of big data technologies
Hadoop and Splunk architecture
Connecting Hadoop and Splunk to MicroStrategy
Hadoop and MicroStrategy
First off, let's look at Hadoop.
Hadoop architecture
The Hadoop framework consists of two main layers:
Hadoop Distributed File System (HDFS)
Execution engine (MapReduce)
HDFS is a distributed file system that allows storage of a large volume of data across all the machines in a Hadoop cluster.
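To make the idea of spreading one file across many machines concrete, here is a minimal Python sketch. It is a toy model, not how HDFS is implemented: the 128 MB block size and replication factor of 3 are commonly cited defaults and are assumptions here, the node names are hypothetical, and the round-robin placement is a simplification of the real placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size, not taken from this chapter
REPLICATION = 3      # assumed number of copies kept of each block


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to nodes.

    Returns a dict mapping block index to the list of nodes holding a copy.
    Placement is simple round-robin, which is only an illustration.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    block_count = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(block_count):
        # Consecutive nodes (wrapping around) hold the copies of this block.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4"]  # hypothetical machines
    for block, holders in place_blocks(300, cluster).items():
        print(block, holders)
```

Because every block lives on several machines, losing a single node in this model does not lose any block, which is the intuition behind the fault tolerance mentioned earlier.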
MapReduce is a programming model that is used to process the large volume of data that is stored in HDFS. It divides a large task into smaller tasks that run in parallel and then combines their outputs into a single result.
The following diagram shows the high-level design of the Hadoop architecture:
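The divide-and-combine idea can be sketched in a few lines of plain Python. This is only an in-memory illustration of the programming model, using the classic word count example; it does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs analysis", "analysis needs data"]
    print(word_count(sample))
```

In a real cluster, each call to the map function could run on a different machine close to the data it reads, and the grouping step moves intermediate pairs between machines; here everything runs in one process.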
MicroStrategy Analytics Platform over Hadoop
The following diagram shows how MicroStrategy and Hadoop are tied together:
Hadoop and MicroStrategy use cases
MicroStrategy is an analytics tool that uses data from Hadoop to perform analysis. Hadoop makes possible several use cases that are difficult to implement using a data warehouse.
Sample use cases are as follows:
Analysis of social media posts, pictures, videos, or information for customer retention, marketing, and so on
Analysis of sensor data for pricing auto insurance or health insurance
Analysis of web applications or mobile data logs from digital marketing for new product design and customer service
Genomic and DNA sequence analysis based on data from multiple sequencing technologies
Traffic analytics, that is, predictive congestion analysis and alternate route detection based on road segment geolocation data
Weather analytics, which could even be used for pricing catastrophe insurance
Example of log file analysis in a Hadoop system
Here:
Log files capture network or server operational data. In our example, web server logs are being collected and sent to HDFS.
Flume streams logs into Hadoop.
HDFS, as discussed previously, is a storage file system.
Pig is a platform that parses these log files into a structured format using various user-defined functions.
Hive defines schemas for this structured data; the schema definitions are stored in the Hive metastore.
MicroStrategy is a visualization tool that provides connectivity to the Hive server.
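The parsing step in the middle of this pipeline, turning raw log lines into structured records, can be illustrated with a short Python sketch. In the pipeline described above this work is done by Pig, so the code below is only a stand-in for the idea. It assumes the logs use the Apache Common Log Format, and the sample line and function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: Apache Common Log Format, for example
# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def status_counts(lines):
    """Count how often each HTTP status code appears in the parsed lines."""
    records = (parse_line(line) for line in lines)
    return Counter(r["status"] for r in records if r is not None)


if __name__ == "__main__":
    sample = '10.0.0.7 - - [10/Oct/2016:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
    print(parse_line(sample))
```

Once each line is reduced to named fields such as host, path, and status, a schema can be declared over those fields, which is the role Hive plays in the example.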
Configuring Hortonworks and MicroStrategy
Before we start the installation and integration of Hadoop with MicroStrategy, we need to set up a Hadoop environment. For this chapter we will be using our existing MicroStrategy environment and a virtual machine with a Hortonworks Hadoop distribution.
Steps:
Download a VirtualBox or VMware virtual machine from either of these links:
Open the .ovf file with VMware by accepting the default settings and clicking Import:
This will build a Hortonworks VMware virtual machine.
Before starting the Hortonworks appliance, change the network card settings by adding a new network card, as shown in the following screenshot:
Now we need to add this virtual machine to the same network as our MicroStrategy machine. To do so, follow these steps:
Click Edit | Virtual Network Editor and select the network card that was added in the previous step.
Click DHCP Settings on the following screen:
Enter the starting and ending network address based on the MicroStrategy machine's network address and click OK.
Start the machine; after some time, you will be presented with a URL to access the web interface.
In our case, the URL is: http://192.168.81.130:8080
Enter the username and password on this screen. The default username and password are maria_dev/maria_dev.
Upon entering the username and password, the sample screen looks like the following screenshot:
Note
To validate that the IP address has been changed successfully, open the terminal window (on Linux, press Alt + F5) and type the ifconfig command. The output should include an IP address within the range that we set previously in the DHCP settings.
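The same check can be done programmatically once you know the address reported by ifconfig. The following Python sketch uses the standard ipaddress module; the virtual machine address is the one from the URL above, while the DHCP start and end addresses are hypothetical values that you should replace with the ones you entered.

```python
import ipaddress


def in_dhcp_range(address, start, end):
    """Return True if address lies between start and end, inclusive."""
    ip = ipaddress.ip_address(address)
    return ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end)


if __name__ == "__main__":
    vm_address = "192.168.81.130"   # address shown in the URL above
    dhcp_start = "192.168.81.128"   # hypothetical DHCP starting address
    dhcp_end = "192.168.81.254"     # hypothetical DHCP ending address
    print(in_dhcp_range(vm_address, dhcp_start, dhcp_end))
```

If the function returns False, the virtual machine most likely picked up its address from a different network card than the one configured in the Virtual Network Editor.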