Design your first Hadoop dashboard

Download any sample big data files, or extract logs from your systems using a tool such as Flume. For this book, we will download the dataset from the following URL:

http://www.seanlahman.com/?s=lahman591-csv.zip

Extract the ZIP file.

Upload the data file to HDFS by following these steps:

  1. Navigate to the HDFS files directory from the Hortonworks web interface.

  2. Navigate to /user/maria_dev and click on the Upload button.
  3. Click on the Browse button, navigate to the location where we extracted the downloaded ZIP file, and select the Batting.csv file.
  4. Now, open a Hive view by clicking on the Hive View button.
  5. In this view, create a table to hold the data by executing the following command:
          create table intermediate_batting (col_value STRING);
    
    
  6. Upon execution of the query, we can view the intermediate_batting table under default databases.
  7. Execute the following command to load the batting.csv data file into the intermediate_batting table:
          load data inpath '/user/maria_dev/Batting.csv' overwrite into table intermediate_batting;
    
    

  8. Create a table called batting using the following command:
    create table batting (player_id STRING, year INT, runs INT);
    

Extract data from the intermediate_batting table into the batting table using the following command:

insert overwrite table batting
SELECT
    regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) player_id,
    regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) year,
    regexp_extract(col_value, '^(?:([^,]*),?){9}', 1) runs
from intermediate_batting;
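
The pattern used in the query can be hard to read at first. The following is a minimal Python sketch, not part of the Hive workflow, showing how a pattern of the form `^(?:([^,]*),?){n}` ends up capturing the n-th comma-separated field: the group is repeated n times, and the capture holds the value from the last repetition. The sample row and the `nth_field` helper are hypothetical and only mirror the column positions the query assumes (player ID first, year second, runs ninth).

```python
import re

# Hypothetical sample row; the values are made up and only follow the
# column order assumed by the query (1 = player ID, 2 = year, 9 = runs).
sample_row = "playerA01,2004,1,SFN,NL,11,11,7,3,0"


def nth_field(row: str, n: int) -> str:
    """Return the n-th comma-separated field (1-based).

    Uses the same pattern shape as the Hive query. The capturing group
    is repeated n times and keeps the text matched by the last repetition.
    """
    pattern = r"^(?:([^,]*),?){%d}" % n
    match = re.match(pattern, row)
    return match.group(1) if match else ""


player_id = nth_field(sample_row, 1)
year = nth_field(sample_row, 2)
runs = nth_field(sample_row, 9)
print(player_id, year, runs)
```

The extracted values are still strings at this point; in the Hive statement, they are converted to the column types of the batting table when the rows are inserted.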

Now that we have the table in Hadoop, we can start creating a MicroStrategy report based on it:

  1. Select the table from the list of available tables.
  2. Double-click the selected table and click Finish (this step also lets users prepare their data).

This gives you two data access options, as follows:

  • Connect Live allows users to select data directly from the data source.
  • Import as an in-memory dataset allows users to access data based on the stored results.

Select Connect Live and create a dashboard based on the data imported.

Data wrangling

With MicroStrategy 10, users can prepare data themselves. In the previous section, when we created a dashboard using data from Hadoop, we were presented with a data preparation step, also called data wrangling, which allows business users to explore the data and improve its quality before it is imported into MicroStrategy. Examples of data preparation include:

  • Removing white spaces
  • Concatenating columns
  • Deleting cells with null values
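
MicroStrategy performs these operations through its user interface. As a rough illustration of what each one does to the data, here is a minimal plain-Python sketch; it does not use any MicroStrategy API. The rows and column names are hypothetical, and "deleting cells with null values" is interpreted here as dropping any row that contains a null, which is an assumption.

```python
# Hypothetical rows; column names and values are illustrative only.
rows = [
    {"first_name": "  Ann ", "last_name": "Lee", "runs": 12},
    {"first_name": "Bob", "last_name": " Ray  ", "runs": None},
]

# 1. Remove leading and trailing white space from text cells.
trimmed = [
    {key: value.strip() if isinstance(value, str) else value
     for key, value in row.items()}
    for row in rows
]

# 2. Concatenate two columns into a new column.
with_full_name = [
    {**row, "full_name": row["first_name"] + " " + row["last_name"]}
    for row in trimmed
]

# 3. Drop rows that contain a null (None) value in any column.
cleaned = [
    row for row in with_full_name
    if all(value is not None for value in row.values())
]
print(cleaned)
```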

The following screenshot presents data wrangling:

[Screenshot: Data wrangling]

So, whatever source users import data from, they can still prepare it without ETL or data modeling.

For example, suppose the data loaded from a source stores coordinates in a single column, but we want two separate columns for them. We can split the column using data wrangling.

The following screenshot shows data loaded from source:

[Screenshot: Data wrangling]

Use the data wrangling functionality to prepare the data for reporting:

[Screenshot: Data wrangling]

Output columns will be displayed as follows:

[Screenshot: Data wrangling]
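
The coordinate example can be sketched outside MicroStrategy as well. The following plain-Python snippet shows the transformation itself, splitting one text column into two numeric columns. The input format ("latitude,longitude" in one string), the column names, and the `split_coordinates` helper are assumptions made for illustration; the actual source data may be laid out differently.

```python
# Hypothetical input: one text column holding "latitude,longitude".
rows = [
    {"city": "A", "coordinates": "40.7128, -74.0060"},
    {"city": "B", "coordinates": "51.5074,-0.1278"},
]


def split_coordinates(row: dict) -> dict:
    """Replace the single 'coordinates' column with two numeric columns."""
    lat_text, lon_text = row["coordinates"].split(",", 1)
    # Copy every other column unchanged.
    result = {key: value for key, value in row.items() if key != "coordinates"}
    result["latitude"] = float(lat_text.strip())
    result["longitude"] = float(lon_text.strip())
    return result


split_rows = [split_coordinates(row) for row in rows]
print(split_rows)
```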
