List of Figures

Chapter 1. Beyond reporting: business analytics

Figure 1.1. Mondrian is the analytics engine for the business application.

Figure 1.2. Orders by city

Figure 1.3. Orders by customer

Figure 1.4. Orders by city for USA

Figure 1.5. Drag-and-drop analysis

Figure 1.6. State-level orders

Figure 1.7. Customer-level orders

Figure 1.8. Filtered data

Figure 1.9. Execution of an analytics query

Figure 1.10. Increased performance with Mondrian

Chapter 2. Mondrian: a first look

Figure 2.1. Mondrian running in Pentaho

Figure 2.2. Pentaho login page

Figure 2.3. Pentaho User Console (PUC)

Figure 2.4. A Pentaho report: Product Sales Report

Figure 2.5. CDF: Product Sales by Month

Figure 2.6. Saiku: Product Sales by Year

Figure 2.7. Interactive analysis with Saiku

Figure 2.8. Save a report

Figure 2.9. Results showing comparison to same quarter a year ago

Figure 2.10. Sales schema

Figure 2.11. Getting data to the analyst

Figure 2.12. Normalized data vs. star schemas

Figure 2.13. Loading the data warehouse using ETL

Figure 2.14. Using Kettle for ETL

Chapter 3. Creating the data mart

Figure 3.1. Analytic architecture overview: data is copied (and enriched) from the source systems to a dedicated analytic environment, which is where users (via Mondrian) access analytic data.

Figure 3.2. Star schema. The fact contains the what you’re trying to measure—sales, and more specifically the column that has the data to be aggregated, sales_amount. The dimensions are the by attributes that you’re trying to segment and allocate the data to—customer, and more specifically the state the customer is from, customer_state.

Figure 3.3. Source data in the CRM system. A single record per customer in CRM_TABLE is uniquely identified by CRMID, with sales transactions related to a customer being referenced through a foreign key in CRM_SALES.

Figure 3.4. Type I dimension tables. Notice that the foreign key from the fact table to the dimension table is the single-source system identifier CRMID. Given that this is the primary key for the dimension table, it’s clear that there’s only one record for Bob in the dimension.

Figure 3.5. Type II dimension tables.

Figure 3.6. Time dimension: all the attributes of a date are denormalized and presented as real textual values. Even though extracting the month number from the date is possible, a denormalized and repetitive column (MonthNumberOfYear) is created for easy and high-performance grouping and filtering for star queries.

Figure 3.7. Sales fact as a star. All supplier attributes are included as additional columns in the products dimension. All country information is included in the customers dimension as additional columns. The database only needs to optimize one join for the relevant information.

Figure 3.8. Sales fact as a snowflake. The relevant supplier information has been normalized out of products. The database needs to join to the suppliers table from the products table to be able to aggregate and group by supplier attributes.

Figure 3.9. Modeled as standard dimensions using surrogate foreign key references and joins. The sales type dimension includes only two records, as does channel dimension. The original order dimension includes nearly as many rows as the fact table does because every new order also needs a new record in the order dimension.

Figure 3.10. Channel, sales type, and original order included as degenerate dimensions in the fact table. This makes sense for small, single-attribute dimensions, where it’s beneficial to eliminate management for a separate dimension and eliminate unnecessary joins to very small (or very large in the case of the original order) dimensions.

Figure 3.11. Multiple unrelated degenerate dimensions (channel, sales type) can be consolidated into a junk dimension. This provides the performance benefits of a small, single integer in the fact table and reduces the size of the fact table. Note that the high-ordinality degenerate dimension (original order) isn’t moved into the junk dimension because to do so doesn’t make sense.

Chapter 4. Multidimensional modeling: making analytics data accessible

Figure 4.1. Sales schema mapped to an XML schema outline

Figure 4.2. Report showing localized caption

Figure 4.3. Logical and physical schemas

Figure 4.4. Physical schema of the Sales data mart

Figure 4.5. Hierarchical structure of a Mondrian schema

Figure 4.6. Members of the Customers hierarchy

Chapter 5. How schemas grow

Figure 5.1. Star schema with unit and store Sales, Forecast, and Inventory measure groups

Figure 5.2. Schema with star, snowflake, and degenerate dimensions

Chapter 6. Securing data

Figure 6.1. Security grants within security grants

Figure 6.2. Topand bottom-level restrictions

Chapter 7. Maximizing Mondrian performance

Figure 7.1. Performance improvement process

Figure 7.2. Performance test environment

Figure 7.3. Evaluate the database

Figure 7.4. Evaluate Mondrian

Figure 7.5. Aggregate versus detail diagram

Figure 7.6. The different Mondrian caches

Figure 7.7. External segment cache architecture

Figure 7.8. Using Infinispan for the external segment cache

Figure 7.9. Using Memcached for the external segment cache

Figure 7.10. Community Distributed Caching architecture

Figure 7.11. Community Distributed Caching configuration

Figure 7.12. CDC cluster summary

Figure 7.13. Performance analysis process

Figure 7.14. XMLA cache results

Figure 7.15. Caching and the ETL workflow

Chapter 8. Dynamic security

Figure 8.1. Dynamic security process

Figure 8.2. Results showing values from the action sequence

Figure 8.3. Unfiltered data

Figure 8.4. Data filtered by region

Figure 8.5. No restriction on state

Figure 8.6. Restricting by state

Chapter 9. Working with Mondrian and Pentaho

Figure 9.1. Pentaho Analyzer

Figure 9.2. Multiple bar charts for each country

Figure 9.3. Select cube for analysis

Figure 9.4. Set report options

Figure 9.5. Filter numeric values

Figure 9.6. Filter standard dimension values

Figure 9.7. Filter time dimension values

Figure 9.8. Example of a stacked bar chart

Figure 9.9. Plotting data on a Geo Map chart

Figure 9.10. Plotting data as a chord chart

Figure 9.11. CDF pie chart

Figure 9.12. Blank report

Figure 9.13. Choose OLAP data source

Figure 9.14. Entering the OLAP settings

Figure 9.15. Populated report template

Figure 9.16. Report with data

Figure 9.17. Defining the territory parameter

Figure 9.18. The report with a territory parameter

Figure 9.19. Adding a global script

Figure 9.20. PDI view

Figure 9.21. Database connection information

Figure 9.22. Adding a Mondrian input

Figure 9.23. Setting Mondrian values

Figure 9.24. Results of query

Chapter 10. Developing with Mondrian

Figure 10.1. Mondrian can be used from web and desktop clients via XMLA, and it can be embedded in Java applications.

Figure 10.2. Exchanging XMLA messages via SOAP

Figure 10.3. Simple thin client for XMLA queries

Figure 10.4. SOAP message exchange for discovery

Figure 10.5. Table of query results

Figure 10.6. Table of query results

Figure 10.7. xmla4js query results

Chapter 11. Advanced analytics

Figure 11.1. [Gross Profit] results

Figure 11.2. [Gross Profit] at [Product Department] level

Figure 11.3. [Gross Profit Margin]

Figure 11.4. Sales percentage of total table

Figure 11.5. Sales percentage of total chart

Figure 11.6. [Prior Sales] results

Figure 11.7. Prior Period chart

Figure 11.8. [YTD Sales] results

Figure 11.9. Year-to-date chart

Figure 11.10. Fixed-goal table

Figure 11.11. Fixed-goal chart

Figure 11.12. Trend-line table

Figure 11.13. Trend-line chart

Figure 11.14. Ranking table

Figure 11.15. Saiku scenario start. This is the baseline data from Mondrian without any modification.

Figure 11.16. Saiku scenario change

Figure 11.17. Saiku scenario result

Figure 11.18. Weka clustering output

Figure 11.19. NoSQL plus database architecture

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset