List of Figures
Chapter 1. Beyond reporting: business analytics
Figure 1.1. Mondrian is the analytics engine for the business application.
Figure 1.2. Orders by city
Figure 1.3. Orders by customer
Figure 1.4. Orders by city for USA
Figure 1.5. Drag-and-drop analysis
Figure 1.6. State-level orders
Figure 1.7. Customer-level orders
Figure 1.8. Filtered data
Figure 1.9. Execution of an analytics query
Figure 1.10. Increased performance with Mondrian
Chapter 2. Mondrian: a first look
Figure 2.1. Mondrian running in Pentaho
Figure 2.2. Pentaho login page
Figure 2.3. Pentaho User Console (PUC)
Figure 2.4. A Pentaho report: Product Sales Report
Figure 2.5. CDF: Product Sales by Month
Figure 2.6. Saiku: Product Sales by Year
Figure 2.7. Interactive analysis with Saiku
Figure 2.8. Save a report
Figure 2.9. Results showing comparison to same quarter a year ago
Figure 2.10. Sales schema
Figure 2.11. Getting data to the analyst
Figure 2.12. Normalized data vs. star schemas
Figure 2.13. Loading the data warehouse using ETL
Figure 2.14. Using Kettle for ETL
Chapter 3. Creating the data mart
Figure 3.1. Analytic architecture overview: data is copied (and enriched) from the source systems to a dedicated analytic
environment, which is where users (via Mondrian) access analytic data.
Figure 3.2. Star schema. The fact contains the what you’re trying to measure—sales, and more specifically the column that
has the data to be aggregated, sales_amount. The dimensions are the by attributes that you’re trying to segment and allocate
the data to—customer, and more specifically the state the customer is from, customer_state.
Figure 3.3. Source data in the CRM system. A single record per customer in CRM_TABLE is uniquely identified by CRMID, with
sales transactions related to a customer being referenced through a foreign key in CRM_SALES.
Figure 3.4. Type I dimension tables. Notice that the foreign key from the fact table to the dimension table is the single-source
system identifier CRMID. Given that this is the primary key for the dimension table, it’s clear that there’s only one record
for Bob in the dimension.
Figure 3.5. Type II dimension tables.
Figure 3.6. Time dimension: all the attributes of a date are denormalized and presented as real textual values. Even though
extracting the month number from the date is possible, a denormalized and repetitive column (MonthNumberOfYear) is created
for easy and high-performance grouping and filtering for star queries.
Figure 3.7. Sales fact as a star. All supplier attributes are included as additional columns in the products dimension. All
country information is included in the customers dimension as additional columns. The database only needs to optimize one
join for the relevant information.
Figure 3.8. Sales fact as a snowflake. The relevant supplier information has been normalized out of products. The database
needs to join to the suppliers table from the products table to be able to aggregate and group by supplier attributes.
Figure 3.9. Modeled as standard dimensions using surrogate foreign key references and joins. The sales type dimension includes
only two records, as does channel dimension. The original order dimension includes nearly as many rows as the fact table does
because every new order also needs a new record in the order dimension.
Figure 3.10. Channel, sales type, and original order included as degenerate dimensions in the fact table. This makes sense
for small, single-attribute dimensions, where it’s beneficial to eliminate management for a separate dimension and eliminate
unnecessary joins to very small (or very large in the case of the original order) dimensions.
Figure 3.11. Multiple unrelated degenerate dimensions (channel, sales type) can be consolidated into a junk dimension. This
provides the performance benefits of a small, single integer in the fact table and reduces the size of the fact table. Note
that the high-ordinality degenerate dimension (original order) isn’t moved into the junk dimension because to do so doesn’t
make sense.
Chapter 4. Multidimensional modeling: making analytics data accessible
Figure 4.1. Sales schema mapped to an XML schema outline
Figure 4.2. Report showing localized caption
Figure 4.3. Logical and physical schemas
Figure 4.4. Physical schema of the Sales data mart
Figure 4.5. Hierarchical structure of a Mondrian schema
Figure 4.6. Members of the Customers hierarchy
Chapter 5. How schemas grow
Figure 5.1. Star schema with unit and store Sales, Forecast, and Inventory measure groups
Figure 5.2. Schema with star, snowflake, and degenerate dimensions
Chapter 6. Securing data
Figure 6.1. Security grants within security grants
Figure 6.2. Topand bottom-level restrictions
Chapter 7. Maximizing Mondrian performance
Figure 7.1. Performance improvement process
Figure 7.2. Performance test environment
Figure 7.3. Evaluate the database
Figure 7.4. Evaluate Mondrian
Figure 7.5. Aggregate versus detail diagram
Figure 7.6. The different Mondrian caches
Figure 7.7. External segment cache architecture
Figure 7.8. Using Infinispan for the external segment cache
Figure 7.9. Using Memcached for the external segment cache
Figure 7.10. Community Distributed Caching architecture
Figure 7.11. Community Distributed Caching configuration
Figure 7.12. CDC cluster summary
Figure 7.13. Performance analysis process
Figure 7.14. XMLA cache results
Figure 7.15. Caching and the ETL workflow
Chapter 8. Dynamic security
Figure 8.1. Dynamic security process
Figure 8.2. Results showing values from the action sequence
Figure 8.3. Unfiltered data
Figure 8.4. Data filtered by region
Figure 8.5. No restriction on state
Figure 8.6. Restricting by state
Chapter 9. Working with Mondrian and Pentaho
Figure 9.1. Pentaho Analyzer
Figure 9.2. Multiple bar charts for each country
Figure 9.3. Select cube for analysis
Figure 9.4. Set report options
Figure 9.5. Filter numeric values
Figure 9.6. Filter standard dimension values
Figure 9.7. Filter time dimension values
Figure 9.8. Example of a stacked bar chart
Figure 9.9. Plotting data on a Geo Map chart
Figure 9.10. Plotting data as a chord chart
Figure 9.11. CDF pie chart
Figure 9.12. Blank report
Figure 9.13. Choose OLAP data source
Figure 9.14. Entering the OLAP settings
Figure 9.15. Populated report template
Figure 9.16. Report with data
Figure 9.17. Defining the territory parameter
Figure 9.18. The report with a territory parameter
Figure 9.19. Adding a global script
Figure 9.20. PDI view
Figure 9.21. Database connection information
Figure 9.22. Adding a Mondrian input
Figure 9.23. Setting Mondrian values
Figure 9.24. Results of query
Chapter 10. Developing with Mondrian
Figure 10.1. Mondrian can be used from web and desktop clients via XMLA, and it can be embedded in Java applications.
Figure 10.2. Exchanging XMLA messages via SOAP
Figure 10.3. Simple thin client for XMLA queries
Figure 10.4. SOAP message exchange for discovery
Figure 10.5. Table of query results
Figure 10.6. Table of query results
Figure 10.7. xmla4js query results
Chapter 11. Advanced analytics
Figure 11.1. [Gross Profit] results
Figure 11.2. [Gross Profit] at [Product Department] level
Figure 11.3. [Gross Profit Margin]
Figure 11.4. Sales percentage of total table
Figure 11.5. Sales percentage of total chart
Figure 11.6. [Prior Sales] results
Figure 11.7. Prior Period chart
Figure 11.8. [YTD Sales] results
Figure 11.9. Year-to-date chart
Figure 11.10. Fixed-goal table
Figure 11.11. Fixed-goal chart
Figure 11.12. Trend-line table
Figure 11.13. Trend-line chart
Figure 11.14. Ranking table
Figure 11.15. Saiku scenario start. This is the baseline data from Mondrian without any modification.
Figure 11.16. Saiku scenario change
Figure 11.17. Saiku scenario result
Figure 11.18. Weka clustering output
Figure 11.19. NoSQL plus database architecture