9 Data Warehousing and Business Intelligence Management
Data warehouse and business intelligence management is the seventh Data Management Function in the data management framework shown in Figures 1.3 and 1.4. It is the sixth data management function that interacts with and is influenced by the Data Governance function. Chapter 9 defines the data warehousing and business intelligence management function and explains the concepts and activities involved in data warehousing and business intelligence management.
9.1 Introduction
A Data Warehouse (DW) is a combination of two primary components. The first is an integrated decision support database. The second is the related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources. Both of these parts combine to support historical, analytical, and business intelligence (BI) requirements. A data warehouse may also include dependent data marts, which are subset copies of a data warehouse database. In its broadest context, a data warehouse includes any data stores or extracts used to support the delivery of data for BI purposes.
An Enterprise Data Warehouse (EDW) is a centralized data warehouse designed to service the business intelligence needs of the entire organization. An EDW adheres to an enterprise data model to ensure consistency of decision support activities across the enterprise.
Data Warehousing is the term used to describe the operational extract, cleansing, transformation, and load processes—and associated control processes—that maintain the data contained within a data warehouse. The data warehousing process focuses on enabling an integrated and historical business context on operational data by enforcing business rules and maintaining appropriate business data relationships. Data Warehousing also includes processes that interact with meta-data repositories.
Data warehousing is a technology solution supporting Business Intelligence (BI). Business Intelligence is a set of business capabilities. BI means many things, including:
Data Warehousing and Business Intelligence Management (DW-BIM) is the collection, integration, and presentation of data to knowledge workers for the purpose of business analysis and decision-making. DW-BIM is composed of activities supporting all phases of the decision support life cycle: providing context, moving and transforming data from sources to a common target data store, and then providing knowledge workers various means of access, manipulation, and reporting of the integrated target data.
Figure 9.1 outlines the context of Data Warehousing and Business Intelligence Management.
Figure 9.1 DW and BI Management Context Diagram
Objectives for DW-BIM include:
9.2 Concepts and Activities
The purpose of this section is to provide some foundational DW-BIM concepts and definitions, before diving into the details of the specific DW-BIM activities. It presents a quick tour of the history of DW-BIM and an overview of typical DW-BIM components. An explanation of some general DW-BIM terminology follows, and a brief introduction and overview of dimensional modeling and its terminology leads into the activities identified in Figure 9.1.
9.2.1 Data Warehousing—A Brief Retrospective and Historical Tour
In a discussion of any length about data warehousing, two names will invariably come up—Bill Inmon and Ralph Kimball. Each has made significant contributions that have both advanced and shaped the practice of data warehousing. Here is a brief introduction to their major contributions along with some comparisons and contrasts of their approaches.
9.2.1.1 Classic Characteristics of a Data Warehouse—Inmon Version
In the early 1990s, Bill Inmon defined the data warehouse as “a subject oriented, integrated, time variant, and non-volatile collection of summary and detailed historical data used to support the strategic decision-making processes for the corporation.”
These key characteristics give a clear distinction of the nature of the data warehouse data compared to typical operational systems, and still hold largely true as distinctive characteristics of data warehouses.
9.2.1.2 Classic Characteristics of a Data Warehouse—Kimball Version
Ralph Kimball took a different approach, defining a data warehouse simply as “a copy of transaction data specifically structured for query and analysis.” The copy, to differentiate it from the operational system, has a different structure (the dimensional data model) to enable business users to understand and use the data more successfully, and to address DW query performance. Data warehouses always contain more than just transactional data—reference data is necessary to give context to the transactions. However, transactional data is the vast majority of the data in a data warehouse.
Dimensional data models are relational data models. They just do not consistently comply with normalization rules. Dimensional data models reflect business processes more simply than normalized models.
9.2.2 DW / BI Architecture and Components
This section introduces the major components found in most DW / BI environments by providing an overview of the big picture views presented by both Inmon and Kimball. First is the Corporate Information Factory, from Inmon. Second is Kimball’s approach, which he refers to as the “DW Chess Pieces”. Both views and their components are described and contrasted.
9.2.2.1 Inmon’s Corporate Information Factory
Inmon, along with Claudia Imhoff and Ryan Sousa, identified and wrote about components of a corporate data architecture for DW-BIM and called this the “Corporate Information Factory” (CIF). These components appear in tables following Figure 9.2.
Figure 9.2 The Corporate Information Factory
Table 9.1 lists and describes the basic components of the Corporate Information Factory view of DW / BI architecture.
| Label – Name | Description |
|---|---|
| Raw Detailed Data | Operational / transactional application data of the enterprise. The raw detailed data provides the source data to be integrated into the Operational Data Store (ODS) and DW components. It may reside in databases or in other storage or file formats. |
| Integration and Transformation | This layer of the architecture is where the un-integrated data from the various application source stores is combined / integrated and transformed into the corporate representation in the DW. |
| Reference Data | Reference data was a precursor to what is currently referred to as Master Data Management. The purpose was to allow common storage and access for important and frequently used common data. A shared focus on and understanding of data upstream of the Data Warehouse simplifies the integration task in the DW. |
| Historical Reference Data | When current-valued reference data is necessary for transactional applications, and at the same time it is critical to have accurate integration and presentation of historical data, it is necessary to capture the reference data that was in place at any point in time. For more discussion of reference data, see Chapter 8, Master and Reference Data Management. |
| Operational Data Store (ODS) | The ODS focuses on data integration that meets operating needs and classic operational reporting needs requiring data from multiple operational systems. The main distinguishing data characteristics of an ODS compared to a DW include current-valued vs. DW historical data, and volatile vs. DW non-volatile data. Note: The ODS is an optional portion of the overall CIF architecture, dependent upon specific operational needs, and acknowledged as a component that many businesses omit. |
| Operational Data Mart (Oper-Mart) | A data mart focused on tactical decision support. Distinguishing characteristics include current-valued vs. DW historical data, tactical vs. DW strategic analysis, and sourcing of data from an ODS rather than just the DW. The Oper-Mart was a later addition to the CIF architecture. |
| Data Warehouse (DW) | The DW is a large, comprehensive corporate resource, whose primary purpose is to provide a single integration point for corporate data in order to serve management decision making, and strategic analysis and planning. The data flows into a DW from the application systems and ODS, and flows out to the data marts, usually in one direction only. Data that needs correction is rejected, corrected at its source, and re-fed through the system. |
| Data Marts (DM) | The purpose of the data marts is to provide for DSS / information processing and access that is customized and tailored for the needs of a particular department or common analytic need. |
Table 9.1 Corporate Information Factory Component Descriptions
Table 9.2 provides context for the reporting scope and purpose of each of the Corporate Information Factory components and some explanatory notes.
| Component | Reporting Scope / Purpose | Notes |
|---|---|---|
| Applications | Isolated Operational Reports | Limited to data within one application instance |
| ODS | Integrated Operational Reports | Reports requiring data from multiple source systems. Typically, they have a more operational than analytical orientation, with little historical data. |
| DW | Exploratory Analysis | The complete set of corporate data allows for discovery of new relationships and information. Many BI data mining tools work with flat-file extracts from the DW, which can also offload the processing burden from the DW. |
| Oper-Mart | Tactical Analytics | Analytic reporting based on current values with a tactical focus. Dimensional data modeling techniques employed. |
| Data Mart | Analytics – classic management decision support, and Strategic Analytics | Inmon’s early focus was on “departmental analysis”, which was experientially true for real-world organizational issues, such as political and funding expediency. Later work expanded concepts to common analytic needs crossing departmental boundaries. |
Table 9.2 Corporate Information Factory Component Reporting Scope and Purpose
Table 9.3 provides a compare-and-contrast, from a business and application perspective, between the four major components of the Corporate Information Factory: the Applications, ODS, DW, and Data Marts.
Note the following general observations about the contrast between the information on the right hand side for DW and Data Marts, compared to the left hand side for applications, in particular:
| | Application Data | ODS | DW | Data Mart |
|---|---|---|---|---|
| Business Purpose | Specific Business Function | Corp Integrated Operational Needs | Central Data Repository; Integration and Reuse | Analysis: Departmental (Inmon); Business Process (Kimball); Business Measures (Wells) |
| System Orientation | Operations (Execution) | Operations (Reports) | Infrastructure | Informational, Analytic (DSS) |
| Target Users | End Users: Clerical (Daily Operations) | Line Managers: Tactical Decision Makers | Systems: Data Marts, Data Mining | Executives: Performance / Enterprise Metrics; Sr. Mgrs: Organization Metrics; Mid Mgrs: Process Metrics; Knowledge Workers: Activity |
| How System is Used | Fixed Ops | Operational Reporting | Stage, Store, Feed | Ad-Hoc |
| System Availability | High | Medium | Varies | Relaxed |
| Typical Response Time | Seconds | Seconds to Minutes | Longer (Batch) | Seconds to Hours |
| # Records in an Operation | Limited | Small to Med. | Large | Large |
| Amount of Data Per Process | Small | Medium | Large | Large |
| System Development Life Cycle (SDLC) | Classic | Classic | Classic | Modified |
Table 9.3 Corporate Information Factory Components—Business / Application View
Table 9.4 provides a compare-and-contrast, from a data perspective, between the four major components of the Corporate Information Factory: the Applications, ODS, DW, and Data Marts.
Table 9.4, especially the breakout rows for Amount of History and Latency, represents a classic framework where a majority of DW processes are for higher latency and, often, over-night batch processing. The combination of continued business pressure and requirements for more data faster, and the improvement in underlying technology, are blurring the lines and requiring advances in architectural design and approach. These topics are covered briefly in Section 9.2.4.1, Active Data Warehousing. Considered an advanced topic, it is not presented here as a separate architectural alternative.
| | Application | ODS | DW | Data Mart |
|---|---|---|---|---|
| Orientation | Functional | Subject | Subject | Limited Subject |
| View | Application | Corporate (Ops) | Corporate (Historical) | Focused Analysis |
| Integration | Not Integrated – Application Specific | Integrated Corporate Data | Integrated Corporate Data | Integrated Subset |
| Volatility | High; Create / Read / Update / Destroy (CRUD) | Volatile | Non-Volatile | Non-Volatile |
| Time | Current Value | Current Value | Time Variant | Time Variant |
| Detail Level | Detail Only | Detail Only | Detail + Summary | Detail + Summary |
| Amount of History* | 30 to 180 days | 30 to 180 days | 5–10 years | 1–5 years |
| Latency* | Real Time to Near Real Time (NRT) | NRT | > 24 hours | 1 day to 1 month |
| Normalized? | Yes | Yes | Yes | No |
| Modeling | Relational | Relational | Relational | Dimensional |
Table 9.4 Corporate Information Factory Components—Data View
Note the following general observations about the contrast between the data perspective on the right hand side for DW and Data Marts, compared to the left hand side for applications, in particular:
9.2.2.2 Kimball’s Business Dimensional Lifecycle and DW Chess Pieces
Ralph Kimball calls his approach the Business Dimensional Lifecycle; however, it is still commonly referred to as the Kimball Approach. From his Design Tip #493*, “We chose the Business Dimensional Lifecycle label instead, because it reinforced our core tenets about successful data warehousing based on our collective experiences since the mid-1980s.”
The basis of the Business Dimensional Lifecycle is three tenets:
The Business Dimensional Lifecycle advocates using conformed dimensions and facts design. The conformation process enforces an enterprise taxonomy and consistent business rules so that the parts of the data warehouse become re-usable components that are already integrated.
Figure 9.3 is a representation of what Kimball refers to as Data Warehouse Chess Pieces (Adapted from figures in The Data Warehouse Toolkit, 2nd Edition, Ralph Kimball and Margy Ross, John Wiley & Sons, 2002). Note that Kimball’s use of the term “Data Warehouse” has been more inclusive and expansive than that of Inmon. In the diagram below, Kimball uses the term Data Warehouse to encompass everything in both the data staging and data presentation areas.
Figure 9.3 Kimball’s Data Warehouse Chess Pieces
Table 9.5 describes the basic components of the Kimball’s Data Warehouse Chess Pieces view of DW / BI architecture and notes how these components map to CIF components.
| Name | Description |
|---|---|
| Operational Source Systems | Operational / transactional applications of the enterprise. These provide the source data to be integrated into the ODS and DW components. Equivalent to the application systems in the CIF diagram. |
| Data Staging Area | Kimball artfully uses the analogy of a “kitchen” to refer to this area as one where the data is prepared behind the scenes for presentation. He refers to it as the comprehensive set of all storage and ETL processes that stand between the source systems and the data presentation area. The key difference in the architectural approach here is that Kimball’s focus has always been on the efficient end-delivery of the analytical data. With that scope, smaller than Inmon’s corporate management of data, the data staging area becomes a potentially eclectic set of processes needed to integrate and transform data for presentation. Roughly the combination of two CIF components: Integration and Transformation, and the DW. Note: In recent years, Kimball has acknowledged that an enterprise DW can fit into the architecture inside his Data Staging Area. |
| Data Presentation Area | Similar to the Data Marts in the CIF picture, with the key architectural difference being an integrating paradigm of a “DW Bus”: shared or conformed dimensions unifying the multiple data marts. |
| Data Access Tools | Focus on the needs and requirements of the end customers / consumers of the data has been a hallmark of Kimball’s approach. These needs translate into selection criteria, from a broad range of data access tools, for the right tools for the right task. In the CIF model, the access tools are outside the DW architecture. |
Table 9.5 Kimball’s DW Chess Pieces—Component Descriptions
9.2.3 Tactical, Strategic and Operational BI
Tactical BI is the application of BI tools to analyze business trends by comparing a metric to the same metric from a previous month or year, etc. or to analyze historical data in order to discover trends that need attention. Use Tactical BI to support short-term business decisions.
Strategic BI has classically involved providing metrics to executives, often in conjunction with some formal method of business performance management, to help them determine if the corporation is on target for meeting its goals. Use Strategic BI to support long-term corporate goals and objectives.
Operational BI provides BI to the front lines of the business, where analytical capabilities guide operational decisions. Use Operational BI to manage and optimize business operations. Operational BI was the last of these three approaches to evolve in the industry. It entails the coupling of BI applications with operational functions and processes, with a very low tolerance for latency (near-real-time data capture and data delivery). Therefore, architectural approaches such as Service-oriented architecture (SOA) become necessary to support Operational BI fully. Some of these approaches are discussed in Section 9.2.4.1, Active Data Warehousing.
9.2.4 Types of Data Warehousing
The three major types of data warehousing are described in the following sections.
9.2.4.1 Active Data Warehousing
Data Warehouses serving tactical and strategic BI have existed for many years, often with a daily loading frequency, often serviced by a nightly batch window. These architectures were very dependent upon one of Inmon’s original hallmarks of data in the data warehouse, such as non-volatile data.
With the onset of Operational BI (and other general business requirements) pushing for lower latency and more integration of real-time or near-real-time data into the data warehouse, new architectural approaches are emerging to deal with the inclusion of volatile data. A common application of Operational BI is automated banking machine (ABM) data provisioning: when making a banking transaction, historical balances and the new balances resulting from immediate banking actions need to be presented to the banking customer in real time. Full treatment of the new approaches for this data provisioning is beyond the scope of this introduction, but it suffices to introduce two of the key design concepts required: isolation of change, and alternatives to batch ETL.
The impact of the changes from new volatile data must be isolated from the bulk of the historical, non-volatile DW data. Typical architectural approaches for isolation include a combination of building partitions and using union queries for the different partitions, when necessary.
Many alternatives to batch ETL handle the ever-shorter latency requirements for data availability in the DW; these include trickle feeds, pipelining, and Service-oriented architecture (SOA), where Data Services are designed and maintained.
9.2.4.2 Multi-dimensional Analysis – OLAP
Online Analytical Processing (OLAP) refers to an approach to providing fast performance for multi-dimensional analytic queries. The term OLAP originated, in part, to make a clear distinction from OLTP, Online Transactional Processing. The typical output of an OLAP query is in a matrix format: the dimensions form the rows and columns of the matrix, and the factors, or measures, are the values inside the matrix. Conceptually, this is illustrated as a cube. Multi-dimensional analysis with cubes is particularly useful where there are well-known ways analysts want to look at summaries of data.
A common application is financial analysis, where analysts want to repeatedly traverse known hierarchies to analyze data; for example, date (such as Year, Quarter, Month, Week, Day), organization (such as Region, Country, Business Unit, Department), and product hierarchy (such as Product Category, Product Line, Product).
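As a minimal sketch of the matrix output described above, the pivot from transaction-grain rows into a dimension-by-dimension matrix can be expressed in a few lines of Python. The product and quarter values are invented for illustration:

```python
from collections import defaultdict

# Toy fact rows: (product, quarter, sales_amount). All values are invented.
facts = [
    ("Widgets", "Q1", 100), ("Widgets", "Q2", 150),
    ("Gadgets", "Q1", 80),  ("Gadgets", "Q2", 120),
    ("Widgets", "Q1", 50),  # a second Q1 transaction for Widgets
]

def pivot(rows):
    """Aggregate measures into a (row dimension, column dimension) matrix."""
    matrix = defaultdict(int)
    for product, quarter, amount in rows:
        matrix[(product, quarter)] += amount
    return dict(matrix)

cube = pivot(facts)
print(cube[("Widgets", "Q1")])  # 150: the two Q1 Widget transactions summed
```

A real OLAP engine pre-computes and indexes such aggregates so that slicing along any dimension stays fast; the sketch only shows the shape of the result.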
9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
Four classic implementation approaches support Online Analytical Processing. The names of these relate to the respective underlying database implementation approach: Relational, Multi-dimensional, Hybrid, and Database.
9.2.5 Dimensional Data Modeling Concepts and Terminology
Dimensional data modeling is the preferred modeling technique for designing data marts. Dr. Ralph Kimball pioneered many of the terms and techniques of dimensional data modeling. The purpose of this section is to introduce the concepts and terms.
Kimball’s focus has been on end-user presentation of the data, and dimensional data modeling, in general, focuses on making it simple for the end-user to understand and access the data. Inherent in the design technique is a conscious trade-off of preferring and choosing easy to understand and use structures from an end-user perspective, at the cost of more implementation work for the developers. This helps contribute to the fact that the majority of data mart design work ends up being in ETL processing.
Table 9.6 contrasts the typical differences in the characteristics of systems built from relational modeling for transactional applications versus those built with dimensional data modeling for data marts.
| | Entity Relationship Modeling (Transactional Applications) | Dimensional Data Modeling (Data Marts) |
|---|---|---|
| Typical System | Operational | Informational, Analytic (BI) |
| # Records in an Operation | A few | Many (millions+) |
| Typical Response Time | Seconds | Seconds, minutes to hours |
| Target Users | Clerical – front line staff | Management and analysts |
| Orientation | Application – Run the business | Analysis – Analyze the business |
| Availability | High | Relaxed |
| Amount of Data Per Process | Small | Large |
| Time Horizon for Data | 60–180 days | One to many years |
| How System is Used | Fixed Operations | Fixed and Ad-Hoc |
Table 9.6 System Characteristics for Transactional Applications and Data Marts
Dimensional data modeling is a subset of entity relationship data modeling, and has the basic building blocks of entities, attributes, and relationships. The entities come in two basic types: facts, which provide the measurements; and dimensions, which provide the context. Relationships in simple dimensional modeling are constrained to all go through the fact table, and all dimension-to-fact relationships are one-to-many (1:M).
9.2.5.1 Fact Tables
Fact tables represent and contain important business measures. The term “fact” is overloaded: “fact tables” (entities) contain one or more “facts” (attributes representing measures). Each row of a fact table corresponds to a particular measurement; measurements are numeric, such as amounts, quantities, or counts. Some measurements are the results of algorithms, so meta-data becomes critical to proper understanding and usage. Fact tables take up the most space in the database (90% is a reasonable rule of thumb) and tend to have large numbers of rows.
Fact tables express or resolve many-to-many relationships between the dimensions. Access to fact tables is usually through the dimension tables.
Fact tables often have a number of control columns that express when the row was loaded, by what program, or indicators for most current record, or other statuses. These fields help the programmers, the operators and the super-users navigate and validate the data.
9.2.5.2 Dimension Tables
Dimension tables, or dimensions for short, represent the important objects of the business and contain textual descriptions of the business. Dimensions serve as the primary source for “query by” or “report by” constraints. They act as the entry points or links into the fact tables, and their contents provide report groupings and report labels. Dimensions are typically highly de-normalized and account for about 10% of the total data, as a rule of thumb. The depth and quality of the detailed design of dimensions determine the analytic usefulness of the resulting systems.
All designs will likely have a Date dimension and an Organization or Party dimension at a minimum. Other dimensions depend on the type of analysis that supports the data in the fact table.
Dimension tables typically have a small number of rows and large number of columns. Main contents of a dimension table are:
Dimensions must have unique identifiers for each row. The two main approaches to identifying keys for dimension tables are surrogate keys and natural keys.
9.2.5.2.1 Surrogate Keys
Kimball’s approach gives each dimension a single primary key, populated by a number unrelated to the actual data. The number is a “surrogate key” or “anonymous key”, and can be either a sequential number, or a truly random number. The advantages of using surrogate keys include:
In exchange for these advantages, there is extra ETL processing necessary to map the numeric key values to source key values, and maintain the mapping tables.
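The ETL mapping work described above can be sketched as follows. This is a minimal in-memory illustration with invented key values; a production pipeline would persist the mapping table and handle concurrent loads:

```python
class SurrogateKeyGenerator:
    """Maintains the natural-key -> surrogate-key mapping used during ETL."""

    def __init__(self):
        self._mapping = {}   # natural key -> surrogate key
        self._next_key = 1   # sequential surrogate values

    def lookup(self, natural_key):
        # Assign a new surrogate the first time a natural key is seen;
        # subsequent lookups always return the same stable value.
        if natural_key not in self._mapping:
            self._mapping[natural_key] = self._next_key
            self._next_key += 1
        return self._mapping[natural_key]

gen = SurrogateKeyGenerator()
print(gen.lookup(("US", "CUST-0001")))  # 1
print(gen.lookup(("US", "CUST-0002")))  # 2
print(gen.lookup(("US", "CUST-0001")))  # 1 again: the mapping is stable
```

Note that the natural key here is a composite (source system plus source identifier), which is typical when the same entity arrives from multiple source systems.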
9.2.5.2.2 Natural Keys
For some systems, it is preferable not to create additional key fields, using, instead, the data that is already present to identify unique rows. The advantages of using natural keys include:
In exchange for these advantages, there can be a need to identify multiple fields in each query as part of the join, and possibly complex values for those non-numeric fields. Also, in some RDBMS, joins using long text strings may perform worse than those using numbers.
9.2.5.3 Dimension Attribute Types
The three main types of dimension attributes are differentiated by the need to retain historical copies. They are creatively named Type 1, Type 2 (and 2a), and Type 3. Two other types that do not appear very often are, also creatively, named Type 4 and Type 6 (1+2+3). Types 1 through 3 can co-exist within the same table; the action taken during an update depends on the type of each field being updated.
9.2.5.3.1 Type 1 Overwrite
Type 1 dimension attributes have no need for any historical records at all. The only interest is in the current value, so any update completely overwrites the prior value in the field in that row. An example of a Type 1 attribute is hair color: when an update occurs, there is no need to retain the prior value.
9.2.5.3.2 Type 2 New Row
Type 2 dimension attributes need all historical records. Every time one of these Type 2 fields changes, a new row with the current information is appended to the table, and the previously current row’s expiration date field is updated to expire it. An example is Billing Address. When the Billing Address changes, the row with the old address expires and a new row with the current Billing Address information is appended.
Note that managing Type 2 attributes requires that the table’s key be able to handle multiple instances of the same natural key, either through the use of surrogate keys, by the addition of an index value to the primary key, or the addition of a date value (effective, expiration, insert, and so on) to the primary key.
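The expire-and-append behavior of a Type 2 update, using a surrogate key plus effective/expiration dates, can be sketched as below. Column names and dates are illustrative, not taken from any particular product:

```python
from datetime import date

def apply_type2_update(dimension, natural_key, new_attrs, effective, next_surrogate):
    """Expire the current row for natural_key and append a new current row."""
    for row in dimension:
        if row["natural_key"] == natural_key and row["is_current"]:
            row["is_current"] = False
            row["expiration_date"] = effective  # old row valid until the change
    dimension.append({
        "surrogate_key": next_surrogate,
        "natural_key": natural_key,
        **new_attrs,
        "effective_date": effective,
        "expiration_date": None,  # open-ended: this is now the current row
        "is_current": True,
    })

# One customer row with an invented billing address.
customer_dim = [{
    "surrogate_key": 1, "natural_key": "CUST-1",
    "billing_address": "12 Oak St", "effective_date": date(2020, 1, 1),
    "expiration_date": None, "is_current": True,
}]
apply_type2_update(customer_dim, "CUST-1",
                   {"billing_address": "98 Elm Ave"}, date(2021, 6, 1), 2)
current = [r for r in customer_dim if r["is_current"]]
print(len(customer_dim), current[0]["billing_address"])  # 2 98 Elm Ave
```

Both rows survive, so fact rows loaded before June 2021 can still join to the address that was in effect when they occurred.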
9.2.5.3.3 Type 3 New Column
Type 3 dimension attributes need only a selected, known portion of history. Multiple fields in the same row contain the historical values. When an update occurs, the current value is moved to the next appropriate field, and the last, no longer necessary, value drops off. An example is a credit score, where only the original score when the account opened, the most current score, and the immediate prior score are valuable. An update would move the current score to the prior score.
Another example is monthly bill totals. There can be 12 fields, named Month01, Month02, etc., or January, February, etc. If the former, then the current month value updates Month01 and all other values move down one field. If the latter, then when the proper month is updated, the user knows that the month after the current month contains last year’s data.
One useful purpose of Type 3 is for attribute value migrations. For example, a company decides to reorganize its product hierarchy, but wants to see sales figures for both the old hierarchy and the new for a year, to make sure that all sales are being recorded appropriately. Having both the old and the new available for a period of time allows this transition in the data.
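The credit score example above can be sketched as a Type 3 column shift. Field names and score values are invented:

```python
def apply_type3_update(row, new_score):
    """Shift the current value into the 'prior' slot; the original never moves."""
    row["prior_score"] = row["current_score"]
    row["current_score"] = new_score

# Invented account row: score at account opening, prior score, current score.
account = {"original_score": 640, "prior_score": None, "current_score": 680}
apply_type3_update(account, 710)
print(account)  # original 640 preserved, prior 680, current 710
```

Unlike Type 2, no new row is created; history beyond the designated slots is deliberately discarded.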
9.2.5.3.4 Type 4 New Table
Type 4 dimension attributes initiate a move of the expired row into a ‘history’ table, and the row in the ‘current’ table is updated with the current information. An example would be a Supplier table, where expired Supplier rows roll off into the history table after an update, so that the main dimension table only contains current Supplier rows. The latter is sometimes called a Type 2a dimension.
Retrievals involving timelines are more complex in a Type 4 design, since current and history tables need to be joined before joining with the fact table. Therefore, it is optimal when the vast majority of access uses current dimension data and the historical table is maintained more for audit purposes than for active retrievals.
9.2.5.3.5 Type 6 1+2+3
Type 6 treats the dimension table as a Type 2, where any change to any value creates a new row, but the key value (surrogate or natural) does not change. One way to implement Type 6 is to add three fields to each row—effective date, expiration date, and a current row indicator. Queries looking for data as of any particular point in time check to see if the desired date is between the effective and end dates. Queries looking for only current data, add filters for the current row indicator. Adding filters has the drawback of requiring additional knowledge to create queries that correctly ask for the proper row by period value or indicator.
Another way to implement Type 6 is to add an index field instead of a current row indicator, with the current value of 0. Updated rows get the index value of zero, and all rows add 1 to their index values to move them down the line. Queries looking for the current values would set the filter for index value equal to zero, and queries looking for prior times would still use the effective and expiration dates. This technique has the drawback that all fact rows will link automatically to the index version 0 (the current row). Queries joining to the fact table will not find any prior values of the dimension unless the dimensional effective and expiration dates are included in the query.
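The first Type 6 variant, with effective date, expiration date, and a current row indicator, can be sketched as two query helpers. The product rows and dates are invented:

```python
from datetime import date

rows = [
    # Two versions of natural key "P-9"; the dates bound each version's validity.
    {"key": "P-9", "category": "Tools", "effective": date(2019, 1, 1),
     "expiration": date(2021, 1, 1), "is_current": False},
    {"key": "P-9", "category": "Hardware", "effective": date(2021, 1, 1),
     "expiration": None, "is_current": True},
]

def as_of(rows, key, when):
    """Point-in-time lookup: effective <= when < expiration."""
    for r in rows:
        if (r["key"] == key and r["effective"] <= when
                and (r["expiration"] is None or when < r["expiration"])):
            return r

def current(rows, key):
    """Current-value lookup via the current row indicator."""
    return next(r for r in rows if r["key"] == key and r["is_current"])

print(as_of(rows, "P-9", date(2020, 6, 1))["category"])  # Tools
print(current(rows, "P-9")["category"])                  # Hardware
```

This illustrates the drawback noted above: every query must know to apply either the date-range predicate or the indicator filter to pick the proper row.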
9.2.5.4 Star Schema
A star schema is the representation of a dimensional data model with a single fact table in the center connecting to a number of surrounding dimension tables, as shown in Figure 9.4. It is also referred to as a star join schema, emphasizing that the joins from the central fact table are via single primary keys to each of the surrounding dimension tables. The central fact table has a compound key composed of the dimension keys.
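A star join can be sketched in miniature as single-key lookups from the fact table into each dimension. The fact and dimension contents below are invented:

```python
# Invented dimensions, keyed by surrogate key.
date_dim = {1: {"date": "2024-01-15", "month": "2024-01"},
            2: {"date": "2024-02-03", "month": "2024-02"}}
product_dim = {10: {"name": "Widget", "category": "Tools"}}

# Fact rows carry the compound key (date_key, product_key) plus the measure.
sales_fact = [
    {"date_key": 1, "product_key": 10, "amount": 25.0},
    {"date_key": 2, "product_key": 10, "amount": 40.0},
]

# The star join: each fact row reaches its context through one key per dimension.
report = [
    (date_dim[f["date_key"]]["month"],
     product_dim[f["product_key"]]["name"],
     f["amount"])
    for f in sales_fact
]
print(report)  # one (month, product, amount) tuple per fact row
```

In SQL terms each lookup above would be a join on a single primary key, which is exactly what the "star join" name emphasizes.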
9.2.5.5 Snowflaking
Snowflaking is the term given to normalizing the flat, single-table, dimensional structure in a star schema into the respective component hierarchical or network structures. Kimball’s design methods discourage snowflaking on two main principles: 1) it dilutes the simplicity and end-user understandability of the star schema, and 2) the space savings are typically minimal.
Three types of snowflake tables are recognized: true snowflakes, outriggers, and bridges:
9.2.5.6 Grain
Kimball coined the term grain to stand for the meaning or description of a single row of data in a fact table. Or, put another way, it refers to the atomic level of the data for a transaction. Defining the grain of a fact table is one of the key steps in Kimball’s dimensional design method. For example, if the fact table holds a store’s transactions for a single month, the grain tells us the limits of the data: the table will not include data for last year.
Figure 9.4 Example Star Schema
9.2.5.7 Conformed Dimensions
Conformed dimensions are the common or shared dimensions across multiple data marts in Kimball’s design method. More precisely, Kimball defines dimensions to be conformed when they either match in terms of both data element names and their respective values, or when one contains a strict subset of the other. The practical importance is that the row headers from any answer sets drawn from conformed dimensions must match exactly.
For example, think of multiple data marts or fact tables, all linking directly to the same dimension table, or a direct copy of that dimension table. Updates to that dimension table automatically show in all queries for those data marts.
Reuse of conformed dimensions in other star schemas allows for modular development of the DW. Stars can be clipped together through conformed dimensions as the design grows. A DW that starts with a fact table for the Accounts Payable Department can be clipped onto a fact table on vendor performance in the Supply Department through the Product dimension they share. Ultimately, queries walk across subject areas, unifying data access to the DW across the entire enterprise.
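As a minimal illustration of why matching row headers matter, two answer sets keyed by a shared Product dimension can be merged ("drilled across") directly. The fact values below are invented for illustration.

```python
# Answer set from an Accounts Payable fact, by Product (invented values).
payables = {"Widgets": 400.0, "Gadgets": 150.0}
# Answer set from a vendor-performance fact, by the same conformed Product
# dimension (invented on-time delivery rates).
vendor_perf = {"Widgets": 0.95, "Gadgets": 0.88}

# Because the row headers (Product values) match exactly, the two answer
# sets merge cleanly on the shared dimension.
drill_across = {product: (payables[product], vendor_perf[product])
                for product in payables.keys() & vendor_perf.keys()}
print(drill_across["Widgets"])  # (400.0, 0.95)
```

If the Product values did not conform—different spellings, codes, or subsets—the merge would silently drop or mis-align rows, which is exactly the failure conformed dimensions prevent.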
9.2.5.8 Conformed Facts
Conformed facts use standardized definitions of terms across individual marts. Different business users may use the same term in different ways. Does “customer additions” refer to “gross additions” or “adjusted additions”? Does “orders processed” refer to the entire order, or to the sum of individual line items?
Developers need to be keenly aware of terms that share the same name but represent different concepts across organizations, and conversely, of concepts that are the same but go by different names.
9.2.5.9 DW-Bus Architecture and Bus Matrix
The term bus came from Kimball’s electrical engineering background, where a bus was something providing common power to a number of electrical components. Building on that analogy, the DW-bus architecture of conformed dimensions is what allows multiple data marts to co-exist and share by plugging into a bus of shared or conformed dimensions.
The DW-bus matrix is a tabular way of showing the intersection of data marts, data processes, or data subject areas with the shared conformed dimensions. Table 9.7 shows a sample tabular representation of bus architecture. The opportunity for conformed dimensions appears where a data mart is marked as using multiple dimensions (the row). The DW-bus appears where multiple data marts use the same dimensions (the column).
The DW-bus matrix is a very effective communication and planning tool. Its unifying concept is one of Kimball’s most valuable contributions to the DW-BIM practice. The unifying concept becomes a key living design document in the DW-BIM. As new design pieces are added, the existing dimensions and facts, complete with their sources, update logic, and schedule, need to be reviewed for possible re-use.
| Business Process (Marts) | Date | Product | Store | Vendor | Warehouse |
| Sales | X | X | X | | |
| Inventory | X | X | X | X | X |
| Orders | X | X | X | | |
Table 9.7 DW-Bus Matrix Example (columns are the conformed dimensions)
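The bus matrix lends itself to a simple machine-readable encoding. The sketch below assumes one reading of Table 9.7 (the Orders row in particular is an interpretation) and computes the DW-bus: the dimensions shared by multiple marts, i.e. the columns with more than one mark.

```python
# One reading of the Table 9.7 bus matrix: each data mart (business process)
# maps to the set of conformed dimensions it uses.
bus_matrix = {
    "Sales":     {"Date", "Product", "Store"},
    "Inventory": {"Date", "Product", "Store", "Vendor", "Warehouse"},
    "Orders":    {"Date", "Product", "Store"},
}

def shared_dimensions(matrix):
    """The DW-bus: dimensions used by more than one data mart (the columns)."""
    counts = {}
    for dims in matrix.values():
        for d in dims:
            counts[d] = counts.get(d, 0) + 1
    return sorted(d for d, n in counts.items() if n > 1)

print(shared_dimensions(bus_matrix))  # ['Date', 'Product', 'Store']
```

Kept as a living design document, such an encoding makes it easy to check, as new marts are proposed, which existing dimensions they should plug into rather than rebuild.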
9.3 DW-BIM Activities
Data warehousing is concerned primarily with the part of the DW-BIM lifecycle from data source to a common data store across all relevant departments—in short, data content. BIM is concerned with the portion of the lifecycle from the common data store to targeted audience use—in short, data presentation.
DW and BIM naturally intertwine, as no DW can deliver value to the organization without some means of providing access to the collected data along with analytic and reporting capabilities. In turn, the effectiveness of a BIM capability is directly dependent upon the provision of data from the DW that is timely, relevant, integrated, and has other quality factors controlled for and documented as required.
DW-BIM activities overlap with many of the data management functions already covered in the Guide. The purpose of the DW-BIM Activities section is to articulate the activities involved in DW-BIM in a practical implementation based context. It includes references to other data management functions, with definitions elsewhere in the Guide, as well as providing practical insights into the various methods, tools, and techniques that are specific to the DW-BIM function.
9.3.1 Understand Business Intelligence Information Needs
Starting with and keeping a consistent business focus throughout the DW-BIM lifecycle is essential to success. Looking at the value chain of the enterprise is a good way to understand the business context. The specific business processes in a company’s value chain provide a natural business-oriented context in which to frame areas of analysis. See Section 4.2.2.3 for further information on Value Chains and Figure 4.4 for a good example.
Gathering requirements for DW-BIM projects has both similarities to and differences from gathering requirements for other projects in typical IT development. In DW-BIM projects, it is generally more important to understand the broader business context of the business area targeted, as reporting is generalized and exploratory. The broader business context is in stark contrast to operational systems, where the development process defines, up-front, the precise specific details and requirements of operations and reports.
Analysis for DW-BIM projects is more ad-hoc by nature and involves asking questions, which, in turn, lead to new or different questions. Of course, querying will be limited by the nature and quality of the data available. And although the exact specifications of all reports will not be known, what can be known is the context for the questions—the ways the knowledge workers will most likely want to slice-and-dice the data.
Identify and scope the business area, then identify and interview the appropriate business people. Ask what they do and why. Capture specific questions they want to ask. Document how they distinguish between and categorize important aspects of the information. Ask the business people how they track and define success. Where possible, define and capture key performance metrics and formulae.
Capturing the actual business vocabulary and terminology is a key to success. The BI requirements-gathering activity is a great opportunity to partner with the Meta-data Management function (see Chapter 11), as it is critical to have a business-understandable context of data end-to-end; from the initial data sources, through all the transformations, to the end presentation. In summary, a DW-BIM project requirements write-up should frame the whole context of the business areas and / or processes that are in scope.
Document the business context, then explore the details of the actual source data. Typically, the ETL portion can consume 67% of a DW-BIM project’s dollars and time. Data profiling is very helpful, and collaborating with the Data Quality function is essential (see Chapter 12). Evaluation of the state of the source data leads to more accurate up-front estimates for feasibility and scope of effort. The evaluation is also important for setting appropriate expectations.
Note that the DW is often the first place where the pain of poor quality data in source systems and / or data entry functions becomes apparent. Collaborating with the Data Governance function (see Chapter 3) is critical, as business input on how to handle all the unexpected variations that inevitably occur in the actual data is essential.
Creating an executive summary of the identified business intelligence needs is a best practice. The executive summary should include an overview of the business context, have a list of sample questions, provide commentary on the existing data quality and level-of-effort for cleansing and integration, and describe related organizations and business functions. It may also include a mock-up drawing for navigation of the solution showing the pathways for query and reporting in the selected presentation product. Review the executive summary with the business for prioritization in the DW-BIM program.
When starting a DW-BIM program, a good way to decide where to start is using a simple assessment of business impact and technical feasibility. Technical feasibility will take into consideration things like complexity, availability and state of the data, and the availability of subject matter experts. Projects that have high business impact and high technical feasibility are good candidates for starting.
Most importantly, assess the necessary business support, considering three critical success factors:
9.3.2 Define and Maintain the DW-BI Architecture
Chapter 4, on Data Architecture Management, provided excellent coverage of both data architecture in general, as well as many of the specific components of the DW-BIM architecture, including enterprise data models (subject area, conceptual, and logical), data technology architecture, and data integration architecture. Section 9.2.2 introduced the key components of DW-BIM architecture. The current section adds some practical considerations related to defining and maintaining the DW-BIM architecture.
Successful DW-BIM architecture requires the identification and bringing together of a number of key roles, potentially from other major functions, including:
DW-BIM needs to leverage many of the disciplines and components of a company’s IT department and business functions; thus, another key set of initial activities includes assessing and integrating the appropriate business processes, architectures, and technology standards, including ones for:
Technical requirements including performance, availability, and timing needs are key drivers in developing the DW-BIM architecture. The DW-BIM architecture should answer the basic questions about what data goes where, when, why and how. The ‘how’ needs to cover the hardware and software detail. It is the organizing framework to bring all the activities together.
The design decisions and principles governing what data detail the DW contains are a key design priority for DW-BIM architecture. Publishing clear rules for what data will be available only via operational reporting (i.e., outside the DW) is critical to the success of DW-BIM efforts. The best DW-BIM architectures design a mechanism to connect back to transactional-level and operational-level reports from atomic DW data. Having this mechanism is part of the art of good DW design, and it protects the DW from having to carry every transactional detail.
An example is providing a viewing mechanism for key operational reports or forms based on a transactional key, such as Invoice Number. Customers will always want all the detail available, but some of the operational data has value only in the context of the original report, and does not provide analytic value, such as long description fields.
It is very important that the DW-BIM architecture integrate with the overall corporate reporting architecture. Many different design techniques exist, but one helpful technique is to focus on defining business-appropriate Service Level Agreements (SLAs). Response time, data retention, and availability requirements often differ greatly between classes of business needs and their respective supporting systems, such as operational reporting versus the DW versus data marts. Several tables in Section 9.2.2 will be helpful in considering the varying aspects of the different design components of the DW-BIM architecture.
Another critical success factor is to identify a plan for data re-use, sharing, and extension. The DW-Bus matrix introduced in Section 9.2.5.9 provides a good organizing paradigm.
Finally, no DW-BIM effort can be successful without business acceptance of data. Business acceptance includes the data being understandable, having verifiable quality, and having a demonstrable lineage. Sign-off by the Business on the data should be part of the User Acceptance Testing. Structured random testing of the data in the BIM tool against data in the source systems over the initial load and a few update load cycles should be performed to meet sign-off criteria. Meeting these requirements is paramount for every DW-BIM architecture. Consider, up-front, a few critically important architectural sub-components, along with their supporting activities:
9.3.3 Implement Data Warehouses and Data Marts
Data warehouses and data marts are the two major classes of formal data stores in the DW-BIM landscape.
The purpose of a data warehouse is to integrate data from multiple sources and then serve up that integrated data for BI purposes. This consumption is typically through data marts or other systems (e.g. a flat file to a data-mining application). The design of a data warehouse is a relational database design with normalization techniques. Ideally, a single data warehouse will integrate data from multiple source systems, and serve data to multiple data marts.
The primary purpose of data marts is to provide data for analysis to knowledge workers. Successful data marts must provide access to this data in a simple, understandable, and well-performing manner. Dimensional modeling (using de-normalization techniques) and design, as introduced in Section 9.2.5, has largely been the technique of choice for designing user-oriented data marts. Create a data mart to meet specialized business analysis needs. Data marts often include aggregated and summarized information to support faster analysis. Kimball’s vision has only data marts and no normalized DW layer.
Chapter 5 covers detailed data design, and database design, in particular. The references at the end of the chapter provide a number of excellent books on DW-BIM implementation methods.
In review, the use of data warehouses and data marts could be considered an application of one of Covey’s famous Seven Habits4: begin with the end in mind. First, identify the business problem to solve, then identify the end solution—the piece of software and its associated data mart. From there, continue to work back to the integrated data required (the data warehouse) and, ultimately, all the way back to the data sources.
9.3.4 Implement Business Intelligence Tools and User Interfaces
The maturity of the BI market and the wide range of available BI tools make it rare for companies to build their own BI tools.5* The purpose of this section is to introduce the types of tools available in the BI marketplace, give an overview of their chief characteristics, and provide some information to help match the tools to the appropriate customer-level capabilities.
Implementing the right BI tool or User Interface (UI) is about identifying the right tools for the right user set. Almost all BI tools also come with their own meta-data repositories to manage their internal data maps and statistics. Some vendors make these repositories open to the end user, while some allow business meta-data to be entered. Enterprise meta-data repositories must link to, or copy from, these repositories to get a complete view of the reporting and analysis activity that the tool is providing. Chapter 11 covers Meta-Data Management.
9.3.4.1 Query and Reporting Tools
Query and reporting is the process of querying a data source and then formatting the results to create a report—either a production-style report, such as an invoice, or a management report.
The needs within business operations reporting are often different from the needs within business query and reporting. Yet sometimes, the needs blur and lines cross. Just as you can use a hammer to get a screw into the wall, you can use a business operations-reporting tool for management reporting. The converse, however, is not true; rarely can you use a business query and reporting tool to develop business operations reports. A business query tool may not support pixel-perfect layouts, normalized data sources, or the programmability that IT developers demand.
With business query and reporting, the data source is more often a data warehouse or data mart (though not always). While IT develops production reports, power users and casual business users develop their own reports with business query tools. Table 9.8 provides a general mapping of classes of BI tools to their respective primary classes of users, and compares additional characteristics that help distinguish business operations-style reports from business query and reporting-style reports. These characteristics are by no means absolute, and you will not necessarily find vendor tools that fit either precisely. Business operations reports are not necessarily pixel-perfect, although some are. Reports generated with business query tools may be used individually, departmentally, or enterprise-wide.
| Characteristics | Business Operations Reports | Business Query and Reporting |
| Primary Author | IT developer | Power users or business users |
| Purpose | Document preparation | Decision making |
| Report Delivery | Paper or email, embedded in an application | Portal, spreadsheet, email |
| Print Quality | Pixel perfect | Historically presentation quality, now pixel perfect |
| User Base | 10s of 1000s | 100s or 1000s |
| Data Source | OLTP – real time | Data warehouse or data mart, occasionally OLTP |
| Level of Data Detail | Atomic | Aggregated, filtered |
| Scope | Operational | Tactical, strategic |
| Usage | Often embedded within an OLTP application | BI as a separate application |
Table 9.8 Production versus Business Query and Reporting
In the last few years, there has been a tremendous coalescing and collapsing of the marketplace with respect to reporting tools. All of the major BI vendors now offer classic pixel-perfect report capabilities that were once primarily in the domain of application reports. From a simple cost perspective, the delivery mechanism and infrastructure for reports or even information is agnostic to the content or type of information. In other words, it is prudent for companies to leverage common infrastructure and delivery mechanisms. These include the web, email, and applications for the delivery of all kinds of information and reports, of which DW-BIM is a subset.
Production reporting crosses the DW-BIM boundary and often queries transactional systems to produce operational items such as invoices or bank statements. The developers of production reports tend to be IT personnel.
Business query and reporting tools enable users who want to author their own reports, or create outputs for use by others. They are less concerned with the precise layout because they are not trying to generate an invoice or the like. However, they do want charts and tables quickly and intuitively. Some tools focus on innovative visualization of the data as a means to show meaning in the data with data maps, and moving landscapes of data over time. The formatting capabilities vary dramatically in this segment. Tools in this segment are referred to as ad hoc query tools. Often the reports created by business users become standard reports, not exclusively used for ad hoc business questions.
Figure 9.5 relates the classes of BI tools to the respective classes of BI users for those tools.
Figure 9.5 What BI Tools for What Users?
In defining the target user groups, there is a spectrum of BI needs. First, know your user groups and then match the tool to the user groups in your company. On one end, IT developers may be most concerned with extracting the data, and focus on advanced functionality. On the other end of the spectrum, information consumers may want fast access to previously developed and executed reports. These consumers may want some degree of interactivity such as drill, filter, sort, or may only want to see a static report.
Keep in mind that drilling is an OLAP capability. Is it a need just for Analysts or Power Users, or is it something that customers, suppliers, or casual users would also like, but that perhaps has not been possible in the past?
You need to understand how all classes of users expect to use the tool, including Web users. Will the Web just be a delivery mechanism, or also a report-authoring environment? Will you provide offline access for reports that are available over the Web, and if so, how?
Users may move from one class to another as their skills increase or as they perform different business functions. A supply chain manager, for example, may want to view a static financial report but will want a highly interactive report for analyzing inventory. A financial analyst or a line manager responsible for expenses may be a power user when analyzing total expenses, but a casual consumer when viewing a static report of a single phone bill.
External users typically look at static reports, like a summary of their activity. Increasingly, however, companies are providing more interactive extranet reporting for their best customers and biggest suppliers. Front-line workers may use static, published reports, or a nugget of information embedded within an application. Executives and managers will use a combination of fixed reports, dashboards, and scorecards. Managers and power users tend to want to drill into these reports and slice and dice the data to identify the root cause of problems.
9.3.4.2 On Line Analytical Processing (OLAP) Tools
OLAP provides interactive, multi-dimensional analysis with different dimensions and different levels of detail. Section 9.2.3.2, Multi-dimensional Analysis—MOLAP, briefly introduced this topic. This section covers OLAP tools, which provide for the arrangement of data into OLAP cubes for fast analysis.
Typically, cubes in BI tools are generated from a star (or snowflake) database schema. The OLAP cubes consist of numeric facts, called measures, from the fact tables. These cubes can be built virtually on demand or pre-computed in batch jobs. The dimensions categorize the facts in the respective schema (see Section 9.2.2).
The value of OLAP tools and cubes is that they reduce the chance of confusion and erroneous interpretation by aligning the data content with the analyst’s mental model. The analyst can navigate through the database and screen for a particular subset of the data, changing the data’s orientation and defining analytical calculations. Slice-and-dice is the user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down / up. Common OLAP operations include slice and dice, drill down, drill up, roll up, and pivot.
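Two of these operations—slice and roll up—can be illustrated with a toy in-memory cube. This is pure Python and purely illustrative (dimension and member names are invented); real OLAP engines index and pre-aggregate far more cleverly.

```python
# Toy cube: each cell maps (month, product, region) -> a numeric measure.
cube = {
    ("2024-01", "Widgets", "North"): 100,
    ("2024-01", "Gadgets", "North"): 50,
    ("2024-02", "Widgets", "South"): 75,
}
DIMS = ("month", "product", "region")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single member value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def roll_up(cube, dim):
    """Roll up: aggregate the measure across one dimension."""
    i = DIMS.index(dim)
    out = {}
    for k, v in cube.items():
        key = k[:i] + k[i + 1:]            # drop the rolled-up dimension
        out[key] = out.get(key, 0) + v
    return out

# Slice to January, then roll up across region.
print(roll_up(slice_cube(cube, "month", "2024-01"), "region"))
```

Drill down is simply the inverse navigation: moving from a rolled-up view back to a finer grain that the cube retains.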
9.3.4.3 Analytic Applications
Henry Morris of IDC first coined the term “analytic applications” in the mid 1990s, clarifying how they are different from OLAP and BI tools in general.6 Analytic applications include the logic and processes to extract data from well-known source systems, such as vendor ERP systems, a data model for the data mart, and pre-built reports and dashboards. Analytic applications provide businesses with a pre-built solution to optimize a functional area (people management, for example) or industry vertical (retail analytics, for example).
Different types of analytic applications include customer, financial, supply chain, manufacturing, and human resource applications.
The buy-versus-build approach greatly influences the nuances within analytic applications. When you buy an analytic application, you buy the data model along with pre-built cubes and reports with functional metrics. These purchased applications tell you what is important and what you should be monitoring, and they provide some of the technology to help you get to value faster. For example, with a general BI tool, you determine how and whether to calculate business measures, such as average sale per store visit, and in which reports they appear; a pre-built analytic application provides these and other metrics for you. Some build-oriented analytic applications instead provide a development environment for assembling applications.
The value proposition of analytic applications is in the quick start, such as the shortened time-to-market and delivery. Some of the key questions for evaluation of analytic applications are:
9.3.4.4 Implementing Management Dashboards and Scorecards
Dashboards and scorecards are both ways of efficiently presenting performance information. Typically, dashboards are oriented more toward dynamic presentation of operational information, while scorecards are more static representations of longer-term organizational, tactical, or strategic goals. Scorecards focus on a given metric and compare it to a target, often reflecting a simple status of red, yellow, or green based on business rules; dashboards typically present multiple numbers in many different ways.
Typically, scorecards are divided into 4 quadrants or views of the organization: Finance, Customer, Environment, and Employees, though there is flexibility, depending on the priorities of the Organization. Each will have a number of metrics that are reported and trended to various targets set by senior executives. Variance to targets is shown, usually with a root cause or comment accompanying each metric. Reporting is usually on a set interval, and ownership of each metric is assigned so that performance improvement expectations can be enforced.
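A red / yellow / green status rule might be sketched as follows. The 10% yellow band and the metric names and values are illustrative assumptions; in practice the thresholds are business rules set by the metric owners.

```python
def metric_status(actual, target, yellow_band=0.10):
    """Green at or above target; yellow within 10% below; red otherwise."""
    if actual >= target:
        return "green"
    if actual >= target * (1 - yellow_band):
        return "yellow"
    return "red"

# Invented metrics and targets, one per scorecard quadrant owner.
scorecard = {
    "Customer satisfaction": metric_status(0.92, 0.90),  # at/above target
    "On-time delivery":      metric_status(0.80, 0.95),  # well below target
    "Employee retention":    metric_status(0.88, 0.90),  # just below target
}
print(scorecard)
# {'Customer satisfaction': 'green', 'On-time delivery': 'red', 'Employee retention': 'yellow'}
```

Each metric would additionally carry its trend, variance commentary, and an assigned owner, per the reporting interval described above.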
In his book on performance dashboards, Wayne Eckerson provides in-depth coverage of the types and architectures of dashboards. The purpose of presenting this information is to provide an example of the way various BI techniques combine to create a rich, integrated BI environment. Figure 9.6 is an adaptation of a related TDWI publication7*.
Figure 9.6 The Three Threes of Performance Dashboards
9.3.4.5 Performance Management Tools
Performance management applications include budgeting, planning, and financial consolidation. There have been a number of major acquisitions in this segment, as ERP vendors and BI vendors see great growth opportunities here and believe BI and Performance Management are converging. On the customer buying side, the degree to which customers buy BI and performance management from the same vendor depends on product capabilities, but also on the degree to which the CFO and CIO co-operate. It is important to note that budgeting and planning does not apply only to financial metrics, but to workforce, capital, and so on, as well.
9.3.4.6 Predictive Analytics and Data Mining Tools
Data mining is a particular kind of analysis that reveals patterns in data using various algorithms. Whereas standard query and reporting tools require you to ask a specific question, a data mining tool helps users discover relationships or patterns in a more exploratory fashion. Predictive analytics (“what-if” analysis) allows users to create a model, test the model against actual data, and then project future results. Underlying engines may be neural networks or inference engines.
Use data mining in predictive analysis, fraud detection, root cause analysis (through clustering), customer segmentation and scoring, and market basket analysis. Although data mining is one segment of the BI market, it continues to be an application reserved for specialist users. In the past, statisticians have largely extracted data from source systems and data warehouses to perform analyses outside of the BI environment. Recent partnerships between BI and DB vendors are providing tighter coupling and integrating of analytic processing and DB capabilities. Typically flat file extracts are used to train the engine, and then a full run on a source database is performed, producing statistical reports and charts.
Note that a good strategy for interfacing with many data mining tools is to work with the business analysts to define the data set needed for analysis, and then arrange for a periodic file extract. This strategy offloads the intense multi-pass processing involved in data mining from the DW, and many data mining tools work with file-based input, as well.
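The periodic-file-extract strategy might be sketched with the standard csv module. The column list agreed with the business analysts, and the row shapes, are illustrative assumptions.

```python
import csv
import io

def extract_for_mining(rows, columns):
    """Emit only the agreed analysis columns as a flat CSV extract."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any columns not in the agreed data set.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical DW rows; internal_note is deliberately excluded from the extract.
rows = [{"customer_id": 1, "visits": 12, "spend": 340.0, "internal_note": "x"},
        {"customer_id": 2, "visits": 3,  "spend": 55.0,  "internal_note": "y"}]
print(extract_for_mining(rows, ["customer_id", "visits", "spend"]))
```

Scheduling this extract (rather than letting the mining tool query the DW directly) is what offloads the multi-pass processing described above.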
9.3.4.7 Advanced Visualization and Discovery Tools
Advanced visualization and discovery tools often use an in-memory architecture to allow users to interact with the data in a highly visual, interactive way. Patterns that are difficult to recognize in a display of numbers can be picked up visually fairly quickly when thousands of data points are loaded into a sophisticated display on a single page.
The difference in these tools versus most dashboard products is usually in:
9.3.5 Process Data for Business Intelligence
The lion’s share of the work in any DW-BIM effort is in the preparation and processing of the data. This section introduces some of the architectural components and sub-activities involved in processing data for BI.
9.3.5.1 Staging Areas
A staging area is the intermediate data store between an original data source and the centralized data repository. All required cleansing, transformation, reconciliation, and relationship building happen in this area.
Advanced architectures implement these processes in a well-defined and progressive manner. Dividing the work reduces the overall complexity, and makes debugging much simpler. Having an initial staging area is a common, simple strategy to offload a complete set of data from the respective source system as-is, i.e., with no transforms.
A change-capture mechanism reduces the volume of transmitted data sets. Several months to a few years of data can be stored in this initial staging area. Benefits of this approach include:
Use subsequent design components to filter out all but the data needed for business priorities, and perform conforming and normalization iteratively and progressively. Designs that further separate data conforming—such as conforming types and value sets—from merging and normalization will be simpler to maintain. Many architectures name this layer data integration and transformation to distinguish it from the simple copy-only staging area.
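A simple change-capture step over two as-is staging snapshots might look like the following sketch. The business keys and row shapes are hypothetical; the point is that only the deltas are forwarded downstream.

```python
def capture_changes(previous, current):
    """Compare two snapshots ({business_key: row}) and return the deltas."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    deletes = set(previous) - set(current)   # keys gone from the source
    return inserts, updates, deletes

yesterday = {"A1": {"amount": 10}, "A2": {"amount": 20}}
today     = {"A1": {"amount": 10}, "A2": {"amount": 25}, "A3": {"amount": 5}}

inserts, updates, deletes = capture_changes(yesterday, today)
print(inserts, updates, deletes)
# {'A3': {'amount': 5}} {'A2': {'amount': 25}} set()
```

Real change-capture mechanisms more often read source logs or timestamps than diff full snapshots, but the snapshot diff shows why the transmitted volume shrinks to the changed rows only.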
9.3.5.2 Mapping Sources and Targets
Source-to-target mapping is the documentation activity that defines data type details and transformation rules for all required entities and data elements, from each individual source to each individual target. DW-BIM adds requirements beyond the classic source-to-target mapping encountered in any typical data migration. In particular, one of the goals of the DW-BIM effort should be to provide a complete lineage for each data element available in the BI environment, all the way back to its respective source(s).
The most difficult part of any mapping effort is determining valid links between data elements in multiple equivalent systems. Consider the effort to consolidate data into an EDW from multiple billing or order management systems. Chances are that tables and fields containing equivalent data do not have the same names or structures. A solid taxonomy is necessary to match the data elements in different systems into a consistent structure in the EDW. The Business must sign off on the gold sources or system(s) of record.
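A source-to-target mapping can itself be captured as data, which is one way to make the lineage machine-readable. The source systems, field names, and transforms below are hypothetical.

```python
# Each target element records its source system, source field, and
# transformation rule -- the same record that documents its lineage.
mapping = [
    {"target": "customer_name", "source_system": "billing",
     "source_field": "CUST_NM", "transform": str.strip},
    {"target": "order_total",   "source_system": "orders",
     "source_field": "TOT_AMT", "transform": float},
]

def apply_mapping(source_row, mapping):
    """Build a target row by applying each documented transformation rule."""
    return {m["target"]: m["transform"](source_row[m["source_field"]])
            for m in mapping}

row = {"CUST_NM": "  Acme Corp ", "TOT_AMT": "125.50"}
print(apply_mapping(row, mapping))
# {'customer_name': 'Acme Corp', 'order_total': 125.5}
```

Keeping the mapping as data rather than buried in ETL code also lets the meta-data repository answer "where did this element come from?" directly.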
9.3.5.3 Data Cleansing and Transformations (Data Acquisition)
Data cleansing focuses on the activities that correct and enhance the domain values of individual data elements, including enforcement of standards. Cleansing is particularly necessary for initial loads where significant history is involved. The preferred strategy is to push data cleansing and correction activity back to the source systems, whenever possible.
Strategies must be developed for rows of data that are loaded but later found to be incorrect. A policy of deleting old records may cause havoc with related tables and surrogate keys; expiring the row and loading the corrected data as a whole new row may be a better option.
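The expire-and-insert option can be sketched as follows. The field names and single-table representation are illustrative; the point is that the bad row's surrogate key survives, so fact rows that reference it are never orphaned.

```python
from datetime import date

def correct_row(table, surrogate_key, corrected, today, next_key):
    """Expire the bad row (don't delete it) and load the fix as a new row."""
    for row in table:
        if row["key"] == surrogate_key and row["expiration_date"] is None:
            row["expiration_date"] = today          # expire, don't delete
    table.append({"key": next_key, **corrected,
                  "effective_date": today, "expiration_date": None})

# One incorrectly loaded row, referenced elsewhere by surrogate key 1.
table = [{"key": 1, "amount": 999.0,
          "effective_date": date(2024, 1, 1), "expiration_date": None}]
correct_row(table, 1, {"amount": 99.0}, date(2024, 2, 1), next_key=2)

active = [r for r in table if r["expiration_date"] is None]
print(len(table), active[0]["amount"])  # 2 99.0
```

Deleting key 1 instead would have left any fact rows holding that key pointing at nothing; expiring it preserves both referential integrity and the audit trail.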
Data transformation focuses on activities that provide organizational context between data elements, entities, and subject areas. Organizational context includes cross-referencing, reference and master data management (see Chapter 8), and complete and correct relationships. Data transformation is an essential component of being able to integrate data from multiple sources. Data transformation development requires extensive involvement with Data Governance.
9.3.6 Monitor and Tune Data Warehousing Processes
Transparency and visibility are the key principles that should drive DW-BIM monitoring. The more one can expose the details of DW-BIM activities, the more end-customers can see and understand what is going on (and have confidence in the BI), and the less direct end-customer support will be required. Providing a dashboard that exposes the high-level status of data delivery activities, with drill-down capability, is a best practice that allows an on-demand pull of information by both support personnel and customers. Adding data quality measures enhances the value of this dashboard, since performance is more than just speed and timing.
Processing should be monitored across the system for bottlenecks and dependencies among processes. Database tuning techniques should be employed where and when needed, including partitioning and tuned backup and recovery strategies. Archiving is a difficult subject in data warehousing. Users often consider the data warehouse an active archive due to the long histories that are built, and are unwilling, particularly if the OLTP sources have dropped records, to see the data warehouse engage in archiving.
Management by exception is a good policy to apply here. Success messages will typically be ignored, but sending attention messages upon failure is a prudent addition to a monitoring dashboard.
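The management-by-exception idea can be sketched as a wrapper that stays quiet on success and raises an alert only on failure. The step names and logging configuration are illustrative.

```python
import logging

# Successes are logged at INFO and stay quiet under the WARNING threshold;
# only failures surface, which is the management-by-exception policy.
logging.basicConfig(level=logging.WARNING)

def run_step(name, step):
    """Run one load step; make noise only when it fails."""
    try:
        result = step()
        logging.info("step %s succeeded", name)  # suppressed at WARNING level
        return result
    except Exception:
        logging.error("step %s FAILED - attention required", name)
        raise
```

In practice the `logging.error` call would feed the monitoring dashboard or an alerting channel rather than stderr.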
9.3.7 Monitor and Tune BI Activity and Performance
A best practice for BI monitoring and tuning is to define and display a set of customer-facing satisfaction metrics. Average query response time and the number of users per day / week / month are examples of useful metrics to display. In addition to displaying the statistical measures available from the systems, it is useful to survey DW-BIM customers regularly.
Regular review of usage statistics and patterns is essential. Reports providing frequency and resource usage of data, queries, and reports allow prudent enhancement. Tuning BI activity is analogous to the principle of profiling applications in order to know where the bottlenecks are and where to apply optimization efforts. The creation of indexes and aggregations is most effective when done according to usage patterns and statistics. Tremendous performance gains can come from simple solutions such as posting the completed daily results to a report that runs hundreds or thousands of times a day.
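The "post the completed daily results" idea is essentially compute-once, serve-many. A minimal sketch, with an invented report function standing in for an expensive warehouse query:

```python
import functools

call_count = 0  # tracks how often the expensive computation actually runs

@functools.lru_cache(maxsize=None)
def daily_sales_summary(business_date):
    """Stand-in for an expensive aggregate query; cached per business date."""
    global call_count
    call_count += 1
    return {"date": business_date, "total": 100_000}  # illustrative result

# 500 report runs against the same day trigger only one real computation.
for _ in range(500):
    daily_sales_summary("2024-06-01")
```

In a warehouse, the cached copy would typically be a posted report file or a materialized aggregate table rather than an in-process cache, but the performance argument is the same.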
9.4 Summary
The sections below summarize the guiding principles for implementing data warehousing and business intelligence management in an organization, a table of the roles for each data warehousing and business intelligence activity, and the organizational and cultural issues that may arise during data warehousing and business intelligence management.
9.4.1 Guiding Principles
The implementation of the data warehousing and business intelligence management function into an organization follows eleven guiding principles:
9.4.2 Process Summary
The process summary for the data warehousing and business intelligence management function is shown in Table 9.9. The deliverables, responsible roles, approving roles, and contributing roles are shown for each activity in the data warehousing and business intelligence management function. This table is also shown in Appendix A9.
| Activities | Deliverables | Responsible Roles | Approving Roles | Contributing Roles |
|---|---|---|---|---|
| 7.1 Understand Business Intelligence Information Needs (P) | DW-BIM Project Requirements | Data / BI Analyst, BI Program Manager, SME | Data Steward, Business Executives and Managers | Meta-Data Specialist, Business Process Lead |
| 7.2 Define the Data Warehouse / BI Architecture (P) (same as 2.1.5) | Data Warehouse / Business Intelligence Architecture | Data Warehouse Architect, Business Intelligence Architect | Enterprise Data Architect, DM Executive, CIO, Data Architecture Steering Committee, Data Governance Council | Business Intelligence Specialists, Data Integration Specialists, DBAs, Other Data Mgmt. Professionals, IT Architects |
| 7.3 Implement Data Warehouses and Data Marts (D) | Data Warehouses, Data Marts, OLAP Cubes | Business Intelligence Specialists | Data Warehouse Architect, Data Stewardship Teams | Data Integration Specialists, DBAs, Other Data Mgmt. Professionals, Other IT Professionals |
| 7.4 Implement Business Intelligence Tools and User Interfaces (D) | BI Tools and User Environments, Query and Reporting, Dashboards, Scorecards, Analytic Applications, etc. | Business Intelligence Specialists | Data Warehouse Architect, Data Stewardship Committee, Data Governance Council, Business Executives and Managers | Data Warehouse Architect, Other Data Mgmt. Professionals, Other IT Professionals |
| 7.5 Process Data for Business Intelligence (O) | Accessible Integrated Data, Data Quality Feedback Details | Data Integration Specialists | Data Stewards | Other Data Mgmt. Professionals, Other IT Professionals |
| 7.6 Monitor and Tune Data Warehousing Processes (C) | DW Performance Reports | DBAs, Data Integration Specialists | IT Operators | |
| 7.7 Monitor and Tune BI Activity and Performance (C) | BI Performance Reports, New Indexes, New Aggregations | Business Intelligence Specialists, DBAs, Business Intelligence Analysts | | Other Data Mgmt. Professionals, IT Operators, IT Auditors |
Table 9.9 DW and BI Management Process Summary
9.4.3 Organizational and Cultural Issues
Q1: I can’t get CEO / CIO support. What can I do?
A1: Try to discover what their burning business problems and issues are and align your project with providing solutions to those.
Q2: How do I balance the pressures of individual project delivery with DW / BI program goals of building out re-usable data and infrastructure?
A2a: Build out re-usable infrastructure and data a piece at a time.
A2b: Use the DW bus matrix as a communication and marketing tool. On a project-by-project basis, negotiate a give-and-take: e.g., “Here are the conformed dimensions that other projects have developed that this project gets to benefit from,” and “Here are the ones we are asking this project to contribute to building so that future projects can benefit.”
A2c: Don’t apply the same rigor and overhead to all data sources. Relax the rules / overhead for single source, project-specific data. Use business priorities to determine where to apply extra rigor. In short, use the classic 80 / 20 rule: 80% of the value comes from 20% of the data. Determine what that 20% is and focus on it.
9.5 Recommended Reading
The references listed below provide additional reading that support the material presented in Chapter 9. These recommended readings are also included in the Bibliography at the end of the Guide.
9.5.1 Data Warehousing
Adamson, Christopher. Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance. John Wiley & Sons, 2006. ISBN 0-471-77709-9. 345 pages.
Adamson, Christopher and Michael Venerable. Data Warehouse Design Solutions. John Wiley & Sons, 1998. ISBN 0-471-25195-X. 544 pages.
Adelman, Sid and Larissa T. Moss. Data Warehouse Project Management. Addison-Wesley Professional, 2000. ISBN 0-201-61635-1. 448 pages.
Adelman, Sid and others. Impossible Data Warehouse Situations: Solutions from the Experts. Addison-Wesley, 2002. ISBN 0-201-76033-9. 432 pages.
Brackett, Michael. The Data Warehouse Challenge: Taming Data Chaos. New York: John Wiley & Sons, 1996. ISBN 0-471-12744-2. 579 pages.
Caserta, Joe and Ralph Kimball. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming and Delivering Data. John Wiley & Sons, 2004. ISBN 0-764-56757-8. 525 pages.
Corey, Michael J. and Michael Abbey. Oracle Data Warehousing: A Practical Guide to Successful Data Warehouse Analysis, Build and Roll-Out. TATA McGraw-Hill, 1997. ISBN 0-074-63069-5.
Covey, Stephen R. The 7 Habits of Highly Effective People. Free Press, 2004. ISBN 0743269519. 384 Pages.
Dyche, Jill. E-Data: Turning Data Into Information With Data Warehousing. Addison- Wesley, 2000. ISBN 0-201-65780-5. 384 pages.
Gill, Harjinder S. and Prakash C. Rao. The Official Guide To Data Warehousing. Que, 1996. ISBN 0-789-70714-4. 382 pages.
Hackney, Douglas. Understanding and Implementing Successful Data Marts. Addison Wesley, 1997. ISBN 0-201-18380-3. 464 pages.
Imhoff, Claudia, Nicholas Galemmo and Jonathan G. Geiger. Mastering Data Warehouse Design: Relational and Dimensional Techniques. John Wiley & Sons, 2003. ISBN 0-471-32421-3. 456 pages.
Imhoff, Claudia, Lisa Loftis and Jonathan G. Geiger. Building the Customer-Centric Enterprise: Data Warehousing Techniques for Supporting Customer Relationship Management. John Wiley & Sons, 2001. ISBN 0-471-31981-3. 512 pages.
Inmon, W. H. Building the Data Warehouse, 4th Edition. John Wiley & Sons, 2005. ISBN 0-764-59944-5. 543 pages.
Inmon, W. H. Building the Operational Data Store, 2nd edition. John Wiley & Sons, 1999. ISBN 0-471-32888-X. 336 pages.
Inmon, W. H., Claudia Imhoff and Ryan Sousa. The Corporate Information Factory, 2nd edition. John Wiley & Sons, 2000. ISBN 0-471-39961-2. 400 pages.
Inmon, W. H. and Richard D. Hackathorn. Using the Data Warehouse. Wiley-QED, 1994. ISBN 0-471-05966-8. 305 pages.
Inmon, William H., John A. Zachman and Jonathan G. Geiger. Data Stores, Data Warehousing and the Zachman Framework. McGraw-Hill, 1997. ISBN 0-070-31429-2. 358 pages.
Kimball, Ralph and Margy Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd edition. New York: John Wiley & Sons, 2002. ISBN 0-471-20024-7. 464 pages.
Kimball, Ralph, Laura Reeves, Margy Ross and Warren Thornthwaite. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses. John Wiley & Sons, 1998. ISBN 0-471-25547-5. 800 pages.
Kimball, Ralph and Richard Merz. The Data Webhouse Toolkit: Building the Web-Enabled Data Warehouse. John Wiley & Sons, 2000. ISBN 0-471-37680-9. 416 pages.
Mattison, Rob, Web Warehousing & Knowledge Management. McGraw Hill, 1999. ISBN 0-070-41103-4. 576 pages.
Morris, Henry. Analytic Applications and Business Performance Management. DM Review Magazine, March, 1999. www.dmreview.com. Note: www.dmreview.com is now www.information-management.com.
Moss, Larissa T. and Shaku Atre. Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Addison-Wesley, 2003. ISBN 0-201-78420-3. 576 pages.
Poe, Vidette, Patricia Klauer and Stephen Brobst. Building A Data Warehouse for Decision Support, 2nd edition. Prentice-Hall, 1997. ISBN 0-137-69639-6. 285 pages.
Ponniah, Paulraj. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. John Wiley & Sons – Interscience, 2001. ISBN 0-471-41254-6. 528 pages.
Westerman, Paul. Data Warehousing: Using the Wal-Mart Model. Morgan Kaufman, 2000. ISBN 155860684X. 297 pages.
9.5.2 Business Intelligence
Biere, Mike. Business Intelligence for the Enterprise. IBM Press, 2003. ISBN 0-131-41303-1. 240 pages.
Bischoff, Joyce and Ted Alexander. Data Warehouse: Practical Advice from the Experts. Prentice-Hall, 1997. ISBN 0-135-77370-9. 428 pages.
Eckerson, Wayne W. Performance Dashboards: Measuring, Monitoring, and Managing Your Business. Wiley, 2005. ISBN 0-471-72417-3. 320 pages.
Howson, Cindi. “The Business Intelligence Market”. http://www.biscorecard.com/. Requires annual subscription to this website.
Malik, Shadan. Enterprise Dashboards: Design and Best Practices for IT. Wiley, 2005. ISBN 0471738069. 240 pages.
Moss, Larissa T., and Shaku Atre. Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Addison-Wesley, 2003. ISBN 0-201-78420-3. 576 pages.
Vitt, Elizabeth, Michael Luckevich and Stacia Misner. Business Intelligence. Microsoft Press, 2008. ISBN 073562660X. 220 pages.
9.5.3 Data Mining
Cabena, Peter, Hadjnian, Stadler, Verhees and Zanasi. Discovering Data Mining: From Concept to Implementation. Prentice Hall, 1997. ISBN-10: 0137439806
Delmater, Rhonda and Monte Hancock Jr. Data Mining Explained, A Manager’s Guide to Customer-Centric Business Intelligence. Digital Press, Woburn, MA, 2001. ISBN 1-5555-8231-1.
Rud, Olivia Parr. Data Mining Cookbook: Modeling Data for Marketing, Risk and Customer Relationship Management. John Wiley & Sons, 2000. ISBN 0-471-38564-6. 367 pages.
9.5.4 OLAP
Thomsen, Erik. OLAP Solutions: Building Multidimensional Information Systems, 2nd edition. Wiley, 2002. ISBN-10: 0471400300. 688 pages.
Wrembel, Robert and Christian Koncilia. Data Warehouses and OLAP: Concepts, Architectures and Solutions. IGI Global, 2006. ISBN 1-599-04364-5. 332 pages.
http://www.olapcouncil.org/research/resrchly.htm