Chapter 6.4

Introduction to Data Vault Methodology

Abstract

One of the most important components of the end-state architecture is that of the data vault. The data vault exists to satisfy the need for rock-solid data integrity. Like all other components of the end-state architecture, the data vault has gone through its own evolution. And like all components of the end-state architecture, data vault will continue to evolve.

Keywords

Six Sigma; TQM; PMP; SEI; CMMI; TCO; ROI

Data Vault 2.0 Methodology Overview

The Data Vault 2.0 standard provides a best practice for project execution, which is called the "Data Vault 2.0 methodology." It is derived from core software engineering standards and adapts them for use in data warehousing. Fig. 6.4.1 shows the standards that have influenced the Data Vault 2.0 methodology.

Fig. 6.4.1 Data Vault 2.0 methodology overview.

The methodology for data vault projects is based on best practices pulled from disciplined agile delivery (DAD), automation and optimization principles (CMMI, KPAs, and KPIs), Six Sigma error tracking and reduction principles, Lean Initiatives, and cycle time reduction principles.

In addition, the data vault methodology takes into account a notion known as managed self-service BI. The notion of managed self-service BI is introduced in the Data Vault 2.0 architecture section of this chapter.

The idea of the methodology is to provide teams with current best practices and a well-laid-out IT process for building data warehouse (business intelligence) systems in a repeatable, reliable, and rapid fashion.

How Does CMMI Contribute to the Methodology?

Carnegie Mellon's Capability Maturity Model Integration (CMMI) contains the foundations of management, measurement, and optimization. These components are applied to the methodology at the levels of key process areas (KPAs) and key performance indicators (KPIs). These pieces are necessary in order to understand and define what the business processes are and should be around the implementation and life cycle of the business intelligence build-out.

The business of building business intelligence solutions needs to mature. In order to accomplish these goals, the implementation team must first accept that a BI system is a software product. As such, software development life cycle (SDLC) components, along with best practices of managing, identifying, measuring, and optimizing must be applied—particularly if the team is to become and remain agile going forward. Fig. 6.4.2 demonstrates how CMMI levels map to Data Vault 2.0 methodology. It is not a complete map, just a representative portion of the entire piece.

Fig. 6.4.2 CMMI mapped to Data Vault 2.0.

The end goal of CMMI is optimization. Optimization cannot be achieved without metrics (quantitative measurements) or KPIs. These KPIs cannot be achieved without the KPAs or definitions of key areas to be measured, which of course cannot be achieved without first managing the project.
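To make this concrete, the following minimal sketch (a hypothetical illustration; the record layout, KPA names, and figures are invented and are not part of the CMMI or Data Vault 2.0 specifications) shows how a team might record measurements for a KPA per sprint and compute simple KPIs such as defect rate and cycle time:

from dataclasses import dataclass
from statistics import mean

@dataclass
class SprintRecord:
    """One sprint's raw measurements for a key process area (KPA)."""
    sprint: int
    kpa: str              # e.g., "data integration" (hypothetical KPA name)
    work_items: int       # units of work delivered
    defects: int          # defects found in review/testing
    cycle_days: float     # elapsed days from start to accepted delivery

def defect_rate(r: SprintRecord) -> float:
    """KPI: defects per work item delivered."""
    return r.defects / r.work_items if r.work_items else 0.0

def kpi_report(history: list[SprintRecord]) -> None:
    """Print per-sprint KPIs plus an average cycle time for the retrospective."""
    for r in history:
        print(f"sprint {r.sprint:>2} [{r.kpa}]: "
              f"defect rate={defect_rate(r):.2%}, cycle={r.cycle_days:.0f} days")
    print(f"average cycle time: {mean(r.cycle_days for r in history):.1f} days")

# Invented sample data for illustration only.
history = [
    SprintRecord(1, "data integration", work_items=20, defects=5, cycle_days=49),
    SprintRecord(2, "data integration", work_items=22, defects=3, cycle_days=42),
    SprintRecord(3, "data integration", work_items=25, defects=2, cycle_days=28),
]
kpi_report(history)

Reviewing such KPIs sprint over sprint is what makes the later optimization steps possible.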

The road to agility is paved with metrics and well-defined/well-understood business processes. The Data Vault 2.0 methodology relies on the necessary components of CMMI in order to establish a solid foundation on which to build and automate enterprise business intelligence systems.

Taking a step back, here is a simplified definition of what CMMI cares about:

In CMMI, process management is the central theme. It represents learning and honesty as demonstrated through work according to a process. Process also enables transparency by communicating how work should be done. Such transparency is within the project, among projects, and being clear about expectations. Also, measurement is part of process and product management and provides the information needed to make decisions that guide product development.

(http://resources.sei.cmu.edu/asset_files/TechnicalNote/2008_004_001_14924.pdf, p. 17)

CMMI brings consistency to the processes; it also brings manageability, documentation, and cost control. CMMI helps the people assigned to the project execute with a specific quality metric in mind. It also assists with the measurements of those metrics by identifying common processes that must happen in every business intelligence system.

CMMI provides the framework to operate within. Teams implementing the Data Vault 2.0 methodology inherit the best parts of the CMMI level 5 specifications and can hit the ground running. Why? Because the Data Vault 2.0 methodology provides the transparency, defines many of the KPAs and KPIs, and enriches the project process by providing template-based, predefined deliverables that are utilized during the implementation phases.

Transparency is implemented in the Data Vault 2.0 projects in different manners. The first recommendation for teams is to set up an in-company wiki, one that can reach any and all employees (including executives) in the firm. All meetings, all models, all templates, designs, metadata, and documentation should be recorded in the wiki.

The wiki should be updated at least once a day (if not more) by different members of the team. There will be more updates at the start or kickoff of new projects than at any other time during the life cycle. This should indicate a level of communication (which is stressed in agile/Scrum) with the business users.

The second component is the recording of business requirement meetings. Business requirement meeting time can be shortened, and the quality of the requirements increases, when the meetings themselves are recorded with an MP3 recorder. The audio files should be submitted to the wiki so that team members (if out of the office) can retroactively attend or review when necessary.

This leads to a more agile business requirement meeting. Noisemakers in these meetings tend to stay quiet when the meeting is recorded, unless they have a significant contribution to make that would impact the outcome of the project goals. Please note that the rest of the explanation of how and why this works is beyond the scope of this book and is available in Data Vault 2.0 Boot Camp training classes and online at http://LearnDataVault.com.

If CMMI Is So Great, Why Should We Care About Agility Then?

Agile approaches such as Scrum and disciplined agile delivery (DAD) are still necessary to manage the individual sprint cycles or miniprojects that need to occur. CMMI manages the overall enterprise goals and provides a baseline consistency to the enterprise-wide efforts, so everyone in IT is on the same page (at least those involved with the BI project).

An agile implementation should be tailored to match an organization's actual maturity level; however, implementing agile when an organization is at CMMI level three can result in less rework and improve the overall CMMI initiative while providing the significant benefits of agile. Implementing a CMMI compliant software development process that is also agile will bring the repeatability and predictability offered by CMMI. Agile, by design, is highly adaptable and therefore can be molded into a CMMI-compliant software development process without altering the primary objectives set forth by the Agile Manifesto.

(https://www.scrumalliance.org/community/articles/2008/july/agile-and-cmmi-better-together)

Please keep in mind that teams don’t wake up one day and just decide to be agile right there on the spot. It's an evolutionary process; the team must undergo training both in agile and in Data Vault 2.0 methodology in order to achieve the desired goals. Most teams that undertake training with Data Vault 2.0 start with 7-week sprint cycles (if they have had zero exposure to CMMI and agile previously).

Usually, the second sprint cycle reduces 7 weeks to 6 weeks. The third (if the team is working in earnest and measuring their productivity and following the agile and Scrum review process) can see sprint cycles drop to about 4 weeks. From there, it simply improves to 2 weeks as the team gets better at it. Currently, there is a team implementing Data Vault 2.0 methodology and attempting to achieve 1-week sprint cycles. There doesn’t seem to be a bottleneck to optimizing the processes.

But as a reminder, where does the optimization of these processes come from? From CMMI, in direct correlation with the KPAs and KPIs of building a data warehouse. It is tied as well to repeatable designs, pattern-based data integration, pattern-based models, and, yes, pattern-based business intelligence build cycles. This is the value of the Data Vault 2.0 methodology: it provides the patterns out of the gate, kick-starting teams in the right direction.

Why Include PMP and SDLC If CMMI and Agile Should Be All That's Needed?

CMMI doesn't describe how to achieve these goals; it just describes what should be in place. Agile doesn't describe what you need, but rather how to manage the people and the life cycle. Project and SDLC components are necessary for the next step: pattern-based development and delivery. The next pieces of the puzzle come from the project management professional (PMP) body of practice and the software development life cycle (SDLC). PMP lays the foundation for common project best practices.

While the team strives to be agile in the end, at some level, waterfall project practices must be adhered to. Otherwise, a project cannot progress through its lifecycle to completion.

According to the project management body of knowledge (PMBOK) guide:

The project management framework embodies a project life cycle and five major project management process groups:

  •  Initiating
  •  Planning
  •  Executing
  •  Monitoring and controlling
  •  Closing

Reference: http://encyclopedia.thefreedictionary.com/Project+Management+Professional

The difference is that this “lifecycle” is now assigned to a 2-week sprint, with disciplined agile delivery (DAD) overseeing the process.

There are several components to how this fits in with the Data Vault 2.0 methodology. First, there is the master project—the overall enterprise-wide vision. This generally consists of a multiyear, large-scale effort (for large enterprises). These projects are then often broken into subprojects (as they should be), with outlined goals and objectives within 6-month time frames.

Then, the subprojects should be broken into 2-week sprint cycles (to meet agile requirements). The idea is to not have the project levels become top-heavy and full of planning, but rather to act as an overall guide or map from start to finish in terms of what the enterprise business intelligence solution needs to provide.

At the end of the day, project managers should have a firm grasp on what they are managing (CMMI), how they will manage the people (agile/Scrum/DAD), how the sprints have to be lined up in order to accomplish the goals and objectives of the enterprise, and how to measure the success or failure of particular parts of the process. Otherwise, without hindsight or measurement, there will be no room for improvement or optimization.

So Then, What Does Six Sigma Contribute to the Data Vault 2.0 Methodology?

Six Sigma is defined as follows:

Six Sigma seeks to improve the quality of process outputs by identifying and removing the causes of defects (errors) and minimizing variability in manufacturing and business processes. It uses a set of quality management methods, including statistical methods, and creates a special infrastructure of people within the organization ("Champions," "Black Belts," "Green Belts," "Yellow Belts," etc.) who are experts in these methods.

(http://en.wikipedia.org/wiki/Six_Sigma)

To paraphrase for enterprise BI projects, Six Sigma is all about measuring and eliminating the defects that plague the enterprise warehouse build process. The Data Vault 2.0 methodology attaches the Six Sigma school of thought to the metrics that are captured in the life cycle of each sprint (i.e., the KPIs and the Scrum review process: what's broken, why, and how do we fix it).

In order to reach full optimization (or full maturity) for the enterprise BI initiatives, all miniprojects or minisprints must reach their full optimization as well. Otherwise, the organization cannot achieve CMMI level 5. The Data Vault 2.0 methodology outlines (in some levels of detail) how to tie these components together.

Once the team understands that all work is measured, monitored, and eventually optimized, Six Sigma mathematics can provide the business with a confidence rating, showing the improvement (or not) of the enterprise BI team and their progress as a whole. This is only part of the nature of total cost of ownership (TCO) and reducing TCO while improving return on investment (ROI) for the business.
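As a worked illustration of what such a rating involves (a minimal sketch with invented figures; the Data Vault 2.0 standard does not prescribe this particular code), the following converts the error counts of a warehouse load into the standard Six Sigma DPMO metric and an approximate sigma level:

from statistics import NormalDist

def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities: the core Six Sigma defect metric."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Convert DPMO to a short-term sigma level using the conventional 1.5-sigma shift."""
    yield_fraction = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift

# Hypothetical warehouse load: 250,000 rows, 4 checked attributes per row, 180 errors.
d = dpmo(defects=180, units=250_000, opportunities_per_unit=4)
print(f"DPMO: {d:,.0f}")                      # 180 defects per million opportunities
print(f"sigma level: {sigma_level(d):.2f}")   # roughly 5.1 sigma

Trending this sigma level across sprints is one way the business can see, quantitatively, whether the BI team's processes are actually improving.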

The Data Vault 2.0 methodology provides the patterns, the artifacts, and the repeatable processes for building an enterprise BI solution effectively, in a measured and applied manner. Six Sigma assists in optimizing the teams and the implementation methods in order to streamline agility and improve quality overall. In other words, without Six Sigma, the words "better, faster, cheaper" cannot apply to business intelligence projects.

Where Does TQM (Total Quality Management) Fit into All of This?

Total quality management (TQM) is the icing on the cake (as it were): it is necessary in order to keep the moving parts of the enterprise BI solution well oiled and running smoothly. As it turns out, TQM plays several roles in the Data Vault 2.0 methodology; these roles are briefly introduced and discussed below. In order to better understand TQM, a definition is in order:

Total quality management (TQM) consists of organization-wide efforts to install and make permanent a climate in which an organization continuously improves its ability to deliver high-quality products and services to customers.

(http://en.wikipedia.org/wiki/Total_quality_management)

The Data Vault 2.0 methodology incorporates and aligns the goals and functions of TQM with the purpose of producing better, faster, cheaper business intelligence solutions. It is actually hard to imagine enterprise-focused projects being run any other way. TQM offers a view consistent with the business users and the deliverables that the enterprise BI project strives to provide. Some of the fundamental primary elements behind TQM include the following:

  •  Customer focus
  •  Total employee involvement
  •  Process-centered thinking
  •  An integrated system
  •  A strategic and systematic approach
  •  Continual improvement
  •  Fact-based decision-making
  •  Communications

It is clear by now that TQM plays a vital role in the success of data warehousing and BI projects. TQM is aligned (as previously described) with the desired outcomes of CMMI, Six Sigma, agile/Scrum, and DAD.

The Data Vault 2.0 methodology is process-centered, provides for an integrated system, is a strategic and systematic approach, requires total employee involvement, is customer-focused, and relies on transparency and communications. The Data Vault 2.0 model brings fact-based decision-making to the table, rather than "truth"-based or subjective decision-making. The other part of fact-based decision-making is driven by the KPAs and KPIs collected in the enterprise BI project (don't forget, these are a part of the optimization steps in CMMI level 5).

As it turns out, accountability (both for the system as a whole and for the data living in the data warehouse) is a necessary part of TQM as well. How is this possible? TQM is customer-focused; the customer (in this case, the business user) needs to stand up and take ownership of their data (no, not their information, but their data).

The only place in the organization where these data exist in raw form, integrated by business key, is the Data Vault 2.0 data warehouse. It is precisely this understanding of facts that draws the business users' attention to Six Sigma metrics, demonstrating quantitatively the gaps between the business perception of operation and the business reality of data capture over time.

Addressing these gaps, by filing change requests to the source systems or renegotiating the SLAs with the source data providers, is part of the TQM process and part of reducing TCO and improving data quality across the enterprise. TQM plays a role in enriching the BI ecosystem if and only if the business users are held accountable for their own data and decide to engage in gap analysis (the old-fashioned way) by leveraging statistics that show where, and to what percentage, their current business perception (business requirements) is broken. The DV2 methodology provides pathways in the project that the teams and business users can follow to achieve these results.
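To illustrate what such a gap statistic might look like in practice (a minimal sketch, not a prescribed DV2 artifact; the business rule and the sample rows are invented), the following computes the percentage of captured records that contradict a stated business requirement:

from typing import Callable

Row = dict[str, object]
Rule = Callable[[Row], bool]

def gap_percentage(rows: list[Row], rule: Rule) -> float:
    """Percentage of rows where the captured data violates the stated business rule."""
    violations = sum(1 for row in rows if not rule(row))
    return 100.0 * violations / len(rows) if rows else 0.0

# Business perception (hypothetical): "every customer record carries a valid
# business key and a known region."
rule: Rule = lambda r: bool(r.get("customer_key")) and r.get("region") in {"EMEA", "APAC", "AMER"}

rows = [
    {"customer_key": "C001", "region": "EMEA"},
    {"customer_key": "",     "region": "APAC"},   # missing business key
    {"customer_key": "C003", "region": None},     # missing region
]
print(f"perception vs. reality gap: {gap_percentage(rows, rule):.1f}%")  # 66.7%

Trending this percentage over time gives the business user a quantitative view of how far the reality of data capture has drifted from the stated business perception.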

Without the business taking action to close the gaps, TQM dissolves to simple data quality initiatives and does not contribute as heavily or as well to the TCO reduction strategy. Improving the quality of the data and understanding the gaps that exist are vital to the overall success and future of an enterprise BI solution.
