Chapter 4
Metrics Data Visualization

Data Analysis

Data analysis is an essential part of a metrics system. While the metrics data themselves can serve as process indicators, hidden patterns are revealed only by analyzing the data. Traditionally, data analysis is considered a tedious process that depends on rigorous statistical techniques. It is customary to brand data analysis as a “scientific method” that does not suit the practitioner’s style, and the cumbersome nature of some statistical techniques deters people further. Data analysis can be made simple and interesting by using appropriate tools and an effective approach. The tool can be as handy as a spreadsheet with its statistical functions and built-in macros. There are several approaches to effective data analysis; we present here one that reduces process behavior to three dimensions and relies strongly on visual techniques.


Visual Analysis

Before going to statistical techniques, we can analyze data in a much simpler and more effective way by using data visualization. Even if one chooses to do a complex statistical analysis, it is better to do a preliminary analysis of data using visual elements.

We may begin with viewing data in structured tabular forms and transform them to graphs and pictures to gain intuitive insights. One can also use exploratory data analysis (EDA) to reduce the amount of data by clustering and cut down the dimensions by mapping. EDA allows one to explore data as a precursor to more formal statistical analysis. Some view EDA as an integral part of statistical analysis. Such visual analysis reduces the complexity and provides a higher-level summary of the situation.


Rigorous Analysis

Rigorous data analysis brings to one’s mind hypothesis testing, multivariate analysis, design of experiments, and similar sophisticated methods. We find that even with basic analysis such as histograms, control charts, and scatter plots, we can understand process behavior with sufficient depth. We can think of fruitful analysis of metrics data in the corresponding three domains: frequency, time, and relationship.
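The three domains can be sketched numerically before any chart is drawn. The following is a minimal illustration, not a prescribed method: it uses the productivity values of the first twelve work packages in Exhibit 1, and the function names, the bin width of 10, and the 3-sigma limits are assumptions made for the example. A production control chart would usually estimate sigma from moving ranges rather than from the overall standard deviation used here.

```python
from collections import Counter
from statistics import mean, pstdev

# Productivity (Bug/PM) for WP1 to WP12, taken from Exhibit 1.
productivity = [2.08, 10.06, 2.61, 3.66, 53.57, 12.66,
                14.32, 40.97, 11.00, 30.21, 16.13, 6.93]


def frequency_view(values, bin_width=10.0):
    """Frequency domain: count values per bin, as a histogram would."""
    counts = Counter(int(v // bin_width) for v in values)
    return {
        f"{int(b * bin_width)}-{int((b + 1) * bin_width)}": counts[b]
        for b in sorted(counts)
    }


def time_view(values, k=3.0):
    """Time domain: centre line and k-sigma limits of a simple control chart.

    Returns the centre, lower limit, upper limit, and the 1-based positions
    of any points outside the limits.
    """
    centre = mean(values)
    sigma = pstdev(values)  # simplification: overall standard deviation
    lower = max(0.0, centre - k * sigma)  # productivity cannot be negative
    upper = centre + k * sigma
    outside = [i + 1 for i, v in enumerate(values) if not lower <= v <= upper]
    return centre, lower, upper, outside


def relationship_view(xs, ys):
    """Relationship domain: Pearson correlation, the number behind a scatter plot."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

The relationship view needs a second variable, such as effort or size per work package, which Exhibit 1 does not provide; any pairing used to try it out would be hypothetical data.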


Graphical Analysis

A picture is worth a thousand words. Graphical forms have the potential to reveal the intrinsic patterns, otherwise hidden within the raw data. Visualization of data by the human mind is equivalent to execution of highly sophisticated analysis routines, albeit subconsciously. In the first place, data visualization requires creation of graphs, visual icons, and symbols, best done using the computer. It also requires human perception for the detection of patterns.

Data visualization, as an emerging technology, links the world’s two most powerful information-processing systems, the human mind and the modern computer, through visual metaphors.

Creating graphs requires data and further processing such as:

  • Data collection
  • Data structuring
  • Data cleansing
  • Examination of data
  • Creating graphs to visualize data

Perceiving patterns in a visual presentation is essentially a human process that involves the following cognitive elements:

  • Active goals
  • Motivation to find a solution
  • Recollection of experience
  • Application of knowledge
  • Pattern discovery

Visualizing Data

Transforming data into a graph makes it easy to interpret. For example, Exhibit 1 presents productivity data from a bug-fixing process. The raw data columns are difficult to read even though they contain the basic information. Creating a line graph from this data, as shown in Exhibit 2, instantly makes it easy to see productivity, its variations, and trends. A pattern almost hidden in the data now emerges. Thus, the elementary but very useful application of graphs is reducing complexity and enhancing readability. This enables process analysis by the human mind.


Graphical Techniques

The spreadsheet supports many graphical tools for visualizing data. Exhibit 3 contains a list from MS Excel, which can be used to analyze most project situations.

Exhibit 1. Productivity data.

Work Package   Prod.   Work Package   Prod.   Work Package   Prod.   Work Package   Prod.
Ref.          Bug/PM   Ref.          Bug/PM   Ref.          Bug/PM   Ref.          Bug/PM
WP1             2.08   WP32            9.36   WP63           13.88   WP94            5.08
WP2            10.06   WP33          169.34   WP64           24.12   WP95           16.35
WP3             2.61   WP34           10.60   WP65           67.01   WP96            9.62
WP4             3.66   WP35           87.76   WP66            8.72   WP97            6.58
WP5            53.57   WP36            6.07   WP67            6.87   WP98           11.95
WP6            12.66   WP37           15.32   WP68            4.72   WP99            4.99
WP7            14.32   WP38            5.61   WP69            5.42   WP100          36.49
WP8            40.97   WP39           13.10   WP70            8.08   WP101          27.06
WP9            11.00   WP40           14.95   WP71           16.39   WP102          11.98
WP10           30.21   WP41           42.30   WP72           26.80   WP103          23.97
WP11           16.13   WP42           40.45   WP73           12.28   WP104           5.27
WP12            6.93   WP43           14.87   WP74            6.66   WP105           4.01
WP13           17.54   WP44            6.95   WP75            7.10   WP106           6.67
WP14           90.51   WP45            5.27   WP76            6.39   WP107           8.23
WP15           10.02   WP46            2.07   WP77           51.47   WP108           6.37
WP16           13.71   WP47            5.48   WP78           27.60   WP109          11.12
WP17           12.51   WP48            9.26   WP79           11.36   WP110          12.18
WP18            9.87   WP49            2.88   WP80           15.84   WP111           7.69
WP19           33.87   WP50           46.04   WP81            7.88   WP112           4.77
WP20            2.75   WP51           17.68   WP82          116.27   WP113           3.91
WP21            1.20   WP52           41.45   WP83           70.06   WP114           3.06
WP22           11.25   WP53            7.80   WP84           11.21   WP115           8.52
WP23           12.12   WP54           14.48   WP85           15.80   WP116           9.46
WP24            9.79   WP55           11.91   WP86           26.12   WP117           5.54
WP25           29.07   WP56            6.59   WP87            5.42   WP118           9.96
WP26           13.10   WP57           40.47   WP88            9.66   WP119           3.71
WP27            2.53   WP58           10.40   WP89            6.96   WP120           5.03
WP28            8.62   WP59           10.79   WP90           12.13   WP121           3.94
WP29            4.95   WP60            8.48   WP91           15.91   WP122           5.18
WP30            5.21   WP61           16.47   WP92           10.36   WP123           5.30
WP31            7.57   WP62            2.32   WP93           12.02   WP124          11.67

Pie Charts: Distribution Analysis

Pie charts have the inherent power to show distribution patterns. For example, distribution of rework cost among software products is illustrated in Exhibit 4. This chart gives us the picture of a problem at a glance, along with a sense of totality.

Exhibit 2. Productivity graph.


Exhibit 3. MS Excel graphical tools.

• Column
• Bar
• Line chart
• Pie chart
• Scatter plot
• Area
• Doughnut
• Radar
• Surface
• Bubble
• Stock
• Cylinder
• Cone
• Pyramid
• Area block (three-dimensional)
• B & White area
• B & W column (three-dimensional)
• B & W line timescale
• B & W pie
• Blue pie (gradient)
• Colored lines
• Column area
• Columns with depth
• 3D cones
• Floating bars
• Line column
• Line column on two axes
• Line on two axes
• Logarithmic
• Outdoor bars
• Pie explosion (three-dimensional)
• Smooth lines
• Stack of colors
• Tubes

Pie charts have wide application potential and can be applied to almost any decision situation that involves the distribution of a whole among its parts. Some of the common applications are:

  • Distribution of customer complaints
  • Distribution of defects among components
  • Effort distribution
  • Market share analysis
  • Defect discovery analysis
  • Sales analysis
  • HR analysis
  • Downtime analysis

Project ID   Rework Cost ($)
A            3000
B            1500
C            2500
D            3000
E            1000

Exhibit 4. Distribution of rework cost among software products.
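
The arithmetic behind a pie chart is simple: each slice is the item's share of the total. The sketch below works this out for the rework cost data of Exhibit 4; the function names are illustrative assumptions, and a spreadsheet performs the same calculation when it draws the chart.

```python
# Rework cost in dollars per project, from Exhibit 4.
rework_cost = {"A": 3000, "B": 1500, "C": 2500, "D": 3000, "E": 1000}


def pie_shares(values):
    """Return each item's percentage share of the total, rounded to one decimal."""
    total = sum(values.values())
    return {key: round(100 * value / total, 1) for key, value in values.items()}


def slice_angles(values):
    """Return the angle in degrees that each slice occupies in the pie."""
    total = sum(values.values())
    return {key: 360 * value / total for key, value in values.items()}


shares = pie_shares(rework_cost)
angles = slice_angles(rework_cost)
```

Here projects A and D each account for a little over a quarter of the total rework cost of $11,000, which is the "sense of totality" the chart conveys at a glance.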


Mapping

Mapping is the process of displaying data as a projection into two- or three-dimensional space. During this projection the spatial separation between points represents “relations of the data.” Data can be ordinal for the purpose of mapping. Thus, subjective assessments are allowed to be mapped without losing application potential. Also, mapping permits even nonlinear projections.

Mapping does not require that the measurement be done in a very precise and fully validated manner. Even ordinal scales of measurement, which could have subjective errors, can be used for mapping. Similarly, mapping accommodates nonlinear scales of measurement despite the inherent ambiguities. Thus, mapping as a method has a universal appeal and remains flexible.

The outstanding benefit of mapping is that it reduces the dimensionality of the dataset to a value small enough to allow visual inspection.

Two patterns of people management emerge from Exhibit 5. The circles form a family of events that correspond to a “manage for results” approach. The squares form a different family that relates to a more modern approach of “managing for results and people.” Exhibit 5 is plotted from subjective evaluation of leadership styles in an organization. Despite its simplicity, it powerfully reveals two subcultures in the organization.

Profiles, matrices, and contours are some of the commonly used forms of mapping.

Exhibit 5. Management grid: map of leadership styles.


Life Cycle Profiles

Perhaps the most pertinent analysis in a software project is to view the life cycle of the project and to recognize process outcomes in life cycle phases. One can think of a series of life cycle phase analyses expressed in the form of profiles. Each life cycle profile (LCP) has the following benefits:

  • It provides connectivity among phases.
  • It arranges project events in a natural order in tune with the work-flow.
  • It gives the complete picture of the project at a glance.

Apart from these common merits, there are additional advantages that can be derived from LCP, based on the metric chosen for the presentation. If the metric is defect, the profile gives clues about process maturity. If the metric is rework, the profile provides causal readings into cost control and could become an eminent problem definition for cost reduction initiatives. Risk can also be perceived from some profiles.


Effort Profile

Effort profiles for two projects are presented in Exhibit 6 and Exhibit 7. First, one can identify the following features in the profiles:

  • The phase where effort peaks
  • The share of effort devoted to requirements and design
  • The share of effort given to testing
  • The ratio of design effort to code effort
  • The percentage of effort on project management

Perception of such features is the beginning of analysis. The mind delves into what it recognizes, aided by knowledge and motivated by expectations. Model effort profiles that have been reported in the past spring to the mind of the perceiver as baselines. One recalls effort profiles of design projects that used concurrent engineering, cut down defects by a factor of 25, and reduced the “time to deliver” by a factor of 4. These profiles recorded effort concentration in the early half of the project. One can also recall Walker Royce’s finding of a similar shift in the effort profiles of modern software projects. Some may see the thrust on testing in Project B as a serious attempt to attain reliability growth, where operational profile testing continues well after system testing, adding to the budget but cutting down postdelivery defects.

Life Cycle Profile   Effort %
Req. Analysis        12%
Design               16%
Coding               57%
Testing              11%
Others               3%

Exhibit 6. Effort profile of Project A.

Life Cycle Profile   Effort %
Req. Analysis        18%
Design               20%
Coding               19%
Testing              28%
Other                15%

Exhibit 7. Effort profile of Project B.
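
The features listed earlier for effort profiles can be read off the two exhibits with a few lines of code. This is a rough sketch under stated assumptions: the function and key names are illustrative, and project management effort is left out because Exhibits 6 and 7 do not show it separately from the "Others" category.

```python
# Effort percentages by life cycle phase, from Exhibits 6 and 7.
project_a = {"Req. Analysis": 12, "Design": 16, "Coding": 57, "Testing": 11, "Others": 3}
project_b = {"Req. Analysis": 18, "Design": 20, "Coding": 19, "Testing": 28, "Other": 15}


def profile_features(profile):
    """Extract the headline features of an effort profile."""
    return {
        # Phase where effort peaks.
        "peak_phase": max(profile, key=profile.get),
        # Share of effort devoted to requirements and design.
        "early_share": profile["Req. Analysis"] + profile["Design"],
        # Share of effort given to testing.
        "testing_share": profile["Testing"],
        # Ratio of design effort to code effort.
        "design_to_code": round(profile["Design"] / profile["Coding"], 2),
    }


features_a = profile_features(project_a)
features_b = profile_features(project_b)
```

For Project A the peak falls in coding with a design-to-code ratio of 0.28, while Project B peaks in testing with a ratio of 1.05, which puts numbers on the contrast that the two profiles show visually.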


There are several possibilities when it comes to interpreting the effort profiles. During visualization, the mind runs through all known paths of visual analysis almost instantly, drawing from the vast storehouse of experiences, opinions, and knowledge embedded in the viewer’s personality.

When such effort profiles are constructed for all projects and compared with the business results attained by them, intuitive mapping rules emerge, which can be reapplied to new projects. The intuition derived from visualization gets one closer to forecasting the destiny of the project from data available from completed phases.

Almost certainly, the visual icon of the effort profile will influence budgeting in subsequent projects. It will also help the project leader set phase-level process goals.


Process Compliance Profile

Process compliance is measured by auditing process centers against quality system elements (QSE) such as Capability Maturity Model key process areas (KPAs) or ISO 9000 clauses. The findings can be presented as a profile with the compliance level (CL) displayed on a scale of 0 to 10, as illustrated in Exhibit 8. This ten-level measurement has an element of subjectivity that depends on the auditor’s experience and approach. Also, sampling methods might have been applied while collecting data, introducing additional possibilities of error. The profile, however, succeeds in capturing the larger truth without much ambiguity. A profile is truer than an isolated point. By seeing the patterns of the strong and weak areas and their relative “distances,” it is possible to understand what is wrong with the system. Such profiles display the process landscape of organizations.
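
A compliance profile can be drafted in text form before it is charted. The sketch below uses the values of Exhibit 8; the thresholds that separate weak from strong areas (3 and 8) and the function names are assumptions chosen for illustration, not part of any audit standard.

```python
# Compliance level per quality system element, from Exhibit 8.
compliance = {
    1: 9, 2: 5, 3: 8, 4: 4, 5: 7, 6: 6, 7: 3, 8: 5, 9: 2, 10: 9,
    11: 3, 12: 4, 13: 6, 14: 8, 15: 4, 16: 5, 17: 3, 18: 9, 19: 7, 20: 2,
}


def text_profile(profile):
    """Render one bar per element so that strong and weak areas stand out."""
    return [f"QSE {qse:>2} {'#' * level} {level}" for qse, level in profile.items()]


def weak_and_strong(profile, weak_max=3, strong_min=8):
    """Split elements into weak and strong areas using assumed thresholds."""
    weak = [qse for qse, level in profile.items() if level <= weak_max]
    strong = [qse for qse, level in profile.items() if level >= strong_min]
    return weak, strong


weak, strong = weak_and_strong(compliance)
for line in text_profile(compliance):
    print(line)
```

Because each level carries auditor subjectivity, the thresholds are best treated as a way to direct attention to areas of the profile rather than as pass or fail limits.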


Responsibility Matrix

The matrix structure is a very convenient mapping tool, widely used in process analysis. It is ideal for mapping the relationship between two complex sets of data.

A good way to visualize responsibility allocation to project team members is to create a responsibility matrix, as shown in Exhibit 9. The header row represents team member ID, the header column contains responsibility areas, and the cells contain the numbers that point toward responsibility levels.

QSE   CL    QSE   CL
1     9     11    3
2     5     12    4
3     8     13    6
4     4     14    8
5     7     15    4
6     6     16    5
7     3     17    3
8     5     18    9
9     2     19    7
10    9     20    2

Exhibit 8. Process compliance profile.


Exhibit 9. Typical responsibility matrix.

Team Member ID
A B C D E F G
Responsibility
Levels
Requirements 10 5 10
Design 10
Build 10 10
Review 10 5 10
Test 10 5
Defect prevention 3 10
Risk mitigation 3 10
Project management 10
Total 26 10 10 10 10 15 50

Exhibit 10. Responsibility allocation.


We are able to first perceive the complex nature of two systems, the team and the process, and then compare them in the matrix. Here complexity is reduced to one dimension, making visual comparison easy. The bearing of each grid element can be conveniently read off from the headers. By encoding each grid element with a color that represents the degree of relationship, we can quickly get a visual feel of the interplay between the two systems. Sometimes, instead of color, we use linguistic expressions such as high, medium, and low, based on the user’s preferences. If we choose to use numbers, even on an ordinal scale, further analysis is possible, as illustrated in Exhibit 10.


Exhibit 11. Responsibility matrix after reallocation.

Team Member ID
A B C D E F G
Responsibility
Levels
Requirements 10 5 10
Design 10 5 2
Build 3 10 10
Review 10 2 2 2 2 5 10
Test 3 10 5
Defect prevention 4 4 4 4 4 4 10
Risk mitigation 4 4 4 4 4 4 10
Project management 2 2 2 2 2 10
Total 28 25 22 30 22 27 50

Exhibit 12. Responsibility levels after resource balancing.


Resource Balancing

The responsibility matrix can be used for resource balancing. A graph can be created from a total responsibility quantum for each person. Exhibit 10 is such a graph, which visualizes responsibility distribution among people.

Exhibit 9 illustrates a typical scenario where people have narrow allocations of jobs. A few people share the bulk of the responsibility while others wait for better utilization of their skills. Often, people realize the imbalance in resource utilization only after plotting the graph.

Where cross-functional teamwork and development of multi-disciplinary skills prevail, such imbalances could be minimized. Exhibit 12 illustrates a scenario after resource balancing.
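
The imbalance can also be expressed as a number alongside the graph. The sketch below compares the Total rows of Exhibit 9 and Exhibit 11; the two indicators (share of responsibility held by the two most loaded members, and the ratio of the highest to the lowest total) are illustrative assumptions rather than standard measures.

```python
members = "ABCDEFG"

# Total responsibility per team member, from the Total rows of Exhibits 9 and 11.
before = dict(zip(members, [26, 10, 10, 10, 10, 15, 50]))
after = dict(zip(members, [28, 25, 22, 30, 22, 27, 50]))


def top_share(totals, n=2):
    """Fraction of all responsibility carried by the n most loaded members."""
    ranked = sorted(totals.values(), reverse=True)
    return sum(ranked[:n]) / sum(ranked)


def spread(totals):
    """Ratio of the highest to the lowest individual total."""
    return max(totals.values()) / min(totals.values())


share_before = round(top_share(before), 2)
share_after = round(top_share(after), 2)
```

Before reallocation, two of the seven members carry about 58 percent of the total responsibility; afterward the figure drops to about 39 percent, in line with the flatter graph of Exhibit 12.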

Defect Code   Defect Level
CUST          2
REQ           5
DES           3
COD           8
PROC          12

Exhibit 13. Quality contour.


Contours

A contour map is a top view of a terrain representing terrain features, which are otherwise hidden to a side view. Process contours, built from several metrics, provide a complete view in a similar manner. For example, quality contour graphs can be created to display product quality, as illustrated in Exhibit 13.

Quality is seen from several directions such as the customer perspective (CUST), requirement defects (REQ), design defects (DES), coding defects (COD), and associated process defects (PROC). Each metric mentioned here is captured in a different phase of the project, using a different detection technique. Even the units of the defect metrics could be different. An attempt to define quality by a single metric is at best a partial answer. Quality is seen as a continuum that starts from project inception and continues to implementation.

A product with a poor history cannot suddenly become wonderful based on the final metric. A more practical approach is to establish a connected view that tracks the full quality history. Quality contours achieve this completeness of expression. Thus, the quality contour redefines the meaning of quality.


Radar Charts: A Balanced View

Radar charts can present a balanced view of factors. For example, if a project has to support multiple goals, it is pragmatic to assume that not all the goals will receive equal attention at any given time. There could be competition among them, and the fulfillment of goals could reflect this. If we plot a radar chart for goal fulfillment with each goal on its own polar axis, we get a diagram that indicates the balance in fulfillment. If one goal dominates the scene, the radar chart will look lopsided, visually indicating the problem.

In Exhibit 14 fulfillment of training needs in six chosen areas is plotted by a training manager. This graph helps us to visualize the learning tendencies of people gravitating toward technical training.

A radar chart can play the role of a rudimentary balanced scorecard in projects, based on the metrics chosen. The inherent ability of a radar chart lies in the fact that it can handle multiple variables at a given time and establish a visual relationship among them. A radar chart can be an ideal component in a project dashboard.
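
The geometry of a radar chart is easy to reproduce: each variable gets its own axis at an equal angular spacing, and the value sets the distance from the center. The sketch below computes the polygon vertices for the training data of Exhibit 14 and adds a crude balance indicator; the function names and the min-to-max ratio are assumptions made for illustration.

```python
import math

# Training needs fulfillment (%) by area, from Exhibit 14.
fulfillment = {
    "Communication": 20,
    "Programming": 50,
    "Total Quality Management": 40,
    "Testing": 28,
    "Software Requirement Specification": 13,
    "Project Management": 10,
}


def radar_vertices(values):
    """Return (label, x, y) for each axis, starting at 12 o'clock, going clockwise."""
    n = len(values)
    vertices = []
    for i, (label, radius) in enumerate(values.items()):
        angle = math.pi / 2 - 2 * math.pi * i / n
        vertices.append((label, radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


def balance_ratio(values):
    """Smallest value divided by largest: 1.0 is perfectly balanced, near 0 is lopsided."""
    return min(values.values()) / max(values.values())


vertices = radar_vertices(fulfillment)
ratio = balance_ratio(fulfillment)
```

A ratio of 0.2 for Exhibit 14 reflects the lopsided shape of the chart, with fulfillment in programming five times that in project management.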


Dynamic Views

The visual elements can be made dynamic to interact with the viewer. The links between graphical presentations and the parent metrics database can be organized in a dynamic manner, instead of providing static images. While advanced data mining tools offer interactive facility, we can build on the spreadsheet macros that rearrange the data and feed the graphs with fresh choices of data sets. Pivot tables and data filters may be put to maximum advantage. To get the most out of graphs, we need to make them respond quickly to an inquiry. The changing views constitute “dynamic analysis” of process, almost a simulation run. Because graphs have a natural propensity to summarize data and run on the upper rungs of the information pyramid, these “simulation runs” appeal to the deeper recesses of human perception.
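
What a pivot table with a data filter does can be sketched in a few lines. The example below is only an illustration of the idea: the records, field names, and the `pivot` function are hypothetical and do not come from any particular tool or metrics database.

```python
from collections import defaultdict

# Hypothetical metrics records; in practice these would come from the metrics database.
records = [
    {"project": "P1", "phase": "Design", "defects": 4},
    {"project": "P1", "phase": "Coding", "defects": 9},
    {"project": "P2", "phase": "Design", "defects": 2},
    {"project": "P2", "phase": "Coding", "defects": 5},
    {"project": "P2", "phase": "Coding", "defects": 3},
]


def pivot(rows, row_key, col_key, value_key, keep=lambda row: True):
    """Sum value_key into a row_key by col_key table, using only rows that pass keep."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if keep(row):
            table[row[row_key]][row[col_key]] += row[value_key]
    return {rk: dict(cols) for rk, cols in table.items()}


# One view: defects per project and phase.
by_project = pivot(records, "project", "phase", "defects")

# A fresh view of the same data: swap the axes and filter to one project.
p2_by_phase = pivot(records, "phase", "project", "defects",
                    keep=lambda row: row["project"] == "P2")
```

Each call produces the data series for a different graph from the same records, which is the quick response to an inquiry that dynamic views depend on.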


Clustering

A natural way of analyzing data is to group together, or cluster, similar data in accordance with some selected criteria. The clusters thus formed could be related to other clusters, forming a cluster tree. Huge amounts of data can now be reduced to colonies of clusters, which can be easily visualized.

Contrasting with cluster trees, sometimes dissimilar data is grouped into disjoint clusters. The clustering rules now tend to maximize the dissimilarities between clusters but minimize dissimilarities within each.

For example, maintenance project metrics data can be grouped according to the rules of priority ascribed to each bug. Clusters are formed around priority levels. Each cluster is characterized uniquely, still preserving and honoring the core precept that cements it.

Alternatively, maintenance events can be clustered around “cost” of bug fixing, if the clustering rule were cost. The high cost cluster may exhibit unique process characteristics, significantly different from low cost clusters. The clustering reveals an inner order — a guide — which facilitates understanding of bug behavior.

Training Needs                        Fulfillment %
Communication                         20
Programming                           50
Total Quality Management              40
Testing                               28
Software Requirement Specification    13
Project Management                    10

Exhibit 14. Radar chart on training needs fulfillment.


There could be several such rules for clustering, such as cost and priority, each a vehicle for seeing the complete database from one perspective. Viewing the database from significant perspectives and clustering rules is a convenient form of data analysis.
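
Rule-based clustering of this kind amounts to grouping records by the value a rule assigns to them. The sketch below shows the same set of bug records viewed under a priority rule and under a cost rule; the records, the cost threshold of 1000, and the function names are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical maintenance records for illustration only.
bugs = [
    {"id": "BUG-1", "priority": "high", "cost": 1200},
    {"id": "BUG-2", "priority": "low", "cost": 150},
    {"id": "BUG-3", "priority": "high", "cost": 300},
    {"id": "BUG-4", "priority": "medium", "cost": 2500},
]


def cluster_by(items, rule):
    """Group item ids into clusters named by whatever the rule returns."""
    clusters = defaultdict(list)
    for item in items:
        clusters[rule(item)].append(item["id"])
    return dict(clusters)


def cost_band(bug, threshold=1000):
    """Clustering rule based on the cost of fixing the bug (assumed threshold)."""
    return "high cost" if bug["cost"] >= threshold else "low cost"


by_priority = cluster_by(bugs, lambda bug: bug["priority"])
by_cost = cluster_by(bugs, cost_band)
```

Swapping the rule changes the perspective without touching the data, so each rule yields its own set of clusters to visualize and compare.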


Data Exploration and Visualization Tools

Data visualization tools in general provide highly interactive and dynamic graphics that help the user see multiple views of data. The graphics are designed to augment visual intuition so that we can better understand the data and see what the data has to say.

Many tools are compatible with the spreadsheet, benefiting business practitioners who use spreadsheets extensively.

Data visualization capabilities are commonly embedded in a wide range of software types, including tools for reporting, online analytical processing (OLAP), text mining, and data mining. Software tools for customer relationship management and business performance management also employ data visualization in the front end.

Data visualization tools are available for stand-alone, embedded, or enterprise applications, with several attractive features.

Features that provide analytical support, particularly for interactive use, are listed here. These features operate on parameters or variable names that the user selects with the click of a mouse, instead of on manually defined data ranges and locations.

  • Interactive analysis
  • Drag and drop
  • Dynamic graphs (plots and tables)
  • What-if simulation
  • Multi-view graphics
  • Linked plots
  • Visual scalability
  • Partition
  • Data mining
  • Animation to see patterns
  • 3D images
  • Nonparametric methods
  • Drill down
  • Cause-and-effect diagram

Exhibit 15. Data exploration and visualization tools.

S No.  Tool Name           Vendor Name             Site Address
1      PopChartXpress      CORDA Technologies      www.corda.com
2      Visual Insight      Bell Labs               www.bell-labs.com
3      Cviz                IBM                     www.alphaworks.ibm.com
4      Dataplot            NIST                    www.itl.nist.gov
5      Data Desk & Vizion  Data Description, Inc.  www.datadesk.com
6      JMP5                JMP                     www.jmp.com
7      S-PLUS              Insightful              www.insightful.com
8      omegahat            Omega Project           www.omegahat.org
9      XploRe              Md Tech                 www.explore-stat.de
10     Fathom              Key Curriculum Press    www.keypress.com
11     nViZn               illumitek               www.illumitek.com
12     MARS & CART         Salford Systems         www.salford-systems.com

Structural facilities, which allow convenient deployment, are listed next. These facilities help in integrating the tools with business processes and related IT systems.

  • Links to Excel
  • Centralized application management (facility useful in multi-user environment)
  • Inline analysis (facility to integrate our own algorithms)
  • Direct data source linking
  • Component library
  • Data independence (ability to work with any kind of database)
  • Web enabled
  • Versatile deployment capability

A multitude of data visualization tool vendors offer a wide range of capabilities and facilities. We can pick and choose from the several models, based on our specific requirements. The proliferation of tool development indicates the growing demand. A representative list of such tools is presented in Exhibit 15.


Data Visualization: Emerging Technology

There is growing interest in data visualization in all disciplines, from engineering to management. Data visualization is used both in the initial exploration before statistical analysis and in the final display of results and model building.

In the preliminary run, attempts to visualize data will help the analyst go through an iterative process of data preparation improving the structure, quality, and suitability of datasets for higher-level analysis and model building. Elegance in visual design will reflect order in data, reinforcing the already-strong connection between visuals and data.

Applications such as the weather forecast use three-dimensional visualization to simulate cloud formations, cyclones, and rainfall based on parametric models that use as many as 16 variables. In software project management, similar opportunities for higher-end methods exist in visualizing many abstract phenomena, including:

  • Organization behavior from 18 HR variables
  • Skills inventory models for recruitment from demographic data
  • Risk models with 12 variables
  • Cost models with 22 parameters
  • Customer requirements models (10 parameters)
  • Market forecast

With data visualization, metrics data analysis would be better, faster, and more creative. Before we resort to rigorous statistical methods, data visualization can be used as a convenient first-cut analysis with significant benefits.
