Chapter 8
Using and Enhancing Data

One characteristic of data that differentiates it from other assets is that it is not “consumed” when used. Different people and processes can even use the same data at the same time, or use the same data multiple times without depleting it.35 Not only is data non-deplete-able, but many uses of data actually create more data. For example, aggregations and calculations of existing data sets create new data sets, as do predictive models created by data scientists. In many cases, these new data sets will continue to be produced and updated. They require management. They need to be defined and supported through Metadata. Expectations related to their quality must also be defined. Their access and use must be governed.

This chapter will look at activities within the data lifecycle where data is used and enhanced, including:

  • Master Data Usage
  • Business Intelligence
  • Data Science
  • Analytics
  • Data Visualization
  • Data Monetization

Master data usage

The use of Master Data provides a good illustration of how using data is directly connected to enhancing data. Well-managed Master Data allows an organization to have a good understanding of the entities (customers, clients, vendors, products, etc.) with which it interacts and transacts business.

In the process of transacting business, an organization learns more about these entities – what they buy, what they sell, how best to contact them. What it learns may be stored at the level of the transaction, but organizations collect data that is necessary to maintain their Master Data (e.g., changes of address, updates to contact information, etc.). Transactional data also allows them to obtain additional data (e.g., customer or client preferences, buying patterns, and the like) that can enhance their Master Data. While the dynamic interaction between different uses of data should be accounted for when planning for overall data management, this is the particular focus of Master Data Management.

Business intelligence

The development of Business Intelligence reporting is another activity where the use of data results in the creation of new data that requires a level of ongoing management.

The term Business Intelligence (BI) has two meanings.

  • First, it refers to a type of data analysis aimed at understanding organizational activities and opportunities. When people say that data holds the key to competitive advantage, they are articulating the promise inherent in Business Intelligence activity: that if an organization asks the right questions of its own data, it can gain insights about its products, services, and customers that enable it to make better decisions about how to fulfill its strategic objectives.
  • Secondly, Business Intelligence refers to a set of technologies that support this kind of data analysis. BI tools enable querying, data mining, statistical analysis, reporting, scenario modeling, data visualization, and dash-boarding. They are used for everything from budgeting to operational reporting and business performance metrics to advanced analytics.

BI is a primary driver for data warehousing, since traditional BI activities require reliable data sources that are integrated for usage. BI tools must support data exploration, as well as reporting. BI can evolve quickly as analysts use data. A successful program must have reliable foundational processes to:

  • Maintain and enhance the core data used in BI reporting and enable incorporation of new data
  • Maintain and enhance the BI tool set
  • Manage Metadata related to BI reports, so that stakeholders understand the reports themselves
  • Document the lineage of data in reports so that stakeholders know where the data came from
  • Provide a data quality feedback loop, so that reports remain trustworthy and opportunities are identified to enhance them

In short, managing the data created by a BI program follows the lifecycle management steps that are part of overall data management.

Data science

Data science has existed for a long time. It used to be called applied statistics. But the capability to explore data patterns has quickly evolved in the twenty-first century with the advent of Big Data collection and storage technologies.

Data science merges data mining, statistical analysis, and machine learning with data integration and data modeling capabilities, to build predictive models that explore data content patterns. The term data science refers to the process of developing predictive models. The data analyst (or data scientist) uses the scientific method (observation, hypothesis, experimentation, analysis, and conclusion) to develop and assess an analytic or predictive model.

The data scientist develops a hypothesis about behavior that can be observed in the data prior to a particular action. For example, the purchase of one type of item is usually followed by the purchase of another type of item (the purchase of a house is usually followed by the purchase of furniture). Then, the data scientist analyzes large amounts of historical data to determine how frequently the hypothesis has been true in the past and to statistically verify the probable accuracy of the model.36

If a hypothesis is valid with sufficient frequency, and if the behavior it predicts is useful, then the model may become the basis for an operational intelligence process to predict future behavior, possibly even in real time, as with suggestive selling advertisements.
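The frequency check described above can be sketched in a few lines. This is an illustrative example with hypothetical data and function names, not a production method: it estimates how often the hypothesized behavior ("a house purchase is followed by a furniture purchase") held in historical purchase records.

```python
# Each customer's purchase history, in chronological order (toy data).
histories = {
    "c1": ["house", "furniture", "garden"],
    "c2": ["house", "car"],
    "c3": ["house", "furniture"],
    "c4": ["car", "furniture"],
}

def hypothesis_support(histories, antecedent, consequent):
    """Among customers who bought the antecedent, what fraction later
    bought the consequent? A simple confidence estimate for the model."""
    bought_antecedent = 0
    followed_through = 0
    for purchases in histories.values():
        if antecedent in purchases:
            bought_antecedent += 1
            # Only purchases made AFTER the antecedent count as follow-through.
            after = purchases[purchases.index(antecedent) + 1:]
            if consequent in after:
                followed_through += 1
    return followed_through / bought_antecedent if bought_antecedent else 0.0

confidence = hypothesis_support(histories, "house", "furniture")
print(f"P(furniture after house) = {confidence:.2f}")
```

In this toy data set, two of the three house buyers later bought furniture, so the estimated confidence is about 0.67. A real data scientist would repeat this over much larger history and statistically verify the estimate, as the text describes.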

In some ways, data science can be understood as an extension of BI. In other ways, however, it takes data analysis and use to a very different level. Traditional Business Intelligence provides ‘rear-view mirror’ reporting – analysis of structured data to describe past trends. In some cases, BI patterns are used to predict future behavior, but not with high confidence.

Until recently, in-depth analysis of enormous data sets has been limited by technology. Analyses have relied on sampling or other means of abstraction to approximate patterns. As the capacity to collect and analyze large data sets has grown, data scientists have integrated methods from mathematics, statistics, computer science, signal processing, probability modeling, pattern recognition, machine learning, uncertainty modeling, and data visualization in order to gain insight and predict behaviors based on Big Data sets. In short, data science has found new ways to analyze and extract knowledge from data. In many cases, this knowledge can be translated into economic value.

As Big Data has been brought into data warehousing and BI environments, data science techniques can provide a forward-looking (‘windshield’) view of the organization. Predictive capabilities, real-time and model-based, using different types of data sources, offer organizations better insight into where they are heading.

Data science models become sources of data. They need to be monitored and mined for insights. Like other forms of science, data science creates new knowledge and also new hypotheses. Testing hypotheses results in new models and new data. All of these pieces require management if they are to create value over time. Models need to be ‘trained’ and evaluated. New data sources can be incorporated into existing models. As with other data, the lifecycle of data to support data science efforts needs to be accounted for as part of planning and strategy.

Predictive and prescriptive analytics

Much of data science is focused on the desire to create predictive models, though not all who create and use such models are data scientists. The simplest form of predictive model is the forecast. Predictive Analytics is the sub-field of supervised machine learning, rooted in statistics, where users attempt to model data elements and predict future outcomes through evaluation of probability estimates.

Predictive Analytics leverages probability models based on variables (including historical data) related to possible events (purchases, changes in price, etc.). When it receives other pieces of information, the model triggers a reaction by the organization. The triggering factor may be an event, such as a customer adding a product to an on-line shopping basket, or it may be data in a data stream, such as a news feed or utility sensor data, or an increased volume of service requests. The triggering factor may be an external event. News being reported about a company may serve as a predictor of a change in stock price. Predicting stock movement should include monitoring news and determining if news about a company is likely to be good or bad for the stock price.

Frequently, the triggering factor is the accumulation of a large volume of real-time data, such as an extremely high number of trades or requests for service or volatility of the environment. Monitoring a data event stream includes incrementally building on the populated models until a threshold is reached that activates the trigger.
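The accumulate-until-threshold pattern above can be sketched simply. This is a minimal, hypothetical illustration (the function name, threshold, and trade volumes are invented for the example), not a real streaming framework:

```python
def monitor(events, threshold):
    """Incrementally accumulate event volume from a stream; return the
    index at which the trigger fires, or None if never reached."""
    total = 0
    for i, volume in enumerate(events):
        total += volume
        if total >= threshold:
            return i  # threshold reached: activate the trigger
    return None

# e.g., trade volumes arriving over time; fire once cumulative volume
# crosses an unusually high level
trades = [120, 300, 250, 900, 50]
fire_at = monitor(trades, threshold=1000)
print(f"Trigger fired at event index {fire_at}")
```

In production, such monitoring runs continuously against a live stream, but the core logic is the same: build incrementally on the populated model until a threshold activates the trigger.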

The amount of time between a prediction and the event predicted is frequently very small (seconds or less). Investment in very low-latency technology solutions, such as in-memory databases, high-speed networks, and even physical proximity to the source of the data, optimizes an organization’s ability to react to the prediction.

Prescriptive analytics takes predictive analytics a step further to define actions that will affect outcomes, rather than just predicting the outcomes from actions that have occurred. Prescriptive analytics anticipates what will happen, when it will happen, and implies why it will happen. Because prescriptive analytics can show the implications of various decisions, it can suggest how to take advantage of an opportunity or avoid a risk. Prescriptive analytics can continually take in new data to re-predict and re-prescribe. This process can improve prediction accuracy and result in better prescriptions. Table 3 summarizes the relationship between traditional BI and data science.
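The step from prediction to prescription can be sketched as follows. This is a hedged illustration with a hypothetical stand-in model and invented action names: prescriptive analytics evaluates candidate actions through a predictive model and recommends the one with the best predicted outcome.

```python
def predicted_revenue(action, price_change):
    """Stand-in for a trained predictive model: estimates revenue for a
    candidate action. A real model would be fit to historical data."""
    base = 100.0
    effects = {
        "discount": base * (1.1 - price_change),
        "hold": base,
        "raise": base * (0.9 + price_change),
    }
    return effects[action]

def prescribe(actions, price_change):
    """Score each candidate action with the predictive model and
    prescribe the action with the highest predicted outcome."""
    return max(actions, key=lambda a: predicted_revenue(a, price_change))

best = prescribe(["discount", "hold", "raise"], price_change=0.05)
print(f"Prescribed action: {best}")
```

As new data arrives, both functions can be re-run, which is the "re-predict and re-prescribe" loop the text describes.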

Data visualization

Visualization is the process of interpreting concepts, ideas, and facts by using pictures or graphical representations. Data visualization facilitates understanding of the underlying data by summarizing it in a visual form, such as a chart or graph. Data visualizations condense and encapsulate characteristics of data, making them easier to see. In doing so, they can surface opportunities, identify risks, or highlight messages.37

Visualization has long been critical to data analysis. Traditional BI tools include visualization options such as tables, pie charts, line charts, area charts, bar charts, histograms, and turnkey boxes (also called candlesticks).

Figure 22, a control chart, represents a classic example of data visualization. It allows the viewer to quickly grasp how data has changed over time. Depending on what the chart reveals, an analyst might take a closer look at the details.
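The logic behind a control chart like Figure 22 is straightforward to sketch. In this illustrative example (with synthetic measurements and invented names), the limits are estimated from a baseline period as the mean plus or minus three standard deviations, and later points outside those limits are flagged for a closer look:

```python
import statistics

def control_limits(baseline):
    """Center line and 3-sigma control limits estimated from baseline data."""
    mean = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    return mean - 3 * sigma, mean, mean + 3 * sigma

def out_of_control(baseline, new_points):
    """Indices of new points falling outside the baseline control limits."""
    lcl, _, ucl = control_limits(baseline)
    return [i for i, v in enumerate(new_points) if v < lcl or v > ucl]

# Synthetic example: stable baseline, then one anomalous measurement.
baseline = [10, 11, 9, 10, 12, 10, 11, 9, 10, 12]
new_points = [10, 11, 30, 9]
print(out_of_control(baseline, new_points))
```

Plotting the points against the center line and limits produces the familiar chart; the computation above is what lets a viewer (or an automated check) quickly see which points warrant investigation.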

Figure 23 shows a simple example of data visualization, a “Home Energy Report” presented by ENMAX, a utilities company based in Alberta, Canada, to its consumers. This infographic helps the consumers understand their home’s energy use in relation to the population of similar homes and to the population of efficient homes. While this report doesn’t talk about recommendations to save energy, it potentially helps the consumers ask relevant questions and set appropriate goals.38

The principles in these simple examples are extended significantly in data science applications. Data visualization is critical to data science because without it, interpretation of data is almost impossible. Patterns in a large data set can be difficult, if not impossible, to recognize in a display of raw numbers, but can be picked up fairly quickly when thousands of data points are presented in a visual display.

Data visualizations can be delivered in a static format, such as a published report, or a more interactive online format. Some technologies for visualization enable analysts to move between layers of data, through filters or the ability to ‘drill-in’ to data. Others allow the visualization to be changed by the user on demand through innovative displays, such as data maps and moving landscapes of data over time.

To meet the growing need to understand data, the number of visualization tools has increased and techniques have improved. As data analytics matures, visualizing data in new ways will offer strategic advantages. Seeing new patterns in data can result in new business opportunities. As data visualization continues to evolve, organizations will have to grow their Business Intelligence teams to compete in an increasingly data-driven world. Business analytical departments will seek data experts with visualization skills (including data scientists, data artists, and data vision experts), in addition to traditional information architects and data modelers, especially given the risks associated with misleading visualization.

A critical success factor in implementing a data science approach is the alignment of the appropriate visualization tools to the user community. Depending on the size and nature of the organization, there are likely many different visualization tools being applied in a variety of processes. Ensure that users understand the relative complexity of the visualization tools. Sophisticated users will have increasingly complex demands. Coordination between enterprise architecture, portfolio management, and maintenance teams will be necessary to control visualization channels within and across the portfolio. Be aware that changing data providers or selection criteria will likely have downstream impacts to the elements available for visualization which can impact the effectiveness of tools.

It is a best practice to establish a community that defines and publishes visualization standards and guidelines and reviews artifacts within a specified delivery method; this is particularly vital for customer- and regulatory-facing content.

As do other uses of data, data visualization creates new data sets, in the form of the visualizations themselves, and in the methods by which data is combined so that it can be presented in a graphical format. You guessed it. This data must also be managed.

Data monetization

Any organization engaged in Data Science or other forms of analytics is likely to gain valuable insight about its own customers, products, services, and processes. Advanced analytics can generate insight about external entities as well. Such an organization is also likely to develop techniques that might be valuable to others. If these insights and techniques can be packaged and sold, then an organization would be leveraging its data not only as an asset, but as a product. In some circles, direct data monetization is perceived as the holy grail of data management. Some companies (Dun & Bradstreet, Google, Amazon) have made a business of monetizing their data. But selling data and information is not the only way to get value from data assets.

In Monetizing Data Management, Peter Aiken and Juanita Billings point out that few organizations exploit the strategic advantage they may gain from data, “an organization’s sole non-deplete-able, non-degrading, durable, strategic asset”.39 They make the case that improving data management practices is the first means of getting more value from data. An organization that puts a monetary value on effective data management practices will produce higher quality data and be able to do more with it.

Aiken and Billings assert that good data management practices are also the foundation for successful innovation of data uses. Poor data management practices, on the other hand, cost money and introduce risk to new initiatives and existing processes. The authors present case studies documenting that bad data management practices can result in direct waste through redundant work and, with it, the creation of redundant data, poor or missing Metadata, confusing processes, and incorrect information. They also provide examples of the benefits of disciplined data management practices. For example, clear and executable Metadata management practices increase organizational knowledge and make that knowledge transferrable.

Douglas Laney’s Infonomics, a full-length study on managing information as an asset, presents a wide array of case studies demonstrating how organizations have leveraged their information assets to create value. While the industries, activities, and products differ, deriving economic value from data boils down to two basic methods:

  • Exchanging information for goods, services, or cash
  • Using information to increase revenue, reduce expenses, or manage risk

Laney presents 12 business drivers for monetizing data. One of the first ways to get value is to use organizational data more effectively to retain existing customers, enter new markets, and create new products. But Laney goes beyond the obvious. For example, better data can improve organizational efficiency by enabling a company to reduce maintenance costs, negotiate for better terms and conditions, detect fraud and waste, or defray the costs of managing data.

Beyond being able to execute their operations, many organizations have barely scratched the surface of the promise of getting value from their data. For some, as Aiken and Billings’s and Laney’s case studies show and other research confirms, low-quality data is a significant liability. Others, however, have been able to break through, with operational improvements as well as direct monetization. Case studies show that innovative uses of data require reliable data management. While not every organization will want to sell its data, all organizations want to have confidence in the decisions they make based on their data. The first step in this direction is to manage the data well.

What you need to know

  • When an organization uses data, it also creates new data that needs to be managed throughout its lifecycle. The requirements for lifecycle management are frequently missed in developing analytics.
  • This new data is often the most valuable data an organization can possess because it is the source of insight.
  • Due to evolving technologies and methods, this new data may be created in ways that impact how data management requirements can be met.
  • While new technologies offer innovative ways of working with data, they also exist alongside and interact with legacy data and legacy technology.
  • Many organizations seek to get value from their data through monetization. A logical starting point is to improve data management practices. This work can both improve efficiency and create optimal conditions for direct monetization.