Introduction

Humans make hundreds of decisions in their personal and professional lives every day. A decision is a response to a specific situation (Reason, 1990), such as where to eat tonight, whether to accept an invitation, whether a candidate should be hired, or whether a promotional discount will help sales. The more background information available, the better the human mind’s ability to choose well from a variety of possibilities. Analytics is all about trying to get a computer system to do the same. This book is about analytics: how data is collected and converted into information, how that information is turned into knowledge, how that knowledge is used to make decisions, and how to constantly evaluate and improve those decisions.

This is a practitioner’s handbook on how to plan, design, and build analytics solutions to solve business problems. As we go about our day-to-day activities, we produce and consume large amounts of data thanks to the digital age we live in. Historically, data was produced by nature, in weather systems and crop yields for example, but its storage, processing, and analysis, and the decisions based on it, all happened in the human mind through insights gleaned over years of experience and observation, although it wasn’t called “data” in those days. The really smart people did well; they were the wise and experienced who advised and influenced decisions in families, tribes, and kingdoms. They learned from their own and others’ past experiences and extracted insights that were used to make decisions that managed the day-to-day activities of households, towns, and governments. Fast forward to modern times: with the proliferation of business conducted through computing, the acquisition and analysis of data has become mainstream and has enabled new ways of looking at data. The evolution of computing, complemented by digital communication, has exploded the amount of data produced and analyzed, putting the numeric system under severe stress (try counting the zeros in hundreds of petabytes leading toward zettabytes) (Mearian, 2007).

To put human society’s relationship with data and its use in context, Ifrah (2002) quoted:

Suppose every instrument could by command or anticipation of need execute its function on its own; suppose that spindles could weave of their own accord and plectra strike the strings of zithers by themselves; then craftsmen would have no need of hand-work and masters have no need of slaves.

This quote is from Aristotle, who died over 2300 years ago but was prescient in his analysis of where mankind was heading. A careful reading of this statement identifies several themes that attract a lot of research, development, and investment today through the field of analytics. Automation, analysis, and prediction are common themes of using data that Aristotle hints at and that we, after 2300 years, are finally starting to realize. This should bring some excitement into analytics and into this book: our belief in data-driven anticipation and automated decisions is part and parcel of how our society functions, and it has been evolving for more than 2000 years and will continue to evolve. Data, information, and computing are going to remain central to this evolution for decades to come. Therefore, this is not something that will vanish and be replaced by the next cool thing that shows up in a few years. The techniques and technology will change, but the principle of learning from the past is inherent in our psyche.

In this book, the process of collecting data, learning from it, anticipating scenarios, automating decisions, and tuning and monitoring is labeled an analytics solution. It is a very complex implementation challenge, and I have attempted to tackle that challenge in this book through simplification.

Analytics is not a new field of interest. There is evidence of mathematics- and statistics-based techniques being applied to business problems going as far back as 1956 (May, 2009). The proliferation of data from global businesses, web-based commerce, smartphones and smart meters, social media and gaming platforms, and various machine sensors found in automobiles, aircraft, and construction machinery (also referred to as Big Data), combined with advances in computing, storage, and specialized software, has brought new energy and excitement to this field. Typically, analytics solutions are built and run by people with advanced degrees in mathematics and statistics, who use sophisticated software packages like SAS™, SPSS, and MATLAB (MathWorks)™ to solve niche problems in economics, finance, and sales and marketing. On the other hand, the last 20 years have seen analytical reporting proliferate across all business functions through the implementation of data warehouses and business intelligence systems. Since data is the greatest competitive asset that an organization has (Redman, 2008), and all parts of the organization use it for reporting and analysis through their data warehouse, why is analytics confined to select specialized areas? This question led to the writing of this book.

The purpose of writing this book revolves around four objectives, and the entire material in the book is designed to achieve these objectives. These also serve as myth-busters for myths surrounding Big Data and Business Analytics regarding technology, expertise, and expense.

Objective 1: Simplification. Since the overall implementation of an analytics solution is actually quite complex, the first step is to simplify the concepts, tools, and techniques and explain how they fit into an analytics puzzle. Part of that simplification is defining analytics as well as various topics, such as predictive modeling, regression, clustering, scoring, ETL (extract, transform, and load), decision strategies, etc. The simplified explanation is enough to make use of these concepts in building analytics solutions and yet have the foundation to attempt larger and more sophisticated solutions.

Objective 2: Commoditization. The entire methodology presented for delivering analytics solutions consistently and repeatedly utilizes commodity technology and human skills. Commodity here refers to tools, technologies, and skills that are not proprietary or, in other words, are not very expensive. The foundation of an analytics solution is built on existing business intelligence and data warehouse programs that are now an essential part of the information technology (IT) portfolio. This book will show how existing working components of business intelligence can be leveraged to build analytics solutions. The material also covers the merits of analytics solutions built using proprietary resources, which are useful for a very focused and specialized business case, usually within one industry. A serious case is made for data mining over established statistical techniques in order to drive the implementation toward a commoditized solution.

Objective 3: Democratization. This is the most important objective and it is tied into the motivation that resulted in this book. The importance and power of analytics should not be limited to a handful of business cases in the finance and marketing space. Its use, application, and adoption can harness value out of any business function, such as procurement, facilities management, human resources, field operations, call centers, project management, etc., most of which are typically cost centers and hardly have the resources to adopt analytics. The methodology presented here will show how simplified and cost-effective deployment of analytics (commodity) enables middle management to improve their KPIs (key performance indicators). The myth of the data scientist (Davenport, 2012) is challenged through this objective: that role is broken into a functional side coming from business operations and a technical side using commoditized implementation. This leaves the true data scientist’s role limited to a handful of very specialized areas, so that a data scientist is not a prerequisite to getting value from data.

Objective 4: Innovation. Innovation here does not refer to the creative or artistic aspect of product design, or to breakthrough ideas that turn around companies and create new industries like smartphones, social media, shale energy, etc. Innovation in the context of analytics refers more to a process and culture of innovation that needs to be created in an organization (Drucker, 2002). As with the other three objectives, the idea is to use analytics to innovate within existing business operations and improve their key performance indicators for collective benefit. The myth that you can use analytics solutions to come up with brilliant and game-changing strategies is countered with a simpler alternative. Instead of using analytics in one specialized area to generate a 20% improvement in profitability (not a small feat if possible at all), this book takes the approach of using incremental business process innovation in dozens of functional areas, with each contributing 2% to 3% to increased profitability and therefore creating a culture of constant improvement and innovation across all facets of the business.
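As a rough illustration of the arithmetic behind this approach, the sketch below compares one large improvement against many small ones. The count of 24 functional areas and the assumption that the individual contributions simply add up are hypothetical choices made for illustration; they are not figures from this book.

```python
# Hypothetical comparison: one 20% gain versus many small gains.
# Assumption: each functional area's contribution is independent and
# additive; the count of 24 areas is illustrative only.

def total_improvement(per_area_gain: float, num_areas: int) -> float:
    """Return the combined profitability gain, assuming gains add up."""
    return per_area_gain * num_areas


single_big_win = 0.20   # one specialized area improving profitability by 20%
num_areas = 24          # "dozens" of functional areas (hypothetical count)

low = total_improvement(0.02, num_areas)   # every area contributes 2%
high = total_improvement(0.03, num_areas)  # every area contributes 3%

print(f"Single specialized project: {single_big_win:.0%}")
print(f"Incremental approach: {low:.0%} to {high:.0%}")
```

Under these assumptions the many small gains together exceed the single large one, and they do not depend on any one project succeeding.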

Organization of Book

The book is organized in three parts. Part 1, Chapters 1–3, is more conceptual and technology agnostic; the chapters set the stage, define the terminology, and explain analytics with a simplistic view. Part 2, Chapters 4–6, deals with actual analytics model design, building, and testing, and then putting the model into production for proactive business decisions. Part 3, Chapters 7–11, contains more specific implementation details dealing with people, process, and technology. Anyone who has worked with data in spreadsheets, or been involved with budgeting or some kind of financial planning, sales volume, or revenue forecasts, will find Parts 1 and 2 easy to understand and will be able to follow the content with no need for technical knowledge.

Part 1

Chapter 1 on Defining Analytics first differentiates current business intelligence and reporting types of activities from analytics. Then it presents simplified definitions of some very complex techniques and concepts, differentiating mathematics- and statistics-based techniques from data mining. People with experience in quantitative modeling will find the definitions and explanations extremely basic and simplistic; they are welcome to skip to subsequent chapters.

Chapter 2 presents a hierarchy and an evolutionary representation of how data should be used, from basic utilization to the highest possible value out of data. This hierarchy is titled the Information Continuum, and it shows how a traditional operational system and its data utilization differ from a data warehouse and its data utilization, and in turn how analytics and its data usage differ from both. There is no skipping of intermediary levels to get to the highest levels of the Information Continuum. This will be very useful for organizations and teams to assess their current situation and then see how they can reach the holy grail of automated decisions by stepping through the Information Continuum stages.

Chapter 3 on Using Analytics is one of two chapters that contribute toward achieving all four objectives mentioned earlier. More than a dozen examples from several different industries are presented as problem statements, and then the definitions from the first chapter are used to show:

■ How these problems can be solved using a repeatable process and commoditized solution employing data mining.

■ The common patterns or themes emerging across these varying problems to provide a thought process for finding opportunities that can be solved with analytics.

These patterns are critical in creating a culture where midlevel managers look at their operations and activities and identify problems themselves that can be solved using analytics solutions.

Part 2

Chapters 4–6 cover the specific techniques and concepts that make analytics a powerful tool for business improvement. These chapters cover what quants, forecasters, and predictors have been using for over three decades. Not only are model design, testing, and tuning explained, but automated decision strategies built on models are also covered in detail, along with their governance.

Chapter 4 on models and their variables defines the input variables and then explains their design, evaluation, and testing. These input variables are used to build models (e.g., predictive models and forecasting models). Models are stress-tested, tuned, and replaced, and this chapter walks readers through the entire process. No deep technical knowledge of databases, computer science, data mining, or statistics and mathematics is needed to learn from this chapter.

Chapter 5 also addresses all four objectives of this book. This chapter is on decision strategies, which are an essential part of analytics but are not widely covered in mainstream analytics material. The analytics model is built using the patterns and insights from historical data, but once it is built, it is supposed to be used within operational activities for proactive actions. If a model predicts that customer ABC is going to cancel his wireless phone subscription in the next three months, what should the wireless firm do? That is what decision strategy is about. Design, implementation, and tuning approaches are provided in great detail to help readers make use of models by utilizing the insights from the models. Business operations managers (mid-level managers particularly) will find this very useful because they can treat the analytics model as a black box that IT will implement for them, while the output from that model is directly tied to a business reaction to that insight. Out-of-the-box thinkers and bold mid-level managers will jump on the idea of constantly building new strategies for specific business scenarios and will develop a habit of constantly designing new and innovative strategies on the output of the same model. Functional and business-oriented professionals will find this chapter to be extremely useful in creating a culture of business innovation. With decision strategies leading to automated business decisions comes the need for an audit and control mechanism.
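A decision strategy of this kind can be pictured as a small set of business rules applied to a model's output. The sketch below is a minimal illustration only: the score thresholds, the revenue cutoff, and the action names are all hypothetical and are not taken from this book.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # model output between 0.0 and 1.0 (treated as a black box)
    monthly_revenue: float    # hypothetical measure of customer value


def decide_action(customer: Customer) -> str:
    """Map a churn prediction to a business action using illustrative rules."""
    # Hypothetical thresholds: a real strategy would be tuned and audited.
    if customer.churn_probability >= 0.7 and customer.monthly_revenue >= 80:
        return "offer_retention_discount"
    if customer.churn_probability >= 0.7:
        return "send_loyalty_email"
    if customer.churn_probability >= 0.4:
        return "flag_for_follow_up"
    return "no_action"


# Customer ABC is predicted likely to cancel within three months.
abc = Customer(customer_id="ABC", churn_probability=0.85, monthly_revenue=120.0)
print(decide_action(abc))
```

Because the model's score is simply an input here, a manager could change the thresholds or the actions without rebuilding the model itself.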

Chapter 6 on Audit, therefore, discusses audit and control and shows how these controls are designed, implemented, and integrated in the solution.

Part 3

The third part of this book is targeted toward IT practitioners who have been involved with data-centric applications—that is, applications that deal with large and complex data-driven activities like data warehouse systems and analytical applications.

Chapter 7 presents a blueprint for an analytics adoption roadmap pilot project. It explains how to pick a problem, find business champions for the idea, and deliver the pilot project using existing infrastructure and tools. It achieves that by demonstrating how data warehouse projects are launched, accepted, and adopted by all areas of a business. Just as all managers want reporting to manage their operations, analytics models and decision strategies should follow the same path of demand and delivery.

Chapter 8 on requirements addresses the “chicken and egg” problem where IT asks what you need built and the business keeps asking, “show me what you can do.” This dilemma is more acute for analytics solutions than for enterprise resource planning (ERP) or customer relationship management (CRM) solutions, since those are based on existing business processes. Analytics deals with anticipation of future scenarios and their responses; therefore, the business cannot articulate in sufficient detail what exactly it needs. The chapter shows how a problem statement is identified using the foundation laid out in Chapter 3 on Using Analytics, and then converts that into a formal requirements solicitation process rather than a requirements gathering process.

Chapter 9 takes a real-world example and then builds the entire analytics solution through all its stages just like any other software development methodology.

Chapter 10 covers the roles, responsibilities, and organizational structure for an analytics team that delivers analytics solutions across an entire organization. This chapter also covers various architecture challenges of building analytics solutions since there are various moving parts dealing with large amounts of data on one side and operational integration for automated decisions on the other.

Chapter 11 is a collection of three independent topics that have a direct impact on analytics solutions and the objectives of this book. The three topics (Big Data, Hadoop, and Cloud) are explained to demystify them and make them accessible to IT practitioners dealing with data-centric systems. Removing confusion, jargon, and marketing buzz from these topics combined with other material in this book will allow IT professionals to put these concepts into their proper place for planning, implementation, and deployment against specific analytics projects.

There is a small section at the end of the book titled “Conclusion” that attempts to show how the objectives laid out here were addressed throughout the book and whether the material was successful in achieving its goal.

Audience

This book is written for two different sets of readers. IT practitioners working, or hoping to work, in the Big Data space will find the entire book useful for their knowledge and careers. The first two parts of the book will be especially helpful to midlevel managers, graduate students in business, and other professionals involved in business operations who find data-driven business operations an intriguing proposition.
