Thomas W. Dinsmore

Disruptive Analytics

Charting Your Strategy for Next-Generation Business Analytics

Thomas W. Dinsmore

Newton, Massachusetts, USA

Any source code or other supplementary materials referenced by the author in this text are available to readers at www.apress.com. For detailed information about how to locate your book’s source code, go to www.apress.com/source-code/.

ISBN 978-1-4842-1312-4

e-ISBN 978-1-4842-1311-7

DOI 10.1007/978-1-4842-1311-7

Library of Congress Control Number: 2016950565

© Thomas W. Dinsmore 2016

Disruptive Analytics: Charting Your Strategy for Next-Generation Business Analytics

Managing Director: Welmoed Spahr

Acquisitions Editor: Robert Hutchinson

Developmental Editor: Matt Moodie

Technical Reviewer: Robert A. Muenchen

Editorial Board: Steve Anglin, Pramila Balen, Laura Berendson, Aaron Black, Louise Corrigan, Jonathan Gennick, Robert Hutchinson, Celestin Suresh John, Nikhil Karkal, James Markham, Susan McDermott, Matthew Moodie, Natalie Pao, Gwenan Spearing

Coordinating Editor: Rita Fernando

Copy Editor: Kezia Endsley

Compositor: SPi Global

Indexer: SPi Global

Cover Designer: Isaac Ruiz Soler

For information on translations, please e-mail [email protected], or visit www.apress.com.

Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales–eBook Licensing web page at www.apress.com/bulk-sales.

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Ann

Apress Business: The Unbiased Source of Business Information

Apress business books provide essential information and practical advice, each written for practitioners by recognized experts. Busy managers and professionals in all areas of the business world—and at all levels of technical sophistication—look to our books for the actionable ideas and tools they need to solve problems, update and enhance their professional skills, make their work lives easier, and capitalize on opportunity.

Whatever the topic on the business spectrum—entrepreneurship, finance, sales, marketing, management, regulation, information technology, among others—Apress has been praised for providing the objective information and unbiased advice you need to excel in your daily work life. Our authors have no axes to grind; they understand they have one job only—to deliver up-to-date, accurate information simply, concisely, and with deep insight that addresses the real needs of our readers.

It is increasingly hard to find information—whether in the news media, on the Internet, and now all too often in books—that is even-handed and has your best interests at heart. We therefore hope that you enjoy this book, which has been carefully crafted to meet our standards of quality and unbiased coverage.

We are always interested in your feedback or ideas for new titles. Perhaps you’d even like to write a book yourself. Whatever the case, reach out to us at [email protected] and an editor will respond swiftly. Incidentally, at the back of this book, you will find a list of useful related titles. Please visit us at www.apress.com to sign up for newsletters and discounts on future purchases.

The Apress Business Team

Introduction

Disruption: In business, a radical change in an industry or business strategy, especially involving the introduction of a new product or service that creates a new market.

From its birth in 1979, Teradata led the field in data warehousing. The company built a reputation for technical acumen, serving customers like Walmart and Citibank; analysts and implementers alike rated the company’s massively parallel databases “best in class.” After a 2007 spinoff from NCR, the company grew by double digits.

On August 6, 2012, Teradata released its earnings report for the second quarter. Results excelled; revenue was up 18% and earnings per share (EPS) up 28%. Teradata stock traded at $80, five times its value four years earlier.

“We are increasing our guidance for constant currency revenue growth and EPS for 2012,” wrote CEO Mike Koehler.

In retrospect, that moment was Teradata’s peak. Over the next three and a half years, the company lost 75% of its market value, as it repeatedly missed revenue and earnings targets. In 2015, Koehler announced a restructuring and sale of company assets; several top executives departed. Finally, after a brutal first quarter earnings report, Koehler himself stepped down in May 2016.

Management blamed many factors for the weak sales: long sales cycles, a sluggish economy, and unfavorable currency movement. But worldwide spending on business analytics increased during this period and some vendors reported double-digit revenue growth.

Blaming Teradata’s struggles on poor leadership would be easy. But the company’s growth problems in the last few years are not unique: in the same period, Oracle and IBM suffered declining revenue; Microsoft and SAP failed to grow consistently, disappointing investors; and SAS had to walk back embarrassing projections of double-digit growth, recording low single-digit gains.

In short, while businesses continue to invest in analytics, they aren’t buying what the industry leaders are selling.

Meanwhile, a steady stream of innovation creates new value networks in the business analytics marketplace:

Open Source Analytics. With substantial gains in the last several years, open source software makes deep inroads in the analytics community. Surveys show that working data scientists prefer open source R and Python over commercial software. Technology leaders like Oracle, IBM, and Microsoft rush to get on the open source bandwagon.

Hadoop and its Ecosystem. As Hadoop matures, it competes successfully with data warehouse appliances, even displacing them. Technology consultant Gartner estimates that 42% of all enterprises now use Hadoop. A few years ago, data warehousing vendors laughed at Hadoop; they aren’t laughing today.

In-Memory Analytics. As the cost of memory declines, fast and scalable high-performance analytics are within reach for any organization. Adoption of Apache Spark, an open source project for scalable in-memory computing, increases exponentially. With more than a thousand contributors, Spark is the most active open source project in Big Data.

Streaming Analytics. Organizations face a growing volume of data in motion, driven in part by the Internet of Things (IoT). Today, there are no fewer than six open source projects for streaming analytics in the Apache ecosystem. In-memory databases position themselves as streaming engines for hybrid transactional/analytical processing (HTAP).

Analytics in the Cloud. When Amazon Web Services introduced its Redshift columnar database in 2012, it lacked many of the features available in competing data warehouses. For many businesses, however, Amazon offered a compelling value proposition: “good enough” functionality, at a fraction of the cost of a Teradata warehouse. The leading cloud services all report double-digit revenue growth; Gartner estimates that 44% of all businesses use the cloud.

Deep Learning. Cheap high-performance computing power makes Deep Learning practical. NVIDIA releases its DGX-1 system for Deep Learning, with the power of 250 servers; Cray announces its Urika-GX appliance with up to 1,728 cores and 35 terabytes of solid-state memory. Meanwhile, Google releases its TensorFlow framework to open source and declares that it uses Deep Learning in “hundreds” of applications.

Self-Service Analytics. With an easy-to-learn user interface and robust connectors to data sources, Tableau turns the business intelligence software industry upside down and grows its revenues tenfold while established Business Intelligence vendors struggle to adapt. Other startups position themselves to bring the self-service model to other disciplines, such as OLAP and machine learning.

This is not another book that hypes Big Data. Petabytes of data are worthless unless they answer a business question; the tsunami of data produced by the digital economy is simply a fact of life that managers must address. Whether you manage a multinational or drive a truck, your business produces more data than ever; you will either use it or discard it, but one way or the other, you must make an informed decision.

In a disrupted business analytics market, managers must focus ruthlessly on needs for insight, then build systems and processes that satisfy those needs. Understanding the innovations described in these chapters is a step toward that end, but the focus must remain on the demand for insight and the value chain that delivers it.

Innovations do not spring fully formed from the mind of an inventor; they are the end result of a long process of tinkering. Many of the most significant innovations we describe in this book are more than 50 years old; they emerge today for various reasons, such as the long-run decline of computing costs. We present a historical perspective at several points in this book so the reader can distinguish between that which is really new and that which is simply repackaged and rebranded.

Each of the middle chapters of this book presents a survey of a key innovation in business analytics. These chapters include detailed information about available software products and open source projects. In general, we do not cover offerings from industry leaders, under the premise that these companies have ample marketing budgets to build awareness of their products.

We close the book with a handbook for managers: specific strategies to profit from disruptive innovation. Some of these strategies may seem radical; if this disturbs you, put this book down—it’s not for you. But if you are ready to embrace disruptive innovation, and profit by it, read on.

Acknowledgments

Many thanks to Bob Muenchen of the University of Tennessee; Bob graciously agreed to serve as the technical reviewer of this book and spent many hours reading and commenting on chapter drafts.

Also thanks to Oliver Vagner, Senior Director, Data Analytics at TGI Fridays, and to Professor Dr. Diego Kuonen of the University of Geneva and Statoo Consulting, each of whom provided valuable suggestions and guidance for the book’s recommendations to managers.

Thanks also to many other people in the industry who have provided insight into the ideas and topics covered in this book, including Jeremy Achin, Data Scientist and CEO, DataRobot; Bruno Aziza, Chief Marketing Officer, AtScale; Charlie Berger, Oracle; Michael Berthold, President, KNIME.com AG; Arno Candel, Chief Architect, H2O.ai; David Champagne, Principal Software Engineering Manager, Microsoft; Michael Chu, Project Leader, The Boston Consulting Group; Davide Consiglio, Principal, The Boston Consulting Group; Boxuan Cui, Data Scientist, Smarter Travel; David Erdreich, Knowledge Expert, The Boston Consulting Group; Lee Edlefson, Principal Software Engineer, Microsoft; Ali Ghodsi, CEO, Databricks; Mario Inchiosa, Principal Software Engineer, Microsoft; Bill Jacobs, Microsoft; Paul Kent, VP, Big Data, SAS; Josh Klahr, Vice President of Product Management, AtScale; Bill Lehmann, Investor, Bain Capital Ventures; Matthew Madden, Director of Product Marketing, Alteryx; Xiangrui Meng, Software Engineer, Databricks; Ingo Mierswa, Founder and CTO, RapidMiner; Derek Norton, Senior Data Scientist, Microsoft; Thomas Ott, Marketing Data Scientist, RapidMiner; Sean Owen, Director of Data Science, Cloudera; Krishnan Parasuraman, VP Sales, Splice Machine; Zoltan Prekopcsak, VP Big Data, RapidMiner; Peter Prettenhofer, Data Scientist, DataRobot; Dan Putler, Chief Scientist, Alteryx; David Rich, Strategic Advisor and Executive Coach; Razi Razuddin, VP, Strategic Business Development, DataRobot; Vincent Saulys, Senior Director, Advanced Surveillance Development, Financial Industry Regulatory Authority; Joseph Sirosh, Corporate Vice President, Data Group, Microsoft; Dan Steinberg, President, Salford Systems; Ben Strauss, Alteryx and Tableau Evangelist, The Boston Consulting Group; Gregory Todd, Principal, PricewaterhouseCoopers; Deenar Torasker, Founder, ThinkReactive; Tom Ventura, Global IT Director, The Boston Consulting Group; Reynold Xin, Chief Architect, Databricks; David Wang, Director of Product Marketing, Databricks; and Bill Zanine, Thought Leader, Netezza Analytic Advisory Services, IBM. If I have inadvertently omitted anyone from this list, I apologize in advance.

Thanks also to my clients, whom I cannot name, who have made it possible for me to devote a significant amount of time to this book.

Finally, many thanks to the people at Apress who agreed to publish this book, and to members of the editorial and production staff who contributed to the final product.

About the Author and About the Technical Reviewer

About the Author

Thomas W. Dinsmore is an independent consultant and author who specializes in advanced analytics and machine learning.

In his consulting career, Mr. Dinsmore has served in expert roles for The Boston Consulting Group, PricewaterhouseCoopers, Oliver Wyman, IBM Big Data Solutions, and the SAS Institute. He has also served as Director of Product Management for Revolution Analytics (now a division of Microsoft).

Mr. Dinsmore has more than 30 years of experience in advanced analytics. He has led or contributed to solutions for AT&T, Banco Santander, Citibank, Dell, J. C. Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, United Health Group, UBS, Vodafone, and many other clients in the United States, Puerto Rico, Canada, Mexico, Venezuela, Brazil, Chile, the United Kingdom, Belgium, Spain, Italy, Turkey, Israel, Malaysia, and Singapore.

Mr. Dinsmore has working experience with most of the leading tools for advanced analytics. He is the co-author of Modern Analytics Methodologies (FT Press, 2014) and Advanced Analytics Methodologies (FT Press, 2014) and publishes The Big Analytics Blog. He earned an MBA from the Wharton School, The University of Pennsylvania, and a BA from Boston University.

About the Technical Reviewer

Robert A. Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to analyzing trends in data science software and helping people learn the R language. Bob is an ASA Accredited Professional Statistician™ with 30 years of experience and is currently the manager of OIT Research Computing Support (formerly the Statistical Consulting Center) at the University of Tennessee. He has taught workshops on research computing topics for more than 500 organizations and has offered training in partnership with DataCamp.com, Revolution Analytics, RStudio, New Horizons Computer Learning Centers, and Xerox Learning Services. Bob has written or co-authored over 70 articles published in scientific journals and conference proceedings, and has provided guidance on more than 1,000 graduate theses and dissertations.

Bob has served on the advisory boards of SAS Institute, SPSS Inc., Intuitics OOD, StatAce OOD, the Statistical Graphics Corporation, and PC Week Magazine. His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS, and several R packages. His research interests include statistical computing, data graphics and visualization, text analytics, and data mining.
