Foreword

Dr. Dursun Delen has written a concise, information-rich book that serves as an excellent learning tool for anyone who wants to understand what analytics, data mining, and “Big Data” are all about. As business becomes increasingly complex and global, decision-makers must act more rapidly and accurately, based on the best available evidence. Modern data mining and analytics are indispensable for doing this. This reference makes clear the current best practices, demonstrates to readers (students and practitioners alike) how to use data mining and analytics to uncover hidden patterns and correlations, and explains how to leverage these to improve decision-making.

The author delivers the right amount of concept, technique, and cases to help readers truly understand how data mining technologies work. Coverage includes data mining processes, methods, and techniques; the role and management of data; tools and metrics; text and web mining; sentiment analysis; and integration with cutting-edge Big Data approaches, as outlined below.

In Chapter 1, he commendably traces the roots of analytics from World War II times to the present, illustrated by Figure 1.2, where he takes the reader from Decision Support Systems in the 1970s, to the Enterprise/Executive Information Systems in the 1980s, to the Business Intelligence (BI) that we all heard about in the 1990s and early 2000s, and finally to our modern-day uses of analytics (2000s) and Big Data (2010s). That is all in Chapter 1, creating a preamble for what is to come in the rest of the book: data mining.

Chapter 2 provides a very easy-to-understand description and an excellent taxonomy for data mining. In this chapter, the author differentiates data mining from several related terms, making a strong case for what it really stands for: discovery of knowledge. Identifying data mining as a problem-solving and decision-making philosophy that sits in the midst of many disciplines is quite refreshing; many people think of data mining as a new discipline of its own. With a number of real-world examples, intuitive graphics, and down-to-earth discussion, this chapter demystifies data mining for the masses. In my opinion, this is an excellent way to portray seemingly complex and highly technical topics like data mining to a wider audience.

In Chapter 3, Dr. Delen provides a rich collection of different approaches to standardized data mining processes in a manner that any reader can understand. KDD (knowledge discovery in databases), developed by Usama Fayyad, an early pioneer in the field, is the first standardized process the chapter discusses. Dr. Delen presents KDD in an engaging discussion enhanced with a diagram (Figure 3.1), which illustrates the flow of the KDD data mining process. Additionally, other data mining “schemas” proposed by various groups and individuals are examined to show the development of the fundamental thinking in this field. To illustrate the usefulness of these schemas, Dr. Delen presents a data mining case study at the end of this chapter: “Mining Cancer Data for New Knowledge.”

Chapter 4 considers the types of data used in data mining, including the ever-increasing use of text data (that is, unstructured non-numerical data, which is probably 90% of the data available to the world today). Data preparation is the most important part of data mining: Data must be clean and good in order to develop useful models (“garbage in, garbage out”); thus, up to 90% of the time involved in data mining can be taken up by the data preparation stage. Dr. Delen goes into all the ways of looking at data to get it clean and ready for data analytics, including developing the training and test data sets, and gives one of the most learner-friendly visuals of k-fold cross-validation in Figure 4.6.

In Chapter 5, Dr. Delen describes the most common data mining algorithms in a way that the layperson can understand. Among others, neural networks and SVM (support vector machines) are described thoroughly, with illustrations that help the reader really understand these complicated mathematical processes. Dr. Delen makes his own original illustrations, and they are well worth the price of the book!

Text mining (text analytics) is described thoroughly in Chapter 6, with Dr. Delen starting out with a diagram he originally made for our 2012 book, Practical Text Mining (on which I am the lead author: Miner, G.D.; Delen, D.; Elder, J.; Fast, A.; Hill, T.; and Nisbet, B. Elsevier/Academic Press: 2012). Dr. Delen effectively distills our large 1,100-page book into one chapter that tells it all; in other words, it is very useful for the new learner. Well done!

In the last chapter, Chapter 7, Dr. Delen goes into the new buzzword in the analytics field: Big Data analytics. Big Data is heard in the news almost daily. What does it mean? It means different things to different people. I can tell you that, having worked in the data mining field for more than 15 years now, I have been dealing with Big Data all that time. But the ever-decreasing cost of storage space for data and the availability of cloud storage, plus the availability of faster and faster computers, mean that even a small laptop can do both distributed processing and multi-threading in data analysis. This has made even the small tablet more powerful than the warehouse of air-conditioned mainframe servers of decades ago. One can even run a bank of servers and cloud storage from one’s smartphone these days. So as the data becomes “bigger,” the physical needs to process it become “smaller.”

But Big Data is misunderstood by most; at least it seems that way to me. Many think that data mining requires Big Data. But I have worked for 10 years with medical residents who want to look at lots of variables in their one-year research projects but, in their limited time, can usually get only a fraction of the cases they need for that many variables. Traditional statistics are of almost no use on these data sets, which are paltry by traditional statistical standards. Yet by using modern machine-learning and data mining methods, I have found that one can usually generate useful hypotheses from these small data sets and find knowledge that was previously impossible to obtain with only traditional p-value Fisherian statistics. Traditional statistics were an anomaly of the twentieth century. Prior to 1900, Bayesian statistics had predominated in data analysis for centuries; by the year 2000, the new modern versions of Bayesian statistics (including SVM, NN, and other machine-learning modalities) had come of age, and we are now back in the Bayesian age in this twenty-first century. Unfortunately, it is taking a while for the “traditionally trained” statistics cadre to understand and catch up... but the cutting edge is with Bayesian methods, data mining, and Big Data.

Anyone wanting to learn about data mining and have a technical understanding of the topic should get this book. By the end of the read, you will understand the field!

—Gary D. Miner, Ph.D.
Author of two PROSE Award–winning analytics books
Senior Analyst, Healthcare Applications Specialist
Dell, Information Management Group, Dell Software
