The Data Mining Process - CRISP-DM Methodology

At this point, our backpack is quite full of exciting tools: we have the R language and an R development platform. Moreover, we know how to use them to summarize data in the most effective ways. We have finally gained knowledge on how to effectively represent our data, and we know these tools are powerful. Nevertheless, what if a real data mining problem suddenly shows up? What if we return to the office tomorrow and our boss finally gives the OK: "Yeah, you can try using your magic R on our data. Let's start with some data mining on our customer database; show me what you can do." OK, this is getting a bit too fictional, but you get the point: we need one more tool, something like a structured process to face data mining problems when we encounter them.

When dealing with time and resource constraints, having a well-designed sequence of steps to accomplish our objectives becomes crucial to the success of data mining activities. You may therefore be wondering whether some kind of golden rule about how to conduct data mining projects was ever set out. It actually was, around 1996, by a pool of leading industry players, based on their data mining experiences.

Since then, this methodology has spread to all major industries and is currently considered a best practice within the data mining realm. That is why it is a really good idea to learn it from the very beginning of your data mining journey, letting it shape your data mining habits based on what the best in class do.

Before getting into this, here is a final note on where this chapter sits within the general flow of the book. The concepts we are going to look at more theoretically here will be examined more fully in future chapters, in particular:

  • Business understanding in Chapter 5, How to Address a Data Mining Problem – Data Cleaning and Validation
  • Data understanding in Chapter 5, How to Address a Data Mining Problem – Data Cleaning and Validation, and Chapter 6, Looking into Your Data Eyes – Exploratory Data Analysis
  • Data preparation in Chapter 5, How to Address a Data Mining Problem – Data Cleaning and Validation, and Chapter 6, Looking into Your Data Eyes – Exploratory Data Analysis
  • Modeling in Chapter 7, Our First Guess – a Linear Regression, through Chapter 12, Looking for the Culprit – Text Data Mining with R
  • Deployment in Chapter 13, Sharing Your Stories with Your Stakeholders through R Markdown

Moreover, you should understand that we are going to employ the tools acquired here to face the same data mining issues throughout the book, so that by the end of it you will have experienced one real-life data mining cycle from beginning to end.

No more notes for now, since it is time to get into the actual description of what CRISP-DM is!
