How to Address a Data Mining Problem – Data Cleaning and Validation

This chapter is where our real journey begins (finally! I can hear you exclaiming). We are now familiar enough with R and with the data mining process and architecture to get involved with a real problem.

When I say real problem, I actually mean real, since we are going to face something that actually happened and that puzzled a non-trivial number of people in a real company. Of course, we are going to use randomized data and fictitious names here; nevertheless, this will not take any pathos away from the problem. We are shortly going to be immersed in a mystery that actually came up, and we will need to solve it by employing data mining techniques.

I know you may be thinking: OK, don't make it too serious, isn't this something that has actually already been solved? You would be right, but what if something similar pops up for you some day in the future? What would you do? The mystery we are going to face will not be presented in the typical way: here is a table, apply models to it, and tell us which fits best. This is not how things usually work in real life. We will just be given news about a problem and some unstructured data to look at, and asked for an answer.

Are you ready? Do you need to look back at the previous pages? No problem, I will wait for you here.
