Validating our data

Now that our data is tidy and conveniently structured for our subsequent analyses, a further question arises: is our data good enough for those analyses? This question concerns the quality of our data. Data quality is a non-trivial aspect of data mining, and you can easily understand why by reflecting on this popular saying: garbage in, garbage out.

Within our context, this means that if you feed poor-quality data into an analysis, you will get unreliable results, that is, results you should not base your decisions on. This tenet drives many of the activities that companies regularly perform to ensure that acquired or produced data is free from material quality problems.

But what exactly is a data quality problem? 

First of all, we should focus on what data quality is. This property of data is commonly understood as both fitness for use and conformance to standards.
