Data modelling notions

We therefore take the default event as the variable to be explained, that is, the response variable. All the other attributes, which we hope will explain the response variable, are called explanatory variables. To go from the explanatory variables to the response variable, we need to establish a rule or relationship between them, which we call a function. We need to emphasize that this relationship is causal, or asymmetric: the explanatory variables determine the response, not the other way around. Calling Y the response variable and x1, x2, ..., xn the set of explanatory variables, we formalize this concept as:

$$Y = f(x_1, x_2, \dots, x_n)$$

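To make the roles concrete, here is a minimal Python sketch of a function f that maps explanatory variables to the response. It assumes, purely for illustration, that f returns the probability of default through a logistic transformation; the variable names (`debt_to_income`, `years_employed`, `late_payments`) and the coefficients are hypothetical and were not estimated from any data. Note the direction: the inputs are the explanatory variables and the output is the response, never the reverse.

```python
import math

# Hypothetical explanatory variables for one borrower (illustrative names and values).
borrower = {"debt_to_income": 0.45, "years_employed": 2.0, "late_payments": 3}

# Hypothetical coefficients: chosen for illustration, NOT estimated from data.
WEIGHTS = {"debt_to_income": 3.0, "years_employed": -0.3, "late_payments": 0.8}
INTERCEPT = -3.0


def f(x: dict) -> float:
    """Map explanatory variables x1..xn to a probability of default (the response)."""
    score = INTERCEPT + sum(WEIGHTS[name] * value for name, value in x.items())
    # The logistic transformation keeps the output strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))


print(round(f(borrower), 3))
```

Changing any input changes the predicted response, but nothing in f lets us recover the inputs from the response: that is the asymmetry described above.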
To be precise, we usually add a further element, called the error term and denoted by ε. It expresses the impossibility of defining a model that exactly reproduces the true underlying phenomenon. Sources of this error include:

  • The inadequacy of the model
  • Measurement errors
  • Variation of the phenomenon over time

We can therefore write our previous equation as:

$$Y = f(x_1, x_2, \dots, x_n) + \varepsilon$$

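The role of ε can be seen in a small simulation. The Python sketch below assumes a hypothetical straight-line f (intercept 2, slope 0.5) and normally distributed errors with variance 1, generates observations of Y = f(x) + ε, and then estimates f by ordinary least squares. Because we built the data ourselves, we can compare the estimate with the true f; with real data, f is unknown and the residuals are the only trace of ε.

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible


def f_true(x: float) -> float:
    """Hypothetical 'true' relationship, known here only because we simulate it."""
    return 2.0 + 0.5 * x


# Generate observations Y = f(x) + eps, with eps ~ Normal(0, 1).
xs = [random.uniform(0, 10) for _ in range(1000)]
ys = [f_true(x) + random.gauss(0.0, 1.0) for x in xs]

# Estimate f with ordinary least squares for a straight line.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residuals Y - f_hat(x) are our only window on the unobservable error eps.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
residual_var = sum(r * r for r in residuals) / (n - 2)

print(f"f_hat(x) = {intercept:.2f} + {slope:.2f} x, residual variance = {residual_var:.2f}")
```

Even with the correct functional form, the residual variance stays close to the variance of ε (about 1 here): a better estimate of f removes the error due to model inadequacy, while measurement error and other noise remain.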
Starting from this, a whole field opens up: how to estimate the unknown function f as accurately as possible and keep the error ε as small as possible. To help you find your way, we can distinguish between two broad modelling strategies:

  • Supervised learning
  • Unsupervised learning
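The difference between the two strategies is whether the response Y is observed. The Python sketch below uses a hypothetical single explanatory variable for eight borrowers. The supervised part learns from (x, Y) pairs using 1-nearest-neighbour classification; the unsupervised part sees only x and groups it with a one-dimensional k-means (k = 2). Both techniques were chosen only as simple, self-contained examples of each strategy.

```python
# Toy data: one hypothetical explanatory variable per borrower.
xs = [0.10, 0.15, 0.20, 0.25, 0.70, 0.75, 0.80, 0.90]
# Observed response (1 = default). Only the supervised method may use it.
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def predict_1nn(x_new: float) -> int:
    """Supervised: predict Y for x_new from the label of the closest training x."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[nearest]


def kmeans_1d(data: list, iters: int = 10):
    """Unsupervised: split data into two groups using x alone (no labels)."""
    centers = [min(data), max(data)]  # simple deterministic initialisation
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for x in data:
            # Assign each point to the nearer center.
            groups[0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


print(predict_1nn(0.65))
centers, groups = kmeans_1d(xs)
print(centers)
```

The k-means groups carry no default label: deciding what each group means is left to the analyst, whereas the supervised rule directly predicts the response.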