Developing the data visualization

As I said, having tidy data lets us leverage the full functionality of ggplot. But let's start with something basic, so that you can see why that functionality is needed. Just draw a scatterplot with the revenues vector on the x axis and the y vector on the y axis:

show_lm %>%
  ggplot(aes(x = revenues, y = y)) +
  geom_point()

You see? Can you tell which points are the fitted values and which are the observed ones? You could guess, I know, but we can't rely on guessing. This is why we are going to use the colour aesthetic, setting it to type so that each point is coloured according to its value of type:

show_lm %>%
  ggplot(aes(x = revenues, y = y, colour = type)) +
  geom_point()

It is now clear which values are fitted and which are not. What can you see there? I see that our model does not seem to be doing an excellent job, since the fitted data falls quite far from the observed data. If we look closer at the data, we can also see that company_revenues is not that relevant in explaining our default_numeric variable, since you find a substantial number of both performing and defaulted cases (zero and one) at both low and high revenues (left and right of the x axis).
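
If you want to reproduce this kind of picture without our dataset at hand, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the toy, toy_lm, toy_show_lm, and toy_plot names are hypothetical, the numbers are random, and I use pivot_longer() from tidyr to get the tidy layout, which is not necessarily how show_lm was built. The zero/one flag is generated independently of revenues, so the fitted values end up far from the observed ones, much like in the plot above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)  # simulated data, for illustration only

# Hypothetical stand-in for the real data: a 0/1 flag that is,
# by construction, unrelated to revenues
toy <- tibble(
  revenues = runif(200, min = 0, max = 1000),
  observed = rbinom(200, size = 1, prob = 0.3)
)

# Simple linear model of the flag on revenues
toy_lm <- lm(observed ~ revenues, data = toy)

# Tidy layout: one row per point, with type telling fitted from observed
toy_show_lm <- toy %>%
  mutate(fitted = unname(stats::fitted(toy_lm))) %>%
  pivot_longer(cols = c(observed, fitted), names_to = "type", values_to = "y")

# Same plot as before: colour mapped to type
toy_plot <- toy_show_lm %>%
  ggplot(aes(x = revenues, y = y, colour = type)) +
  geom_point()

toy_plot
```

The observed points sit on the two lines y = 0 and y = 1 along the whole x axis, while the fitted points form a single, nearly flat line in between. That is what a linear model looks like when the explanatory variable carries little information about a zero/one outcome.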

We will look at some more structured performance metrics later when performing our multiple linear regression.
