Scatterplots

Scatterplots are arguably the most basic kind of chart you could imagine: one variable goes on the x axis, the other on the y axis, and each observation is marked with a point at its pair of x and y values. Basic as they are, they are rather powerful for visualizing the existence and strength of dependence between two variables. So much so that, as our friend Anscombe put it:

"Before anything else is done, we should scatterplot the y values against the x values and see what sort of relation there is."
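
To see why Anscombe insisted on this, here is a minimal sketch based on the anscombe data frame that ships with base R (his well-known quartet). The names quartet, correlations, and quartet_plot are chosen only for this illustration. The four sets share almost the same correlation, yet their scatterplots look nothing alike:

```r
library(ggplot2)

# `anscombe` comes with base R (datasets package): columns x1..x4 and y1..y4.
# Reshape it to long form so that each of the four sets can become a facet.
quartet <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(set = paste("set", i),
             x   = anscombe[[paste0("x", i)]],
             y   = anscombe[[paste0("y", i)]])
}))

# The correlation between x and y is nearly identical across the four sets...
correlations <- sapply(split(quartet, quartet$set), function(d) cor(d$x, d$y))
print(round(correlations, 2))

# ...but one scatterplot per set reveals four very different relationships.
quartet_plot <- ggplot(quartet, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ set)
print(quartet_plot)
```

Summary numbers alone would have told us the four sets are interchangeable; the points tell a different story.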

Let's start with a scatterplot showing cash flow against time. We are going to use the geom_point geom here; note that in cash_flow_report the time variable is stored in the y column, so it is y that gets mapped to the x axis:

cash_flow_report %>%
  ggplot(aes(x = y, y = cash_flow)) +
  geom_point()

What can you see? I see a fairly stationary movement around an average value of 100k euros, which is consistent with what our study of the distribution and standard deviation told us. Moreover, our famous outlier does not seem to be part of a trend; it seems to have appeared from nowhere. It would be interesting to add one more piece of information to our plot, namely the geographic area. Let's add it as a grouping variable, which in ggplot is easily done through the group aesthetic (we also map it to colour, so that the groups can be told apart):

cash_flow_report %>%
  ggplot(aes(x = y, y = cash_flow, group = x, colour = x)) +
  geom_point()

The plot is definitely more colorful now, but I am not sure we have added any meaning to it. The only relevant confirmation we get is that the outlier comes from the Middle East. We need to connect the points belonging to the same region and see whether any kind of trend appears. To do that, we can create a mixed chart, joining the scatterplot with a line chart:

cash_flow_report %>%
  ggplot(aes(x = y, y = cash_flow, group = x, colour = x)) +
  geom_point() +
  geom_line()

Here we are:

Neat! We now see clearly all we have been talking about, such as low variability, absence of clear trends, and the recent outlier from the Middle East. Our objective is met: we found the origin of the drop. Good job!
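
What the plot suggests can also be cross-checked numerically. The sketch below does not use the real cash_flow_report: toy_report is a hypothetical stand-in with invented area names, dates, and amounts, keeping only the column layout used in this section (x for the area, y for the date, cash_flow in euros).

```r
library(dplyr)

# HYPOTHETICAL stand-in for cash_flow_report: every value here is invented.
toy_report <- data.frame(
  x = rep(c("north_europe", "south_europe", "middle_east"), each = 4),
  y = rep(as.Date(c("2017-01-01", "2017-04-01", "2017-07-01", "2017-10-01")),
          times = 3),
  cash_flow = c(101000,  99000, 100500, 100000,
                 98000, 102000,  99500, 101500,
                100000, 101000,  99000,  40000)
)

# The observation with the lowest cash flow is our candidate outlier.
lowest <- toy_report %>%
  filter(cash_flow == min(cash_flow))
print(lowest)

# Mean and minimum per area: a single area with a far lower minimum
# is the numeric counterpart of the isolated point seen in the chart.
by_area <- toy_report %>%
  group_by(x) %>%
  summarise(mean_cash_flow = mean(cash_flow),
            min_cash_flow  = min(cash_flow))
print(by_area)
```

On the real data, the same two steps should point at the region and date of the drop that the chart highlights.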

We now need to polish this plot up a bit since we are going to use it when sharing our results with the boss. Let's do the following:

  • Add a title and a subtitle to the plot
  • Add explanatory names to the axis labels
  • Add the source of the plot as a caption
  • Fix the coloring by removing the gray background and lightening the weight of the lines, since they carry no message relevant to our point
  • Add some explanatory text next to the Middle East outlier
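
One possible way to tick off the points above is sketched here. It is not necessarily the code the book goes on to use: toy_report is a hypothetical stand-in for cash_flow_report with invented values, the titles and labels are placeholders, and the lines are lightened with alpha (making them thinner would be an equally valid choice).

```r
library(ggplot2)

# HYPOTHETICAL stand-in for cash_flow_report: every value here is invented.
toy_report <- data.frame(
  x = rep(c("north_europe", "south_europe", "middle_east"), each = 4),
  y = rep(as.Date(c("2017-01-01", "2017-04-01", "2017-07-01", "2017-10-01")),
          times = 3),
  cash_flow = c(101000,  99000, 100500, 100000,
                 98000, 102000,  99500, 101500,
                100000, 101000,  99000,  40000)
)

# Locate the outlier so that the annotation can be placed next to it.
outlier <- toy_report[which.min(toy_report$cash_flow), ]

polished_plot <- ggplot(toy_report,
                        aes(x = y, y = cash_flow, group = x, colour = x)) +
  geom_point() +
  geom_line(alpha = 0.4) +                 # lighter lines, points stay prominent
  annotate("text",                         # explanatory text beside the outlier
           x = outlier$y, y = outlier$cash_flow,
           label = "Middle East drop", hjust = 1.1) +
  labs(title    = "Cash flow by geographic area",         # placeholder wording
       subtitle = "Quarterly values, toy data",
       x        = "Date",
       y        = "Cash flow (euros)",
       colour   = "Area",
       caption  = "Source: invented data, for illustration only") +
  theme_minimal()                          # drops the default gray background
print(polished_plot)
```

labs() covers the title, subtitle, axis names, and caption in a single call, theme_minimal() takes care of the gray panel, and annotate() adds a one-off label without needing a separate data frame.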