Visualizing the number of movements per day of the week

As a warm-up, let's visualize the number of movements per day of the week. We are going to do this starting from the daily_summary data frame. How do we do that? First of all, let's choose the right type of chart. I can wait for you here while you look at the previous section and try to work out which chart best suits our purposes.

Since we are going to compare the values of the same attribute (the number of movements) across different values of a categorical variable (day of the week), our best possible choice is going to be a bar plot.

What do we need for a bar plot in ggplot?

  • A ggplot() call, passing the following:
    • The daily_summary object as the data argument
    • name_of_the_day and number_of_movements as the x and y aesthetics, that is, the values represented on the x and y axes
  • geom_bar() as the geometry, with the stat argument set to 'identity' to specify that no statistical computation has to be performed on the data mapped to the aesthetics
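
The ingredients listed above can be tried out without the real data. The following is only a sketch: toy_summary is a hand-made stand-in for daily_summary, it reuses the column names from this chapter, and its figures are invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for daily_summary: the values are made up
toy_summary <- data.frame(
  name_of_the_day     = c("Monday", "Tuesday", "Wednesday", "Thursday",
                          "Friday", "Saturday", "Sunday"),
  number_of_movements = c(12, 7, 8, 6, 9, 3, 2)
)

# Data argument first, then the x and y aesthetics, then the geometry.
# stat = 'identity' makes geom_bar() plot the y values exactly as given.
p_bar <- ggplot(data = toy_summary,
                aes(x = name_of_the_day, y = number_of_movements)) +
  geom_bar(stat = 'identity')

p_bar
```

Recent versions of ggplot2 also offer geom_col(), which behaves like geom_bar(stat = 'identity') and could be used in its place.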

Leveraging the well-known pipe operator, we can write the plot code as follows:

daily_summary %>%
  ggplot(aes(x = name_of_the_day, y = number_of_movements)) +
  geom_bar(stat = 'identity')

Here it is, our new data visualization:

Monday clearly appears to be the day on which the greatest number of movements is performed, while we apparently tend to rest during the weekend. We can easily add clarity to this plot by rotating the bars. This makes the plot more readable, since we tend to compare horizontal bars more easily than vertical ones. Which layer do you think is affected by this rotation? We would probably be tempted to answer the geom layer. Well, that would be wrong, since the layer that determines how the bars are oriented on the canvas is the coordinates layer. We can then add a coord_flip() call after the code we already showed to rotate our bars:

daily_summary %>%
  ggplot(aes(x = name_of_the_day, y = number_of_movements)) +
  geom_bar(stat = 'identity') +
  coord_flip()

To be fair, knowing how many movements we perform on each day of the week is not actually a salient point of our financial habits investigation: what if, on the weekend, we perform fewer movements but more substantial ones? We should complement the number of movements with their amount, which we already summarized within the daily_summary data frame.

Let us think about this plot: what would we like to show altogether? Both the number of movements and the amount of those movements for any given day of the week. We will, therefore, have one x variable and two y variables. Moreover, we will probably need one geometry layer for the number of movements and one for the amount of those movements.

An elegant way to show this information is to draw a line proportional to the number of movements for each day of the week and place, at the top of this line, a point that is proportional to the amount of those movements. We can draw such a plot by employing two new geometries, geom_linerange and geom_point.

geom_linerange requires us to specify where to start drawing the line, that is, the minimum value of y for every x (ymin), and where to stop it (ymax).
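
As a minimal sketch of these two boundaries, assume a hypothetical toy_ranges data frame holding the smallest and largest amount recorded on each day. The column names min_amount and max_amount, like the values, are invented here and do not belong to our data.

```r
library(ggplot2)

# Hypothetical data: one row per day, with invented lower and upper values
toy_ranges <- data.frame(
  day        = c("Monday", "Tuesday", "Wednesday"),
  min_amount = c(5, 10, 2),
  max_amount = c(40, 25, 60)
)

# ymin is where each line starts, ymax is where it stops
p_range <- ggplot(toy_ranges, aes(x = day)) +
  geom_linerange(aes(ymin = min_amount, ymax = max_amount))

p_range
```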

geom_point draws a point for every x-y pair found within the data. One relevant feature of this geometry is the ability to map the point size to the value of another attribute available in the data frame specified within the ggplot() call. For instance, we could decide to map the size of the point to the hour of the day in which the transaction was performed, so that points grow larger as the hour gets later.
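
The size mapping just described could look like the following sketch. It assumes a hypothetical toy_movements data frame of single transactions; the day, amount, and hour_of_day columns are invented for this example.

```r
library(ggplot2)

# Hypothetical transactions: column names and values are invented
toy_movements <- data.frame(
  day         = c("Monday", "Monday", "Tuesday", "Thursday"),
  amount      = c(20, 35, 12, 150),
  hour_of_day = c(9, 13, 18, 22)
)

# size is set inside aes(), so it is mapped to the data:
# the later the hour, the larger the point
p_size <- ggplot(toy_movements, aes(x = day, y = amount)) +
  geom_point(aes(size = hour_of_day))

p_size
```

Because size appears inside aes(), ggplot2 also adds a legend explaining which point size corresponds to which hour.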

Within our data visualization, we are going to pass the following specification to our geometries:

  • The minimum y for the line range will be set to zero since we want every line to start from zero
  • The maximum y will be the number of movements for any x
  • We are going to make every point size proportional to the mean amount of movements of any given day of the week

Finally, we are going to add labels printing out the number of movements for any given day. This will be done by specifying one more aesthetic within the ggplot() call, namely the label one, and adding a geom_text() layer:

daily_summary %>%
  ggplot(aes(x = name_of_the_day, y = number_of_movements, label = number_of_movements)) +
  geom_linerange(aes(ymin = 0, ymax = number_of_movements)) +
  geom_point(aes(size = (sum_of_entries + sum_of_expenses) / number_of_movements)) +
  geom_text(nudge_y = 1.7) +
  coord_flip()

This is what we get:

We are going to comment on the result in a moment, but first, let's have a little break and look back at the code we just executed: it's started to become quite a serious piece of code, and we are just warming up!
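
One way to keep a growing plot specification readable is to compute derived values before plotting. The following is a minimal sketch rather than the chapter's own code: it assumes dplyr is loaded, uses an invented three-row stand-in for daily_summary with made-up positive amounts, and introduces a hypothetical mean_amount column so that the size mapping and its legend get a readable name.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for daily_summary: the values are made up
toy_summary <- data.frame(
  name_of_the_day     = c("Monday", "Thursday", "Sunday"),
  number_of_movements = c(12, 6, 2),
  sum_of_entries      = c(100, 900, 0),
  sum_of_expenses     = c(140, 600, 50)
)

p_lines <- toy_summary %>%
  # mean_amount is a name introduced for this sketch only
  mutate(mean_amount = (sum_of_entries + sum_of_expenses) / number_of_movements) %>%
  ggplot(aes(x = name_of_the_day, y = number_of_movements)) +
  geom_linerange(aes(ymin = 0, ymax = number_of_movements)) +  # line from 0 to the count
  geom_point(aes(size = mean_amount)) +                        # point sized by mean amount
  coord_flip()

p_lines
```

Naming the computed column keeps the geom_point() call short and moves the arithmetic to a place where it can be inspected on its own.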

What the plot of daily_summary clearly shows is that Monday has the greatest absolute number of movements, but those movements tend to have a very small average amount. The movements performed on Thursdays have the greatest average value. Are there any recurring expenses we have on a Thursday? This deserves some further analysis, so we'll need to get back to the raw data.
