The spread function

spread does exactly what its name promises: it spreads data across columns. It takes tables expressed in the long form and makes them wide. To see how it works, we can once again use the table from earlier, showing cities and their temperatures for a set of dates. Let's recreate it with a few lines of code:

city <- c(rep("london",4),rep("moscow",4),rep("new_york",4), rep("rome",4))
date <- c(rep(c("june_2016","july_2016","august_2016","september_2016"),4))
temperature <- c(15,18,19,18,17,20,19,11,22,26,26,23,23,27,26,22) #source : wunderground.com
temperature_table <- data.frame(city,date,temperature)

You should now be familiar with vector and data frame creation. Nevertheless, you can always go back to Chapter 1, Why to Choose R for Your Data Mining and Where to Start, if any doubts arise.

To work, the spread function needs you to identify one key column and one value column:

  • The key column is the one used to label the multiple columns we are going to create. Within our example, the key column will be the date column, since we want to spread this attribute in a wide form.
  • The value column is the one holding the values used to populate the newly created columns. As you may have guessed, in our example this is the temperature column.

These are the only two mandatory arguments of the spread function: no default value is provided for them, so running the function without them will raise an error. One more argument you should be aware of when dealing with this function is the fill argument. Imagine that no temperature was recorded during July 2016 in London. This would not be a problem while we are in the long form, since it would just translate into one missing row. What about the wide form we are going to create? How should the cell corresponding to July 2016 for the city of London be filled? This is exactly what the fill argument takes care of; by default, such cells are filled with NA.

Let's give the function a try, passing date and temperature as key and value:

library(tidyr) # provides spread()
library(dplyr) # provides the %>% pipe
temperature_table %>%
  spread(key = date, value = temperature) -> wide_temperature

Running this code and printing wide_temperature results in the following:

      city august_2016 july_2016 june_2016 september_2016
1   london          19        18        15             18
2   moscow          19        20        17             11
3 new_york          26        26        22             23
4     rome          26        27        23             22

 

This is what we were looking for: a wide table from the long one we had. 
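
The fill argument described earlier can be tried out in the same way. What follows is only a minimal sketch, assuming the tidyr package is installed: the incomplete_table object and the -999 placeholder are hypothetical choices made for illustration, not part of the original example. The table is rebuilt first so that the snippet runs on its own.

```r
library(tidyr)

# Rebuild the long table used throughout this section
city <- rep(c("london", "moscow", "new_york", "rome"), each = 4)
date <- rep(c("june_2016", "july_2016", "august_2016", "september_2016"), times = 4)
temperature <- c(15, 18, 19, 18, 17, 20, 19, 11, 22, 26, 26, 23, 23, 27, 26, 22)
temperature_table <- data.frame(city, date, temperature)

# Hypothetical scenario: the London reading for July 2016 was never recorded,
# so the long table simply lacks that row
incomplete_table <- subset(temperature_table,
                           !(city == "london" & date == "july_2016"))

# Without specifying fill, the missing combination becomes NA in the wide table
wide_default <- spread(incomplete_table, key = date, value = temperature)

# With fill, the missing combination receives the value we choose
# (-999 is an arbitrary placeholder picked for this sketch)
wide_filled <- spread(incomplete_table, key = date, value = temperature,
                      fill = -999)
```

In wide_default, the july_2016 cell for london holds NA, while in wide_filled it holds the placeholder. Whether an explicit placeholder is preferable to NA depends on how the table will be used later: NA is usually the safer choice when summary statistics will be computed on the column.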
