The separate function

The separate function is meant for really dirty data, that is, data in which some columns merge two or more attributes together. You can get a sense of this by looking at the following table:

time               temperature
15/05/17 12:25:33  16
15/05/17 13:25:33  16
15/05/17 14:25:33  17
15/05/17 15:25:33  15

As you can see, the time column actually stores two different types of information: the date and the hour of recording. How do we tidy that so it complies with the "every column shows an attribute" rule? As you may have guessed, the separate function comes to help here. To apply it to your data, you just have to run separate() on it, specifying the minimum set of arguments you would expect:

  • col, the column you would like to split, unquoted
  • into, the names of the new columns you would like to create from the messy one, as a character vector
  • sep, the separator the function uses to identify where one column ends and the next begins

Going back to our really dirty data, this is how the tidying code will look:

really_dirty_data %>%
  separate(time, sep = " ", into = c("date", "hour"))

date      hour      temperature
15/05/17  12:25:33  16
15/05/17  13:25:33  16
15/05/17  14:25:33  17
15/05/17  15:25:33  15

This is now a tidy table, with one attribute per column and one record per row.
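
If you want to try this yourself, the following is a minimal, self-contained sketch of the same step. The text never shows how really_dirty_data is built, so the data frame below is an assumed reconstruction of the table above, and the name tidy_data is introduced only for illustration:

```r
library(tidyr)

# Assumed reconstruction of the example table: the source does not show
# how really_dirty_data is created, so these values are illustrative.
really_dirty_data <- data.frame(
  time = c("15/05/17 12:25:33", "15/05/17 13:25:33",
           "15/05/17 14:25:33", "15/05/17 15:25:33"),
  temperature = c(16, 16, 17, 15),
  stringsAsFactors = FALSE
)

# Split the time column at the single space into date and hour.
# By default the original time column is dropped from the result.
tidy_data <- really_dirty_data %>%
  separate(time, into = c("date", "hour"), sep = " ")

print(tidy_data)
```

Note that when sep is a character string, separate() interprets it as a regular expression, so a separator such as "." or "|" would need to be escaped.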
