How it works...

In the dplyr framework, each function takes the entire data frame as its first input, and the pipe operator (%>%) passes the result of one step as that first input to the next. Inside these functions you can refer to variables by name without quotes, similar to accessing variables after calling attach() in base R. The select() function takes the data frame as its first input, followed by the names of the variables to keep, separated by commas. In this example, the variables were QUARTER, MONTH, ORIGIN, DEST, DEP_DELAY, and ARR_DELAY.
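As a minimal sketch of this behavior, the following code applies select() to a small, made-up data frame. The object name flights, the extra CARRIER column, and all values are assumptions for illustration only and are not taken from the recipe's dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for the flight dataset; values are made up.
flights <- data.frame(
  QUARTER   = c(1, 1, 2, 2),
  MONTH     = c(1, 2, 4, 5),
  ORIGIN    = c("JFK", "LAX", "ORD", "ATL"),
  DEST      = c("LAX", "JFK", "ATL", "ORD"),
  DEP_DELAY = c(5, -2, 10, 0),
  ARR_DELAY = c(7, -5, 12, 3),
  CARRIER   = c("AA", "DL", "UA", "DL"),  # extra column that select() will drop
  stringsAsFactors = FALSE
)

# Piped form: the data frame on the left becomes select()'s first argument,
# and the column names are written without quotes.
piped <- flights %>%
  select(QUARTER, MONTH, ORIGIN, DEST, DEP_DELAY, ARR_DELAY)

# Equivalent direct call without the pipe.
direct <- select(flights, QUARTER, MONTH, ORIGIN, DEST, DEP_DELAY, ARR_DELAY)
```

Both calls should return the same six columns, which suggests that the unquoted names come from select() itself and the pipe only supplies the data frame.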

To show why select() is useful, the example includes three additional functions: group_by(), do(), and summarize(). They demonstrate that a larger dataset can be processed efficiently and on the fly, with no need to create an intermediate data frame for further analysis.

In the example from the previous section, a linear regression model was fitted between arrival delay and departure delay for each quarter of 2016, and the resulting intercepts and slopes were collected into a data frame. Performing the same operation with base R functionality generally needs more memory and processing time than the dplyr framework.
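The per-quarter regression described above can be sketched as follows. This is not the recipe's exact code: the data is synthetic, the object names flights and models are hypothetical, and the choice of arrival delay as the response and departure delay as the predictor is an assumption.

```r
library(dplyr)

# Synthetic stand-in for the 2016 flight data: 25 made-up flights per quarter.
set.seed(1)
flights <- data.frame(
  QUARTER   = rep(1:4, each = 25),
  DEP_DELAY = rnorm(100, mean = 10, sd = 15)
)
flights$ARR_DELAY <- 2 + 0.9 * flights$DEP_DELAY + rnorm(100, sd = 5)

models <- flights %>%
  select(QUARTER, DEP_DELAY, ARR_DELAY) %>%       # keep only the needed columns
  group_by(QUARTER) %>%                           # one group per quarter
  do(fit = lm(ARR_DELAY ~ DEP_DELAY, data = .)) %>%  # fit one model per group
  summarize(                                      # one row of coefficients per model
    QUARTER   = QUARTER,
    intercept = coef(fit)[[1]],
    slope     = coef(fit)[[2]]
  )
```

The whole chain runs without naming any intermediate data frame, and models should end up with one row per quarter holding that quarter's intercept and slope.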
