Using the select verb for data processing

A dataset usually contains many variables that are not relevant to every type of analysis. Working with the entire dataset consumes more memory than necessary, so it is recommended that you keep only the variables required for the task at hand. Taking a smaller part of a dataset is usually known as subsetting, but the term subset can be interpreted in two ways: a subset with fewer variables (columns), or a subset with fewer rows. In the dplyr library, these two operations are covered by the select() and filter() verbs, respectively. In this recipe, you will subset a dataset by keeping only a handful of variables using the select() verb.
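The difference between the two kinds of subsetting can be sketched as follows. This is a minimal illustration, not the recipe's own code: it assumes the dplyr package is installed and uses R's built-in mtcars data frame as a stand-in dataset; the object names cars_small and cars_rows are hypothetical.

```r
library(dplyr)

# Assumption: mtcars (built into R) stands in for the recipe's dataset.

# select() subsets by variables: keep three columns, all rows remain
cars_small <- mtcars %>% select(mpg, cyl, hp)

# filter() subsets by rows: keep four-cylinder cars, all columns remain
cars_rows <- mtcars %>% filter(cyl == 4)

dim(cars_small)  # rows unchanged, columns reduced
dim(cars_rows)   # rows reduced, columns unchanged
```

Comparing the two dim() results shows that select() changes only the number of columns, while filter() changes only the number of rows.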
