
Book Description

What you’ll learn—and how you can apply it

You’ll learn to perform efficient data carpentry: the process of turning rough, raw, and often haphazardly arranged input data into neatly organized, tidy data. Clean data makes every subsequent stage of your R project easier.

In this Lesson, you will learn how to create user-friendly data frames with tibble, reshape data with tidyr functions such as gather() and separate(), process data efficiently with dplyr, and connect R to a range of database types.

This lesson is for you because

You are working on a project in R and have reached the data processing stage. You want to clean, manipulate, and tidy your dataset to get it ready for the next stage (typically modeling and visualization).

Prerequisites

  • Some knowledge of R

Materials or downloads needed in advance

  • RStudio installed on your computer

This Lesson relies on a number of packages for data cleaning and processing. Check that they are installed on your computer and load them with:

  • library("tibble")
  • library("tidyr")
  • library("stringr")
  • library("readr")
  • library("dplyr")
  • library("data.table")
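
The check and the loading can be combined into one short script. The sketch below installs only the packages that are missing, then attaches all of them. It assumes you install from CRAN. The variable names pkgs, installed, and missing_pkgs, along with the mirror URL, are illustrative choices and are not part of the Lesson.

```r
# Packages this Lesson loads (taken from the list above)
pkgs <- c("tibble", "tidyr", "stringr", "readr", "dplyr", "data.table")

# requireNamespace() returns FALSE, without attaching anything, if a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing (assumption: any CRAN mirror works; this URL is one choice)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach every package; character.only = TRUE lets library() take the name as a string
invisible(lapply(pkgs, library, character.only = TRUE))
```

Because data.table is attached after dplyr, R reports that a few function names (for example first() and last()) are masked. This is expected and does not affect the examples.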

RSQLite and ggmap are also used in a couple of examples, though they are not central to the Lesson’s content.

Table of Contents

  1. Efficient Data Frames with tibble
    1. Exercise
  2. Tidying Data with tidyr and Regular Expressions
    1. Make Wide Tables Long with gather()
    2. Split Joint Variables with separate()
    3. Other tidyr Functions
    4. Regular Expressions
    5. Exercises
  3. Efficient Data Processing with dplyr
    1. Renaming Columns
    2. Changing Column Classes
    3. Filtering Rows
    4. Chaining Operations
    5. Exercises
    6. Data Aggregation
    7. Exercises
    8. Nonstandard Evaluation
  4. Working with Databases
    1. Databases and dplyr
    2. Exercises
    3. References