Loading your data into R with rio packages

The rio package is a relatively recent R package, developed by Thomas J. Leeper, which makes data import and export in R painless and quick.

rio achieves this mainly by making assumptions about the file format: it guesses the format of the file you are trying to import, typically from the file extension, and then applies the import function appropriate to that format.

All of this is done behind the scenes, and the user is just required to run the import() function.

As Leeper often states when talking about the package: "it just works."

One of the great results you can obtain by employing this package is streamlining workflows involving different development and productivity tools.

For instance, tables produced in SAS can be made available to the R environment without any particular export procedure in SAS. In the same way, we can acquire data in R directly as it is produced or entered into an Excel spreadsheet.

Getting ready

As you would expect, we first need to install and load the rio package:

```
install.packages("rio")
library(rio)
```

In the following example, we are going to import our well-known world_gdp_data dataset from a local .csv file.

How to do it...

The first step is to import the dataset using the import() function:
```
messy_gdp <- import("world_gdp_data.csv")
```
Then, we visualize the result with the RStudio viewer:
```
View(messy_gdp)
```
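
If world_gdp_data.csv is not at hand, the same import() pattern can be tried on a throwaway file. The following is a minimal sketch, assuming rio is installed; the built-in mtcars dataset and the temporary file paths are stand-ins chosen for illustration and are not part of the recipe.

```r
library(rio)

# Write a built-in dataset to a temporary .csv file
# (a stand-in for world_gdp_data.csv)
csv_path <- tempfile(fileext = ".csv")
export(mtcars, csv_path)

# No format given: import() infers "csv" from the file extension
from_csv <- import(csv_path)

# A file without an extension gives import() nothing to guess from,
# so the format is passed explicitly
noext_path <- tempfile()
export(mtcars, noext_path, format = "csv")
from_noext <- import(noext_path, format = "csv")

dim(from_csv)
dim(from_noext)
```

Both calls should return a data.frame with the same shape; only the way the format is determined differs.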

How it works...

We first import the dataset using the import() function. To understand the structure of the import() function, we can leverage a useful behavior of the R console: typing a function name without parentheses and running the command prints the function's definition.
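
This console behavior is not specific to rio. As a small sketch using only base R, the hypothetical helper add_tax() below (defined purely for illustration) can be inspected in the same way, either as a whole or piece by piece:

```r
# Hypothetical helper, defined only for this illustration
add_tax <- function(price, rate = 0.2) {
  price * (1 + rate)
}

# Name without parentheses: the definition is printed instead of being run
add_tax

# The same information, one piece at a time
arg_names <- names(formals(add_tax))  # argument names
arg_names
body(add_tax)                         # the code between the braces
args(add_tax)                         # the signature only
```

For long functions, args() or formals() is often easier to read than the full printed definition.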

Running import (without parentheses) in the R console produces the following output:

```
function (file, format, setclass, ...) 
{
    if (missing(format)) 
        fmt <- get_ext(file)
    else fmt <- tolower(format)
    if (grepl("^http.*://", file)) {
        temp_file <- tempfile(fileext = fmt)
        on.exit(unlink(temp_file))
        curl_download(file, temp_file, mode = "wb")
        file <- temp_file
    }
    x <- switch(fmt, r = dget(file = file), tsv = import.delim(file = file, 
        sep = "\t", ...), txt = import.delim(file = file, sep = "\t", 
        ...), fwf = import.fwf(file = file, ...), rds = readRDS(file = file, 
        ...), csv = import.delim(file = file, sep = ",", ...), 
        csv2 = import.delim(file = file, sep = ";", dec = ",", 
            ...), psv = import.delim(file = file, sep = "|", 
            ...), rdata = import.rdata(file = file, ...), dta = import.dta(file = file, 
            ...), dbf = read.dbf(file = file, ...), dif = read.DIF(file = file, 
            ...), sav = import.sav(file = file, ...), por = read_por(path = file), 
        sas7bdat = read_sas(b7dat = file, ...), xpt = read.xport(file = file), 
        mtp = read.mtp(file = file, ...), syd = read.systat(file = file, 
            to.data.frame = TRUE), json = fromJSON(txt = file, 
            ...), rec = read.epiinfo(file = file, ...), arff = read.arff(file = file), 
        xls = read_excel(path = file, ...), xlsx = import.xlsx(file = file, 
            ...), fortran = import.fortran(file = file, ...), 
        zip = import.zip(file = file, ...), tar = import.tar(file = file, 
            ...), ods = import.ods(file = file, ...), xml = import.xml(file = file, 
            ...), clipboard = import.clipboard(...), gnumeric = stop(stop_for_import(fmt)), 
        jpg = stop(stop_for_import(fmt)), png = stop(stop_for_import(fmt)), 
        bmp = stop(stop_for_import(fmt)), tiff = stop(stop_for_import(fmt)), 
        sss = stop(stop_for_import(fmt)), sdmx = stop(stop_for_import(fmt)), 
        matlab = stop(stop_for_import(fmt)), gexf = stop(stop_for_import(fmt)), 
        npy = stop(stop_for_import(fmt)), stop("Unrecognized file format"))
    if (missing(setclass)) {
        return(set_class(x))
    }
    else {
        a <- list(...)
        if ("data.table" %in% names(a) && isTRUE(a[["data.table"]])) 
            setclass <- "data.table"
        return(set_class(x, class = setclass))
    }
}
```

As you can see, unless a format argument is supplied, the first thing the import() function does is call the get_ext() function, which retrieves the extension from the filename.

Once the file format is known, the import() function uses switch() to select the import function that matches that format and returns the result of this function.
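
The dispatch idea can be reproduced in a few lines of base R. The sketch below is a deliberately simplified illustration, not rio's implementation: toy_import() is a hypothetical function, it handles only three formats, and it uses tools::file_ext() where rio uses its own get_ext() helper.

```r
# Hypothetical, simplified stand-in for the extension-based dispatch
toy_import <- function(file, format) {
  # Use the explicit format if given, otherwise fall back to the extension
  fmt <- if (missing(format)) tolower(tools::file_ext(file)) else tolower(format)
  switch(fmt,
    csv = utils::read.csv(file, stringsAsFactors = FALSE),
    tsv = utils::read.delim(file, stringsAsFactors = FALSE),
    rds = readRDS(file),
    stop("Unrecognized file format: ", fmt)  # unnamed last entry is the default
  )
}

# Write the same small data frame in two formats (temporary paths)
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(head(iris), csv_path, row.names = FALSE)
saveRDS(head(iris), rds_path)

# One entry point, two different readers chosen behind the scenes
from_csv <- toy_import(csv_path)
from_rds <- toy_import(rds_path)
dim(from_csv)
dim(from_rds)
```

The caller never names a reader function; adding a format means adding one more named entry to the switch() call.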

Next, we visualize the result with the RStudio viewer. One of the most powerful RStudio tools is the data viewer, which gives you a spreadsheet-like view of your data.frame objects. With RStudio 0.99, this tool became even more powerful: the previous 1000-row limit was removed, and the ability to filter and sort your data was added.

When using this viewer, be aware that filtering and sorting in the viewer do not affect the original data.frame object you are visualizing.
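
To make a filter or a sort order permanent, it has to be done in code and stored in an object. A minimal base R sketch follows; the toy_gdp data frame and its values are made up for illustration and are unrelated to world_gdp_data.

```r
# Made-up data, only for illustration
toy_gdp <- data.frame(
  country = c("A", "B", "C"),
  value   = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Sorting and filtering return new objects
sorted   <- toy_gdp[order(-toy_gdp$value), ]  # descending by value
filtered <- subset(toy_gdp, value > 1)        # keep rows with value above 1

# The original data frame is left as it was
toy_gdp
sorted
filtered
```

As in the viewer, the original object stays untouched unless the result is assigned back to it.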

There's more...

As fully illustrated in the rio vignette (which can be found at https://cran.r-project.org/web/packages/rio/vignettes/rio.html), the following formats are supported for import and export:

| Format | Import | Export |
| --- | --- | --- |
| Tab-separated data (`.tsv`) | Yes | Yes |
| Comma-separated data (`.csv`) | Yes | Yes |
| CSVY (CSV + YAML metadata header) (`.csvy`) | Yes | Yes |
| Pipe-separated data (`.psv`) | Yes | Yes |
| Fixed-width format data (`.fwf`) | Yes | Yes |
| Serialized R objects (`.rds`) | Yes | Yes |
| Saved R objects (`.RData`) | Yes | Yes |
| JSON (`.json`) | Yes | Yes |
| YAML (`.yml`) | Yes | Yes |
| Stata (`.dta`) | Yes | Yes |
| SPSS and SPSS portable | Yes (`.sav` and `.por`) | Yes (`.sav` only) |
| XBASE database files (`.dbf`) | Yes | Yes |
| Excel (`.xls`) | Yes | |
| Excel (`.xlsx`) | Yes | Yes |
| Weka Attribute-Relation File Format (`.arff`) | Yes | Yes |
| R syntax (`.R`) | Yes | Yes |
| Shallow XML documents (`.xml`) | Yes | Yes |
| SAS (`.sas7bdat`) | Yes | |
| SAS XPORT (`.xpt`) | Yes | |
| Minitab (`.mtp`) | Yes | |
| Epiinfo (`.rec`) | Yes | |
| Systat (`.syd`) | Yes | |
| Data Interchange Format (`.dif`) | Yes | |
| OpenDocument Spreadsheet (`.ods`) | Yes | |
| Fortran data (no recognized extension) | Yes | |
| Google Sheets | Yes | |
| Clipboard (default is `.tsv`) | | |
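
Export follows the same extension-driven logic as import. The sketch below assumes rio is installed and uses temporary paths and the built-in iris dataset as stand-ins; it writes one file with export() and converts another with convert(), which reads one format and writes another in a single call.

```r
library(rio)

# Start from a .csv written with base R (temporary path, built-in dataset)
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris), csv_path, row.names = FALSE)

# export() picks the writer from the target extension,
# just as import() picks the reader
rds_path <- tempfile(fileext = ".rds")
export(import(csv_path), rds_path)

# convert() chains the two steps: read one format, write another
tsv_path <- tempfile(fileext = ".tsv")
convert(csv_path, tsv_path)

dim(import(rds_path))
dim(import(tsv_path))
```

Only formats marked as supported in both columns of the table can be used as a conversion target.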

Since rio is still a growing package, I strongly suggest that you follow its development on its GitHub repository at https://github.com/leeper/rio, where you can easily find out when new formats are added.
