Getting a sense of your data structure with R

By following the recipes given in the previous chapter, you got your data. Everything went smoothly, and you may also already have the data as a data frame object.

However, do you know what your data looks like?

Getting to know your data structure is a crucial step within a data analysis project. It will suggest the appropriate treatment and analysis, and will help you avoid error and redundancy in the coding activity that follows.

In this recipe, we will look at a dataset structure by leveraging the describe() function from the Hmisc package. For further preliminary analysis on your data structure, you can also refer to the data visualization recipes in Chapter 3, Basic Visualization Techniques.

Getting ready

This example will be built around a dataset provided in the RStudio project related to this book.

You can download it by authenticating your account at http://packtpub.com.

This dataset is named world_gdp_data.csv and stores GDP values for 248 countries around the globe, from 1960 to 2015.

Before you begin with this recipe, you will need to load this data into R by leveraging the import function from the rio package:

install.packages("rio")
library(rio)
messy_gdp <- import("world_gdp_data.csv")

You can refer to the Loading your data into R with rio packages recipe in Chapter 1, Acquiring Data for Your Project, for details on this powerful tool's functionalities.
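Once the import has run, a quick first look at the result is useful before building the full data dictionary. The following is a minimal sketch on a made-up stand-in file: the column names and values are invented for illustration and are not those of world_gdp_data.csv, and base R's read.csv() is used in place of import() only so that the sketch runs without rio installed.

```r
# Toy stand-in for the real file; column names and values are hypothetical.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("country,year,gdp",
             "Country A,1960,10.5",
             "Country A,1961,",
             "Country B,1960,12.0"), toy_path)

# read.csv() replaces rio::import() here purely to keep the sketch self-contained.
toy_gdp <- read.csv(toy_path, stringsAsFactors = FALSE)

dim(toy_gdp)              # number of rows and columns
str(toy_gdp)              # type and first values of each column
colSums(is.na(toy_gdp))   # count of missing values per column
```

The same three calls can be applied to messy_gdp after the real import.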

As mentioned earlier, we will employ functions from the Hmisc and e1071 packages.

Use the following code to install and load packages:

install.packages(c("Hmisc", "e1071"))
library(Hmisc)
library(e1071)

How to do it...

1. Create a data dictionary:
```
data_dictionary <- describe(messy_gdp)
```

2. Save your data dictionary as a separate file to document it:

sink("data_dictionary.txt", append=TRUE)
data_dictionary
sink()

3. Look at your data dictionary:

file.show(file = "data_dictionary.txt",pager = "internal")

How it works...

Performing step 1 will produce a data_dictionary object, which is a list of as many lists as there are columns in your data frame plus one, the contents of which we are going to discover shortly.

For each column, the following details are exposed:

Variability domain, showing the lowest and highest values
Number of non-missing values
Number of missing values
Number of unique values
For categorical variables (for instance, country names), a frequency table is produced, showing the number of occurrences for each possible value of the variable
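To make these fields concrete, the following base R sketch computes comparable summaries by hand on a small made-up data frame. It is an illustration only: summarise_column() is a hypothetical helper written for this example, and it does not reproduce the internals or the exact output of describe().

```r
# Made-up data: one categorical and one numeric column, each with a missing value.
toy <- data.frame(
  country = c("A", "B", "A", NA),
  gdp     = c(10.5, NA, 12.0, 7.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collects the details listed above for a single column.
summarise_column <- function(x) {
  out <- list(
    n        = sum(!is.na(x)),                 # non-missing values
    missing  = sum(is.na(x)),                  # missing values
    distinct = length(unique(x[!is.na(x)]))    # unique non-missing values
  )
  if (is.numeric(x)) {
    out$range <- range(x, na.rm = TRUE)        # lowest and highest values
  } else {
    out$frequencies <- table(x)                # occurrences of each value
  }
  out
}

# One list per column, mirroring the "list of lists" shape described earlier.
toy_dictionary <- lapply(toy, summarise_column)

toy_dictionary$gdp$range
toy_dictionary$country$frequencies
```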

The last list is populated only if the data contains columns made up entirely of missing values, in which case it holds the names of those columns.

Step 2 lets you create a document to which you will be able to refer, even outside R, mainly for documentation purposes. This step will produce a .txt file named data_dictionary placed within the current directory of your R session.

Since the data_dictionary object is a list object, we can't simply save it as a .txt file (we could easily do this with the write.table() function when dealing with a data frame). So, we used a workaround involving the sink() function.

This function sends the output of R to an external connection.

The logical phases of this process are as follows:

Establish a connection by running sink() for the first time
Run the R code you are interested in
Close the connection by running sink() again
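The three phases can be sketched on a toy object as follows. The toy_list object and the temporary file are assumptions made for illustration; in the recipe, the object is data_dictionary and the file is data_dictionary.txt. The last lines show capture.output() from the utils package, a different technique that collects the same printed text in a character vector without diverting the console.

```r
# Made-up stand-in for the data_dictionary object.
toy_list <- list(n = 3L, range = c(7.25, 12))
out_file <- tempfile(fileext = ".txt")

sink(out_file)     # phase 1: divert R output to the file
print(toy_list)    # phase 2: run the code whose output you want to keep
sink()             # phase 3: restore output to the console

sink_lines <- readLines(out_file)

# Alternative technique: capture the printed text as a character vector,
# which could then be written out with writeLines().
captured <- capture.output(print(toy_list))
```

Note that sink() overwrites the target file by default. With append = TRUE, as used in the recipe, running the step again adds a second copy of the output to the end of the existing file.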

Step 3 is the final step and involves calling the file.show() function to display the data dictionary you created previously. Be aware that changing the pager argument to "console" would make the content of the .txt file show up in the R console.
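Because the pagers available to file.show() depend on the platform and front end in use, a script can also print the file with readLines() and cat(). This is a minimal sketch of that alternative, not part of the original recipe; the file content below is a made-up placeholder for data_dictionary.txt.

```r
# Placeholder file standing in for data_dictionary.txt.
dict_path <- tempfile(fileext = ".txt")
writeLines(c("gdp", "n missing distinct", "3 1 3"), dict_path)

# Read the file back and print it line by line to the console.
dict_lines <- readLines(dict_path)
cat(dict_lines, sep = "\n")
```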
