Chapter 4. Collecting and Organizing Information

After you demonstrated your talents by solving Zhuge Liang's puzzle, his assistant provided you with documents summarizing the resources of the Shu army. These documents contain data on gold, equipment, and soldiers. Before analyzing these data in R, you must prepare and organize them. This process will make your subsequent work clearer and more efficient.

In this chapter, we will focus on collecting and organizing the information that is available to us. You will encounter several new techniques in R along the way. By the end of this chapter, you will be able to:

  • Import external data into R
  • Use variables to organize and manipulate your data
  • Manage the R workspace

Time for action - importing external data

Our first task is to pull external resource data into R, so we can begin to examine it. To accomplish this, open the R console and proceed through the following steps:

  1. Set your R working directory using the setwd(dir) function. The path used in the following code acts as an example. Your working directory should be set to a relevant location on your own computer:
    > #set the R working directory
    > #replace the sample location with one that is relevant to you
    > setwd("/Users/johnmquick/rBeginnersGuide/")
    
  2. Copy the hanzhongResources.csv file into your R working directory. This file contains resource information for the Shu forces that are currently recuperating in Hanzhong.
  3. Read the resource file into R using the read.csv(file) command:
    > #use read.csv(file) to read an external data file into R
    > #Shu resources located in Hanzhong, China
    > read.csv("hanzhongResources.csv")
    
  4. R will read and display the contents of the file, and the result is shown in the following screenshot:
    [Screenshot: R console output of read.csv("hanzhongResources.csv")]

These data indicate that your forces in Hanzhong currently have 1,000,000 each of gold and provisions, 100,000 soldiers, and equipment that is in mint condition.
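Before calling read.csv(file), it can help to confirm that R is looking where you expect. The following is a minimal sketch rather than part of the original steps. It uses a temporary folder as the working directory and writes a stand-in hanzhongResources.csv whose column names and layout are assumptions made for illustration; the real file may be organized differently.

```r
# illustrative sketch: the temporary folder and column names are assumptions
# use a temporary folder as a stand-in for your own working directory
oldDir <- setwd(tempdir())

# write a stand-in resource file (hypothetical layout, values from the text)
writeLines(c("gold,provisions,soldiers", "1000000,1000000,100000"),
           "hanzhongResources.csv")

# confirm where R is looking and whether the file is present
getwd()
fileFound <- file.exists("hanzhongResources.csv")

# read the file only if it was found
if (fileFound) {
  resources <- read.csv("hanzhongResources.csv")
  print(resources)
}

# return to the original working directory
setwd(oldDir)
```

If file.exists returns FALSE on your own computer, the usual causes are a working directory that points somewhere else or a file name that is spelled differently.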

What just happened?

After setting your working directory, you encountered a new function. Its syntax differs from the commands that we have previously used.

read.csv(file)

In read.csv(file), a period is placed between read and csv. In R, a period is an ordinary character within a name, so read.csv is the complete name of a single function. The csv portion indicates that the function expects our file to contain comma-separated values. It is important to distinguish which read function we want to use, because R offers a number of alternatives, such as read.table and read.delim in base R, and read.S and read.spss in the foreign package.
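To illustrate how read.csv relates to R's other read functions, the following sketch compares it with the more general read.table function. In base R, read.csv behaves like read.table with defaults suited to comma-separated files. The inline data and its column names are made up for this example, and the text argument is used only so that no file is needed.

```r
# illustrative sketch: the inline data below are made up for this example
csvText <- "item,quantity\nswords,10\nshields,20"

# read.csv assumes a header row and comma separators
viaCsv <- read.csv(text = csvText)

# read.table needs those choices spelled out
viaTable <- read.table(text = csvText, header = TRUE, sep = ",")

# check whether both approaches produce the same data frame
all.equal(viaCsv, viaTable)
```

For everyday CSV files, read.csv is the more convenient choice because its defaults already match the format.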

The file portion of the read.csv(file) function is similar to dir in setwd(dir). Since we placed our data file in our working directory, the file argument needed only to contain a file name and extension. Had the data been placed elsewhere, a complete file path would have been necessary.
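As a sketch of the difference between a bare file name and a complete path, the example below builds a full path with file.path and reads the file without relying on the working directory. The temporary folder and the stand-in file contents are assumptions for illustration; substitute the location of your own data.

```r
# illustrative sketch: the folder and file contents are assumptions
# create a stand-in data file in a folder other than the working directory
dataDir <- tempdir()
fullPath <- file.path(dataDir, "hanzhongResources.csv")
writeLines(c("gold,provisions,soldiers", "1000000,1000000,100000"), fullPath)

# a complete path lets read.csv find the file from any working directory
fromFullPath <- read.csv(fullPath)

# a bare file name is resolved relative to the working directory
oldDir <- setwd(dataDir)
fromFileName <- read.csv("hanzhongResources.csv")
setwd(oldDir)

# check that both calls read the same data
identical(fromFullPath, fromFileName)
```

Building the path with file.path avoids typing the separator by hand, which differs between operating systems.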

Comma-separated values (CSV) files

Throughout this book, we will use comma-separated values (CSV) data files, which is the file type we recommend for importing data into R. However, R can accept data from a wide variety of sources, so you can typically import data from whichever source you already use.

Pop quiz

  1. What is the key difference between the function arguments dir and file?

    a. The dir argument contains a path, whereas the file argument contains a filename.

    b. The dir argument contains a path to a directory folder, whereas the file argument contains a path to a file.

    c. Functions beginning with read receive the file argument, whereas functions beginning with set receive the dir argument.

    d. There is no difference between the dir and file arguments.
