After you demonstrated your talents by solving Zhuge Liang's puzzle, his assistant provided you with documents summarizing the resources of the Shu army. These documents contain data on gold, equipment, and soldiers. Before analyzing these data in R, it is critical that you prepare and organize them. This process will make your subsequent work clearer and more efficient.
In this chapter, we will focus on collecting and organizing the information that is available to us. You will encounter several new techniques in R along the way. By the end of this chapter, you will be able to:
Our first task is to pull external resource data into R, so we can begin to examine it. To accomplish this, open the R console and proceed through the following steps:
1. Set your R working directory using the setwd(dir) function. The path used in the following code acts as an example. Your working directory should be set to a relevant location on your own computer:
> #set the R working directory
> #replace the sample location with one that is relevant to you
> setwd("/Users/johnmquick/rBeginnersGuide/")
2. Copy the hanzhongResources.csv file into your R working directory. This file contains resource information for the Shu forces that are currently recuperating in Hanzhong.
3. Read the contents of the file into R using the read.csv(file) command:
> #use read.csv(file) to read an external data file into R
> #Shu resources located in Hanzhong, China
> read.csv("hanzhongResources.csv")
These data indicate that your forces in Hanzhong currently have 1,000,000 each of gold and provisions, 100,000 soldiers, and equipment that is in mint condition.
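Before reading a file, it can help to confirm where R is looking for it and to keep the imported data in an object for later use. The following is a minimal sketch rather than part of the exercise itself. It uses a temporary folder and a stand-in file so that it runs on any computer. The file name hanzhongExample.csv, the object names, and the column names are assumptions made for illustration; the layout of the real hanzhongResources.csv file may differ.

```r
# Sketch only: a temporary folder stands in for your own working directory
exampleDir <- tempdir()
oldDir <- setwd(exampleDir)  # setwd() invisibly returns the previous directory

# Create a stand-in file; the column names here are hypothetical
standIn <- data.frame(gold = 1000000, provisions = 1000000, soldiers = 100000)
write.csv(standIn, "hanzhongExample.csv", row.names = FALSE)

# Confirm the working directory and that the file is visible from it
getwd()
fileFound <- file.exists("hanzhongExample.csv")

# Read the file and keep the result in an object instead of only printing it
hanzhong <- read.csv("hanzhongExample.csv")
str(hanzhong)

# Restore the original working directory
setwd(oldDir)
```

If file.exists() returns FALSE for your own file, the working directory or the file name is the first thing to check.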
After setting your working directory, you encountered a new function. Its syntax differs from the commands that we have previously used.
In read.csv(file), a period is placed between read and csv. In R, this period is simply part of the function name, but the csv portion tells us that the function expects a file containing comma-separated values. It is important to distinguish which read function we want to use, because there are a number of alternatives, such as read.S and read.spss, which are provided by the foreign package.
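To illustrate how the members of the read family relate to one another, the sketch below reads the same comma-separated text with read.csv and with the more general read.table function. The sample text, its column names, and the object names are hypothetical. The data are supplied through the text argument, so no external file is needed.

```r
# A small comma-separated sample supplied as text rather than as a file;
# the column names and values are hypothetical
csvText <- "unit,soldiers\ninfantry,100\ncavalry,50"

# read.csv() assumes a header row and a comma separator by default
viaCsv <- read.csv(text = csvText)

# read.table() reads the same data once told about the header and separator
viaTable <- read.table(text = csvText, header = TRUE, sep = ",")

# Compare the two results
all.equal(viaCsv, viaTable)
```

In this simple case, read.csv behaves like read.table with defaults chosen for comma-separated data.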
The file portion of the read.csv(file) function is similar to dir in setwd(dir). Since we placed our data file in our working directory, the file argument needed only to contain a file name and extension. Had the data been placed elsewhere, a complete file path would have been necessary.
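The difference between supplying a file name and supplying a complete path can be sketched as follows. This is an illustration under stated assumptions: the temporary folder, the file name pathExample.csv, and the single soldiers column are all hypothetical stand-ins.

```r
# Sketch only: write a stand-in file to a temporary folder
dataDir <- tempdir()
fullPath <- file.path(dataDir, "pathExample.csv")  # hypothetical file name
write.csv(data.frame(soldiers = 100000), fullPath, row.names = FALSE)

# Case 1: the file is in the working directory, so the file name is enough
oldDir <- setwd(dataDir)
byName <- read.csv("pathExample.csv")
setwd(oldDir)  # restore the original working directory

# Case 2: the file is located elsewhere, so the complete path is supplied
byPath <- read.csv(fullPath)

# Both approaches read the same data
identical(byName, byPath)
```

The file.path function builds the path with the correct separator for your operating system, which avoids typing it by hand.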
Throughout this book, we will use comma-separated values (CSV) data files. This is the recommended file type for importing data into R. However, you should be aware that R can accept data from a wide variety of sources, so you can typically import data from whichever ones you use.
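As a brief illustration of other text formats, the sketch below reads tab-separated text with read.delim and semicolon-separated text with read.csv2, both of which are part of base R. The sample values, column names, and object names are hypothetical.

```r
# Tab-separated text, read with read.delim() (hypothetical sample values)
tabText <- "unit\tsoldiers\ninfantry\t100"
fromTab <- read.delim(text = tabText)

# Semicolon-separated text with a decimal comma, read with read.csv2()
semiText <- "unit;cost\ninfantry;1,5"
fromSemi <- read.csv2(text = semiText)

# Inspect the imported data
fromTab
fromSemi
```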
What is the difference between dir and file?
a. The dir argument contains a path, whereas the file argument contains a filename.
b. The dir argument contains a path to a directory folder, whereas the file argument contains a path to a file.
c. Functions beginning with read receive the file argument, whereas functions beginning with set receive the dir argument.
d. There is no difference between the dir and file arguments.