While retracing the development process behind our fire attack strategy, we encountered a series of key steps that are common to every analysis that you will conduct in R. Regardless of the exact situation or the statistical techniques used, certain things must be done to yield an organized and thorough R analysis. Each of these steps is detailed below.
Perhaps it goes without saying that the thing to do before beginning any R analysis is to launch R itself. Nevertheless, it is mentioned here for completeness and transparency.
Once R is launched, the first common step is to set your working directory. This can be done using the setwd(dir) function and subsequently verified using the getwd() command:
> #Step 1: set your working directory
> #set your working directory using setwd(dir)
> #replace the sample location with one that is relevant to you
> setwd("/Users/johnmquick/rBeginnersGuide/")
> #once set, you can verify your new working directory using getwd()
> getwd()
[1] "/Users/johnmquick/rBeginnersGuide/"
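If you would like to try this step without touching your own folders, the following is a minimal sketch. It assumes a temporary directory stands in for your project folder; the names old_dir, project_dir, and current_dir are hypothetical and chosen only for illustration.

```r
# Sketch of Step 1 using a temporary directory as a stand-in project folder
old_dir <- getwd()          # remember the directory we started in
project_dir <- tempdir()    # assumption: replace with a path relevant to you

setwd(project_dir)          # set the working directory
current_dir <- getwd()      # verify the new working directory
print(current_dir)

setwd(old_dir)              # restore the original directory when finished
```

Storing the original directory first makes it easy to return to it once your experiment is over.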
Note that commented lines, which are prefixed with the pound sign (#), appeared before each of our functions in step one. It is vital that you comment all of the actions that you take within the R console. This allows you to refer back to your work later and also makes your code accessible to others.
This is an opportune time to point out that you can draft your code in other places besides the R console. For example, R has a built-in editor that can be opened by going to the File | New Document/Script menu or by pressing the Command + N or Ctrl + N keys. Other free editors can also be found online. The advantages of using an editor are that you can easily modify your code and see different types of code in different colors, which helps you to verify that it is properly constructed. Note, however, that to execute your code, you must place it in the R console.
After you set the working directory, it is time to pull your data into R. This can be achieved by creating a new variable in tandem with the read.csv(file) command:
> #Step 2: Import data (or load an existing workspace)
> #read a dataset from a csv file into R using read.csv(file) and save it into a new variable
> dataset <- read.csv("datafile.csv")
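The import step can be tried end to end with a small, made-up file. In the sketch below, the column names soldiers and duration and their values are invented for illustration and do not come from the book's datasets; the file is written to a temporary directory so that nothing in your own folders is changed.

```r
# Create a small, made-up CSV file so that the example is self-contained
csv_path <- file.path(tempdir(), "datafile.csv")
sample_rows <- data.frame(soldiers = c(1000, 2000, 3000),
                          duration = c(10, 14, 21))
write.csv(sample_rows, csv_path, row.names = FALSE)

# Step 2: read the csv file into a new variable
dataset <- read.csv(csv_path)

# Inspect what was imported
str(dataset)    # column names and types
head(dataset)   # first few rows
```

Inspecting the result with str() and head() right after importing is a quick way to confirm that R read your columns as you expected.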
Alternatively, if you were continuing a prior data analysis, rather than starting a new one, you would instead load a previously saved workspace using load(file). You can then verify the contents of your loaded workspace using the ls() command.
> #load an existing workspace using load(file)
> load("existingWorkspace.RData")
> #verify the contents of your workspace using ls()
> ls()
[1] "myVariable 1" [2] "myVariable 2" [3] "myVariable 3"
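To see a workspace being restored, you can save a couple of objects and read them back with base R's load(file) function. This is only a sketch: the object names and values are made up, and loading into a separate environment (restored) is an illustrative choice that keeps the example from overwriting anything in your current session.

```r
# Save two made-up objects to a workspace file in a temporary directory
workspace_path <- file.path(tempdir(), "existingWorkspace.RData")
myVariable1 <- c(1, 2, 3)
myVariable2 <- "fire attack"
save(myVariable1, myVariable2, file = workspace_path)

# Load the file into a fresh environment
restored <- new.env()
loaded_names <- load(workspace_path, envir = restored)  # returns the names it restored

# Verify the contents using ls()
print(loaded_names)
print(ls(restored))
```

When called without the envir argument, load(file) places the objects directly into your current workspace, which is the behavior described in this step.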
Regardless of the type or amount of data that you have, summary statistics should be generated to explore your data. Summary statistics provide you with a general overview of your data and can reveal overarching patterns, trends, and tendencies across a dataset. Summary statistics include calculations such as means, standard deviations, and ranges, amongst others:
> #Step 3: Explore your data
> #calculate a mean using mean(data)
> mean(myData)
[1] 1000
> #calculate a standard deviation using sd(data)
> sd(myData)
[1] 100
> #calculate a range (minimum and maximum) using range(data)
> range(myData)
[1] 500 2000
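The same three calculations can be run on a vector that you define yourself. The values below are made up for illustration and are not the data behind the output shown above.

```r
# Made-up measurements used only for illustration
myData <- c(500, 800, 1000, 1200, 1500)

data_mean  <- mean(myData)    # arithmetic mean
data_sd    <- sd(myData)      # sample standard deviation (n - 1 denominator)
data_range <- range(myData)   # minimum and maximum, in that order

print(data_mean)
print(data_sd)
print(data_range)
```

Note that range(data) returns two values, so you can retrieve the minimum with data_range[1] and the maximum with data_range[2].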
Also recall R's summary(object) function, which provides summary statistics along with additional vital information. It can be used with almost any object in R and will offer information specifically tailored to that object:
> #generate a detailed summary for a given object using summary(object)
> summary(object)
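To illustrate how summary(object) adapts to what it is given, the sketch below applies it to three different kinds of objects. All of the values and names are invented for the example.

```r
# summary() on a numeric vector: minimum, quartiles, median, mean, maximum
numeric_summary <- summary(c(500, 800, 1000, 1200, 1500))
print(numeric_summary)

# summary() on a factor: a count for each level
factor_summary <- summary(factor(c("fire", "fire", "ambush")))
print(factor_summary)

# summary() on a data frame: a per-column summary
frame_summary <- summary(data.frame(soldiers = c(1000, 2000, 3000)))
print(frame_summary)
```

The function call is identical in each case; only the object changes, and the output changes with it.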
Note that there are often other ways to make an initial examination of your data in addition to using summary statistics. When appropriate, graphing your data is an excellent way to gain a visual perspective on what it has to say (data visualization is the primary topic of Chapter 8 and Chapter 9 of this book). Furthermore, before conducting an analysis, you will want to ensure that your data are consistent with the assumptions necessitated by your statistical methods. This will prevent you from expending energy on inappropriate techniques and from making invalid conclusions.
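As a rough sketch of what such an initial examination might look like, the code below draws a histogram and runs one possible assumption check. The simulated data, the file name, and the choice of the Shapiro-Wilk normality test are all assumptions made for this example; the checks that you actually need depend on the methods that you plan to use.

```r
# Simulated data, used only for illustration
set.seed(42)
myData <- rnorm(50, mean = 1000, sd = 100)

# Draw a histogram into a PDF file in a temporary directory
plot_path <- file.path(tempdir(), "myData_hist.pdf")
pdf(plot_path)
hist(myData, main = "Distribution of myData", xlab = "Value")
invisible(dev.off())   # close the graphics device so the file is written

# One possible assumption check: the Shapiro-Wilk test of normality
normality_check <- shapiro.test(myData)
print(normality_check$p.value)
```

When working interactively, you can omit the pdf() and dev.off() lines, and the histogram will appear in a graphics window instead.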
Here is where your work will differ from project to project. Depending on the type of analysis that you are conducting, you will use a variety of different techniques. For example, in this book we have primarily used regression analysis. Regression is but one of an endless number of potential methods. The correct techniques to use will be determined by the circumstances surrounding your work.
> #Step 4: Conduct your analysis
> #The appropriate methods for this step will vary between analyses.
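Since regression has been the primary technique in this book, here is a minimal sketch of what this step could contain. The battles data frame and its soldiers and duration columns are made up for illustration; they are not the book's data, and a simple linear model is only one of many possible analyses.

```r
# Made-up data, used only for illustration
battles <- data.frame(soldiers = c(1000, 2000, 3000, 4000, 5000),
                      duration = c(12, 19, 33, 41, 48))

# Fit a simple linear regression of duration on soldiers
model <- lm(duration ~ soldiers, data = battles)

# Review the fitted model
print(summary(model))   # coefficients, R-squared, and related statistics
print(coef(model))      # intercept and slope only
```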
At the conclusion of your analysis, you will always want to save your work. To have the option to revisit and manipulate your R objects from session to session, you will need to save your R workspace using the save.image(file) command, as follows:
> #Step 5: Save your workspace and console files
> #save your R workspace using save.image(file)
> #remember to include the .RData file extension
> save.image("myWorkspace.RData")
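A quick way to confirm that your workspace was actually written is to check for the file afterwards. The sketch below saves to a temporary directory; the object analysisResult and the path are hypothetical and used only for illustration.

```r
# A made-up object standing in for the results of an analysis
analysisResult <- c(mean = 1000, sd = 100)

# Save the workspace, remembering the .RData file extension
workspace_path <- file.path(tempdir(), "myWorkspace.RData")
save.image(workspace_path)

# Confirm that the file now exists
print(file.exists(workspace_path))
```

Note that save.image(file) stores every object in your workspace, not only the ones created most recently.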
To save your R console text, which contains the log of every action that you took during a given session, you will need to copy and paste it into a text file. Once copied, the console text can be formatted to improve its readability. For instance, a text file containing the five common steps of every R analysis could take the following form:
> #There are five steps that are common to every data analysis conducted in R

> #Step 1: set your working directory
> #set your working directory using setwd(dir)
> #replace the sample location with one that is relevant to you
> setwd("/Users/johnmquick/rBeginnersGuide/")
> #once set, you can verify your new working directory using getwd()
> getwd()
[1] "/Users/johnmquick/rBeginnersGuide/"

> #Step 2: Import data (or load an existing workspace)
> #read a dataset from a csv file into R using read.csv(file) and save it into a new variable
> dataset <- read.csv("datafile.csv")
> #OR
> #load an existing workspace using load(file)
> load("existingWorkspace.RData")
> #verify the contents of your workspace using ls()
> ls()
[1] "myVariable 1" [2] "myVariable 2" [3] "myVariable 3"

> #Step 3: Explore your data
> #calculate a mean using mean(data)
> mean(myData)
[1] 1000
> #calculate a standard deviation using sd(data)
> sd(myData)
[1] 100
> #calculate a range (minimum and maximum) using range(data)
> range(myData)
[1] 500 2000
> #generate a detailed summary for a given object using summary(object)
> summary(object)

> #Step 4: Conduct your analysis
> #The appropriate methods for this step will vary between analyses.

> #Step 5: Save your workspace and console files
> #save your R workspace using save.image(file)
> #remember to include the .RData file extension
> save.image("myWorkspace.RData")
> #save your R console text by copying it and pasting it into a text file.
See the rBeginnersGuide_CommonSteps.txt file that is provided with this book.
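Copying and pasting is the approach used in this book. As a possible supplement, base R's sink(file) function can divert printed output to a text file. This is only a sketch under stated assumptions: the file name consoleLog.txt is hypothetical, and sink() records the output of your commands but not the commands themselves, so it does not replace a full console log.

```r
# Divert printed output to a text file in a temporary directory
log_path <- file.path(tempdir(), "consoleLog.txt")
sink(log_path)

print(mean(c(500, 1000, 1500)))   # this output goes to the file, not the console

sink()   # stop diverting output and return it to the console

# Read the saved text back to confirm what was recorded
log_lines <- readLines(log_path)
print(log_lines)
```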
a. It makes your code readable and organized.
b. It makes your code accessible to others.
c. It makes it easier for you to return to and recall your past work.
d. It makes the analysis process faster.
Conduct a complete end-to-end analysis using the strategy that you decided upon at the conclusion of Chapter 6. Be sure to employ each of the five steps common to all R analyses. Along the way, refer to the Retracing and Refining a Complete Analysis section of this chapter, as well as the previous chapters of this book. Once your analysis is complete, you should have the following items: