Both our hanzhongResources and soldiersByCity variables contain a complete set of values (as opposed to a single value). We already know that typing a variable's name into R will output all of its contents in the console. However, we often need to access the columns, rows, and cells within a dataset to perform calculations.
We will start by exploring two methods for accessing the columns in our soldiersByCity variable:
Display the Soldiers column from our soldiersByCity variable through R's variable$column notation:
> #isolate a single column within a dataset using the variable$column notation
> #display the contents of the Soldiers column from the soldiersByCity variable
> soldiersByCity$Soldiers
Alternatively, use the attach(variable) function to simplify our operation:
> #isolate a single column within a dataset using the attach(variable) function and simplified notation
> #attach the soldiersByCity variable
> attach(soldiersByCity)
> #display the contents of the Soldiers column from the soldiersByCity variable
> Soldiers
Both methods display the contents of the Soldiers column.
Next, we will access a single row within the soldiersByCity variable:
Use the variable[row, column] matrix notation to display the contents of the tenth row in our soldiersByCity variable:
> #isolate a single row within a dataset using the variable[row, column] matrix notation
> #display the contents of the tenth row in the soldiersByCity variable
> soldiersByCity[10,]
R displays the tenth row of the soldiersByCity dataset.
Use matrix notation to display the contents of cell [5,3] in our soldiersByCity variable:
> #isolate a single cell within a dataset using the variable[row, column] matrix notation
> #display the contents of cell [5,3] in the soldiersByCity variable
> soldiersByCity[5,3]
R displays the value of cell [5,3].
You have just practiced accessing data within a variable from each possible angle, that is, by columns, rows, and individual cells. Let us take a closer look at how variable data is accessed in R.
Individual columns within a dataset can be accessed via the variable$column notation. Think of the dollar sign ($) as the letter S, as in the word "select." In this way, the notation can be read in words. For example, the line
> A$B
can be read as "from variable A, select column B." During our activity, we selected the Soldiers column from the soldiersByCity variable by typing the following code in the R console:
> soldiersByCity$Soldiers
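As a minimal sketch of the same idea on a dataset small enough to check by eye, the code below builds a made-up data frame and selects one of its columns. The name exampleArmy and all of its values are hypothetical stand-ins, not the contents of soldiersByCity.

```r
# Hypothetical stand-in data; these are NOT the book's soldiersByCity values
exampleArmy <- data.frame(
  City = c("Alpha", "Beta", "Gamma"),
  Soldiers = c(100, 250, 75)
)

# Read as: "from variable exampleArmy, select column Soldiers"
soldierCounts <- exampleArmy$Soldiers

# The selected column is an ordinary vector, so it can feed a calculation
totalSoldiers <- sum(soldierCounts)

print(soldierCounts)
print(totalSoldiers)
```

Because the selected column behaves like any other vector, it can be passed straight to functions such as sum().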
The attach(variable) function is a convenient way to relieve ourselves of lengthy notation in some, but not all, cases. When a variable is attached in the R console, its columns can be referred to by name, without the need to identify the variable. For example, after we attached soldiersByCity, we could display the contents of the Soldiers column by simply typing
> Soldiers
in the console.
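The following sketch illustrates attaching on a small, made-up data frame. The name exampleArmy and its values are hypothetical, and the example assumes a fresh R session in which no other object named Soldiers exists.

```r
# Hypothetical stand-in data; these are NOT the book's soldiersByCity values
exampleArmy <- data.frame(
  City = c("Alpha", "Beta", "Gamma"),
  Soldiers = c(100, 250, 75)
)

# Attach the data frame so its columns can be referred to by name alone
attach(exampleArmy)

# Soldiers now resolves to the Soldiers column of exampleArmy
attachedColumn <- Soldiers
print(attachedColumn)

# Clean up so that later code is not affected by the attachment
detach(exampleArmy)
```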
A caveat with the attach(variable) function is that, in practice, only one of several variables that share column names can be used conveniently while attached. For instance, if we were to attach both our hanzhongResources and soldiersByCity variables at the same time, we would run into a problem regarding the Soldiers column. Since both of these variables contain such a column, the name Soldiers on its own refers only to the most recently attached version. Accessing the other requires the use of variable$column notation. In fact, R will notify you if you attach two variables that share a common column name. This notice, which is a message rather than an error that stops your work, appears when the soldiersByCity variable is attached, followed by hanzhongResources.
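To see this behavior on a small scale, the sketch below attaches two made-up data frames that share a Soldiers column. The names firstArmy and secondArmy and all values are hypothetical, and a fresh R session is assumed.

```r
# Two hypothetical data frames that both contain a Soldiers column
firstArmy <- data.frame(
  City = c("Alpha", "Beta"),
  Soldiers = c(100, 250)
)
secondArmy <- data.frame(
  Resource = c("Gold", "Rice"),
  Soldiers = c(10, 20)
)

attach(firstArmy)
# On this second attach, R reports that Soldiers from firstArmy is masked
attach(secondArmy)

# The bare name resolves to the most recently attached data frame
mostRecent <- Soldiers

# The other column is still reachable through variable$column notation
explicitFirst <- firstArmy$Soldiers

print(mostRecent)
print(explicitFirst)

# Detach both so that later code is not affected
detach(secondArmy)
detach(firstArmy)
```

R continues running after reporting the conflict, and the bare name Soldiers then refers to secondArmy's column.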
On the other hand, attaching a variable can be useful and efficient when you are working with a single, large dataset. If you are only manipulating data from one variable, then you will not run into the column name conflict described above. Furthermore, you can always have one variable attached, even if you are working with datasets that have identical column names. Of course, if your variables do not have columns in common, then attaching them all is an option. In any case, you can always refer to columns using variable$column notation, which we will do throughout the remainder of this book.
Note that should you ever need to detach a variable, you can use the detach(variable) function. This will return the variable to its prior status in the console, as if it had never been attached in the first place.
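One way to confirm that a variable has been detached is to inspect R's search path with the search() function, which lists everything currently attached. The sketch below assumes a fresh R session and uses a hypothetical data frame named exampleArmy.

```r
# Hypothetical stand-in data; these are NOT the book's soldiersByCity values
exampleArmy <- data.frame(
  City = c("Alpha", "Beta", "Gamma"),
  Soldiers = c(100, 250, 75)
)

attach(exampleArmy)
# While attached, the data frame's name appears in the search path
attachedBefore <- "exampleArmy" %in% search()

detach(exampleArmy)
# After detaching, it is gone from the search path again
attachedAfter <- "exampleArmy" %in% search()

# The bare column name no longer resolves (assuming nothing else defines it)
columnVisible <- exists("Soldiers")

print(attachedBefore)
print(attachedAfter)
print(columnVisible)
```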
When referring to row data or individual cells, the variable[row, column] notation should be used. For rows, such as when we accessed the tenth row in soldiersByCity via
> soldiersByCity[10,]
the column portion of the notation is omitted. This tells R to retrieve all of the columns in the row.
To isolate an individual cell, both a row and column value must be specified. When we accessed cell [5,3] from soldiersByCity via
> soldiersByCity[5,3]
the 5 represented the cell's row, whereas the 3 defined the cell's column. This is similar to selecting a single point from a graph using its x-y coordinates, except the graph in our case is a matrix of data values.
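The row and cell notation can be sketched on a small, made-up data frame whose values are easy to verify by eye. The name exampleArmy, its Region column, and all values are hypothetical and do not reflect the real soldiersByCity data.

```r
# Hypothetical stand-in data with five rows and three columns
exampleArmy <- data.frame(
  City = c("Alpha", "Beta", "Gamma", "Delta", "Epsilon"),
  Region = c("North", "South", "East", "West", "North"),
  Soldiers = c(100, 250, 75, 300, 125)
)

# Leaving the column position blank retrieves every column of row 2
secondRow <- exampleArmy[2, ]

# Row 5, column 3: the Soldiers value in the fifth row
cellValue <- exampleArmy[5, 3]

print(secondRow)
print(cellValue)
```

Note that a row selected this way is itself a one-row data frame, whereas a single cell comes back as a lone value.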
On a side note, you may have noticed that variable[row,column] notation can also be used to refer to columns. This can be accomplished by leaving the row portion of the notation blank. For example, to access the City column in soldiersByCity, we could use the code soldiersByCity[,1]. This tells R to retrieve every row within the City column.
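As a quick check that the two column notations agree, the sketch below compares them on a made-up data frame. The name exampleArmy and its values are hypothetical; the only assumption carried over from the text is that City is the first column.

```r
# Hypothetical stand-in data; City is deliberately the first column
exampleArmy <- data.frame(
  City = c("Alpha", "Beta", "Gamma"),
  Soldiers = c(100, 250, 75)
)

# Leaving the row position blank retrieves every row of column 1
byPosition <- exampleArmy[, 1]

# The same column selected by name
byName <- exampleArmy$City

# Both notations return the same values
sameColumn <- identical(byPosition, byName)
print(sameColumn)
```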
> myVariable$myColumn
a. Multiply the data within myVariable by the data within myColumn.
b. Divide the data within myVariable by the data within myColumn.
c. In variable myColumn, select column myVariable.
d. In variable myVariable, select column myColumn.
a. You are working with a single dataset.
b. You are working with multiple datasets that contain identical column names.
c. You are working with multiple datasets that contain identical column names, but want to attach only one of them.
d. You are working with multiple datasets that do not contain identical column names.
variable[row,column] notation can be used to access data from which of the following locations?
a. Rows.
b. Columns.
c. Cells.
d. All of the above.