Being able to access the information stored in a variable is the initial step towards manipulating its data. Variables and their data can be used in the same way that we used numbers to perform calculations in Chapter 2. They can be used in mathematical formulas as well as in function arguments.
1. Use the hanzhongResources variable to calculate the amount of resources that the Shu army would have remaining if a flood were to destroy 75% of each resource:
> #if a flood destroyed 75% of the Shu resources at Hanzhong, how much of each resource would remain?
> #multiply the hanzhongResources variable by 0.25 to represent the remaining 25% of the original resources
> hanzhongResources * 0.25
2. Repeat the calculation for only the Provisions column of the hanzhongResources variable:
> #if a flood destroyed 75% of the Provisions at Hanzhong, how much would remain?
> #multiply the Provisions column by 0.25 to represent the remaining 25% of the original resources
> hanzhongResources$Provisions * 0.25
3. Use the soldiersByCity variable to calculate the mean (average) number of soldiers stationed in a Shu city:
> #use the mean(data) function to calculate the average number of soldiers stationed in a Shu city
> #on average, a Shu city has this many soldiers:
> mean(soldiersByCity$Soldiers)
4. Save the result into a new variable named meanSoldiersByCity:
> #save the mean number of soldiers per city into a new variable named meanSoldiersByCity
> meanSoldiersByCity <- mean(soldiersByCity$Soldiers)
5. Display the contents of meanSoldiersByCity by entering it into the R console:
> #display the contents of meanSoldiersByCity
> meanSoldiersByCity
The R console prints the contents of the meanSoldiersByCity variable.
In just a few lines of code, you have experienced the range of variable manipulations that you will use on a regular basis in R. Let us explore each one individually.
When you used your hanzhongResources variable to calculate the consequences of a flood across each resource, you discovered that when a variable is manipulated in this manner, so is all of its underlying data.
For demonstration, consider the following table with the cell values of 1, 2, 3, and 4 in columns a, b, c, and d respectively:
a | b | c | d |
---|---|---|---|
1 | 2 | 3 | 4 |
Suppose that this table is saved in an R variable named lettersAndNumbers. If we were to add one to the lettersAndNumbers variable in R with the following command:
> lettersAndNumbers + 1
Our resulting table would contain the sum of each cell's value and one, as follows:
a | b | c | d |
---|---|---|---|
2 | 3 | 4 | 5 |
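The demonstration above can be reproduced in the console. The following is a minimal sketch that assumes lettersAndNumbers is stored as a one-row data frame; the text does not say how the table was created, so the data.frame() call and the name lettersAndNumbersPlusOne are assumptions made for illustration.

```r
# Assumption: build the one-row table from the text as a data frame
lettersAndNumbers <- data.frame(a = 1, b = 2, c = 3, d = 4)

# Adding one to the variable adds one to every cell it contains
lettersAndNumbersPlusOne <- lettersAndNumbers + 1

# Display the resulting table
print(lettersAndNumbersPlusOne)
```

Note that lettersAndNumbers itself is unchanged; the result only persists because it was saved into a new variable.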
As you can see, R will attempt to perform any calculation made on a dataset to each of its values. However, it is worth noting that R will not always be able to make a successful calculation on every cell in a dataset.
For instance, if we tried to make a numeric calculation on the Kingdom and City columns of our soldiersByCity variable, R would return a warning along with NA (not available) values. This is because our Kingdom and City columns contain text, and it does not make sense to manipulate text numerically. To see this warning in action, enter the following lines into the R console:
> #what happens if we try to make a numeric calculation on nonnumeric data?
> #we receive a warning, because it does not make sense to manipulate text mathematically
> soldiersByCity * 5
This would result in the following screen:
Here, the Soldiers column contains numeric values, so each value within it is successfully multiplied by five. However, the text in the Kingdom and City columns cannot be multiplied, so a warning message is returned. To avoid deriving meaningless values and upsetting the R console, be aware of your data and apply only calculations that suit it.
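If you do not have the soldiersByCity data at hand, the behavior can be sketched with a small stand-in. The soldier counts below are placeholders, not the book's data, and the sketch assumes the text columns are stored as factors, which was the default when importing data before R 4.0. With plain character columns, recent versions of R stop with an error instead of returning a warning and NA values.

```r
# Hypothetical stand-in for soldiersByCity; the soldier counts are made up
soldiersByCity <- data.frame(
  Kingdom = c("Shu", "Shu", "Shu"),
  City = c("Hanzhong", "Guanghan", "Baxi"),
  Soldiers = c(100, 200, 300),
  stringsAsFactors = TRUE  # assumption: text columns are stored as factors
)

# Multiplying the whole dataset: Soldiers is multiplied, the text columns become NA
# suppressWarnings() hides the warnings here; remove it to see them in the console
wholeDataset <- suppressWarnings(soldiersByCity * 5)
print(wholeDataset)

# Being specific about the numeric column avoids the warning altogether
soldiersTimesFive <- soldiersByCity$Soldiers * 5
print(soldiersTimesFive)
```

Restricting the calculation to the Soldiers column is usually the simplest way to keep both the result and the console clean.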
Manipulating row, column, or cell data is identical to manipulating an entire dataset contained within a variable. The difference is not in the calculation, but rather in what you choose to perform the calculation on. Depending on whether you aim to manipulate row, column, or cell data, you will need to access the values in the appropriate manner. See the Accessing data within variables section of this chapter for a review of these methods.
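As a rough illustration, the same multiplication can be applied to a column, a row, or a single cell simply by changing what is selected. The resources data frame and its values below are hypothetical placeholders chosen for this sketch.

```r
# Hypothetical numeric table; the names and values are placeholders
resources <- data.frame(Provisions = c(100, 200), Soldiers = c(10, 20))

# Column: halve every value in the Provisions column
halfProvisions <- resources$Provisions * 0.5

# Row: halve every value in the first row (the result is a one-row data frame)
halfFirstRow <- resources[1, ] * 0.5

# Cell: halve the single value in row 2, column 1
halfCell <- resources[2, 1] * 0.5

print(halfProvisions)
print(halfFirstRow)
print(halfCell)
```

In each case the calculation is identical; only the selection in front of it changes.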
A variable's data, be it from the entire set or a specific subset (row, column, or cell), can be used in function arguments. Our preceding activity used the mean(data) function to calculate the average number of soldiers among the Shu cities listed in our soldiersByCity variable. We could have easily done the same with the entire soldiersByCity dataset, a single row, or an individual cell. The best method for using variable data in arguments will depend on the goal of the manipulation and the specific function being employed.
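The sketch below shows how the choice of argument interacts with the function being used. The armySizes data frame is a hypothetical, all-numeric example, not one of the chapter's datasets. It illustrates that sum() and max() accept an entire numeric data frame or a row, whereas mean() expects a vector such as a single column and returns NA with a warning when given a whole data frame.

```r
# Hypothetical all-numeric dataset; the names and values are placeholders
armySizes <- data.frame(Infantry = c(100, 200, 300), Cavalry = c(10, 20, 30))

# A single column passed as the argument
meanInfantry <- mean(armySizes$Infantry)

# The entire dataset passed as the argument: sum() adds up every cell
totalTroops <- sum(armySizes)

# A single row passed as the argument: the largest value in row 1
largestInFirstRow <- max(armySizes[1, ])

# mean() does not accept a whole data frame: it warns and returns NA
# suppressWarnings() hides that warning here; remove it to see the message
meanOfDataset <- suppressWarnings(mean(armySizes))

print(meanInfantry)
print(totalTroops)
print(largestInFirstRow)
print(meanOfDataset)
```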
Do not forget that a variable's purpose is to store and organize your information. Quite often, we will need to store the results of a calculation or function into a new variable for subsequent manipulation. The body of variables and other objects that we amass throughout our work are stored in the R workspace, which is the topic of our next section.
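A short sketch of this pattern follows: a result is saved into a new variable, reused in a later calculation, and then found among the objects in the workspace. The soldiers vector and the variable names are hypothetical placeholders.

```r
# Hypothetical soldier counts; the values are placeholders
soldiers <- c(100, 200, 300)

# Store the result of a function in a new variable
meanSoldiers <- mean(soldiers)

# Reuse the stored result in a subsequent calculation
differenceFromMean <- soldiers - meanSoldiers
print(differenceFromMean)

# ls() lists the names of the objects currently stored in the workspace
print(ls())
```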
The table myTable contains two rows, three columns, and six cells with the numbers one through six. Use this table to answer questions 1 and 2.
myTable | | |
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
> myTable * 10
If this code were applied to myTable, what would be the result? Write the appropriate values in the blank cells of myTableAfterManipulation1:
myTableAfterManipulation1 | | |
---|---|---|
 | | |
 | | |
> myTable[1,2] + 10
If this code were applied to myTable, what would be the result? Write the appropriate values in the blank cells of myTableAfterManipulation2:
myTableAfterManipulation2 | | |
---|---|---|
 | | |
 | | |
> myVariable <- mean(myData$myColumn)
a. Calculate the mean of myColumn and then set myVariable equal to the result.
b. Calculate the mean of myData and then set myVariable equal to the result.
c. In myData, select myColumn, calculate its mean, and then set myVariable equal to the result.
d. Set myVariable equal to the contents of myData and then calculate its mean.
To practice the variety of methods that we have covered for manipulating variables, use your resource data and knowledge of R to complete the following tasks:
1. … hanzhongResources variable. Save the results into a single variable named hanzhongResourcesAfterFlood.
2. … soldiersByCity variable. Save each of these calculations into a new variable. The variables should be named guanghanSoldiersAfterRelocation and baxiSoldiersAfterRelocation respectively.
3. Use the min(data) and max(data) functions and your soldiersByCity variable to calculate the minimum and maximum number of soldiers in either army by city. Save the results as variables named minSoldiersByCity and maxSoldiersByCity respectively.
4. Use the sum(data) function and your soldiersByCity variable to calculate the total number of soldiers in the Shu and Wei armies. Then, save the result as a variable named totalSoldiers.
If you encounter a warning or error during any of these tasks, think about how you can be more specific about which data you want to apply your calculation or function to. For detailed information on handling these occurrences, refer back to the Performing a calculation on an entire dataset section of this chapter.