This section covers table() and aggregate(), two basic data-processing functions that ship with every standard R installation (table() in the base package, aggregate() in the stats package).
table(): This creates a contingency table from the specified vectors. Although its output has class table, it can be indexed in the same way as an array:

sample.data <- data.frame(var1 = rep(c("Male", "Female"), 10),
                          var2 = rep(c("A", "B", "C", "D")))  # recycled to 20 rows
example.table <- table(sample.data$var1, sample.data$var2)
example.table
##
##          A B C D
##   Female 0 5 0 5
##   Male   5 0 5 0
example.table[2, 2]
## [1] 0

Here, example.table[2, 2] selects the second row (Male) and the second column (B), exactly as it would for an array.
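As a small illustration of the array-like behavior described above, the following sketch indexes the table by dimension names rather than positions and applies two array helpers. The names male.a, female.row, table.dim, and row.totals are hypothetical, and var2 is repeated explicitly here instead of relying on recycling.

```r
# Rebuild the example data so this sketch runs on its own.
sample.data <- data.frame(
  var1 = rep(c("Male", "Female"), 10),
  var2 = rep(c("A", "B", "C", "D"), 5)
)
example.table <- table(sample.data$var1, sample.data$var2)

# Index by dimension names instead of positions.
male.a <- example.table["Male", "A"]

# Extract the counts for a whole row.
female.row <- example.table["Female", ]

# Array-style helpers also apply to table objects.
table.dim <- dim(example.table)        # number of rows and columns
row.totals <- rowSums(example.table)   # counts per level of var1
```

Indexing by name tends to be more robust than indexing by position, because it does not depend on the order in which the levels are sorted.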
aggregate(): This applies an aggregation function to a vector split by one or more factor variables. aggregate() can be used in two basic ways.

Without a formula object: the vector to aggregate is passed in the x argument while one or more factor vectors are passed, wrapped in a list, in the by argument. FUN is the aggregation function to be used:

> data(iris)
> aggregate(iris$Sepal.Length, by=list(iris$Species), FUN="mean")
     Group.1     x
1     setosa 5.006
2 versicolor 5.936
3  virginica 6.588
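The default column label Group.1 in the output above comes from passing an unnamed list. A minimal sketch, assuming the built-in iris data set, shows that naming the list element labels the grouping column; species.means is a hypothetical variable name.

```r
# A named list in 'by' labels the grouping column;
# the aggregated column is still called 'x'.
species.means <- aggregate(iris$Sepal.Length,
                           by = list(Species = iris$Species),
                           FUN = mean)

# Inspect the resulting column names.
result.names <- names(species.means)
print(species.means)
```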
With a formula object: instead of passing a by list, this information can be included in a formula object. Additionally, when using a formula object, it is not necessary to refer repeatedly to the data.frame object being used: if the variables come from a data.frame, it can be specified once in the data argument.

With one factor, without the data argument:
> aggregate(iris$Sepal.Length ~ iris$Species, FUN="mean")
With one factor, with the data argument:
aggregate(Sepal.Length ~ Species, data = iris, FUN = "mean")
##      Species Sepal.Length
## 1     setosa        5.006
## 2 versicolor        5.936
## 3  virginica        6.588
With two factors (iris has no second factor variable, so a letter variable is added first):
iris$letter <- LETTERS[1:5]
aggregate(Sepal.Length ~ Species + letter, data = iris, FUN = "mean")
##       Species letter Sepal.Length
## 1      setosa      A         5.16
## 2  versicolor      A         5.96
## 3   virginica      A         6.94
## 4      setosa      B         5.03
## 5  versicolor      B         6.11
## 6   virginica      B         6.28
## 7      setosa      C         4.85
## 8  versicolor      C         6.07
## 9   virginica      C         6.78
## 10     setosa      D         4.95
## 11 versicolor      D         5.82
## 12  virginica      D         6.44
## 13     setosa      E         5.04
## 14 versicolor      E         5.72
## 15  virginica      E         6.50
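To connect the two interfaces, the sketch below computes the same two-factor aggregation once with a formula and once with a by list, then checks that the means agree. It works on a copy (iris.copy, a hypothetical name) so the original iris is left untouched, and it spells out the recycling of the letters with rep().

```r
# Work on a copy so the built-in data set is not modified.
iris.copy <- iris
iris.copy$letter <- rep(LETTERS[1:5], length.out = nrow(iris.copy))

# Formula interface: both factors on the right-hand side.
by.formula <- aggregate(Sepal.Length ~ Species + letter,
                        data = iris.copy, FUN = mean)

# x/by interface: the same factors passed as a named list.
by.list <- aggregate(iris.copy$Sepal.Length,
                     by = list(Species = iris.copy$Species,
                               letter = iris.copy$letter),
                     FUN = mean)

# The group means should match; only the name of the value column differs.
same.means <- isTRUE(all.equal(by.formula$Sepal.Length, by.list$x))
```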
Compute the mean of all variables by species:
data(iris)  # reload iris to drop the letter column added above
aggregate(. ~ Species, data = iris, FUN = "mean")
##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa        5.006       3.428        1.462       0.246
## 2 versicolor        5.936       2.770        4.260       1.326
## 3  virginica        6.588       2.974        5.552       2.026
Formula objects have the form x ~ y where, in aggregate(), x is the vector to which the aggregation function is applied and y is the splitting factor. To combine several elements on the right-hand side, + is used. Formula objects are also very commonly used in modeling, where x is the variable to model and y is the predictor. Lastly, if all variables should be included on one side (except for the ones specified on the other side), they can be abbreviated with a dot.
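Because a formula is an ordinary R object, it can be stored in a variable and inspected before use. The following sketch, assuming the built-in iris data set, stores a formula, lists its variables, and then uses the dot on the right-hand side; f, means.by.species, two.cols, and dot.rhs are hypothetical names.

```r
# Store the formula in a variable and inspect it.
f <- Sepal.Length ~ Species
f.class <- class(f)      # "formula"
f.vars <- all.vars(f)    # variable names from both sides

# Pass the stored formula to aggregate().
means.by.species <- aggregate(f, data = iris, FUN = mean)

# Dot on the right-hand side: split by every other column in the data.
# Only two columns are kept, so the dot stands for Species alone.
two.cols <- iris[, c("Sepal.Length", "Species")]
dot.rhs <- aggregate(Sepal.Length ~ ., data = two.cols, FUN = mean)
```

Storing the formula once can help when the same specification is reused for both aggregation and a later modeling step.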