Basic summary functions

This section covers table() and aggregate(), two basic summary functions that ship with every standard R installation: table() lives in the base package and aggregate() in the stats package.

  • table(): This creates a contingency table from the specified vectors. Although its output is of class table, it behaves much like an array:
    # var2 is repeated 5 times so that both columns have 20 rows
    sample.data <- data.frame(var1 = rep(c("Male", "Female"), 10),
                              var2 = rep(c("A", "B", "C", "D"), 5))
    example.table <- table(sample.data$var1, sample.data$var2)
    example.table
    ##         
    ##          A B C D
    ##   Female 0 5 0 5
    ##   Male   5 0 5 0
    example.table[2,2]
    ## [1] 0
    

    Here, example.table[2,2] returns the count in the second row (Male) and the second column (B), which is 0.
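
    As a small sketch of that array-like behaviour, the table can also be indexed by its dimension names rather than by position. The block rebuilds sample.data so that it runs on its own; female.counts is a name introduced here purely for illustration.

    ```r
    # Rebuild the sample data from the example above
    sample.data <- data.frame(var1 = rep(c("Male", "Female"), 10),
                              var2 = rep(c("A", "B", "C", "D"), 5))
    example.table <- table(sample.data$var1, sample.data$var2)

    # Index by dimension names instead of positions
    example.table["Male", "A"]
    ## [1] 5

    # Extract a whole row of counts (hypothetical helper name)
    female.counts <- example.table["Female", ]

    # dim() works as it does for any array
    dim(example.table)
    ## [1] 2 4
    ```

    Indexing by name tends to be more robust than indexing by position, because the row order of a table depends on the alphabetical order of the levels.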

  • aggregate(): This applies a summary function to a vector (or to several vectors) split into groups by one or more factor variables. aggregate() can be called in two ways:
    • With vectors: One or more vectors are passed to the x argument, while one or more factor vectors are passed, wrapped in a list, to the by argument. FUN is the aggregation function to be used:
      > data(iris)
      > aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = "mean")
           Group.1     x
      1     setosa 5.006
      2 versicolor 5.936
      3  virginica 6.588
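
      As a further sketch of the vector interface, several columns can be aggregated in one call by passing a data frame as x, and naming the element of the by list replaces the default Group.1 column name. The list name Species and the object name sepal.means are choices made here for illustration.

      ```r
      # Aggregate two measurement columns at once; naming the list
      # element gives the grouping column a readable name
      sepal.means <- aggregate(iris[, c("Sepal.Length", "Sepal.Width")],
                               by = list(Species = iris$Species),
                               FUN = mean)
      sepal.means
      ##      Species Sepal.Length Sepal.Width
      ## 1     setosa        5.006       3.428
      ## 2 versicolor        5.936       2.770
      ## 3  virginica        6.588       2.974
      ```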
      
    • Through formula objects: Instead of specifying a vector and a by list, this information can be included in a formula object. Additionally, when the variables come from a data.frame, the data frame can be passed in the data argument so that it does not have to be referenced for every variable.

      With one factor, without the data argument:

      > aggregate(iris$Sepal.Length ~ iris$Species, FUN="mean")
      

      With one factor, with the data argument:

      aggregate(Sepal.Length ~ Species, data = iris, FUN = "mean")
      ##      Species Sepal.Length
      ## 1     setosa        5.006
      ## 2 versicolor        5.936
      ## 3  virginica        6.588
      

      With two factors. As iris has no second factor variable, a letter variable is added first:

      iris$letter <- LETTERS[1:5]
      aggregate(Sepal.Length ~ Species + letter, data = iris, FUN = "mean")
      ##       Species letter Sepal.Length
      ## 1      setosa      A         5.16
      ## 2  versicolor      A         5.96
      ## 3   virginica      A         6.94
      ## 4      setosa      B         5.03
      ## 5  versicolor      B         6.11
      ## 6   virginica      B         6.28
      ## 7      setosa      C         4.85
      ## 8  versicolor      C         6.07
      ## 9   virginica      C         6.78
      ## 10     setosa      D         4.95
      ## 11 versicolor      D         5.82
      ## 12  virginica      D         6.44
      ## 13     setosa      E         5.04
      ## 14 versicolor      E         5.72
      ## 15  virginica      E         6.50
      

      Compute the mean of all variables by species. data(iris) is called again to restore the original dataset, so that the dot does not pick up the letter column added above:

      data(iris)
      aggregate(. ~ Species, data = iris, FUN = "mean")
      ##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
      ## 1     setosa        5.006       3.428        1.462       0.246
      ## 2 versicolor        5.936       2.770        4.260       1.326
      ## 3  virginica        6.588       2.974        5.552       2.026
      

    Note

    Formula objects have the form x ~ y where, in aggregate(), x is the vector to which the aggregation function is applied and y is the splitting factor. Several elements on the right-hand side are combined with +. Formula objects are also very common in modeling, where x is the variable to model and y is the predictor. Lastly, if one side should include all variables except those specified on the other side, it can be abbreviated with a dot.
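
    To make the note concrete, the following sketch stores a formula in a variable, inspects it, and then passes it to aggregate(). The names f and by.species are introduced here for illustration only.

    ```r
    # A formula is an ordinary R object that can be stored and reused
    f <- Sepal.Length ~ Species
    class(f)
    ## [1] "formula"

    # all.vars() lists the variable names the formula refers to
    all.vars(f)
    ## [1] "Sepal.Length" "Species"

    # The stored formula is resolved against the data argument
    by.species <- aggregate(f, data = iris, FUN = mean)
    by.species
    ##      Species Sepal.Length
    ## 1     setosa        5.006
    ## 2 versicolor        5.936
    ## 3  virginica        6.588
    ```

    Because the formula only names the variables, the same object could be reused with a different data frame that has columns with the same names.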
