The lapply, vapply, sapply, and apply functions

These functions behave like a for-each loop: a function is applied to every element of a vector, list, or array, and the results are collected. Their main advantage is shorter, more expressive code. They are not necessarily faster than a well-written for loop. Their main structure is:

lapply(X, FUN, ...)

Here, X is the object to iterate on, FUN is the function to apply, and ... stands for any additional arguments, separated by commas, that are passed on to FUN.

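vapply() and apply() take additional arguments that are covered in detail in the expanded explanations of these functions. vapply() needs a template for the return value (FUN.VALUE). apply() needs the margin to iterate over (MARGIN), which comes before the function: apply(X, MARGIN, FUN, ...).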
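The equivalence with a for-each loop can be sketched as follows. This is a minimal illustration using a made-up list x that is not from the text. The explicit loop and lapply() produce the same named list:

```r
# A made-up list to iterate over (illustrative data only)
x <- list(a = 1:10, b = c(2.5, 3.5), c = seq(0, 1, by = 0.25))

# Explicit loop: preallocate a named list and fill it element by element
loop_result <- vector("list", length(x))
names(loop_result) <- names(x)
for (nm in names(x)) {
  loop_result[[nm]] <- mean(x[[nm]])
}

# Same computation with lapply(): FUN is applied to each element of X
lapply_result <- lapply(x, mean)

identical(loop_result, lapply_result)  # TRUE
```

The loop version has to manage preallocation and naming by hand. lapply() handles both automatically.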

The function argument can be an existing function, given by name. Any extra arguments it needs are passed after it, as follows:

# runif() draws random numbers, so your values will differ from those shown
sample.list <- list(a = runif(100, 0, 1), b = runif(500, 0, 100),
                    c = runif(35, 0, 200))
sapply(sample.list, quantile, probs = 0.75)
##       a.75%       b.75%       c.75% 
##   0.7145661  77.4817679 158.9351519
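Any argument after FUN that sapply() does not recognize is forwarded to the applied function through .... The following is a sketch; the set.seed(42) call is an addition for reproducibility. Passing two probabilities to quantile() makes every call return a vector of length 2, so sapply() assembles a matrix instead of a vector:

```r
set.seed(42)  # fixed seed (an assumption) so the random draws are reproducible
sample.list <- list(a = runif(100, 0, 1), b = runif(500, 0, 100),
                    c = runif(35, 0, 200))

# probs is not an argument of sapply(); it is passed through ... to quantile()
q <- sapply(sample.list, quantile, probs = c(0.25, 0.75))

dim(q)       # 2 3: one row per probability, one column per list element
rownames(q)  # "25%" "75%"
colnames(q)  # "a" "b" "c"
```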

The function can also be defined inside the apply call itself. Such functions have no name and are called anonymous functions:

sapply(sample.list, function(x) round(sum(x+2)))
##     a     b     c 
##   250 27235  3867

In this example, the anonymous function adds 2 to every element of the vector, sums the results, and rounds the total. Likewise, the first example could be rewritten with an anonymous function:

> sapply(sample.list, function(x) quantile(x,probs=0.75))
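Both forms give the same result. The following sketch checks this and also shows the \(x) shorthand for function(x). It assumes R 4.1.0 or later, because older versions cannot parse \(x). The seed is added only for reproducibility:

```r
set.seed(1)  # fixed seed (an assumption) so the comparison is reproducible
sample.list <- list(a = runif(100, 0, 1), b = runif(500, 0, 100),
                    c = runif(35, 0, 200))

# Named function; the extra argument is passed through ...
named_form <- sapply(sample.list, quantile, probs = 0.75)

# Anonymous function that calls quantile() itself
anon_form <- sapply(sample.list, function(x) quantile(x, probs = 0.75))

# Shorthand lambda syntax, available from R 4.1.0
short_form <- sapply(sample.list, \(x) quantile(x, probs = 0.75))

identical(named_form, anon_form)   # TRUE
identical(anon_form, short_form)   # TRUE
```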

The following are the differences between each function:

  • lapply(): A vector or list is passed, and the output is always a list with one element per input element.
  • sapply(): A vector or list is passed, and the output is simplified where possible. It becomes a vector if every call returns a single value, a matrix if every call returns a vector of the same length, and a list otherwise.
  • vapply(): A vector or list and a template (FUN.VALUE) describing the type and length of each result are passed. The output follows that template, and R stops with an error if any result does not match it.
  • apply(): An array (typically a matrix) and a margin (1 for rows, 2 for columns) are passed, and the shape of the output depends on what the function returns. A data frame can also be passed instead of a matrix, but it is first coerced to a matrix, which holds values of a single type. This is particularly important if the function to be applied uses columns of different classes.
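The following sketch uses a small made-up list to contrast the output of each function. range() returns two values per element, which is enough to show how sapply() and vapply() shape their results and how vapply() rejects a result that does not fit its template:

```r
# Made-up input (illustrative only)
x <- list(a = c(1.5, 3, 2), b = c(10, 4, 7))

# lapply(): always a list, one element per input element
out_l <- lapply(x, range)
class(out_l)   # "list"

# sapply(): every result has length 2, so they are bound into a 2 x 2 matrix
out_s <- sapply(x, range)
dim(out_s)     # 2 2

# sapply(): results of different lengths cannot be simplified, so a list is returned
out_s2 <- sapply(x, function(v) v[v > 2])
class(out_s2)  # "list"

# vapply(): the template numeric(2) means "a double vector of length 2"
out_v <- vapply(x, range, FUN.VALUE = numeric(2))
all.equal(out_s, out_v)  # TRUE

# vapply(): a template that does not match the results is an error
res <- tryCatch(vapply(x, range, FUN.VALUE = numeric(1)),
                error = function(e) "template mismatch")
res            # "template mismatch"

# apply(): iterate over the rows (MARGIN = 1) of a matrix
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)  # 9 12
```

vapply() is the stricter choice when the shape of each result is known in advance, because a mismatch fails immediately instead of silently producing a list.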

Examples

The following are a few examples:

> data(iris)
> apply(iris,1, function(x) as.numeric(x["Sepal.Width"]) + as.numeric(x["Sepal.Length"]) + 3)
[1] 11.6 10.9 10.9 10.7 11.6 12.3 11.0 11.4 10.3 11.0 12.1 11.2 10.8 10.3 12.8 13.1
[17] 12.3 11.6 12.5 11.9 11.8 11.8 11.2 11.4 11.2 11.0 11.4 11.7 11.6 10.9 10.9 11.8
[33] 12.3 12.7 11.0 11.2 12.0 11.5 10.4 11.5 11.5 9.8 10.6 11.5 11.9 10.8 11.9 10.8
[49] 12.0 11.3 13.2 12.6 13.0 10.8 12.3 11.5 12.6 10.3 12.5 10.9 10.0 11.9 11.2 12.0
[65] 11.5 12.8 11.6 11.5 11.4 11.1 12.1 11.9 11.8 11.9 12.3 12.6 12.6 12.7 11.9 11.3
[81] 10.9 10.9 11.5 11.7 11.4 12.4 12.8 11.6 11.6 11.0 11.1 12.1 11.4 10.3 11.3 11.7
[97] 11.6 12.1 10.6 11.5 12.6 11.5 13.1 12.2 12.5 13.6 10.4 13.2 12.2 13.8 12.7 12.1
[113] 12.8 11.2 11.6 12.6 12.5 14.5 13.3 11.2 13.1 11.4 13.5 12.0 13.0 13.4 12.0 12.1
[129] 12.2 13.2 13.2 14.7 12.2 12.1 11.7 13.7 12.7 12.5 12.0 13.0 12.8 13.0 11.5 13.0
[145] 13.0 12.7 11.8 12.5 12.6 11.9

As mentioned before, apply() converts the data frame to a matrix, and a matrix can hold values of only one type. The number 1 in the MARGIN argument (the direction) means that the function is applied to each row. Because Species is a factor rather than a numeric column, the whole dataset is coerced to a character matrix. This is why both Sepal.Width and Sepal.Length must be converted to numbers with as.numeric() inside the function.
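The coercion can be checked directly. The sketch below confirms that as.matrix(iris) has type character. It also compares the apply() call above with plain vectorized column arithmetic. The vectorized version is an alternative added here for comparison, not the approach used in the text:

```r
data(iris)

# as.matrix() on a data frame with a factor column yields a character matrix
m <- as.matrix(iris)
typeof(m)             # "character"
m[1, "Sepal.Length"]  # a character string, not a number

# Casting inside apply(), as in the text
via_apply <- apply(iris, 1, function(x)
  as.numeric(x["Sepal.Width"]) + as.numeric(x["Sepal.Length"]) + 3)

# Same result with vectorized column arithmetic; no coercion is involved
vectorized <- iris$Sepal.Width + iris$Sepal.Length + 3

all.equal(via_apply, vectorized)  # TRUE
```

For simple element-wise arithmetic on columns, the vectorized form avoids the round trip through character values entirely.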

In the first apply() call on iris, the columns were selected by name. Alternatively, index numbers can also be used:

> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
> apply(iris,1, function(x) as.numeric(x[2]) + as.numeric(x[1]) + 3)

This is what happens if the variables are not transformed to their corresponding classes:

> apply(iris,1, function(x) x[2] + x[1] + 3)
Error in x[2] + x[1] : non-numeric argument to binary operator

R throws an error because, after the coercion, x[2] and x[1] are character strings, and + is not defined for character values.

However, if the data frame contains only numeric columns, it is coerced to a numeric matrix. The following example applies mean() column-wise (MARGIN = 2) without casting any variable inside the function:

apply(iris[,1:4], 2, mean)
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333

Here, the mean of each column of the iris dataset except Species is calculated. Because Species is excluded, the remaining four columns are all numeric, and the data frame is coerced to a numeric matrix. If the complete dataset were passed, it would be coerced to a character matrix. In that case, mean() would return NA for every column, with a warning, instead of a number.
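The behavior with the complete data frame can be sketched as follows. suppressWarnings() only hides the per-column warnings from mean(). colMeans() is shown as a base R alternative for numeric columns and is not part of the original example:

```r
data(iris)

# Whole data frame: every column becomes character, so mean() returns NA
# (with a warning) for each column instead of stopping with an error
all_cols <- suppressWarnings(apply(iris, 2, mean))
all_cols  # all five entries are NA

# colMeans() computes the same column means for the numeric columns
col_means <- colMeans(iris[, 1:4])
all.equal(col_means, apply(iris[, 1:4], 2, mean))  # TRUE
```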
