Functions

A function is an object you can call. Basically, it is a machine with internal logic that takes a group of inputs (parameters or arguments) and returns a value as output.

In the previous sections, we encountered some built-in functions of R. For example, is.numeric() takes an argument that can be any R object and returns a logical value that indicates whether the object is a numeric vector. Similarly, is.function() can tell whether a given R object is a function object.
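As a quick check of the two built-in functions just mentioned, the following calls can be tried at the console. The input values are arbitrary and chosen only for illustration:

```r
is.numeric(c(1, 2, 3))   # a numeric vector
## [1] TRUE
is.numeric("a")          # a character vector, not numeric
## [1] FALSE
is.function(is.numeric)  # is.numeric is itself a function object
## [1] TRUE
```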

In fact, in the R environment, everything we use is an object, everything we do is a function call, and, perhaps to your surprise, all functions are themselves objects. Even <- and + are functions that take two arguments. Although they are called binary operators, they are essentially functions.
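To see that operators really are functions, we can refer to them by name using backticks and call them with ordinary function-call syntax. The following is a small sketch; the variable name z is only an illustration:

```r
# Backticks let us refer to an operator by name and call it like any function
`+`(1, 2)
## [1] 3
is.function(`+`)
## [1] TRUE
is.function(`<-`)
## [1] TRUE

# Assignment written as an ordinary function call (z is an illustrative name)
`<-`(z, 10)
z
## [1] 10
```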

When we do casual, interactive data analysis, we often won't need to write any functions of our own, since the built-in functions and those provided by thousands of packages are usually enough.

However, if you need to repeat your logic or a process in data manipulation or analysis, those functions may not fully serve your purpose because they are not designed to meet the specific needs of a task or the format of a particular dataset. Then, you need to create your own functions targeting a specific set of demands.

Creating a function

It is easy to create a function in R. Suppose we define a function called add that simply adds two numbers, x and y:

add <- function(x, y) {
  x + y
}

The syntax function(x, y) specifies the arguments of the function. In other words, the function takes two arguments named x and y. The { x + y } part is the function body, which contains a series of expressions written in terms of x, y, and any other symbols available. The value of the last expression determines the value returned by the function unless return() is called inside the function. Finally, the function is assigned to add so that we can call it later using that name.
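The difference between the value of the last expression and an explicit return() can be sketched as follows. The function add_or_na is a hypothetical variant introduced only for this illustration, and it uses a simple if condition to trigger the early exit:

```r
# Hypothetical variant of add(): return() exits the function early,
# so the last expression is not always reached.
add_or_na <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    return(NA)  # early exit with an explicit value
  }
  x + y         # otherwise, the value of the last expression is returned
}

add_or_na(2, 3)
## [1] 5
add_or_na("a", 3)
## [1] NA
```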

Creating a function, whether simple or complicated, is no different from creating any other object, such as a vector. A function in R is just another object. To see what object add refers to, just type add at the console:

add
## function(x, y) {
##   x + y
## }

Calling a function

Once the function is defined, we can call it just as we do in math. A call uses the same syntax: name(arg1, arg2, ...). Take a look at the following:

add(2, 3)
## [1] 5

The call is quite transparent. When we evaluate such a call, R first looks for a function named add in the environment. It finds that add refers to the function we just created and creates a local environment in which x takes the value 2 and y takes the value 3. The expression in the function body is then evaluated given the values of the arguments. Finally, the function returns the value of that expression, 5.
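The local environment described above can be observed with a small experiment. In this sketch, a global variable that happens to be named x is left untouched by the call, because the argument x lives only inside the function. The value 100 is arbitrary:

```r
add <- function(x, y) {
  x + y
}

x <- 100   # a global x, unrelated to the argument x
add(2, 3)
## [1] 5
x          # the global x is unchanged by the call
## [1] 100

# Arguments can also be matched by name, in any order
add(y = 3, x = 2)
## [1] 5
```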

Dynamic typing

Functions in R can be very flexible because R is dynamically typed. In other words, the types of the inputs are not fixed before the call. Even if the function is originally designed to work with scalar numbers, it automatically generalizes to work with all vectors as long as + works with them. For example, we can run the following code without any change to the function:

add(c(2, 3), 4)
## [1] 6 7

The preceding example does not really demonstrate the flexibility of dynamic typing because a scalar is also a vector in R. A better example is:

add(as.Date("2014-06-01"), 1)
## [1] "2014-06-02"

The function puts the two arguments into the expression without any type checking. as.Date() creates a Date object, which represents a date. Without any change to the code of add, it works with Date objects perfectly. The function fails only when + is not well-defined for the two arguments:

add(list(a = 1), list(a = 2))
## Error in x + y: non-numeric argument to binary operator

Generalizing a function

A function is a well-defined abstraction of a particular piece of logic or a process intended to solve a particular problem. Developers often want a function to be general enough to adapt to a wide range of use cases, so that it can solve similar problems without a specialized function being written for each one.

Making a function more widely applicable is called generalization. It is very handy to generalize a function in a dynamically typed programming language like R, but it can be error-prone if done incorrectly.

To make add() more general so that it can handle various primitive algebraic operations, we can define another function called calc. This new function accepts three arguments: x and y are the two vectors, and type is a character vector that names the kind of algebraic operation the user wants to perform.

The following code implements such a function using flow control, which we will cover soon, but it should be easy to understand at first glance. In this code, the choice of expression to be evaluated depends on the value of type:

calc <- function(x, y, type) {
  if (type == "add") {
    x + y
  } else if (type == "minus") {
    x - y
  } else if (type == "multiply") {
    x * y
  } else if (type == "divide") {
    x / y
  } else {
    stop("Unknown type of operation")
  }
}

Once the function is defined, we can call it by supplying appropriate arguments:

calc(2, 3, "minus")
## [1] -1

The function automatically works with numeric vectors:

calc(c(2, 5), c(3, 6), "divide")
## [1] 0.6666667 0.8333333

The function is also generalized to work with non-numeric vectors for which + is well-defined:

calc(as.Date("2014-06-01"), 3, "add")
## [1] "2014-06-04"

Consider supplying some invalid arguments:

calc(1, 2, "what")
## Error in calc(1, 2, "what"): Unknown type of operation

In this case, no conditions are satisfied, so the expression in the last else block will be evaluated. The stop() call yields an error message and terminates the whole evaluation immediately.
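As an alternative to the chain of if and else if, the same dispatch can be sketched with R's built-in switch(). This is a different technique from the one used in calc, shown only for comparison, and calc_switch is a hypothetical name:

```r
# calc_switch is a hypothetical name; it mirrors calc for a single string
calc_switch <- function(x, y, type) {
  switch(type,
    add = x + y,
    minus = x - y,
    multiply = x * y,
    divide = x / y,
    stop("Unknown type of operation")  # unnamed last argument is the default
  )
}

calc_switch(2, 3, "minus")
## [1] -1
calc_switch(c(2, 5), c(3, 6), "divide")
## [1] 0.6666667 0.8333333
```

Only the branch that matches type is evaluated, so stop() is reached only when none of the names match.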

The function seems to work fine and to handle every kind of invalid argument. However, that is not true:

calc(1, 2, c("add", "minue"))
## Warning in if (type == "add") {: the condition has length > 1 and only the
## first element will be used
## [1] 3

Here, we didn't consider the case where type is given as a multi-element vector. The problem is that when such a vector is compared with another vector, the result is also a multi-element logical vector, which makes an ambiguous condition for if. What would if (c(TRUE, FALSE)) mean? Note that from R 4.2.0 onward, a condition of length greater than one in if is an error rather than a warning.
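The ambiguity can be seen by evaluating the comparison on its own, outside of if. This small sketch reuses the same two-element vector as the call above:

```r
type <- c("add", "minue")

# The comparison is element-wise, so the result has two elements
type == "add"
## [1]  TRUE FALSE
length(type == "add")
## [1] 2
```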

To avoid such ambiguity, we need to refine the function so that the error is more informative and transparent. To do so, we just need to check whether type has more than one element:

calc <- function(x, y, type) {
  if (length(type) > 1L) stop("Only a single type is accepted")
  if (type == "add") {
    x + y
  } else if (type == "minus") {
    x - y
  } else if (type == "multiply") {
    x * y
  } else if (type == "divide") {
    x / y
  } else {
    stop("Unknown type of operation")
  }
}

Then, we retry the problematic call and see how the exception is handled by pre-checking the arguments:

calc(1, 2, c("add", "minue"))
## Error in calc(1, 2, c("add", "minue")): Only a single type is accepted
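The length check above still lets through other problematic inputs, such as a zero-length vector, a non-character value, or NA. A stricter version might look like the following sketch. The name calc_checked and the wording of its error message are assumptions made for this illustration:

```r
# calc_checked is a hypothetical name used only for this sketch
calc_checked <- function(x, y, type) {
  # Require exactly one non-missing character string; || short-circuits,
  # so is.na() is only reached when type has length 1
  if (!is.character(type) || length(type) != 1L || is.na(type)) {
    stop("type must be a single character string")
  }
  if (type == "add") {
    x + y
  } else if (type == "minus") {
    x - y
  } else if (type == "multiply") {
    x * y
  } else if (type == "divide") {
    x / y
  } else {
    stop("Unknown type of operation")
  }
}

calc_checked(1, 2, "add")
## [1] 3
```

With this check, calls such as calc_checked(1, 2, character(0)) or calc_checked(1, 2, NA_character_) stop with the same informative message instead of failing inside if.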

Default value for function arguments

Some functions are very flexible because they accept a wide range of input and meet a variety of demands. In many cases, more flexibility means an increasing number of arguments.

If we have to specify dozens of arguments each time we use a very flexible function, the code will certainly be messy to read. In this case, reasonable default values for arguments greatly simplify the code that calls the function.

To set the default value of an argument, use arg = value. This will make the argument optional. The following example creates a function with an optional argument:

increase <- function(x, y = 1) {
  x + y
}

The new function increase() allows us to call it with only x. In this case, y automatically takes the value 1 unless it is explicitly specified.

increase(1)
## [1] 2
increase(c(1, 2, 3))
## [1] 2 3 4
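The default is used only when y is omitted. Supplying y, either by position or by name, overrides it, as this short sketch with arbitrary values shows:

```r
increase <- function(x, y = 1) {
  x + y
}

increase(1, 2)                 # y given by position
## [1] 3
increase(c(1, 2, 3), y = 10)   # y given by name
## [1] 11 12 13
```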

Many R functions have multiple arguments, some of which are given default values. Sometimes, it is tricky to choose the default values of arguments because a good default depends heavily on what most users intend.
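To find out which arguments of a function have defaults, one option is the built-in formals(), which returns the argument list of a function. The following sketch applies it to increase():

```r
increase <- function(x, y = 1) {
  x + y
}

# Names of all arguments
names(formals(increase))
## [1] "x" "y"

# Default value of y
formals(increase)$y
## [1] 1
```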
