Hour 7. Writing Functions: Part I


What You’ll Learn in This Hour:

• How to write and use a simple R function

• How to return objects from a function

• How to control flow through a function


So far in this book you have seen many functions being used. For example, in the earlier hour on single-mode data structures you saw that you could create vectors using functions such as c, seq, and rep. One of the strengths of R is that you can extend it by writing your own functions. This allows you to create utilities that can perform a variety of tasks. In this hour, we look at ways to create our own functions, specify inputs, and return results to the user. We also introduce the “if/else” structure in R, and we use this to control the flow of code within a function.

The Motivation for Functions

You have seen that functions in R allow you to perform a number of tasks with a single command. This approach has parallels in most programming languages, such as “macros” in Visual Basic and SAS.

Creating your own functions is a powerful aspect of R that allows you to “wrap up” a series of steps into a simple container. This way, you can capture common workflows and utilities and call them when needed instead of producing long, verbose scripts of repeated code snippets that can be difficult to manage.

A Closer Look at an R Function

Before we write our own functions, let’s take a closer look at the structure of an existing R function. Consider, for example, the upper.tri function, which allows us to identify values in the upper triangle of a matrix:

> myMat <- matrix(c(1, 1, 5, 6, 3, 4, 3, 8, 1), nrow = 3)  # A sample matrix
> myMat
     [,1] [,2] [,3]
[1,]    1    6    3
[2,]    1    3    8
[3,]    5    4    1
> upper.tri(myMat)            # Upper triangle
      [,1]  [,2]  [,3]
[1,] FALSE  TRUE  TRUE
[2,] FALSE FALSE  TRUE
[3,] FALSE FALSE FALSE
> myMat [ upper.tri(myMat) ]  # Values from upper triangle
[1] 6 3 8

As seen here, we can call the upper.tri function using round brackets, specifying the matrix as the first input. If, instead, we type the function name without brackets, R prints the contents of the function (the exact code shown may differ between versions of R):

> upper.tri        # Print the upper.tri function
function (x, diag = FALSE)
{
    x <- as.matrix(x)
    if (diag) row(x) <= col(x)
    else row(x) < col(x)
}

The function is split into two parts:

• The top part defines the inputs to the function (in this case, the inputs are x and diag).

• The next part, captured within curly brackets, contains the main “body” of the function.

In a similar way, we can create our own functions by specifying a function name, defining the function inputs, and specifying the actions we wish to take in the function body.
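The two parts of a function can also be pulled out separately, using the base R functions formals (for the inputs) and body (for the function body). Here is a short sketch applied to upper.tri; the body you see printed may differ depending on your version of R:

```r
# Extract the two parts of an existing function
inputs <- formals(upper.tri)  # the argument list, including any defaults
names(inputs)                 # the argument names: "x" "diag"
inputs$diag                   # the default value of diag: FALSE
body(upper.tri)               # the code between the curly brackets
```

The same two functions work on any function you write yourself, which can be handy for checking which defaults a function has.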

Creating a Simple Function

We can create a simple function in R using the function keyword. The curly brackets are used to contain the body of the function. In this simple example, we create a function that accepts a single input:

> addOne <- function(x) {
+   x + 1
+ }

Our new addOne function adds 1 to any input object. Once we’ve created a function, we can call that function in the usual way:

> addOne(x = 1:5)   # Call the addOne function
[1] 2 3 4 5 6


Tip: Saving Outputs

Here, we see the values 2 to 6 returned from a function. If we want to save the output from a function for later use, we need to assign the output from the function to an object, as shown here:

> result <- addOne(1:5)
> result
[1] 2 3 4 5 6


The function created is itself an R object. As such, it exists in the R Workspace, and can be managed and reused in future sessions if you save your Workspace objects, as discussed in Hour 2, “The R Environment.”

The body of our simple addOne function contains only one line of code. If the function body contains only a single line of code, we can omit the curly brackets, as follows:

> addOne <- function(x) x + 1
> addOne(x = 1:5)     # Call the addOne function
[1] 2 3 4 5 6


Note: Named Arguments

As you saw in Hour 6, “Common R Utility Functions,” there are many ways to call functions and define arguments. In the preceding example, addOne(x = 1:5) is equivalent to addOne(1:5). In this hour, we will often name arguments when calling functions to aid clarity, but common convention in R is to leave the first argument (or arguments) unnamed.



Caution: Continuation Prompts

In many of our examples, we see the familiar command prompt (>) on the first line of the function, with plus (+) symbols prefixing the following lines. These signify the “continuation” prompt in R and are not part of the code itself (in other words, you should not type these symbols when creating your functions).



Tip: Using the Script Window

Functions typically contain more than one line of code, so the script window (in RStudio or another interface) is preferred to the console window when developing them.


Naming a Function

A function is an R object, so it can be named like any other R object. Hence, its name

• Can be of any length

• Can contain any combination of letters, numbers, underscores, and period characters

• Cannot start with a number or an underscore (and, if it starts with a period, the period cannot be followed by a number)
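One way to check candidate names against these rules is the base function make.names, which returns a syntactically valid version of each name it is given (prepending an “X” where a name starts with an invalid character). The candidate names below are only illustrations:

```r
# make.names() converts candidate names into syntactically valid R names,
# so any name that comes back changed breaks the naming rules
candidates <- c("addOne", "add_one", "add.one", "2ndTry", "_private")
make.names(candidates)
# "addOne" "add_one" "add.one" "X2ndTry" "X_private"
```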

One thing to note, however, is that creating a function can cause existing functions to be “masked.” Consider the following example:

> X <- 1:5                            # Create a vector
> median(X)                           # The median of the vector is 3
[1] 3
> find("median")                      # Where is the "median" function?
[1] "package:stats"

> median <- function(input) "Hello"   # Create a new "median" function
> median(X)                           # The median of the vector is "Hello"
[1] "Hello"
> find("median")                      # Where is the "median" function?
[1] ".GlobalEnv"    "package:stats"

> rm(median)                          # Remove the new "median" function from the workspace
> median(X)                           # The median of the vector is 3
[1] 3

Here we have created a new median function in the R Workspace, thus “masking” the original median function, which still exists in the stats package. As such, care should be taken when naming functions to ensure you don’t “mask” existing key functions.
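While a function is masked, the original version can still be reached by prefixing its name with the package name and the :: operator. A minimal sketch, repeating the masking example above:

```r
X <- 1:5
median <- function(input) "Hello"   # mask stats::median in the workspace
median(X)                           # uses the workspace version: "Hello"
stats::median(X)                    # the :: operator reaches the original: 3
rm(median)                          # remove the masking function again
median(X)                           # back to the stats version: 3
```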

Defining Function Arguments

In the previous section, we created a very simple function called addOne, defined as follows:

> addOne <- function(x) {
+   x + 1
+ }

Note that this function takes a single argument, x. If we wanted to extend this example, we could add a second argument:

> addNumber <- function(x, number) {
+   x + number
+ }
> addNumber(x = 1:5, number = 2)
[1] 3 4 5 6 7

Our new function (addNumber) now accepts two arguments (x and number) and adds these values together. Note, however, that these are both required arguments because they do not have default values. As such, calling the function without both arguments defined will result in an error:

> addNumber()                     # Calling with no arguments
Error in addNumber() : argument "x" is missing, with no default

> addNumber(x = 1:5)              # Calling with only the "x" argument
Error in addNumber(x = 1:5) : argument "number" is missing, with no default

> addNumber(number = 2)           # Calling with only the "number" argument
Error in addNumber(number = 2) : argument "x" is missing, with no default

> addNumber(x = 1:5, number = 2)  # Calling with both arguments
[1] 3 4 5 6 7

If we want to assign default values for arguments to a function, we can specify them directly in the argument definition, as follows:

> addNumber <- function(x, number = 0) {
+   x + number
+ }
> addNumber(x = 1:5)               # Call function with default (number = 0)
[1] 1 2 3 4 5
> addNumber(x = 1:5, number = 1)   # Call function with number = 1
[1] 2 3 4 5 6

Function Scoping Rules

When we define a function, we can create objects within the function body. This may help to simplify functions or make them generally more readable. For example, we may create an object to be returned:

> addNumber <- function(x, number = 0) {
+   theAnswer <- x + number    # Create "theAnswer" by adding "x" and "number"
+   theAnswer                  # Return the value
+ }

If we call the function, note that the theAnswer object is not accessible once the function has been executed:

> output <- addNumber(x = 1:5, number = 1)     # Call the function, creating the "output" object
> output                                       # Look at value of "output"
[1] 2 3 4 5 6

> theAnswer                                    # "theAnswer" object does not exist
Error: object 'theAnswer' not found

When we run a function, R loads the argument inputs and any objects created in the function into a separate, temporary area of memory (a memory “frame”). Once the execution of the function is complete, the output is returned and the temporary area of memory is closed. As such, objects created within a function call should be considered “local” to that function, so any required outputs must be explicitly returned from the function.
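Because each call works in its own frame, a local object can even share its name with an object in the workspace without changing it. A minimal sketch, reusing the addNumber function from above; the workspace object named theAnswer is created purely for illustration:

```r
theAnswer <- "global value"          # an object in the workspace

addNumber <- function(x, number = 0) {
  theAnswer <- x + number            # a separate, local "theAnswer"
  theAnswer                          # return the local value
}

output <- addNumber(x = 1:5, number = 1)
output                               # 2 3 4 5 6
theAnswer                            # still "global value"
```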

Return Objects

In the preceding example, you saw an object created within the function body. Let’s extend that example to include the creation of more “local” objects. In this example, we create a function called plusAndMinus, which creates two “local” objects (called PLUS and MINUS) and attempts to return both of them:

> plusAndMinus <- function(x, y) {
+   PLUS <- x + y                 # Define "PLUS"
+   MINUS <- x - y                # Define "MINUS"
+   PLUS                          # Return "PLUS"
+   MINUS                         # Return "MINUS"
+ }
> plusAndMinus(x = 1:5, y = 1:5)  # Call function
[1] 0 0 0 0 0

As you can see, only the last object (the MINUS object) is returned from the function. The PLUS object value is not returned and, as discussed earlier, is only a local object, so the value cannot be retrieved.

An R function returns a single object: the value of the last expression evaluated in the function body. This can be confirmed by swapping the order of the PLUS and MINUS return objects:

> plusAndMinus <- function(x, y) {
+   PLUS <- x + y                 # Define "PLUS"
+   MINUS <- x - y                # Define "MINUS"
+   MINUS                         # Return "MINUS"
+   PLUS                          # Return "PLUS"
+ }
> plusAndMinus(x = 1:5, y = 1:5)  # Call function
[1]  2  4  6  8 10

If we want to return more than one value from a function (for example, the PLUS and MINUS objects), we need to combine them into a single object. First, let’s return the two values in a list:

> plusAndMinus <- function(x, y) {
+   PLUS <- x + y                 # Define "PLUS"
+   MINUS <- x - y                # Define "MINUS"
+   list(PLUS, MINUS)             # Return "PLUS" and "MINUS" in a list
+ }
> plusAndMinus(x = 1:5, y = 1:5)  # Call function
[[1]]
[1]  2  4  6  8 10

[[2]]
[1] 0 0 0 0 0

This returns a single object, a list, containing the two values. When we return a list in this way, we should name the elements so we can more easily reference the values later:

> plusAndMinus <- function(x, y) {
+   PLUS <- x + y                    # Define "PLUS"
+   MINUS <- x - y                   # Define "MINUS"
+   list(plus = PLUS, minus = MINUS) # Return "PLUS" and "MINUS" in a list
+ }
> output <- plusAndMinus(x = 1:5, y = 1:5)  # Call function, saving the output
> output                                    # Print the output
$plus
[1]  2  4  6  8 10

$minus
[1] 0 0 0 0 0

> output$plus                               # Print the "plus" element
[1]  2  4  6  8 10

The list object is an appropriate structure in this example, because we are returning multiple vectors. However, we may be returning a number of single values from a function, in which case a vector may be more suitable. Consider the following example, where we return some summary statistics as a vector:

> summaryFun <- function(vec, digits = 3) {
+
+   # Create some summary statistics
+   theMean <- mean(vec)
+   theMedian <- median(vec)
+   theMin <- min(vec)
+   theMax <- max(vec)
+
+   # Combine them into a single vector and round the values
+   output <- c(Mean = theMean, Median = theMedian, Min = theMin, Max = theMax)
+   round(output, digits = digits)
+ }
>
> X <- rnorm(50)   # Generate 50 samples from a normal distribution
> summaryFun(X)    # Produce summaries of the vector
  Mean Median    Min    Max
-0.214 -0.051 -2.633  1.764


Note: Checking Function Inputs

For the preceding functions, we frequently make assumptions about the structure of the inputs. For example, in the summaryFun function we assume the vec input is a numeric object (otherwise functions such as mean make no sense). Later, in Hour 8, “Writing Functions: Part II,” we will cover ways of checking function inputs. This includes functions for checking the structure of inputs and for producing error or warning messages when those inputs are not appropriate for the function.


The If/Else Structure

In the function examples you’ve seen so far in this hour, the “flow” through the body of the function has been completely linear and sequential. However, we may alternatively wish to control the flow based on decisions using an “if/else” statement.


Note: What Do We Mean by “If/Else”?

If you are not familiar with programming, the if/else statement is a common structure, where code is executed, or not, based on certain decisions. Consider this pseudo-code example:

IF I have enough money, I will buy a can of soda and a candy bar
ELSE I will just buy the can of soda

Often, we will only need an “IF” statement. Note that because either option in this example involves buying a can of soda, we can rewrite without the “ELSE” statement:

Buy the can of soda
IF I have enough money, I will also buy a candy bar

We can also have nested statements, such as this:

IF I have enough money, I will buy a can of soda and a candy bar
ELSE {
   IF they have my favorite type of candy bar I will just buy that
   ELSE I will just buy the can of soda
}

We can use a similar structure within our code to control the flow of the function based on specific choices.


The basic structure of an if/else statement in R is as follows:

if (something is TRUE) {
  do this
}
else {
  do this instead
}

As with functions, we use curly brackets to contain a body of code. However, if these are simple one-line statements, we may omit the curly brackets, as follows:

if (something is TRUE) do this
else do this instead

The “test” that is performed within the if statement (marked as “something is TRUE” here) is called the “condition,” and should take the form of a single TRUE or FALSE value.

A Simple R Example

Let’s look at a simple example of this in action. Here, we create a function that uses the cat function to print a message to the screen, depending on whether the number passed to it is positive or negative:

> posOrNeg <- function(X) {
+   if (X > 0) {
+     cat("X is Positive")
+   }
+   else {
+     cat("X is Negative")
+   }
+ }
> posOrNeg(1)    # is 1 positive or negative?
X is Positive
> posOrNeg(-1)   # is -1 positive or negative?
X is Negative
> posOrNeg(0)    # is 0 positive or negative?
X is Negative


Note: If/Else in a Script

Note that the above example of if/else is contained within a function. If, instead, the if/else code was run interactively or as part of a script, R would treat the if part as a complete command as soon as it reached the closing curly bracket, and would fail when it encountered the else statement:

> X <- 1
> if (X > 0) {
+   cat("X is Positive")
+ }
X is Positive
> else {
Error: unexpected 'else' in "else"
>   cat("X is Negative")
X is Negative
> }
Error: unexpected '}' in "}"

To guard against this issue, we can rewrite the command, positioning the else statement immediately after the closing curly bracket of the if component, as follows:

> X <- 1
> if (X > 0) {
+   cat("X is Positive")
+ } else {      # NOTE: "else" on same line as closing } of "if"
+   cat("X is Negative")
+ }
X is Positive


Nested Statements

In this example, positive and negative numbers are handled correctly, and the function returns the right message. However, when we pass the function a 0, it is reported as negative, which isn’t true (by the usual definition, 0 is neither positive nor negative).

We can improve our example by using a nested if/else statement:

> posOrNeg <- function(X) {
+   if (X > 0) {
+     cat("X is Positive")
+   }
+   else {
+     if (X == 0) cat("X is Zero")
+     else cat("X is Negative")
+   }
+ }
> posOrNeg(1)    # is 1 positive or negative?
X is Positive
> posOrNeg(0)    # is 0 positive or negative?
X is Zero
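The same nested logic is often written as an “else if” chain, which avoids an extra level of curly brackets. Here is a sketch of an equivalent posOrNeg function; placing each else on the same line as the preceding closing bracket also keeps the code safe to run outside a function:

```r
posOrNeg <- function(X) {
  if (X > 0) {
    cat("X is Positive")
  } else if (X == 0) {        # the second test sits directly after "else"
    cat("X is Zero")
  } else {
    cat("X is Negative")
  }
}
posOrNeg(0)    # X is Zero
posOrNeg(-1)   # X is Negative
```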

Using One Condition

Consider the following example:

> posOrNeg <- function(X) {
+   if (X > 0) {
+     cat("X is Positive")
+   }
+   else {
+     cat("")
+   }
+ }
> posOrNeg(1)    # is 1 positive or negative?
X is Positive
> posOrNeg(0)    # is 0 positive or negative?

In this example, the “else” part of the statement does nothing, so we can drop it and simplify as follows:

> posOrNeg <- function(X) {
+   if (X > 0) {
+     cat("X is Positive")
+   }
+ }
> posOrNeg(1)    # is 1 positive or negative?
X is Positive
> posOrNeg(0)    # is 0 positive or negative?

Multiple Test Values

In the preceding example, the posOrNeg function accepts an input called X and the condition is X > 0. Running this condition outside the if/else statement shows that it returns a single logical value:

> X <- 1  # Set X to 1
> X > 0   # Is X greater than 0?
[1] TRUE

> X <- 0  # Set X to 0
> X > 0   # Is X greater than 0?
[1] FALSE

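If we instead provide a vector of values to this function, versions of R before 4.2.0 produce the following warning message (from R 4.2.0 onward, the same call stops with the error “the condition has length > 1”):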

> posOrNeg <- function(X) {
+   if (X > 0) cat("X is Positive")
+   else cat("X is Negative")
+ }
> posOrNeg(-2:2)    # are the values -2:2 positive or negative?
X is Negative
Warning message:
In if (X > 0) cat("X is Positive") else cat("X is Negative") :
  the condition has length > 1 and only the first element will be used

In this case, when running the condition outside the if/else statement, we can see that the result is a vector of logicals:

> X <- -2:2  # Set X to -2:2
> X > 0      # Is X greater than 0?
[1] FALSE FALSE FALSE  TRUE  TRUE

The if/else structure is looking for a single “choice” (that is, should it run the first “if” section of code or the second “else” section of code?). In this example, the condition has returned five “answers” (FALSE FALSE FALSE TRUE TRUE).

In versions of R before 4.2.0, this mismatch is handled by using only the first “answer” (as per the warning message). This is FALSE, hence the result (“X is Negative”).

Summarizing to a Single Logical

In the last example, you saw that the condition should be a single TRUE or FALSE value. You also saw that warnings (or, in recent versions of R, errors) and unexpected behaviors can occur if multiple logical values are generated from the condition.

One way of handling this is to use the all and any functions to collapse a vector of logicals into a single TRUE or FALSE value:

> X <- -2:2  # Set X to -2:2
> X > 0      # Is X greater than 0?
[1] FALSE FALSE FALSE  TRUE  TRUE
> all(X > 0) # Are all values of X greater than 0?
[1] FALSE
> any(X > 0) # Are any values of X greater than 0?
[1] TRUE

We can use these functions directly in the condition as follows:

> posOrNeg <- function(X) {
+   if (all(X > 0)) cat("All values of X are > 0")
+   else {
+     if (any(X > 0)) cat("At least 1 value of X is > 0")
+     else cat("No values are > 0")
+   }
+ }
> posOrNeg(-2:2)
At least 1 value of X is > 0
> posOrNeg(1:5)
All values of X are > 0
> posOrNeg(-(1:5))
No values are > 0

Switching with Logical Input

Sometimes we may want the person calling the function to choose the flow of the function. In this case, we can provide a logical argument that the function passes directly to the condition in the if/else statement:

> logVector <- function(vec, logIt = FALSE) {
+   if (logIt == TRUE) vec <- log(vec)
+   else vec <- vec
+   vec
+ }
> logVector(1:5)
[1] 1 2 3 4 5
> logVector(1:5, logIt = TRUE) # Call the function with logIt = TRUE
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379

Again, the “else” portion of this statement changes nothing, so we can drop it:

> logVector <- function(vec, logIt = FALSE) {
+   if (logIt == TRUE) vec <- log(vec)
+   vec
+ }
> logVector(1:5)
[1] 1 2 3 4 5
> logVector(1:5, logIt = TRUE) # Call the function with logIt = TRUE
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379

There is one more simplification we can make. Consider the possible outcomes from the condition:

• If logIt is TRUE, then logIt == TRUE will be TRUE.

• If logIt is FALSE, then logIt == TRUE will be FALSE.

So, regardless of the result, logIt == TRUE will always return the same value as logIt. Therefore, we can simplify the condition as follows:

> logVector <- function(vec, logIt = FALSE) {
+   if (logIt) vec <- log(vec)
+   vec
+ }
> logVector(1:5)
[1] 1 2 3 4 5
> logVector(1:5, logIt = TRUE)  # Call the function with logIt = TRUE
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379

Reversing Logical Values

Using all and any, we can summarize logical vectors as follows:

> X <- -2:2  # Set X to -2:2
> X > 0      # Is X greater than 0?
[1] FALSE FALSE FALSE  TRUE  TRUE
> all(X > 0) # Are all values of X greater than 0?
[1] FALSE
> any(X > 0) # Are any values of X greater than 0?
[1] TRUE

We can introduce the ! notation before any logical statement to convert TRUE values to FALSE values and FALSE values to TRUE values. This can be seen here:

> X <- -2:2  # Set X to -2:2
> X > 0      # Is X greater than 0?
[1] FALSE FALSE FALSE  TRUE  TRUE
> !(X > 0)   # Reverse logical values
[1]  TRUE  TRUE  TRUE FALSE FALSE

We can also use the ! notation before the all and any functions to reverse the meanings of the conditions as follows:

> posOrNeg <- function(X) {
+   if (all(X > 0)) cat("\nAll values of X are greater than 0")
+   if (!all(X > 0)) cat("\nNot all values of X are greater than 0")
+   if (any(X > 0)) cat("\nAt least 1 value of X is greater than 0")
+   if (!any(X > 0)) cat("\nNo values of X are greater than 0")
+ }
> posOrNeg(1:5)       # All > 0

All values of X are greater than 0
At least 1 value of X is greater than 0
> posOrNeg(-2:2)      # Some > 0, Some <= 0

Not all values of X are greater than 0
At least 1 value of X is greater than 0
> posOrNeg(-(1:5))    # All <= 0

Not all values of X are greater than 0
No values of X are greater than 0


Note: New Line Characters

Note the use of the \n character in the call to cat in the preceding example. The \n character specifies that a new line is written, which is why each statement printed is on a separate line. This can be further seen in this example:

> cat("Hello\nthere")
Hello
there


Mixing Conditions

In all our examples so far, there has been a single condition. If we have more than one condition, we can use the & or | notation to combine conditions. Here is a rather contrived example to show the use of these operators:

> betweenValues <- function(X, Min = 1, Max = 10) {
+   if (X >= Min & X <= Max) cat(paste("X is between", Min, "and", Max))
+   if (X < Min | X > Max) cat(paste("X is NOT between", Min, "and", Max))
+ }
> betweenValues(5)
X is between 1 and 10
> betweenValues(5, Min = -2, Max = 2)
X is NOT between -2 and 2

We may also mix conditions that come from different sources. Consider the following example that mixes a condition passed from the user with one derived within the function:

> logVector <- function(vec, logIt = FALSE) {
+   if (all(vec > 0) & logIt) vec <- log(vec)
+   vec
+ }
> logVector(1:5, logIt = TRUE)  # Logs the data
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
> logVector(-5:5, logIt = TRUE) # Doesn't log the data because first condition not met
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

Control And/Or Statements

When multiple conditions are combined with the & and | operators, each condition is evaluated separately, and the results are then combined. To illustrate this, consider the following example:

> logVector <- function(vec) {
+   if (all(vec > 0) & all(log(vec) <= 2)) cat("Numbers in range")
+   else cat("Numbers not in range")
+ }
> logVector(1:10)    # Some logged values are greater than 2
Numbers not in range
> logVector(1:5)     # All values are in range
Numbers in range

Let’s consider the way in which the condition from the last call is evaluated:

• The all(vec > 0) statement is evaluated, resulting in a TRUE value.

• The all(log(vec) <= 2) statement is evaluated, also resulting in a TRUE value.

• The results of the two statements are combined: TRUE & TRUE = TRUE.

Now consider the following example:

> logVector(-2:2)
Numbers not in range
Warning message:
In log(vec) : NaNs produced

In this example, we see a printed result (“Numbers not in range”) and also a warning message. This message occurs because both conditions are evaluated and combined. The first condition returns a FALSE value, but evaluating the second condition generates a warning message because the function attempts to calculate logs of negative numbers (which are undefined for real numbers, so R returns NaN).

To remedy these issues, we can use the “control” versions of the & and | operators. These change the flow so that the second condition is evaluated only when the first condition cannot settle the result on its own: for &&, only when the first condition is TRUE; for ||, only when the first condition is FALSE. To use the “control” and/or statement, we use double notation (&& or ||). Note that && and || are designed for single TRUE or FALSE values (since R 4.3.0, giving them a longer vector is an error), so each side should first be collapsed to a single value, as all does here. Let’s update our logVector function with “control” notation:

> logVector <- function(vec) {
+   if (all(vec > 0) && all(log(vec) <= 2)) cat("Numbers in range")
+   else cat("Numbers not in range")
+ }
> logVector(-2:2)
Numbers not in range

You can see that the earlier warning message has been avoided because we specified a “control and” using the && notation. Now, the flow of the condition is as follows:

• The all(vec > 0) statement is evaluated, resulting in a FALSE value.

• Because the first condition is FALSE, the whole statement must be FALSE, so a FALSE value is returned without evaluating the second condition.
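This short-circuit behavior can be checked directly by placing a call to stop, which raises an error whenever it is evaluated, on the right-hand side of each operator. A small sketch:

```r
# stop() raises an error if it is ever evaluated, so it reveals
# whether R looked at the second condition at all
FALSE && stop("second condition evaluated")   # FALSE: stop() never runs
TRUE || stop("second condition evaluated")    # TRUE: stop() never runs

# With the single & operator, both sides are always evaluated
result <- tryCatch(FALSE & stop("second condition evaluated"),
                   error = function(e) "error raised")
result                                        # "error raised"
```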

Returning Early

Earlier in this hour, in the “Return Objects” section, you saw that the last evaluated line of code within a function generates the return value. Consider this example:

> verboseFunction <- function(X) {
+   if (all(X > 0)) output <- X   # if all values of X > 0, set output to X
+   else {
+     X [ X <= 0 ] <- 0.1         # Set all values <=0 to 0.1
+     output <- log(X)            # Take logs of the X input data, set as output
+   }
+   output                        # Return the value of output
+ }
> verboseFunction(-2:2)           # Call our function
[1] -2.3025851 -2.3025851 -2.3025851  0.0000000  0.6931472

If all the values of X are greater than 0, we set the output to X. At this point in the function (that is, the first line of the body of the function), we already know the value we want to return. If we wish to return the result early, we can force this to happen using the return function. This way, we can rewrite our function as follows:

> verboseFunction <- function(X) {
+   if (all(X > 0)) return(X)     # Return early if all values of X are > 0
+
+   # Carry on if not returned already
+   X [ X <= 0 ] <- 0.1           # Set all values <=0 to 0.1
+   log(X)                        # Return the logged X values
+ }
> verboseFunction(-2:2)
[1] -2.3025851 -2.3025851 -2.3025851  0.0000000  0.6931472

This provides a clear, readable behavior where results are returned earlier in the function when certain conditions are met.

A Worked Example

So far in this hour, all our examples have been very simple (and, often, rather useless). This has been done to ensure we focus on the basic syntax of R functions, but at this point it is worth exploring a more complete and useful worked example to see the various components discussed in this hour in action.

The following function summarizes a numeric object, calculating a variety of statistics:

> summaryFun <- function(vec, digits = 3) {
+   N <- length(vec)                     # Calculate the number of values in "vec"
+   if (N == 0) return(NULL)             # Return NULL if "vec" is empty
+
+   testMissing <- is.na(vec)            # Look for missing values
+   if (all(testMissing)) {
+     output <- c(N = N, nMissing = N, pMissing = 100)
+     return(output)                     # Return simple summary if all values missing
+   }
+
+   nMiss <- sum(testMissing)            # Calculate the number of missing values
+   pMiss <- 100 * nMiss / N             # Calculate the percentage of missing values
+   vec <- vec [ !testMissing ]          # Remove missing values from the vector
+   someStats <- c(Mean = mean(vec), Median = median(vec), SD = sd(vec),
+       Min = min(vec), Max = max(vec))  # Calculate a number of statistics
+
+   output <- c(someStats, N = N, nMissing = nMiss, pMissing = pMiss)
+   round(output, digits = digits)
+ }

> summaryFun(c())                        # Empty Vector
NULL
> summaryFun(rep(NA, 10))                # Vector of missing values
       N nMissing pMissing
      10       10      100
> summaryFun(1:10)                       # Basic numeric vector
    Mean   Median       SD      Min      Max        N nMissing pMissing
   5.500    5.500    3.028    1.000   10.000   10.000    0.000    0.000
> summaryFun(airquality$Ozone)           # Vector containing missings
    Mean   Median       SD      Min      Max        N nMissing pMissing
  42.129   31.500   32.988    1.000  168.000  153.000   37.000   24.183

Summary

In this hour, we have covered the basic structure of an R function, and you have seen how to create simple functions of your own. In particular, you saw how to specify the function inputs, define what your functions “do” with those inputs, and how results are returned from your functions. Beyond this, we covered the if/else structure, which allows you to control the overall flow through a function.

In the next hour, we will use the skills you learned here to create more complex functions, including the use of error messaging and the checking of function inputs.

Q&A

Q. Is there a convention for naming functions in R?

A. During the history of R, a number of naming conventions have come and gone. The current convention (which I’ve followed in this hour) is to use camel-case starting with a lower case letter (e.g. myFunction). However, there are no specific rules as to how functions should be named.

Q. How do I load and share my functions?

A. Functions are R objects so, when created, they exist in the workspace of the current session. If you save that workspace and restart in the same working directory, your function (and other) objects should still exist. If you want to share your functions with other users, or reuse them in other projects, you can do the following:

• Save the function definitions as scripts, then open and re-execute them in other sessions.

• Save your functions together in your own “package,” which can be shared and loaded into R (you’ll see how to do this in Hour 19, “Package Building”).
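For the script approach, the base function source runs every command in a script file, recreating any functions it defines. A minimal sketch; the temporary file stands in for a script you would normally keep in your project folder (the name myFunctions.R in the comment is hypothetical):

```r
# Write a function definition to a script file (a temporary file here,
# standing in for a script such as "myFunctions.R" in your project)
scriptFile <- tempfile(fileext = ".R")
writeLines(c("addOne <- function(x) {",
             "  x + 1",
             "}"), scriptFile)

if (exists("addOne", inherits = FALSE)) rm(addOne)  # start without addOne
source(scriptFile)                    # re-execute the script, recreating addOne
addOne(1:5)                           # 2 3 4 5 6
```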

Q. Can I “globally assign” local objects so they can be seen later?

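A. Yes, this can be achieved with the assign function (specifying the global environment, for example with envir = globalenv()) or, for functions created in the workspace, with the <<- operator. However, this practice is discouraged, and we recommend that any required results are passed back to the user in the manner discussed in this hour.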

Q. What is the difference between the cat and print functions?

A. In this hour, we make heavy use of the cat function to demonstrate the flow of a function when using if/else statements. The cat function simply writes out the value of an object as text, without any information about its structure. The print function instead displays the object as R represents it, including the element index ([1]) and, for character values, the surrounding quotes. This can be seen with a simple example:

> cat("Hello")
Hello
> print("Hello")
[1] "Hello"

Q. How do missing values impact “conditions”?

A. If the condition results in a single missing value, then an error is returned:

> testMissing <- function(X) {
+   if (X > 0) cat("Success")
+ }
> testMissing(NA)
Error in if (X > 0) cat("Success") :
  missing value where TRUE/FALSE needed

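If you use the all function with a condition that contains missing values, the result is missing unless at least one of the non-missing values fails the condition (in which case all returns FALSE). A missing result will also cause an error when used as an if condition (because you do not know if “all” the conditions are met):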

> allMissings <- rep(NA, 5)   # All missing values
> someMissings <- c(NA, 1:4)  # Some missing values
> all(allMissings > 0)
[1] NA
> all(someMissings > 0)
[1] NA
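Two related behaviors of all are worth seeing side by side: a single FALSE among the non-missing values settles the answer, and the na.rm argument drops missing values before the test is applied. The vector mixedMissings is made up for illustration:

```r
someMissings <- c(NA, 1:4)     # as in the example above
mixedMissings <- c(NA, -1, 2)  # illustrative: one non-missing value fails

# A non-missing FALSE settles the answer, even with NAs present
all(mixedMissings > 0)               # FALSE

# With na.rm = TRUE, missing values are dropped before testing
all(someMissings > 0)                # NA
all(someMissings > 0, na.rm = TRUE)  # TRUE
```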

If you use the any function with a condition that contains only missing values, the result is a missing value. If, however, some values are not missing, the any function returns TRUE as long as at least one of those values meets the condition (if none does, the result is still missing):

> any(allMissings > 0)
[1] NA
> any(someMissings > 0)
[1] TRUE
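The remaining case can be sketched as follows: when no non-missing value meets the condition, any cannot rule out that a missing value would have, so it returns NA; the na.rm argument again removes the missing values first. The vector noneMet is illustrative only:

```r
noneMet <- c(NA, -2, -1)   # illustrative: no non-missing value is positive

# No TRUE found, but the NA might have been positive, so the answer is NA
any(noneMet > 0)                  # NA

# With na.rm = TRUE, missing values are dropped and the answer is FALSE
any(noneMet > 0, na.rm = TRUE)    # FALSE
```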

Workshop

The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.

Quiz

1. How do you specify default inputs to a function?

2. What value will be held in the result1 object when the following code is executed?

> qaFun <- function(X) {
+   addOne <- X + 1
+   minusOne <- X - 1
+   addOne
+   minusOne
+ }
> result1 <- qaFun(1)

3. What value will be held in the result2 object when the following code is executed?

> qaFun <- function(X) {
+   addOne <- X + 1
+   minusOne <- X - 1
+   c(ADD = addOne, MINUS = minusOne)
+ }
> result2 <- qaFun(1)

4. When you specify an if/else statement, what object should the “condition” (that is, the statement within the if call) return?

5. What is the difference between all(X > 0) and !all(X > 0)?

6. What is the difference between & and && when used in a condition?

7. What function can you use to return an object early (that is, before the last line of the function)?

Answers

1. You specify default values directly in the function’s list of inputs, using an equals sign (for example, function(x = 1)).

2. The result1 object will contain a 0, because only the value of the last line is returned (the value of minusOne, created as X - 1 = 0).

3. The result2 object will contain a vector of length 2, containing the values 2 and 0. The elements of the vector will be named ADD and MINUS.

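4. The condition should return a single logical value (TRUE or FALSE). If multiple logical values are returned, older versions of R use only the first value and issue a warning; from R 4.2.0 onward, a condition of length greater than one causes an error.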

5. The all function returns a TRUE value if all the values of X are greater than 0 (and non-missing). The ! operator in !all negates that logical result, so this would return TRUE if “not all” values of X are greater than 0 (that is, at least one is less than or equal to 0).

6. When you use a single &, the conditions on each side of the & are both evaluated, and the outputs are compared (element by element, for vectors) to see whether both conditions are met. Therefore, if you have test1 & test2, both test1 and test2 are evaluated, and then they are compared. If instead you use the “control” && (for example, in test1 && test2), the first condition (test1) is evaluated, and the second condition (test2) is evaluated only if the first condition is TRUE.

7. You can use the return function to return a result earlier in the function call.
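Several of these answers can be checked with a short script. The functions powerOf, noisyTest, and safeMean are hypothetical examples written for this sketch:

```r
# Answer 1: a default input
powerOf <- function(X, power = 2) {
  X^power
}
powerOf(3)              # 9, power takes its default of 2
powerOf(3, power = 3)   # 27

# Answer 4: all() reduces a vector comparison to a single logical value
X <- c(1, -1)
length(X > 0)           # 2: not a valid if condition
length(all(X > 0))      # 1

# Answer 6: && skips its right side when the left side is FALSE
noisyTest <- function(value) {
  cat("noisyTest evaluated\n")
  value
}
FALSE & noisyTest(TRUE)    # prints the message, then returns FALSE
FALSE && noisyTest(TRUE)   # no message; returns FALSE

# Answer 7: return() exits before the last line
safeMean <- function(X) {
  if (length(X) == 0) {
    return(NA)             # the lines below are skipped
  }
  sum(X) / length(X)       # otherwise the last line's value is returned
}
safeMean(numeric(0))       # NA
safeMean(c(2, 4, 6))       # 4
```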

Activities

1. Create a function that accepts two inputs (X and Y) and returns the value of X + Y. Test your function by calling it with X and Y inputs.

2. Update your function so that Y has a default value. Test your function by calling it with only an X input, then try specifying a value for Y.

3. Create a function called firstLast that accepts a vector and returns the first and last values. Test your function.

4. Update your firstLast function so that, if the vector input only has a single value (that is, it is of length 1), only that single value is returned.

5. Update your firstLast function so that, if all values of the vector are less than zero, a message is printed to the user informing him or her of this fact.

6. Update your firstLast function so that, if any values of the vector are missing, the first value, last value, and the number of missing values are returned to the user.
