It is common that the logic of a program is not perfectly sequential but contains several branches dependent on certain conditions. Therefore, one of the most basic constructs of a typical programming language is its conditional expressions. In R, if
can be used to branch the logic flow by logical conditions.
Like many other programming languages, the if
expression works with a logical condition. In R, a logical condition is represented by an expression producing a single-element logical vector. For example, we can write a simple function check_positive
that returns 1
if a positive number is provided and nothing otherwise:
check_positive <- function(x) { if (x > 0) { return(1) } }
In the preceding function, x > 0
is the condition to check. If the condition is satisfied, then the function returns 1
. Let's verify the function with various inputs:
check_positive(1)
## [1] 1
check_positive(0)
It seems that the function works as expected. If we add some else if
and else
branches, the function can be generalized as the sign function that returns 1 for positive input, -1
for negative input, and 0
for 0:
check_sign <- function(x) { if (x > 0) { return(1) } else if (x < 0) { return(-1) } else { return(0) } }
The preceding function has the same functionality as the built-in function sign()
. To verify its logic, just call it with different inputs with full coverage of the conditional branches:
check_sign(15)
## [1] 1
check_sign(-3.5)
## [1] -1
check_sign(0)
## [1] 0
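The claim that check_sign() matches the built-in sign() can also be checked programmatically. The following is a minimal sketch, assuming a handful of illustrative inputs (the inputs vector is hypothetical, not from the text). Since check_sign() expects a single number, it is applied to one element at a time:

```r
# check_sign() as defined in the text, repeated so the block is self-contained.
check_sign <- function(x) {
  if (x > 0) {
    return(1)
  } else if (x < 0) {
    return(-1)
  } else {
    return(0)
  }
}

# Hypothetical test inputs that cover all three branches.
inputs <- c(15, -3.5, 0, 0.001, -100)

# Call check_sign() on each element separately; each call returns one number.
ours <- vapply(inputs, check_sign, numeric(1))

# sign() is vectorized, so it takes the whole vector at once.
builtin <- sign(inputs)

all(ours == builtin)
## [1] TRUE
```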
The function does not need to return anything. We can also perform actions that return nothing (more accurately, NULL) depending on various conditions. The following function never explicitly returns a value; instead, it prints a message to the console. The message depends on the sign of the input number:
say_sign <- function(x) { if (x > 0) { cat("The number is greater than 0") } else if (x < 0) { cat("The number is less than 0") } else { cat("The number is 0") } }
We can use a similar approach to test the logic of say_sign():
say_sign(0)
## The number is 0
say_sign(3)
## The number is greater than 0
say_sign(-9)
## The number is less than 0
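To see that say_sign() really does produce NULL as its value, we can capture the result of a call. This is a small sketch; the variable name result is chosen only for illustration. It relies on the fact that cat() returns NULL after printing:

```r
# say_sign() as defined in the text, repeated so the block is self-contained.
say_sign <- function(x) {
  if (x > 0) {
    cat("The number is greater than 0")
  } else if (x < 0) {
    cat("The number is less than 0")
  } else {
    cat("The number is 0")
  }
}

# The message is printed as a side effect; the value of the call is stored.
result <- say_sign(3)

# cat() returns NULL, so that is also the value of the function call.
is.null(result)  # TRUE
```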
The workflow for evaluating if statement branches is quite straightforward:
1. First, evaluate cond1 in the first if (cond1) { expr1 }.
2. If cond1 is TRUE, then evaluate its corresponding expression { expr1 }. Otherwise, evaluate the cond2 condition in the next else if (cond2) branch, and so forth.
3. If the conditions in all if and else if branches are not met, then evaluate the expression in the else branch, if any.
According to this workflow, an if statement can be more flexible than you might think. For example, it can take one of the following forms.
The simplest form is a single if branch:
if (cond1) {
  # do something
}
A more complete form adds an else branch that handles the case where cond1 is not TRUE:
if (cond1) {
  # do something
} else {
  # do something else
}
An even more complex form adds one or more else if branches:
if (cond1) {
  expr1
} else if (cond2) {
  expr2
} else if (cond3) {
  expr3
} else {
  expr4
}
In the preceding conditional branches, the branch conditions (cond1, cond2, and cond3) may or may not be related. For example, a simple grading policy fits the branching logic of the preceding template, with each branch condition covering a slice of the score range:
grade <- function(score) {
  if (score >= 90) {
    return("A")
  } else if (score >= 80) {
    return("B")
  } else if (score >= 70) {
    return("C")
  } else if (score >= 60) {
    return("D")
  } else {
    return("F")
  }
}
c(grade(65), grade(59), grade(87), grade(96))
## [1] "D" "F" "B" "A"
In this case, each else if condition implicitly assumes that none of the previous conditions hold. For example, the branch with score >= 80 is only reached when score >= 90 is FALSE, so it effectively means score < 90 and score >= 80. Because each condition depends on the ones before it, we cannot reorder these branches without stating those assumptions explicitly and making all branches independent.
Let's assume we switch some of the branches:
grade2 <- function(score) {
  if (score >= 60) {
    return("D")
  } else if (score >= 70) {
    return("C")
  } else if (score >= 80) {
    return("B")
  } else if (score >= 90) {
    return("A")
  } else {
    return("F")
  }
}
c(grade2(65), grade2(59), grade2(87), grade2(96))
## [1] "D" "F" "D" "D"
Only grade2(65) and grade2(59) get the right grade; every score of 70 or above is caught by the first condition and wrongly graded D. To fix the function without reordering the conditions, we need to rewrite the conditions so that they do not depend on the order of evaluation:
grade2 <- function(score) {
  if (score >= 60 && score < 70) {
    return("D")
  } else if (score >= 70 && score < 80) {
    return("C")
  } else if (score >= 80 && score < 90) {
    return("B")
  } else if (score >= 90) {
    return("A")
  } else {
    return("F")
  }
}
c(grade2(65), grade2(59), grade2(87), grade2(96))
## [1] "D" "F" "B" "A"
This makes the function much more verbose than the first correct version. Therefore, it is important to figure out the correct order for branch conditions and be careful of the dependency of each branch.
Fortunately, R provides convenient functions such as cut(), which assigns numeric values to labeled intervals and can therefore express this kind of range-based grading directly. Read the documentation by typing in ?cut for more details.
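As a rough illustration of how cut() could replace the chain of else if branches, here is a minimal sketch. The function name grade_cut is hypothetical, and the break points simply mirror the grading policy used above:

```r
# A sketch of range-based grading with cut(); grade_cut is a hypothetical name.
grade_cut <- function(score) {
  bins <- cut(
    score,
    breaks = c(-Inf, 60, 70, 80, 90, Inf),
    labels = c("F", "D", "C", "B", "A"),
    right = FALSE  # intervals are closed on the left: [60, 70), [70, 80), ...
  )
  # cut() returns a factor; convert it to plain character grades.
  as.character(bins)
}

grade_cut(c(65, 59, 87, 96))
## [1] "D" "F" "B" "A"
```

Unlike grade(), this version accepts a whole vector of scores in one call, and the intervals are stated once rather than spread across several conditions.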
Since if is in essence a primitive function, its value is the value of the expression in the branch whose condition is satisfied. Therefore, if can be used as an inline expression too. Take the check_positive() function for example. Rather than calling return() inside the conditional branch, we can return the value of the whole if expression in the function body to achieve the same goal:
check_positive <- function(x) { return(if (x > 0) { 1 }) }
In fact, the expression syntax can be simplified to merely one line:
check_positive <- function(x) { return(if (x > 0) 1) }
Since the return value of a function is the value of its last expression in the function body, return()
can be removed in this case:
check_positive <- function(x) { if (x > 0) 1 }
The same principle also applies to check_sign(). A simpler form of check_sign() is as follows:
check_sign <- function(x) { if (x > 0) 1 else if (x < 0) -1 else 0 }
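One consequence of treating if as an expression is worth spelling out: when no branch is taken and there is no else, the value of the expression is NULL. The following sketch shows this with the compact check_positive(); the variable names value and y are used only for illustration:

```r
# The compact form of check_positive() from the text.
check_positive <- function(x) {
  if (x > 0) 1
}

# The condition is FALSE and there is no else branch, so the value is NULL.
value <- check_positive(-1)
is.null(value)  # TRUE

# The same holds for an if expression used directly on the right-hand side.
y <- if (FALSE) "never reached"
is.null(y)  # TRUE
```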
To explicitly get the value of the if
expression, we can implement a grade reporting function that mentions the grade of a student, given the student name and their score:
say_grade <- function(name, score) {
  grade <- if (score >= 90) "A"
    else if (score >= 80) "B"
    else if (score >= 70) "C"
    else if (score >= 60) "D"
    else "F"
  cat("The grade of", name, "is", grade)
}
say_grade("Betty", 86)
## The grade of Betty is B
Using the if
statement as an expression seems more compact and less verbose. However, in practice, it is rarely the case that all conditions are simple numeric comparisons and return simple values. For more complex conditions and branching, I suggest that you use if
as a statement to clearly state different branches and do not omit {}
to avoid unnecessary mistakes. The following function is a bad example:
say_grade <- function(name, score) {
  if (score >= 90) grade <- "A"
  cat("Congratulations!\n")
  else if (score >= 80) grade <- "B"
  else if (score >= 70) grade <- "C"
  else if (score >= 60) grade <- "D"
  else grade <- "F"
  cat("What a pity!\n")
  cat("The grade of", name, "is", grade)
}
Here the author wants to print an extra message in some branches. Without {} around the branch expressions, you are very likely to introduce syntax errors when you add more behavior to a conditional branch. If you evaluate the preceding code in the console, you will get enough errors to confuse you for a while:
> say_grade <- function(name, score) {
+   if (score >= 90) grade <- "A"
+   cat("Congratulations!\n")
+   else if (score >= 80) grade <- "B"
Error: unexpected 'else' in:
"  cat("Congratulations!\n")
  else"
>   else if (score >= 70) grade <- "C"
Error: unexpected 'else' in "  else"
>   else if (score >= 60) grade <- "D"
Error: unexpected 'else' in "  else"
>   else grade <- "F"
Error: unexpected 'else' in "  else"
>   cat("What a pity!\n")
What a pity!
>   cat("The grade of", name, "is", grade)
Error in cat("The grade of", name, "is", grade) : object 'name' not found
> }
Error: unexpected '}' in "}"
A better form of the function that avoids such potential pitfalls is as follows:
say_grade <- function(name, score) {
  if (score >= 90) {
    grade <- "A"
    cat("Congratulations!\n")
  } else if (score >= 80) {
    grade <- "B"
  } else if (score >= 70) {
    grade <- "C"
  } else if (score >= 60) {
    grade <- "D"
  } else {
    grade <- "F"
    cat("What a pity!\n")
  }
  cat("The grade of", name, "is", grade)
}
say_grade("James", 93)
## Congratulations!
## The grade of James is A
The function seems a bit more verbose, but it is more robust to changes and clearer in its logic. Remember, it is always better to be correct than short.
All the example functions created earlier only work with a single-value input. If we provide a vector, the condition passed to if has more than one element, which if does not support. In versions of R before 4.2.0, this produces a warning and only the first element is used, as in the output shown here; from R 4.2.0 onward, the same call is an error:
check_positive(c(1, -1, 0))
## Warning in if (x > 0) 1: the condition has length > 1 and only the first
## element will be used
## [1] 1
From the preceding output, we can see that the if statement ignores all but the first element if a multi-element logical vector is supplied:
num <- c(1, 2, 3)
if (num > 2) {
  cat("num > 2!")
}
## Warning in if (num > 2) {: the condition has length > 1 and only the first
## element will be used
The expression raises a warning saying that only the first element (1 > 2) will be used. In fact, the intent is ambiguous whenever we condition on a multi-element logical vector, since it may contain a mix of TRUE and FALSE values.
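How a multi-element condition is reported depends on the version of R in use: older versions warn, while R 4.2.0 and later stop with an error. The following sketch, which assumes nothing beyond base R, records which of the two happens on the current installation. The variable name outcome is only illustrative:

```r
num <- c(1, 2, 3)

# Evaluate the ambiguous condition and record how R reacts to it.
outcome <- tryCatch(
  {
    if (num > 2) "branch taken" else "branch skipped"
  },
  # Older versions of R signal a warning for a condition of length > 1.
  warning = function(w) paste("warning:", conditionMessage(w)),
  # R 4.2.0 and later signal an error instead.
  error = function(e) paste("error:", conditionMessage(e))
)

outcome
```

Either way, the message makes clear that the condition should be reduced to a single logical value before it reaches if.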
Some logical functions are useful to avoid such ambiguity. For example, the any() function returns TRUE if at least one element in the given vector is TRUE:
any(c(TRUE, FALSE, FALSE))
## [1] TRUE
any(c(FALSE, FALSE))
## [1] FALSE
Therefore, if what we really mean is to print the message if any single value is greater than 2, we should call any() in the condition:
if (any(num > 2)) {
  cat("num > 2!")
}
## num > 2!
If we mean to print the first message only if all values are greater than 2, we should instead call all():
if (all(num > 2)) {
  cat("num > 2!")
} else {
  cat("Not all values are greater than 2!")
}
## Not all values are greater than 2!
Therefore, every time we use an if
expression to branch the workflow, we should ensure that the condition is a single-value logical vector. Otherwise, something unexpected may happen.
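One way to enforce this rule is to validate the input before the condition is evaluated. The following is a minimal sketch using stopifnot(); the function name check_positive_strict is hypothetical:

```r
# A stricter variant of check_positive(); the name is hypothetical.
check_positive_strict <- function(x) {
  # Stop early unless x is a numeric vector with exactly one element.
  stopifnot(is.numeric(x), length(x) == 1L)
  if (x > 0) 1
}

check_positive_strict(5)
## [1] 1

# A multi-element input now fails with an explicit error on any R version.
res <- try(check_positive_strict(c(1, -1, 0)), silent = TRUE)
inherits(res, "try-error")
## [1] TRUE
```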
Another special case is NA, which is also a single-value logical vector but causes an error when used as an if condition:
check <- function(x) { if (all(x > 0)) { cat("All input values are positive!") } else { cat("Some values are not positive!") } }
The check()
function works perfectly for typical numeric vectors with no missing values. However, if argument x
contains a missing value, the function may end up in an error:
check(c(1, 2, 3))
## All input values are positive!
check(c(1, 2, NA, -1))
## Some values are not positive!
check(c(1, 2, NA))
## Error in if (all(x > 0)) {: missing value where TRUE/FALSE needed
From this example, we can see that we should be careful with missing values when we write if conditions. If the logic is complicated and the input data is diverse, there is no easy way around handling missing values appropriately. Note that any() and all() both accept an na.rm argument to handle missing values. We should take this into account too when writing conditions.
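The effect of na.rm can be seen on the same kind of input that broke check(). This is a small sketch with an illustrative vector x:

```r
x <- c(1, 2, NA)

# Without na.rm, the missing value makes the result undetermined.
all(x > 0)
## [1] NA

# With na.rm = TRUE, missing values are dropped before the test.
all(x > 0, na.rm = TRUE)
## [1] TRUE

# any() behaves the same way.
any(x < 0)
## [1] NA
any(x < 0, na.rm = TRUE)
## [1] FALSE
```

Whether dropping missing values is the right choice depends on what the condition is supposed to mean, so na.rm = TRUE should be a deliberate decision rather than a default.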
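One way to simplify condition checking is to use isTRUE(x), which returns TRUE only when x is a single TRUE value and returns FALSE for anything else, including NA and multi-element vectors. (Older versions of R implemented it as identical(TRUE, x); since R 3.5.0 it is equivalent to is.logical(x) && length(x) == 1 && !is.na(x) && x.) In this case, only a single TRUE value will meet the condition and all other values will not.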
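As a sketch of how isTRUE() could make the earlier check() function robust to missing values, consider the following variant. The name check_safe is hypothetical, and it returns the message as a string rather than printing it, purely to keep the example easy to inspect:

```r
# A variant of check() that never fails on NA; the name is hypothetical.
check_safe <- function(x) {
  # isTRUE() turns NA (and anything that is not a single TRUE) into FALSE.
  if (isTRUE(all(x > 0))) {
    "All input values are positive!"
  } else {
    "Cannot confirm that all values are positive!"
  }
}

check_safe(c(1, 2, 3))
## [1] "All input values are positive!"
check_safe(c(1, 2, NA))
## [1] "Cannot confirm that all values are positive!"
```

Note the design choice: an input with missing values is treated as not confirmed, which is a conservative reading rather than the only possible one.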
An alternative way to branch a computation is ifelse(). This function accepts a logical vector as the test condition and returns a vector. For each element in the logical test condition, if the value is TRUE, then the corresponding element in the second argument yes will be chosen. If the value is FALSE, then the corresponding element in the third argument no will be chosen. In other words, ifelse() is the vectorized version of if, as demonstrated here:
ifelse(c(TRUE, FALSE, FALSE), c(1, 2, 3), c(4, 5, 6))
## [1] 1 5 6
Since the yes and no arguments can be recycled, we can rewrite check_positive() using ifelse():
check_positive2 <- function(x) {
  ifelse(x > 0, 1, 0)
}
There is a subtle difference between check_positive() (using the if statement) and check_positive2() (using ifelse()): check_positive(-1) does not return a value explicitly, but check_positive2(-1) returns 0. An if statement with no else branch yields no explicit value (an invisible NULL) when its condition is FALSE. By contrast, ifelse() always returns a vector because you have to specify values for both the yes and no arguments.
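The difference becomes more visible with vector input. The following sketch defines both variants under hypothetical names (check_positive_if and check_positive_vec) so that it runs on its own:

```r
# The if-based version: one value in, 1 or NULL out.
check_positive_if <- function(x) {
  if (x > 0) 1
}

# The ifelse-based version: works element by element on a whole vector.
check_positive_vec <- function(x) {
  ifelse(x > 0, 1, 0)
}

is.null(check_positive_if(-1))
## [1] TRUE
check_positive_vec(-1)
## [1] 0
check_positive_vec(c(1, -1, 0))
## [1] 1 0 0
```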
Another reminder is that ifelse()
and if
are not always able to achieve the same goal if you simply replace one with the other. For example, imagine you want to return a two-element vector according to a condition. Let's assume we use ifelse()
:
ifelse(TRUE, c(1, 2), c(2, 3))
## [1] 1
Only the first element of the yes
argument is returned. If you want to return the yes
argument, you need to modify the condition to c(TRUE, TRUE)
, which looks a bit unnatural.
If we use if, then the expression looks much more natural:
if (TRUE) c(1, 2) else c(2, 3)
## [1] 1 2
If both the input and the output are meant to be vectors, another problem arises when yes and no have different types. If the yes argument is a numeric vector and the no argument is a character vector, a condition with mixed TRUE and FALSE values will coerce all elements of the output to a common type that can represent every value. Thus, a character vector is produced:
ifelse(c(TRUE, FALSE), c(1, 2), c("a", "b"))
## [1] "1" "b"
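A further wrinkle is that the type of the result depends on the values in the condition, not only on the types of yes and no. The sketch below, using illustrative variable names, shows the same two arguments producing three different results:

```r
# All TRUE: only yes is used, so the result stays numeric.
all_yes <- ifelse(c(TRUE, TRUE), c(1, 2), c("a", "b"))
class(all_yes)
## [1] "numeric"

# All FALSE: only no is used, so the result is character.
all_no <- ifelse(c(FALSE, FALSE), c(1, 2), c("a", "b"))
class(all_no)
## [1] "character"

# Mixed: numbers and strings must share one vector, so everything is character.
mixed <- ifelse(c(TRUE, FALSE), c(1, 2), c("a", "b"))
mixed
## [1] "1" "b"
```

Keeping yes and no the same type avoids a result whose type changes with the data.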
In contrast with if, which deals with TRUE and FALSE conditions, switch works with a number or a string and chooses a branch to return according to the input.
Suppose the input is an integer n. The switch function returns the value of the nth argument following the first argument:
switch(1, "x", "y")
## [1] "x"
switch(2, "x", "y")
## [1] "y"
If the input integer is out of bounds and does not match any given argument, no visible value is explicitly returned (in fact, an invisible NULL
is returned):
switch(3, "x", "y")
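Because the out-of-bounds result is an invisible NULL, nothing is printed, which can hide the problem. One way to make the case explicit is to test the result with is.null(). The following is a minimal sketch; the function name pick and the fallback string are hypothetical:

```r
# Wrap switch() so that an out-of-bounds index yields an explicit fallback.
pick <- function(n) {
  value <- switch(n, "x", "y")
  # switch() gives NULL when n does not correspond to any alternative.
  if (is.null(value)) "no match" else value
}

pick(1)
## [1] "x"
pick(3)
## [1] "no match"
```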
The switch() function behaves differently when working with string input. It returns the value of the first argument whose name matches the input:
switch("a", a = 1, b = 2)
## [1] 1
switch("b", a = 1, b = 2)
## [1] 2
For the first switch, the input "a" matches the argument named a, so 1 is returned. For the second, the input "b" matches the argument named b, so 2 is returned. If no argument matches the input, an invisible NULL value will be returned:
switch("c", a = 1, b = 2)
To cover all possibilities, we can add a last argument (without argument name) that captures all other inputs:
switch("c", a = 1, b = 2, 3)
## [1] 3
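The unnamed last argument can also be used to reject unexpected input rather than return a default value. The following sketch assumes a hypothetical function center() that dispatches on a string; the names center and type are illustrative only:

```r
# Choose a summary statistic by name; center() is a hypothetical example.
center <- function(x, type) {
  switch(type,
    mean = mean(x),
    median = median(x),
    # The unnamed last argument is evaluated when no name matches.
    stop("Unknown type: ", type)
  )
}

center(c(1, 2, 9), "mean")
## [1] 4
center(c(1, 2, 9), "median")
## [1] 2

# An unsupported type now fails loudly instead of returning NULL.
res <- try(center(c(1, 2, 9), "mode"), silent = TRUE)
inherits(res, "try-error")
## [1] TRUE
```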
Compared to ifelse(), switch() behaves more like if. It only accepts a single-value input (a number or a string), but it can return anything:
switch_test <- function(x) {
  switch(x,
    a = c(1, 2, 3),
    b = list(x = 0, y = 1),
    c = {
      cat("You choose c!\n")
      list(name = "c", value = "something")
    })
}
switch_test("a")
## [1] 1 2 3
switch_test("b")
## $x
## [1] 0
##
## $y
## [1] 1
switch_test("c")
## You choose c!
## $name
## [1] "c"
##
## $value
## [1] "something"
In conclusion, if
, ifelse()
, and switch()
have slightly different behaviors. You should apply them in different situations accordingly.