Conditional expressions

It is common that the logic of a program is not perfectly sequential but contains several branches dependent on certain conditions. Therefore, one of the most basic constructs of a typical programming language is its conditional expressions. In R, if can be used to branch the logic flow by logical conditions.

Using if as a statement

Like many other programming languages, the if expression works with a logical condition. In R, a logical condition is represented by an expression producing a single-element logical vector. For example, we can write a simple function check_positive that returns 1 if a positive number is provided and nothing otherwise:

check_positive <- function(x) { 
  if (x > 0) { 
    return(1) 
  } 
} 

In the preceding function, x > 0 is the condition to check. If the condition is satisfied, then the function returns 1. Let's verify the function with various inputs:

check_positive(1)
## [1] 1
check_positive(0)
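
The second call prints nothing because no branch is taken, so the function returns NULL invisibly. A minimal way to confirm this, assuming the same definition of check_positive() as above, is to test the result with is.null():

```r
check_positive <- function(x) {
  if (x > 0) {
    return(1)
  }
}
# No branch is taken for 0, so the function returns NULL invisibly
is.null(check_positive(0))
## [1] TRUE
```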

It seems that the function works as expected. If we add some else if and else branches, the function can be generalized as the sign function that returns 1 for positive input, -1 for negative input, and 0 for 0:

check_sign <- function(x) { 
  if (x > 0) { 
    return(1) 
  } else if (x < 0) { 
    return(-1) 
  } else { 
    return(0) 
  } 
} 

The preceding function has the same functionality as the built-in function sign(). To verify its logic, just call it with different inputs with full coverage of the conditional branches:

check_sign(15)
## [1] 1
check_sign(-3.5)
## [1] -1
check_sign(0)
## [1] 0 

A function does not need to return a value explicitly. We can also perform actions that return nothing (more accurately, NULL) depending on various conditions. The following function never explicitly returns a value, but it prints a message to the console. The message depends on the sign of the input number:

say_sign <- function(x) { 
  if (x > 0) { 
    cat("The number is greater than 0") 
  } else if (x < 0) { 
    cat("The number is less than 0") 
  } else { 
    cat("The number is 0") 
  } 
} 

We can test the logic of say_sign() in a similar way:

say_sign(0)
## The number is 0
say_sign(3)
## The number is greater than 0
say_sign(-9)
## The number is less than 0 

The workflow for evaluating if statement branches is quite straightforward:

  1. First, evaluate cond1 in the first if (cond1) { expr1 }.
  2. If cond1 is TRUE, then evaluate its corresponding expression { expr1 }. Otherwise, evaluate the cond2 condition in the next else if (cond2) branch and so forth.
  3. If none of the conditions in the if and else if branches is met, then evaluate the expression in the else branch, if any.
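
The evaluation order described above can be observed directly. The following is only a sketch: cond() is a hypothetical helper introduced for illustration, not part of R. It records its label before returning the given value, so we can see which conditions are actually evaluated:

```r
# Hypothetical helper: records that a condition was evaluated, then returns its value
evaluated <- character(0)
cond <- function(label, value) {
  evaluated <<- c(evaluated, label)
  value
}

result <- if (cond("cond1", FALSE)) {
  "expr1"
} else if (cond("cond2", TRUE)) {
  "expr2"
} else if (cond("cond3", TRUE)) {
  "expr3"
} else {
  "expr4"
}

result
## [1] "expr2"
evaluated
## [1] "cond1" "cond2"
```

Although cond3 would also be TRUE, it is never evaluated because the cond2 branch has already been chosen.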

According to the workflow, an if statement can be more flexible than you might think. For example, an if statement can be in one of the following forms.

The simplest form is a simple if statement branch:

if (cond1) { 
  # do something 
} 

A more complete form is with an else branch that deals with situations where cond1 is not TRUE:

if (cond1) { 
  # do something 
} else { 
  # do something else 
} 

An even more complex form is with one or more else if branches:

if (cond1) { 
  expr1 
} else if (cond2) { 
  expr2 
} else if (cond3) { 
  expr3 
} else { 
  expr4 
} 

In the preceding conditional branches, the branch conditions (cond1, cond2, and cond3) may or may not be related. For example, a simple grading policy fits the branching logic in the preceding template well, with each branch condition covering a slice of the score range:

grade <- function(score) {
  if (score >= 90) {
    return("A")
  } else if (score >= 80) {
    return("B")
  } else if (score >= 70) {
    return("C")
  } else if (score >= 60) {
    return("D")
  } else {
    return("F")
  }
}
c(grade(65), grade(59), grade(87), grade(96))
## [1] "D" "F" "B" "A" 

In this case, each branch condition in else if actually implicitly assumes that the previous condition does not hold; that is, score >= 80 actually means score < 90 and score >= 80, which is dependent on previous conditions. As a result, we cannot switch the order of these branches without explicitly stating the assumptions and making all branches independent.

Let's assume we switch some of the branches:

grade2 <- function(score) {
  if (score >= 60) {
    return("D")
  } else if (score >= 70) {
    return("C")
  } else if (score >= 80) {
    return("B")
  } else if (score >= 90) {
    return("A")
  } else {
    return("F")
  }
}
c(grade2(65), grade2(59), grade2(87), grade2(96))
## [1] "D" "F" "D" "D" 

Only grade2(65) and grade2(59) get the right grade; the two higher scores are both reported as "D" because the first condition, score >= 60, already matches them. To fix the function without reordering the conditions, we need to rewrite the conditions so that they do not depend on the order of evaluation:

grade2 <- function(score) {
  if (score >= 60 && score < 70) {
    return("D")
  } else if (score >= 70 && score < 80) {
    return("C")
  } else if (score >= 80 && score < 90) {
    return("B")
  } else if (score >= 90) {
    return("A")
  } else {
    return("F")
  }
}
c(grade2(65), grade2(59), grade2(87), grade2(96))
## [1] "D" "F" "B" "A" 

This makes the function much more verbose than the first correct version. Therefore, it is important to figure out the correct order for branch conditions and be careful of the dependency of each branch.

Fortunately, R provides convenient functions such as cut(), which can perform this kind of binning directly. Read the documentation by typing ?cut for more details.
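
As a rough sketch of how cut() could express the same grading policy, consider the following. The name grade_cut() is hypothetical, and the break points are taken from the grade() function shown earlier:

```r
# Sketch: the grading policy expressed with cut().
# right = FALSE makes each interval closed on the left, for example [60, 70).
grade_cut <- function(score) {
  as.character(cut(score,
    breaks = c(-Inf, 60, 70, 80, 90, Inf),
    labels = c("F", "D", "C", "B", "A"),
    right = FALSE))
}
grade_cut(c(65, 59, 87, 96))
## [1] "D" "F" "B" "A"
```

Unlike grade(), this version accepts a whole vector of scores at once, and the intervals are stated explicitly rather than implied by the order of the branches.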

Using if as an expression

Since if is in essence a primitive function, its returned value is the value of the expression in the branch whose condition is satisfied. Therefore, if can be used as an inline expression too. Take the check_positive() function, for example. Rather than writing return() inside the conditional branch, we can return the value of the whole if expression from the function body to achieve the same goal:

check_positive <- function(x) { 
  return(if (x > 0) { 
    1 
  }) 
} 

In fact, the expression syntax can be simplified to merely one line:

check_positive <- function(x) { 
  return(if (x > 0) 1) 
} 

Since the return value of a function is the value of its last expression in the function body, return() can be removed in this case:

check_positive <- function(x) { 
  if (x > 0) 1 
} 

The same principle also applies to the check_sign() function. A simpler form of check_sign() is as follows:

check_sign <- function(x) { 
  if (x > 0) 1 else if (x < 0) -1 else 0 
} 

To explicitly get the value of the if expression, we can implement a grade reporting function that mentions the grade of a student, given the student name and their score:

say_grade <- function(name, score) {
  grade <- if (score >= 90) "A"
    else if (score >= 80) "B"
    else if (score >= 70) "C"
    else if (score >= 60) "D"
    else "F"
  cat("The grade of", name, "is", grade)
}
say_grade("Betty", 86)
## The grade of Betty is B 

Using the if statement as an expression seems more compact and less verbose. However, in practice, it is rarely the case that all conditions are simple numeric comparisons and return simple values. For more complex conditions and branching, I suggest that you use if as a statement to clearly state different branches and do not omit {} to avoid unnecessary mistakes. The following function is a bad example:

say_grade <- function(name, score) { 
  if (score >= 90) grade <- "A" 
  cat("Congratulations!\n") 
  else if (score >= 80) grade <- "B" 
  else if (score >= 70) grade <- "C" 
  else if (score >= 60) grade <- "D" 
  else grade <- "F" 
  cat("What a pity!\n") 
  cat("The grade of", name, "is", grade) 
} 

The function author wants to add something to say to some branches. Without {} brackets around the branch expression, you are very likely to write code with syntax errors when you add more behaviors to conditional branches. If you evaluate the preceding code in the console, you will get enough errors to confuse you for a while:

> say_grade <- function(name, score) { 
+   if (score >= 90) grade <- "A" 
+   cat("Congratulations!\n") 
+   else if (score >= 80) grade <- "B" 
Error: unexpected 'else' in: 
"  cat("Congratulations!\n") 
  else" 
>   else if (score >= 70) grade <- "C" 
Error: unexpected 'else' in "  else" 
>   else if (score >= 60) grade <- "D" 
Error: unexpected 'else' in "  else" 
>   else grade <- "F" 
Error: unexpected 'else' in "  else" 
>   cat("What a pity!\n") 
What a pity! 
>   cat("The grade of", name, "is", grade) 
Error in cat("The grade of", name, "is", grade) : object 'name' not found 
> } 
Error: unexpected '}' in "}" 

A better form of the function that avoids such potential pitfalls is as follows:

say_grade <- function(name, score) {
  if (score >= 90) {
    grade <- "A"
    cat("Congratulations!\n")
  } else if (score >= 80) {
    grade <- "B"
  } else if (score >= 70) {
    grade <- "C"
  } else if (score >= 60) {
    grade <- "D"
  } else {
    grade <- "F"
    cat("What a pity!\n")
  }
  cat("The grade of", name, "is", grade)
}
say_grade("James", 93)
## Congratulations! 
## The grade of James is A 

The function seems a bit more verbose, but it is more robust to changes and clearer in its logic. Remember, it is always better to be correct than short.

Using if with vectors

All the example functions created earlier only work with a single-value input. If we provide a vector, the functions will produce warnings because if does not work with multi-element vectors (in R 4.2.0 and later, such a condition is an error rather than a warning):

check_positive(c(1, -1, 0))
## Warning in if (x > 0) 1: the condition has length > 1 and only the first
## element will be used
## [1] 1 

From the preceding output, we can see that the if statement ignores all but the first element, if a multi-element logical vector is supplied:

num <- c(1, 2, 3)
if (num > 2) {
  cat("num > 2!")
}
## Warning in if (num > 2) {: the condition has length > 1 and only the first 
## element will be used 

The expression throws a warning saying that only the first element (1 > 2) will be used. In fact, the logic is unclear when we try to condition an expression on a logical vector, since the vector can contain a mix of TRUE and FALSE values.

Some logical functions are useful to avoid such ambiguity. For example, the any() function returns TRUE if at least one element in the given vector is TRUE:

any(c(TRUE, FALSE, FALSE))
## [1] TRUE
any(c(FALSE, FALSE))
## [1] FALSE 

Therefore, if what we really mean is to print the message if any single value is greater than 2, we should call the any() function in the condition:

if (any(num > 2)) {
  cat("num > 2!")
}
## num > 2! 

If we mean to print the first message only if all values are greater than 2, we should instead call the all() function:

if (all(num > 2)) {
  cat("num > 2!")
} else {
  cat("Not all values are greater than 2!")
}
## Not all values are greater than 2! 

Therefore, every time we use an if expression to branch the workflow, we should ensure that the condition is a single-value logical vector. Otherwise, something unexpected may happen.
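
One possible way to enforce this, shown here only as a sketch, is to validate the input before the if expression is reached. The function name check_positive_strict() is hypothetical:

```r
# Hypothetical variant of check_positive() that validates its input first
check_positive_strict <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1L)
  if (x > 0) 1
}
check_positive_strict(5)
## [1] 1

# A multi-element input now fails early with an explicit error
res <- tryCatch(
  check_positive_strict(c(1, -1, 0)),
  error = function(e) "error"
)
res
## [1] "error"
```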

Another exception is NA, which is also a single-value logical vector but causes an error when it is used as an if condition:

check <- function(x) { 
  if (all(x > 0)) { 
    cat("All input values are positive!") 
  } else { 
    cat("Some values are not positive!") 
  } 
} 

The check() function works perfectly for typical numeric vectors with no missing values. However, if argument x contains a missing value, the function may end up in an error:

check(c(1, 2, 3))
## All input values are positive!
check(c(1, 2, NA, -1))
## Some values are not positive!
check(c(1, 2, NA))
## Error in if (all(x > 0)) {: missing value where TRUE/FALSE needed 

From this example, we can see that we should be careful with missing values when we write if conditions. If the logic is complicated and the input data is diverse, you cannot easily get around handling missing values appropriately. Note that the any() and all() functions both accept an na.rm argument to handle missing values. We should take this into account too when writing conditions.

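One way to simplify condition checking is to use isTRUE(x). In older versions of R, it simply calls identical(TRUE, x); since R 3.5.0, it checks that x is a length-one logical vector whose value is TRUE. Either way, only a single TRUE value meets the condition and all other values, including NA, do not.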
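
The following sketch compares these options on a vector with a missing value. The function check_safe() is a hypothetical variant of check() that treats an unknown result as not satisfying the condition; whether that is the right policy depends on the application:

```r
x <- c(1, 2, NA)
all(x > 0)                # NA: the result is unknown
all(x > 0, na.rm = TRUE)  # TRUE: missing values are ignored
isTRUE(all(x > 0))        # FALSE: NA is not a single TRUE

# Hypothetical variant of check() that never fails on missing values
check_safe <- function(x) {
  if (isTRUE(all(x > 0))) {
    "All input values are positive!"
  } else {
    "Some values are not positive or are missing!"
  }
}
check_safe(c(1, 2, 3))
## [1] "All input values are positive!"
check_safe(c(1, 2, NA))
## [1] "Some values are not positive or are missing!"
```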

Using vectorized if: ifelse

An alternate method to branch a computation is ifelse(). This function accepts a logical vector as the test condition and returns a vector. For each element in the logical test condition, if the value is TRUE, then the corresponding element in the second argument yes will be chosen. If the value is FALSE, then the corresponding element in the third argument no will be chosen. In other words, ifelse() is the vectorized version of if, as demonstrated here:

ifelse(c(TRUE, FALSE, FALSE), c(1, 2, 3), c(4, 5, 6))
## [1] 1 5 6 

Since the yes and no arguments can be recycled, we can rewrite check_positive() using ifelse():

check_positive2 <- function(x) { 
  ifelse(x > 0, 1, 0) 
} 

One subtle difference between check_positive() (using the if statement) and check_positive2() (using ifelse()) is that check_positive(-1) does not return a value explicitly, but check_positive2(-1) returns 0. An if statement with no else branch does not necessarily return a value explicitly. By contrast, ifelse() always returns a vector because you have to specify the values in both the yes and no arguments.
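
Because ifelse() calls can be nested, the same idea extends to more than two branches. As a sketch, a vectorized counterpart of check_sign() might look like this (the name check_sign2() is hypothetical):

```r
# Sketch: a vectorized sign function built from nested ifelse() calls
check_sign2 <- function(x) {
  ifelse(x > 0, 1, ifelse(x < 0, -1, 0))
}
check_sign2(c(15, -3.5, 0))
## [1]  1 -1  0
```

For these inputs, the result agrees with the built-in sign() function.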

Another reminder is that ifelse() and if are not always able to achieve the same goal if you simply replace one with the other. For example, imagine you want to return a two-element vector according to a condition. Let's assume we use ifelse():

ifelse(TRUE, c(1,2), c(2,3))
## [1] 1

Only the first element of the yes argument is returned. If you want to return the yes argument, you need to modify the condition to c(TRUE, TRUE), which looks a bit unnatural.

If we use if, then the expression looks much more natural:

if (TRUE) c(1,2) else c(2,3)
## [1] 1 2

With vectorized input and output, another problem arises: if the yes argument is a numeric vector and the no argument is a character vector, a condition with mixed TRUE and FALSE values forces the output vector to be coerced to a type that can represent all of the chosen values. Thus, a character vector is produced:

ifelse(c(TRUE, FALSE), c(1, 2), c("a", "b"))
## [1] "1" "b"

Using switch to branch values

In contrast with if, which deals with TRUE and FALSE conditions, switch works with a number or a string and chooses a branch to return according to the input.

Suppose the input is an integer n. In this case, switch returns the value of the nth argument following the first argument:

switch(1, "x", "y")
## [1] "x"
switch(2, "x", "y")
## [1] "y" 

If the input integer is out of bounds and does not match any given argument, no visible value is explicitly returned (in fact, an invisible NULL is returned):

switch(3, "x", "y") 

The switch() function behaves differently when working with string input. It returns the value of the first argument whose name matches the input:

switch("a", a = 1, b = 2)
## [1] 1
switch("b", a = 1, b = 2)
## [1] 2 

For the first switch, a = 1 matches the input "a". For the second, b = 2 matches the input "b". If no argument matches the input, an invisible NULL value will be returned:

switch("c", a = 1, b = 2) 

To cover all possibilities, we can add a last argument (without argument name) that captures all other inputs:

switch("c", a = 1, b = 2, 3)
## [1] 3 
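
The unmatched cases above can be checked with is.null(), and the unnamed last argument works as a default branch. In the following sketch, grade_message() is a hypothetical helper introduced only for illustration:

```r
# Unmatched inputs return NULL, which can be confirmed with is.null()
is.null(switch(3, "x", "y"))
## [1] TRUE
is.null(switch("c", a = 1, b = 2))
## [1] TRUE

# Hypothetical helper: map a grade to a message, with an unnamed default branch
grade_message <- function(grade) {
  switch(grade,
    A = "Excellent",
    B = "Good",
    F = "Failed",
    "Passed")
}
grade_message("A")
## [1] "Excellent"
grade_message("C")
## [1] "Passed"
```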

Compared to ifelse(), switch() behaves more like if. It only accepts a single-value input (a number or a string), but it can return anything:

switch_test <- function(x) {
  switch(x,
    a = c(1, 2, 3),
    b = list(x = 0, y = 1),
    c = {
      cat("You choose c!\n")
      list(name = "c", value = "something")
    })
}
switch_test("a")
## [1] 1 2 3
switch_test("b")
## $x
## [1] 0
##
## $y
## [1] 1
switch_test("c")
## You choose c!
## $name 
## [1] "c" 
##  
## $value 
## [1] "something" 

In conclusion, if, ifelse(), and switch() have slightly different behaviors. You should apply them in different situations accordingly.
