Using logical functions

A logical vector takes only TRUE or FALSE values and is mostly used to filter data. In practice, it is common to build joint conditions from multiple logical vectors, which may involve a number of logical operators and functions.

Logical operators

Like many other programming languages, R provides a few operators for basic logical calculations. The following table demonstrates what they do:

Symbol  Description     Example                      Result
&       Vectorized AND  c(T, T) & c(T, F)            c(TRUE, FALSE)
|       Vectorized OR   c(T, T) | c(T, F)            c(TRUE, TRUE)
&&      Univariate AND  c(T, T) && c(F, T)           FALSE
||      Univariate OR   c(T, T) || c(F, T)           TRUE
!       Vectorized NOT  !c(T, F)                     c(FALSE, TRUE)
%in%    Vectorized IN   c(1, 2) %in% c(1, 3, 4, 5)   c(TRUE, FALSE)
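
As a minimal sketch of how the vectorized operators combine into a joint filtering condition, consider the following example. The vector scores and its values are made up for illustration:

```r
# Hypothetical data for illustration only
scores <- c(55, 72, 88, 91, 40)

# Vectorized AND: values between 50 and 90 (inclusive)
in_range <- scores >= 50 & scores <= 90
in_range
## [1]  TRUE  TRUE  TRUE FALSE FALSE

# Vectorized NOT, OR, and IN: outside the range, or equal to 72
flagged <- !in_range | scores %in% c(72)
flagged
## [1] FALSE  TRUE FALSE  TRUE  TRUE

# Use the logical vector to filter the data
scores[in_range]
## [1] 55 72 88
```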

Note that in an if expression, && and || are often used because the condition only needs to yield a single-element logical vector. The risk lies in what happens when && is given multi-element vectors: in versions of R before 4.2.0, it silently ignores all but the first element of the vectors on both sides, R 4.2.0 issues a warning, and R 4.3.0 and later signal an error. The following example, with output from an older version of R, demonstrates the difference in behavior between && and & in conditional statements.

The following code creates a test_direction function that reports the monotonicity of the supplied argument values. We'll build on this example through the next section. If the values of x, y, and z are monotonically increasing, the function returns 1; if they are monotonically decreasing, the function returns -1. Otherwise, it returns 0. Note that the function uses & to perform a vectorized AND operation:

test_direction <- function(x, y, z) {
  if (x < y & y < z) 1
  else if (x > y & y > z) -1
  else 0
} 

If the arguments are scalar numbers, the function works as expected:

test_direction(1, 2, 3)
## [1] 1 

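Note that & performs a vectorized calculation and thus returns a multi-element vector if either operand has more than one element. However, if only works with a single-element logical vector. Given a longer condition, versions of R before 4.2.0 use only the first element and produce a warning, as in the following output, while R 4.2.0 and later stop with an error: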

test_direction(c(1, 2), c(2, 3), c(3, 4))
## Warning in if (x < y & y < z) 1 else if (x > y & y > z)
## -1 else 0: the condition has length > 1 and only the first
## element will be used
## [1] 1 

If we replace both & operators in test_direction with && and create a new function, test_direction2, the function looks as follows:

test_direction2 <- function(x, y, z) {
  if (x < y && y < z) 1
  else if (x > y && y > z) -1
  else 0
} 

Then, the two example test cases may behave differently. For scalar input, the behavior of the two versions is exactly the same:

test_direction2(1, 2, 3)
## [1] 1 

However, for multi-element input, test_direction2 silently ignores the second element of each input vector and thus does not produce any warnings in versions of R before 4.2.0 (R 4.2.0 warns, and R 4.3.0 and later signal an error instead):

test_direction2(c(1, 2), c(2, 3), c(3, 4))
## [1] 1 

Finally, which is the correct use, & or &&? It depends on your requirements. What behavior do you expect under all circumstances? What do you expect if the input is scalar values or multi-element vectors? If you expect the function to tell you whether the elements at every position of the input vectors are monotonic in the same direction, then neither use is fully correct, and you need the logical aggregation functions introduced in the next section.
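
Another possible reading of the requirement is to report the direction at each position separately rather than as a single value. The following is only a sketch under that assumption: the name test_direction_each is hypothetical, and it relies on base R's ifelse() to keep the whole calculation vectorized:

```r
# Hypothetical element-wise variant: returns one value per position
test_direction_each <- function(x, y, z) {
  ifelse(x < y & y < z, 1,
         ifelse(x > y & y > z, -1, 0))
}

# Position 1 is increasing, position 2 is neither, position 3 is decreasing
test_direction_each(c(1, 2, 5), c(2, 4, 4), c(3, 4, 3))
## [1]  1  0 -1
```

Because no if statement is involved, this version never has to reduce a multi-element condition to a single value.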

Logical functions

In this section, we will look at aggregating logical vectors and finding the true elements.

Aggregating logical vectors

In addition to the binary logical operators, a few logical aggregation functions are very useful, as we mentioned earlier.

The two most commonly used logical aggregation functions are any() and all(). The any() function returns TRUE if any (that is, at least one) element of the given logical vector is TRUE; otherwise, it returns FALSE. The all() function returns TRUE if all elements of the given logical vector are TRUE; otherwise, it returns FALSE:

x <- c(-2, -3, 2, 3, 1, 0, 0, 1, 2)
any(x > 1)
## [1] TRUE
all(x <= 1)
## [1] FALSE 

What the two functions have in common is that they always return a single-element logical vector and never a multi-element one. Therefore, to implement a function that meets the requirement in the previous section, use all() and & together in the if conditions:

test_all_direction <- function(x, y, z) {
  if (all(x < y & y < z)) 1
  else if (all(x > y & y > z)) -1
  else 0
} 

For scalar input, test_all_direction() behaves exactly the same as the test_direction() and test_direction2() functions:

test_all_direction(1, 2, 3)
## [1] 1 

For vector input, the function tests whether the elements at position 1, that is, c(1, 2, 3), and the elements at position 2, that is, c(2, 3, 4), are monotonic in the same direction:

test_all_direction(c(1, 2), c(2, 3), c(3, 4))
## [1] 1 

The following code is a counterexample in which the elements at position 2, that is, c(2, 4, 4), are not strictly monotonic:

test_all_direction(c(1, 2), c(2, 4), c(3, 4))
## [1] 0 

The value returned by the function is thus meaningful because it correctly implements the requirement: testing whether the elements at every position in the three input vectors are monotonic in the same direction.

The function has several possible variations that use any() or && instead. You may try to figure out the underlying requirement (what is each function trying to do?) of each of the following versions:

test_any_direction <- function(x, y, z) {
  if (any(x < y & y < z)) 1
  else if (any(x > y & y > z)) -1
  else 0
}
test_all_direction2 <- function(x, y, z) {
  if (all(x < y) && all(y < z)) 1
  else if (all(x > y) && all(y > z)) -1
  else 0
}
test_any_direction2 <- function(x, y, z) {
  if (any(x < y) && any(y < z)) 1
  else if (any(x > y) && any(y > z)) -1
  else 0
} 
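
As a hint for telling these versions apart, the following sketch evaluates the two any() conditions directly on made-up input. The names p, q, and r are illustrative only. The point is that any(p < q & q < r) needs both comparisons to hold at the same position, while any(p < q) && any(q < r) lets each comparison hold at a different position:

```r
# Illustrative input: no single position is increasing
p <- c(1, 3)
q <- c(2, 2)
r <- c(1, 3)

# Both comparisons must hold at the same position
any(p < q & q < r)
## [1] FALSE

# Each comparison may hold at a different position
any(p < q) && any(q < r)
## [1] TRUE

# For comparison, the two all() conditions agree on this input
all(p < q & q < r)
## [1] FALSE
all(p < q) && all(q < r)
## [1] FALSE
```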

Asking which elements are TRUE

The logical operations we introduced earlier usually return a logical vector to indicate whether a certain condition is TRUE or FALSE. It is also useful to know which elements satisfy those conditions. The which() function returns the positions (or indices) of TRUE elements in a logical vector:

x
## [1] -2 -3 2 3 1 0 0 1 2
abs(x) >= 1.5
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE
which(abs(x) >= 1.5)
## [1] 1 2 3 4 9 

If we take a closer look at what happens, it should be clear that at first, abs(x) >= 1.5 is evaluated to be a logical vector, and then, which() returns the positions of those TRUE elements in that logical vector.

The mechanism is quite similar when we use a logical condition to filter elements from a vector or list:

x[x >= 1.5]
## [1] 2 3 2 

In the preceding example, x >= 1.5 is evaluated to be a logical vector. Then, it is used to select elements in x corresponding to TRUE values.
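
To make the connection between the two mechanisms concrete, the following sketch stores the intermediate results under illustrative names (v, keep, and idx are not from the text above). For a vector without missing values, subsetting by the logical vector and subsetting by the positions from which() select the same elements:

```r
# Illustrative vector without missing values
v <- c(-2, -3, 2, 3, 1, 0, 0, 1, 2)

keep <- v >= 1.5     # logical vector, one element per element of v
idx <- which(keep)   # integer positions of the TRUE elements
idx
## [1] 3 4 9

# Both forms of subsetting select the same elements
identical(v[keep], v[idx])
## [1] TRUE
```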

A special case is a logical vector in which all values are FALSE. A zero-length numeric vector is returned since the logical vector contains only FALSE values, and thus, no element in x is selected:

x[x >= 100]
## numeric(0) 

Dealing with missing values

Real-world data often contains missing values represented by NA. The following numeric vector is a simple example:

x <- c(-2, -3, NA, 2, 3, 1, NA, 0, 1, NA, 2) 

Arithmetic calculations with missing values also produce missing values:

x + 2
##  [1]  0 -1 NA  4  5  3 NA  2  3 NA  4 

To take this into account, a logical vector has to accept not only TRUE and FALSE values but also NA values to represent unknown truthfulness:

x > 2
## [1] FALSE FALSE NA FALSE TRUE FALSE NA FALSE FALSE
## [10]    NA FALSE 

As a consequence, logical aggregation functions such as any() and all() have to deal with missing values too:

x
## [1] -2 -3 NA 2 3 1 NA 0 1 NA 2
any(x > 2)
## [1] TRUE
any(x < -2)
## [1] TRUE
any(x < -3)
## [1] NA 

The preceding output demonstrates the default behavior of any() when it deals with a logical vector that contains missing values. More specifically, if any element in the input vector is TRUE, then the function will return TRUE. If no element is TRUE but at least one element is missing, then the function will return NA. Otherwise, if the input vector contains only FALSE, then the function will return FALSE. To verify the preceding logic, just run the following code:

any(c(TRUE, FALSE, NA))
## [1] TRUE
any(c(FALSE, FALSE, NA))
## [1] NA
any(c(FALSE, FALSE))
## [1] FALSE 

To directly ignore all missing values, just specify na.rm = TRUE in the call:

any(x < -3, na.rm = TRUE)
## [1] FALSE 

A similar but somewhat opposite logic applies to all():

x
## [1] -2 -3 NA 2 3 1 NA 0 1 NA 2
all(x > -3)
## [1] FALSE
all(x >= -3)
## [1] NA
all(x < 4)
## [1] NA 

If any element in the input vector is FALSE, then the function will return FALSE. If no element is FALSE but at least one element is missing, then the function will return NA. Otherwise, if the input vector contains only TRUE, then it will return TRUE. To verify the logic, just run the following code:

all(c(TRUE, FALSE, NA))
## [1] FALSE
all(c(TRUE, TRUE, NA))
## [1] NA
all(c(TRUE, TRUE))
## [1] TRUE 

Similarly, na.rm = TRUE forces the function to directly ignore all missing values:

all(x >= -3, na.rm = TRUE)
## [1] TRUE 
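
The rules for any() and all() mirror how the binary operators | and & treat a single NA: the result is NA only when the missing value could change the answer. The following sketch shows this, and also uses the base R functions anyNA() and is.na() to check for missing values before aggregating. The vector v is illustrative only:

```r
# A missing operand matters only if it could change the result
NA | TRUE
## [1] TRUE
NA | FALSE
## [1] NA
NA & FALSE
## [1] FALSE
NA & TRUE
## [1] NA

# Checking for missing values explicitly (illustrative vector)
v <- c(1, NA, 3)
anyNA(v)
## [1] TRUE
sum(is.na(v))
## [1] 1
```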

Apart from logical aggregation functions, data filtering also behaves differently when missing values are involved. For example, the following code preserves the missing values at the corresponding positions of the logical vector produced by x >= 0:

x
## [1] -2 -3 NA 2 3 1 NA 0 1 NA 2
x[x >= 0]
## [1] NA  2  3  1 NA  0  1 NA  2 

By contrast, which() does not preserve the missing values present in the input logical vector:

which(x >= 0)
## [1]  4  5  6  8  9 11 

Therefore, the vector subsetted by the indices does not contain missing values in the following case:

x[which(x >= 0)]
## [1] 2 3 1 0 1 2 
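
If you prefer to stay with logical subsetting, another option is to exclude missing values in the condition itself. The following is a minimal sketch with an illustrative vector v. It relies on the fact that FALSE & NA is FALSE, so positions with missing values are dropped. The base R function subset() treats missing values in its condition as FALSE and gives the same result here:

```r
# Illustrative vector with missing values
v <- c(-2, -3, NA, 2, 3, 1, NA, 0, 1, NA, 2)

# Exclude missing values explicitly in the condition
v[!is.na(v) & v >= 0]
## [1] 2 3 1 0 1 2

# subset() drops elements whose condition is NA
subset(v, v >= 0)
## [1] 2 3 1 0 1 2
```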

Logical coercion

Some functions that are supposed to take logical input also accept non-logical vectors such as numeric vectors. In that case, the function behaves just as it does with logical vectors, because the non-logical vectors are first coerced to logical values.

For example, if we put a numeric vector in the if condition, it will be coerced:

if (2) 3
## [1] 3
if (0) 0 else 1
## [1] 1 

In R, all non-zero values in a numeric or integer vector are coerced to TRUE, and only zero values are coerced to FALSE. An arbitrary string such as "a", however, cannot be coerced to a logical value:

if ("a") 1 else 2
## Error in if ("a") 1 else 2: argument is not interpretable as logical 
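
The same coercion can be inspected directly with as.logical(), without going through an if statement. The following sketch uses made-up input values. Note that as.logical() returns NA for a string it cannot interpret instead of raising an error, and that a few specific strings such as "TRUE", "false", and "T" are recognized:

```r
# Non-zero numbers become TRUE, zero becomes FALSE
as.logical(c(-1, 0, 2.5))
## [1]  TRUE FALSE  TRUE

# An arbitrary string yields NA
as.logical("a")
## [1] NA

# A few specific strings are recognized
as.logical(c("TRUE", "false", "T"))
## [1]  TRUE FALSE  TRUE
```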