A logical vector only takes TRUE or FALSE and is mostly used to filter data. In practice, it is common to combine multiple logical vectors into joint conditions, which involves a number of logical operators and functions.
Like many other programming languages, R provides a few operators for basic logical calculations. The following table summarizes what they do:
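Symbol  Description      Example                           Result
&       Vectorized AND   c(TRUE, TRUE) & c(TRUE, FALSE)    c(TRUE, FALSE)
|       Vectorized OR    c(TRUE, FALSE) | c(FALSE, FALSE)  c(TRUE, FALSE)
&&      Univariate AND   TRUE && FALSE                     FALSE
||      Univariate OR    FALSE || TRUE                     TRUE
!       Vectorized NOT   !c(TRUE, FALSE)                   c(FALSE, TRUE)
%in%    Vectorized IN    c(1, 2) %in% c(1, 3, 4)           c(TRUE, FALSE)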
Note that in an if expression, && and || are often used for logical calculations whose result needs to be a single-element logical vector. However, the potential risk of using && is that, when given multi-element vectors, it silently ignores all but the first element of the vectors on both sides. (Newer versions of R are stricter here: since R 4.3.0, && and || signal an error if either operand has more than one element.) The following example demonstrates the difference in behavior between && and & in conditional statements.
The following code creates a test_direction function that determines the monotonicity of its arguments. We'll build on this example through the next section. If the values of x, y, and z are monotonically increasing, the function returns 1; if they are monotonically decreasing, it returns -1; otherwise, it returns 0. Note that the function uses & to perform a vectorized AND operation:
test_direction <- function(x, y, z) {
  if (x < y & y < z) 1
  else if (x > y & y > z) -1
  else 0
}
If the arguments are scalar numbers, the function works perfectly:
test_direction(1, 2, 3)
## [1] 1
Note that & performs a vectorized calculation and thus returns a multi-element vector if the arguments have more than one element. However, if only works with a single-element logical vector; otherwise, it produces a warning, as the following output shows (since R 4.2.0, this is an error rather than a warning):
test_direction(c(1, 2), c(2, 3), c(3, 4))
## Warning in if (x < y & y < z) 1 else if (x > y & y > z)
## -1 else 0: the condition has length > 1 and only the first
## element will be used
## [1] 1
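Evaluating the condition on its own makes the problem visible. A minimal check with the same inputs: the & expression yields one logical value per position, while if() needs exactly one.

```r
x <- c(1, 2)
y <- c(2, 3)
z <- c(3, 4)

# & compares element by element, so the condition has one value per position
cond <- x < y & y < z
cond
## [1] TRUE TRUE
length(cond)  # if() expects exactly one value
## [1] 2
```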
If we replace both & operators in test_direction with && and call the new function test_direction2, it looks as follows:
test_direction2 <- function(x, y, z) {
  if (x < y && y < z) 1
  else if (x > y && y > z) -1
  else 0
}
The two versions may then behave differently. For scalar input, their behavior is exactly the same:
test_direction2(1, 2, 3)
## [1] 1
However, for multi-element input, test_direction2 silently ignores the second element of each input vector and thus produces no warning (in R 4.3.0 and later, this call raises an error instead):
test_direction2(c(1, 2), c(2, 3), c(3, 4))
## [1] 1
Finally, which is correct, & or &&? It depends on what you need. What behavior do you expect in all circumstances? What do you expect if the input consists of scalar values or of multi-element vectors? If you expect the function to tell you whether the elements at the same position of each input vector are monotonic, then both versions are partly incorrect, and you need the logical aggregation functions introduced in the next section.
In this section, we will look at aggregating logical vectors and finding the TRUE elements.
In addition to the binary logical operators, a few logical aggregation functions are very useful, as mentioned earlier.
The two most commonly used logical aggregation functions are any() and all(). The any() function returns TRUE if any (that is, at least one) element of the given logical vector is TRUE; otherwise, it returns FALSE. The all() function returns TRUE if all elements of the given logical vector are TRUE; otherwise, it returns FALSE:
x <- c(-2, -3, 2, 3, 1, 0, 0, 1, 2)
any(x > 1)
## [1] TRUE
all(x <= 1)
## [1] FALSE
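The two functions are closely related: for a logical vector without missing values, any(v) equals !all(!v), and all(v) equals !any(!v). A quick check on the same x, where v is simply a name chosen here for the logical vector:

```r
x <- c(-2, -3, 2, 3, 1, 0, 0, 1, 2)
v <- x > 1

# "at least one TRUE" is the same as "not all FALSE"
any(v) == !all(!v)
## [1] TRUE
# "all TRUE" is the same as "no FALSE"
all(v) == !any(!v)
## [1] TRUE
```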
Both functions return a single logical value and never a multi-element logical vector. Therefore, to implement a function that meets the requirement described in the previous section, use all() and & together in the if conditions:
test_all_direction <- function(x, y, z) {
  if (all(x < y & y < z)) 1
  else if (all(x > y & y > z)) -1
  else 0
}
For scalar input, test_all_direction() behaves exactly the same as test_direction() and test_direction2():
test_all_direction(1, 2, 3)
## [1] 1
For vector input, the function tests whether the elements at each position, c(1, 2, 3) and c(2, 3, 4), are monotonic in the same direction:
test_all_direction(c(1, 2), c(2, 3), c(3, 4))
## [1] 1
The following code is a counterexample in which the elements at position 2, that is, c(2, 4, 4), are not strictly monotonic:
test_all_direction(c(1, 2), c(2, 4), c(3, 4))
## [1] 0
The value returned by the function is therefore meaningful, because the function correctly tests whether the elements at every position of the three input vectors share the same monotonicity.
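If, instead of a single aggregated answer, you want the direction at each position separately, a vectorized variant can be built with ifelse(). This is a sketch only; direction_by_position() is a hypothetical name and not part of the original example.

```r
# Hypothetical helper: returns 1 (increasing), -1 (decreasing), or 0
# for each position of the three input vectors
direction_by_position <- function(x, y, z) {
  ifelse(x < y & y < z, 1,
         ifelse(x > y & y > z, -1, 0))
}

direction_by_position(c(1, 2), c(2, 4), c(3, 4))
## [1] 1 0
direction_by_position(3, 2, 1)
## [1] -1
```

Aggregating this per-position result, for example with all(direction_by_position(x, y, z) == 1), recovers the "all increasing" case that test_all_direction() checks.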
The function has several possible variations that use any() or && instead. Try to figure out the underlying requirement of each of the following versions (what is each function trying to do?):
test_any_direction <- function(x, y, z) {
  if (any(x < y & y < z)) 1
  else if (any(x > y & y > z)) -1
  else 0
}
test_all_direction2 <- function(x, y, z) {
  if (all(x < y) && all(y < z)) 1
  else if (all(x > y) && all(y > z)) -1
  else 0
}
test_any_direction2 <- function(x, y, z) {
  if (any(x < y) && any(y < z)) 1
  else if (any(x > y) && any(y > z)) -1
  else 0
}
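One way to probe these variants is to pick inputs on which they disagree. In the sketch below, x < y holds only at position 1 and y < z holds only at position 2, so no single position is increasing, yet each comparison holds somewhere. The two definitions are copied from the text so the snippet runs on its own.

```r
test_any_direction <- function(x, y, z) {
  if (any(x < y & y < z)) 1
  else if (any(x > y & y > z)) -1
  else 0
}
test_any_direction2 <- function(x, y, z) {
  if (any(x < y) && any(y < z)) 1
  else if (any(x > y) && any(y > z)) -1
  else 0
}

x <- c(1, 2); y <- c(2, 1); z <- c(1, 3)
test_any_direction(x, y, z)   # no single position is monotonic
## [1] 0
test_any_direction2(x, y, z)  # each comparison is TRUE at some position
## [1] 1
```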
The logical operations we introduced earlier usually return a logical vector that indicates whether a certain condition is TRUE or FALSE. It is also useful to know which elements satisfy those conditions. The which() function returns the positions (or indices) of TRUE elements in a logical vector:
x
## [1] -2 -3  2  3  1  0  0  1  2
abs(x) >= 1.5
## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
which(abs(x) >= 1.5)
## [1] 1 2 3 4 9
Looking more closely, abs(x) >= 1.5 is first evaluated to a logical vector, and then which() returns the positions of the TRUE elements in that vector.
The mechanism is quite similar when we use a logical condition to filter elements from a vector or list:
x[x >= 1.5]
## [1] 2 3 2
In the preceding example, x >= 1.5 is evaluated to a logical vector, which is then used to select the elements of x corresponding to TRUE values.
A special case is a logical vector with all FALSE values. Since such a vector contains only FALSE values, no element of x is selected, and a zero-length numeric vector is returned:
x[x >= 100]
## numeric(0)
Real-world data often contains missing values, represented by NA. The following numeric vector is a simple example:
x <- c(-2, -3, NA, 2, 3, 1, NA, 0, 1, NA, 2)
Arithmetic calculations with missing values also produce missing values:
x + 2
## [1]  0 -1 NA  4  5  3 NA  2  3 NA  4
To take this into account, a logical vector has to accept not only TRUE and FALSE values but also NA values to represent unknown truthfulness:
x > 2
##  [1] FALSE FALSE    NA FALSE  TRUE FALSE    NA FALSE FALSE
## [10]    NA FALSE
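The logical operators themselves follow the same three-valued logic: NA stands for an unknown value, so the result is NA only when that unknown value could change the outcome. A few checks:

```r
NA & FALSE   # FALSE whatever the unknown value is
## [1] FALSE
NA & TRUE    # depends on the unknown value
## [1] NA
NA | TRUE    # TRUE whatever the unknown value is
## [1] TRUE
NA | FALSE   # depends on the unknown value
## [1] NA
!NA
## [1] NA
```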
As a consequence, logical aggregation functions such as any() and all() have to deal with missing values too:
x
## [1] -2 -3 NA  2  3  1 NA  0  1 NA  2
any(x > 2)
## [1] TRUE
any(x < -2)
## [1] TRUE
any(x < -3)
## [1] NA
The preceding output demonstrates the default behavior of any() with a logical vector that contains missing values. More specifically, if any element of the input vector is TRUE, the function returns TRUE. If no element is TRUE but at least one missing value is present, the function returns NA. Otherwise, if the input vector contains only FALSE values, the function returns FALSE. To verify this logic, run the following code:
any(c(TRUE, FALSE, NA))
## [1] TRUE
any(c(FALSE, FALSE, NA))
## [1] NA
any(c(FALSE, FALSE))
## [1] FALSE
To ignore all missing values, specify na.rm = TRUE in the call:
any(x < -3, na.rm = TRUE)
## [1] FALSE
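Setting na.rm = TRUE amounts to dropping the missing values before aggregating. The following sketch checks that equivalence on the same x, removing the missing values by hand with is.na():

```r
x <- c(-2, -3, NA, 2, 3, 1, NA, 0, 1, NA, 2)
cond <- x < -3

# na.rm = TRUE ...
any(cond, na.rm = TRUE)
## [1] FALSE
# ... gives the same answer as removing NA values first
any(cond[!is.na(cond)])
## [1] FALSE
```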
A similar but somewhat opposite logic applies to all():
x
## [1] -2 -3 NA  2  3  1 NA  0  1 NA  2
all(x > -3)
## [1] FALSE
all(x >= -3)
## [1] NA
all(x < 4)
## [1] NA
If any element of the input vector is FALSE, the function returns FALSE. If no element is FALSE but at least one missing value is present, the function returns NA. Otherwise, if the input vector contains only TRUE values, it returns TRUE. To verify this logic, run the following code:
all(c(TRUE, FALSE, NA))
## [1] FALSE
all(c(TRUE, TRUE, NA))
## [1] NA
all(c(TRUE, TRUE))
## [1] TRUE
Similarly, na.rm = TRUE makes the function ignore all missing values:
all(x >= -3, na.rm = TRUE)
## [1] TRUE
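Since all() and any() may return NA, passing their result straight to if() can fail with the error "missing value where TRUE/FALSE needed". One defensive option, sketched below, is isTRUE(), which returns TRUE only for a single non-missing TRUE; the strings "all pass" and "not confirmed" are placeholders chosen for illustration.

```r
x <- c(-2, -3, NA, 2, 3, 1, NA, 0, 1, NA, 2)

all(x >= -3)           # NA: unsafe as an if() condition
## [1] NA
isTRUE(all(x >= -3))   # NA is treated as "not TRUE"
## [1] FALSE

if (isTRUE(all(x >= -3))) "all pass" else "not confirmed"
## [1] "not confirmed"
```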
Apart from logical aggregation functions, data filtering also behaves differently when missing values are involved. For example, the following code keeps a missing value wherever the logical vector produced by x >= 0 is NA:
x
## [1] -2 -3 NA  2  3  1 NA  0  1 NA  2
x[x >= 0]
## [1] NA  2  3  1 NA  0  1 NA  2
By contrast, which() does not preserve the missing values present in the input logical vector:
which(x >= 0)
## [1]  4  5  6  8  9 11
Therefore, subsetting x with these indices yields a vector without missing values:
x[which(x >= 0)]
## [1] 2 3 1 0 1 2
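Another way to get the same NA-free result without which() is to exclude missing values explicitly in the condition. Because NA & FALSE evaluates to FALSE, the combined condition below contains no NA:

```r
x <- c(-2, -3, NA, 2, 3, 1, NA, 0, 1, NA, 2)

# !is.na(x) is FALSE at missing positions, which forces the whole & to FALSE
x[!is.na(x) & x >= 0]
## [1] 2 3 1 0 1 2
```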
Some functions that are supposed to take logical input also accept non-logical vectors, such as numeric vectors. In such cases, the function usually behaves as it would with a logical vector, because the non-logical input is first coerced to logical values.
For example, if we put a numeric vector in the if condition, it will be coerced:
if (2) 3
## [1] 3
if (0) 0 else 1
## [1] 1
In R, all non-zero values in a numeric or integer vector are coerced to TRUE, zero values are coerced to FALSE, and most strings cannot be coerced to logical values (apart from a few, such as "TRUE" and "FALSE"):
if ("a") 1 else 2
## Error in if ("a") 1 else 2: argument is not interpretable as logical
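The same coercion rules can be inspected directly with as.logical(). As documented for that function, the strings "T", "TRUE", "true", and "True" map to TRUE, their "F"/"FALSE" counterparts map to FALSE, and any other string becomes NA rather than raising an error:

```r
# Numbers: non-zero becomes TRUE, zero becomes FALSE
as.logical(c(2, 0, -1.5))
# TRUE FALSE TRUE

# Strings: only a few spellings are recognized; "a" becomes NA
as.logical(c("TRUE", "false", "T", "a"))
# TRUE FALSE TRUE NA
```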