Understanding lexical scoping

In the previous section, we introduced the copy-on-modify mechanism. The examples demonstrated two cases in which this mechanism happens. When an object has multiple names or is passed as an argument to a function, modifying it will cause the object to be copied, and it is the copied version that is actually modified.

To modify an object outside a function, we introduced <<-, which first looks for the variable outside the function and modifies that object instead of creating a local copy. This leads to an important idea: a function has an inside and an outside. Inside a function, we can refer to variables and functions defined outside it.

For example, the following function uses two outside variables:

start_num <- 1
end_num <- 10
fun1 <- function(x) {
  c(start_num, x, end_num)
} 

We first create two variables and define a function called fun1. The function simply puts together start_num, argument x, and end_num into a new vector. It is clear that start_num and end_num are not defined in the function but outside it while x is the argument of the function. Let's see if it works:

fun1(c(4, 5, 6))
## [1]  1  4  5  6 10 

The function works by successfully getting the values of the two variables outside the function. You may guess that the values are captured when we define the function, so that start_num and end_num in fun1 simply take the values from outside. In fact, two experiments show that this guess is wrong.

The first experiment is simple. Let's remove the two variables:

rm(start_num, end_num)
fun1(c(4, 5, 6))
## Error in fun1(c(4, 5, 6)): object 'start_num' not found 

Now the function no longer works. If the values of the two variables had been captured when the function was defined, removing them would not break the function.

The second experiment is the other way around. We will first remove the function as well as the two variables, and then define the function again:

rm(fun1, start_num, end_num)
## Warning in rm(fun1, start_num, end_num): object 'start_num'
## not found
## Warning in rm(fun1, start_num, end_num): object 'end_num'
## not found
fun1 <- function(x) {
  c(start_num, x, end_num)
}

If creating the function required capturing the two variables, the preceding code would result in an error saying that start_num and end_num are missing. Clearly, there's no error, and the function is successfully created. Let's call it now:

fun1(c(4, 5, 6))
## Error in fun1(c(4, 5, 6)): object 'start_num' not found 

The function does not work because the two variables are not found. We will then define the two variables and again call the function with the same argument:

start_num <- 1
end_num <- 10
fun1(c(4, 5, 6))
## [1]  1  4  5  6 10 

The function works again. This leads to the conclusion that the function looks for the variables when it is called, not when it is defined. During the execution of the function, when a symbol is encountered, R first looks for it inside the function. More specifically, if the symbol is passed in as an argument or created inside the function, the symbol is resolved there and its value is used.
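The following minimal sketch illustrates the same point in another way: if an outside variable changes between two calls, the function sees the new value each time. The names offset and add_offset are hypothetical and are used only for this illustration:

```r
# offset is defined outside the function (hypothetical name)
offset <- 1
add_offset <- function(x) {
  x + offset  # offset is looked up each time the function is called
}

first <- add_offset(10)   # uses offset = 1, so first is 11
offset <- 100
second <- add_offset(10)  # uses offset = 100, so second is 110
print(c(first, second))
```

If the value of offset had been captured at definition time, both calls would return the same result.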

Suppose we create a variable p first and then define a function fun2 in which another p variable is created and used in the value to return:

p <- 0
fun2 <- function(x) {
  p <- 1
  x + p
} 

When we call the function, which p will fun2 use in x + p? Let's find out:

fun2(1)
## [1] 2 

The output makes it clear that x + p uses p defined inside the function. The flow is simple. First, p <- 1 creates a new variable p with value 1 instead of changing p outside the function. Then, x + p is evaluated, with x being resolved as the passed-in argument and p as the local variable just defined. The rule is that only if a variable is not present inside the function will it be searched for outside.
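As a quick check, the sketch below repeats the example and then inspects p outside the function after the call. It uses only the names from the example above, plus a hypothetical variable named result to hold the return value:

```r
p <- 0
fun2 <- function(x) {
  p <- 1  # creates a local p; the outside p is not touched
  x + p
}

result <- fun2(1)  # result is 2, computed with the local p
print(result)
print(p)           # the outside p is still 0
```

The local assignment shadows the outside variable only for the duration of the call.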

However, what exactly does "outside" mean? The question is subtler than it appears to be. Suppose we create the following two functions:

f1 <- function(x) {
  x + p
}
g1 <- function(x) {
  p <- 1
  f1(x)
} 

The first function f1 simply adds two variables: x is an argument and p is a variable yet to be found outside. The second function g1 defines a p variable inside and calls f1. The question is, "Will f1 find p inside g1 when g1 is called?"

g1(0)
## [1] 0 

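The result is 0, not 1: f1 does not find p inside g1 even though f1 is called in g1. Instead, it uses the p we defined earlier outside both functions, whose value is still 0. If we set that p to 1 and call g1 again, the result changes: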

p <- 1
g1(0)
## [1] 1 

The reason is that when f1 is called and p cannot be found inside f1, R searches where f1 is defined instead of where it is called. This mechanism is called lexical scoping. In the preceding code, we defined p in the same scope where f1 is defined, so f1 finds that p when it is called inside g1.
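To see the difference between "where it is defined" and "where it is called" without relying on global variables, consider the following sketch. All names here (make_reader, caller, label) are hypothetical and chosen only for illustration:

```r
make_reader <- function() {
  label <- "defined in make_reader"
  # The returned function is defined here, so it looks for label here
  function() label
}

caller <- function(f) {
  label <- "defined in caller"  # not visible to f, which was defined elsewhere
  f()
}

reader <- make_reader()
found <- caller(reader)
print(found)  # "defined in make_reader"
```

Even though the function is called inside caller, which has its own label, it resolves label in the scope where it was created.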

The same scoping rule also applies to how <<- finds variables. For example, the following code defines a variable m and two functions, f2 and g2, in the same scope. In f2, m is set to 2 with <<-. However, in g2, a local m variable is defined and then f2 is called:

m <- 1
f2 <- function(x) {
  m <<- 2
  x
}
g2 <- function(x) {
  m <- 1
  f2(x)
  cat(sprintf("[g2] m: %d\n", m))
} 

Right after f2 is called, the value of m in g2 is printed. Let's call g2 and see what happens:

g2(1)
## [g2] m: 1 

The printed text shows that the value of m in g2 remains unchanged, but the value of m outside f2 and g2 is changed as can be verified:

m
## [1] 2 

The preceding experiment confirms that m <<- 2 follows the rule of lexical scoping.
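One common use of this behavior, shown here only as a hedged sketch, is a function that keeps a running count in the scope where it was defined. The names make_counter, count, and counter are hypothetical:

```r
make_counter <- function() {
  count <- 0
  function() {
    # <<- finds count in the scope where this function was defined
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first <- counter()   # 1
second <- counter()  # 2
print(c(first, second))
```

Each call modifies the count that lives in the defining scope rather than creating a new local variable, which is why the value grows across calls.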

The following two examples look even more complex. The functions are nested. In f, we not only create local variables such as p and q but also a local function f2, in which another local p variable is defined:

f <- function(x) {
  p <- 1
  q <- 2
  cat(sprintf("1. [f1] p: %d, q: %d\n", p, q))
  f2 <- function(x) {
    p <- 3
    cat(sprintf("2. [f2] p: %d, q: %d\n", p, q))
    c(x = x, p = p, q = q)
  }
  cat(sprintf("3. [f1] p: %d, q: %d\n", p, q))
  f2(x)
} 

If you understand lexical scoping, you should be able to predict the result given an arbitrary input x. We add some cat() calls to make it easier to track the values of variables at each scope level. Each cat() message includes a sequence number, the function scope, and the values of p and q. Now, we will run f(0); try to predict the result first:

f(0)
## 1. [f1] p: 1, q: 2
## 3. [f1] p: 1, q: 2
## 2. [f2] p: 3, q: 2
## x p q
## 0 3 2 

The execution order of the three cat() calls is 1, 3, and 2, and the values of p and q in each scope are consistent with lexical scoping rules. In the following example, we will also use <<-:

g <- function(x) {
  p <- 1
  q <- 2
  cat(sprintf("1. [f1] p: %d, q: %d\n", p, q))
  g2 <- function(x) {
    p <<- 3
    p <- 2
    cat(sprintf("2. [f2] p: %d, q: %d\n", p, q))
    c(x = x, p = p, q = q)
  }
  cat(sprintf("3. [f1] p: %d, q: %d\n", p, q))
  result <- g2(x)
  cat(sprintf("4. [f1] p: %d, q: %d\n", p, q))
  result
} 

You may analyze the flow of the function by predicting the order of execution and the values of the printed variables:

g(0)
## 1. [f1] p: 1, q: 2
## 3. [f1] p: 1, q: 2
## 2. [f2] p: 2, q: 2
## 4. [f1] p: 3, q: 2
## x p q
## 0 2 2 

If you do not succeed in predicting the behavior of the preceding function, go through the examples in this section more carefully.
