Hour 22. Formal Class Systems


What You’ll Learn in This Hour:

S4 classes

Reference classes

R6 classes

Other available class systems


In Hour 21, “Writing R Classes,” you were introduced to the concept of classes, and we walked through the basic features of an S3 class in R. The S3 system provides a soft introduction to classes, allowing much of the flexibility that we have become accustomed to with R. In order to provide this flexibility, however, some of the main benefits of a more formal class system have been sacrificed. When developing S3 classes, we still need to be very careful to check that the input values are handled appropriately. Further, inheritance is not formally defined and we must be careful to write functions that allow for it.

During this hour, we look closely at two alternative class systems available in R: the S4 system and Reference Classes. Along the way, you will be introduced to new concepts such as validity checking, multiple dispatch, message-passing object orientation, and mutable objects.

S4

The S4 system was introduced in S version 4. Like S3, the S4 system is a form of generic function object-oriented programming. However, the system is much more formal and requires that we define the class structure before instantiating objects. This makes it easier to write methods because it is not possible to pass an object with the wrong structure to an S4 method.

The S4 system also benefits from a more formal form of inheritance that is specified when we define a class. When we extend an S4 class, all of the type and structure checking from the parent class is passed on to the child, thus reducing the need for duplicate code. Finally, S4 supports something called multiple dispatch, meaning that generic functions can operate based on multiple inputs.

Instances of S4 structures are rare in the base and recommended R packages, though the structure is used in several of the additional packages available on CRAN and throughout the Bioconductor package repository. There is a tendency for S4 package names to end in 4, particularly where they implement something that has already been implemented in an S3 structure. This is not strictly adhered to, however.

Working with S4 Classes

It is slightly easier to find information about S4 classes and methods than it is with S3. To start with, we can find out if any object is an S4 object using the function isS4, to which we pass any R object. The isS4 function simply returns TRUE if an object is an S4 object and FALSE otherwise. Once we know that we have an S4 object and have ascertained the class (using the class function), we can call upon a number of other useful functions to find out more information about the class. Table 22.1 lists three functions that can be used to find out more information about a class. The table also describes their usage, with an example of usage for the merMod class contained within the lme4 modeling package.

TABLE 22.1 Querying S4 Classes

If we are working with a new package, we can find out what classes it contains using the getClasses function; for example, getClasses("package:lme4"). The same function can also be used to list all classes currently defined within an R session. Similarly, the getGenerics function can be used to list all available generics within a package or an R session generally.

A list of all the methods available for a generic function may be obtained via the showMethods function. Here’s an example:

> showMethods("tail")
Function: tail (package utils)
x="ANY"
x="Matrix"
x="sparseVector"

The methods function you saw in Hour 21 also works with S4 classes.
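The query functions mentioned in this section can be tried on a small class of our own. The following is a minimal sketch only: the toyPoint class is a hypothetical example invented for illustration (class definition with setClass is covered in the next section), and the comments describe what we would expect to see.

```r
library(methods)

# Hypothetical class used only for this illustration
setClass("toyPoint", slots = c(x = "numeric", y = "numeric"))
p <- new("toyPoint", x = 1, y = 2)

isS4(p)                            # TRUE: p is an S4 object
isS4(1:3)                          # FALSE: an ordinary integer vector
class(p)                           # the class name, "toyPoint"
slotNames("toyPoint")              # the names of the slots
getSlots("toyPoint")               # slot names together with their classes
isVirtualClass("toyPoint")         # FALSE: objects can be instantiated
existsMethod("show", "toyPoint")   # FALSE: no class-specific show method yet
```

Each of these functions lives in the methods package (isS4 is in base), so nothing beyond a standard R installation is assumed.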


Tip: Help with S4

Constructors and generics are named R functions, and we can find help in the standard way, either via the RStudio GUI or by typing ?functionName. Unlike S3 classes, S4 classes are formally defined and can therefore be documented. We use a special syntax of the form class?className in order to find out more about the class.


Defining an S4 Class

In the previous section, we stated that an S4 class must first be defined before we can instantiate objects. In other words, we cannot simply take an object and assign it a new class as we could with S3. This means that S4 classes can take longer to construct; however, the more formal definition provides us with benefits, such as the following:

Type-checking

Validity

Type-checking and validity ensure that when we define a class, objects within that class adhere to a particular structure and type. Unlike with S3, we can therefore assume that the structure is correct when we write methods for our class. This saves us from having to write additional error-handling steps within the methods and avoids duplication of code, thereby improving the maintainability of our code.

Setting the Class

To formally define an S4 class, we use the setClass function. The setClass function lives in the methods package, which is loaded by default when we start R interactively. Structurally, you can think of an S4 class as being a bit like an R list, where each element of the list is an R object with its own type and structure. In S4 terminology, we refer to these elements as “slots.” The formal structure of an S4 class requires that we define the required structure for each slot—for example, integer, numeric, character, matrix, and so on. The two primary arguments to the setClass function are therefore the name of the class and a slots argument that defines the structure of the class. The slots argument expects either a list or a named character vector, where the names represent the names of the slots and the data represents the object type.


Caution: Loading the methods Package

When we start R in interactive mode, the methods package is loaded by default. R can also be executed in batch mode via Rscript; in versions of R before 3.5.0, Rscript did not load the methods package by default, so scripts that must run on older installations should load it explicitly. When integrating an S4 structure into your own package, you should add a dependency on the methods package.
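As a small sketch of the defensive approach, a script intended for Rscript can attach the methods package itself before any S4 code runs. The scriptDemo class here is a hypothetical name used only for illustration.

```r
# Top of a script intended to be run with Rscript.
# Attaching methods explicitly is harmless if it is already attached.
library(methods)

# Hypothetical class, defined only to show that S4 tools are available
setClass("scriptDemo", slots = c(id = "integer"))
obj <- new("scriptDemo", id = 1L)
isS4(obj)   # TRUE

# In a package, the equivalent step is to list methods in the
# Imports (or Depends) field of the DESCRIPTION file.
```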


Let’s start by looking back at the modInt structure that we defined in Hour 21. We take the basic concept of the structure and define it instead as an S4 class named modInt4. For any object in our class, we must store two important pieces of information: the base number and its modulus. Each of these is integer, so we specify their structure using the integer class. Note that although modular arithmetic only works with integer values, we don’t actually need to store the data as integer, because numeric would suffice. However, we later use the data type to illustrate the impact of this formal definition.

> setClass("modInt4", slots=c(x = "integer", modulus = "integer"))


Caution: Change in Definition

Historically, S4 slots were defined via a representation argument within the setClass function. A representation function was then used to define both the slot structure and any inheritance. Although this functionality is now deprecated, representation is still the second argument to setClass for compatibility reasons. The S3methods, access, and version arguments are similarly deprecated. Further information is provided within the setClass help file.


We also use the setClass function to define inheritance, which we’ll return to later in the hour.

Creating a New S4 Instance

Once we have formally defined a class, we can begin to create objects of that class. As with S3, it is good practice to do so via a class constructor function, though again it is not necessary. To generate an S4 object, the constructor function must include a call to the new function. The new function does the hard work of creating a prototype object from the class definition and populating the slots with any inputs we provide. The call to new ensures that our class has the required slots and that the information contained within each slot is of the correct type.

The first argument to new is the Class argument. This tells R what class is to be instantiated. Values for the slots are passed as named arguments via the ellipsis (...). In the following example we create a constructor function for the modInt4 class that we previously defined. The final line contains the required call to the new function.

> modInt4 <- function(x, modulus){
+   # Divide by the modulus to get new number appropriate for that modulus
+   x <- x %% modulus
+   # Create a new instance
+   new("modInt4", x = x, modulus = modulus)
+ }

Having defined the constructor, we are now ready to create objects of our class. The following examples demonstrate the behavior of the type checking. In the first example, we pass the non-integer pi value and the integer 12L. We use L to ensure that the value is stored as integer as opposed to numeric.

Because pi is non-integer, the object cannot be created, and we receive an appropriate error message. In the second example, we pass two integer values that are actually stored as numeric in R. Again, the object cannot be created because both x and modulus must be of integer type. In the final example, we pass 4L and 12L. Both are integers, and our object is successfully created. Note that by default the name of the class is printed along with each of the slots.

> # Try to create some objects of our class
> modInt4(pi, 12L)
Error in validObject(.Object) :
  invalid class "modInt4" object: invalid object for slot "x" in class "modInt4":
  got class "numeric", should be or extend class "integer"

> modInt4(4, 12)
Error in validObject(.Object) :
  invalid class "modInt4" object: 1: invalid object for slot "x" in class "modInt4":
  got class "numeric", should be or extend class "integer"
invalid class "modInt4" object: 2: invalid object for slot "modulus" in class "modInt4":
  got class "numeric", should be or extend class "integer"

> modInt4(4L, 12L)
An object of class "modInt4"
Slot "x":
[1] 4

Slot "modulus":
[1] 12

Here we match the name of the constructor function to the name of the class as well as the names of the arguments to the names of the class slots. This is a very simple example of a class, and it makes sense to do so. However, the constructor function can take any arguments so long as the arguments that are eventually passed to new match those we defined using setClass. A good example of this is the lmer function in lme4, which takes arguments such as formula and data, fits a linear mixed-effects model, and generates an object of class merMod, which contains slots such as theta and beta.
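To illustrate a constructor whose arguments differ from the slot names, here is a minimal sketch. The class name modIntSketch, the constructor newModInt, and its arguments value and base are all hypothetical names chosen for this example; the constructor accepts ordinary numeric input and coerces it before calling new.

```r
library(methods)

# Hypothetical class with the same slot structure as modInt4
setClass("modIntSketch", slots = c(x = "integer", modulus = "integer"))

# Hypothetical constructor: argument names differ from the slot names
newModInt <- function(value, base = 12) {
  # Reject inputs that cannot represent whole numbers
  if (any(value != round(value)) || any(base != round(base))) {
    stop("value and base must be whole numbers")
  }
  # Coerce to integer so that the slot type check passes
  value <- as.integer(value)
  base  <- as.integer(base)
  new("modIntSketch", x = value %% base, modulus = base)
}

m <- newModInt(17)   # numeric input is accepted and coerced
m@x                  # 5
m@modulus            # 12
```

Only the names passed to new need to match the slots; everything before that call is ordinary R code.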

Validity

As you have seen, the slot structure of an S4 class provides a handy mechanism for checking that the information provided is of the correct type. Occasionally we may need to provide some additional checks to ensure that an object conforms to expectations. Consider the data frame definition that we provided in Hour 21. A data frame consists of a list of vectors, but these vectors must also be of consistent length. In the S4 framework, we can provide such a check using a validity function.

A validity function is simply a function that contains all the checks we require in order to ensure that an object is of the correct structure. There are no naming restrictions on validity functions; however, it is standard practice to include the name of the class within the name. The “lowerCamelCase” convention is most commonly used, and periods should be avoided because they can falsely imply an S3 structure.

We now define a validity function for our modInt4 class. The check ensures that the base number is non-negative, that the modulus is positive, and that the base number is less than the modulus. A validity function should return TRUE if the object is considered valid; otherwise it should return a character vector describing the problems, which R then reports in the error message. The validity function should expect an S4 object as its only argument. It is good practice to name the argument object.

validModInt4Object <- function(object) {
  # Note that the class definition already ensures that x and modulus are integer
  # Collect a message for each check that fails
  problems <- character()
  if (!isTRUE(all(object@x >= 0))) {
    problems <- c(problems, "x must be non-negative")
  }
  if (!isTRUE(all(object@modulus > 0))) {
    problems <- c(problems, "modulus must be positive")
  }
  if (!isTRUE(all(object@x < object@modulus))) {
    problems <- c(problems, "x must be less than modulus")
  }
  # Return TRUE if valid, otherwise the messages describing the problems
  if (length(problems) == 0) TRUE else problems
}

Once we have defined the check, we need to link it to our class. We do so via the setValidity function. The setValidity function expects two main arguments:

Class: The name of the class as a character string

method: The validity function itself

We can now link the validModInt4Object validity function to our modInt4 class, like so:

> setValidity("modInt4", validModInt4Object)
Class "modInt4" [in ".GlobalEnv"]

Slots:

Name:        x modulus
Class: integer integer


Note: Defining Validity with setClass

In addition to setValidity, we can use the validity argument to the setClass function to link the function to the class that it checks.
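A minimal sketch of the validity argument follows. The class name modIntChecked is hypothetical, chosen to avoid clashing with modInt4, and the particular checks and messages are our own illustration rather than a required form.

```r
library(methods)

# Hypothetical class; the validity function is supplied directly to setClass
setClass("modIntChecked",
  slots = c(x = "integer", modulus = "integer"),
  validity = function(object) {
    # Guard against empty, vector, or missing values first
    values <- c(object@x, object@modulus)
    if (length(object@x) != 1L || length(object@modulus) != 1L || anyNA(values)) {
      return("x and modulus must each be a single non-missing value")
    }
    problems <- character()
    if (object@modulus <= 0L) {
      problems <- c(problems, "modulus must be positive")
    }
    if (object@x < 0L || object@x >= object@modulus) {
      problems <- c(problems, "x must lie between 0 and modulus - 1")
    }
    # TRUE when valid, otherwise the collected messages
    if (length(problems) == 0L) TRUE else problems
  })

ok <- new("modIntChecked", x = 3L, modulus = 12L)

# An invalid object raises an error; here we capture its message
bad <- tryCatch(new("modIntChecked", x = 12L, modulus = 12L),
                error = function(e) conditionMessage(e))
bad
```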


Methods

As with S3, the S4 framework implements generic function object orientation. In order to define a method for our class, we must first define a generic. We then link the method back to the generic and our class using the setMethod function. Let’s look first at the setMethod function. Table 22.2 lists the three required arguments to setMethod, along with a description of how they are used.

TABLE 22.2 The setMethod Function

As with S3, a number of generic functions are available “out of the box.” In particular, S4 objects have a default show method, equivalent to print in S3. We can define a new show method to control how an object prints to screen. In the following example, we define a new show method for the modInt4 class and then use the setMethod function to link the method to the class and generic function:

> showModInt4 <- function(object){
+   # Extract the relevant components from the object
+   theValue <- object@x
+   theModulus <- object@modulus
+   # Print the object in the desired form
+   cat(theValue, " (mod ", theModulus, ")\n", sep = "")
+ }
>
> # Link the previous function to the show generic and modInt4 class
> setMethod("show", signature = "modInt4", showModInt4)
[1] "show"
>
> # Display an object
> modInt4(3L, 12L)
3 (mod 12)

The more formal S4 framework and validity checking ensures that any object of modInt4 class is of the correct structure and that any slots are of the correct type. The show method requires no additional checking. It is very clear and straightforward to follow.


Caution: Editing Methods

Methods must be linked to a generic and class via setMethod. If we redefine a method, we must then call setMethod again to relink the method to the generic and class.


Defining New Generics

In the previous example, we defined a new method for an existing generic, show. As with S3 classes, it is also possible to define new generics. We do so via the setGeneric function, which has two main arguments, as described in Table 22.3.

TABLE 22.3 Main Arguments to setGeneric

In the following example, we first define a function called square4, an S4 equivalent of the square function we defined in Hour 21. We then turn the function into a generic with setGeneric.

> square4 <- function(x){
+   x^2
+ }
> setGeneric("square4")
[1] "square4"

Once the generic has been created, we can define new methods, which we link to classes via the setMethod function:

> squareModInt4 <- function(x) {
+    # Standard square
+    simpleSquare <- as.integer(x@x^2)   # Ensure value is valid
+    # Use correct modulus
+    modInt4(simpleSquare, x@modulus)
+ }
>
> # Link the modInt4 method to the square4 generic and modInt4 class
> setMethod("square4", signature = "modInt4", squareModInt4)
[1] "square4"
>
> # Test the method
> a <- modInt4(5L, 12L)
> a
5 (mod 12)
> square4(a)
1 (mod 12)

It is important to ensure that argument names match between the methods and the generic. Mismatched names are bad practice, and R also throws a warning to tell you that it has changed the argument name in the method to match the generic.
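A generic can also be written out explicitly with standardGeneric, which makes the argument names that methods must match easy to see. The following is a sketch under our own assumptions: cube4 and modIntSketch are hypothetical names used only for illustration.

```r
library(methods)

# Hypothetical class with the same slot structure as modInt4
setClass("modIntSketch", slots = c(x = "integer", modulus = "integer"))

# Hypothetical generic, defined explicitly; its single argument is named x
setGeneric("cube4", function(x) standardGeneric("cube4"))

# The method uses the same argument name, x, as the generic
setMethod("cube4", signature = "modIntSketch", function(x) {
  cubed <- as.integer(x@x^3 %% x@modulus)
  new("modIntSketch", x = cubed, modulus = x@modulus)
})

b <- cube4(new("modIntSketch", x = 5L, modulus = 12L))
b@x   # 5, because 125 %% 12 is 5
```

Because this generic is defined without a default method, calling cube4 on an object of any other class results in an error rather than falling back to ordinary arithmetic.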

Multiple Dispatch

In the following example, we create a new generic, add, and define what happens when we add two objects of class modInt4. This is an example of multiple dispatch, whereby a generic function can dispatch (pick a method) based on multiple arguments. Note that although we provide two objects of the same class, the multiple dispatch mechanism could equally be used to define what happens when we add objects of different classes. As in the previous example, we start by defining a function, add, and then turn it into a generic with setGeneric.

> add <- function(a, b){
+   a + b
+ }
> setGeneric("add")
[1] "add"

The add function we defined acts as the default method for the generic. Next, we define a method for our modInt4 object. Because the add function requires two objects, we must be careful to define an appropriate signature to ensure that the generic dispatches correctly.

> # Define a function that adds modInt4 objects
> addModInt4Objects <-  function(a, b){
+   # Sometimes we still need to define checks within the method
+   if(a@modulus != b@modulus){
+     stop("Cannot add numbers of differing modulus")
+   }
+   # Add the numbers together
+   totalNumber <- a@x + b@x
+   # Return the correct class
+   theResult <- modInt4(totalNumber, a@modulus)
+   theResult
+ }
>
> # Link the previous function to the add generic and modInt4 class
> setMethod("add", signature = c(a = "modInt4", b = "modInt4"),
+           addModInt4Objects)
[1] "add"
>
> # Test the function
> p <- modInt4(3L, 12L)
> q <- modInt4(7L, 12L)
> add(p, q)
10 (mod 12)
> add(q, q)
2 (mod 12)
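Multiple dispatch becomes more interesting when the two arguments have different classes. The sketch below is an illustration under our own assumptions: addSketch and modIntSketch are hypothetical names, and the second method shows one possible rule for adding a plain integer to a modular number.

```r
library(methods)

# Hypothetical class and generic, named to avoid clashing with modInt4 and add
setClass("modIntSketch", slots = c(x = "integer", modulus = "integer"))
setGeneric("addSketch", function(a, b) standardGeneric("addSketch"))

# Method chosen when both arguments are modIntSketch objects
setMethod("addSketch", signature(a = "modIntSketch", b = "modIntSketch"),
  function(a, b) {
    if (a@modulus != b@modulus) {
      stop("Cannot add numbers of differing modulus")
    }
    new("modIntSketch", x = (a@x + b@x) %% a@modulus, modulus = a@modulus)
  })

# Method chosen when the second argument is a plain integer
setMethod("addSketch", signature(a = "modIntSketch", b = "integer"),
  function(a, b) {
    new("modIntSketch", x = (a@x + b) %% a@modulus, modulus = a@modulus)
  })

p <- new("modIntSketch", x = 7L, modulus = 12L)
addSketch(p, p)@x     # 2, because 14 %% 12 is 2
addSketch(p, 10L)@x   # 5, because 17 %% 12 is 5
```

The generic inspects the classes of both a and b before choosing a method, so a combination with no matching method, such as a character second argument, produces an error.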

Inheritance

You were introduced to the idea of inheritance in the previous hour. It is possible for S3 objects to inherit from one another, but as with much of S3 it is not formally defined. Inheritance is much better defined for S4 classes. We specify the inheritance when defining the class with setClass using the contains argument. Though the argument name may seem counterintuitive, we use contains to specify superclasses—in other words, classes that our class inherits from.

Consider the example of the 12-hour clock and the clockTime class we discussed in Hour 21. We define an S4 equivalent that inherits from modInt4 as follows:

> setClass("clockTime4", contains = "modInt4")

At this point, our class is exactly the same as the modInt4 class and contains slots x and modulus. It has also inherited all of the methods from the modInt4 class without us having to think about inheritance when defining the modInt4 methods.

> getSlots("clockTime4")
        x   modulus
"integer" "integer"
>
> methods(class = "clockTime4")
[1] add  show
see '?methods' for accessing help and source code

In Listing 22.1 we walk through a complete example, defining the class as we did earlier and then walking through some of the possible follow-on actions. In particular, we define a constructor function (lines 5 through 10) and a validity function (lines 14 through 17) to ensure that the modulus is equal to 12. We also define the print (show) method (lines 31 through 36). If we felt the need, we could define any additional methods specific to our clockTime4 class.

LISTING 22.1 Building a clockTime4 Class


 1: > # Define the class
 2: > setClass("clockTime4", contains = "modInt4")
 3: >
 4: > # Define constructor
 5: > clockTime4 <- function(x){
 6: +   # Ensure that x is in mod 12
 7: +   x <- x %% 12L
 8: +   # Create a new instance
 9: +   new("clockTime4", x = x, modulus = 12L)
10: + }
11: >
12: > # Define validity
13: > # Existing modInt4 validity is inherited
14: > validclockTime4Object <- function(object) {
15: +   # Return TRUE, or a message describing the problem
16: +   if (isTRUE(all(object@modulus == 12L))) TRUE else "modulus must be 12"
17: + }
18: >
19: > # Link the validity function with the clockTime4 class
20: > setValidity("clockTime4", validclockTime4Object)
21: Class "clockTime4" [in ".GlobalEnv"]
22:
23: Slots:
24:
25: Name:        x modulus
26: Class: integer integer
27:
28: Extends: "modInt4"
29: >
30: > # Redefine show method
31: > showclockTime4 <- function(object){
32: +   # Print the object in the desired form
33: +   cat(object@x, ":00 ", sep = "")
34: + }
35: > setMethod("show", signature = "clockTime4", showclockTime4)
36: [1] "show"
37: >
38: > # Test the class
39: > clockTime4(5L)
40: 5:00
41: > clockTime4(13L)
42: 1:00


Listing 22.1 highlights another property of S4 inheritance, which is that validity is also inherited. This significantly cuts down on the amount of checking we have to do.
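The inheritance of validity can be seen in isolation with two small classes. This is a sketch only: parentSketch and childSketch are hypothetical names, and each class carries a single deliberately simple check.

```r
library(methods)

# Hypothetical parent class with its own validity rule
setClass("parentSketch",
  slots = c(x = "integer", modulus = "integer"),
  validity = function(object) {
    if (all(object@x >= 0L)) TRUE else "x must be non-negative"
  })

# Hypothetical child class that adds a second rule
setClass("childSketch", contains = "parentSketch",
  validity = function(object) {
    if (all(object@modulus == 12L)) TRUE else "modulus must be 12"
  })

# Breaks only the parent's rule, yet creating the child still fails
msgParent <- tryCatch(new("childSketch", x = -1L, modulus = 12L),
                      error = function(e) conditionMessage(e))

# Breaks only the child's rule
msgChild <- tryCatch(new("childSketch", x = 1L, modulus = 6L),
                     error = function(e) conditionMessage(e))

msgParent
msgChild
```

The child's validity function never mentions x, but the parent's check is still applied when a childSketch object is created.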

Documenting S4

The formal declaration of an S4 class requires some additional effort when it comes to documenting the class with roxygen2. The call to setClass should be documented with a standard title and description of the class. Each slot should be documented using the @slot tag.

#' An S4 Class that implements modular arithmetic
#'
#' @slot x An integer value in the specified \code{modulus}
#' @slot modulus An integer value representing the modulus for \code{x}
setClass("modInt4", slots=c(x = "integer", modulus = "integer"))

We must document S4 methods, but we have a choice as to whether to document in the class, in the generic, or separately within its own specific help file. Generally the decision as to where to document the method depends on how complicated the method is and how the method is to be used. Clearly we can only document the method via the generic if we created the generic ourselves, however.

We can control where the method is documented using either the @describeIn tag or @rdname tag. For example, to document the addModInt4Objects function within the help file for the add generic, we first create an roxygen2 header for the generic add function and separately add a single roxygen2 header line above the function definition for addModInt4Objects that contains a @describeIn tag.

#' @describeIn add Adds two modInt4 objects of the same modulus
addModInt4Objects <-  function(a, b){
  # Sometimes we still need to define checks within the method
  if(a@modulus != b@modulus){
    stop("Cannot add numbers of differing modulus")
  }
  # Add the numbers together
  totalNumber <- a@x + b@x
  # Return the correct class
  theResult <- modInt4(totalNumber, a@modulus)
  theResult
}

Reference Classes

Reference Classes were developed by John Chambers and have been available in the methods package since R version 2.12. Because they were the first new class implementation in R and because they followed S3 and S4, they are often referred to as “R5” classes. However, whereas S3 and S4 are named after the versions of the S language in which they appeared, the number 5 does not correspond to any version and is essentially meaningless.

Reference Classes are quite different from S3 and S4 and implement a much more common form of object-oriented programming known as message-passing object orientation. In message-passing object orientation, methods belong to the class and generic functions are not required. Message-passing object orientation is also used in Python, C++, and Java.

Creating a New Reference Class

Much like S4, we begin by defining the class. We do so via the function setRefClass. In terms of usage, the main difference between setClass and setRefClass is that with setRefClass we use the term “fields” instead of “slots.” The similarity extends to inheritance, for which we use the contains argument.

One important difference with Reference Classes is that we save the output of the setRefClass function as an object. The object should have the same name as the class as defined by the first argument to setRefClass. We’ll walk through Reference Classes using a variant on the modular arithmetic example that we used for S3 and S4 classes. However, message-passing object orientation is very different from generic function object orientation, and in practice message-passing object orientation is typically used to solve a different kind of problem. In particular, message-passing tends to be better suited to software development.

> modIntRef <- setRefClass("modIntRef",
+                          fields=c(x = "integer", modulus = "integer"))

This is the first time we have created a class as an object. Like with any R object, we can type its name to see what it looks like and query its class.

> class(modIntRef)
[1] "refObjectGenerator"
attr(,"package")
[1] "methods"

The object that we have created is a refObjectGenerator object. The refObjectGenerator object is a function that generates new objects from the class. The object that it generates is an environment much like a package environment or the global environment. The subject of environments is an advanced topic, but in essence an environment is a lot like a list, and we can access elements using the $ syntax myEnvironmentName$ObjectName. It can be very useful to think of Reference Classes and the objects we create from them as lists. We store all relevant information for the class in this list, including the fields, inheritance, and methods. There is no need for generic functions.


Caution: S4 or Reference Class?

Reference Classes are actually implemented as S4 classes with the data stored in an environment. Because the Reference Classes system is built on top of the S4 system, the isS4 function also returns TRUE for Reference Class objects.
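A quick check of this relationship is sketched below. The accountSketch class and its balance field are hypothetical names used only for illustration.

```r
library(methods)

# Hypothetical Reference Class with a single field
accountSketch <- setRefClass("accountSketch", fields = c(balance = "numeric"))
acc <- accountSketch$new(balance = 10)

isS4(acc)                # TRUE: Reference Class objects are also S4 objects
is(acc, "envRefClass")   # TRUE: every Reference Class extends envRefClass
acc$balance              # fields are accessed with the $ syntax
```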


Defining the class effectively creates our constructor function for us. We can instantiate new modIntRef objects using the modIntRef function that was created by the call to setRefClass.

> a <- modIntRef(x = 3L, modulus = 12L)
> a
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12

Because Reference Classes are based on S4 classes, we can use the new function to generate objects directly, though the practice is generally discouraged. The new function is also a method for our class, however, and can be invoked in the standard Reference Class manner.

> b <- modIntRef$new(x = 4L, modulus = 6L)
> b
Reference class object of class "modIntRef"
Field "x":
[1] 4
Field "modulus":
[1] 6


Tip: What Does a Reference Class Contain?

Because Reference Class objects are environments, we can use the objects function to see what they contain. Here’s an example:

> objects(a)
[1] "copy"     "field"    "getClass" "modulus"  "show"     "x"


Defining Methods

With Reference Classes, methods are stored as part of the object that defines the class. They can be accessed and modified using className$methods syntax. We can also think of the methods element itself as another list, where each element is a defined method. Because there are no generic functions, we can generally name methods in any way we like, though some methods have a special meaning (for example, initialize).


Tip: Using setRefClass to Define Methods

Methods can also be defined directly when calling setRefClass.
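As a sketch of this alternative, the methods argument of setRefClass takes a named list of functions. The class name modIntRefSketch and the method names addNumber and describe are hypothetical choices for this illustration.

```r
library(methods)

# Hypothetical class with its methods supplied at definition time
modIntRefSketch <- setRefClass("modIntRefSketch",
  fields = c(x = "integer", modulus = "integer"),
  methods = list(
    # Add a number to the x field, keeping it within the modulus
    addNumber = function(aNumber) {
      x <<- (x + aNumber) %% modulus
    },
    # Print the object in a compact form
    describe = function() {
      cat(x, " (mod ", modulus, ")\n", sep = "")
    }
  ))

m <- modIntRefSketch$new(x = 3L, modulus = 12L)
m$addNumber(10L)
m$describe()   # prints 1 (mod 12)
```

Note that addNumber is given an integer (10L); adding an ordinary numeric value would produce a numeric result that could not be stored in the integer field.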


In the following sections, we look at redefining our modular arithmetic class using a Reference Class context. We briefly revisit some of the key themes we have just seen with S4 classes.

Initialization

The initialize method is the Reference Class equivalent to a constructor function. However, instead of generating an object containing the required fields (slots), we generate each field separately using a special assignment operator, <<-. When we call the new function, the class structure does the rest for us, ensuring that new objects of our class contain the correct fields.


Caution: The <<- Operator

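The <<- operator does not assign locally. Instead, it looks for an existing variable of that name in the enclosing (parent) environments and assigns to it there; if none is found, the variable is created in the global environment. This can make it difficult to track what a function is doing; therefore, outside of Reference Class methods the use of <<- should generally be avoided.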
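The behavior of <<- can be seen outside of any class system with an ordinary closure. This is a minimal sketch; makeCounter and nextValue are hypothetical names used only for illustration.

```r
# A function that returns another function sharing a private variable
makeCounter <- function() {
  count <- 0
  function() {
    # <<- updates count in the enclosing function's environment,
    # rather than creating a new local variable
    count <<- count + 1
    count
  }
}

nextValue <- makeCounter()
nextValue()   # 1
nextValue()   # 2
```

Reference Class methods use the same mechanism: the fields live in an environment that encloses the method, so <<- updates the field rather than a local copy.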


In Listing 22.2 we create an initialize method for our modIntRef class based on the constructor function we defined earlier for modInt4 objects. We must explicitly create both x and modulus using the <<- assignment operator, even though the modulus argument is unaltered by the function. This is due to scoping, but it is not something we will explore any further.

LISTING 22.2 Defining an initialize Method


 1: > modIntRef$methods(list(initialize = function(x, modulus){
 2: +   # Create the object from the starting number, x and modulus, modulus
 3: +   # Divide by the modulus to get new number appropriate for that modulus
 4: +   # Assign fields *if* they are provided (ensures we can copy the object)
 5: +   if (!missing(x)) {
 6: +     x <<- x %% modulus
 7: +   }
 8: +   if (!missing(modulus)) {
 9: +     modulus <<- modulus
10: +   }
11: + }))


Notice the syntax in the first line of Listing 22.2. We add to the methods of modIntRef by passing a named list to modIntRef$methods, where each name is a method name and each element is the function that implements it. When creating new methods, we do not need to redefine existing ones. Another important step here is to assign fields only if the corresponding arguments are provided by the user. This enables us to create a template object if required and also enables us to copy the object later on.

Mutable Objects

Mutability is quite a common term in object-oriented programming; however, it may be unfamiliar if you come from an analytic background. Most R objects are immutable, meaning that functions do not directly edit or change the objects passed to them. Instead, we have to explicitly overwrite an object. For example, suppose we define a vector, x, that we want to sort:

> x <- c(1, 3, 2)

We can use the sort function to sort x, but the operation does not actually update x:

> sort(x)
[1] 1 2 3
> x
[1] 1 3 2

To overwrite x, we need to assign the result back to x, like so:

> x <- sort(x)
> x
[1] 1 2 3

Because R stores values in memory, what we actually do here is copy the result to memory before overwriting x. Reference Classes are mutable, meaning that the methods we define directly update the object. This is a behavior you briefly saw in Hour 12, “Efficient Data Handling in R,” when working with the data.table package. We referred to mutable behavior as “updating by reference.”

The fact that Reference Classes are mutable changes the way in which we think about objects. Methods are applied directly to an object in order to change it. For that reason, the application of Reference Classes usually differs from standard S3 or S4 applications. We must therefore write methods in a similar vein to the initialize function defined in Listing 22.2 by updating fields directly.
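The contrast between the two behaviors can be sketched by passing an ordinary list and a Reference Class object to functions that try to modify them. The names counterSketch, bumpList, and bumpRef are hypothetical and exist only for this illustration.

```r
library(methods)

# Hypothetical counter class with a method that updates its own field
counterSketch <- setRefClass("counterSketch",
  fields = c(count = "integer"),
  methods = list(
    increment = function() {
      count <<- count + 1L
    }
  ))

# An ordinary list: the function works on a copy
bumpList <- function(lst) {
  lst$count <- lst$count + 1L
  lst
}
plainList <- list(count = 0L)
invisible(bumpList(plainList))   # result is discarded
plainList$count                  # still 0

# A Reference Class object: the function changes the original
bumpRef <- function(obj) {
  obj$increment()
  invisible(NULL)
}
counter <- counterSketch$new(count = 0L)
bumpRef(counter)
counter$count                    # now 1
```

Nothing is assigned back to counter, yet its field has changed; this is the sense in which Reference Class objects are mutable.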

Method Definition

When developing methods for a Reference Class, we are working within the class’s environment. At the time the method is called, we can be sure that all the fields we require exist and are of the correct type and structure, as defined by the initialize function. We do not therefore need to pass field names to any methods we write. Arguments that are not available as fields in our class are passed in the standard way.

Let’s look at an example of defining and calling a method. In Listing 22.3 we define an addNumber method that adds a number to an object of the modIntRef class. The number is provided by the user of our function, but the x and modulus values that we refer to in lines 3 and 5 come from the class fields. Note that we use the double-headed assignment arrow, <<-, to update x in the original object. From line 8 onward, we demonstrate the mutability of the object by adding 1 and then 10 to the object, which is updated directly.


Caution: Local Variables

As with any R function, we can create temporary objects within the body of a method. These objects are removed once the method has finished executing. Due to functional scoping, you should avoid giving local variables the same names as fields, because the resulting method can be confusing. If you do, R throws a warning at the point at which the method is defined.


LISTING 22.3 Defining Methods


 1: > modIntRef$methods(list(addNumber = function(aNumber){
 2: +   # Add aNumber to the x field
 3: +   x <<- x + aNumber
 4: +   # Ensure x is correct for the modulus
 5: +   x <<- x %% modulus
 6: + }))
 7: >
 8: > a <- modIntRef$new(x = 3L, modulus = 12L)
 9: > a
10: Reference class object of class "modIntRef"
11: Field "x":
12: [1] 3
13: Field "modulus":
14: [1] 12
15: > a$addNumber(1L)
16: > a
17: Reference class object of class "modIntRef"
18: Field "x":
19: [1] 4
20: Field "modulus":
21: [1] 12
22: > a$addNumber(10L)
23: > a
24: Reference class object of class "modIntRef"
25: Field "x":
26: [1] 2
27: Field "modulus":
28: [1] 12
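
The caution about local variables can be illustrated with a minimal sketch. The class trapSketch and its methods brokenAdd and workingAdd are hypothetical names introduced only for this illustration; they are not part of the modIntRef class. The sketch assumes that a plain <- assignment inside a method creates a local object, whereas <<- updates the field.

```r
library(methods)

# Hypothetical class, defined only to illustrate the caution about local variables.
# R warns at definition time about the local assignment in brokenAdd;
# suppressWarnings() simply keeps this sketch quiet.
trapSketch <- suppressWarnings(setRefClass("trapSketch",
  fields = list(x = "integer"),
  methods = list(
    brokenAdd = function(aNumber) {
      x <- x + aNumber     # creates a local x; the field is untouched
      invisible(.self)
    },
    workingAdd = function(aNumber) {
      newX <- x + aNumber  # distinctly named temporary object
      x <<- newX           # <<- updates the field
      invisible(.self)
    }
  )
))

obj <- trapSketch$new(x = 3L)
obj$brokenAdd(1L)
afterBroken <- obj$x    # still 3
obj$workingAdd(1L)
afterWorking <- obj$x   # now 4
```

Giving the temporary object its own name (newX here) avoids both the warning and the confusion about which x is being changed.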


Copying Reference Class Objects

For the immutable objects we worked with in previous hours, copying an object was very straightforward. Once we have copied an object, all links between the new object and the original object are lost. For example, consider an object, y, that we clone from another object, x, in the following example:

> x <- 5
> y <- x

The object y is a clone of x, and at this point both objects have the same value, 5. However, there is no link between them. We can change the value of x to 6, but y still retains the value 5, as you can see here:

> x <- 6
> x
[1] 6
> y
[1] 5

Mutable objects do not behave like this. Consider the object a that we created and modified in Listing 22.3. The object has the modIntRef class and is therefore mutable. Now let’s try to copy a in the traditional way to create a new object, b:

> # Remind ourselves of the value of a
> a
Reference class object of class "modIntRef"
Field "x":
[1] 2
Field "modulus":
[1] 12
> # Create b as a copy of a in the traditional way
> b <- a
> b
Reference class object of class "modIntRef"
Field "x":
[1] 2
Field "modulus":
[1] 12

Now we add 1 to a using our addNumber method:

> a$addNumber(1L)
> a
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12
> b
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12

The object b has also been updated! This is updating by reference and is a property of mutable objects: a and b are two names for the same underlying object. It can be extremely useful, but to those unfamiliar with the concept, it is also a potentially dangerous trap. Luckily, all Reference Classes inherit from a base envRefClass class that has a copy method. The copy method enables us to copy in the traditional manner, creating a new object that is independent of the original. Here’s an example:

> a <- modIntRef$new(x = 3L, modulus = 12L)
> b <- a$copy()
> b
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12
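
To see the difference between plain assignment and the copy method side by side, consider the following sketch. Because the definition of modIntRef is not repeated in this section, the sketch uses a hypothetical stand-in class, modIntSketch, that is assumed to behave in a similar way; the object names alias and clone are also illustrative.

```r
library(methods)

# Hypothetical stand-in for modIntRef (an assumption, not the original definition)
modIntSketch <- setRefClass("modIntSketch",
  fields = list(x = "integer", modulus = "integer"),
  methods = list(
    addNumber = function(aNumber) {
      # Update the field by reference, keeping it within the modulus
      x <<- (x + aNumber) %% modulus
      invisible(.self)
    }
  )
)

a <- modIntSketch$new(x = 3L, modulus = 12L)
alias <- a          # a second name for the same object
clone <- a$copy()   # a new, independent object

a$addNumber(1L)

aliasX <- alias$x   # 4: follows the change made through a
cloneX <- clone$x   # 3: keeps the value held when the copy was taken
```

Only the object created with copy is insulated from later changes to a.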

Documenting Reference Classes

It is actually much simpler to document a Reference Class system than an S4 system. This is because methods are stored with the class as opposed to being linked via generic functions. We therefore need only document the class. A special @field tag is used for documenting class fields.
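
As a rough sketch of what this can look like, the following block documents a class with roxygen2-style comments. The class name modIntDoc and the wording of the documentation are assumptions made for illustration. The string on the first line of the method body is a Reference Class docstring, which is one common way of describing a method alongside the class.

```r
library(methods)

#' A modular integer Reference Class (illustrative sketch)
#'
#' @field x Integer value, stored modulo \code{modulus}.
#' @field modulus Integer modulus.
#' @export
modIntDoc <- setRefClass("modIntDoc",
  fields = list(x = "integer", modulus = "integer"),
  methods = list(
    addNumber = function(aNumber) {
      "Add aNumber to x, keeping the result within the modulus."
      x <<- (x + aNumber) %% modulus
      invisible(.self)
    }
  )
)

# The roxygen comments do not affect how the class behaves
doc <- modIntDoc$new(x = 3L, modulus = 12L)
doc$addNumber(10L)
docX <- doc$x   # (3 + 10) %% 12 gives 1
```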

R6 Classes

The R6 class system was developed by Winston Chang and first released to CRAN in 2014. The name builds on the “R5” nickname given to R’s standard Reference Class implementation. The R6 implementation is essentially a variant of the Reference Class implementation that does not rely on S4 classes.

The R6 system is not part of base R. It is contained within a package called R6 that must be installed from CRAN. Once it is loaded, we can define a new R6 class by using the R6Class function. After that, the syntax of the R6 system is extremely similar to that of R’s standard Reference Class system. We instantiate new objects using the new method and can define an initialize method to check inputs and construct the object.

Public and Private Members

One potential advantage of using the R6 implementation is that it contains the notion of public and private fields and methods, an object-oriented programming concept generally known as encapsulation. The terminology gets very confusing very quickly, but the basic idea is to distinguish between members (fields or methods) that are accessible from anywhere (public) and members that are only accessible from within the class itself (private).

The benefits of encapsulation are probably best described elsewhere, but the main aim is to provide control over what others have access to in your class. Because private methods are not generally available, no other classes can depend on them. This leaves you free to adjust or change the method at a later date. In contrast, a public method is one that you are happy for someone else to use and build upon.

An R6 Example

Listing 22.4 walks through a brief but complete example of creating an R6 class with public and private methods. It defines the class, modInt6, with three public methods: initialize, show, and square. To illustrate the concept of private methods, a private method, adjustForModulus, has also been defined. This method ensures that the value of x is always less than the modulus. The public square method calls it via private$adjustForModulus, and it updates the object by reference when called.

One of the main differences in terms of usage between R6 and standard Reference Classes is the use of self to refer to the object instead of the double-headed assignment arrow, <<-.

LISTING 22.4 Defining an R6 Class


> library("R6")
> modInt6 <- R6Class("modInt6",
+         # Define public elements
+         public = list(
+           # Fields
+           x = NA,
+           modulus = NA,
+           # Methods
+           initialize = function(x, modulus){
+             if (!missing(x)) {
+               self$x <- x %% modulus
+             }
+             if (!missing(modulus)) {
+               self$modulus <- modulus
+             }
+           },
+           show = function(){
+             cat(self$x, " (mod ", self$modulus, ")", sep = "")
+           },
+           square = function(){
+             self$x <- self$x^2
+             # Use private method to ensure x < modulus
+             private$adjustForModulus()
+           }
+         ),
+         # Define private methods
+         private = list(
+           # Function to ensure correct modulus
+           adjustForModulus = function(){
+             self$x <- self$x %% self$modulus
+           }
+         )
+ )
> a <- modInt6$new(3L, 12L)
> a$show()
3 (mod 12)
> # Now square a
> a$square()
> a$show()
9 (mod 12)


There is plenty more that R6 classes can offer; however, the usage is very similar to that of standard Reference Classes.


Note: Active Bindings

The notion of active bindings is also supported in R6. Active bindings look like fields but call a function each time they are accessed.
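
The general idea of an active binding can be sketched in base R with the makeActiveBinding function, without installing R6. This is only an illustration of the concept and not the R6 syntax; the environment obj and the binding name squared are hypothetical.

```r
# An environment standing in for an object; all names here are hypothetical
obj <- new.env()
obj$x <- 9L
obj$modulus <- 12L

# 'squared' looks like an ordinary variable, but this function
# runs every time the binding is accessed
makeActiveBinding("squared", function() (obj$x^2) %% obj$modulus, obj)

first <- obj$squared    # 81 %% 12 gives 9
obj$x <- 5L
second <- obj$squared   # recomputed on access: 25 %% 12 gives 1
```

Because the value is recomputed on each access, it always reflects the current value of x.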


Other Class Systems

The object-oriented programming options available in R are by no means limited to the set you have seen in the past two hours. The R.oo package has been around since 2001 and provides convenience wrappers for setting up S3 classes as well as an Object class that you can extend to create objects that can be modified by reference.

Another relatively popular alternative is the proto package. The proto package enables prototype programming, a form of object-oriented programming with no classes! Beyond that, there are a few more packages that implement forms of object-oriented programming, but we won’t describe them all here. No doubt more will be written in the future.

Summary

Following on from Hour 21, where we introduced you to the concept of writing an S3 class, we have now looked in greater detail at R’s more formal class systems, S4 and Reference Classes, including a brief tour of the R6 implementation and some of the other options available. Each of the implementations has its advantages and disadvantages, and it is up to you to decide which, if any, is of most use to you. It’s worth bearing in mind, however, that R has been written in order to be flexible and fast to type. It has not been written in order to facilitate object-oriented programming!

In the “Activities” section, you now have the opportunity to build your own S4 and Reference Classes and develop methods for these classes.

Q&A

Q. What’s best for me? S3, S4, standard Reference Classes, or R6?

A. If you’re starting out with classes, then S3 or S4 classes are a good place to start because they’re not too dissimilar from standard R coding. If you’re comfortable with the concepts of object-oriented programming, then one of the two forms of reference classes discussed in this hour will give you a lot more control. Be aware, however, that as the level of control increases, flexibility tends to be reduced.

Q. If S3 classes have the convention [genericFunction].[class], what are the S4 and Reference Class naming conventions?

A. There is no required naming convention due to the different dispatch mechanism used by setMethod for S4 classes and the message-passing approach used in Reference Classes. The “lowerCamelCase” naming convention is extremely popular for classes and indeed any objects in R. There is also a growing trend of using underscores to separate words within an object name.

Workshop

The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.

Quiz

1. True or false? An S4 object is a special type of list.

2. True or false? A Reference Class object is a special type of list.

3. What is multiple dispatch?

4. What is a mutable object?

5. What is the difference between a slot and a field?

Answers

1. False. It can be helpful to think of an S4 object as being like a list, but it is not. For one thing, we access elements using @ as opposed to $.

2. False. A Reference Class object may appear even more like a list than an S4 object due to the $ syntax we use. However, it is actually an environment, not a list.

3. In generic function object orientation, method dispatch controls which method is selected when a generic function is called. When the dispatch mechanism can depend on multiple arguments, we call this multiple dispatch.

4. A mutable object is simply one that can be changed. In R, we typically deal with immutable objects. Instead of changing an object, we overwrite it with a new value. Reference Class objects are mutable, however.

5. We say “slots” when working with S4 classes and “fields” when working with all forms of reference class, but they essentially refer to the same thing.
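
The multiple dispatch described in answer 3 can be sketched as follows. The generic describePair and its methods are hypothetical and exist only to show a method being selected on the classes of two arguments.

```r
library(methods)

# Hypothetical generic, introduced only for this illustration
setGeneric("describePair", function(a, b) standardGeneric("describePair"))

# One method per combination of argument classes
setMethod("describePair", signature("numeric", "character"),
  function(a, b) "number then text")
setMethod("describePair", signature("character", "numeric"),
  function(a, b) "text then number")
setMethod("describePair", signature("numeric", "numeric"),
  function(a, b) "two numbers")

r1 <- describePair(1, "a")   # "number then text"
r2 <- describePair("a", 1)   # "text then number"
r3 <- describePair(1, 2)     # "two numbers"
```

The method chosen depends on the classes of both a and b, not just the first argument as in S3.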

Activities

1. Define a new S4 class. The aim of the class is to store simulated data from various known statistical distributions. In order to construct the new class, you need to create the following:

• A constructor function that takes inputs n and distribution, representing the number of values to sample and the distribution to sample from. Ensure that the function has the option for other parameter arguments, as needed.

• A print method that displays a table of summary statistics for the simulated data (mean, median, standard deviation, min, and max).

• A new generic combine method that enables two objects (provided they are of the same distribution) to be combined to form a new set of samples, where the total number of samples is the sum of the number of samples from the original objects.

2. Define a new Reference Class. The aim of the class is to store financial account information:

• Define the class as standardAccount. The class should have a single field, balance, that defaults to $50 (a minimum initial deposit to set up the account).

• Write methods called deposit and withdraw that update the account balance field when called. The withdraw method should not allow the balance to go into the red (that is, fall below zero).

• Extend the class by creating a new class, goldAccount. The goldAccount class should allow an overdraft of $1,000.
