What You’ll Learn in This Hour:
S4 classes
Reference classes
R6 classes
Other available class systems
In Hour 21, “Writing R Classes,” you were introduced to the concept of classes, and we walked through the basic features of an S3 class in R. The S3 system provides a soft introduction to classes, allowing much of the flexibility that we have become accustomed to with R. In order to provide this flexibility, however, some of the main benefits of a more formal class system have been sacrificed. When developing S3 classes, we still need to be very careful to check that the input values are handled appropriately. Further, inheritance is not formally defined and we must be careful to write functions that allow for it.
During this hour, we look closely at two alternative class systems available in R: the S4 system and Reference Classes. Along the way, you will be introduced to new concepts such as validity checking, multiple dispatch, message-passing object orientation, and mutable objects.
The S4 system was introduced in S version 4. Like S3, the S4 system is a form of generic function object-oriented programming. However, the system is much more formal and requires that we define the class structure before instantiating objects. This makes it easier to write methods because it is not possible to pass an object with the wrong structure to an S4 method.
The S4 system also benefits from a more formal form of inheritance that is specified when we define a class. When we extend an S4 class, all of the type and structure checking from the parent class is passed on to the child, thus reducing the need for duplicate code. Finally, S4 supports something called multiple dispatch, meaning that generic functions can operate based on multiple inputs.
Instances of S4 structures are rare in the base and recommended R packages, though the structure is used in several of the additional packages available on CRAN and throughout the Bioconductor package repository. There is a tendency for S4 package names to end in 4, particularly where they implement something that has already been implemented in an S3 structure. This is not strictly adhered to, however.
It is slightly easier to find information about S4 classes and methods than it is with S3. To start with, we can find out if any object is an S4 object using the function isS4, to which we pass any R object. The isS4 function simply returns TRUE if an object is an S4 object and FALSE otherwise. Once we know that we have an S4 object and have ascertained the class (using the class function), we can call upon a number of other useful functions to find out more information about the class. Table 22.1 lists three functions that can be used to find out more information about a class. The table also describes their usage, with an example of usage for the merMod class contained within the lme4 modeling package.
If we are working with a new package, we can find out what classes it contains using the getClasses function, for example getClasses("package:lme4"). The same function can also be used to list all classes currently defined within an R session. Similarly, the getGenerics function can be used to list all available generics within a package or an R session generally.
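The introspection functions mentioned above can be tried without installing any extra packages. The following is a minimal sketch that uses a throwaway class, demoPoint, which is a hypothetical name invented for this illustration and is not part of lme4 or any other package:

```r
library(methods)

# Hypothetical toy class, defined only so that we have something to inspect
setClass("demoPoint", slots = c(x = "numeric", y = "numeric"))
p <- new("demoPoint", x = 1, y = 2)

# Is the object an S4 object?
pIsS4 <- isS4(p)        # an S4 instance
vecIsS4 <- isS4(1:3)    # an ordinary integer vector

# What class is it, and is that class formally defined?
pClass <- as.character(class(p))
classIsDefined <- isClass("demoPoint")

# Which slots does the class have, and of what type?
slotInfo <- getSlots("demoPoint")
print(slotInfo)
```

Here pIsS4 is TRUE and vecIsS4 is FALSE, and slotInfo is a named character vector giving the declared class of each slot.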
A list of all the methods available for a generic function may be obtained via the showMethods function. Here’s an example:
> showMethods("tail")
Function: tail (package utils)
x="ANY"
x="Matrix"
x="sparseVector"
The methods function you saw in Hour 21 also works with S4 classes.
Tip: Help with S4
Constructors and generics are named R functions, and we can find help in the standard way, either via the RStudio GUI or by typing ?functionName. Unlike S3 classes, S4 classes are formally defined and can therefore be documented. We use a special syntax of the form class?className in order to find out more about the class.
In the previous section, we stated that an S4 class must first be defined before we can instantiate objects. In other words, we cannot simply take an object and assign it a new class as we could with S3. This means that S4 classes can take longer to construct; however, the more formal definition provides us with benefits, such as the following:
Type-checking
Validity
Type-checking and validity ensure that when we define a class, objects within that class adhere to a particular structure and type. Unlike with S3, we can therefore assume that the structure is correct when we write methods for our class. This saves us from having to write additional error-handling steps within the methods and avoids duplication of code, thereby improving the maintainability of our code.
To formally define an S4 class, we use the setClass function. The setClass function lives in the methods package, which is loaded by default when we start R interactively. Structurally, you can think of an S4 class as being a bit like an R list, where each element of the list is an R object with its own type and structure. In S4 terminology, we refer to these elements as “slots.” The formal structure of an S4 class requires that we define the required structure for each slot (for example, integer, numeric, character, matrix, and so on). The two primary arguments to the setClass function are therefore the name of the class and a slots argument that defines the structure of the class. The slots argument expects either a list or a named character vector, where the names represent the names of the slots and the values give the class of each slot.
Caution: Loading the methods Package
When we start R in interactive mode, the methods package is loaded by default. However, R can also be executed in batch mode via Rscript, which in versions of R before 3.5.0 did not load the methods package by default. When integrating an S4 structure into your own package, you should add a dependency on the methods package.
Let’s start by looking back at the modInt structure that we defined in Hour 21. We take the basic concept of the structure and define it instead as an S4 class named modInt4. For any object in our class, we must store two important pieces of information: the base number and its modulus. Each of these is an integer, so we specify their structure using the integer class. Note that although modular arithmetic only works with integer values, we don’t actually need to store the data as integer, because numeric would suffice. However, we later use the data type to illustrate the impact of this formal definition.
> setClass("modInt4", slots=c(x = "integer", modulus = "integer"))
Caution: Change in Definition
Historically, S4 slots were defined via a representation argument within the setClass function. A representation function was then used to define both the slot structure and any inheritance. Although this functionality is now deprecated, representation is still the second argument to setClass for compatibility reasons. The S3methods, access, and version arguments are similarly deprecated. Further information is provided within the setClass help file.
We also use the setClass function to define inheritance, which we’ll return to later in the hour.
Once we have formally defined a class, we can begin to create objects of that class. As with S3, it is good practice to do so via a class constructor function, though again it is not necessary. To generate an S4 object, the constructor function must include a call to the new function. The new function does the hard work of creating a prototype object from the class definition and populating the slots with any inputs we provide. The call to new ensures that our object has the required slots and that the information contained within each slot is of the correct type.
The first argument to new is the Class argument. This tells R what class is to be instantiated. Values for the slots of the class are passed as named arguments via the ellipsis (...). In the following example we create a constructor function for the modInt4 class that we previously defined. The final line contains the required call to the new function.
> modInt4 <- function(x, modulus){
+ # Divide by the modulus to get new number appropriate for that modulus
+ x <- x %% modulus
+ # Create a new instance
+ new("modInt4", x = x, modulus = modulus)
+ }
Having defined the constructor, we are now ready to create objects of our class. The following examples demonstrate the behavior of the type checking. In the first example, we pass the non-integer pi value and the integer 12L. We use L to ensure that the value is stored as integer as opposed to numeric.
Because pi is non-integer, the object cannot be created, and we receive an appropriate error message. In the second example, we pass two integer values that are actually stored as numeric in R. Again, the object cannot be created because both x and modulus must be of integer type. In the final example, we pass 4L and 12L. Both are integers, and our object is successfully created. Note that by default the name of the class is printed along with each of the slots.
> # Try to create some objects of our class
> modInt4(pi, 12L)
Error in validObject(.Object) :
invalid class "modInt4" object: invalid object for slot "x" in class "modInt4":
got class "numeric", should be or extend class "integer"
> modInt4(4, 12)
Error in validObject(.Object) :
invalid class "modInt4" object: 1: invalid object for slot "x" in class "modInt4":
got class "numeric", should be or extend class "integer"
invalid class "modInt4" object: 2: invalid object for slot "modulus" in class "modInt4":
got class "numeric", should be or extend class "integer"
> modInt4(4L, 12L)
An object of class "modInt4"
Slot "x":
[1] 4
Slot "modulus":
[1] 12
Here we match the name of the constructor function to the name of the class as well as the names of the arguments to the names of the class slots. This is a very simple example of a class, and it makes sense to do so. However, the constructor function can take any arguments so long as the arguments that are eventually passed to new match those we defined using setClass. A good example of this is the lmer function in lme4, which takes arguments such as formula and data, fits a linear mixed-effects model, and generates an object of class merMod, which contains slots such as theta and beta.
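To illustrate a constructor whose arguments differ from the slot names, here is a minimal sketch. The class name modIntDemo and the argument names value and mod are hypothetical, chosen only for this example; the constructor coerces ordinary numeric input to integer before calling new:

```r
library(methods)

# Hypothetical class with the same slot layout as modInt4
setClass("modIntDemo", slots = c(x = "integer", modulus = "integer"))

# Constructor whose argument names differ from the slot names.
# It accepts ordinary numerics and coerces them before calling new().
modIntDemo <- function(value, mod = 12) {
  value <- as.integer(value)
  mod <- as.integer(mod)
  new("modIntDemo", x = value %% mod, modulus = mod)
}

obj <- modIntDemo(17, 12)
obj@x        # 5
obj@modulus  # 12
```

Note that as.integer silently truncates non-integer input, so a constructor written this way trades the strict type error for convenience. Whether that is desirable depends on how the class is to be used.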
As you have seen, the slot structure of an S4 class provides a handy mechanism for checking that the information provided is of the correct type. Occasionally we may need to provide some additional checks to ensure that an object conforms to expectations. Consider the data frame definition that we provided in Hour 21. A data frame consists of a list of vectors, but these vectors must also be of consistent length. In the S4 framework, we can provide such a check using a validity function.
A validity function is simply a function that contains all the checks we require in order to ensure that an object is of the correct structure. There are no naming restrictions on validity functions; however, it is standard practice to include the name of the class within the name. The “lowerCamelCase” convention is most commonly used, and periods should be avoided because they can falsely imply an S3 structure.
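We now define a validity function for our modInt4 class. The check ensures that the base number is non-negative, that the modulus is positive, and that the base number is less than the modulus. A validity function should return TRUE if the object is considered valid. If any of the checks are violated, the documented convention is to return a character vector describing the problems; returning FALSE, as we do here for simplicity, also causes the object to be rejected, but with a less informative error message. The validity function should expect an S4 object as its only argument. It is good practice to name the argument object.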
validModInt4Object <- function(object) {
  # Define checks
  # Note that the class definition already ensures that x and modulus are integer
  xNonNeg <- object@x >= 0
  modulusPositive <- object@modulus > 0
  # After x %% modulus, a valid x is strictly less than the modulus
  xLessThanModulus <- object@x < object@modulus
  # Combine checks
  isObjectValid <- xNonNeg & modulusPositive & xLessThanModulus
  # Return TRUE or FALSE
  isObjectValid
}
Once we have defined the check, we need to link it to our class. We do so via the setValidity function. The setValidity function expects two main arguments:
Class: The name of the class as a character string
method: The validity function itself (the function object, not its name as a string)
We can now link the validModInt4Object validity function to our modInt4 class, like so:
> setValidity("modInt4", validModInt4Object)
Class "modInt4" [in ".GlobalEnv"]
Slots:
Name: x modulus
Class: integer integer
Note: Defining Validity with setClass
In addition to setValidity, we can use the validity argument to the setClass function to link the function to the class that it checks.
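As a sketch of the validity argument in use, the following defines a separate, hypothetical class named modIntChecked (so as not to disturb modInt4) whose validity function returns a character vector of problems rather than FALSE. The messages themselves are invented for this illustration:

```r
library(methods)

setClass("modIntChecked",
  slots = c(x = "integer", modulus = "integer"),
  validity = function(object) {
    problems <- character()
    if (any(object@modulus <= 0L)) {
      problems <- c(problems, "modulus must be positive")
    }
    if (any(object@x < 0L)) {
      problems <- c(problems, "x must be non-negative")
    }
    if (any(object@x >= object@modulus)) {
      problems <- c(problems, "x must be less than modulus")
    }
    # TRUE when valid, otherwise the messages describing what failed
    if (length(problems) == 0) TRUE else problems
  })

good <- new("modIntChecked", x = 3L, modulus = 12L)

# An invalid object is rejected, and the message names the failed check
badMessage <- tryCatch(
  new("modIntChecked", x = 12L, modulus = 12L),
  error = function(e) conditionMessage(e))
badMessage

# Direct slot assignment checks only the type, so re-validate explicitly
good@x <- 20L
recheck <- tryCatch(validObject(good),
  error = function(e) conditionMessage(e))
recheck
```

The second half of the sketch shows why validObject is worth knowing about: assigning to a slot with @ does not rerun the validity function, so an object can be made invalid after creation unless we check it again.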
As with S3, the S4 framework implements generic function object orientation. In order to define a method for our class, a generic must first exist. We then link the method back to the generic and our class using the setMethod function. Let’s look first at the setMethod function. Table 22.2 lists the three required arguments to setMethod, along with a description of how they are used.
As with S3, a number of generic functions are available “out of the box.” In particular, S4 objects have a default show method, equivalent to print in S3. We can define a new show method to control how an object prints to screen. In the following example, we define a new show method for the modInt4 class and then use the setMethod function to link the method to the class and generic function:
> showModInt4 <- function(object){
+ # Extract the relevant components from the object
+ theValue <- object@x
+ theModulus <- object@modulus
+ # Print the object in the desired form
+ cat(theValue, " (mod ", theModulus, ")\n", sep = "")
+ }
>
> # Link the previous function to the show generic and modInt4 class
> setMethod("show", signature = "modInt4", showModInt4)
[1] "show"
>
> # Display an object
> modInt4(3L, 12L)
3 (mod 12)
The more formal S4 framework and validity checking ensure that any object of modInt4 class is of the correct structure and that any slots are of the correct type. The show method requires no additional checking. It is very clear and straightforward to follow.
Caution: Editing Methods
Methods must be linked to a generic and class via setMethod. If we redefine a method, we must then call setMethod again to relink the method to the generic and class.
In the previous example, we defined a new method for an existing generic, show. As with S3 classes, it is also possible to define new generics. We do so via the setGeneric function, which has two main arguments, as described in Table 22.3.
In the following example, we first define a function called square4, an S4 equivalent of the square function we defined in Hour 21. We then turn the function into a generic with setGeneric.
> square4 <- function(x){
+ x^2
+ }
> setGeneric("square4")
[1] "square4"
Once the generic has been created, we can define new methods, which we link to classes via the setMethod function:
> squareModInt4 <- function(x) {
+ # Standard square
+ simpleSquare <- as.integer(x@x^2) # Ensure value is valid
+ # Use correct modulus
+ modInt4(simpleSquare, x@modulus)
+ }
>
> # Link the modInt4 method to the square4 generic and modInt4 class
> setMethod("square4", signature = "modInt4", squareModInt4)
[1] "square4"
>
> # Test the method
> a <- modInt4(5L, 12L)
> a
5 (mod 12)
> square4(a)
1 (mod 12)
It is important to ensure that argument names match between the methods and the generic. If they don’t, this is not only bad practice, but R throws a warning to tell you that it has changed the argument name in the method to match the generic.
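A generic can also be created by supplying an explicit definition that calls standardGeneric, which keeps the argument names that every method must share visible in one place. The following is a minimal sketch; the generic name cube4 is hypothetical and chosen only for this illustration:

```r
library(methods)

# Hypothetical generic: the argument name x is fixed here, and every
# method for cube4 must use the same name
setGeneric("cube4", function(x) standardGeneric("cube4"))

# A method for numeric input
setMethod("cube4", signature = "numeric", function(x) x^3)

cube4(2)  # 8

# No method exists for character input, so dispatch fails with an error
noMethod <- tryCatch(cube4("a"),
  error = function(e) "no applicable method")
noMethod
```

Unlike the square4 example, a generic defined this way has no default method, so calling it on an unsupported class is an error rather than a fallback to ordinary arithmetic.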
In the following example, we create a new generic, add, and define what happens when we add two objects of class modInt4. This is an example of multiple dispatch, whereby a generic function can dispatch (pick a method) based on multiple arguments. Note that although we provide two objects of the same class, the multiple dispatch mechanism could be used to define what happens when we add objects of a different class. As in the previous example, we start by defining a function, add, and then turn it into a generic with setGeneric.
> add <- function(a, b){
+ a + b
+ }
> setGeneric("add")
[1] "add"
The add function we defined acts as the default method for the generic. Next, we define a method for our modInt4 object. Because the add function requires two objects, we must be careful to define an appropriate signature to ensure that the generic dispatches correctly.
> # Define a function that adds modInt4 objects
> addModInt4Objects <- function(a, b){
+ # Sometimes we still need to define checks within the method
+ if(a@modulus != b@modulus){
+ stop("Cannot add numbers of differing modulus")
+ }
+ # Add the numbers together
+ totalNumber <- a@x + b@x
+ # Return the correct class
+ theResult <- modInt4(totalNumber, a@modulus)
+ theResult
+ }
>
> # Link the previous function to the add generic and modInt4 class
> setMethod("add", signature = c(a = "modInt4", b = "modInt4"),
+ addModInt4Objects)
[1] "add"
>
> # Test the function
> p <- modInt4(3L, 12L)
> q <- modInt4(7L, 12L)
> add(p, q)
10 (mod 12)
> add(q, q)
2 (mod 12)
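To show multiple dispatch choosing between methods on the basis of both arguments, here is a minimal, self-contained sketch. The class modIntMD and the generic addMD are hypothetical stand-ins for modInt4 and add, used so that the example does not interfere with the definitions above:

```r
library(methods)

setClass("modIntMD", slots = c(x = "integer", modulus = "integer"))
setGeneric("addMD", function(a, b) standardGeneric("addMD"))

# Method 1: both arguments are modIntMD objects
setMethod("addMD", signature(a = "modIntMD", b = "modIntMD"),
  function(a, b) {
    stopifnot(a@modulus == b@modulus)
    new("modIntMD", x = (a@x + b@x) %% a@modulus, modulus = a@modulus)
  })

# Method 2: a modIntMD object plus an ordinary number
setMethod("addMD", signature(a = "modIntMD", b = "numeric"),
  function(a, b) {
    new("modIntMD", x = as.integer((a@x + b) %% a@modulus),
        modulus = a@modulus)
  })

p <- new("modIntMD", x = 7L, modulus = 12L)
addMD(p, p)@x    # 2, via method 1
addMD(p, 10)@x   # 5, via method 2
```

The same call, addMD, reaches different code depending on the class of the second argument, which is something S3 dispatch on the first argument alone cannot express directly.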
You were introduced to the idea of inheritance in the previous hour. It is possible for S3 objects to inherit from one another, but as with much of S3 it is not formally defined. Inheritance is much better defined for S4 classes. We specify the inheritance when defining the class with setClass using the contains argument. Though the argument name may seem counterintuitive, we use contains to specify superclasses, in other words, classes that our class inherits from.
Consider the example of the 12-hour clock and the clockTime class we discussed in Hour 21. We define an S4 equivalent that inherits from modInt4 as follows:
> setClass("clockTime4", contains = "modInt4")
At this point, our class is exactly the same as the modInt4 class and contains slots x and modulus. It has also inherited all of the methods from the modInt4 class without us having to think about inheritance when defining the modInt4 methods.
> getSlots("clockTime4")
x modulus
"integer" "integer"
>
> methods(class = "clockTime4")
[1] add show
see '?methods' for accessing help and source code
In Listing 22.1 we walk through a complete example, defining the class as we did earlier and then walking through some of the possible follow-on actions. In particular, we define a constructor function (lines 5 through 10) and a validity function (lines 14 through 17) to ensure that the modulus is equal to 12. We also define the print (show) method (lines 31 through 36). If we felt the need, we could define any additional methods specific to our clockTime4 class.
1: > # Define the class
2: > setClass("clockTime4", contains = "modInt4")
3: >
4: > # Define constructor
5: > clockTime4 <- function(x){
6: + # Ensure that x is in mod 12
7: + x <- x %% 12L
8: + # Create a new instance
9: + new("clockTime4", x = x, modulus = 12L)
10: + }
11: >
12: > # Define validity
13: > # Existing modInt4 validity is inherited
14: > validclockTime4Object <- function(object) {
15: + isMod12 <- object@modulus == 12L
16: + isMod12
17: + }
18: >
19: > # Link the validity function with the clockTime4 class
20: > setValidity("clockTime4", validclockTime4Object)
21: Class "clockTime4" [in ".GlobalEnv"]
22:
23: Slots:
24:
25: Name: x modulus
26: Class: integer integer
27:
28: Extends: "modInt4"
29: >
30: > # Redefine show method
31: > showclockTime4 <- function(object){
32: + # Print the object in the desired form
33: + cat(object@x, ":00\n", sep = "")
34: + }
35: > setMethod("show", signature = "clockTime4", showclockTime4)
36: [1] "show"
37: >
38: > # Test the class
39: > clockTime4(5L)
40: 5:00
41: > clockTime4(13L)
42: 1:00
Listing 22.1 highlights another property of S4 inheritance, which is that validity is also inherited. This significantly cuts down on the amount of checking we have to do.
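The inheritance of validity can be seen in a small, self-contained sketch. The classes baseMod and clockDemo and their messages are hypothetical, invented for this illustration; they mirror modInt4 and clockTime4 without redefining them:

```r
library(methods)

# Parent class with its own validity check
setClass("baseMod",
  slots = c(x = "integer", modulus = "integer"),
  validity = function(object) {
    if (all(object@x < object@modulus)) TRUE else "x must be less than modulus"
  })

# Child class adds one further check; the parent check still applies
setClass("clockDemo", contains = "baseMod",
  validity = function(object) {
    if (all(object@modulus == 12L)) TRUE else "modulus must be 12"
  })

good <- new("clockDemo", x = 5L, modulus = 12L)
is(good, "baseMod")  # TRUE: a clockDemo object is also a baseMod object

# Fails the check inherited from the parent class
err1 <- tryCatch(new("clockDemo", x = 13L, modulus = 12L),
  error = conditionMessage)

# Passes the parent check but fails the child-specific check
err2 <- tryCatch(new("clockDemo", x = 3L, modulus = 6L),
  error = conditionMessage)

err1
err2
```

The child validity function only has to state what is new about the child class; the parent rule is enforced without being repeated.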
The formal declaration of an S4 class requires some additional effort when it comes to documenting the class with roxygen2. The call to setClass should be documented with a standard title and description of the class. Each slot should be documented using the @slot tag.
#' An S4 Class that implements modular arithmetic
#'
#' @slot x An integer value in the specified \code{modulus}
#' @slot modulus An integer value representing the modulus for \code{x}
setClass("modInt4", slots=c(x = "integer", modulus = "integer"))
We must document S4 methods, but we have a choice as to whether to document in the class, in the generic, or separately within its own specific help file. Generally the decision as to where to document the method depends on how complicated the method is and how the method is to be used. Clearly we can only document the method via the generic if we created the generic ourselves, however.
We can control where the method is documented using either the @describeIn tag or @rdname tag. For example, to document the addModInt4Objects function within the help file for the add generic, we first create an roxygen2 header for the generic add function and separately add a single roxygen2 header line above the function definition for addModInt4Objects that contains a @describeIn tag.
#' @describeIn add Adds two modInt4 objects of the same modulus
addModInt4Objects <- function(a, b){
# Sometimes we still need to define checks within the method
if(a@modulus != b@modulus){
stop("Cannot add numbers of differing modulus")
}
# Add the numbers together
totalNumber <- a@x + b@x
# Return the correct class
theResult <- modInt4(totalNumber, a@modulus)
theResult
}
Reference Classes were developed by John Chambers and have been available in the methods package since R version 2.12. Because they were the first new class implementation in R and because they followed S3 and S4, they are often referred to as “R5” classes. However, whereas the numbers in S3 and S4 refer to versions of the S language, the number 5 has nothing to do with any version of S or R and is essentially meaningless.
Reference Classes are quite different from S3 and S4 and implement a much more common form of object-oriented programming known as message-passing object orientation. In message-passing object orientation, methods belong to the class and generic functions are not required. Message-passing object orientation is also used in Python, C++, and Java.
Much like S4, we begin by defining the class. We do so via the function setRefClass. In terms of usage, the main difference between setClass and setRefClass is that with setRefClass we use the term “fields” instead of “slots.” The similarity extends to inheritance, for which we use the contains argument.
One important difference with Reference Classes is that we save the output of the setRefClass function as an object. The object should have the same name as the class as defined by the first argument to setRefClass. We’ll walk through Reference Classes using a variant on the modular arithmetic example that we used for S3 and S4 classes. However, message-passing object orientation is very different from generic function object orientation, and in practice message-passing object orientation is typically used to solve a different kind of problem. In particular, message-passing tends to be better suited to software development.
> modIntRef <- setRefClass("modIntRef",
+ fields=c(x = "integer", modulus = "integer"))
This is the first time we have created a class as an object. Like with any R object, we can type its name to see what it looks like and query its class.
> class(modIntRef)
[1] "refObjectGenerator"
attr(,"package")
[1] "methods"
The object that we have created is a refObjectGenerator object. The refObjectGenerator object acts as a function that generates new objects from the class. The object that it generates is an environment much like a package environment or the global environment. The subject of environments is an advanced topic, but in essence an environment is a lot like a list, and we can access elements using the $ syntax myEnvironmentName$ObjectName. It can be very useful to think of Reference Classes and the objects we create from them as lists. We store all relevant information for the class in this list, including the fields, inheritance, and methods. There is no need for generic functions.
Caution: S4 or Reference Class?
Reference Classes are actually implemented as S4 classes with the data stored in an environment. Because the Reference Classes system is built on top of the S4 system, the isS4 function also returns TRUE for Reference Class objects.
Defining the class effectively creates our constructor function for us. We can instantiate new modIntRef objects using the modIntRef function that was created by the call to setRefClass.
> a <- modIntRef(x = 3L, modulus = 12L)
> a
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12
Because Reference Classes are based on S4 classes, we can use the new function to generate objects directly, though the practice is generally discouraged. The new function is also a method of the generator object for our class, however, and can be invoked in the standard Reference Class manner.
> b <- modIntRef$new(x = 4L, modulus = 6L)
> b
Reference class object of class "modIntRef"
Field "x":
[1] 4
Field "modulus":
[1] 6
Tip: What Does a Reference Class Contain?
Because Reference Class objects are environments, we can use the objects function to see what they contain. Here’s an example:
> objects(a)
[1] "copy" "field" "getClass" "modulus" "show" "x"
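Fields can be read and written with the same $ syntax. The following minimal sketch uses a hypothetical class named counterRef, invented for this illustration, to show field access, the built-in field method, and the type check that is applied when a field with a declared class is assigned:

```r
library(methods)

# Hypothetical Reference Class with a single integer field
counterRef <- setRefClass("counterRef", fields = c(count = "integer"))
obj <- counterRef$new(count = 1L)

obj$count            # read a field: 1
obj$count <- 5L      # update a field in place
obj$field("count")   # the same field via the field() method: 5

# Assigning a value of the wrong class to a typed field is an error
wrongType <- tryCatch({
  obj$count <- "five"
  "assigned"
}, error = function(e) "rejected")
wrongType

# Reference Class objects are S4 objects that extend envRefClass
isS4(obj)
is(obj, "envRefClass")
```

Note that no reassignment of obj is needed after obj$count <- 5L; the object itself is changed, a behavior that is discussed later in the hour.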
With Reference Classes, methods are stored as part of the object that defines the class. They can be accessed and modified using className$methods syntax. We can also think of the methods element itself as another list, where each element is a defined method. Because there are no generic functions, we can generally name methods in any way we like, though some methods have a special meaning (for example, initialize).
Tip: Using setRefClass to Define Methods
Methods can also be defined directly when calling setRefClass.
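As a sketch of defining methods in the call to setRefClass, the following uses a hypothetical accountRef class, invented for this illustration, with a single deposit method supplied through the methods argument:

```r
library(methods)

# Hypothetical class: fields and methods are defined in a single call
accountRef <- setRefClass("accountRef",
  fields = c(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      # Update the field of this object in place
      balance <<- balance + amount
      # Return the object invisibly so that calls can be chained
      invisible(.self)
    }
  ))

acc <- accountRef$new(balance = 10)
acc$deposit(5)
afterFirst <- acc$balance          # 15

acc$deposit(1)$deposit(2)          # chained calls on the same object
afterChain <- acc$balance          # 18
```

Returning .self invisibly is a design choice rather than a requirement; it simply allows several method calls to be strung together.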
In the following sections, we look at redefining our modular arithmetic class using a Reference Class context. We briefly revisit some of the key themes we have just seen with S4 classes.
The initialize method is the Reference Class equivalent to a constructor function. However, instead of generating an object containing the required fields (slots), we generate each field separately using a special assignment operator, <<-. When we call the new function, the class structure does the rest for us, ensuring that new objects of our class contain the correct fields.
Caution: The <<- Operator
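The <<- operator does not create a local variable. Instead, it searches the enclosing (parent) environments of the function for an existing variable of that name and assigns to it there, falling back to the global environment if none is found. This can make it difficult to track what a function is doing; therefore, outside of Reference Class methods, the use of <<- should generally be avoided.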
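The behavior of <<- is easiest to see outside of classes altogether. The following minimal sketch uses a closure; the names makeCounter and count are hypothetical and chosen only for this illustration:

```r
makeCounter <- function() {
  count <- 0                # lives in the environment of makeCounter
  function() {
    # <<- finds count in the enclosing environment and updates it there;
    # a plain <- would create a new local variable on every call instead
    count <<- count + 1
    count
  }
}

counter <- makeCounter()
first <- counter()    # 1
second <- counter()   # 2
```

The value of count persists between calls because it is stored in the enclosing environment rather than inside the inner function. Reference Class methods use the same mechanism to update fields.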
In Listing 22.2 we create an initialize method for our modIntRef class based on the constructor function we defined earlier for modInt4 objects. We must explicitly create both x and modulus using the <<- assignment operator, even though the modulus argument is unaltered by the function. This is due to scoping, but it is not something we will explore any further.
1: > modIntRef$methods(list(initialize = function(x, modulus){
2: + # Create the object from the starting number, x and modulus, modulus
3: + # Divide by the modulus to get new number appropriate for that modulus
4: + # Assign fields *if* they are provided (ensures we can copy the object)
5: + if (!missing(x)) {
6: + x <<- x %% modulus
7: + }
8: + if (!missing(modulus)) {
9: + modulus <<- modulus
10: + }
11: + }))
Notice the syntax in the first line of Listing 22.2. We are adding to the methods of modIntRef by passing a named list to modIntRef$methods. All methods are stored as a named list of functions, where each name is the name of a method. When creating new methods, however, we do not need to redefine old methods. Another important step here is to ensure that variables are only assigned if they are provided by the user. This enables us to create a template object if required but also enables us to copy the object later on.
Mutability is quite a common term in object-oriented programming; however, it may be unfamiliar if you come from an analytic background. Generally R objects are not mutable, meaning that we do not directly edit or change objects when we execute functions. Instead, we have to force R to overwrite an object. For example, suppose we define a vector, x, that we want to sort:
> x <- c(1, 3, 2)
We can use the sort function to sort x, but the operation does not actually update x:
> sort(x)
[1] 1 2 3
> x
[1] 1 3 2
To overwrite x, we need to assign the result back to x, like so:
> x <- sort(x)
> x
[1] 1 2 3
Because R stores values in memory, what we actually do here is copy the result to memory before overwriting x. Reference Classes are mutable, meaning that the methods we define directly update the object. This is a behavior you briefly saw in Hour 12, “Efficient Data Handling in R,” when working with the data.table package. We referred to mutable behavior as “updating by reference.”
The fact that Reference Classes are mutable changes the way in which we think about objects. Methods are applied directly to an object in order to change it. For that reason, the application of Reference Classes usually differs from standard S3 or S4 applications. We must therefore write methods in a similar vein to the initialize method defined in Listing 22.2 by updating fields directly.
When developing methods for a Reference Class, we are working within the class’s environment. At the time the method is called, we can be sure that all the fields we require exist and are of the correct type and structure, as defined by the class and the initialize method. We do not therefore need to pass field names to any methods we write. Arguments that are not available as fields in our class are passed in the standard way.
Let’s look at an example of defining and calling a method. In Listing 22.3 we define an addNumber method that adds a number to an object of the modIntRef class. The number is provided by the user of our function, but the x and modulus values that we refer to in lines 3 and 5 come from the class fields. Note that we use the double-headed assignment arrow, <<-, to update x in the original object. From line 8 onward, we demonstrate the mutability of the object by adding 1 and then 10 to the object, which is updated directly.
Caution: Local Variables
As with any R function, we can create temporary objects within the body of our function. These objects are removed once the function has finished executing. Due to functional scoping, you should avoid giving temporary variables the same names as fields because the resulting function can be confusing to read. If you do, R throws a warning at the point at which the method is defined.
1: > modIntRef$methods(list(addNumber = function(aNumber){
2: + # Add aNumber to the x field
3: + x <<- x + aNumber
4: + # Ensure x is correct for the modulus
5: + x <<- x %% modulus
6: + }))
7: >
8: > a <- modIntRef$new(x = 3L, modulus = 12L)
9: > a
10: Reference class object of class "modIntRef"
11: Field "x":
12: [1] 3
13: Field "modulus":
14: [1] 12
15: > a$addNumber(1L)
16: > a
17: Reference class object of class "modIntRef"
18: Field "x":
19: [1] 4
20: Field "modulus":
21: [1] 12
22: > a$addNumber(10L)
23: > a
24: Reference class object of class "modIntRef"
25: Field "x":
26: [1] 2
27: Field "modulus":
28: [1] 12
For the immutable objects we worked with in previous hours, copying an object was very straightforward. Once we have copied an object, all links between the new object and the original object are lost. For example, consider an object, y, that we clone from another object, x, in the following example:
> x <- 5
> y <- x
The object y is a clone of x, and at this point both objects have the same value, 5. However, there is no link between them. We can change the value of x to 6, but y still retains the value 5, as you can see here:
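The original console output is not reproduced at this point. A minimal sketch of the step just described, using the values from the text, might look like this:

```r
x <- 5
y <- x   # y starts out as a copy of x

x <- 6   # change x only

x        # 6
y        # 5: y is unaffected by the change to x
```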
Mutable objects do not behave like this. Consider the object a that we created and modified in Listing 22.3. The object has the modIntRef class and is therefore mutable. Now let’s try to copy a in the traditional way to create a new object, b:
> # Remind ourselves of the value of a
> a
Reference class object of class "modIntRef"
Field "x":
[1] 2
Field "modulus":
[1] 12
> # Create b as a copy of a in the traditional way
> b <- a
> b
Reference class object of class "modIntRef"
Field "x":
[1] 2
Field "modulus":
[1] 12
Now we add 1 to a using our addNumber method:
> a$addNumber(1L)
> a
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12
> b
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12
The object b has also been updated! This is updating by reference and is a property of mutable objects. It can be extremely useful, but to those unfamiliar with the concept, it is also a potentially dangerous trap. Luckily, all Reference Classes inherit from a base envRefClass object that has a copy method. The copy method enables us to copy in the traditional manner. Here’s an example:
> a <- modIntRef$new(x = 3L, modulus = 12L)
> b <- a$copy()
> b
Reference class object of class "modIntRef"
Field "x":
[1] 3
Field "modulus":
[1] 12
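To check that a copy really is independent, we can compare it with a plain assignment. The sketch below uses modIntSketch, a hypothetical cut-down stand-in for modIntRef with numeric fields (an assumption, because the full class definition is not repeated here).

```r
library(methods)

# Hypothetical stand-in for modIntRef, reduced to what the sketch needs
modIntSketch <- setRefClass("modIntSketch",
  fields = list(x = "numeric", modulus = "numeric"),
  methods = list(
    addNumber = function(aNumber) {
      x <<- (x + aNumber) %% modulus
      invisible(.self)
    }
  ))

a <- modIntSketch$new(x = 3, modulus = 12)
b <- a          # b refers to the same object as a
d <- a$copy()   # d is an independent copy
a$addNumber(1)
a$x   # 4
b$x   # 4, because b and a are the same object
d$x   # 3, because the copy is unaffected
```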
It is actually much simpler to document a Reference Class system than an S4 system. This is because methods are stored with the class as opposed to being linked via generic functions. We therefore need only document the class. A special @field tag is used for documenting class fields.
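As a rough sketch of what this might look like, assuming the tag is written in roxygen2-style comments placed directly above the class definition. The class thermoSketch, its field, and its method are hypothetical.

```r
library(methods)

#' A hypothetical Reference Class used to illustrate documentation
#'
#' @field celsius Numeric value holding the current temperature.
thermoSketch <- setRefClass("thermoSketch",
  fields = list(celsius = "numeric"),
  methods = list(
    warm = function(degrees) {
      "Raise the stored temperature by degrees."
      celsius <<- celsius + degrees
      invisible(.self)
    }
  ))

th <- thermoSketch$new(celsius = 20)
th$warm(5)
th$celsius
```

The string on the first line of the method body is a self-documentation string, which Reference Classes store alongside the method.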
The R6 class system was developed by Winston Chang and first released to CRAN in 2014. The name builds on the “R5” nickname given to R’s standard Reference Class implementation. The R6 implementation is essentially a variant of the Reference Class implementation that does not rely on S4 classes.
The R6 system is not part of base R. It is contained within a package called R6 that must be installed from CRAN. Once it is loaded, we can define a new R6 class by using the R6Class function. After that, the syntax of the R6 system is extremely similar to that of R’s standard Reference Class system. We instantiate new objects using the new method and can define an initialize method to check inputs and construct the object.
One potential advantage of using the R6 implementation is that it contains the notion of public and private fields and methods, an object-oriented programming concept generally known as encapsulation. The terminology gets very confusing very quickly, but the basic idea is to distinguish between members (fields or methods) that are accessible from anywhere (public) and members that are only accessible from within the class itself (private).
The benefits of encapsulation are probably best described elsewhere, but the main aim is to provide control over what others have access to in your class. Because private methods are not generally available, no other classes can depend on them. This leaves you free to adjust or change the method at a later date. In contrast, a public method is one that you are happy for someone else to use and build upon.
Listing 22.4 walks through a brief but complete example of creating an R6 class with public and private methods. It contains a complete definition of the class, modInt6, and three public methods: initialize, show, and square. To illustrate the concept of private methods, a private method, adjustForModulus, has also been defined. This method ensures that the value of x is always less than the modulus. The method is accessed by the public square method via private$adjustForModulus and updates by reference when called.
One of the main differences in terms of usage between R6 and standard Reference Classes is the use of self to refer to the object instead of the double-headed assignment arrow, <<-.
> library("R6")
> modInt6 <- R6Class("modInt6",
+   # Define public elements
+   public = list(
+     # Fields
+     x = NA,
+     modulus = NA,
+     # Methods
+     initialize = function(x, modulus){
+       if (!missing(x)) {
+         self$x <- x %% modulus
+       }
+       if (!missing(modulus)) {
+         self$modulus <- modulus
+       }
+     },
+     show = function(){
+       cat(self$x, " (mod ", self$modulus, ")", sep = "")
+     },
+     square = function(){
+       self$x <- self$x^2
+       # Use private method to ensure x < modulus
+       private$adjustForModulus()
+     }
+   ),
+   # Define private methods
+   private = list(
+     # Function to ensure correct modulus
+     adjustForModulus = function(){
+       self$x <- self$x %% self$modulus
+     }
+   )
+ )
> a <- modInt6$new(3L, 12L)
> a$show()
3 (mod 12)
> # Now square a
> a$square()
> a$show()
9 (mod 12)
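The distinction between public and private members can also be checked from outside the class. The sketch below uses a hypothetical class, secretSketch, that is separate from modInt6. It assumes the R6 package is installed.

```r
library(R6)

# Hypothetical class showing public versus private access
secretSketch <- R6Class("secretSketch",
  public = list(
    reveal = function() {
      # A public method can reach private members via private$
      private$hidden
    }
  ),
  private = list(
    hidden = "only visible inside the class"
  ))

s <- secretSketch$new()
s$reveal()          # works: access goes through a public method
is.null(s$hidden)   # TRUE: the private field cannot be reached with $
```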
There is plenty more that R6 classes can offer; however, the usage is very similar to that of standard Reference Classes.
Note: Active Bindings
The notion of active bindings is also supported in R6. Active bindings look like fields but call a function each time they are accessed.
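As a minimal sketch of an active binding, consider a hypothetical circleSketch class whose area is recomputed whenever it is read. The class and member names are assumptions made for illustration. The active argument to R6Class is where such bindings are declared.

```r
library(R6)

# Hypothetical class with one ordinary field and one active binding
circleSketch <- R6Class("circleSketch",
  public = list(
    radius = 1,
    initialize = function(radius = 1) {
      self$radius <- radius
    }
  ),
  active = list(
    # Looks like a field, but the function runs on every access
    area = function(value) {
      if (!missing(value)) stop("area is read-only", call. = FALSE)
      pi * self$radius^2
    }
  ))

circ <- circleSketch$new(radius = 2)
circ$area          # computed from the current radius
circ$radius <- 3
circ$area          # reflects the new radius without an explicit update
```

Because the function receives a value only when someone assigns to the binding, checking missing(value) is one way to make it read-only.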
The object-oriented programming options available in R are by no means limited to the set you have seen in the past two hours. The R.oo package has been around since 2001 and provides convenience wrappers for setting up S3 classes as well as an Object class that you can extend in order to create objects that can be modified by reference.
Another relatively popular alternative is the proto package. The proto package enables prototype programming, a form of object-oriented programming with no classes! Beyond that, there are a few more packages that implement forms of object-oriented programming, but we won’t describe them all here. No doubt more will be written in the future.
Following on from Hour 21, where we introduced you to the concept of writing an S3 class, we have now looked in greater detail at R’s more formal class systems, S4 and Reference Classes, including a brief tour of the R6 implementation and some of the other options available. Each of the implementations has its advantages and disadvantages, and it is up to you to decide which, if any, is of most use to you. It’s worth bearing in mind, however, that R has been written in order to be flexible and fast to type. It has not been written in order to facilitate object-oriented programming!
In the “Activities” section, you now have the opportunity to build your own S4 and Reference Classes and develop methods for these classes.
A. If you’re starting out with classes, then S3 or S4 classes are a good place to start because they’re not too dissimilar from standard R coding. If you’re comfortable with the concepts of object-oriented programming, then one of the two forms of reference classes discussed in this hour will give you a lot more control. Be aware, however, that as the level of control increases, flexibility tends to be reduced.
Q. If S3 classes have the convention [genericFunction].[class], what are the S4 and Reference Class naming conventions?
A. There is no required naming convention due to the different dispatch mechanism used by setMethod for S4 classes and the message-passing approach used in Reference Classes. The “lowerCamelCase” naming convention is extremely popular for classes and indeed any objects in R. There is also a growing trend of using underscores to separate words within an object name.
The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.
1. True or false? An S4 object is a special type of list.
2. True or false? A Reference Class object is a special type of list.
3. What is multiple dispatch?
4. What is a mutable object?
5. What is the difference between a slot and a field?
1. False. It can be helpful to think of an S4 object as being like a list, but it is not. For one thing, we access elements using @ as opposed to $.
2. False. A Reference Class object may appear even more like a list than an S4 object due to the $ syntax we use. However, it is actually an environment, not a list.
3. In generic function object orientation, method dispatch controls which method is selected when a generic function is called. When the dispatch mechanism can depend on multiple arguments, we call this multiple dispatch.
4. A mutable object is simply one that can be changed. In R, we typically deal with immutable objects. Instead of changing an object, we overwrite it with a new value. Reference Class objects are mutable, however.
5. We say “slots” when working with S4 classes and “fields” when working with all forms of reference class, but they essentially refer to the same thing.
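To make answer 3 concrete, here is a small sketch of multiple dispatch in S4. The generic describePair is a hypothetical name. The method chosen depends on the classes of both arguments.

```r
library(methods)

# Hypothetical generic that dispatches on the classes of both arguments
setGeneric("describePair", function(a, b) standardGeneric("describePair"))

setMethod("describePair", signature("numeric", "numeric"),
  function(a, b) "two numbers")
setMethod("describePair", signature("numeric", "character"),
  function(a, b) "a number and a string")
setMethod("describePair", signature("character", "ANY"),
  function(a, b) "a string and anything else")

describePair(1, 2)        # uses the numeric, numeric method
describePair(1, "x")      # uses the numeric, character method
describePair("x", TRUE)   # falls back to the character, ANY method
```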
1. Define a new S4 class. The aim of the class is to store simulated data from various known statistical distributions. In order to construct the new class, you need to create the following:
A constructor function that takes inputs n and distribution, representing the number of values to sample and the distribution to sample from. Ensure that the function has the option for other parameter arguments, as needed.
A print method that displays a table of summary statistics for the simulated data (mean, median, standard deviation, min, and max).
A new generic combine method that enables two objects (provided they are of the same distribution) to be combined to form a new set of samples, where the total number of samples is the sum of the number of samples from the original objects.
2. Define a new Reference Class. The aim of the class is to store financial account information:
Define the class as standardAccount. The class should have a single field, balance, that defaults to $50 (a minimum initial deposit to set up the account).
Write methods called deposit and withdraw that update the account balance field when called. The withdraw method should not allow the balance to go into the red (that is, fall below zero).
Extend the class by creating a new class, goldAccount. The goldAccount class should allow an overdraft of $1,000.