In the previous section, we introduced the S3 system. Unlike the object-oriented systems in most other programming languages, the S3 system is much less strict than a system in which classes are defined with a fixed structure and certain method dispatch as the program compiles. When we define an S3 class, almost nothing can be sure. We can not only add or remove methods of the class at any time but also insert or delete data elements from the object as we wish. In addition, S3 only supports single dispatch, that is, methods are chosen according to the class of only one argument, mostly the first argument.
Then, R introduces a more formal and stricter object-oriented system, S4. This system allows us to define formal classes with pre-specified definition and inheritance structure. It also supports multiple dispatch, that is, methods are chosen according to the classes of multiple arguments.
In this section, you will learn how to define S4 classes and methods.
Unlike S3 classes, which are simply represented by character vectors, S4 classes require formal definition of classes and methods. To define an S4 class, we need to call setClass
and supply a representation of the class members, which are called slots. The representation is specified by the name and class of each slot. In this section, we'll redefine the product objects using an S4 class:
setClass("Product", representation(name = "character", price = "numeric", inventory = "integer"))
Once the class is defined, we can get the slots from its class definition by getSlots()
:
getSlots("Product") ## name price inventory ## "character" "numeric" "integer"
S4 is stricter than S3, not only because S4 requires class definition, but also because R will ensure that the classes of the members that create a new instance are consistent with the class representation. Now, we will use new()
to create a new instance of an S4 class and specify the values of the slots:
laptop <- new("Product", name = "Laptop-A", price = 299, inventory = 100) ## Error in validObject(.Object): invalid class "Product" object: invalid object for slot "inventory" in class "Product": got class "numeric", should be or extend class "integer"
It might surprise you that the preceding code produces an error. If you take a closer look at the class representation, you will find that inventory
must be an integer. In other words, 100
is a numeric value, which is not of class integer
. It requires 100L
instead:
laptop <- new("Product", name = "Laptop-A", price = 299, inventory = 100L) laptop ## An object of class "Product" ## Slot "name": ## [1] "Laptop-A" ## ## Slot "price": ## [1] 299 ## ## Slot "inventory": ## [1] 100
Now, a new instance of Product
, laptop
, is created. It is printed as an object of class Product
. The values of all slots are automatically printed.
For an S4 object, we can still use typeof()
and class()
to get some type information:
typeof(laptop) ## [1] "S4" class(laptop) ## [1] "Product" ## attr(,"package") ## [1] ".GlobalEnv"
This time, the type is S4
instead of list
or other data types, and the class is the name of the S4 class. The S4 object is also a first-class citizen in R because it has a checking function:
isS4(laptop) ## [1] TRUE
Unlike accessing a list or environment with $
, we need to use @
to access a slot of an S4 object:
laptop@price * laptop@inventory ## [1] 29900
Alternatively, we can call slot()
to access a slot with its name as a string. This is equivalent to accessing an element of a list or environment with double brackets ([[]]
):
slot(laptop, "price") ## [1] 299
We can also modify an S4 object in the same way we modify a list:
laptop@price <- 289
However, we cannot supply to a slot something that is not consistent with the class representation:
laptop@inventory <- 200 ## Error in (function (cl, name, valueClass) : assignment of an object of class "numeric" is not valid for @'inventory' in an object of class "Product"; is(value, "integer") is not TRUE
Neither can we create a new slot just like adding a new element to a list because the structure of an S4 object is fixed to its class representation:
laptop@value <- laptop@price * laptop@inventory ## Error in (function (cl, name, valueClass) : 'value' is not a slot in class "Product"
Now, we will create another instance with the values of slots partially supplied:
toy <- new("Product", name = "Toys", price = 10) toy ## An object of class "Product" ## Slot "name": ## [1] "Toys" ## ## Slot "price": ## [1] 10 ## ## Slot "inventory": ## integer(0)
The preceding code does not specify inventory
, so the resulting object, toy
, takes an empty integer vector as inventory
. If you think it is not a good default value, we can specify a prototype of the class so that each instance will be created from it as a template:
setClass("Product", representation(name = "character", price = "numeric", inventory = "integer"), prototype(name = "Unnamed", price = NA_real_, inventory = 0L))
In the preceding prototype, we set the default value of price
to be the numeric missing value and inventory to be integer zero. Note that NA
is logical and cannot be used here because it is not consistent with the class representation.
Then, we will recreate toy
with the same code:
toy <- new("Product", name = "Toys", price = 5) toy ## An object of class "Product" ## Slot "name": ## [1] "Toys" ## ## Slot "price": ## [1] 5 ## ## Slot "inventory": ## [1] 0
This time, inventory
takes the default value 0L
from the prototype. However, what if we need more constraints on the input arguments? Although the classes of the arguments are checked, we can still supply values that are not meaningful as an instance of Product
. For example, we can create a bottle
class with negative inventory:
bottle <- new("Product", name = "Bottle", price = 1.5, inventory = -2L) bottle ## An object of class "Product" ## Slot "name": ## [1] "Bottle" ## ## Slot "price": ## [1] 1.5 ## ## Slot "inventory": ## [1] -2
The following code is a validation function that ensures that the slots of a Product
object are meaningful. The validation function is somehow special because when there is no error about the input object, it should return TRUE
. When there are errors, it should return a character vector that describe the errors. Therefore, it is best not to use stop()
or warning()
when a slot is not valid.
Here, we will validate the object by checking the length of each slot and whether they are missing values. Also, the price must be positive, and the inventory must be non-negative:
validate_product <- function(object) { errors <- c( if (length(object@name) != 1) "Length of name should be 1" else if (is.na(object@name)) "name should not be missing value", if (length(object@price) != 1) "Length of price should be 1" else if (is.na(object@price)) "price should not be missing value" else if (object@price <= 0) "price must be positive", if (length(object@inventory) != 1) "Length of inventory should be 1" else if (is.na(object@inventory)) "inventory should not be missing value" else if (object@inventory < 0) "inventory must be non-negative") if (length(errors) == 0) TRUE else errors }
We write a long combination of values to make up the error messages. This works because if (FALSE) expr
returns NULL
and c(x, NULL)
returns x
. At last, if no error message is produced, the function returns TRUE
, otherwise it returns the error messages.
With this function defined, we can directly use it to validate bottle
:
validate_product(bottle) ## [1] "inventory must be non-negative"
The validation results in an error message as supposed. Now, we need to make the class perform validation each time an instance is being created. We only need to specify the validity
argument when we use setClass
for Product
class:
setClass("Product", representation(name = "character", price = "numeric", inventory = "integer"), prototype(name = "Unnamed", price = NA_real_, inventory = 0L), validity = validate_product)
Then, each time we try to create an instance of the Product
class, the supplied values are automatically checked. Even the prototype is checked. Here are two cases that fail the validation:
bottle <- new("Product", name = "Bottle") ## Error in validObject(.Object): invalid class "Product" object: price should not be missing value
The preceding code fails because the default value of price
is NA_real_
in the prototype. In the validation, however, the price cannot be a missing value:
bottle <- new("Product", name = "Bottle", price = 3, inventory = -2L) ## Error in validObject(.Object): invalid class "Product" object: inventory must be non-negative
This fails because inventory
must be a non-negative integer.
Note that the validation only occurs when we create a new instance of an S4 class. Once the object is created, however, the validation does not happen anymore. In other words, we can still set a slot to a bad value unless we explicitly validate it.
The S3 system is loose and flexible. Each S3 object of the same class may have different members. For S4, however, this cannot happen, that is, we cannot arbitrarily add a slot that is not in the class definition when we create a new instance of the class.
For example, we cannot put a volume
slot when we create a new instance of Product
:
bottle <- new("Product", name = "Bottle", price = 3, inventory = 100L, volume = 15) ## Error in initialize(value, ...): invalid name for slot of class "Product": volume
Instead, we can only do this through proper inheritance. We need to create a new class that contains (or inherits from) the original class. In this case, we can define a Container
class that inherits from Product
and has a new numeric slot named volume
:
setClass("Container", representation(volume = "numeric"), contains = "Product")
Since Container
inherits from Product
, any instance of Container
has all the slots of Product
. We can use getSlots()
to view them:
getSlots("Container") ## volume name price inventory ## "numeric" "character" "numeric" "integer"
Now, we can create an instance of Container
that has a volume
slot:
bottle <- new("Container", name = "Bottle", price = 3, inventory = 100L, volume = 15)
Note that the validation of Product
still functions when we create an instance of Container
:
bottle <- new("Container", name = "Bottle", price = 3, inventory = -10L, volume = 15) ## Error in validObject(.Object): invalid class "Container" object: inventory must be non-negative
Therefore, the checking ensures it is a valid Product
class, but it still does not check anything about Container
:
bottle <- new("Container", name = "Bottle", price = 3, inventory = 100L, volume = -2)
Just like we defined a validation function for Product
, we can define another for Container
:
validate_container <- function(object) { errors <- c( if (length(object@volume) != 1) "Length of volume must be 1", if (object@volume <= 0) "volume must be positive" ) if (length(errors) == 0) TRUE else errors }
Then, we will redefine Container
with this validation function:
setClass("Container", representation(volume = "numeric"), contains = "Product", validity = validate_container)
Note that we don't need to call validate_product
in validate_container
because both validation functions will be called in turn to make sure all classes in the inheritance chain are properly checked with their validation functions. You may add some text-printing code to the validating functions to confirm that validate_product
is always called before validate_container
when we create an instance of Container
:
bottle <- new("Container", name = "Bottle", price = 3, inventory = 100L, volume = -2) ## Error in validObject(.Object): invalid class "Container" object: volume must be positive bottle <- new("Container", name = "Bottle", price = 3, inventory = -5L, volume = 10) ## Error in validObject(.Object): invalid class "Container" object: inventory must be non-negative
In the previous examples, we saw that S4 is much more formal than S3 because the S4 class requires a class definition. Likewise, S4 generic functions are more formal too.
Here is an example where we define a series of S4 classes with a simple hierarchy of inheritance relationships. The example is about shapes. First, Shape
is a root class. Both Polygon
and Circle
inherit from Shape
, while Triangle
and Rectangle
inherit from Polygon
. The inheritance structure of these shapes is illustrated here:
Each class except Shape
has some necessary slots to describe itself:
setClass("Shape") setClass("Polygon", representation(sides = "integer"), contains = "Shape") setClass("Triangle", representation(a = "numeric", b = "numeric", c = "numeric"), prototype(a = 1, b = 1, c = 1, sides = 3L), contains = "Polygon") setClass("Rectangle", representation(a = "numeric", b = "numeric"), prototype(a = 1, b = 1, sides = 4L), contains = "Polygon") setClass("Circle", representation(r = "numeric"), prototype(r = 1, sides = Inf), contains = "Shape")
With these classes defined, we can set up a generic function to calculate the area of a Shape
object. To do this, we need to call setGeneric()
upon area
and supply a function that calls standardGeneric("area")
to make area
a generic function and ready for S4 method dispatch. The valueClass
is used to ensure that the return value of each method must be of class numeric
:
setGeneric("area", function(object) { standardGeneric("area") }, valueClass = "numeric") ## [1] "area"
Once the generic function is set up, we go on to implement different methods for different kinds of shapes. For Triangle
, we use Heron's formula (https://en.wikipedia.org/wiki/Heron's_formula) to calculate its area, given the lengths of the three sides:
setMethod("area", signature("Triangle"), function(object) { a <- object@a b <- object@b c <- object@c s <- (a + b + c) / 2 sqrt(s * (s - a) * (s - b) * (s - c)) }) ## [1] "area"
For Rectangle
and Circle
, it is easy to write out the area formula for each of them:
setMethod("area", signature("Rectangle"), function(object) { object@a * object@b }) ## [1] "area" setMethod("area", signature("Circle"), function(object) { pi * object@r ^ 2 }) ## [1] "area"
Now, we can create an instance of Triangle
and see whether area()
dispatches to the correct method and returns the correct answer:
triangle <- new("Triangle", a = 3, b = 4, c = 5) area(triangle) ## [1] 6
We also create an instance of Circle
and see whether method dispatch works:
circle <- new("Circle", r = 3) area(circle) ## [1] 28.27433
Both answers are correct. The area()
function just works like an S3 generic function that performs method dispatch according to the class of the input object.
An S4 generic function is more flexible because it also supports multiple dispatch, that is, it can perform method dispatch according to the classes of multiple arguments.
Here, we will define another family of S4 classes: Object
with a numeric height
. Both Cylinder
and Cone
inherit from Object
. Later, we will use multiple dispatch to calculate the volume of a certain type of geometric object with a certain shape of the bottom surface:
setClass("Object", representation(height = "numeric")) setClass("Cylinder", contains = "Object") setClass("Cone", contains = "Object")
Now, we will define a new generic function named volume
. As its name suggests, this function is used to calculate the volume of an object that is described by the shape of the bottom surface and the form of the object:
setGeneric("volume", function(shape, object) standardGeneric("volume")) ## [1] "volume"
In the following code, we will implement two cases: one is for a rectangle-shaped cylinder and the other is for a rectangle-shaped cone:
setMethod("volume", signature("Rectangle", "Cylinder"), function(shape, object) { shape@a * shape@b * object@height }) ## [1] "volume" setMethod("volume", signature("Rectangle", "Cone"), function(shape, object) { shape@a * shape@b * object@height / 3 }) ## [1] "volume"
Note that all existing methods for volume
require two arguments. Therefore, the method dispatch happens with both arguments, that is, it requires the classes of both input objects to match to choose the correct method. Now, we will test volume
with an instance of Rectagle
and an instance of Cylinder
:
rectangle <- new("Rectangle", a = 2, b = 3) cylinder <- new("Cylinder", height = 3) volume(rectangle, cylinder) ## [1] 18
Since a relationship holds for a cylinder and a cone with the same height and the shape of bottom surface, the volume of the cylinder is three times that of the cone. To simplify the implementation of volume
methods, we can directly put Shape
in the method signature and call area()
of the shape and directly use its area in the calculation:
setMethod("volume", signature("Shape", "Cylinder"), function(shape, object) { area(shape) * object@height }) ## [1] "volume" setMethod("volume", signature("Shape", "Cone"), function(shape, object) { area(shape) * object@height / 3 }) ## [1] "volume"
Now, volume
is automatically applicable to Circle
:
circle <- new("Circle", r = 2) cone <- new("Cone", height = 3) volume(circle, cone) ## [1] 12.56637
To make volume
easier to use, we can also define a method that takes an instance of Shape
and a numeric value as the height of the cylinder:
setMethod("volume", signature("Shape", "numeric"), function(shape, object) { area(shape) * object }) ## [1] "volume"
Then, we can directly use numeric values in calculating the volume of the cylinder given its shape and height:
volume(rectangle, 3) ## [1] 18
Furthermore, we can simplify the notation by implementing a method of *
:
setMethod("*", signature("Shape", "Object"), function(e1, e2) { volume(e1, e2) }) ## [1] "*"
Now, we can calculate the volume by simply multiplying the shape and the object form:
rectangle * cone ## [1] 6
Note that an S4 object is not a list or environment, but it has copy-on-modify semantics. In this sense, when the value of a slot of an S4 object is modified by <-
in a function, it behaves more like a list, that is, the S4 object is copied in the function and the original object is not modified.
For example, in the following code, we will define a function that tries to lengthen Object
by multiplying its height with a numeric factor:
lengthen <- function(object, factor) { object@height <- object@height * factor object }
When we apply this function on cylinder
, which we previously created, its height is not changed at all. Instead, it is copied inside the function:
cylinder ## An object of class "Cylinder" ## Slot "height": ## [1] 3 lengthen(cylinder, 2) ## An object of class "Cylinder" ## Slot "height": ## [1] 6 cylinder ## An object of class "Cylinder" ## Slot "height": ## [1] 3