Working with S4

In the previous section, we introduced the S3 system. Unlike the object-oriented systems in most other programming languages, the S3 system is much less strict than a system in which classes are defined with a fixed structure and certain method dispatch as the program compiles. When we define an S3 class, almost nothing can be sure. We can not only add or remove methods of the class at any time but also insert or delete data elements from the object as we wish. In addition, S3 only supports single dispatch, that is, methods are chosen according to the class of only one argument, mostly the first argument.

Then, R introduces a more formal and stricter object-oriented system, S4. This system allows us to define formal classes with pre-specified definition and inheritance structure. It also supports multiple dispatch, that is, methods are chosen according to the classes of multiple arguments.

In this section, you will learn how to define S4 classes and methods.

Defining S4 classes

Unlike S3 classes, which are simply represented by character vectors, S4 classes require formal definition of classes and methods. To define an S4 class, we need to call setClass and supply a representation of the class members, which are called slots. The representation is specified by the name and class of each slot. In this section, we'll redefine the product objects using an S4 class:

setClass("Product",  
  representation(name = "character",  
    price = "numeric",  
    inventory = "integer")) 

Once the class is defined, we can get the slots from its class definition by getSlots():

getSlots("Product") 
##        name       price   inventory  
## "character"   "numeric"   "integer" 

S4 is stricter than S3, not only because S4 requires class definition, but also because R will ensure that the classes of the members that create a new instance are consistent with the class representation. Now, we will use new() to create a new instance of an S4 class and specify the values of the slots:

laptop <- new("Product", name = "Laptop-A", price = 299, inventory = 100) 
## Error in validObject(.Object): invalid class "Product" object: invalid object for slot "inventory" in class "Product": got class "numeric", should be or extend class "integer" 

It might surprise you that the preceding code produces an error. If you take a closer look at the class representation, you will find that inventory must be an integer. In other words, 100 is a numeric value, which is not of class integer. It requires 100L instead:

laptop <- new("Product", name = "Laptop-A", price = 299, inventory = 100L) 
laptop 
## An object of class "Product" 
## Slot "name": 
## [1] "Laptop-A" 
##  
## Slot "price": 
## [1] 299 
##  
## Slot "inventory": 
## [1] 100 

Now, a new instance of Productlaptop, is created. It is printed as an object of class Product. The values of all slots are automatically printed.

For an S4 object, we can still use typeof() and class() to get some type information:

typeof(laptop) 
## [1] "S4" 
class(laptop) 
## [1] "Product" 
## attr(,"package") 
## [1] ".GlobalEnv" 

This time, the type is S4 instead of list or other data types, and the class is the name of the S4 class. The S4 object is also a first-class citizen in R because it has a checking function:

isS4(laptop) 
## [1] TRUE 

Unlike accessing a list or environment with $, we need to use @ to access a slot of an S4 object:

laptop@price * laptop@inventory 
## [1] 29900 

Alternatively, we can call slot() to access a slot with its name as a string. This is equivalent to accessing an element of a list or environment with double brackets ([[]]):

slot(laptop, "price") 
## [1] 299 

We can also modify an S4 object in the same way we modify a list:

laptop@price <- 289 

However, we cannot supply to a slot something that is not consistent with the class representation:

laptop@inventory <- 200 
## Error in (function (cl, name, valueClass) : assignment of an object of class "numeric" is not valid for @'inventory' in an object of class "Product"; is(value, "integer") is not TRUE 

Neither can we create a new slot just like adding a new element to a list because the structure of an S4 object is fixed to its class representation:

laptop@value <- laptop@price * laptop@inventory 
## Error in (function (cl, name, valueClass) : 'value' is not a slot in class "Product" 

Now, we will create another instance with the values of slots partially supplied:

toy <- new("Product", name = "Toys", price = 10) 
toy 
## An object of class "Product" 
## Slot "name": 
## [1] "Toys" 
##  
## Slot "price": 
## [1] 10 
##  
## Slot "inventory": 
## integer(0) 

The preceding code does not specify inventory, so the resulting object, toy, takes an empty integer vector as inventory. If you think it is not a good default value, we can specify a prototype of the class so that each instance will be created from it as a template:

setClass("Product",  
  representation(name = "character",  
    price = "numeric",  
    inventory = "integer"), 
  prototype(name = "Unnamed", price = NA_real_, inventory = 0L)) 

In the preceding prototype, we set the default value of price to be the numeric missing value and inventory to be integer zero. Note that NA is logical and cannot be used here because it is not consistent with the class representation.

Then, we will recreate toy with the same code:

toy <- new("Product", name = "Toys", price = 5) 
toy 
## An object of class "Product" 
## Slot "name": 
## [1] "Toys" 
##  
## Slot "price": 
## [1] 5 
##  
## Slot "inventory": 
## [1] 0 

This time, inventory takes the default value 0L from the prototype. However, what if we need more constraints on the input arguments? Although the classes of the arguments are checked, we can still supply values that are not meaningful as an instance of Product. For example, we can create a bottle class with negative inventory:

bottle <- new("Product", name = "Bottle", price = 1.5, inventory = -2L) 
bottle 
## An object of class "Product" 
## Slot "name": 
## [1] "Bottle" 
##  
## Slot "price": 
## [1] 1.5 
##  
## Slot "inventory": 
## [1] -2 

The following code is a validation function that ensures that the slots of a Product object are meaningful. The validation function is somehow special because when there is no error about the input object, it should return TRUE. When there are errors, it should return a character vector that describe the errors. Therefore, it is best not to use stop() or warning() when a slot is not valid.

Here, we will validate the object by checking the length of each slot and whether they are missing values. Also, the price must be positive, and the inventory must be non-negative:

validate_product <- function(object) { 
  errors <- c( 
    if (length(object@name) != 1)   
      "Length of name should be 1"  
    else if (is.na(object@name))  
      "name should not be missing value", 
     
    if (length(object@price) != 1)  
      "Length of price should be 1" 
    else if (is.na(object@price))  
      "price should not be missing value" 
    else if (object@price <= 0)  
      "price must be positive", 
     
    if (length(object@inventory) != 1)  
      "Length of inventory should be 1" 
    else if (is.na(object@inventory)) 
      "inventory should not be missing value" 
    else if (object@inventory < 0)  
      "inventory must be non-negative") 
  if (length(errors) == 0) TRUE else errors 
} 

We write a long combination of values to make up the error messages. This works because if (FALSE) expr returns NULL and c(x, NULL) returns x. At last, if no error message is produced, the function returns TRUE, otherwise it returns the error messages.

With this function defined, we can directly use it to validate bottle:

validate_product(bottle) 
## [1] "inventory must be non-negative" 

The validation results in an error message as supposed. Now, we need to make the class perform validation each time an instance is being created. We only need to specify the validity argument when we use setClass for Product class:

setClass("Product",  
  representation(name = "character",  
    price = "numeric",  
    inventory = "integer"), 
  prototype(name = "Unnamed",  
    price = NA_real_, inventory = 0L), 
  validity = validate_product) 

Then, each time we try to create an instance of the Product class, the supplied values are automatically checked. Even the prototype is checked. Here are two cases that fail the validation:

bottle <- new("Product", name = "Bottle") 
## Error in validObject(.Object): invalid class "Product" object: price should not be missing value 

The preceding code fails because the default value of price is NA_real_ in the prototype. In the validation, however, the price cannot be a missing value:

bottle <- new("Product", name = "Bottle", price = 3, inventory = -2L) 
## Error in validObject(.Object): invalid class "Product" object: inventory must be non-negative 

This fails because inventory must be a non-negative integer.

Note that the validation only occurs when we create a new instance of an S4 class. Once the object is created, however, the validation does not happen anymore. In other words, we can still set a slot to a bad value unless we explicitly validate it.

Understanding S4 inheritance

The S3 system is loose and flexible. Each S3 object of the same class may have different members. For S4, however, this cannot happen, that is, we cannot arbitrarily add a slot that is not in the class definition when we create a new instance of the class.

For example, we cannot put a volume slot when we create a new instance of Product:

bottle <- new("Product", name = "Bottle",  
  price = 3, inventory = 100L, volume = 15) 
## Error in initialize(value, ...): invalid name for slot of class "Product": volume 

Instead, we can only do this through proper inheritance. We need to create a new class that contains (or inherits from) the original class. In this case, we can define a Container class that inherits from Product and has a new numeric slot named volume:

setClass("Container",  
  representation(volume = "numeric"),  
  contains = "Product") 

Since Container inherits from Product, any instance of Container has all the slots of Product. We can use getSlots() to view them:

getSlots("Container") 
##      volume        name       price   inventory  
##   "numeric" "character"   "numeric"   "integer" 

Now, we can create an instance of Container that has a volume slot:

bottle <- new("Container", name = "Bottle", 
  price = 3, inventory = 100L, volume = 15) 

Note that the validation of Product still functions when we create an instance of Container:

bottle <- new("Container", name = "Bottle", 
  price = 3, inventory = -10L, volume = 15) 
## Error in validObject(.Object): invalid class "Container" object: inventory must be non-negative 

Therefore, the checking ensures it is a valid Product class, but it still does not check anything about Container:

bottle <- new("Container", name = "Bottle", 
  price = 3, inventory = 100L, volume = -2) 

Just like we defined a validation function for Product, we can define another for Container:

validate_container <- function(object) { 
  errors <- c( 
    if (length(object@volume) != 1) 
      "Length of volume must be 1", 
    if (object@volume <= 0) 
      "volume must be positive" 
  ) 
  if (length(errors) == 0) TRUE else errors 
} 

Then, we will redefine Container with this validation function:

setClass("Container",  
  representation(volume = "numeric"),  
  contains = "Product", 
  validity = validate_container) 

Note that we don't need to call validate_product in validate_container because both validation functions will be called in turn to make sure all classes in the inheritance chain are properly checked with their validation functions. You may add some text-printing code to the validating functions to confirm that validate_product is always called before validate_container when we create an instance of Container:

bottle <- new("Container", name = "Bottle", 
  price = 3, inventory = 100L, volume = -2) 
## Error in validObject(.Object): invalid class "Container" object: volume must be positive 
bottle <- new("Container", name = "Bottle", 
  price = 3, inventory = -5L, volume = 10) 
## Error in validObject(.Object): invalid class "Container" object: inventory must be non-negative 

Defining S4 generic functions

In the previous examples, we saw that S4 is much more formal than S3 because the S4 class requires a class definition. Likewise, S4 generic functions are more formal too.

Here is an example where we define a series of S4 classes with a simple hierarchy of inheritance relationships. The example is about shapes. First, Shape is a root class. Both Polygon and Circle inherit from Shape, while Triangle and Rectangle inherit from Polygon. The inheritance structure of these shapes is illustrated here:

Defining S4 generic functions

Each class except Shape has some necessary slots to describe itself:

setClass("Shape") 
setClass("Polygon",  
  representation(sides = "integer"),  
  contains = "Shape") 
setClass("Triangle",  
  representation(a = "numeric", b = "numeric", c = "numeric"),  
  prototype(a = 1, b = 1, c = 1, sides = 3L), 
  contains = "Polygon") 
setClass("Rectangle", 
  representation(a = "numeric", b = "numeric"), 
  prototype(a = 1, b = 1, sides = 4L), 
  contains = "Polygon") 
setClass("Circle", 
  representation(r = "numeric"), 
  prototype(r = 1, sides = Inf), 
  contains = "Shape") 

With these classes defined, we can set up a generic function to calculate the area of a Shape object. To do this, we need to call setGeneric() upon area and supply a function that calls standardGeneric("area") to make area a generic function and ready for S4 method dispatch. The valueClass is used to ensure that the return value of each method must be of class numeric:

setGeneric("area", function(object) { 
  standardGeneric("area") 
}, valueClass = "numeric") 
## [1] "area" 

Once the generic function is set up, we go on to implement different methods for different kinds of shapes. For Triangle, we use Heron's formula (https://en.wikipedia.org/wiki/Heron's_formula) to calculate its area, given the lengths of the three sides:

setMethod("area", signature("Triangle"), function(object) { 
  a <- object@a 
  b <- object@b 
  c <- object@c 
  s <- (a + b + c) / 2 
  sqrt(s * (s - a) * (s - b) * (s - c)) 
}) 
## [1] "area" 

For Rectangle and Circle, it is easy to write out the area formula for each of them:

setMethod("area", signature("Rectangle"), function(object) { 
  object@a * object@b 
}) 
## [1] "area" 
setMethod("area", signature("Circle"), function(object) { 
  pi * object@r ^ 2 
}) 
## [1] "area" 

Now, we can create an instance of Triangle and see whether area() dispatches to the correct method and returns the correct answer:

triangle <- new("Triangle", a = 3, b = 4, c = 5) 
area(triangle) 
## [1] 6 

We also create an instance of Circle and see whether method dispatch works:

circle <- new("Circle", r = 3) 
area(circle) 
## [1] 28.27433 

Both answers are correct. The area() function just works like an S3 generic function that performs method dispatch according to the class of the input object.

Understanding multiple dispatch

An S4 generic function is more flexible because it also supports multiple dispatch, that is, it can perform method dispatch according to the classes of multiple arguments.

Here, we will define another family of S4 classes: Object with a numeric height. Both Cylinder and Cone inherit from Object. Later, we will use multiple dispatch to calculate the volume of a certain type of geometric object with a certain shape of the bottom surface:

setClass("Object", representation(height = "numeric")) 
setClass("Cylinder", contains = "Object") 
setClass("Cone", contains = "Object") 

Now, we will define a new generic function named volume. As its name suggests, this function is used to calculate the volume of an object that is described by the shape of the bottom surface and the form of the object:

setGeneric("volume",  
  function(shape, object) standardGeneric("volume")) 
## [1] "volume" 

In the following code, we will implement two cases: one is for a rectangle-shaped cylinder and the other is for a rectangle-shaped cone:

setMethod("volume", signature("Rectangle", "Cylinder"),  
  function(shape, object) { 
    shape@a * shape@b * object@height 
  }) 
## [1] "volume" 
setMethod("volume", signature("Rectangle", "Cone"), 
  function(shape, object) { 
    shape@a * shape@b * object@height / 3 
  }) 
## [1] "volume" 

Note that all existing methods for volume require two arguments. Therefore, the method dispatch happens with both arguments, that is, it requires the classes of both input objects to match to choose the correct method. Now, we will test volume with an instance of Rectagle and an instance of Cylinder:

rectangle <- new("Rectangle", a = 2, b = 3) 
cylinder <- new("Cylinder", height = 3) 
volume(rectangle, cylinder) 
## [1] 18 

Since a relationship holds for a cylinder and a cone with the same height and the shape of bottom surface, the volume of the cylinder is three times that of the cone. To simplify the implementation of volume methods, we can directly put Shape in the method signature and call area() of the shape and directly use its area in the calculation:

setMethod("volume", signature("Shape", "Cylinder"),  
  function(shape, object) { 
    area(shape) * object@height 
  }) 
## [1] "volume" 
setMethod("volume", signature("Shape", "Cone"), 
  function(shape, object) { 
    area(shape) * object@height / 3 
  }) 
## [1] "volume" 

Now, volume is automatically applicable to Circle:

circle <- new("Circle", r = 2) 
cone <- new("Cone", height = 3) 
volume(circle, cone) 
## [1] 12.56637 

To make volume easier to use, we can also define a method that takes an instance of Shape and a numeric value as the height of the cylinder:

setMethod("volume", signature("Shape", "numeric"), 
  function(shape, object) { 
    area(shape) * object 
  }) 
## [1] "volume" 

Then, we can directly use numeric values in calculating the volume of the cylinder given its shape and height:

volume(rectangle, 3) 
## [1] 18 

Furthermore, we can simplify the notation by implementing a method of *:

setMethod("*", signature("Shape", "Object"),  
  function(e1, e2) { 
    volume(e1, e2) 
  }) 
## [1] "*" 

Now, we can calculate the volume by simply multiplying the shape and the object form:

rectangle * cone 
## [1] 6 

Note that an S4 object is not a list or environment, but it has copy-on-modify semantics. In this sense, when the value of a slot of an S4 object is modified by <- in a function, it behaves more like a list, that is, the S4 object is copied in the function and the original object is not modified.

For example, in the following code, we will define a function that tries to lengthen Object by multiplying its height with a numeric factor:

lengthen <- function(object, factor) { 
  object@height <- object@height * factor 
  object 
} 

When we apply this function on cylinder, which we previously created, its height is not changed at all. Instead, it is copied inside the function:

cylinder 
## An object of class "Cylinder" 
## Slot "height": 
## [1] 3 
lengthen(cylinder, 2) 
## An object of class "Cylinder" 
## Slot "height": 
## [1] 6 
cylinder 
## An object of class "Cylinder" 
## Slot "height": 
## [1] 3 
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset