Object Oriented Programming in R (Part 3): A Practical Guide to the S4 System

This article was first published on Appsilon | Enterprise R Shiny Dashboards and kindly contributed to python-bloggers.

In the previous article, we learned about the first OOP system in R called S3. In this one, we are going to dive into the S4 OOP system.

The S4 system is a more formal OOP system developed by Bell Labs and introduced into the S language in the late 1990s.

Today, we will learn about features of S4 and look at example use cases of this system in the community. We will also learn about some recommended practices to consider when using S4 classes and cover general tips on object-oriented programming in R.

Object-Oriented Programming in R – Our First S4 Class and Method

We will reuse the examples from our OOP in R with R6 – The complete guide article, starting with the dog class.

Defining an S4 Class

Let’s try to recreate the dog class from the previous article but this time using S4. To create a new S4 class we use the setClass function:

setClass(
  Class = "Dog",
  slots = c(
    name = "character",
    age = "numeric"
  )
)

The Class argument defines the name of the class to which we will be referring later on and the slots parameter defines the fields of our class.

Note that each value in the named vector is the class of the corresponding slot. These classes are validated when creating a new object or when changing the value of a slot.

Let’s see this in action! We will try to create a couple of dog objects using the new function!

> d1 <- new("Dog", name = "Milo", age = 4)
> d2 <- new("Dog", name = "Milo", age = "four years old")
Error in validObject(.Object) : 
  invalid class “Dog” object: invalid object for slot "age" in class "Dog": got class "character", should be or extend class "numeric"

As you can see, when we tried to use a character value for the age slot, we got an error.

This is our first example of how S4 is more rigorous compared to S3. In the case of the S3 system we had to manually validate the arguments in our constructors.
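For comparison, here is a minimal sketch of the manual checks an S3 constructor needs to get a similar guarantee. The names new_s3_dog and S4Dog are hypothetical and chosen so that they do not clash with the classes defined in this article.

```r
library(methods)

# Hypothetical S3 constructor: the type checks have to be written by hand.
new_s3_dog <- function(name, age) {
  stopifnot(is.character(name), is.numeric(age))
  structure(list(name = name, age = age), class = "s3_dog")
}

# Equivalent S4 class: slot types are declared once and checked by new().
setClass("S4Dog", slots = c(name = "character", age = "numeric"))

# Both constructors reject a character age.
s3_result <- tryCatch(
  new_s3_dog("Milo", "four"),
  error = function(e) "rejected"
)
s4_result <- tryCatch(
  new("S4Dog", name = "Milo", age = "four"),
  error = function(e) "rejected"
)
```

In the S3 version the check only exists because we remembered to write it. In the S4 version it follows from the class definition.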

Using S4 Slots

We can interact with our object using @ to retrieve fields of the object:

> d1@name
[1] "Milo"
> d1@age
[1] 4

And if we try to change the value of the fields, validation will be performed as well:

> d1@age <- "four years old"
Error in (function (cl, name, valueClass) : 
  assignment of an object of class “character” is not valid for @‘age’ in an object of class “Dog”; is(value, "numeric") is not TRUE
> d1@age <- 5

Another difference between @ in S4 and $ in S3 is that S4 slot names are not partially matched.

new_dog <- function(name, age) {
  structure(
    list(
      name = name,
      age = age
    ),
    class = "dog"
  )
}

s3_dog <- new_dog(name = "Milo", age = 4)
s4_dog <- new("Dog", name = "Milo", age = 4)

> s3_dog$a # will return value of the "age" field
[1] 4

> s4_dog@a
Error: no slot of name "a" for this object of class "Dog"

This can save us from introducing an unexpected bug caused by partial matching!
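Since slot names have to be spelled out in full, it can help to look them up programmatically. The sketch below uses helpers from the methods package and redefines the Dog class so that it runs on its own.

```r
library(methods)

setClass("Dog", slots = c(name = "character", age = "numeric"))
d <- new("Dog", name = "Milo", age = 4)

# Names of all slots defined for the object's class
slot_names <- slotNames(d)

# Named character vector mapping each slot name to its class
slot_types <- getSlots("Dog")

# slot() is the functional equivalent of the @ operator
age_value <- slot(d, "age")

# .hasSlot() requires the exact name, so "a" does not match "age"
has_a <- .hasSlot(d, "a")
has_age <- .hasSlot(d, "age")
```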

Defining an S4 Method

Let’s see how our S4 dog would get printed out:

> print(d1)
An object of class "Dog"
Slot "name":
[1] "Milo"

Slot "age":
[1] 4

Here we are using the default way of printing S4 objects. If we want a different way of printing, we need to define a custom show method.

To define an S4 method we use the setMethod() function:

setMethod(
  f = "show",
  signature ="Dog",
  definition = function(object) {
    cat("Dog: \n")
    cat("\tName: ", object@name, "\n", sep = "")
    cat("\tAge: ", object@age, "\n", sep = "")
  }
)

Let’s break down the arguments one by one:

  1. f – is the name of the generic function we want to implement.
  2. signature – defines the classes required for the method arguments
  3. definition – is our actual implementation of the method

Let’s give it a try!

> print(d1)
Dog: 
	Name: Milo
	Age: 5
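Note that we called print() but defined a method for show. For S4 objects, print() and auto-printing at the console both end up calling show. Here is a small self-contained sketch that compares the two outputs with capture.output():

```r
library(methods)

setClass("Dog", slots = c(name = "character", age = "numeric"))

setMethod("show", "Dog", function(object) {
  cat("Dog: \n")
  cat("\tName: ", object@name, "\n", sep = "")
  cat("\tAge: ", object@age, "\n", sep = "")
})

d1 <- new("Dog", name = "Milo", age = 5)

# Capture what each call writes to the console
printed <- capture.output(print(d1))
shown <- capture.output(show(d1))

# Both calls produce the same lines
same_output <- identical(printed, shown)
```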

Defining Our Own Generic

For now, we implemented a method for an existing show generic. What if we want to create our own dog-related functionalities? Let’s create a makeSound generic:

setGeneric(
  name = "makeSound",
  def = function(x) standardGeneric("makeSound")
)

Now, we need to implement a makeSound method for our dog class:

setMethod(
  f = "makeSound",
  signature = "Dog",
  definition = function(x) {
    cat(x@name, "says", "Wooof!\n")
  }
)

Let’s give it a go:

> makeSound(d1)
Milo says Wooof!
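What happens when the generic is called on a class without a method? The sketch below repeats the definitions so it can run on its own. It then checks which methods exist and shows that an unsupported class results in an error.

```r
library(methods)

setClass("Dog", slots = c(name = "character", age = "numeric"))
setGeneric("makeSound", function(x) standardGeneric("makeSound"))
setMethod("makeSound", "Dog", function(x) {
  cat(x@name, "says", "Wooof!\n")
})

# The generic exists, and a method is defined directly for Dog only
generic_exists <- isGeneric("makeSound")
dog_method <- existsMethod("makeSound", "Dog")
numeric_method <- existsMethod("makeSound", "numeric")

# Calling the generic on an unsupported class signals an error
result <- tryCatch(makeSound(42), error = function(e) "no method")
```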

S4 Features for Object-Oriented Programming in R

We created our first S4 class and generic. Now let’s explore some additional features that the S4 system has to offer!

Object Validation

Apart from the validation of slots, we can also define additional constraints using validators. For example, as of now, we are able to create a dog with a negative age:

dog_with_negative_age <- new("Dog", name = "Milo", age = -1)

To prevent that from happening, let’s define a validator using the setValidity function:

setValidity(
  Class = "Dog",
  method = function(object) {
    if (object@age < 0) {
      "age should be a positive number"
    } else {
      TRUE
    }
  }
)

Let’s break down the arguments one by one:

  1. Class corresponds to the name of our class.
  2. method is a function that accepts one argument (the object to validate). The function needs to return TRUE if the object is valid or one or more descriptive strings if any problems are found.
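To illustrate the "one or more descriptive strings" part, here is a sketch of a validator that collects every problem it finds instead of stopping at the first one. The extra length checks are our own assumption about what a valid dog looks like and are not part of the example above.

```r
library(methods)

setClass("Dog", slots = c(name = "character", age = "numeric"))

setValidity("Dog", function(object) {
  problems <- character(0)
  if (length(object@name) != 1) {
    problems <- c(problems, "name should be a single string")
  }
  if (length(object@age) != 1) {
    problems <- c(problems, "age should be a single number")
  } else if (is.na(object@age) || object@age < 0) {
    problems <- c(problems, "age should be a non-negative number")
  }
  # TRUE means valid; otherwise return all collected messages
  if (length(problems) == 0) TRUE else problems
})

# Two problems at once: two names and a negative age
msg <- tryCatch(
  new("Dog", name = c("Milo", "Rex"), age = -1),
  error = function(e) conditionMessage(e)
)
```

The error message stored in msg mentions both problems, so the user can fix them in one go.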

And just to check if it’s working:

> dog_with_negative_age_take_2 <- new("Dog", name = "Milo", age = -1)

Error in validObject(.Object) : 
  invalid class “Dog” object: age should be a positive number

Important: Our custom validator will be called automatically only when creating an object, so it doesn’t prevent us from making the object invalid when changing the value of a slot.

d3 <- new("Dog", name = "Milo", age = 4)
d3@age <- -4
print(d3)

Dog: 
	Name: Milo
	Age: -4

But if we explicitly call our validator (by calling the validObject function), we will learn that the object is not correct anymore:

validObject(d3)

Error in validObject(d3) : 
  invalid class “Dog” object: age should be a positive number

This is why some sources recommend defining accessor functions for classes to avoid such issues. We will go back to this topic when talking about recommended practices for using S4.
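As a sketch of what such accessors could look like, the setter below calls validObject() after every modification. The generic names age and age<- are hypothetical and used only for illustration.

```r
library(methods)

setClass("Dog", slots = c(name = "character", age = "numeric"))
setValidity("Dog", function(object) {
  if (object@age < 0) "age should be a positive number" else TRUE
})

# Hypothetical accessor pair: a getter and a replacement function
setGeneric("age", function(x) standardGeneric("age"))
setGeneric("age<-", function(x, value) standardGeneric("age<-"))

setMethod("age", "Dog", function(x) x@age)
setMethod("age<-", "Dog", function(x, value) {
  x@age <- value
  validObject(x)  # re-run the validator after the change
  x
})

d <- new("Dog", name = "Milo", age = 4)
age(d) <- 5

# An invalid value is now rejected and d keeps its previous age
outcome <- tryCatch(
  {
    age(d) <- -4
    "accepted"
  },
  error = function(e) "rejected"
)
```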

Virtual Classes

The S4 system provides support for virtual classes – or classes that cannot be instantiated. Ok, but why would we want to do that?

Virtual classes can be useful in cases where you want to define implementation details that can be reused by other classes through inheritance.

We already covered inheritance in our previous blog post when using S3 methods, so let’s see how we could leverage virtual classes if we wanted to define Cat classes that have some shared functionality with Dog classes.

Both cats and dogs have a name and age, right? Let’s define a virtual class Animal that will contain both name and age fields.


setClass(
  Class = "Animal",
  contains = "VIRTUAL",
  slots = c(
    name = "character",
    age = "numeric"
  )
)

This time, because we are defining a virtual class, we set the contains parameter to “VIRTUAL”. If we try to create an animal object, we will get an error:

new("Animal", name = "Milo", age = 4)
Error in new("Animal", name = "Milo", age = 4) : 
  trying to generate an object from a virtual class ("Animal")

Now, let’s use our Animal class to create a Cat class and a new version of the Dog class:

setClass(
  Class = "Dog",
  contains = "Animal"
)

setClass(
  Class = "Cat",
  contains = "Animal"
)

Now, we can create Dog and Cat objects with name and age fields:

d <- new("Dog", name = "Milo", age = 4)
c <- new("Cat", name = "Tucker", age = 2)

This saved us some typing as we only defined the name and age slots in the Animal virtual class. Best part? Validators and methods are inherited as well:

setValidity(
  Class = "Animal",
  method = function(object) {
    if (object@age < 0) {
      "An animal cannot have a negative age"
    } else {
      TRUE
    }
  }
)

d <- new("Dog", name = "Milo", age = -4)
Error in validObject(.Object) : 
  invalid class “Dog” object: An animal cannot have a negative age

c <- new("Cat", name = "Tucker", age = -2)
Error in validObject(.Object) : 
  invalid class “Cat” object: An animal cannot have a negative age

The show method for our Cat will be very similar to the one we defined before for the Dog class, but we want to display the word “Cat” instead of “Dog”. We could implement it like this:

setMethod(
  f = "show",
  signature ="Animal",
  definition = function(object) {
    object_class <- is(object)[1]
    
    cat(object_class, " (an Animal) \n")
    cat("\tName: ", object@name, "\n", sep = "")
    cat("\tAge: ", object@age, "\n", sep = "")
  }
)

Now, let’s see what happens if we try to print our cat:

print(c)
Cat  (an Animal) 
	Name: Tucker
	Age: 2

It’s working! We already learned in the previous articles how inheritance helps with reusing code between classes, allowing us to not repeat ourselves.

Additionally, by using virtual classes we can reuse code that does not necessarily make sense when used in isolation, so we can prevent users from accidentally creating objects that might not make sense.
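We can also check these relationships programmatically. The sketch below is self-contained and uses a hypothetical describe generic, so that the result is not affected by the default show method that exists for every object.

```r
library(methods)

setClass(
  "Animal",
  contains = "VIRTUAL",
  slots = c(name = "character", age = "numeric")
)
setClass("Cat", contains = "Animal")

# Hypothetical generic with a method defined only for the virtual class
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "Animal", function(x) {
  paste(class(x), "named", x@name)
})

tucker <- new("Cat", name = "Tucker", age = 2)

animal_is_virtual <- isVirtualClass("Animal")
cat_is_animal <- is(tucker, "Animal")

# existsMethod() only looks at methods defined directly for the class,
# hasMethod() also takes inherited methods into account
direct <- existsMethod("describe", "Cat")
inherited <- hasMethod("describe", "Cat")

description <- describe(tucker)
```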

Multiple Dispatch

First of all, what is method dispatch? We already made use of it in the article about S3 classes! Remember when we were defining the make_sound.cat and make_sound.dog methods?

The make_sound generic would use the class of its first argument to identify which method should be called. If it’s a dog then it uses the make_sound.dog method and if it’s a cat it uses the make_sound.cat method.

Ok, now we know what method dispatch is, but what is multiple dispatch? It is the same concept but applied to multiple arguments! That means you can use classes of multiple arguments to pick the right method!

Let’s see an example: we will create a Pizza and Pineapple class and try to combine them.

setClass(
  Class = "Pizza",
  slots = c(
    diameter = "numeric"
  )
)

setClass(
  Class = "Pineapple",
  slots = c(
    weight = "numeric"
  )
)

setGeneric(
  name = "combine",
  def = function(x, y) standardGeneric("combine"),
  signature = c("x", "y")
)

In the setGeneric function we can use the signature argument to define which arguments should be used for dispatching (Note: by default, all formal arguments except ... are used, but we wanted to be explicit in this example).

Now let’s implement methods for particular cases:

  1. Combining Pizza with Pizza.
  2. Combining Pineapple with Pineapple.
  3. Combining Pizza with Pineapple.
  4. Combining Pineapple with Pizza (the order matters, so we need to cover this case as well!).

setMethod(
  f = "combine",
  signature = c("Pizza", "Pizza"),
  definition = function(x, y) {
    "Even more pizza!"
  }
)


setMethod(
  f = "combine",
  signature = c("Pineapple", "Pineapple"),
  definition = function(x, y) {
    "Even more pineapple!"
  }
)

setMethod(
  f = "combine",
  signature = c("Pineapple", "Pizza"),
  definition = function(x, y) {
    stop("Pineapple and pizza don't go well together!")
  }
)

setMethod(
  f = "combine",
  signature = c("Pizza", "Pineapple"),
  definition = function(x, y) {
    stop("Pineapple and pizza don't go well together!")
  }
)


pineapple <- new("Pineapple", weight = 1)
pizza <- new("Pizza", diameter = 32)

> combine(pizza, pizza)
[1] "Even more pizza!"

> combine(pineapple, pineapple)
[1] "Even more pineapple!"

> combine(pineapple, pizza)
Error in combine(pineapple, pizza) : 
  Pineapple and pizza don't go well together!

> combine(pizza, pineapple)
Error in combine(pizza, pineapple) : 
  Pineapple and pizza don't go well together!
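Writing one method per combination grows quickly as we add classes. One possible alternative, sketched below as our own variation on the example, is a fallback method for the ANY class that catches every pair without a more specific method.

```r
library(methods)

setClass("Pizza", slots = c(diameter = "numeric"))
setClass("Pineapple", slots = c(weight = "numeric"))

setGeneric("combine", function(x, y) standardGeneric("combine"))

# Specific case: both arguments are pizzas
setMethod("combine", signature("Pizza", "Pizza"), function(x, y) {
  "Even more pizza!"
})

# Fallback for every other pair of classes
setMethod("combine", signature("ANY", "ANY"), function(x, y) {
  stop("Don't know how to combine ", class(x), " and ", class(y))
})

pizza <- new("Pizza", diameter = 32)
pineapple <- new("Pineapple", weight = 1)

both_pizzas <- combine(pizza, pizza)
mixed <- tryCatch(
  combine(pizza, pineapple),
  error = function(e) conditionMessage(e)
)
```

The more specific Pizza and Pizza method still wins when both arguments are pizzas. The fallback is used only when nothing closer matches.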

Multiple Inheritance

S4 supports multiple inheritance, which means we can inherit from more than one class. Let’s go back to our example hierarchy of Animal, Dog and Cat classes.

What if we wanted to include the owner information for both cats and dogs? We could add an owner slot to the Animal class, but what if we wanted to add a Moose class? Moose are usually not pets!

In that case, we might want to use multiple inheritance and define a new virtual class called Pet:

setClass(
  Class = "Pet",
  contains = "VIRTUAL",
  slots = c(
    owner = "character"
  )
)

# Animal class from the previous example
setClass(
  Class = "Animal",
  contains = "VIRTUAL",
  slots = c(
    name = "character",
    age = "numeric"
  )
)

Now, we can define our Dog and Cat classes like this:

setClass(
  Class = "Dog",
  contains = c("Animal", "Pet")
)

setClass(
  Class = "Cat",
  contains = c("Animal", "Pet")
)

d <- new("Dog", name = "Milo", age = 5, owner = "Jane")
c <- new("Cat", name = "Tucker", age = 2, owner = "John")

And in case we need to add a Moose class in the future, we can do it like this:

setClass(
  Class = "Moose",
  contains = c("Animal")
)

m <- new("Moose", name = "Moose", age = 21)
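Methods follow the same split. In the sketch below, a hypothetical ownerOf generic gets a method for the Pet class only, so it works for dogs but not for moose. The class definitions are repeated so that the block runs on its own.

```r
library(methods)

setClass(
  "Animal",
  contains = "VIRTUAL",
  slots = c(name = "character", age = "numeric")
)
setClass("Pet", contains = "VIRTUAL", slots = c(owner = "character"))
setClass("Dog", contains = c("Animal", "Pet"))
setClass("Moose", contains = "Animal")

# Hypothetical generic that only makes sense for pets
setGeneric("ownerOf", function(x) standardGeneric("ownerOf"))
setMethod("ownerOf", "Pet", function(x) x@owner)

d <- new("Dog", name = "Milo", age = 5, owner = "Jane")
m <- new("Moose", name = "Moose", age = 21)

dog_owner <- ownerOf(d)
dog_is_pet <- is(d, "Pet")
moose_is_pet <- is(m, "Pet")

# No method is available for Moose, so the call fails
moose_owner <- tryCatch(ownerOf(m), error = function(e) NA_character_)
```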

Class Unions

In the Defining an S4 Class section, we specified the types of the slots of a class. But what if we wanted to be more flexible and allow a slot to hold an instance of either one class or another?

For example, let’s say we want a class that holds information about our data source that could be either a data.frame or path to a file containing the data.

We can use the unrestricted class ANY:

setClass(
  Class = "DataManager",
  slots = c(
    "source" = "ANY"
  )
)

However, this is not safe as we can create an incorrect object:

new("DataManager", source = 1234)

Instead, we can use a class union! Let’s define a class union that would allow us to provide either characters or data.frames for the source slot. This can be done using the setClassUnion function:

setClassUnion(
  name = "DataSource",
  members = c("data.frame", "character")
)

setClass(
  Class = "DataManager",
  slots = c(
    "source" = "DataSource"
  )
)

Now, our slot gets validated:

> new("DataManager", source = 1234)
Error in validObject(.Object) : 
  invalid class “DataManager” object: invalid object for slot "source" in class "DataManager": got class "numeric", should be or extend class "DataSource"
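Once the slot can hold either type, the code that consumes it has to handle both. One way to do that, sketched below, is a generic with one method per union member. The loadData generic and the data.csv path are hypothetical, and the file is never opened in this example.

```r
library(methods)

setClassUnion("DataSource", members = c("data.frame", "character"))
setClass("DataManager", slots = c(source = "DataSource"))

# Hypothetical generic: return the data however the source is stored
setGeneric("loadData", function(src) standardGeneric("loadData"))
setMethod("loadData", "data.frame", function(src) src)
setMethod("loadData", "character", function(src) read.csv(src))

# Both kinds of source are accepted by the slot
from_df <- new("DataManager", source = data.frame(x = 1:3))
from_path <- new("DataManager", source = "data.csv")

# Dispatch picks the data.frame method here
loaded <- loadData(from_df@source)

# Anything outside the union is still rejected
rejected <- tryCatch(
  new("DataManager", source = 1234),
  error = function(e) "rejected"
)
```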

> Fun Fact: An interesting example of using class unions is the index class in the Matrix package. It is implemented as a union of numeric, logical, and character (source). This allows us to index matrices using numerics, logicals, or characters.

Coercion System

S4 offers a coercion system. By ‘coercion’ we mean the process of transforming a value of a given type to a value of another type. An example of coercion you might be familiar with is the as.numeric function.

For example, coercing a character value into a numeric will look like this:

> as.numeric("123")
[1] 123

But what if we want to coerce an object of one S4 class into an object of another S4 class? This is where we can leverage the S4 coercion system. Let’s assume we have a custom class for storing game scores:

setClass(
  Class = "GameScore",
  slots = c(
    "homeTeam" = "character",
    "awayTeam" = "character",
    "homeTeamScore" = "numeric",
    "awayTeamScore" = "numeric"
  )
)

game_score <- new(
  "GameScore",
  homeTeam = "Team A",
  awayTeam = "Team B",
  homeTeamScore = 114,
  awayTeamScore = 120
)

Now, what if we wanted to be able to convert an object of the GameScore class into a data.frame? We can define a method for coercing an object of GameScore into a data.frame by using the setAs function!

setAs(
  from = "GameScore",
  to = "data.frame",
  def = function(from) {
    data.frame(
      team = c(from@homeTeam, from@awayTeam),
      points = c(from@homeTeamScore, from@awayTeamScore)
    )
  }
)

Now, we can convert our game_score into a data.frame by using the as function:

> as(game_score, "data.frame")
    team points
1 Team A    114
2 Team B    120
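If we are not sure whether a coercion is available, the methods package offers canCoerce() to check before calling as(). The sketch below repeats the GameScore definitions so that it runs on its own.

```r
library(methods)

setClass(
  "GameScore",
  slots = c(
    homeTeam = "character",
    awayTeam = "character",
    homeTeamScore = "numeric",
    awayTeamScore = "numeric"
  )
)

setAs("GameScore", "data.frame", function(from) {
  data.frame(
    team = c(from@homeTeam, from@awayTeam),
    points = c(from@homeTeamScore, from@awayTeamScore)
  )
})

game_score <- new(
  "GameScore",
  homeTeam = "Team A",
  awayTeam = "Team B",
  homeTeamScore = 114,
  awayTeamScore = 120
)

# TRUE once a coercion method has been registered with setAs()
coercible <- canCoerce(game_score, "data.frame")

score_table <- as(game_score, "data.frame")
```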

The S4 system provides default coercion methods when coercing child classes to parent classes and the other way around. However, we have to be careful here as we might accidentally end up with an invalid object. (source)

Like here:

setClass(
  Class = "Password",
  slots = c(
    value = "character"
  )
)


setClass(
  Class = "LongPassword",
  contains = "Password",
  slots = c(
    value = "character"
  )
)

setValidity(
  Class = "LongPassword",
  method = function(object) {
    # For the sake of the example we assume 20 characters is long enough
    if (nchar(object@value) > 20) {
      TRUE
    } else {
      "Password is too short!"
    }
  }
)

Now, let’s create some password objects:

short_password <- new(
  "Password", 
  value = stringi::stri_rand_strings(n = 1, length = 3)
)

long_password <- new(
  "LongPassword", 
  value = stringi::stri_rand_strings(n = 1, length = 32)
)

Everything is ok when we coerce our LongPassword object into a Password object. However, when converting from Password to LongPassword we end up with an invalid object:

coerced_short_password <- as(short_password, "LongPassword")

validObject(coerced_short_password)
# Error in validObject(coerced_short_password) : 
#  invalid class “LongPassword” object: Password is too short!
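One way to guard against this is to validate right after coercing. The helper below, as_validated, is a hypothetical name and only a sketch. It uses a fixed string instead of stringi so that the block has no extra dependencies.

```r
library(methods)

setClass("Password", slots = c(value = "character"))
setClass("LongPassword", contains = "Password")
setValidity("LongPassword", function(object) {
  if (nchar(object@value) > 20) TRUE else "Password is too short!"
})

# Hypothetical helper: coerce first, then validate before returning
as_validated <- function(object, Class) {
  result <- as(object, Class)
  validObject(result)
  result
}

short_password <- new("Password", value = "abc")
long_enough <- new("Password", value = strrep("x", 32))

# The short password is rejected instead of silently becoming invalid
outcome <- tryCatch(
  as_validated(short_password, "LongPassword"),
  error = function(e) "rejected"
)

# A sufficiently long password passes the check
converted <- as_validated(long_enough, "LongPassword")
```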

Recommended Practices in R Object-Oriented Programming

The S4 OOP system has a rich set of powerful features. As you know, with great power comes great responsibility – so this section brings you a recommended set of practices for using S4 classes.

There are multiple sources of recommended practices when it comes to S4 including:

  1. R’s built-in documentation
  2. The S4 chapter of the Advanced R book
  3. S4 classes and methods (from Bioconductor learning materials)

The recommendations sometimes conflict. For example, ?setClass says that the prototype argument is unlikely to be useful, while Advanced R considers this bad advice and says that the prototype parameter should always be provided.

Here, we will summarize the recommended practices along with their sources:

  1. New S4 generics should by convention use lowerCamelCase (Advanced R)
  2. S4 classes should by convention use UpperCamelCase (Advanced R)
  3. Consider defining (S4 classes and methods)
    • Validity methods with setValidity for your classes
    • show methods for your classes
    • A constructor function named as the class that is documented and user-friendly
    • Coercion methods
    • Additional methods depending on the shape of the object. For example, adding a length() method for vector-like objects
  4. Slots of a class should be considered internal implementation details and should not be accessed directly with @ outside of methods. To allow users to access values in those slots, provide getters, and if the objects are intended to be modified, provide setters as well (both Advanced R and S4 classes and methods)
  5. Keep method dispatch as simple as possible – avoid multiple inheritance and use multiple dispatch only when absolutely necessary (Advanced R)
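To make a few of these recommendations concrete, here is a minimal sketch that combines a prototype, a constructor named after the class, and a getter. The dogName generic and the default values are our own assumptions and are not taken from the sources above.

```r
library(methods)

# UpperCamelCase class name, with a prototype providing default slot values
setClass(
  "Dog",
  slots = c(name = "character", age = "numeric"),
  prototype = list(name = NA_character_, age = NA_real_)
)

# User-facing constructor named after the class
Dog <- function(name, age = NA_real_) {
  new("Dog", name = as.character(name), age = as.numeric(age))
}

# lowerCamelCase generic acting as a getter, so users never need @
setGeneric("dogName", function(x) standardGeneric("dogName"))
setMethod("dogName", "Dog", function(x) x@name)

milo <- Dog("Milo", 4)
rex <- Dog("Rex")  # age falls back to NA

milo_name <- dogName(milo)
```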

If you don’t know which recommended practices to follow, consider your context. For example, if you are developing a package you want to publish to Bioconductor, then consider following practices recommended by Bioconductor.

S4 Usage in the Community

The S4 OOP system has seen large adoption in Bioconductor. In BioC 2.6, 51% of Bioconductor packages defined S4 classes (source).

It has also been used in other packages outside of Bioconductor, such as:

  1. Matrix – in version 1.3.2 it defines 102 classes, 21 generic functions, and 2005 methods. (source)
  2. Rcpp.
  3. DBI – in the History of DBI article we can learn that at some point it used S3 classes and later on converted to S4 classes.

Summing up Object Oriented Programming in R – Part 3

  1. The S4 OOP system was developed by Bell Labs and introduced into the S language in the late 1990s.
  2. S4 is more formal and rigorous compared to S3 as it allows defining the types of class slots as well as validators.
  3. S4 offers additional features compared to S3 classes, such as virtual classes, multiple dispatch, multiple inheritance, class unions, and coercion. These powerful features give us new ways of solving problems in code.
  4. Currently, there are multiple sources of recommended practices for the S4 system. There are cases when different sources give conflicting recommendations.
  5. Because of the rich set of features offered by the S4 system, it has a steeper learning curve compared to the S3 system.
  6. S4 features are powerful but should be used carefully. For example, combining multiple inheritance with multiple dispatch can lead to situations where it is hard to reason about which method will be called for which combination of inputs.
  7. S4 classes are used extensively in Bioconductor as well as in some well-known packages in the R community such as Matrix, Rcpp, and DBI.
