Object Oriented Programming in R (Part 4): Reference Classes & R6 Classes

Posted on August 6, 2024 by Ryszard Szymański in Data science

This article was first published on Appsilon | Enterprise R Shiny Dashboards, and kindly contributed to python-bloggers.

In our last article, we explored the S4 OOP system in R. Up until this point, we had only discussed functional OOP systems in R.
Today, we are going to learn about two encapsulated OOP systems available for R:

Reference Classes – introduced to R in 2010 in version 2.12.0 (source). Sometimes also referred to as R5 and RC.

R6 classes – OOP system available in the R6 package created in 2014.

We will first define what we mean by functional and encapsulated OOP, followed by example usage of Reference Classes and R6 classes.

Last but not least, we will go through examples of how R6 is used within the community.

Functional and Encapsulated OOP

We can divide OOP systems available in R into two groups. We will be using the terminology established in Extending R:

Functional OOP

Encapsulated OOP (sometimes referred to as message passing OOP like in the S7 proposal)

The difference between them is based on the relationship between methods and classes. In functional OOP, methods belong to generics, while in encapsulated OOP, methods belong to classes/objects.

Remember our make_sound function in the S3 article? We first defined a generic function and then implemented a method for a specific class. Example usage of make_sound for S3 looked like this:

d <- new_dog(name = "Milo", age = 4)

make_sound(d)

The dog object (d) is the argument of the method.
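
For readers who have not seen the S3 article, the functional style can be sketched as follows. The new_dog constructor and the make_sound.dog method below are a hypothetical reconstruction for illustration, not necessarily the exact code from the earlier post.

```r
# Hypothetical constructor, reconstructed for illustration only
new_dog <- function(name, age) {
  structure(list(name = name, age = age), class = "dog")
}

# The generic is an ordinary function that dispatches on the class of x
make_sound <- function(x, ...) {
  UseMethod("make_sound")
}

# The method belongs to the generic; it is not stored inside the object
make_sound.dog <- function(x, ...) {
  cat(x$name, "says", "Wooof!\n")
}

d <- new_dog(name = "Milo", age = 4)
make_sound(d)
```

Note that the object itself is just data with a class attribute; all behavior lives in functions defined outside of it.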

In the case of encapsulated OOP, we define methods as part of the class definitions and later on call them like this:

Note: We are using R6 syntax here to present the general idea. We will be explaining specifics in subsequent sections!

d <- Dog$new(name = "Milo", age = 4)
d$make_sound()

Here, the make_sound method is part of the dog object (d).

Encapsulated OOP is the type of OOP found in most popular programming languages, so its syntax will probably feel more natural to anyone coming from, for example, Java or Python. (Other languages typically use . instead of $ to access object fields and methods.)

Reference Classes

Creating Our First Reference Class

To create a Reference Class we use the setRefClass function. You can think of it as the equivalent of setClass:

Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  )
)

setRefClass returns a generator function for creating objects. Now, to create objects of the class Dog, we do:

d <- Dog$new(name = "Milo", age = 4)

Reference Classes are built on top of S4 classes, so fields get validated on initialization:

> Dog$new(name = "Milo", age = "4 years old")
Error: invalid assignment for reference class field ‘age’, should be from class “numeric” or a subclass (was class “character”)

We can access the fields of the created object by using $:

> d$name
[1] "Milo"

> d$age
[1] 4

Even though we are using $, we don’t need to worry about partial matching: under the hood, Reference Classes are implemented as S4 classes whose fields are stored as named objects in an environment (see ?ReferenceClasses for more details).

> d$a
Error in envRefInferField(x, what, getClass(class(x)), selfEnv) : 
  ‘a’ is not a valid field or method name for reference class “Dog”
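
For contrast, a plain list does apply partial matching with $. The sketch below assumes nothing beyond base R and the {methods} package; plain_list and lookup are illustrative names.

```r
library(methods)

Dog <- setRefClass(
  Class = "Dog",
  fields = c(name = "character", age = "numeric")
)
d <- Dog$new(name = "Milo", age = 4)

# A plain list silently completes the partial name "na" to "name"
plain_list <- list(name = "Milo", age = 4)
plain_list$na

# The Reference Class object rejects the partial name with an error,
# which we catch here so the script keeps running
lookup <- tryCatch(d$na, error = function(e) "error: no such field")
lookup
```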

Creating Our First Method

Let’s define our first method. This can be done in two ways:

By invoking $methods() on the generator function returned by setRefClass

By using the methods argument in the setRefClass call

# Approach 1
Dog$methods(
  make_sound = function() {
    cat(name, "says", "Wooof!")
  }
)

# Approach 2
Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  methods = list(
    make_sound = function() {
      cat(name, "says", "Wooof!")
    }
  )
)

Now, we can use our make_sound method like this:

d <- Dog$new(name = "Milo", age = 4)
d$make_sound()

You might wonder, how does the make_sound method know that the name variable corresponds to the name of the object?

In contrast to many other programming languages, the body of a Reference Class method can refer to any other method or field of the object directly by name (see the Writing Reference Methods section of ?ReferenceClasses for more details).
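
A minimal sketch of this behavior, in which one method reads fields and calls another method purely by name. The introduce method is a hypothetical addition for illustration.

```r
library(methods)

Dog <- setRefClass(
  Class = "Dog",
  fields = c(name = "character", age = "numeric"),
  methods = list(
    make_sound = function() {
      cat(name, "says", "Wooof!\n")
    },
    introduce = function() {
      # Fields (name, age) and methods (make_sound) are found by name,
      # without any self or this prefix
      cat(sprintf("This is %s, aged %s\n", name, age))
      make_sound()
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)
d$introduce()
```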

Reference Class objects are mutable (they can be modified in place). To change the value of a field within a method, we use the <<- operator.

Dog$methods(
  set_name = function(new_name) {
    name <<- new_name 
  }
)

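# Create the object after set_name has been added to the class
d <- Dog$new(name = "Milo", age = 4)
d$set_name("Tucker")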

> d
Reference class object of class "Dog"
Field "name":
[1] "Tucker"
Field "age":
[1] 4
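
Mutability also means that assigning an object to a second variable does not copy it; both names refer to the same object. A minimal sketch, assuming a Dog class with a set_name method like the one above (alias and snapshot are illustrative names), which uses the built-in $copy() method to obtain an independent copy:

```r
library(methods)

Dog <- setRefClass(
  Class = "Dog",
  fields = c(name = "character", age = "numeric"),
  methods = list(
    set_name = function(new_name) {
      name <<- new_name
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)

alias <- d           # a second name for the same object
snapshot <- d$copy() # an independent copy of the current state

d$set_name("Tucker")

alias$name     # follows the change made through d
snapshot$name  # keeps the value from before the change
```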

External Methods

There is also an alternative kind of method called an External Method. An external method is a method whose first argument is called .self. The body of an external method behaves like that of any ordinary function, and we can no longer refer to other methods or fields by name.

Let’s recreate our Dog class with make_sound and set_name methods written as External Methods:

Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  methods = list(
    make_sound = function(.self) {
      cat(.self$name, "says", "Wooof!")
    },

    set_name = function(.self, new_name) {
      .self$name <- new_name
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)

> d$make_sound()
Milo says Wooof!

> d$set_name(new_name = "Tucker")
> d
Reference class object of class "Dog"
Field "name":
[1] "Tucker"
Field "age":
[1] 4

External Methods exist to avoid issues when inheriting classes between packages (see ?ReferenceClasses for more details).

The documentation discourages using External Methods because they offer no obvious advantage when not needed. They are considered harder to read and write, and are slightly slower to execute.

We discuss inheritance in-depth in the second part of the object-oriented programming series. Check out this blog post to learn more.

Inheritance

Reference Classes support inheritance through the contains parameter of setRefClass.

Let’s define our Animal, Dog, Cat class hierarchy:

Animal <- setRefClass(
  Class = "Animal",
  fields = c(
    name = "character",
    age = "numeric"
  )
)

Dog <- setRefClass(
  Class = "Dog",
  contains = "Animal"
)

Cat <- setRefClass(
  Class = "Cat",
  contains = "Animal"
)

d <- Dog$new(name = "Milo", age = 4)
c <- Cat$new(name = "Tucker", age = 2)
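
Subclasses can also override an inherited method and still reach the parent implementation through callSuper(). A minimal sketch, where describe is a hypothetical method introduced only for this example:

```r
library(methods)

Animal <- setRefClass(
  Class = "Animal",
  fields = c(name = "character", age = "numeric"),
  methods = list(
    describe = function() {
      paste0(name, " is ", age, " years old")
    }
  )
)

Dog <- setRefClass(
  Class = "Dog",
  contains = "Animal",
  methods = list(
    describe = function() {
      # callSuper() invokes Animal's version of describe
      paste(callSuper(), "and is a dog")
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)
d$describe()
```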

Also, because Reference Classes are implemented as S4 classes under the hood, we can leverage S4 features such as Virtual Classes:

Animal <- setRefClass(
  Class = "Animal",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  contains = "VIRTUAL"
)


> Animal$new(name = "Milo", age = 3)
Error in methods::new(def, ...) : 
  trying to generate an object from a virtual class ("Animal")

Or multiple inheritance:

Pet <- setRefClass(
  Class = "Pet",
  contains = "VIRTUAL",
  fields = c(
    owner = "character"
  )
)

Animal <- setRefClass(
  Class = "Animal",
  contains = "VIRTUAL",
  fields = c(
    name = "character",
    age = "numeric"
  )
)

Dog <- setRefClass(
  Class = "Dog",
  contains = c("Animal", "Pet")
)

Cat <- setRefClass(
  Class = "Cat",
  contains = c("Animal", "Pet")
)

d <- new("Dog", name = "Milo", age = 5, owner = "Jane")
c <- new("Cat", name = "Tucker", age = 2, owner = "John")

Reference Classes can inherit from S4 classes; however the Inheritance section of ?ReferenceClasses discourages doing so.

S4 Features Supported by Reference Classes

As mentioned in the Inheritance section, Reference Classes support S4 features such as virtual classes or multiple inheritance. In ?ReferenceClasses we can also find information that Reference Classes support:

Validation Methods

Class Unions

Validation Methods

We can define validation methods for our reference classes using the setValidity function, but now we need to remember to access object fields through $ instead of @:

Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  )
)

setValidity(
  Class = "Dog",
  method = function(object) {
    if (object$age < 0) {
      "age should be a non-negative number"
    } else {
      TRUE
    }
  }
)

> Dog$new(name = "Milo", age = -1)
Error in validObject(.Object) : 
  invalid class “Dog” object: age should be a non-negative number
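
As far as we can tell, the validity method runs when the object is initialized but not on later field assignments, so it can be worth calling validObject() explicitly after modifying an object. A sketch under that assumption (check is an illustrative name):

```r
library(methods)

Dog <- setRefClass(
  Class = "Dog",
  fields = c(name = "character", age = "numeric")
)

setValidity(
  Class = "Dog",
  method = function(object) {
    if (object$age < 0) "age should not be negative" else TRUE
  }
)

d <- Dog$new(name = "Milo", age = 4)

# Assignment only checks the class of the field ("numeric"),
# so a negative age is accepted at this point
d$age <- -1

# Re-running the validity check by hand reports the problem
check <- tryCatch(
  validObject(d),
  error = function(e) conditionMessage(e)
)
check
```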

Class Unions

Reference Classes can be part of class unions. Let’s create a Reference Class for each particle and a class union for particles:

Proton <- setRefClass(
  Class = "Proton"
)

Neutron <- setRefClass(
  Class = "Neutron"
)

Electron <- setRefClass(
  Class = "Electron"
)

setClassUnion(
  name = "Particle",
  members = c(
    "Proton",
    "Neutron",
    "Electron"
  )
)
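
One way to use such a union is as the signature of an S4 method, so that a single method covers every member class. A minimal sketch, where describe_particle is a hypothetical generic introduced only for illustration:

```r
library(methods)

Proton <- setRefClass(Class = "Proton")
Neutron <- setRefClass(Class = "Neutron")
Electron <- setRefClass(Class = "Electron")

setClassUnion(
  name = "Particle",
  members = c("Proton", "Neutron", "Electron")
)

# Hypothetical S4 generic with one method for the whole union
setGeneric(
  "describe_particle",
  function(p) standardGeneric("describe_particle")
)

setMethod(
  "describe_particle",
  signature = "Particle",
  definition = function(p) {
    paste("A particle of class", class(p))
  }
)

describe_particle(Proton$new())
describe_particle(Electron$new())
```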

Coercion System

By default, Reference Classes provide $export and $import methods for coercion. Let’s consider our Animal, Dog, Cat class hierarchy. For the sake of the example, we won’t make Animal a virtual class:

Animal <- setRefClass(
  Class = "Animal",
  fields = c(
    name = "character",
    age = "numeric"
  )
)

Dog <- setRefClass(
  Class = "Dog",
  contains = "Animal"
)

Cat <- setRefClass(
  Class = "Cat",
  contains = "Animal"
)

We can use $export to coerce a Dog object into an Animal object:

> d <- Dog$new(name = "Milo", age = 4)
> a <- d$export("Animal")
> is(a)[1]
[1] "Animal"

There is also the $import method to copy corresponding fields from a superclass:

> a <- Animal$new(name = "Milo", age = 4)
> d <- Dog$new(name = "", age = NA_integer_)
> d$import(a)
> d
Reference class object of class "Dog"
Field "name":
[1] "Milo"
Field "age":
[1] 4

If we try to coerce a Dog object into a Cat object, we will encounter an error:

> d <- Dog$new(name = "Milo", age = 4)
> d$export("Cat")
Error in methods::as(.self, Class) : 
  no method or default for coercing “Dog” to “Cat”

Fortunately, we can define a coercion method using the setAs function:

setAs(
  from = "Dog",
  to = "Cat",
  def = function(from) {
    Cat$new(
      name = from$name,
      age = from$age
    )
  }
)

And now, we are able to do the coercion:

> d$export("Cat")
Reference class object of class "Cat"
Field "name":
[1] "Milo"
Field "age":
[1] 4

We have covered what you need to know about R6 in a separate blog post.

R6 Classes

R6 is another encapsulated OOP system available in R. However, this time it is not a part of the core language. It is a package that has been available on CRAN since 2014.

R6 classes are similar to Reference Classes, but they are more efficient and they do not depend on S4 classes or the {methods} package.

In fact, packages like {httpuv} or {shiny} used to use Reference Classes, but switched to R6 at one point (see commits for shiny and httpuv). In Advanced R it is mentioned that this switch led to a substantial performance improvement in the shiny package.

There is a Performance vignette available in the R6 package that compares the performance of R6 and Reference classes. It showed that R6 objects take up less memory and are faster compared to Reference Classes objects.

Sounds exciting, doesn’t it? Let’s have a closer look at R6 classes!

Creating Our First R6 Class

To create an R6 class, we use the R6Class function:

Dog <- R6::R6Class(
  classname = "Dog",
  public = list(
    name = NULL,
    age = NULL
  )
)

Similarly to setRefClass, it returns a generator, and we can now create objects by calling Dog$new:

> d <- Dog$new()
> d
<Dog>
  Public:
    age: NULL
    clone: function (deep = FALSE) 
    name: NULL

However, you may notice that if we try to provide values for age and name, it will not work:

> Dog$new(name = "Milo", age = 4)
Error in Dog$new(name = "Milo", age = 4) : 
  Called new() with arguments, but there is no initialize method.

To support arguments when creating a new object, we need to provide an initialize method:

Dog <- R6::R6Class(
  classname = "Dog",
  public = list(
    name = NULL,
    age = NULL,
    initialize = function(name, age) {
      self$name <- name
      self$age <- age
    }
  )
)

To access public fields by default we need to refer to them through self. Let’s see if it works!

Note: By default, R6 classes are portable to avoid issues when inheriting classes across packages. But if we use non-portable mode, R6 behaves like Reference Classes, and fields can be accessed without using self (source).
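
To illustrate the note on portability, here is a minimal sketch of a non-portable class, based on the portable argument of R6Class. In this mode, methods can read fields by name and assign to them with <<-, much like Reference Class methods. The birthday method and the dog_name and dog_age argument names are hypothetical.

```r
library(R6)

Dog <- R6Class(
  classname = "Dog",
  portable = FALSE,
  public = list(
    name = NULL,
    age = NULL,
    initialize = function(dog_name, dog_age) {
      # No self$ prefix: fields are found by name and set with <<-
      name <<- dog_name
      age <<- dog_age
    },
    birthday = function() {
      age <<- age + 1
      invisible(age)
    }
  )
)

d <- Dog$new(dog_name = "Milo", dog_age = 4)
d$birthday()
d$age
```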

> d <- Dog$new(name = "Milo", age = 4)
> d
<Dog>
  Public:
    age: 4
    clone: function (deep = FALSE) 
    initialize: function (name, age) 
    name: Milo

We can access public fields by using $:

> d$age
[1] 4
> d$name
[1] "Milo"

In the case of R6 classes, objects are implemented as environments, so there is no partial matching, but we also do not get an explicit error:

> d$a
NULL

> d$nam
NULL

Creating Our Method

Actually, we already defined the initialize method, but let’s add another one! This time instead of redefining our class let’s use the set method on the generator object:

Dog$set(
  which = "public",
  name = "make_sound",
  value = function() {
    cat(self$name, "says", "Wooof!")
  }
)

Now, we can call our method in a similar way as in Reference Classes:

> d <- Dog$new(name = "Milo", age = 4)
> d$make_sound()
Milo says Wooof!

Private and Public fields

Up until now we have been using the public argument when creating our R6 classes. But we don’t necessarily want to make all fields available to end users, right?

To make our name and age fields a bit harder to reach, let’s make them private.

Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    name = NULL,
    age = NULL
  )
)

We will also need to adjust our initialize and make_sound methods. Previously we used self to access public fields and methods, but as they are private now we need to refer to them through private:

Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    },
    make_sound = function() {
      cat(private$name, "says", "Wooof!")
    }
  )
)

Now, we won’t have direct access to the name and age fields:

> d <- Dog$new(name = "Milo", age = 4)
> d$age
NULL
> d$name
NULL

But we will still be able to use our methods:

> d$make_sound()
Milo says Wooof!

Note: We used the term harder to reach, because it’s possible to access private fields like this:

> d$.__enclos_env__$private$name
[1] "Milo"

However, by making fields private we give a message to class users that those fields are internal implementation details and should not be accessed directly.

Inheritance

R6 classes support inheritance through the inherit parameter of R6::R6Class.

Let’s define our Animal, Dog, Cat class hierarchy:

Animal <- R6::R6Class(
  classname = "Animal",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    }
  )
)

Dog <- R6::R6Class(
  classname = "Dog",
  inherit = Animal
)

Cat <- R6::R6Class(
  classname = "Cat",
  inherit = Animal
)

Now, we can create Cat and Dog objects like this:

> d <- Dog$new(name = "Milo", age = 4)
> c <- Cat$new(name = "Tucker", age = 2)

> d
<Dog>
  Inherits from: <Animal>
  Public:
    clone: function (deep = FALSE) 
    initialize: function (name, age) 
  Private:
    age: 4
    name: Milo
> c
<Cat>
  Inherits from: <Animal>
  Public:
    clone: function (deep = FALSE) 
    initialize: function (name, age) 
  Private:
    age: 2
    name: Tucker

Note: At the time of writing, R6 doesn’t support multiple inheritance. However, there are existing GitHub issues (1, 2) that discuss the topic.
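
Subclasses can also override inherited methods and reach the parent implementation through super. A minimal sketch, where describe is a hypothetical method introduced only for this example:

```r
library(R6)

Animal <- R6Class(
  classname = "Animal",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    },
    describe = function() {
      paste0(private$name, " is ", private$age, " years old")
    }
  )
)

Dog <- R6Class(
  classname = "Dog",
  inherit = Animal,
  public = list(
    describe = function() {
      # super$describe() calls Animal's version of the method
      paste(super$describe(), "and is a dog")
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)
d$describe()
```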

Active Bindings

Active bindings are an R6 feature: they look like fields, but each time they are accessed, they call a function.

Let’s assume we want to provide getters and setters for the name field of our Dog class. We might consider implementing them by using public methods:

Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    },
    
    get_name = function() {
      private$name
    },
    
    set_name = function(name) {
      stopifnot(is.character(name))
      private$name <- name
    }
  )
)

Instead of defining two methods with get_ and set_ prefixes, we can provide similar functionality through an active binding:

Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    .name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$.name <- name
      private$age <- age
    }
  ),
  active = list(
    name = function(value) {
      if (missing(value)) {
        private$.name
      } else {
        stopifnot(is.character(value))
        private$.name <- value
      }
    }
  )
)

Note: We needed to change name to .name in private fields, because we can’t have both a name private field and a name active binding.

Now, we can still both access and set the .name field, but with slightly different syntax.

> d <- Dog$new(name = "Milo", age = 4)
> d$name
[1] "Milo"
> d$name <- "Marley"
> d$name
[1] "Marley"

And our validation is still working!

> d$name <- 123
Error in (function (value)  : is.character(value) is not TRUE
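
The same mechanism can be used to expose a field as read-only: the binding returns the private value when read and raises an error when someone tries to assign to it. A minimal sketch, assuming we want age to be readable but not settable from outside (the .age field name and the error message are illustrative choices):

```r
library(R6)

Dog <- R6Class(
  classname = "Dog",
  private = list(
    .name = NULL,
    .age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$.name <- name
      private$.age <- age
    }
  ),
  active = list(
    age = function(value) {
      if (missing(value)) {
        private$.age
      } else {
        stop("age is read-only", call. = FALSE)
      }
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)
d$age

# Assigning to the binding triggers the error branch
attempt <- tryCatch(
  {
    d$age <- 5
    "assigned"
  },
  error = function(e) conditionMessage(e)
)
attempt
```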

R6 Usage in the Community

As we already mentioned, R6 has been used in {shiny} and improved the performance of the package.

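It is used in other packages as well: 32 of the packages in our local library depend on it directly (checked with tools::dependsOnPkgs("R6", recursive = FALSE) |> length()). Note that dependsOnPkgs only inspects installed packages, so the number across all of CRAN will differ.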

At Appsilon we have found R6 to be particularly useful in Organizing shiny apps and Managing App State in Shiny which helps us in modularising our code.

mlr3 is another interesting example. It is a rewrite of mlr that uses R6 instead of S3. During the useR! 2020 tutorial Machine Learning with mlr3 (Bernd Bischl, Michel Lang), the authors mentioned that the mlr team felt limited by S3.

In Advanced R, Hadley mentions how originally S3 classes were used in ggplot2 to implement scales and that for this particular area, R6 classes made the code significantly simpler.

Conclusions

OOP systems in R can be divided into two groups: functional OOP systems and encapsulated OOP systems. The difference between them is based on the relationship between methods and classes. In functional OOP, methods belong to generics, while in encapsulated OOP, methods belong to classes/objects.

Reference Classes is an encapsulated OOP system built on top of S4 classes, introduced to R in 2010.

Because Reference Classes are built on top of S4 classes, they support features such as validation methods, class unions, or multiple inheritance. Reference classes can also be used in S4’s coercion system.

R6 is another encapsulated OOP system available in R. It is not a part of the core R language and has been available in the form of a package on CRAN since 2014.

R6 classes take up less memory and are faster compared to Reference Classes.

R6 classes are very similar to Reference Classes in terms of syntax. They do not support multiple inheritance.

R6 classes have been used in packages such as {shiny}, {mlr3} and at Appsilon we have found them useful in organizing Shiny apps and managing app state in Shiny.

In ggplot2, scales were originally implemented using S3 classes; a later switch to R6 classes made the code significantly simpler.
