Object Oriented Programming in R (Part 4): Reference Classes & R6 Classes
In our last article, we explored the S4 OOP system in R. Up until this point, we had only discussed functional OOP systems in R.
Today, we are going to learn about two encapsulated OOP systems available for R:
- Reference Classes – introduced to R in 2010 in version 2.12.0 (source). Sometimes also referred to as R5 and RC.
- R6 classes – OOP system available in the R6 package created in 2014.
We will first define what we mean by functional and encapsulated OOP, followed by example usage of Reference Classes and R6 classes.
Last but not least, we will go through example use cases of using R6 within the community.
Functional and Encapsulated OOP
We can divide OOP systems available in R into two groups. We will be using the terminology established in Extending R:
- Functional OOP
- Encapsulated OOP (sometimes referred to as message passing OOP like in the S7 proposal)
The difference between them is based on the relationship between methods and classes. In functional OOP methods belong to generics, while in encapsulated OOP methods belong to classes/objects.
Remember our make_sound function in the S3 article? We would first define a generic function and later on implement a method for a specific class. Example usage of make_sound for S3 looked like this:
d <- new_dog(name = "Milo", age = 4)
make_sound(d)
The dog object (d) is the argument of the method.
In the case of encapsulated OOP, we define methods as part of the class definitions and later on call them like this:
Note: We are using R6 syntax here to present the general idea. We will be explaining specifics in subsequent sections!
d <- Dog$new(name = "Milo", age = 4)
d$make_sound()
Here, the make_sound method is part of the dog object (d).
Encapsulated OOP is the type of OOP found in most popular programming languages. For those familiar with, for example, Java or Python, encapsulated OOP syntax might therefore seem more familiar! (Other languages typically use . instead of $ to access object fields and methods.)
Reference Classes
Creating Our First Reference Class
To create a Reference Class we use the setRefClass function. You can think of it as the equivalent of setClass:
Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  )
)
setRefClass returns a generator function for creating objects. Now, to create objects of the class Dog, we do:
d <- Dog$new(name = "Milo", age = 4)
Reference Classes are built on top of S4 classes, so fields get validated on initialization:
> Dog$new(name = "Milo", age = "4 years old")
Error: invalid assignment for reference class field ‘age’, should be from class “numeric” or a subclass (was class “character”)
We can access the fields of the created object by using $:
> d$name
[1] "Milo"
> d$age
[1] 4
Even though we are using $, we don’t need to worry about partial matching: under the hood, Reference Classes are implemented as S4 classes whose fields are stored as named objects in an environment (see ?ReferenceClasses for more details).
> d$a
Error in envRefInferField(x, what, getClass(class(x)), selfEnv) :
  ‘a’ is not a valid field or method name for reference class “Dog”
Creating Our First Method
Let’s define our first method. This can be done in two ways:
- By invoking $methods() on the generator function returned by setRefClass
- By using the methods argument in the setRefClass call
# Approach 1
Dog$methods(
  make_sound = function() {
    cat(name, "says", "Wooof!")
  }
)
# Approach 2
Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  methods = list(
    make_sound = function() {
      cat(name, "says", "Wooof!")
    }
  )
)
Now, we can use our make_sound method like this:
d <- Dog$new(name = "Milo", age = 4)
d$make_sound()
You might wonder how the make_sound method knows that the name variable corresponds to the name field of the object.
In contrast to many other programming languages, in Reference Classes the body of a method can refer to any other method or field of the object by name (see the Writing Reference Methods section of ?ReferenceClasses for more details).
Reference Class objects are mutable (they can be modified in place). To change the value of a field within a method, we use the <<- operator:
Dog$methods(
  set_name = function(new_name) {
    name <<- new_name
  }
)
d$set_name("Tucker")
> d
Reference class object of class "Dog"
Field "name":
[1] "Tucker"
Field "age":
[1] 4
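Mutability has a practical consequence worth seeing once: assigning a Reference Class object to a new variable does not copy it. The following is a minimal sketch, assuming the Dog class from above; the variable names d1, d2 and d3 are ours and used only for illustration. It relies on the $copy() method that Reference Class objects provide by default.

```r
library(methods)

Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  methods = list(
    set_name = function(new_name) {
      name <<- new_name
    }
  )
)

d1 <- Dog$new(name = "Milo", age = 4)
d2 <- d1         # d2 points at the same object as d1
d3 <- d1$copy()  # d3 is an independent copy

d1$set_name("Tucker")

d2$name  # "Tucker": the change made through d1 is visible through d2
d3$name  # "Milo": the copy is unaffected
```

This differs from ordinary R values (and from S3 and S4 objects), which follow copy-on-modify semantics.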
External Methods
There is also an alternative type of method called an External Method. An external method is a method whose first argument is called .self. The body of an external method behaves like that of any ordinary function, and we can no longer refer to other methods or fields by name.
Let’s recreate our Dog class with make_sound and set_name methods written as External Methods:
Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  methods = list(
    make_sound = function(.self) {
      cat(.self$name, "says", "Wooof!")
    },
    set_name = function(.self, new_name) {
      .self$name <- new_name
    }
  )
)
d <- Dog$new(name = "Milo", age = 4)
> d$make_sound()
Milo says Wooof!
> d$set_name(new_name = "Tucker")
> d
Reference class object of class "Dog"
Field "name":
[1] "Tucker"
Field "age":
[1] 4
External Methods exist to avoid issues when inheriting classes between packages (see ?ReferenceClasses for more details).
The documentation discourages the use of External Methods, as there is no obvious advantage to using them when they are not needed. They are considered harder to read and write and are slightly slower to execute.
We discuss inheritance in-depth in the second part of the object-oriented programming series. Check out this blog post to learn more.
Inheritance
Reference Classes support inheritance through the contains parameter of setRefClass.
Let’s define our Animal, Dog, Cat class hierarchy:
Animal <- setRefClass(
  Class = "Animal",
  fields = c(
    name = "character",
    age = "numeric"
  )
)
Dog <- setRefClass(
  Class = "Dog",
  contains = "Animal"
)
Cat <- setRefClass(
  Class = "Cat",
  contains = "Animal"
)
d <- Dog$new(name = "Milo", age = 4)
c <- Cat$new(name = "Tucker", age = 2)
Also, because under the hood Reference Classes are implemented as S4 classes, we can leverage S4 features such as Virtual Classes:
Animal <- setRefClass(
  Class = "Animal",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  contains = "VIRTUAL"
)
> Animal$new(name = "Milo", age = 3)
Error in methods::new(def, ...) :
  trying to generate an object from a virtual class ("Animal")
Or multiple inheritance:
Pet <- setRefClass(
  Class = "Pet",
  contains = "VIRTUAL",
  fields = c(
    owner = "character"
  )
)
Animal <- setRefClass(
  Class = "Animal",
  contains = "VIRTUAL",
  fields = c(
    name = "character",
    age = "numeric"
  )
)
Dog <- setRefClass(
  Class = "Dog",
  contains = c("Animal", "Pet")
)
Cat <- setRefClass(
  Class = "Cat",
  contains = c("Animal", "Pet")
)
d <- new("Dog", name = "Milo", age = 5, owner = "Jane")
c <- new("Cat", name = "Tucker", age = 2, owner = "John")
Reference Classes can inherit from S4 classes; however, the Inheritance section of ?ReferenceClasses discourages doing so.
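The hierarchies above only inherit fields. Methods are inherited too, and a subclass can override a method while still reusing the parent's version through callSuper(). Below is a minimal sketch; the describe method is a hypothetical name we introduce for illustration and is not part of the original examples.

```r
library(methods)

Animal <- setRefClass(
  Class = "Animal",
  fields = c(
    name = "character",
    age = "numeric"
  ),
  methods = list(
    # Hypothetical method, defined once in the parent class
    describe = function() {
      paste0(name, " is ", age, " years old")
    }
  )
)

Dog <- setRefClass(
  Class = "Dog",
  contains = "Animal",
  methods = list(
    # Override: extend the parent's result instead of replacing it
    describe = function() {
      paste0(callSuper(), " and says Wooof!")
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)
d$describe()
# [1] "Milo is 4 years old and says Wooof!"
```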
S4 features Supported by Reference Classes
As mentioned in the Inheritance section, Reference Classes support S4 features such as virtual classes or multiple inheritance. In ?ReferenceClasses we can also find information that Reference Classes support:
- Validation Methods
- Class Unions
Validation Methods
We can define validation methods for our reference classes using the setValidity function, but now we need to remember to access object fields through $ instead of @:
Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  )
)
setValidity(
  Class = "Dog",
  method = function(object) {
    if (object$age < 0) {
      "age should be a positive number"
    } else {
      TRUE
    }
  }
)
> Dog$new(name = "Milo", age = -1)
Error in validObject(.Object) :
  invalid class “Dog” object: age should be a positive number
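One caveat follows from mutability: the validity method shown above runs when the object is initialized, but a later field assignment only checks the class of the new value. A minimal sketch, assuming the same Dog class and validity method, shows how validObject() can be called explicitly to re-check an object after it has been modified. The result variable is ours and only captures the error message.

```r
library(methods)

Dog <- setRefClass(
  Class = "Dog",
  fields = c(
    name = "character",
    age = "numeric"
  )
)

setValidity(
  Class = "Dog",
  method = function(object) {
    if (object$age < 0) {
      "age should be a positive number"
    } else {
      TRUE
    }
  }
)

d <- Dog$new(name = "Milo", age = 4)

# The assignment succeeds: -1 is numeric, and validity is not re-run here
d$age <- -1

# Re-validate explicitly and capture the error message instead of stopping
result <- tryCatch(
  validObject(d),
  error = function(e) conditionMessage(e)
)
result
```

If fields must stay valid at all times, one option is to modify them only through methods that perform the check themselves.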
Class Unions
Reference Classes can be part of class unions. Let’s create a Reference Class for each particle and a class union for particles:
Proton <- setRefClass(
  Class = "Proton"
)
Neutron <- setRefClass(
  Class = "Neutron"
)
Electron <- setRefClass(
  Class = "Electron"
)
setClassUnion(
  name = "Particle",
  members = c(
    "Proton",
    "Neutron",
    "Electron"
  )
)
Coercion System
By default, Reference Classes provide $export and $import methods for coercion. Let’s consider our Animal, Dog, Cat class hierarchy. For the sake of the example, we won’t make Animal a virtual class:
Animal <- setRefClass(
  Class = "Animal",
  fields = c(
    name = "character",
    age = "numeric"
  )
)
Dog <- setRefClass(
  Class = "Dog",
  contains = "Animal"
)
Cat <- setRefClass(
  Class = "Cat",
  contains = "Animal"
)
We can use $export to coerce a Dog object into an Animal object:
> d <- Dog$new(name = "Milo", age = 4)
> a <- d$export("Animal")
> is(a)[1]
[1] "Animal"
There is also the $import method to copy corresponding fields from a superclass:
> a <- Animal$new(name = "Milo", age = 4)
> d <- Dog$new(name = "", age = NA_integer_)
> d$import(a)
> d
Reference class object of class "Dog"
Field "name":
[1] "Milo"
Field "age":
[1] 4
If we try to coerce a Dog object into a Cat object, we will encounter an error:
> d <- Dog$new(name = "Milo", age = 4)
> d$export("Cat")
Error in methods::as(.self, Class) :
  no method or default for coercing “Dog” to “Cat”
Fortunately, we can define a coercion method using the setAs function:
setAs(
  from = "Dog",
  to = "Cat",
  def = function(from) {
    Cat$new(
      name = from$name,
      age = from$age
    )
  }
)
And now, we are able to do the coercion:
> d$export("Cat")
Reference class object of class "Cat"
Field "name":
[1] "Milo"
Field "age":
[1] 4
R6 Classes
R6 is another encapsulated OOP system available in R. However, this time it is not a part of the core language. It is a package available on CRAN since 2014.
R6 classes are similar to Reference Classes, but they are more efficient and they do not depend on S4 classes or the {methods} package.
In fact, packages like {httpuv} and {shiny} used to use Reference Classes, but switched to R6 at one point (see commits for shiny and httpuv). Advanced R mentions that this switch led to a substantial performance improvement in the shiny package.
The R6 package includes a Performance vignette that compares the performance of R6 and Reference Classes. It shows that R6 objects take up less memory and are faster than Reference Class objects.
Sounds exciting, doesn’t it? Let’s have a closer look at R6 classes!
Creating Our First R6 Class
To create an R6 class, we use the R6Class function:
Dog <- R6::R6Class(
  classname = "Dog",
  public = list(
    name = NULL,
    age = NULL
  )
)
Similarly to setRefClass, it returns a generator, and we can now create objects by calling Dog$new:
> d <- Dog$new()
> d
<Dog>
  Public:
    age: NULL
    clone: function (deep = FALSE)
    name: NULL
However, you may notice that if we try to provide values for age and name, it will not work:
> Dog$new(name = "Milo", age = 4)
Error in Dog$new(name = "Milo", age = 4) :
  Called new() with arguments, but there is no initialize method.
To support arguments when creating a new object, we need to provide an initialize method:
Dog <- R6::R6Class(
  classname = "Dog",
  public = list(
    name = NULL,
    age = NULL,
    initialize = function(name, age) {
      self$name <- name
      self$age <- age
    }
  )
)
By default, we need to refer to public fields through self. Let’s see if it works!
Note: By default, R6 classes are portable to avoid issues when inheriting classes across packages. If we use non-portable mode instead, R6 behaves like Reference Classes, and fields can be accessed without using self (source).
> d <- Dog$new(name = "Milo", age = 4)
> d
<Dog>
  Public:
    age: 4
    clone: function (deep = FALSE)
    initialize: function (name, age)
    name: Milo
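To make the note on non-portable mode concrete, here is a minimal sketch of a non-portable class created with portable = FALSE. The names DogNP, d_np, new_name and new_age are hypothetical and chosen so that the sketch does not overwrite the Dog class used in the rest of the article. As with Reference Classes, fields are read by name and assigned with <<-.

```r
# Hypothetical non-portable variant of the Dog class
DogNP <- R6::R6Class(
  classname = "DogNP",
  portable = FALSE,
  public = list(
    name = NULL,
    age = NULL,
    initialize = function(new_name, new_age) {
      # No self$ needed: fields are found by name and set with <<-
      name <<- new_name
      age <<- new_age
    },
    make_sound = function() {
      cat(name, "says", "Wooof!\n")
    }
  )
)

d_np <- DogNP$new(new_name = "Milo", new_age = 4)
d_np$make_sound()
# Milo says Wooof!
```

The argument names differ from the field names on purpose, so that it is clear which side of each assignment is the field.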
We can access public fields by using $:
> d$age
[1] 4
> d$name
[1] "Milo"
In the case of R6 classes, objects are implemented as environments, so there is no partial matching, but we also do not get an explicit error:
> d$a
NULL
> d$nam
NULL
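Because R6 objects are environments, they also have reference semantics: assigning an object to a new variable does not copy it. The clone method that appears in the printed object above creates an independent copy. The following is a minimal sketch using the Dog class defined earlier; the variable names d1, d2 and d3 are ours.

```r
Dog <- R6::R6Class(
  classname = "Dog",
  public = list(
    name = NULL,
    age = NULL,
    initialize = function(name, age) {
      self$name <- name
      self$age <- age
    }
  )
)

d1 <- Dog$new(name = "Milo", age = 4)
d2 <- d1          # d2 points at the same object as d1
d3 <- d1$clone()  # d3 is an independent copy

d1$name <- "Tucker"

d2$name  # "Tucker": the change made through d1 is visible through d2
d3$name  # "Milo": the clone is unaffected
```

For objects whose fields themselves hold R6 objects, clone(deep = TRUE) is the variant to look at, since a shallow clone would still share those nested objects.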
Creating Our Method
Actually, we already defined the initialize method, but let’s add another one! This time, instead of redefining our class, let’s use the set method on the generator object:
Dog$set(
  which = "public",
  name = "make_sound",
  value = function() {
    cat(self$name, "says", "Wooof!")
  }
)
Now, we can call our method in a similar way as in Reference Classes:
> d <- Dog$new(name = "Milo", age = 4)
> d$make_sound()
Milo says Wooof!
Private and Public fields
Up until now we have been using the public argument when creating our R6 classes. But we don’t necessarily want to make all fields available to end users, right?
To make our name and age fields a bit harder to reach, let’s make them private.
Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    name = NULL,
    age = NULL
  )
)
We will also need to adjust our initialize and make_sound methods. Previously we used self to access public fields and methods, but as the fields are private now, we need to refer to them through private:
Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    },
    make_sound = function() {
      cat(private$name, "says", "Wooof!")
    }
  )
)
Now, we won’t have direct access to the name and age fields:
> d <- Dog$new(name = "Milo", age = 4)
> d$age
NULL
> d$name
NULL
but we will still be able to use our methods:
> d$make_sound()
Milo says Wooof!
Note: We used the term harder to reach, because it’s still possible to access private fields like this:
> d$.__enclos_env__$private$name
[1] "Milo"
However, by making fields private we signal to class users that those fields are internal implementation details and should not be accessed directly.
Inheritance
R6 classes support inheritance through the inherit parameter of R6::R6Class.
Let’s define our Animal, Dog, Cat class hierarchy:
Animal <- R6::R6Class(
  classname = "Animal",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    }
  )
)
Dog <- R6::R6Class(
  classname = "Dog",
  inherit = Animal
)
Cat <- R6::R6Class(
  classname = "Cat",
  inherit = Animal
)
Now, we can create Cat and Dog objects like this:
> d <- Dog$new(name = "Milo", age = 4)
> c <- Cat$new(name = "Tucker", age = 2)
> d
<Dog>
  Inherits from: <Animal>
  Public:
    clone: function (deep = FALSE)
    initialize: function (name, age)
  Private:
    age: 4
    name: Milo
> c
<Cat>
  Inherits from: <Animal>
  Public:
    clone: function (deep = FALSE)
    initialize: function (name, age)
  Private:
    age: 2
    name: Tucker
Note: At the time of writing, R6 doesn’t support multiple inheritance. However, there are existing GitHub issues (1, 2) that discuss the topic.
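The hierarchy above inherits fields and the initialize method unchanged. A subclass can also override a method and reuse the parent's implementation through super$. Below is a minimal sketch; the describe method is a hypothetical name we introduce for illustration.

```r
Animal <- R6::R6Class(
  classname = "Animal",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    },
    # Hypothetical method, defined once in the parent class
    describe = function() {
      paste0(private$name, " is ", private$age, " years old")
    }
  )
)

Dog <- R6::R6Class(
  classname = "Dog",
  inherit = Animal,
  public = list(
    # Override: extend the parent's result instead of replacing it
    describe = function() {
      paste0(super$describe(), " and says Wooof!")
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)
d$describe()
# [1] "Milo is 4 years old and says Wooof!"
```

Note that private members of the parent class remain reachable through private inside the subclass's methods.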
Active Bindings
Active bindings are an R6 feature: they look like fields, but each time they are accessed, they call a function.
Let’s assume we want to provide getters and setters for the name field of our Dog class. We might consider implementing them by using public methods:
Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$name <- name
      private$age <- age
    },
    get_name = function() {
      private$name
    },
    set_name = function(name) {
      stopifnot(is.character(name))
      private$name <- name
    }
  )
)
Instead of defining two methods with get_ and set_ prefixes, we can provide similar functionality through an active binding:
Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    .name = NULL,
    age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$.name <- name
      private$age <- age
    }
  ),
  active = list(
    name = function(value) {
      if (missing(value)) {
        private$.name
      } else {
        stopifnot(is.character(value))
        private$.name <- value
      }
    }
  )
)
Note: We needed to change name to .name in private fields, because we can’t have both a name private field and a name active binding.
Now, we can still both access and set the .name field, but with slightly different syntax:
> d <- Dog$new(name = "Milo", age = 4)
> d$name
[1] "Milo"
> d$name <- "Marley"
> d$name
[1] "Marley"
and our validation is still working!
> d$name <- 123
Error in (function (value) : is.character(value) is not TRUE
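The same mechanism can be used to expose a field as read-only: the binding returns the private value when read and raises an error when someone tries to assign to it. The following is a minimal sketch of that idea, applied to age; the private field name .age and the error message are our own choices for illustration.

```r
Dog <- R6::R6Class(
  classname = "Dog",
  private = list(
    .name = NULL,
    .age = NULL
  ),
  public = list(
    initialize = function(name, age) {
      private$.name <- name
      private$.age <- age
    }
  ),
  active = list(
    age = function(value) {
      if (missing(value)) {
        # Read access: return the private value
        private$.age
      } else {
        # Write access: refuse the assignment
        stop("`age` is read-only", call. = FALSE)
      }
    }
  )
)

d <- Dog$new(name = "Milo", age = 4)
d$age
# [1] 4

# Capture the error message instead of stopping the script
result <- tryCatch(
  {
    d$age <- 10
    "assigned"
  },
  error = function(e) conditionMessage(e)
)
result
# [1] "`age` is read-only"
```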
R6 Usage in the Community
As we already mentioned, R6 has been used in {shiny} and improved the performance of the package.
It is used in other packages as well: we found 32 packages that depend on it directly (checked with tools::dependsOnPkgs("R6", recursive = FALSE) |> length(); note that this function inspects the packages installed in the local library, so the count across all of CRAN will be different).
At Appsilon we have found R6 to be particularly useful in Organizing shiny apps and Managing App State in Shiny, which helps us modularise our code.
mlr3 is another interesting example. In fact, it is a rewrite of mlr and uses R6 instead of S3. During the useR! 2020 tutorial Machine Learning with mlr3 (Bernd Bischl, Michel Lang), the authors mentioned that the mlr team felt limited by S3.
In Advanced R, Hadley mentions that S3 classes were originally used in ggplot2 to implement scales and that, for this particular area, R6 classes made the code significantly simpler.
Conclusions
- OOP systems in R can be divided into two groups: functional OOP systems and encapsulated OOP systems.
- The difference between them is based on the relationship between methods and classes. In functional OOP methods belong to generics, while in encapsulated OOP methods belong to classes/objects.
- Reference Classes is an encapsulated OOP system built on top of S4 classes, introduced to R in 2010.
- Because Reference Classes are built on top of S4 classes, they support features such as validation methods, class unions, or multiple inheritance. Reference classes can also be used in S4’s coercion system.
- R6 is another encapsulated OOP system available in R. It is not a part of the core R language and has been available in the form of a package on CRAN since 2014.
- R6 classes take up less memory and are faster compared to Reference Classes.
- R6 classes are very similar to Reference Classes in terms of syntax. They do not support multiple inheritance.
- R6 classes have been used in packages such as {shiny}, {mlr3} and at Appsilon we have found them useful in organizing Shiny apps and managing app state in Shiny.
- In ggplot2, scales were originally implemented using S3 classes; a later switch to R6 classes made the code significantly simpler.
Other Blog Posts in this Series
- Object-Oriented Programming In R (Part 1): An Introduction
- Object Oriented Programming In R Part 2: S3 Simplified
- Object Oriented Programming In R (Part 3): A Practical Guide To The S4 System
Did you find this blog post useful? Find more materials like our Functional Programming ebook on our new Resources page.
The post appeared first on appsilon.com/blog/.