Python in R Markdown

This article was first published on The Data Sandbox, and kindly contributed to python-bloggers.

Photo by David Clode on Unsplash

The main advantage of the R Markdown format is that R code can be run within the text, which is clearly more useful than just writing code in a plain Markdown file. On its own, however, R Markdown is geared toward R code rather than Python scripts. The R library reticulate adds this capability.

Initial Setup

The initial setup requires installing the reticulate library. After installation you shouldn’t need to load it explicitly, but I do in the following code. I have loaded the trees dataset as a test dataset, and the tidyverse library just to explore the data a bit.

library(reticulate)
library(tidyverse)
data(trees)
glimpse(trees)
Rows: 31
Columns: 3
$ Girth  <dbl> 8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0, 11.0, 11.1, 11.2, 11.3, …
$ Height <dbl> 70, 65, 63, 72, 81, 83, 66, 75, 80, 75, 79, 76, 76, 69, 75, 74,…
$ Volume <dbl> 10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9, 24.…

Now, RStudio will use your local version of Python when you write any code in a code chunk labelled with the “{python}” header. If you don’t have a local version, RStudio will ask if you would like to install Miniconda. From there, you will need to install the required Python modules.

Modules can be installed with pip, the Python package installer, from the terminal or command line. The easiest method in RStudio is the terminal window next to the console window. The command is pip install followed by the module name, for example pip install seaborn. Some modules depend on others and won’t work unless those are installed first.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
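
If one of these imports fails, it can help to check which modules the active Python can actually find before knitting the document. The following is a minimal sketch using only the standard library; the helper name find_missing and the REQUIRED list are illustrative and not part of reticulate or the original post.

```python
import importlib.util

# Modules imported in this post; scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def find_missing(modules):
    """Return the module names that cannot be found in the active Python."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Not importable from this Python:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Anything reported as missing can then be installed with pip from the terminal. Note that the sklearn module is installed under the package name scikit-learn.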

Multiple Environments

After the setup, you should see some additional options in the environment pane in RStudio, including the option to switch between the R and Python environments.

Data passes from the R environment to the Python environment through the r variable. This should feel pretty similar to a Shiny app’s use of input/output. It is not only data that can move between environments, but functions too.

The following code takes data from the R environment and creates a plot with Seaborn. The mean values of the columns are calculated in Python so they can later be read from the R environment. A simple linear model is then created with the scikit-learn (sklearn) module.

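data = r.trees                            # trees data frame from the R session
means = np.mean(data, axis = 0)           # column means, read later from R as py$means
data["big"] = data.Height > means.Height  # flag trees taller than the mean height
sns.set_theme(color_codes=True)           # set the theme before drawing so it applies
sns.scatterplot(data = data, x= "Girth", y= "Height", hue = "big")
plt.show()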
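
The "big" flag is just a comparison of each height against the column mean. As a rough illustration that runs outside RStudio, here is the same logic in plain Python on the first five rows shown in the glimpse() output earlier. The helper flag_above_mean is a hypothetical name, and because only five rows are used, the mean here differs from the full-data mean.

```python
from statistics import mean

# First five rows of trees, copied from the glimpse() output above.
girth = [8.3, 8.6, 8.8, 10.5, 10.7]
height = [70, 65, 63, 72, 81]


def flag_above_mean(values):
    """Return one boolean per value: True when it exceeds the mean."""
    avg = mean(values)
    return [v > avg for v in values]


big = flag_above_mean(height)
for g, h, b in zip(girth, height, big):
    print(g, h, b)
```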

from sklearn.linear_model import LinearRegression
mdl = LinearRegression()
mdl.fit(data[["Girth"]], data[["Height"]])
LinearRegression()
print(mdl.coef_)
[[1.05436881]]
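
LinearRegression fits an ordinary least squares line, and with a single predictor the slope and intercept have a simple closed form. The sketch below shows that calculation in plain Python, to make clear what coef_ and intercept_ represent. The function fit_line is a hypothetical helper and the data points are made up for illustration, not taken from trees.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1 (not the trees data).
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```

Here the slope plays the role of mdl.coef_ and the intercept the role of mdl.intercept_.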

Data passes from Python to R in a similar way, through the py variable. Information about a model, such as its coefficients or intercept, can be passed, but not the model itself. This is useful if you are more comfortable creating models in Python.

print(py$means)
   Girth   Height   Volume 
13.24839 76.00000 30.17097 
print(py$mdl$intercept_)
[1] 62.03131
py$data %>%
        ggplot(aes(x = Girth, y = Height, colour = big)) +
        geom_point()
