Debunking the Myths of R vs. Python

This article was first published on Python on RStudio, and kindly contributed to python-bloggers.

Data science teams sometimes believe that they must standardize on R or Python for efficiency, at the cost of forcing individual data scientists to give up their preferred, most productive language. RStudio’s professional products provide the best single home for R and Python data science, so teams can optimize for the impact of their work rather than for the language they use.

R, Python and Serious Data Science

RStudio’s mission is to create free and open-source software for data science, analytic research, and technical communication. This mission is expressed in our charter as a Public Benefit Corporation, and funded by the revenue from our professional products. These products, such as RStudio Team, enable teams and organizations to scale, secure and operationalize their open source data science.

In working with many different organizations that want to maximize the impact of their data science work, we’ve seen three recurring attributes that contribute to success, which we collectively call Serious Data Science:

  • Open source: It’s better for everyone if the tools used for data science are free and open. This enhances the production and consumption of knowledge and facilitates collaboration. The widespread use of open source software makes recruiting, retention and training of data science team members easier, and comprehensive open source ecosystems ensure you have the right tool for any analytic challenge.
  • Code-first: Coding is the most powerful and efficient path to tackle complex, real-world data science challenges. It gives data scientists superpowers to tackle the hardest problems because code is flexible, reusable, inspectable, and reproducible. With code, the answer is always yes.
  • Centralized, on premises or in the cloud: Centralizing the infrastructure for data science work reduces unnecessary headaches for data science teams, promotes collaboration and the sharing of self-service applications, supports reproducibility, and eases administration.

RStudio’s professional products deliver a platform on which to centralize, secure and scale your data science, but there are two prominent choices for open source, code-first environments: R and Python. Teams sometimes believe that they must standardize on one or the other for efficiency, at the cost of forcing individual data scientists to give up their preferred, most productive language.

Myths about R vs. Python

There are a few common myths that we frequently hear from different organizations struggling with the decision of R vs. Python:

  • Cognitive overload for Data Scientists: Practitioners often fear that using more than one language will add overhead and context switching, forcing them to use different development environments.
  • Unnecessary burden on IT: The DevOps and IT teams are concerned that supporting two languages will mean supporting twice the infrastructure for development and deployment, and answering twice as many support tickets.
  • Blockers to collaboration, reuse and sharing: The leaders of data science teams worry that allowing multiple different languages will make it harder for the team to collaborate, re-use each other’s work, and deliver that work to the rest of the organization.

Debunking the Myths

While these concerns are common, they are nonetheless myths. Advances in tooling over the last few years have made it far easier for a data science team to use both R and Python, side by side.

  • Data scientists can easily combine R and Python: The RStudio IDE makes it easy to combine R and Python in a single data science project. The reticulate package provides a comprehensive set of tools for interoperability between Python and R, and the RStudio IDE has added new capabilities to make Python coding easier, including the display of Python objects in the Environment pane, viewing of Python data frames, and tools for configuring Python versions and conda/virtual environments. (See this blog post on RStudio 1.4, and the recent RStudio 1.4 update, for more information).

Video: Recent improvements to Python integrations in the RStudio 1.4 release.

  • Common infrastructure can support multiple languages and reduce support costs: By using a platform that supports both R and Python, such as RStudio Team, DevOps and IT teams can enable data scientists to use their preferred languages and development environments, while supporting a single infrastructure for both development and deployment. For example, RStudio Workbench (recently renamed from RStudio Server Pro) allows data science teams to use the RStudio IDE, Jupyter or VS Code on the same infrastructure, so data scientists can use their IDE of choice without putting an additional burden on IT.

  • Optimize your team’s impact, not the language they use: Data science teams are most effective when they share work with their fellow team members and with their key stakeholders, as was discussed in this recent panel webinar with leaders of data science teams. By supporting both languages, teams have access to more tools for distributing work and making an impact. Frameworks like Shiny, Dash, Streamlit, plumber, Flask, and R Markdown allow data scientists to focus on communication regardless of the language they use.
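
To make the first point above more concrete, here is a minimal sketch of the kind of Python code that reticulate can expose to R. The file name summary_stats.py and the summarize function are hypothetical names chosen for illustration and do not come from any RStudio product. The R calls shown in the comments (library(reticulate) and source_python) are part of the reticulate package, and they assume reticulate is installed and pointed at a working Python environment.

```python
# summary_stats.py  (hypothetical file name, used only for this illustration)
from statistics import mean, stdev


def summarize(values):
    """Return basic summary statistics for a sequence of numbers."""
    values = [float(v) for v in values]
    if len(values) < 2:
        raise ValueError("summarize() needs at least two values")
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


# From an R session, assuming the reticulate package is available,
# the function above could be loaded and called like this:
#
#   library(reticulate)
#   source_python("summary_stats.py")
#   summarize(c(1, 2, 3, 4))
#
# reticulate converts the R numeric vector to a Python list on the way in
# and the returned Python dict to an R named list on the way out.
```

A teammate who works mainly in R can then call the Python function without leaving their R project, while a Python-first colleague maintains the function in their own language.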

Serious Data Science

Figure: RStudio Team provides a single infrastructure for data science teams to develop, share and manage their work, whether it is built in R or Python.

For More Information
