Using Cassandra Through R

Posted on December 11, 2018 by Nagdev in Data science

This article was first published on python – Hi! I am Nagdev, and kindly contributed to python-bloggers.

In the last couple of years, there has been a lot of buzz around the open source community. New tools are open sourced almost every day. With so many open source tools available, don’t expect drivers to exist for every platform. I am a big fan of open source, mainly because of the huge community behind it.

I came across Cassandra, a NoSQL database, a while ago and was very impressed. Since it is open source, I did not wait a moment to get my hands on it. Being primarily an R user, I was happy to see an R package for connecting to Cassandra. That’s where the problems began. For some reason, I could not connect to the database. After hours of research on Stack Overflow, I eventually got a connection. The next problem was that the data I queried came back in a very weird format. Guess what, I turned to Stack Overflow again. After a few hours, I gave up and didn’t touch it for a few weeks.

One day, it hit me: let me give it a try in Python, my second favorite language. It did the job I wanted. So the question became how to replicate this in R. The answer was simple: just write the Python code in an R script, and voila! It solved my problem for now, and hopefully someone (or I) can come up with a rewrite of the R package for Cassandra.


# Suppress warnings
options(warn = -1)

# Load the reticulate library to run Python code from R
library(reticulate, quietly = TRUE)

# Query the Cassandra table through the Python driver
py_run_string('
from cassandra.cluster import Cluster
import pandas as pd

# Contact points and keyspace; replace with your own
cluster = Cluster(["192.168.1.1", "192.168.1.2", "192.168.1.3"])
session = cluster.connect("test")

query = "select * from sample_table;"
df = pd.DataFrame(list(session.execute(query)))
print(df)
cluster.shutdown()
')

# Move the pandas data frame into an R data frame
data = py$df

The code above uses the reticulate package to run a Python script that connects to Cassandra, runs the query, and loads the results into a pandas data frame. The last line then moves the pandas data frame into an R data frame.
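The Python half of that flow (connect, query, collect the rows, shut down) can also be pulled into small functions. What follows is only a rough sketch, not the original script: `fetch_rows`, `rows_to_columns`, `Row`, `FakeSession` and `FakeCluster` are hypothetical names made up for illustration, and the fake classes stand in for a real cluster so the example runs anywhere. It relies on the Python driver returning each row as a named tuple by default, and it uses `try`/`finally` so the connection is closed even when the query fails.

```python
from collections import namedtuple


def fetch_rows(make_cluster, hosts, keyspace, query):
    """Connect, run one query, and always shut the cluster down afterwards."""
    cluster = make_cluster(hosts)
    try:
        session = cluster.connect(keyspace)
        return list(session.execute(query))
    finally:
        # Runs even if connect() or execute() raises.
        cluster.shutdown()


def rows_to_columns(rows):
    """Reshape named-tuple rows into a dict of column name -> list of values."""
    if not rows:
        return {}
    return {name: [getattr(row, name) for row in rows] for name in rows[0]._fields}


# Hypothetical stand-ins so the sketch runs without a Cassandra cluster.
Row = namedtuple("Row", ["id", "name"])


class FakeSession:
    def execute(self, query):
        # Pretend the table holds two rows.
        return [Row(1, "a"), Row(2, "b")]


class FakeCluster:
    def __init__(self, hosts):
        self.hosts = hosts
        self.closed = False
        FakeCluster.last = self  # remember the instance so it can be inspected

    def connect(self, keyspace):
        return FakeSession()

    def shutdown(self):
        self.closed = True


rows = fetch_rows(FakeCluster, ["192.168.1.1"], "test", "select * from sample_table")
columns = rows_to_columns(rows)
print(columns)
```

Against a real database, the idea would be to pass `Cluster` from `cassandra.cluster` in place of `FakeCluster`, and to hand `columns` to `pd.DataFrame` before reading the result from R.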

More tutorials on the reticulate package are available on the RStudio blog [1].

Hope this helps.

References:

[1] https://blog.rstudio.com/tags/reticulate

[2] http://www.rforge.net/RCassandra

[3] https://cran.r-project.org/web/packages/reticulate/index.html