Keeping Up with Your Data Science Options

Posted on April 12, 2017 by Bob Muenchen in Data science | 0 Comments

This article was first published on Python – r4stats.com, and kindly contributed to python-bloggers.

The field of data science is changing so rapidly that it’s quite hard to keep up with it all. When I first started tracking The Popularity of Data Science Software in 2010, I followed only ten packages, all of them classic statistics software. The term data science hadn’t caught on yet, and data mining was still a new thing. One of my recent blog posts covered 53 packages, and choosing them from a list of around 100 was a tough decision!

To keep up with the rapidly changing field, you can read the information on a package’s web site, see what people are saying on blog aggregators such as R-Bloggers.com or StatsBlogs.com, and if it sounds good, download a copy and try it out. What’s much harder to do is figure out how the packages all relate to one another. A helpful source of information on that front is the book Disruptive Analytics, by Thomas Dinsmore.

I was lucky enough to be the technical reviewer for the book, during which time I ended up reading it twice. I still refer to it regularly as it covers quite a lot of material. In a mere 262 pages, Dinsmore manages to describe each of the following packages, how they relate to one another, and how they fit into the big picture of data science:

Alluxio
Alpine Data
Alteryx
APAMA
Apex
Arrow
Caffe
Cloudera
Deeplearning4J
Drill
Flink
Giraph
Hadoop
HAWQ
Hive
IBM SPSS Modeler
Ignite
Impala
Kafka
KNIME Analytics Platform
Kylin
MADlib
Mahout
MapR
Microsoft R Server
Phoenix
Pig
Python
R
RapidMiner
Samza
SAS
SINGA
Skytree Server
Spark
Storm
Tajo
TensorFlow
Tez
Theano
Trafodion

As you can tell from the title, a major theme of the book is how open source software is disrupting the data science marketplace. Dinsmore’s blog, ML/DL: Machine Learning, Deep Learning, extends the book’s coverage as data science software changes from week to week.

I highly recommend both the book and the blog. Have fun keeping up with the field!
