Keeping Up with Your Data Science Options

This article was first published on Python – r4stats.com , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

The field of data science is changing so rapidly that it’s quite hard to keep up with it all. When I first started tracking The Popularity of Data Science Software in 2010, I followed only ten packages, all of them classic statistics software. The term data science hadn’t caught on yet, data mining was still a new thing. One of my recent blog posts covered 53 packages, and choosing them from a list of around 100 was a tough decision!

To keep up with the rapidly changing field, you can read the information on a package’s web site, see what people are saying on blog aggregators such as R-Bloggers.com or StatsBlogs.com, and if it sounds good, download a copy and try it out. What’s much harder to do is figure out how they all relate to one another. A helpful source of information on that front is the book Disruptive Analtyics, by Thomas Dinsmore.

I was lucky enough to be the technical reviewer for the book, during which time I ended up reading it twice. I still refer to it regularly as it covers quite a lot of material. In a mere 262 pages, Dinsmore manages to describe each of the following packages, how they relate to one another, and how they fit into the big picture of data science:

  • Alluxio
  • Alpine Data
  • Alteryx
  • APAMA
  • Apex
  • Arrow
  • Caffe
  • Cloudera
  • Deeplearning4J
  • Drill
  • Flink
  • Giraph
  • Hadoop
  • HAWQ
  • Hive
  • IBM SPSS Modeler
  • Ignite
  • Impala
  • Kafka
  • KNIME Analytics Platform
  • Kylin
  • MADLib
  • Mahout
  • MapR
  • Microsoft R Aerver
  • Phoenix
  • Pig
  • Python
  • R
  • RapidMiner
  • Samza
  • SAS
  • SINGA
  • Skytree Server
  • Spark
  • Storm
  • Tajo
  • Tensorflow
  • Tez
  • Theano
  • Trafodion

As you can tell from the title, a major theme of the book is how open source software is disrupting the data science marketplace. Dinsmore’s blog, ML/DL: Machine Learning, Deep Learning, extends the book’s coverage as data science software changes from week to week.

I highly recommend both the book and the blog. Have fun keeping up with the field!

To leave a comment for the author, please follow the link and comment on their blog: Python – r4stats.com .

Want to share your content on python-bloggers? click here.