What is the “data analytics stack?”
A poor craftsman blames his tools. But if all you have is a hammer, everything looks like a nail.
It’s common for web developers or database administrators to refer to the “stack” of tools they use to do the job, but I’ve never heard this moniker used for data analysts. So it got me thinking, what is the data analytics stack?
Data analysts make use of a wide variety of software for a wide variety of tasks. When a solution comes up short, the focus ought not to be on “blaming” tools for their shortcomings, but on possessing alternatives and choosing a better one (or ones) for the given scenario.
That is, it’s better to think of these tools as “slices” of the same stack to be used concurrently, rather than as misfits to be entirely discarded.
To imagine what the analytics stack might look like, I used the below data products Venn diagram, placing the logos of popular data analytics tools in their respective segments.
After stepping back from my marked-up Venn diagram, four categories or “slices” of the stack appeared to me. Let’s get to them below; but first, a caveat.
Staying vendor agnostic
Some vendors have packaged their own “stack” of tools for data analysis; for example, Microsoft’s Power Platform or Google Data Studio. I am keeping my overview of the stack vendor-agnostic.
While you may learn that some slices fit better together, it’s better to start by asking what category of tool to use, and when, rather than which vendor. I will, however, provide a brief industry landscape of these products below, and suggestions for future learning.
Spreadsheets
Reports of the death of spreadsheets are greatly exaggerated. For their ease of use and flexibility, spreadsheets are an excellent choice for back-of-the-envelope calculations and prototyping.
However, spreadsheets do have their limitations. They can lack data integrity, storage and delivery functionalities. These limitations are often what cause pundits to give spreadsheets their last rites. But this misses the point of “the stack” entirely — those tasks aren’t the proper context for spreadsheets in the first place.
The major spreadsheet applications are Microsoft Excel and Google Sheets. I won’t tell you outright my preference, but you may find out if you follow me on social media for long.
Databases
Databases are a relatively old technology in the analytics space, but they show no signs of slowing. They offer more reliable and extensible methods for data storage and integrity, but the analysis that can easily be done directly inside a database is limited.
Structured query language, or SQL, is the language used to interact with relational database management systems. While many SQL platforms exist, the types of read-only operations necessary for most data analysts won’t change across them.
For data analysts new to SQL, I suggest SQLite or Microsoft Access as lightweight tools for learning SQL.
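To make the idea of a read-only operation concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The sales table, its columns and its rows are all invented for illustration; they do not come from any real dataset.

```python
import sqlite3

# An in-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for this sketch.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 60.0), ("West", 40.0)],
)

# A typical analyst query: read the data and aggregate it by group.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""
totals = conn.execute(query).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```

The SELECT statement is the part that should carry over to most other SQL platforms with little or no change; the setup lines above it exist only to give the query something to read.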
Business intelligence & dashboard platforms
This is a broad swathe of tools and it’s likely the most ambiguous slice of the stack, but here I mean enterprise tools that allow users to gather, model and display data.
Data warehousing tools like MicroStrategy and SAP BusinessObjects straddle the line here, since they are designed for self-service data gathering and analysis. But they often include only limited visualization and interactive report-building.
That’s where tools like Power BI, Tableau and Looker come in. These tools allow users to build data models, dashboards and reports with minimal coding. Importantly, they make it easy to disseminate and update information across an organization.
However, these tools tend to be inflexible in the way they handle and visualize data. They can also be expensive, with single-user annual licenses running several hundred or even thousands of dollars.
Data programming languages
While many vendor tools are moving to a place where coding is not as essential to the data workflow, I still think it’s a good idea to learn programming. This helps sharpen understanding of how data processing works, and gives users fuller control of their workflow than a graphical user interface (GUI) does.
For data analytics, two open-source programming languages are good fits: R and Python. Each includes a dizzying universe of free packages made to help with everything from social media automation to geospatial analysis. Learning these tools also opens the door to advanced analytics and data science.
However, this slice could have the steepest learning curve in the stack, and many analysts may struggle to see the benefit of learning to code, when they can do most of what they need easily enough from a GUI.
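For a sense of what this slice looks like in practice, here is a small sketch in Python that does roughly what a pivot table does in a spreadsheet: it groups rows and averages a column. It uses only the standard library; the data and the average_by helper are hypothetical, written just for this example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical data, inlined as CSV text so the example is self-contained.
raw = """region,amount
East,120
West,80
East,60
West,40
"""


def average_by(rows, key, value):
    """Group rows by `key` and return the mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {name: mean(values) for name, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(raw)))
averages = average_by(rows, "region", "amount")
print(averages)
```

Every step is spelled out and can be rerun or changed, which is the "fuller control" side of the tradeoff. It is also more typing than a few clicks in a GUI.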
Not better or worse, just different
Seen in the light of a “stack,” it makes little sense to compare any of these slices, or to claim one is inferior to another. They are meant to be complementary.
Data analysts often wonder which tool they should focus on learning or becoming the expert in. I would suggest not becoming the expert in any single one, but in learning each slice of the stack well enough to contextualize and choose between them.
Entering the stack
Learning one data tool is daunting. Learning a whole “stack” of them can seem impossible. However, this cross-training can expedite growth, as connections are made across platforms in how to use data effectively.
If it all seems like too much, or you would like specifics on pivoting knowledge from one slice of the stack to another, check out my data education resource library by subscribing below.
What data tools do you use? How do they fit together? Other thoughts on the idea of an “analytics stack?” Let’s discuss in the comments.