Explore bespoke analysis & reporting - Data Science & ML for Product Managers
I have a love of data… intractable problems and mysteries within data sets. I will admit to occasionally taking a deep dive into my raw analytics data sets and coming out sweating (virtually) from twisting and turning them every which way to eke out insights - looking at problems from angles I hadn't considered when I started. I won't pretend to be an expert on the proliferation of tools out there for Big Data analysis. Really, the specific tools aren't what matters anyway (except when someone decides that the tool matters more than the skill). Still, let's take a meandering look at a few tools to help your data-driven decision making & analysis. Specifically, we'll talk through why it's important to get your hands on data and gain the ability to tease out insights on your own, walk through a few representative tools with which I happen to have a fair amount of experience (Domo, Tableau, Jupyter), and finally touch on a few points about ML.
Why dive into Data Science? Don’t we have data scientists for this?
More broadly, why should you get your hands dirty with raw data and gain the skills to generate your own reports? One core piece of my personal philosophy is that to pick the best tool for the task (more often than not), you need to understand the tool well enough to know its limitations. A great way to do that is to build it, struggle with it personally, break it, and put it back together again. Relying on premade reports and metrics is often fine - as long as you're confident they're the best for the job at hand. However, living in your existing analytics engines and relying on the insights they pre-generate, or that have been generated for you, can be extremely limiting.
You don't need to be an expert per se - but gaining confidence that you can find the insights hidden in the raw data without your hand being held is incredibly freeing and empowering. You'll also gain knowledge and insight into when some of the most common tools, reports, and KPIs we use are ill-suited (or completely broken) for addressing your particular needs. Bonus - you'll get a better sense of when data and insights shared with you don't make sense and need a further look. This helps you sift signal from noise - and generate more effective insights faster.
One other reason to flex this muscle: you can develop your own custom reports for questions you ask on the spot - without queuing up for the time of the heavy experts - before you push a new report into general use.
Domo
I first heard about Domo from a sales call or two, perhaps 12 or so years ago. I'd done the demo, and frankly the tool wasn't right for my team at the time - mostly because it is tremendously expensive and we were a boutique incubator / agency. If you haven't heard of it or used it, in a nutshell it's billed as a Business Intelligence Tool, which is fancy speak for a data massaging and visualization platform. You connect data sources (from a host of pre-existing connectors, or by building your own) and can create transformation pipelines to merge, summarize, mutate, and ultimately report on your data. It has robust dashboarding and charting tools. Personally, I've made heavy use of the data pipelines to create engagement reports, various retention reports, and complex summaries across disparate data sources. The visual representation of the data transformations makes it fairly easy (and inspectable) to understand exactly what you're doing to your data and in what order.
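For a rough sense of what that merge → summarize → mutate flow looks like in code, here's a minimal pandas sketch. The file names and columns (users.csv, events.csv, sessions, and so on) are purely hypothetical stand-ins for whatever sources you'd wire into a pipeline like Domo's.

```python
# Hypothetical stand-in for a BI-style pipeline: merge two sources,
# summarize, add a derived column, and produce a report-ready table.
import pandas as pd

users = pd.read_csv("users.csv")      # e.g. user_id, signup_week, plan
events = pd.read_csv("events.csv")    # e.g. user_id, event_week, sessions

# Merge: join event data onto user attributes
merged = events.merge(users, on="user_id", how="left")

# Summarize: total sessions per plan per week
weekly = merged.groupby(["plan", "event_week"], as_index=False)["sessions"].sum()

# Mutate: add a derived metric, e.g. each plan's share of that week's sessions
weekly["share_of_week"] = (
    weekly["sessions"] / weekly.groupby("event_week")["sessions"].transform("sum")
)

print(weekly.head())
```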
Generally speaking, Domo isn't well suited to real-time work. It's better to come to it when you have a fairly solid idea of what you want to do with your data. It can also be tempting to create absurdly long lists of charts and reports in Domo, full of unnecessary and distracting information. This I know from experience…
Tableau
Now, here's a more immediate and personal alternative to Domo (though of course it too has enterprise cloud installations). Tableau is another Business Intelligence platform - though arguably targeted at a different user or use case. My experience with Tableau is mainly with desktop instances. It's light-years better than trying to work with data in Excel when you need a visualization-heavy approach. If you're quite comfy with formulas in Excel and your math is strong, you'll feel more at home here. Charting is not as friendly as Domo's - though it's arguably more flexible. This is a tool with a somewhat impenetrable interface and a fairly steep learning curve.
Unlike Domo, there's no true data pipeline approach here - transformations are often stored out to a database or performed in real time - which can make adding lots of complex processing a slog. On the other hand, it's much faster than Domo at prototyping reports and visualizing insights from locally stored data sets. It's also great for one-off reports and analysis. Most recently, I used it to understand the effect on earned growth of a value-added service on top of an existing subscription product. The underlying data sets hadn't been set up to work well together, and Tableau helped me connect them. With it, I created heatmap visualizations of how retention and referral interacted over time between distinct products and discovered just how well our products were doing, among other insights.
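If you wanted to prototype that kind of retention heatmap outside a BI tool, a rough sketch with pandas and matplotlib might look like the following. The cohort and retention numbers here are invented purely for illustration.

```python
# Minimal cohort-retention heatmap sketch with made-up data,
# similar in spirit to the visualizations described above.
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data: signup cohort, months since signup, retention rate
df = pd.DataFrame({
    "cohort":       ["2023-01", "2023-01", "2023-01", "2023-02", "2023-02", "2023-03"],
    "months_since": [0, 1, 2, 0, 1, 0],
    "retention":    [1.00, 0.62, 0.48, 1.00, 0.58, 1.00],
})

# Pivot into a cohort x month matrix and plot it as a heatmap
matrix = df.pivot(index="cohort", columns="months_since", values="retention")

fig, ax = plt.subplots()
im = ax.imshow(matrix, aspect="auto", vmin=0, vmax=1)
ax.set_xticks(range(len(matrix.columns)))
ax.set_xticklabels(matrix.columns)
ax.set_yticks(range(len(matrix.index)))
ax.set_yticklabels(matrix.index)
ax.set_xlabel("Months since signup")
ax.set_ylabel("Signup cohort")
fig.colorbar(im, ax=ax, label="Retention rate")
plt.show()
```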
If you’re in an exploratory / experimental mode with your data, Tableau is a powerful tool (in need of a usability rethink).
Jupyter Notebooks
…and by extension, Python. You'll need some programming skills here - and if you absolutely refuse to learn them, don't come here. But if you're willing, or already have the skills, this is your most powerful insight engine. Jupyter is unique in that you're half writing a document - explaining what you're doing, why, and the insights, in a human-readable format - even as you share the code and the charts needed to walk anyone through the story you're telling with your data. You don't need to make the chart carry the entire story if you can tell the story around the chart, as you would in a report. Notebooks are simply more readable than dashboards.
Admittedly, it's not as easy to keep all the charts on one page as it is in a BI tool. Since it's a relatively linear format, it's well suited to generating regular reports - and to doing powerful analysis. And with Python, you gain access to a host of Machine Learning libraries and tools: suddenly forecasting, recommendation, classification, computer vision, and natural language processing are at your fingertips. Frankly, I'm fascinated by this area and spend a fair amount of free time playing with and exploring different ML problems across each of these domains.
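To give a flavor of how low the barrier is once you're in Python, here's a tiny classification sketch using scikit-learn's bundled iris dataset. The point is the workflow (split, fit, evaluate), not the dataset - in practice you'd substitute your own product data and features.

```python
# Minimal classification example on scikit-learn's bundled iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```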
Learning w/ Kaggle
If you're interested in getting your hands a little dirty with Data Science & ML using Jupyter, I recommend starting with the learning materials at Kaggle.com. Kaggle is a data science and machine learning platform with free, hosted Jupyter instances connected to GPUs and TPUs. They host competitions for various data science problems, and the archive there is a wealth of challenges to play with and learn from. It's hard to overstate how important AI/ML is becoming in our profession, and this is a way for you to build your muscles in that area. They offer a modest selection of micro courses (which I belatedly took recently) that give you an introduction to various concepts in Python and ML.
If you're interested in Neural Networks, I recommend checking out Jeremy Howard on YouTube (his videos turned me on to Kaggle 5 years ago). You can learn about Fast.ai, an alternative to TensorFlow & Keras (built on top of PyTorch, IIRC) - or heck, learn what TensorFlow & Keras are too. There are various iterations of his videos, some of which the Fast.ai library has probably grown beyond. Jeremy makes Deep Learning fairly accessible and friendly to non-coders.
Please note that my recommendations here are completely unsolicited - they are purely based on my personal experience and enjoyment.
What are you going to do with this?
Perhaps it's easiest to process with an example. I recently did some exploration of the Live Operations space. With Tableau, I was able to explore the dataset and gather ideas about what was important and what wasn't. With Python & Jupyter, I could find which factors mattered most using linear regressors that would take much longer to implement and process elsewhere. With more data, I could likely plan and predict exactly which modes, promotions, and configurations would be most successful - and create a data pipeline for a feedback loop to optimize engagement. Now, this was for LiveOps on a content play… but the same work can be applied to any non-static application or product: to social networks, to games, and to content optimization.
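As a rough illustration of the "which factors mattered most" step, here's a hedged scikit-learn sketch. The CSV, the engagement metric, and the feature names are all invented; standardizing the features first makes the coefficient magnitudes roughly comparable as a quick ranking.

```python
# Hypothetical sketch: rank which LiveOps factors move an engagement metric
# using a linear regression. Column names are invented for illustration.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("liveops_events.csv")  # hypothetical export
features = ["promo_depth", "event_duration_hrs", "reward_value", "push_sent"]
target = "d7_engagement"

# Standardize so coefficient magnitudes are roughly comparable
X = StandardScaler().fit_transform(df[features])
y = df[target]

model = LinearRegression().fit(X, y)

# Rank factors by absolute standardized coefficient
ranked = sorted(zip(features, model.coef_), key=lambda kv: abs(kv[1]), reverse=True)
for name, coef in ranked:
    print(f"{name:20s} {coef:+.3f}")
```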
There are nearly endless possibilities… and I encourage you to explore, get your hands dirty, and break out of the boxes of pre-existing analytics solutions. Learn how your data is processed, how your reports are built, and how you might change them for your specific needs quickly, flexibly, and powerfully.
Social Proof - Kaggle gives you certificates for their micro courses!
Should you need it, each Kaggle micro course gives you a printable / sharable certificate such as those I’ve listed below. They introduced me to some of their casual learning oriented competitions, which I find fun. One day I’ll beat 82% accuracy on Spaceship Titanic… one day.