Avoid Irrelevancy and Fire Drills in Data Science Teams
Balancing the twin threats of data science development
Data science leaders naturally want to maximize the value their teams deliver to their organization, and that often means helping them navigate between two possible extremes. On the one hand, a team can easily become an expensive R&D department, detached from actual business decisions, slowly chipping away only to end up answering stale questions. On the other hand, teams can be overwhelmed with requests, spending all of their time on labor-intensive, manual fire drills, always creating one more “Just in Time” PowerPoint slide.
How do you avoid these threats, of either irrelevancy or constant fire drills? As we touched on in a recent blog post, Getting to the Right Question, it turns out the answer is pretty straightforward: use iterative, code-based development to share your content early and often, to help overcome the communications gap with your stakeholders.
Data science ecosystems can be complex and full of jargon, so before we dive into specifics let’s consider a similar balancing act. Imagine you are forming a band that wants to share new music with the world. To do so, it is critical to get music out to your fans quickly, to iterate on ideas rapidly. You don’t want to get bogged down in the details of a recording studio on day 1. At the same time, you want to be able to capture and repeat what works - perhaps as sheet music, perhaps as a video, or even as a simple recording.
Benefits of sharing for data science teams
Luckily, tools exist to help data science teams create artifacts that share the same three characteristics: quick to share, easy to iterate on, and simple to capture and repeat. At RStudio, we’ve built RStudio Team with all three of these goals in mind.
Great data science teams talk about the happy result of this approach. For example:
“RStudio Connect is critical, the way you can deploy flexdashboards, R Markdown… I use web apps as a way to convey a model in a very succinct fashion… because I don’t know what the user will do, I can create an app where the user’s interactions with the model can imply it, I don’t have to come up with all the finite outcomes ahead of time” - Moody Hadi at S&P
“One of the key focuses for us was the method of delivery … actually taking your insights and getting business impact. How are non analytic people digesting your work.” - Aymen Waqar at Astellas (check out our last blog post, Getting to the Right Question, to see Aymen discussing the analytics communication gap)
It’s not just about production
We often see data science teams make a common mistake that prevents them from achieving this delicate balancing act. A tempting trap is to focus exclusively on complex tooling oriented towards putting models in production. Because data science teams are trying to strike a balance between repeatability, robustness, and speed, and because they are working with code, they often turn to their software engineering counterparts for guidance on adopting “agile” processes. Unfortunately, many teams end up focusing on the wrong parts of the agile playbook. Instead of copying the concept - rapid iterations towards a useful goal - teams get caught up in the technologies, introducing complex workflows instead of focusing on results. This mistake leads to a different version of the expensive R&D department - the band stuck in a recording studio with the wrong song.
Eduardo Ariño de la Rubia, head of a large data team at Facebook, lays out an important reminder in his recent talk at rstudio::conf 2020: data science teams are not machine learning engineers. While the growth of the two is related, ML models will ultimately become commoditized, mastered by engineers and available in off-the-shelf offerings. Data scientists, on the other hand, have a broader mandate: to enable critical business decisions. Often, in the teams we work with at RStudio, many projects are resolved and decisions made based on the rapid iteration of an app or a notebook. Only on occasion does the result need to be codified into a model at scale - and usually engineers are involved at that stage.
To wrap up, at RStudio we get to interact with hundreds of data science teams of all shapes and sizes from all types of industries. The best of these teams have all mastered the same balancing act: they use powerful tools to help them share results quickly, earning them a fanbase among their business stakeholders and helping their companies make great decisions.
We developed RStudio Team with this balancing act in mind, and to make it easy for data science teams to create, reproduce and share their work. To learn more, please visit the RStudio Team page.