Tips for communicating ROI of data science projects
The Missing Semester: soft skills for successful data science projects
Many aspects contributing to the success of a data science project fall beyond the conventional boundaries of typical data science tasks.
I recently asked on LinkedIn: “What’s something essential to doing your job as a data scientist that isn’t really taught?” and the responses spanned many different topics such as:
- Sales & Marketing
- Data storytelling
- Asking the right questions
- DevOps / version control
A recurring topic in Data Science Hangouts, meetups, and 1:1 chats is the ability to estimate, measure, and communicate the Return on Investment (ROI) for data science projects to business stakeholders. A specific question was: “Are you ever asked to estimate the ROI from an analytics project before the business is willing to invest in anything?”
I reached out to the community seeking your insights on this topic and we got together to discuss further. A special thank you to Derek Beaton at St. Michael’s Hospital and Joe Powers at Intuit who kicked off the group conversation last month by sharing their own experiences. You can view the recording below as well as a summary of the conversation.
We have paraphrased and distilled portions of the responses for brevity and narrative quality.
Tips for communicating the ROI for data science projects
- Engage stakeholders and identify gatekeepers
- Align evaluation metrics to stakeholder objectives
- Establish clear success criteria
- Anticipate where things can go awry
- Consistently communicate your central message
- Prioritize what’s most impactful
- Promote transparency while measuring project performance
- Ensure the benefit is unquestionably worth the time investment
✨ Derek Beaton, Director of Advanced Analytics at St. Michael’s Hospital, shared that projects there typically originate from physicians, HR professionals, or clinical managers who have identified a problem that data can solve. The initial collaboration involves filling out a detailed project intake form alongside the data team, with particular focus on critical elements like evaluation metrics.
✨ Joe Powers, Principal Data Scientist at Intuit, advises that if you’re the one pushing a new project idea forward, identify your gatekeepers early. You may have lots of stakeholders, but you need to think about the gatekeepers who hold the authority to greenlight or impede your project’s rollout. Consider what they are going to really care about.
✨ Before committing to a project at St. Michael’s Hospital, Derek and the data team engage in detailed discussions with collaborators to establish metrics, tracking methods, and expected levels of change. They need to know whether they’re trying to address mortality, readmission, length of stay, financial issues, or human effort. If a project won’t move these key evaluation metrics, they dig deeper into what else might yield value, or they don’t take the project on.
✨ Joe recommends finding a dashboard that your leaders regularly refer to; it can be your anchor for identifying their key metrics and establishing a shared reality with your gatekeepers.
✨ Dan Boisvert, Senior Director and Head of Data Stewardship at Biogen, stressed the importance of understanding why you are being asked about ROI. Sometimes you’re asked because it’s a multi-billion-dollar decision and the CFO needs to know; other times it’s because the asker doesn’t think the work is valuable, and no amount of analysis will convince them. Think about what the company needs and the metrics the company cares about. How does what you’re doing improve those metrics? Dan added that he often sees passionate data scientists go off, spot something interesting, and say, “I bet I could do this,” without a direct line back to strategy. You have to pull it back to the company strategy.
✨ Tareef Kawaf, President at Posit, shared that he asks himself, “What are the underlying assumptions behind anything that we’re doing? At the strategy level, what is the end goal here, and then how do I know that I’m on the right path?” He challenges everybody who says, “I want to do X,” by asking, “Why, and what metrics are you hoping to drive?”
✨ Joe suggests defining a reasonable lift based on a literature review or qualitative testing. For example, if testing with 20 users shows a jump in invoice completion rate from 20% to 90%, that’s a valuable anchor point. Joe shared a data simulation task at Intuit used to explore reasonable lift and answer questions like, “How many conversions did we yield because we detected a better experience earlier in the season and rolled it out to 100% of customers?” Anchor stakeholders ahead of time to start thinking about the range of outcomes you could experience. People often think very causally and don’t think in terms of variance, so you have to condition your key stakeholders to expect that kind of variance.
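Joe’s point about simulating a range of outcomes can be sketched in a few lines of Python. This is a minimal illustration, not Intuit’s actual simulation: all rates and sample sizes below are hypothetical. It repeatedly simulates an experiment and reports an interval of observed lifts rather than a single point estimate.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def simulate_lift(base_rate, treated_rate, n_users, n_sims=2_000):
    """Simulate the spread of observed lifts across repeated experiments."""
    lifts = []
    for _ in range(n_sims):
        control = sum(random.random() < base_rate for _ in range(n_users))
        treated = sum(random.random() < treated_rate for _ in range(n_users))
        lifts.append((treated - control) / n_users)
    lifts.sort()
    # Report a central estimate and a 90% interval, not a single number
    return {
        "p5": lifts[int(0.05 * n_sims)],
        "median": lifts[n_sims // 2],
        "p95": lifts[int(0.95 * n_sims)],
    }

# Hypothetical experiment: 20% baseline rate, 30% treated rate, 500 users per arm
result = simulate_lift(base_rate=0.20, treated_rate=0.30, n_users=500)
print(result)
```

Presenting the p5–p95 range up front conditions stakeholders to expect variance, rather than treating the central estimate as a promise.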
✨ Derek shared that at St. Michael’s Hospital, success expectations also rely on insights from the broader hospital system and changes observed elsewhere. For example, pointing to literature that shows other hospitals that saw a 10% reduction in the length of stay.
✨ Tareef added that when Derek shared the intake form, he was thinking about how long it takes before you know you’ve got a good test and when the impact kicks in. When building new features or new products, it can take 18-24 months before you know whether an investment was a good one. It’s good discipline, in general, to say, “Here’s what I’m assuming is going to change or get better, and here’s how I’m going to measure it,” then use that as a way to learn and improve. You might also want to be okay with saying that 5% to 10% of your investments are moonshots, and accept that they may fall flat on their face. We do that when we say we’re going to take a crack at X or Y or Z: we’ll make an investment where failure won’t kill the company, but success could generate a lot more revenue.
✨ Joe emphasizes that two people can mean completely different things by the same metric and explicitly redefining those metrics is often really valuable. Think ahead with stakeholders about potential ways where things can go awry and how you can get ahead of that.
In experimentation, science really cares about false positives, because you don’t want false-positive effects entering the body of scientific knowledge. While that definitely matters in medicine, it can be much less important in industry. In tech, if you release a minor change that turns out to be a false positive, no one is harmed; it’s actually the false negatives that are really dangerous. When working in experimentation, the metrics that matter are accuracy and speed. Accuracy is a term we all throw around, but you could justifiably redefine it completely. You just need to get your stakeholders on board.
✨ Joe reminded us that the person whose buy-in you need for a project often has a lot of people demanding their attention and a lot of fires to put out. How are you going to get your idea in front of them so that it sticks? If you give a successful 20-minute presentation, they’re probably only going to remember 1 or 2 things from that. The clarity of your communication is so important. Repeat, repeat, repeat, and just keep kind of driving home that central message that appeals to that stakeholder.
If they have an urgent problem and you’re offering the solution, you won’t have to offer it many times if you’re communicating clearly. But be ready to deliver.
- Start spreading the word in Slack channels
- Offer supportive peer training through virtual and live meetings to get your peers on board.
- Work through your manager to secure presentation time in recurring executive meetings. Highlight not only the performance of your solution, but the return on investment in a metric they care about to ensure your project goes into practice.
✨ Joe: You have multiple potential projects you could pursue. You and your stakeholders are really the experts in this space at your company. What is the range of outcomes that a project might manifest? That can be enormously useful in helping you decide if this is something you should explore more or go do now. Go into projects with a prior estimate of how successful it’s likely to be and the range of effect sizes you might reasonably expect. If everything you do has a low prior probability of success, you’re just generating noise. The Bayesian idea of an expected loss can also be really helpful in supporting rational decision-making in an uncertain future rather than always jumping to the most extreme worst-case scenario, which is not an optimal way to make decisions.
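The expected-loss idea Joe mentions can be illustrated with a toy calculation. All probabilities and dollar values below are made up for the example; the point is that weighting each outcome by its probability gives a single expected figure to compare against the project’s cost, instead of anchoring on the worst-case scenario.

```python
# Hypothetical outcome distribution for a candidate project.
# Probabilities and dollar values are illustrative only.
outcomes = {
    "big_win":   {"prob": 0.15, "value": 500_000},
    "small_win": {"prob": 0.35, "value": 50_000},
    "no_effect": {"prob": 0.40, "value": 0},
    "backfire":  {"prob": 0.10, "value": -100_000},
}

cost = 60_000  # assumed project cost, also hypothetical

# Probability-weighted value across all outcomes, not just the worst case
expected_value = sum(o["prob"] * o["value"] for o in outcomes.values())
expected_net = expected_value - cost

print(f"Expected value: {expected_value:,.0f}")
print(f"Net of cost:    {expected_net:,.0f}")
```

Under these made-up numbers the project has a positive expected net even though one outcome loses money, which is exactly the kind of comparison a worst-case-only view obscures.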
✨ Tareef: It’s really important to remember that the more things you take on simultaneously, the more your work in progress increases, the more you’re context switching, and the longer things will actually take. It’s so easy to get sucked in and say, “Oh, this is not going to take long, I’ll do this. I’ll do this.” Before you know it, you’re doing 17 things at the same time, and that just slows you down significantly.
✨ Gerard Sentveld, Director of Data Analytics Operational Risk Management at Prudential Financial, shared that at the beginning of the year, they agree upon an amount of time that you have to do research and projects that don’t necessarily have an immediate impact. Throughout the year, if you come across a new, really cool thing that you want to work on – the question immediately bounces back to you. Is this worth stopping the other research that you’re doing right now to switch to the new project? Can you wrap it up? You only have so much time.
✨ Joe stressed the importance of having a tracking system in place to show how the project materialized in real life. If something underdelivers, it’s important to remember there’s variability in the world and you can’t control everything. Your ROI projections should reflect a range to begin with, which reminds stakeholders that we don’t know the future with certainty. Transparently communicate the process behind projections and share end-of-project insights. There are even successful projects that fall short because you’re unaware of key factors and you learn from that. Leaders who invest in you should recognize when you leaned into a difficult problem and learned along the way. Be clear about what was overlooked and the correction put in place to pursue it next time.
✨ Derek outlined 4 key checkpoints for monitoring projects with potential under-delivery. In the healthcare industry, there are some deployments where you absolutely cannot miss a prediction. When establishing a project, if they suspect it will under-deliver on potential impact, they don’t move forward. During model building, if the model does not cross acceptable thresholds on measures like precision and recall, they will not move forward.
Early silent deployments have two types of monitoring – model and intervention – to ensure the project aligns with its intended impact. If they’re not seeing an impact, or something is going in the wrong direction, that is under-delivery and the project stops. They also carefully watch the intended intervention metrics – what you’re going to change in the hospital. Derek also highlighted the instances where you didn’t under-deliver but are starting to lose some ground, and the question of whether that’s something you did or something bigger within the system – like an exhausted healthcare system where readmission rates are simply up in general.
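The threshold gate Derek describes for model building might look something like this sketch. The precision and recall cutoffs are hypothetical placeholders, not St. Michael’s actual bar; in practice the acceptable values would be agreed with collaborators up front, e.g. on the intake form.

```python
def passes_gate(tp, fp, fn, min_precision=0.90, min_recall=0.80):
    """Check whether a candidate model crosses agreed thresholds.

    The cutoff values are hypothetical placeholders for illustration.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision >= min_precision and recall >= min_recall

# 90 true positives, 5 false positives, 20 false negatives:
# precision ~= 0.947, recall ~= 0.818 -> crosses both thresholds
print(passes_gate(tp=90, fp=5, fn=20))  # True
```

The same check can run periodically during a silent deployment, so a drop below threshold stops the rollout rather than surfacing after the intervention is live.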
✨ Derek: We don’t estimate ahead of time. It’s really hard to make a guess, since a lot of what we do is pretty variable, especially in the early stages of a project. We’re better at measuring later, once we’re really in the weeds; that would come from our product managers. The way we’ve set up our sprint cycles, we have a pretty good sense of what we’re doing and how much effort we’re putting into the respective projects.
✨ Joe: With regards to estimating your time, ask a good survey question. For instance, when surveying analysts: “How much time did you spend on these parts of your work in the last week? Is that a typical week?” It helps to anchor surveys to a specific time period to improve the accuracy of responses. Using calendar data for these kinds of projects is difficult because there are so many confounding factors. The reward-to-investment ratio should be high enough that it’s just not a hang-up: the benefit needs to be unquestionably worth the time investment. If it’s marginally close, it’s probably not something to pursue.
Joe added that he used to build furniture before working in tech, and estimating how long a new piece would take was hard. If he had built something similar before, he had to triple the estimate. If he hadn’t built anything similar, he had to multiply by 9. In grad school, he found that mechanical engineers called this the rule of Pi – everything takes 3.14 times longer than you remember it taking. The ninefold part is all the stuff you didn’t think about: if you’re putting a model into production, you now need to train every relevant user, build enormous amounts of documentation, do the tracking, and so on.
Additional ROI resources mentioned
- Analytics Power Hour Podcast – Estimating the Effort for Analytics Projects: https://analyticshour.io/2023/10/31/231-estimating-the-effort-for-analytics-projects/
- CDO Matters Podcast with Malcolm Hawker: https://open.spotify.com/show/2OJXB8v32CrKbbsv9Uq2vn
- How to Measure Anything by Douglas Hubbard: https://hubbardresearch.com/7-simple-principles-for-measuring-anything/
If you want to continue conversations like this, I invite you to also join us at the Data Science Hangout every Thursday from 12-1 ET. Every week we’re joined by a different data leader from the community to share their perspectives and experience. You can add it to your calendar as well.