Focus on relationships first
On building procedures and standards for your data science team:
Focus on relationships first.
Inevitably, what you define as the best practice at the outset is not what you’re going to be doing in six months.
Not always: certain things will stay. Teaching your team to use Git branches and merge into main, for example, is going to be pretty stable.
But there are other more thorny questions that are really hard to build best practices around.
For example, in our team, we support a lot of different organizations with different data analyses. Do we put everything into one big GitHub repo? Do we make a repo for every organization for every project?
It’s always in development and constantly iterating, so being comfortable with that is a key part of developing best practices, because it’s never going to be a set-it-and-forget-it thing.
You’re going to be constantly building relationships and those best practices together.
We run R users office hours every two weeks, and I often don’t have an agenda or anything. First and foremost, it’s a great way to build community. But that’s also where people come to say, “I’m really struggling with this; how do you approach this?”
And now we’re starting to build those best practices as we build out the whole Posit onboarding infrastructure. Our hope is that we have internal packages. I think that means building those practices into vignettes, like Confluence documents (I did see that Quarto is supporting Confluence documentation, which is awesome).
Be comfortable with iterating and make sure that you’re always communicating with each other and building strong relationships.
On building a business case for data science tools:
We surveyed people about how much time they spend building a single slide deck.
Where are all the places that you have to go? On average, you spend, say, 10 hours building this slide deck.
Then we asked, “If all of these data points were populated automatically for you, how much time would be remaining?”
We used that and factored in people’s salaries and all of these things, and then came up with a rough estimate of the time-cost savings that building out this app would have.
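The estimate described above comes down to simple arithmetic, sketched here in Python. This is only an illustration of the kind of calculation, not the actual model used: the `DeckSurvey` fields, the `annual_savings` function, and every number except the "say, 10 hours" per deck are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DeckSurvey:
    """Survey-style inputs. All values passed in below are made-up placeholders."""
    hours_per_deck_now: float        # average hours to build one deck today
    hours_per_deck_automated: float  # hours remaining once data is auto-populated
    decks_per_person_per_year: int   # how many decks each person builds per year
    people: int                      # number of people building decks
    hourly_cost: float               # salary-derived cost of one hour of work


def annual_savings(s: DeckSurvey) -> tuple[float, float]:
    """Return (hours saved per year, cost saved per year)."""
    hours_saved_per_deck = s.hours_per_deck_now - s.hours_per_deck_automated
    hours_saved = hours_saved_per_deck * s.decks_per_person_per_year * s.people
    return hours_saved, hours_saved * s.hourly_cost


# Hypothetical inputs: 10 hours per deck today, 4 hours remaining after automation,
# 12 decks per person per year, 20 people, 50 currency units per hour.
survey = DeckSurvey(10, 4, 12, 20, 50.0)
hours, cost = annual_savings(survey)
print(f"{hours:,.0f} hours and {cost:,.0f} in cost saved per year")
```

Keeping the inputs in one small structure makes it easy to rerun the estimate as new survey responses come in.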
I don’t think anyone would have been able to envision the app I’ve built now if I had just started with that.
Building MVPs (minimum viable products), like little pieces of the pie, is important. I started off doing parameterized reporting with R Markdown, and people were like, “That’s amazing!”
But the problem is that people aren’t going to have much flexibility on the end-user side. So I’m like, “This is great. This is how we actually populate slide decks with data insights, but if you want the flexibility to change the branding, alter the wording, all of this stuff, that is our next product.”
Then you just kind of work your way up – where now I have a production-grade Shiny app that does all of this for you.
Build in little chunks that get you to your end result.
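To make the first step concrete, here is a minimal sketch of the idea behind parameterized reporting: one shared template, rendered once per set of parameters. The speaker used R Markdown; this is a rough Python analogue using the standard library's `string.Template`, and the template text, the `render_slide` function, and the sample organizations and scores are all hypothetical.

```python
from string import Template

# One template, many reports: only the parameters change between runs.
SLIDE_TEMPLATE = Template(
    "$org: quarterly summary\n"
    "Participants: $participants\n"
    "Average score: $avg_score\n"
)


def render_slide(org: str, scores: list[float]) -> str:
    """Fill the shared template with one organization's numbers."""
    avg = sum(scores) / len(scores)
    return SLIDE_TEMPLATE.substitute(
        org=org,
        participants=len(scores),
        avg_score=f"{avg:.1f}",
    )


# Hypothetical data for two hypothetical organizations.
data = {"Org A": [3.0, 4.0, 5.0], "Org B": [2.0, 3.0]}
reports = {org: render_slide(org, scores) for org, scores in data.items()}
print(reports["Org A"])
```

The wording and layout live in the template, so an end user cannot change them without editing code. That is the flexibility gap that the later, app-based product is meant to close.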
Featured in this episode
Natalie O'Shea is an analytics consultant at BetterUp, where she leverages R and Python to develop data products that drive organizational transformation by helping individuals live their lives with greater clarity, purpose, and passion. A data scientist by way of anthropology, she is passionate about developing human-centered data products that deliver tangible insights that inform decision-making and help drive more equitable outcomes. She is also part of the leadership team for Data Science by Design (DSxD), a creative community of data scientists dedicated to developing a more open, ethical, and inclusive future for the field.