13 Apr 2023

Explicit design at the start of a project

Emily Riederer

Senior Manager - Customer Management Data Science & Analytics at Capital One
We were recently joined by Emily Riederer, Senior Manager - Customer Management Data Science & Analytics at Capital One. We discussed how a strong foundation in high-quality data infrastructure and reproducible tools sets the stage for innovation in modeling, causal inference, analytics, and much more.

Episode notes

Diving into a question asked at (42:38): What is your thought process for solving a problem that you don’t know how to solve immediately?


One thing that I think is a really undervalued part of that process is thinking about how you will know a good solution when you find one. Also, how would you know if there was a good solution staring you in the face and you already had it?


I think the more unstructured and complicated a problem is, the more deceptive it can be about what’s good, which can lead to one of two bad outcomes.


You find a good solution, but you don’t realize it’s good, so you keep going.

You spend a lot of time chasing after an outcome, and only then do you realize, I solved the problem I was trying to solve but it wasn’t the problem I wanted to solve.


Something I’ve really been experimenting with in my own work is having a lot more of an explicit design stage at the beginning of a project and thinking, how can you do a pilot?


If I’m trying to predict some target, can I take the true values of that target and plug them into the downstream problem I thought I was going to solve, and make sure that’s actually what I want to solve?


It’s almost like frontloading model evaluation, even with a fake solution, as the first step rather than the last step.
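
One way to picture that pilot is the minimal Python sketch below. Everything in it is a hypothetical illustration rather than anything shown in the hangout: the decision rule ("act on a customer when the score exceeds the cost of acting"), the function names, and the toy numbers are all assumptions. The idea is to use the true target values as a stand-in for a perfect model, run them through the downstream decision, and compare the result with a naive baseline before building any real model.

```python
def downstream_payoff(scores, true_values, cost_per_action):
    """Hypothetical downstream decision: act on every unit whose score
    exceeds the cost of acting, then add up the realized net value."""
    return sum(
        value - cost_per_action
        for score, value in zip(scores, true_values)
        if score > cost_per_action
    )


def pilot_with_oracle(true_values, cost_per_action):
    """Front-load evaluation by using the true target as a 'fake' perfect model."""
    # Oracle: the scores are the true values themselves.
    oracle = downstream_payoff(true_values, true_values, cost_per_action)
    # Naive baseline: the better of "act on everyone" and "act on no one".
    act_on_all = sum(value - cost_per_action for value in true_values)
    baseline = max(act_on_all, 0.0)
    # Headroom: the most any model of this target could add downstream.
    return {"oracle": oracle, "baseline": baseline, "headroom": oracle - baseline}


# Toy data (made up for illustration only).
true_values = [12.0, 3.0, 8.0, 1.0, 15.0, 4.0]
result = pilot_with_oracle(true_values, cost_per_action=5.0)
print(result)
```

If the headroom is small even with perfect predictions, that suggests the target, or the downstream problem it feeds, may not be the one worth solving.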


Then, I’ll tack on one other point.


I think the other aspect of that – going back to that level of abstraction – is figuring out how to take the context out of my problem to make it something more Googleable.

So I mean not thinking, “oh, in this experiment the random seeds were wrong, so I don’t have a control population. What do I do?”


Instead, backing that into more of a general question, “how do you sample a synthetic control from observational data?”, which is something you can Google and then find a ton of resources about.


I think it comes down to pushing myself on what I want, and then finding the right framing in which to ask for help.
