Modeling at posit::conf(2023): Talks and Workshops
For me, this was the standout modeling talk of the conference.
Cook County’s Nicole Jardine and Dan Snow describe the practical, technical, and political aspects of replacing an especially sub-optimal set of closed-source models related to a hot-button topic: TAXES.
Warning: You may never look at the Ames data the same way again.
Carl Vogel discusses A/B testing methodology from a practical point of view and explains the conundrum of:
“Why do we conduct an underpowered test and launch with an insignificant result?”
It’s a good discussion about how analysts view the A/B testing problem versus what the customers are focused on.
Ryan Timpe from LEGO talks about how they use tidymodels to make predictions for their customers.
Ryan describes the surprise benefit they discovered: the time saved by using tidymodels could go toward the data science aspects of their work and toward spending more time with their stakeholders.
Posit’s Julia Silge discusses model maintenance in terms of both software and statistics. She covers data and concept drift, which are important concepts for monitoring performance. Julia also shows tooling from the vetiver package, including monitoring dashboard templates.
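As a hedged sketch of the kind of monitoring vetiver supports: the code below computes rolling performance metrics over dated predictions so drift shows up as degrading metrics. The data frame and column names are illustrative assumptions, not from the talk.

```r
library(vetiver)
library(yardstick)

# Illustrative monitoring data: dated predictions alongside observed values
set.seed(42)
new_results <- data.frame(
  date  = as.Date("2023-09-01") + 0:29,
  truth = rnorm(30, mean = 100, sd = 10)
)
new_results$estimate <- new_results$truth + rnorm(30, sd = 5)

# Compute weekly regression metrics to watch for performance drift
metrics_df <- vetiver_compute_metrics(
  new_results,
  date_var   = date,
  period     = "week",
  truth      = truth,
  estimate   = estimate,
  metric_set = metric_set(rmse, mae)
)

# Visualize the metrics over time (the dashboard templates build on this idea)
vetiver_plot_metrics(metrics_df)
```

In practice, each new batch of metrics would be pinned to a board with `vetiver_pin_metrics()` so the dashboard accumulates history.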
Yours truly introduces the field of “conformal inference,” which is a fancy term for methods that can compute prediction intervals. The talk shows three different techniques and how to use them in tidymodels for regression models.
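One of those techniques, split conformal inference, can be sketched with the probably package. This is a minimal illustration assuming the Ames housing data and a simple linear workflow; the predictors and split proportions are arbitrary choices, not from the talk.

```r
library(tidymodels)
library(probably)

# Illustrative split: training, calibration, and test sets
set.seed(123)
split     <- initial_split(modeldata::ames, prop = 0.8)
ames_train <- training(split)
holdout    <- testing(split)
cal_split  <- initial_split(holdout, prop = 0.5)
cal_data   <- training(cal_split)   # used to calibrate the intervals
test_data  <- testing(cal_split)

# Fit a simple regression workflow
wflow_fit <-
  workflow(Sale_Price ~ Gr_Liv_Area + Year_Built, linear_reg()) |>
  fit(data = ames_train)

# Split conformal inference: intervals calibrated on held-out data
conf_obj <- int_conformal_split(wflow_fit, cal_data)

# Prediction intervals (columns .pred_lower and .pred_upper)
predict(conf_obj, test_data, level = 0.90)
```

The appeal of the method is that the intervals come with a coverage guarantee without distributional assumptions about the model's errors.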
Chelsea Parlett-Pelleriti is a fantastic statistician and presenter. She shows how to use Shiny for Python to demonstrate important modeling techniques for teaching: visualizing classification boundaries, model fairness, model deployment, and other topics.
fast.ai’s Jeremy Howard talks about large language models at both high and low levels. GPT-4 is the focus, and he discusses fine-tuning, tokens, and other aspects of these models.
Brendan Broderick, a data science analyst at the Mayo Clinic, discusses aspects of deploying healthcare delivery models to production. He uses a respiratory care unit application to illustrate the important facets of building a successful predictive model. His team uses Git, targets, renv, plumber, and Posit Connect to manage the scripting workflows and overall process.
Posit’s Daniel Falbel gives a lightning talk describing changes to R’s torch implementations: the luz package for higher-level interfaces, two torch-based modeling packages (tabnet and brulee), hfhub, and tok. He also mentions the R torch book!
The pre-conference workshops are excellent resources to learn and develop skills. This year was no exception.
Unfortunately, Posit does not record the workshops, but most of the instructors made their materials public.
Malcolm Barrett and Travis Gerke introduce the intricacies of producing valid inferences and making counterfactual, causal estimates.
If you are like me, someone with no prior formal training on this topic, you will love these materials.
We were very lucky to have Rob Hyndman conduct his two-day workshop on time series forecasting. He has literally written the book (several times!) on time series analysis.
The tidymodels group conducted two one-day workshops for introductory and advanced topics. All of the slides and sources, for every time that we do them, can be found at
Julia Silge shows how to use the R and Python versions of vetiver for MLOps. She describes how to version, deploy, and monitor the models you have trained.
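The version-and-deploy workflow can be sketched in a few lines of R. This is a hedged illustration using a throwaway model and a temporary pins board; the model, board choice, and port are assumptions for the example, not from the workshop.

```r
library(vetiver)
library(pins)
library(parsnip)
library(plumber)

# Train a small illustrative model
cars_fit <- linear_reg() |> fit(mpg ~ wt + hp, data = mtcars)

# Version: wrap the fitted model with its metadata and pin it to a board
v     <- vetiver_model(cars_fit, "cars_mpg")
board <- board_temp()            # temporary board, just for illustration
vetiver_pin_write(board, v)

# Deploy: serve the versioned model as a Plumber prediction API
pr() |>
  vetiver_api(v) |>
  pr_run(port = 8080)            # starts a local API; blocks until stopped
```

In production, the board would typically be Posit Connect or cloud storage rather than `board_temp()`, and monitoring would follow with vetiver's metrics functions.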