Modeling at posit::conf(2023): Talks and Workshops


The 2023 Posit conference contained a lot of great content for people interested in modeling and analyzing data. This post provides a digest view of the modeling talks and workshops.




Open Source Property Assessment: Tidymodels to Allocate $16B in Property Taxes


For me, this was the standout modeling talk of the conference.

Cook County’s Nicole Jardine and Dan Snow describe the practical, technical, and political aspects of replacing an especially sub-optimal set of closed-source models related to a hot-button topic: TAXES.

They used R and tidymodels to replace the problematic modeling project and created a completely open portal for data, code, and predictors using a lightgbm model.

Warning: You may never look at the Ames data the same way again.


How Data Scientists Broke A/B Testing (And How We Can Fix It)


Carl Vogel discusses A/B testing methodology from a practical point of view and explains the conundrum of:

“Why do we conduct an underpowered test and launch with an insignificant result?”

It’s a good discussion about how analysts view the A/B testing problem versus what the customers are focused on.
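To make "underpowered" concrete, here is a minimal sketch of a power calculation for a two-sided, two-proportion z-test. It is not taken from the talk: the function name, the conversion rates, and the sample sizes are all made up for illustration, and the calculation uses a simple normal approximation with an unpooled standard error.

```python
import math
from statistics import NormalDist


def two_proportion_power(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Illustrative helper (hypothetical name). Uses the normal approximation
    with an unpooled standard error, so treat the result as a rough guide.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = math.sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treatment * (1 - p_treatment) / n_per_arm
    )
    # Standardized size of the true difference between the two arms.
    shift = abs(p_treatment - p_control) / se
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical scenario: a 10% baseline rate and a true lift to 11%.
small_test = two_proportion_power(0.10, 0.11, n_per_arm=2_000)
large_test = two_proportion_power(0.10, 0.11, n_per_arm=50_000)
print(f"power with  2,000 users per arm: {small_test:.2f}")
print(f"power with 50,000 users per arm: {large_test:.2f}")
```

With a small sample, the chance of detecting a real but modest lift can be low, which is one way a test ends up "insignificant" even when the change is worth launching.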


tidymodels: Adventures in Rewriting a Modeling Pipeline


Ryan Timpe from LEGO talks about how they use tidymodels to make predictions for their customers.

Ryan describes the surprise benefits they discovered: they could spend the time they saved using tidymodels on the data science aspects of their work and more time with their stakeholders.


Reliable Maintenance of Machine Learning Models


Posit’s Julia Silge discusses model maintenance in terms of both software and statistics. She covers data drift and concept drift, which are important concepts for monitoring performance. Julia also shows tooling from the vetiver package, including templates for monitoring dashboards.


Conformal Inference with Tidymodels


Yours truly introduces the field of “conformal inference,” which is a fancy term for methods that can compute prediction intervals. The talk shows three different techniques and how to use them in tidymodels for regression models.
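As a rough illustration of the idea, here is a minimal sketch of split conformal prediction, one common conformal technique, in plain Python. It is not the tidymodels implementation and is not necessarily one of the three methods from the talk. The simulated data, the simple linear model, and all function names are assumptions made for the example.

```python
import math
import random


def fit_line(xs, ys):
    """Least squares fit for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def split_conformal_halfwidth(residuals, alpha):
    """Conformal quantile of the absolute calibration residuals."""
    n = len(residuals)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest value.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(abs(r) for r in residuals)[k - 1]


# Simulated data (hypothetical): y = 2 + 0.5 * x + standard normal noise.
rng = random.Random(2023)
xs = [rng.uniform(0, 10) for _ in range(400)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Split once: the first half fits the model, the second half calibrates the interval.
train_x, train_y = xs[:200], ys[:200]
cal_x, cal_y = xs[200:], ys[200:]

b0, b1 = fit_line(train_x, train_y)
cal_resid = [y - (b0 + b1 * x) for x, y in zip(cal_x, cal_y)]
half_width = split_conformal_halfwidth(cal_resid, alpha=0.10)


def predict_interval(x):
    """Roughly 90% prediction interval for a new observation at x."""
    center = b0 + b1 * x
    return center - half_width, center + half_width


lower, upper = predict_interval(5.0)
print(f"90% prediction interval at x = 5: ({lower:.2f}, {upper:.2f})")
```

The appeal of this approach is that the interval width comes from held-out residuals rather than from distributional assumptions about the model, so the same recipe can wrap almost any regression model.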


Using R with Databricks Connect


Posit’s Edgar Ruiz describes Spark Connect and Databricks Connect and how you can use these tools in R. They should make it easier for R users to work with Spark and Databricks.


Shiny for Python Machine Learning Apps with pandas, scikit-learn and TensorFlow


Chelsea Parlett-Pelleriti is a fantastic statistician and presenter. She shows how to use Shiny for Python to teach important modeling techniques, visualize classification boundaries, and explore model fairness, model deployment, and other topics.


A hacker’s guide to open source LLMs


Jeremy Howard talks about large language models at both high and low levels. GPT-4 is the focus, and he discusses fine-tuning, tokens, and other aspects of these models.


Using R to develop production modeling workflows at Mayo Clinic


Brendan Broderick, a data science analyst at the Mayo Clinic, discusses aspects of bringing healthcare delivery models to production. He uses a respiratory care unit application to illustrate the important facets of building a successful predictive model. His team uses Git, targets, renv, plumber, and Posit Connect to manage the scripting workflows and overall process.


Large Language Models in RStudio


Dow’s James Wade describes his large language model journey. He illustrates how to use these tools inside the RStudio IDE via gptstudio as well as with the experimental gpttools package.


What’s New in the Torch Ecosystem


In his lightning talk, Posit’s Daniel Falbel describes changes to R’s torch ecosystem: the luz package for higher-level interfaces, two torch-based modeling packages (tabnet and brulee), hfhub, and tok. He also mentions the R torch book!




The pre-conference workshops are excellent resources to learn and develop skills. This year was no exception.

Unfortunately, Posit does not record the workshops, but most of the instructors made their materials public.


Causal Inference with R


Malcolm Barrett and Travis Gerke introduce the intricacies of producing valid inferences and making counterfactual, causal estimates.

If you are like me, someone with no prior formal training in this topic, you will love these materials.

Source files


Tidy Time Series and Forecasting in R


We were very lucky to have Rob Hyndman conduct his two-day workshop on time series forecasting. He has literally written the book (several times!) on time series analysis.

Source files


tidymodels workshops


The tidymodels group conducted two one-day workshops for introductory and advanced topics. All of the slides and sources, for every time that we do them, can be found at


Deploy and Maintain Models with vetiver


Julia Silge shows how to use the R and Python packages for vetiver for MLOps. She describes how to version, deploy, and monitor the models you have trained.

Source files


Machine Learning and Deep Learning with Python


Sebastian Raschka’s workshop teaches how to create ML models using NumPy, Pandas, Matplotlib, scikit-learn, and PyTorch.

Source files