Modeling at posit::conf(2023): Talks and Workshops

The 2023 Posit conference had a lot of great content for people interested in modeling and analyzing data. This post is a digest of the modeling talks and workshops.

Talks

Open Source Property Assessment: Tidymodels to Allocate $16B in Property Taxes

For me, this was the standout modeling talk of the conference.

Cook County’s Nicole Jardine and Dan Snow describe the practical, technical, and political aspects of replacing an especially sub-optimal set of closed-source models related to a hot-button topic: TAXES.

They used R and tidymodels to replace the problematic modeling project with a lightgbm model, and they created a completely open portal for the data, code, and predictors.

Warning: You may never look at the Ames data the same way again.

How Data Scientists Broke A/B Testing (And How We Can Fix It)

Carl Vogel discusses A/B testing methodology from a practical point of view and explains the conundrum of:

Why do we conduct an underpowered test and launch with an insignificant result?

It’s a good discussion about how analysts view the A/B testing problem versus what the customers are focused on.
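To make "underpowered" concrete, here is a minimal sketch of a power calculation for a two-arm conversion test. It is not taken from the talk: the function name `two_proportion_power`, the normal approximation, and the conversion rates in the usage lines are all illustrative assumptions.

```python
from statistics import NormalDist


def two_proportion_power(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-proportion z-test.

    Hypothetical helper for illustration; it uses the normal
    approximation with an unpooled standard error.
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two observed rates.
    se = (
        (p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
        / n_per_arm
    ) ** 0.5
    # Critical value for a two-sided test at level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True effect expressed in standard-error units.
    shift = abs(p_treatment - p_control) / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Made-up rates: a 10% baseline and an 11% treatment conversion rate.
small_test = two_proportion_power(0.10, 0.11, n_per_arm=1_000)
large_test = two_proportion_power(0.10, 0.11, n_per_arm=20_000)
print(f"power with 1,000 users per arm:  {small_test:.2f}")
print(f"power with 20,000 users per arm: {large_test:.2f}")
```

With the small sample, the test will usually fail to detect a one-point lift even when the lift is real, which is the situation the quoted conundrum describes.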

tidymodels: Adventures in Rewriting a Modeling Pipeline

Ryan Timpe from LEGO talks about how they use tidymodels to make predictions for their customers.

Ryan describes the surprise benefits they discovered: the time they saved by using tidymodels could go toward the data science aspects of their work and toward more time with their stakeholders.

Reliable Maintenance of Machine Learning Models

Posit’s Julia Silge discusses model maintenance in terms of both software and statistics. She covers data drift and concept drift, which are important concepts for monitoring performance. Julia also shows tooling from the vetiver package, including its monitoring dashboard templates.

Conformal Inference with Tidymodels

Yours truly introduces the field of “conformal inference,” which is a fancy term for methods that can compute prediction intervals. The talk shows three different techniques and how to use them in tidymodels for regression models.
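For readers new to the idea, here is a minimal sketch of one generic technique, split conformal inference, written in plain Python. It is not the tidymodels implementation and may not match the three techniques shown in the talk; the function name `split_conformal_interval` and the toy calibration data are assumptions made for illustration.

```python
import math


def split_conformal_interval(calib_actual, calib_predicted, new_prediction, alpha=0.1):
    """Return (lower, upper) bounds around a new point prediction.

    Hypothetical helper for illustration. The calibration data must be
    held out from model training, and coverage of about 1 - alpha relies
    on the calibration and new data being exchangeable.
    """
    # Absolute residuals on the calibration set act as conformity scores.
    scores = sorted(
        abs(actual - predicted)
        for actual, predicted in zip(calib_actual, calib_predicted)
    )
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: no finite interval.
        return (-math.inf, math.inf)
    half_width = scores[rank - 1]
    return (new_prediction - half_width, new_prediction + half_width)


# Toy calibration set: observed outcomes and the model's predictions for them.
observed = [1.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0, 8.5, 9.0]
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
lower, upper = split_conformal_interval(observed, predicted, new_prediction=10.0, alpha=0.2)
print(lower, upper)
```

The interval has the same width for every new prediction, which is the main limitation of this simplest variant.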

Using R with Databricks Connect

Posit’s Edgar Ruiz describes Spark Connect and Databricks Connect and how you can use these tools in R. These tools will enable users to use pyspark.ml easily.

Shiny for Python Machine Learning Apps with pandas, scikit-learn and TensorFlow

Chelsea Parlett-Pelleriti is a fantastic statistician and presenter. She shows how to use Shiny for Python to teach important modeling techniques, visualize classification boundaries, and explore model fairness, model deployment, and other topics.

A hacker’s guide to open source LLMs

fast.ai’s Jeremy Howard talks about large language models at both high and low levels. GPT-4 is the focus, and he discusses fine-tuning, tokens, and other aspects of these models.

Using R to develop production modeling workflows at Mayo Clinic

Brendan Broderick, a data science analyst at the Mayo Clinic, discusses aspects of bringing healthcare delivery models to production. He uses a respiratory care unit application to illustrate the important facets of building a successful predictive model. His team uses Git, targets, renv, plumber, and Posit Connect to manage the scripting workflows and overall process.

Large Language Models in RStudio

Dow’s James Wade describes his large language model journey. He illustrates how to use these tools inside the RStudio IDE via gptstudio as well as with the experimental gpttools package.

What’s New in the Torch Ecosystem

In this lightning talk, Posit’s Daniel Falbel describes changes to R’s torch ecosystem: the luz package for higher-level interfaces, two torch-based modeling packages (TabNet and brulee), hfhub, and tok. He also mentions the R torch book!