Modeling at posit::conf(2023): Talks and Workshops

2024-01-03

The 2023 Posit conference offered plenty of great content for people interested in modeling and analyzing data. This post is a digest of the modeling talks and workshops.

 

Talks

 

Open Source Property Assessment: Tidymodels to Allocate $16B in Property Taxes

 

For me, this was the standout modeling talk of the conference.

Cook County’s Nicole Jardine and Dan Snow describe the practical, technical, and political aspects of replacing an especially sub-optimal set of closed-source models related to a hot-button topic: TAXES.

They used R and tidymodels to replace the problematic modeling project with a lightgbm model, and they created a completely open portal for the data, code, and predictors.

Warning: You may never look at the Ames data the same way again.

 

How Data Scientists Broke A/B Testing (And How We Can Fix It)

 

Carl Vogel discusses A/B testing methodology from a practical point of view and explains the conundrum of:

“Why do we conduct an underpowered test and launch with an insignificant result?”

It’s a good discussion about how analysts view the A/B testing problem versus what the customers are focused on.
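To make "underpowered" concrete, here is a minimal sketch of a power calculation for a two-sided, two-proportion z-test using the normal approximation. It is not taken from the talk: the function name `two_proportion_power`, the conversion rates, and the sample sizes are all hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with unpooled variances. All inputs
    are illustrative assumptions, not values from the talk.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in observed conversion rates.
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treatment * (1 - p_treatment) / n_per_arm
    )
    effect = abs(p_treatment - p_control) / se
    # Probability that the test statistic falls beyond either critical value.
    return z.cdf(effect - z_crit) + z.cdf(-effect - z_crit)


# Hypothetical scenario: a lift from a 10% to an 11% conversion rate.
power_small = two_proportion_power(0.10, 0.11, n_per_arm=2_000)
power_large = two_proportion_power(0.10, 0.11, n_per_arm=20_000)
print(f"n = 2,000 per arm:  power = {power_small:.2f}")
print(f"n = 20,000 per arm: power = {power_large:.2f}")
```

Under these assumed numbers, the smaller test would usually fail to reach significance even when the lift is real, which is the situation the quoted question describes.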

 

tidymodels: Adventures in Rewriting a Modeling Pipeline

 

Ryan Timpe from LEGO talks about how they use tidymodels to make predictions for their customers.

Ryan describes the surprising benefits they discovered: the time they saved by using tidymodels could go toward the data science aspects of their work and toward their stakeholders.

 

Reliable Maintenance of Machine Learning Models

 

Posit’s Julia Silge discusses model maintenance in terms of both software and statistics. She covers data drift and concept drift, which are important concepts for monitoring performance. Julia also shows the monitoring tools in the vetiver package, including its dashboard templates.

 

Conformal Inference with Tidymodels

 

Yours truly introduces the field of “conformal inference,” which is a fancy term for methods that can compute prediction intervals. The talk shows three different techniques and how to use them in tidymodels for regression models.
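As a rough illustration of the idea, the sketch below implements split conformal inference, one common way to build such intervals, for a simple linear regression. It is written in plain Python with simulated data rather than with the tidymodels functions shown in the talk, and every name in it (`fit_line`, `conformal_quantile`, `simulate`) is hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return sorted(scores)[rank - 1]


def simulate(n):
    """Simulated data (an assumption for this sketch): y = 2 + 0.5 x + noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]
    return xs, ys


random.seed(2023)
x_train, y_train = simulate(200)  # used only to fit the model
x_cal, y_cal = simulate(200)      # held out to calibrate the interval width
x_new, y_new = simulate(1000)     # fresh data to check coverage

intercept, slope = fit_line(x_train, y_train)


def predict(x):
    return intercept + slope * x


# Conformity scores: absolute residuals on the calibration set.
scores = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
half_width = conformal_quantile(scores, alpha=0.10)

# The interval for a new point is predict(x) +/- half_width.
covered = sum(abs(y - predict(x)) <= half_width for x, y in zip(x_new, y_new))
coverage = covered / len(x_new)
print(f"interval half-width: {half_width:.2f}")
print(f"empirical coverage on new data: {coverage:.3f}")
```

The key design choice is that the calibration set is kept separate from the training set, so the residual quantile reflects out-of-sample error. With `alpha=0.10`, the empirical coverage should land near 90%.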

 

Using R with Databricks Connect

 

Posit’s Edgar Ruiz describes Spark Connect and Databricks Connect and how you can use these tools in R. These tools will enable users to use pyspark.ml easily.

 

Shiny for Python Machine Learning Apps with pandas, scikit-learn and TensorFlow

 

Chelsea Parlett-Pelleriti is a fantastic statistician and presenter. She shows how to use Shiny for Python to demonstrate important modeling techniques for teaching, visualize classification boundaries, explore model fairness, deploy models, and cover other topics.

 

A hacker’s guide to open source LLMs

 

fast.ai’s Jeremy Howard talks about large language models at both high and low levels. GPT-4 is the focus, and he discusses fine-tuning, tokens, and other aspects of these models.

 

Using R to develop production modeling workflows at Mayo Clinic

 

Brendan Broderick, a data science analyst at the Mayo Clinic, discusses aspects of bringing healthcare delivery models to production. He uses a respiratory care unit application to illustrate the important facets of building a successful predictive model. His team uses Git, targets, renv, plumber, and Posit Connect to manage the scripting workflows and overall process.

 

Large Language Models in RStudio

 

Dow’s James Wade describes his large language model journey. He illustrates how to use these tools inside the RStudio IDE via gptstudio as well as with the experimental gpttools package.

 

What’s New in the Torch Ecosystem

 

In his lightning talk, Posit’s Daniel Falbel describes changes to R’s torch ecosystem: the luz package for higher-level interfaces, two torch-based modeling packages (tabnet and brulee), hfhub, and tok. He also mentions the R torch book!

 

Workshops

 

The pre-conference workshops are excellent resources for learning and developing skills. This year was no exception.

Unfortunately, Posit does not record the workshops, but most of the instructors made their materials public.

 

Causal Inference with R

 

Malcolm Barrett and Travis Gerke introduce the intricacies of producing valid inferences and making counterfactual, causal estimates.

If you are like me, someone with no prior formal training on this topic, you will love these materials.

Source files

 

Tidy Time Series and Forecasting in R

 

We were very lucky to have Rob Hyndman conduct his two-day workshop on time series forecasting. He has literally written the book (several times!) on time series analysis.

Source files

 

tidymodels workshops

 

The tidymodels group conducted two one-day workshops for introductory and advanced topics. All of the slides and sources, for every time that we do them, can be found at workshops.tidymodels.org.

 

Deploy and Maintain Models with vetiver

 

Julia Silge shows how to use the vetiver packages for R and Python for MLOps. She describes how to version, deploy, and monitor the models you have trained.

Source files

 

Machine Learning and Deep Learning with Python

 

Sebastian Raschka’s workshop teaches how to create ML models using NumPy, Pandas, Matplotlib, scikit-learn, and PyTorch.

Source files