Company, events, and community

Keynotes and Talks at posit::conf(2023)

Written by Posit Team
2023-06-06

The full list of talks for posit::conf(2023) is now available!

Create your ideal conference schedule by choosing from more than 100 captivating keynotes and talks. Explore the complete lineup in the convenient table below, crafted using gt. And for the most up-to-date information and schedule, be sure to visit our event portal.
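As a sketch of how a schedule table like this one can be assembled with gt (the data below is a hypothetical slice, not the actual schedule source):

```r
library(gt)
library(tibble)

# Hypothetical slice of the schedule
talks <- tribble(
  ~session,    ~speaker,                    ~when,
  "Keynote 1", "Elaine McVey & David Meza", "9/19/23 (9:00, 1h)",
  "Keynote 2", "Jeremy Howard",             "9/19/23 (10:30, 1h)"
)

talks |>
  gt() |>
  tab_header(title = "Talks for posit::conf(2023)") |>
  cols_label(session = "Session", speaker = "Speaker", when = "Time")
```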

Excited? So are we. Visit pos.it/conf to learn more and register today.


Talks for posit::conf(2023)

 



Keynote 1 - Elaine McVey & David Meza - From Data Confusion to Data Intelligence
9/19/23 (9:00, 1h)



From Data Confusion to Data Intelligence

Elaine McVey & David Meza, NASA
 

Abstract

Data science teams operate in a unique environment, much different from the IT or software development life cycle. Hope from executives for the impact of data science is extremely high! Understanding of how to make data science efforts successful is very low! This creates an interesting set of organizational challenges for data and analytics teams. These are particularly clear when data science is being introduced at new companies, but they play out at organizations of all sizes. So, how do we navigate this dynamic? We’ll share some strategies for success.



Keynote 2 - Jeremy Howard - A hacker's guide to open source LLMs
9/19/23 (10:30, 1h)



A hacker's guide to open source LLMs

Jeremy Howard, fast.ai
 

Abstract

Coming soon



Building effective data science teams
9/19/23 (13:00, 1h 20m)


I. Developing a prototyping competency in a statistical science organization

Daniel Woodie, Eli Lilly & Company
 

Abstract

The introduction of new tools, methods, and processes can be a struggle within a statistical science organization. Being deliberate and investing in the creation of a prototyping competency can help in accelerating progress. Prototyping allows organizations to quickly experiment with new ideas, reduce the risk of failure, identify potential issues early, and iterate until the desired outcome is achieved. I will highlight the key areas we have focused on accelerating, our framework for developing this competency, how we use Shiny, and the lessons we've learned along the way. Developing a prototyping competency is crucial for statistical science organizations that wish to stay competitive and innovative in today's rapidly changing landscape.

II. How to gracefully expand your R team

Liz Roten, Metropolitan Council
 

Abstract

R users don't always come in teams. Often, you may be the only useR on the block. But, one miraculous day, you get to welcome more R folks on your team. Suddenly, the little R system you created to suit your needs, like a custom R package, styling, and file organization, isn't just for you. Want to suddenly overhaul that one package you wrote two years ago? It probably won't work when your colleagues try to update it. Your new teammates are data.table fans, but you prefer the tidyverse. Do you need to refactor? Are style choices, like indentation, important when collaborating, or are you just being persnickety? In this talk, you will learn how to bring new teammates onboard and blend your respective styles without pulling your hair out.

III. Oops I’m a manager - On More Effective 1-on-1 Meetings

Andrew Holz, Posit
 

Abstract

As a team leader (accidental or not), it's easy to get caught up in the daily grind and overlook the importance of 1-on-1s. Bad idea. 1-on-1s are critical for building trust, providing feedback, and ensuring that everyone is on the same page. Keys to good 1-on-1s start with a small amount of prep and a running shared document of notes and takeaways. Another key is to rotate types of 1-on-1s. Possibilities include “heads down” sessions on recent work, “heads up” sessions looking further out, and career-focused sessions. After some tips on the right sort of questions and uncovering sneaky issues, I will also touch on effective feedback. I will share resources and hope to include Seussian visuals and a few poetic lines to help the key points stick.

IV. The Gonzalez Matrix for Data Projects

Patrick Tennant, Meadows Mental Health Policy Institute
 

Abstract

I’d like to tell the good people at conf about how we deal with the fact that my boss has too many good ideas for data science projects. Specifically, we use an adapted version of the Eisenhower Matrix that lays out our projects according to the effort required and the value they will produce. Given the functionally unlimited number of data science projects that a team could do, I think the audience could benefit from this example of how we keep our team focused on valuable work while also reducing the stress that comes from a never-ending list of projects.



Pharma
9/19/23 (13:00, 1h 20m)


I. Building a Community of Engaged R Users: Pain points, Tactics, Strategies, Successes

Natalia Andriychuk, Pfizer
 

Abstract

For decades pharmaceutical companies predominantly used SAS for data analysis, but the times and landscape are changing, and we see more colleagues learning and using R. At Pfizer, we have over a thousand colleagues globally in an R community on MS Teams. How do we engage with them all, celebrate their successes with R, and share best practices? We established the Pfizer R Center of Excellence in 2022 to focus our efforts on coalescing a large, growing community of colleagues, providing technical expertise, and offering best practice guidance. In my talk, I’m going to share techniques we used to build a supportive R community, tools to increase community engagement, and pain points/successes of defining and leading R strategy within an organization.

II. Open-source solutions to next-generation submissions, after 30 years of industry experience

Mike K Smith, Pfizer R&D UK Ltd
 

Abstract

The pharmaceutical industry is undergoing rapid change, driven by a desire from both industry and regulatory agencies to move to more interactive visualizations and web applications to review data and make decisions. These changes would have been unthinkable 30 years ago, when I started working at Pfizer. In this talk I’ll consider the drivers for these changes, how open-source tools can help achieve this and why collaboration across industry is vital to achieve this goal. I’ll contrast this with my experience of 30 years working in the pharma industry – when the R language had only just been released, when the internet was new and when submissions to agencies were printed out, loaded onto trucks and shipped to their doors.

III. The Need for Speed - AccelerateR R Adoption in GSK

Ben Arancibia, GSK
 

Abstract

How does a risk-averse Pharma Biostats organization with 600+ people switch from using proprietary software to using R and other open-source tools for delivering clinical trial submissions? First slowly, then all at once. GSK started the transition to using R for its clinical trial data analysis in 2020 and now uses R for our regulatory reviewed outputs. The AccelerateR Team, an agile pod of R experts and data scientists, rotates through GSK Biostats study teams, sitting side by side to answer questions and mentor during this transition. We will share our experience from AccelerateR and how other organizations can use our learnings to scale R from pilots to full enterprise adoption and contribute to open-source industry R packages.

IV. Succeed in the Life Sciences with R/Python and the Cloud

Colby T. Ford, Tuple, LLC / Amissa, Inc / University of North Carolina at Charlotte
 

Abstract

In the life sciences, whether it’s pharma, biotech, research, or another type of organization, we are unique in that we blend scientific knowledge with technical skills to extract insights out of large, complex datasets. In the cloud, we can architect solutions to help us scale, automate, and collaborate. Interestingly, the use of R and Python by bioinformatics and data science teams can be challenging in a cloud-first world where all the data is somewhere other than your laptop (like a data lake). In this talk, I will share best practices and lessons learned surrounding the use of R and Python by technical teams in the cloud. We’ll focus on the use of Posit Workbench and RStudio on various cloud services such as Azure ML and Databricks.



Quarto (1)
9/19/23 (13:00, 1h 20m)


I. Reproducible Manuscripts with Quarto

Mine Cetinkaya-Rundel, Posit
 

Abstract

In this talk, we present a new capability in Quarto that provides a straightforward and user-friendly approach to creating truly reproducible manuscripts that are publication-ready for submission to popular journals. This new feature, Quarto manuscripts, can produce a single bundle containing a standardized journal format, source documents, source computations, referenced resources, and execution information that can be ingested into journal review and production processes. We'll demo how Quarto manuscripts work, show how you can incorporate them into your current manuscript development process, and touch on pain points in your current workflow that Quarto manuscripts help alleviate.
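As a sketch, a Quarto manuscript project is driven by a `_quarto.yml` along these lines (the file names and the journal format extension are assumptions for illustration):

```yaml
project:
  type: manuscript

manuscript:
  article: index.qmd

format:
  html: default
  # A journal-specific PDF format would come from a Quarto
  # journal-format extension, assumed installed
  pdf: default
```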

II. From journalist to coder: Creating a web publication with Quarto

Brian Tarran, Royal Statistical Society
 

Abstract

In March 2022, I was tasked by the Royal Statistical Society with creating a new online publication: a data science website for data science professionals. I've been a print journalist for 20 years and have worked on websites in that time, but my coding ability began and ended with wrapping "<a href=" tags around text and images. That is, until I discovered Quarto. In this talk, I'll describe how I explored, learned and fell in love with the Quarto publishing system, and how I used it to build a website -- Real World Data Science -- from the ground-up. I'll also make the case for why journalists and others with an interest in science communication should discover what Quarto can do -- and what they can do with it!

III. What’s New in Quarto?

Charlotte Wickham, Posit
 

Abstract

It’s been over a year since Quarto 1.0, an open-source scientific and technical publishing system, was announced at rstudio::conf(2022). In this talk, I’ll highlight some of the improvements to Quarto since then. You'll learn about new formats, options, tools, and ways to supercharge your content. And, if you haven’t used Quarto yet, come to see some reasons to try it out.

IV. Dynamic Interactions: Empowering Educators and Researchers with Interactive Quarto Documents using webR

James Balamuta, University of Illinois Urbana-Champaign
 

Abstract

Traditional Quarto documents often lack interactivity, limiting the ability of students and researchers to fully explore and engage with the presented topic. In this talk, we propose a novel approach that utilizes webR, a WebAssembly-powered version of R, to seamlessly embed R code directly within the browser without the need for a server. We demonstrate how this approach can transform static Quarto documents into dynamic examples by leveraging webR's capabilities through standard Quarto code cells, enabling real-time execution of R code and dynamic display of results. Our approach empowers educators and researchers alike to harness the power of interactivity and reproducibility for enhanced learning and research experiences.
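As a sketch of what such a document can look like, the community quarto-webr extension (one way to embed webR, assumed installed here) turns specially marked cells into code the reader can edit and re-run in the browser:

````markdown
---
title: "Interactive example"
engine: knitr
filters:
  - webr
---

Readers can edit and re-run this cell directly in the browser:

```{webr-r}
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)
```
````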



Lightning talks
9/19/23 (13:00, 1h 18m)


I. dplyr 1.1.0 Features You Can’t Live Without

Davis Vaughan, Posit
 

Abstract

Did you enjoy my clickbait title? Did it work? Either way, welcome! The dplyr 1.1.0 release included a number of new features, such as:

- Per-operation grouping with `.by`
- An overhaul to joins, including new inequality and rolling joins
- New `consecutive_id()` and `case_match()` helpers
- Significant performance improvements in `arrange()`

Join me as we take a tour of this exciting dplyr update, and learn how to use these new features in your own work!
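A minimal sketch of two of these features, using made-up data:

```r
library(dplyr)

sales <- tibble(
  region = c("east", "east", "west", "west"),
  units  = c(10, 20, 5, 15)
)

# Per-operation grouping with `.by`: no group_by()/ungroup() pair needed
sales |>
  summarise(total = sum(units), .by = region)

# case_match(): a concise, vectorised switch on values
sales |>
  mutate(coast = case_match(region, "east" ~ "Atlantic", "west" ~ "Pacific"))
```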

II. What’s new in the Torch ecosystem

Daniel Falbel, Posit, PBC
 

Abstract

torch is an R port of PyTorch, a scientific computing library that enables fast and easy creation and training of deep learning models. In this talk, you will learn about the latest features and developments in torch, such as luz, a higher level interface that simplifies your model training code, and vetiver, a new integration that allows you to deploy your torch models with just a few lines of code. You will also see how torch works well with other R packages and tools to enhance your data science workflow. Whether you are new to torch or already an experienced user, this talk will show you how torch can help you tackle your data science challenges and inspire you to build your own models.

III. Plumber APIs in Production: How To Get Notified When Your API Fails

Daren Eiri, Director of Data Science
 

Abstract

You’ve deployed an API to production, but how do you know when non-zero exit codes occur? Our team has 10 APIs in production used in our organization. They are hosted in Posit Connect, but we also have APIs running in Docker containers deployed on Azure. As of today, Posit Connect is limited in its ability to notify when an API encounters an unexpected error. We’ve introduced {log4r} into our Plumber-based APIs and append our logs to syslog, where they are routinely captured by Azure Monitor and made available in the cloud. By setting up alerts, we now get notified by email -- and by text message for urgent issues -- about non-zero exit codes with descriptive error messages. With this solution, we have more visibility into the health of our APIs.
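One way the pieces can fit together (a sketch, not the team's actual code; the file name, port, and console appender are assumptions, with a syslog-backed appender substituted in production):

```r
library(plumber)
library(log4r)

# Console appender used here for illustration; the abstract describes
# appending to syslog, which Azure Monitor then picks up
app_logger <- logger(threshold = "INFO", appenders = console_appender())

# In the API entrypoint: log every unhandled error before responding
pr("api.R") |>
  pr_set_error(function(req, res, err) {
    error(app_logger, paste("Unhandled error:", conditionMessage(err)))
    res$status <- 500
    list(error = "Internal server error")
  }) |>
  pr_run(port = 8000)
```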

IV. the people of posit: bringing personality to R packages

JP Flores, University of North Carolina at Chapel Hill
 

Abstract

The R programming language offers versatility to perform statistical analyses, create publication-ready plots, and render high-quality reports and presentations. Despite having this environment of indispensable tools, it can be daunting for a beginner-level programmer to get started. Luckily, the posit community is one of a kind and values inclusivity, collaboration, and empathy. By putting a face to the R packages we use on a daily basis, we hope to make every programmer feel included and capable. We want to inspire attendees to create their own projects or packages, connect with others inside and outside of their field of expertise, and challenge themselves to learn something new, knowing the community is right there to support them.

V. CI/CD Pipelines - Oh, the Places You’ll Go!

Trevor Nederlof, Posit
 

Abstract

Data scientists are creating incredibly useful data products at an accelerating rate. These products are consumed by others who expect them to be accurate, reliable, and timely, promises that often go unfulfilled. In this talk we will explore how to use common CI/CD pipeline tools already within reach of attendees to automatically test and deploy their apps, APIs, and reports.

VI. Making Community Magic: why you should stop networking and start making friends

Libby Heeren, Freelance Data Scientist, Data Humans Podcast
 

Abstract

When we think about making connections, we think about networking. I’d like you to forget about networking and start thinking about making friends. I’ll share my perspective as a community builder and host of the Data Humans podcast on how and why to cultivate a community of practice for yourself and how to become a force multiplier who increases engagement. You’ll learn the benefits of making genuine human connections, the practical steps to making data friends, the power of vulnerability, and why we all benefit when we show up as our whole selves. If you’ve ever wondered how some people seem to know so many people while you feel lost, and how some communities seem to thrive so effortlessly while others stagnate, this talk is for you.

VII. Coding tools for Industry R&D – Development lessons from an Analytical lab

Camila Saez Cabezas, Dow, Inc.
 

Abstract

Are you considering or curious about developing code-based tools for scientists? Whether you are an experienced developer or a fellow Posit Academy graduate who might be stepping into this role for the first time, the aim of my story is to inspire you and help you navigate this process. While developing custom R functions, packages, and Shiny apps for diverse analytical capabilities and users in R&D, I learned why it’s important to collect certain information at the start before writing any tidying, analysis, visualization, and web application code. In this talk, I will share the essential technical questions that help me define and plan for success.

VIII. What I Wish I Knew Before Becoming a Data Scientist

Kaitlin Bustos, SharpestMinds
 

Abstract

In this conference talk I’m sharing my personal journey as a Data Scientist and the key lessons learned along the way. I’ll emphasize the importance of finding a positive community of like-minded allies, persevering through setbacks since success is not linear, and exploring by embracing the broad nature of the Data Science field. By sharing my experiences and acknowledging the challenges I've faced, I hope to give attendees a fresh perspective on what it takes to succeed in a Data Science career and inspire them to pursue their passions in the field. Overall, this talk aims to provide a glimpse into the reality of a Data Science career. Attendees will take away a sense of motivation and empowerment to find their own unique path to success.

IX. Quickly get your Quarto HTML theme in order

Greg Swinehart, Posit, PBC
 

Abstract

A 5-minute talk to discuss how I’ve used Quarto and Bootstrap variables to quickly make Shiny’s new website look as it should. The Quarto user I have in mind works at an organization with specific brand guidelines to follow. I’ll discuss how to set up your theme, show some key Quarto settings, and how Bootstrap’s Sass variables are your best friend.
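The basic setup looks something like this (theme name, file names, and brand values are hypothetical):

```yaml
# _quarto.yml
format:
  html:
    theme:
      - cosmo        # a Bootswatch base theme
      - brand.scss   # organization overrides layered on top
```

```scss
// brand.scss
/*-- scss:defaults --*/
$primary: #447099;                                 // brand accent color
$font-family-sans-serif: "Open Sans", sans-serif;  // brand typeface
$link-color: $primary;
```

Because Quarto's HTML output is built on Bootstrap, setting a handful of Sass variables in the `scss:defaults` layer cascades through buttons, links, navbars, and more without hand-written CSS.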

X. USGS R-package development: 10-year reflections

Laura DeCicco, USGS
 

Abstract

Ten years ago, the first set of git commits were submitted to a new R software package repository, “dataRetrieval”, with the goal of providing an easy way for R users to retrieve U.S. Geological Survey (USGS) water data. At that time, the perception within the USGS was that the use of R was exclusive to an elite group of “very serious scientists”. Fast forward, and we now find many newer USGS hires having a solid grasp of the language from the start, along with the use of R in a wide variety of applications. In this presentation, I’ll discuss my experiences maintaining the dataRetrieval package, how it’s shaped my career, how it has impacted USGS R usage, and why data providers should consider sponsoring their own R packages wrapping their data API services.

XI. Speeding Up Plots in R/Shiny

Ryszard Szymański, Appsilon
 

Abstract

Slow dashboards lead to a poor user experience and cause users to lose interest, or even become frustrated. A common culprit of this situation is a slowly rendering plot. During the talk, we will dive deeper into how plots are rendered in Shiny, identify common bottlenecks that can occur during the rendering process, and learn various techniques for improving the speed of plots in R/Shiny dashboards. These techniques will range from more efficient data processing to library-specific optimisations at the browser level.
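One widely used technique in this space (not necessarily the talk's approach) is plot caching with `renderCachedPlot()`, which renders each distinct plot once and reuses it across sessions:

```r
library(shiny)

ui <- fluidPage(
  selectInput("cyl", "Cylinders", sort(unique(mtcars$cyl))),
  plotOutput("scatter")
)

server <- function(input, output, session) {
  # Rendered once per distinct cache key, then served from the cache
  # instead of being redrawn for every user
  output$scatter <- renderCachedPlot({
    plot(mpg ~ wt, data = subset(mtcars, cyl == input$cyl))
  }, cacheKeyExpr = input$cyl)
}

shinyApp(ui, server)
```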

XII. Shiny Developer Secrets: Insights from over 1200 applicants and what you MUST know to shine ✨

Vedha Viyash, Appsilon
 

Abstract

Over 1200 candidates applied for the Shiny developer role at Appsilon in the last year, and I will be sharing some insights that we have gained from going through the qualitative and quantitative feedback collected from every round of the interview process. We want to share some of the interesting insights from this data that will help you focus on the things that will make you a better Shiny developer. One insight is that over 60% of the candidates who make it to the final round do not have a deep understanding of reactivity in Shiny. The first half of the talk will focus on the insights and learnings from the interview process, and the second half will focus on how to overcome some of the gaps mentioned in the first half.

XIII. Optimizing Layouts for Small-Multiples Viz

Matt Dzugan
 

Abstract

Using Small-Multiples (faceted graphs) is an effective way to compare patterns across many dimensions. In this talk, I'll walk you through some ways to lay out your individual facets according to the underlying data. For example, maybe each facet represents a city or a point on a 2D plane; we'll explore ways to organize facets in a grid that mimics the data itself, unlocking your ability to explore patterns in 4+ dimensions. Other solutions to this problem rely on manually curated lists that map common layouts to a grid, but in this talk we'll explore solutions that work on EVERYTHING. I'll show you how to incorporate this technique into your viz, and how I built the libraries, since there are some interesting Data Science concepts at play.



Bridging the gap between data scientists and decision makers
9/19/23 (14:40, 1h 20m)


I. From Concept to Impact: Building and Launching Shiny Apps in the Workplace

Tiger Tang, CARFAX
 

Abstract

Do you find Shiny apps powerful but wonder what use cases might fit your organization? Have you had a Shiny app idea and gotten stuck on where to start? Have you ever built a Shiny app but received much less traction than expected? If you are experiencing any of these like I did, you are at the right talk. At CARFAX, we began utilizing Shiny apps for internal use a couple of years ago. The Shiny apps I implemented have handled over 160,000 internal requests. In this talk, I will introduce an implementation mindmap for Shiny that may help you find a use case, make it robust, and better connect with your users at your workplace.

II. Building a flexible, scalable self-serve reporting system with Shiny

Natalie O'Shea, BetterUp
 

Abstract

Working in the high-touch world of consulting, our team needed to develop a reporting system that was flexible enough to be tailored to the specific needs of any given partner while still reducing the highly manual nature of populating slide decks with various metrics and data visualizations. In this talk I will share my approach to flexible, scalable self-serve reporting using the rhino framework for Shiny and the reticulate, R6, and gargoyle packages. I hope attendees will walk away with new ideas about how to combine various frameworks and design strategies to get the best of both parameterized reporting and customizable Google Slide decks.

III. How Data Scientists Broke A/B Testing (And How We Can Fix It)

Carl Vogel, Babylist
 

Abstract

As data scientists, we care about making valid statistical inferences from experiments. And we've adapted well-established and well-understood statistical methods to help us do so in our A/B tests. Our stakeholders, though, care about making good product decisions efficiently. I'll describe how the way we design A/B tests can put these goals in tension and why that often causes misalignment between how A/B tests are intended to be used and how they are actually used. I'll also talk about how I've used R to implement alternative experimental approaches—including non-inferiority designs and value-of-information analyses—that have helped bridge the gap between data scientists and stakeholders.

IV. How to Win Friends and Influence People (with Data)

Joe Powers
 

Abstract

Too many great data science products never go into production. To persuade leaders and colleagues to adopt your data science offering, you must translate your insights into terms that are relevant and accessible to them. Attempts to persuade these audiences with proofs and model performance stats will often fall flat because the audience is left feeling overwhelmed. This talk will demonstrate the data simulation, visualization, and storytelling techniques that I use to influence leadership, and the community-building techniques I use to earn the trust and support of fellow analysts. These efforts were successful in persuading Intuit to adopt advanced analytic methods like sequential analysis, which cut the duration of our A/B tests by over 60%.



Tidy up your models
9/19/23 (14:40, 1h 20m)


I. tidymodels: Adventures in rewriting a modeling pipeline

Ryan Timpe, The LEGO Group
 

Abstract

Data science sure has changed over the past few years! Everyone’s talking about production. RStudio is now Posit. Models are now tidy. This talk is about embracing that change and updating existing models using the tidymodels framework. I recently completed this change, letting go of our in-production code and rebuilding it with tidymodels. My team ended up with a faster, more scalable pipeline that enabled us to better automate our workflow and increase our scale while improving our stakeholders’ experiences. I’ll share tips & tricks for adopting the tidymodels framework in existing products, best practices for learning and upskilling teams, and advice for using tidymodels packages to build more accessible data science tools.

II. Reliable maintenance of machine learning models

Julia Silge, Posit PBC
 

Abstract

Maintaining machine learning models in production can be quite different from maintaining general software engineering projects, each with different challenges and common failure modes. In this talk, learn about model drift, the different ways the word “performance” is used with models, what you can monitor about a model, how feedback loops impact models, and how you can use vetiver to set yourself up for success with model maintenance. This talk will help practitioners who are already deploying models, but this is also useful knowledge for practitioners earlier in their MLOps journey because decisions made along the way can make the difference between resilient models that are easier to maintain and disappointing or misleading models.

III. Tracking ML Experiments with Guild AI

Tomasz Kalinowski, Posit Software, PBC
 

Abstract

Machine learning is an empirical practice, and keeping track of what worked, and what didn't, is crucial for progress. This talk introduces Guild AI, an experiment tracker tailored for ML. The presentation will provide a high-level overview of Guild AI's key features, benefits, and real-world applications, demonstrating how it can simplify your ML projects and help you maximize project outcomes. Topics covered:

- Running and viewing experiment runs
- Hyperparameter optimization
- Model comparison tools
- Using Guild AI with popular packages like keras, tidymodels, and torch
- Using Guild AI with custom frameworks/models/metrics
- Annotating runs with tags, labels, and comments
- Sharing and publishing run results

IV. Conformal Inference with tidymodels

Max Kuhn, Posit
 

Abstract

Conformal inference theory enables any model to produce probabilistic predictions, such as prediction intervals. We'll demonstrate how these analytical methods can be used with tidymodels. Simulations will show that the results have good coverage (i.e., a 90% interval should include the real point 90% of the time).
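The coverage idea can be illustrated with a toy split-conformal computation in base R (a sketch of the concept, not the tidymodels API):

```r
set.seed(1)
n <- 2000
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.3)
d <- data.frame(x, y)

# Fit on one split, calibrate on a second, evaluate on a third
fit  <- lm(y ~ x, data = d[1:1000, ])
cal  <- d[1001:1500, ]
test <- d[1501:2000, ]

# 90% conformal half-width = 90th percentile of calibration residuals
q <- quantile(abs(cal$y - predict(fit, cal)), probs = 0.9)

# Empirical coverage of [pred - q, pred + q] on held-out data
pred <- predict(fit, test)
mean(test$y >= pred - q & test$y <= pred + q)  # should be near 0.90
```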



Managing packages
9/19/23 (14:40, 1h 20m)


I. {slushy}: A Bridge to the Future

Becca Krouse, GSK
 

Abstract

Scaling the use of R can present complications for environment management, especially in regulated industries with a focus on traceability. One solution is controlled (aka “frozen”) environments, which are carefully curated and tested by tech teams. However, the speed of R development means the environments quickly become outdated and users are unable to benefit from the latest advances. Enter {slushy}: a team-friendly tool powered by {renv} and Posit Package Manager. Users can quickly mimic a controlled environment, with the easy ability to time travel between snapshot dates. Attendees will learn how {slushy} bolstered our R adoption efforts, and how this strategy enables tech teams and users to work in parallel towards a common future.

II. How I Learned to Stop Worrying and Love Public Packages

Joe Roberts, Posit
 

Abstract

The popularity of R and Python for Data Science is in no small part attributable to the vast collection of extension packages available for everything from common tasks like data cleaning to highly-specialized domain-specific functions. However, with that ease of sharing packages comes a larger target for bad actors trying to exploit them. We'll explore these security risks along with approaches you can take to mitigate them using Posit Package Manager.

III. CRAN-ial Expansion: Taking Your R Package Development to New Frontiers with R-Universe

Athanasia Monika Mowinckel, LCBC, University of Oslo, Norway
 

Abstract

Say goodbye to installation headaches and hello to a universe of possibilities with R-Universe! Take your R package development to new frontiers by organising and sharing packages beyond the bounds of CRAN. R-Universe's reliable package building process strengthens installation and usage instructions, resulting in fewer support requests and an easy installation experience for users. With webpages and an API for exploring packages, R-Universe creates a streamlined and tidy ecosystem for R-package constellations. Also, you can build a custom toolchain for your users, relieving your workload and empowering users to help themselves. Join me to learn how to explore the vastness of R-Universe and expand your package development possibilities!

IV. Package Management for Data Scientists

Tyler Finethy, Posit, PBC
 

Abstract

As a former data scientist, I remember the problems I faced when I tried to install scikit-learn for the first time. Encountering an issue early on like this brought the learning process to a halt and made it extremely difficult to progress. Even seasoned professionals struggle with these errors, which can be esoteric and hard to comprehend. In my talk, "Package Management for Data Scientists," I will delve into the workings of software dependencies for R and Python, the difficulties involved in making package installations run smoothly, and what can be done to improve accessibility for newcomers to the field of data science.



The future is Shiny
9/19/23 (14:40, 1h 20m)


I. Making a (Python) Web App is easy!

Marcos Huerta, Carmax
 

Abstract

Creating and deploying an interactive web application that demonstrates the capabilities of software you have written, or datasets you have found, is easier than ever. I plan to talk about several (mostly Python) web application frameworks, how I have used them, their strengths and weaknesses, and how data scientists, analysts, and others can easily turn a class, function, or dataset visualization they have already written into an interactive web page to share with the world. Talks often focus on one specific tool, but I plan to discuss Dash, Streamlit, and Shiny (for R and Python), how I have used these, and how I have taken a journey from Shiny for R, to Dash, to Streamlit, and back to Shiny for Python.

II. Shiny for Python Machine Learning Apps with pandas, scikit-learn and TensorFlow

Chelsea Parlett-Pelleriti, Chapman University
 

Abstract

With the introduction of Shiny for Python in 2022, users can now use the power of reactivity with their favorite Python packages. Shiny can be used to build interactive reports, dashboards, and web apps that make sharing insights and results both simple and dynamic. This includes apps to display and explore popular machine learning models built with staple Python packages like pandas, scikit-learn, and TensorFlow. This talk will demonstrate how to build simple Shiny for Python apps that interface with these packages and discuss some of the benefits of using Shiny for Python to build your web apps.

III. Shiny new tools for scaling your Shiny apps

Joe Kirincic, Medical Mutual of Ohio
 

Abstract

So you have a Shiny app your org loves, but as adoption grows, performance starts getting sluggish. Profiling reveals your cool interactive plots are the culprit. What can you do to make things snappy again? We can increase the number of app instances, sure, but suppose that isn't an option for us. Another approach is to shift the plotting work from the server onto the client. In this talk, we'll learn how to leverage two JavaScript projects, DuckDB-WASM and Observable's Plot.js, in our Shiny app to create fast, flexible interactive visualizations in the browser without burdening our app's server function. The end result is an app that can scale to more users without needing to increase compute resources.

IV. 10 solutions for Shiny in Production

Andrew Patterson, Infrastructure Lead @ Jumping Rivers
 

Abstract

We’re often helping developers assess, fix, and improve their Shiny apps, and often the first thing we do is see if we can deploy the app. If you can’t deploy your Shiny app, it's a waste of time. If you can deploy it successfully, then at the very least it runs, so we’ve got something to work with. There are a bunch of reasons why apps fail to deploy. They can be easy to fix, like hardcoded secrets, fonts, or missing libraries. Or they can be intractable and super frustrating to deal with, like manifest mismatches, resource starvation, and missing libraries. At the end of this talk I want you to know how to identify, investigate, and proactively prevent Shiny app deployment failures. See you there!



Databases for data science with duckdb and dbt
9/19/23 (16:20, 1h 20m)

I. dbtplyr: Bringing column-name contracts from R to dbt

Emily Riederer, Capital One
 

Abstract

dplyr’s select helpers exemplify how the tidyverse uses opinionated design to push users into the pit of success. The ability to efficiently operate on names incentivizes good naming patterns and creates efficiency in data wrangling and validation. However, in a polyglot world, users may find they must leave the pit when comparable syntactic sugar is not accessible in other languages like Python and SQL. In this talk, I will explain how dplyr’s select helpers inspired my approach to ‘column name contracts’, how good naming systems can help supercharge data management with packages like {dplyr} and {pointblank}, and my experience building the {dbtplyr} package to port this functionality to dbt for building complex SQL-based data pipelines.
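As a rough illustration of the column-name-contract idea in a Python setting (the prefix scheme below is a hypothetical example, not from {dbtplyr}): once names follow a contract, name-driven selection becomes trivial, much like dplyr's starts_with() helper.

```python
import pandas as pd

# Columns follow a naming contract: ID_* for identifiers, N_* for counts.
df = pd.DataFrame({
    "ID_patient": [1, 2],
    "DT_visit": ["2023-01-01", "2023-02-01"],
    "N_visits": [3, 5],
})

# Regex-based selection leans on the contract instead of hardcoded lists.
ids = df.filter(regex="^ID_")
counts = df.filter(regex="^N_")
```

The same contract can then drive validation: any column whose prefix promises a non-negative count can be checked mechanically.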

II. In-Process Analytical Data Management with DuckDB

Hannes Mühleisen, DuckDB Labs, Centrum Wiskunde & Informatica, Radboud Universiteit Nijmegen
 

Abstract

DuckDB is an in-process analytical data management system. DuckDB supports complex SQL queries, has no external dependencies, and is deeply integrated into the R ecosystem. For example, DuckDB can run SQL queries directly on R data frames without any data transfer. DuckDB is Free and Open Source software under the MIT license. DuckDB uses state-of-the-art query processing techniques like vectorized execution and automatic parallelism. DuckDB is out-of-core capable, meaning that it is possible to process datasets far bigger than main memory. In our talk, we will describe the value DuckDB offers its users and how it can improve their day-to-day work through automatic parallelization, efficient operators, and out-of-core operations.

III. duckplyr: tight integration of duckdb with R and the tidyverse

Kirill Müller, cynkra GmbH
 

Abstract

duckdb is an analytical database system that works great with R, Python, and other host systems. dplyr is the grammar of data manipulation in the tidyverse, tightly integrated with R, but it works best for small or medium-sized data. duckdb, by contrast, has been designed with large or big data in mind, but currently you need to formulate your queries in SQL. The new duckplyr package offers the best of both worlds. It transforms a dplyr pipe into a query object that duckdb can execute, using an optimized query plan. It improves on dbplyr because the interface is "data frames in, data frames out", and no SQL code is generated. The talk presents first results, a bit of the mechanics, and an outlook for this ambitious project.

IV. Siuba and duckdb: analyzing everything everywhere all at once

Michael Chow, Posit Software, PBC
 

Abstract

Every data analysis in Python starts with a big fork in the road: which DataFrame library should I use? The DataFrame Decision locks you into different methods, with subtly different behavior: different table methods (e.g. polars `.with_columns()` vs pandas `.assign()`) and different column methods (e.g. polars `.map_dict()` vs pandas `.map()`). In this talk, I’ll discuss how siuba (a dplyr port to Python) combines with duckdb (a crazy powerful SQL engine) to provide a unified, dplyr-like interface for analyzing a wide range of data sources – whether pandas and polars DataFrames, parquet files in a cloud bucket, or pins on Posit Connect. Finally, I’ll discuss recent experiments to more tightly integrate siuba and duckdb.



Compelling design for apps and reports
9/19/23 (16:20, 1h 20m)

I. Why Design is Worth The Time

Laura Gast, USO
 

Abstract

Have you ever submitted a report or app and wished you’d been given just a bit more time to clean up the look of it? Does it make your skin crawl to hear “the data speaks for itself, don’t waste your time making that deliverable ‘pretty’”? As a complement to the many sessions this week in which you’ll hear great methods for *how* to make your work more beautiful, in this talk we’ll walk through some of the scientific research that shows *why* taking the time to make design improvements is critical to communicating your point with data – for dashboards, reports, and even simple tables.

II. Adding a touch of glitr: Developing a package of themes on top of ggplot

Aaron Chafetz, US Agency for International Development
 

Abstract

How do you create brand cohesion across your large team when it comes to data viz? My colleague and I, inspired by the BBC’s bbplot, developed a package on top of ggplot2 to create a common look and feel for our team's products. This effort improved not just the cohesiveness of our work, but also its trustworthiness. By creating this package, we reduced the reliance on defaults and the time spent on each project customizing numerous graphic elements. More importantly, this package provided an easier on-ramp for new teammates to adopt R. We would like to share our journey developing a style guide within a federal agency, to guide and inspire other organizations that could benefit from developing their own branding package and guidance.

III. HTML & CSS for R Users

Albert Rapp, Ulm University
 

Abstract

It's easy to think that R users do not need HTML & CSS. After all, R is a language designed for data analysis, right? But the reality is that these web standards are everywhere, even in R. And many great tools like {ggtext}, {gt}, {shiny} and Quarto unlock their full potential when you know a little bit of HTML & CSS. In this talk, I will demonstrate specific examples where R users can benefit from HTML & CSS, show you good resources for learning, and share what has worked for me.

IV. Styling and templating quarto documents

Emil Hvitfeldt
 

Abstract

Quarto is a powerful engine for generating documents, slides, books, websites, and more. The default aesthetics look good, but there are times when you want and need to change how they look. This is that talk. Whether you want your slides to stand out from the crowd, or you need your documents to fit within your corporate style guide, being able to style Quarto documents is a valuable skill. Once you have persevered and created the perfect document, you don't want the effort to go to waste. This is where templating comes in. Quarto makes it super easy to turn a styled document into a template to be used over and over again.



Getting %$!@ done: productive workflows for data science
9/19/23 (16:20, 1h 20m)

I. What an early 2000s reality show taught me about file management

Reiko Okamoto, National Research Council Canada
 

Abstract

Ideas from home organization shows are surprisingly applicable to file management. Using a room divider to establish dedicated zones for different activities in a studio apartment is analogous to creating self-contained projects in RStudio. Likewise, swapping mismatched hangers with matching ones to tidy a closet resembles the adoption of a file naming convention to make a directory easier to navigate. In this talk, I will share good practices in file management through the lens of home organization. We all know that clutter, whether it is in our physical space or on our machine, destroys our ability to focus. These practices will help R users of all levels create a serene, relaxing environment where they feel inspired to work with data.

II. Getting the most out of Git

Colin Gillespie, Jumping Rivers
 

Abstract

Did you believe that Git would solve all of your data science worries? Instead, you’ve been plunged HEAD~1 first into merging (or is that rebasing?) chaos. Issues are ignored, branches are everywhere, main never works, and no one really knows who owns the repository. Don’t worry! There are ways to escape this pit of despair. Over the last few years, we’ve worked with many data science teams. During this time, we’ve spotted common patterns and also common pitfalls. While one size does not fit all, there are golden rules that should be followed. At the end of this talk, you’ll understand the processes other data science teams implement to make Git work for them.

III. Documenting things: openly for future us

Julia Stewart Lowndes, Openscapes and UC Santa Barbara
 

Abstract

RMarkdown and Quarto are shifting the paradigm for how professionals learn to code, write documentation, and teach others. I'll showcase the style I first experienced in Jenny Bryan's work (Stat545, Happy Git With R, What They Forgot): narrative and code together, shared openly as a website, which builds trust with learners and provides a resource to consult during and outside of live teaching. This mode shifts culture in a powerful way, since we can fork this idea and repeat it in our own work. I’ll share examples of how professional researchers are using this style to write open documentation, from technical workflows to community onboarding. I’ll also highlight design elements for writing open documentation, and tips for doing so with Quarto.

IV. How You Get Value as a 1-Person Connect Team

Sean Nguyen, S2G Ventures
 

Abstract

Sean, a sole Posit Connect developer, shares his experience in delivering business impact. He narrates his transition from crafting one-off reports to developing and deploying robust data science web applications using Python and R with Posit Connect. Despite its common association with large enterprise teams, Sean demonstrates how Posit Connect can be effectively utilized in smaller settings. He presents his work on creating and deploying end-to-end machine learning pipelines in Python, hosting them as APIs, and seamlessly integrating with Shiny apps via Posit Connect. This talk imparts practical strategies and techniques to foster user and executive adoption of Posit Connect within lean (and large) organizations.



Teaching data science
9/19/23 (16:20, 1h 20m)

I. Teaching Data Science in Adverse Circumstances: Posit Cloud and Quarto to the Rescue.

Aleksander Dietrichson, Universidad de San Martin
 

Abstract

The focus of this presentation is on the challenges faced by data science teachers whose students are not quantitatively inclined and who may also face adversity in terms of the technology resources available to them and potential language barriers. I identify three main areas of challenge and show how, at Universidad Nacional de San Martín (Argentina), we addressed each of them through a combination of curriculum redesign, production of course materials appropriate for the students in question, and the use of open source software, including some Posit products, i.e., posit.cloud and Quarto. I show how these technologies can be used as pedagogical tools to overcome the challenges mentioned, even on a shoestring budget.

II. You Can Lead a Horse to Water . . . Changing the Data Science Culture for Veterinary Scientists

Jill MacKay, University of Edinburgh
 

Abstract

In education research, knowledge recall is often considered the easiest aspect of learning, and creation of new knowledge the most challenging. Skill development therefore often requires learners to consistently ‘do’ the skill and receive feedback from experts to allow them to fully enter a new culture of practice. This is particularly challenging for those who are interdisciplinary and have limited control over their workload, such as medics and field scientists. In this talk, an educational scientist describes the previous 10 years of supporting veterinary scientists in adopting Open Science practices around data science. What worked, what failed miserably, and reflections on why it can be so hard to get a horse to drink.

III. Visualizing Data Analysis Pipelines with Pandas Tutor and Tidy Data Tutor

Sean Kross, Fred Hutchinson Cancer Center
 

Abstract

The data frame is a fundamental data structure for data scientists using Python and R. Pandas and the Tidyverse are designed around building pipelines for the transformation of data frames. However, within these pipelines it is not always clear how each operation is changing the underlying data frame. To explain each step in a pipeline, data science instructors resort to hand-drawing diagrams to illustrate the semantics of operations such as filtering, sorting, and grouping. In this talk I will introduce Pandas Tutor and Tidy Data Tutor, step-by-step visualization engines for data frame transformations. Both tools illustrate the row-, column-, and cell-wise relationships between an operation’s input and output data frames.

IV. R! You Going?!

SherAaron Hurt, The Carpentries
 

Abstract

Every day we are losing data scientists, not because people aren’t excited, but because they walk into the process, become stuck, and don’t have the confidence or the resources to move further along their journey. At The Carpentries we train novice scientists and researchers how to comfortably move from dipping their toe in the water to going deeper and obtaining all of the resources afforded to them. I'd like to tell my story as a data scientist and offer resources and tips to build confidence in those who are new to their journey. The tools are available; however, it is not always easy to find them. The participants must be willing and ready to keep going! I'm the lifeguard ready to help!



Keynote - Kara Woo - R Not Only In Production
9/20/23 (9:00, 1h)


R Not Only In Production

Kara Woo, Insight RX
 

Abstract

I will share what our team has learned from successfully integrating R in all areas of our company's operations. InsightRX is a precision medicine company whose goal is to ensure that each patient receives the right drug at the optimal dose. At InsightRX, R is a first-class language that's used for purposes ranging from customer-facing products to internal data infrastructure, new product prototypes, and regulatory reporting. Using R in this way has given us the opportunity to forge fruitful collaborations with other teams in which we can both learn and teach.



Keynote 4 - JD Long - It’s Abstractions All the Way Down …
9/20/23 (10:30, 1h)


It’s Abstractions All the Way Down …

JD Long, RenaissanceRe
 

Abstract

Over 20 years ago Joel Spolsky famously wrote, "All non-trivial abstractions, to some degree, are leaky." Unsurprisingly this has not changed. However, we have introduced more and more layers of abstraction into our workflows: Virtual Machines, AWS services, WASM, Docker, R, Python, data frames, and on and on. But then on top of the computational abstractions we have people abstractions: managers, colleagues, executives, stakeholders, etc. JD's presentation will be a wild romp through the mental models of abstractions and discuss how we, as technical analytical types, can gain skill in traversing abstractions and dealing with leaks.



Data science infrastructure for your org
9/20/23 (13:00, 1h 20m)

I. Data Science in Production: The way to a centralized infrastructure

Oliver Bracht, eoda GmbH
 

Abstract

In this talk, the success story of Covestro’s Posit infrastructure is presented. The problem for the leading German materials manufacturer was that no common development environment existed. With the help of eoda and Posit, a replicable, centralized development environment for R and Python was created. Although R and Python form the core of the infrastructure, multiple languages and tools are unified. In addition to enabling collaboration among Covestro's data science teams, compliance guidelines could also be better fulfilled. The staging architecture provides developers with a clear path for testing their products and taking them live. This project presents a best-practice approach to data science infrastructure, using Covestro as an example.

II. Work with the cloud, not just in the cloud

James Blair, Posit PBC
 

Abstract

More and more organizations are migrating their infrastructure to "the cloud." While this has clear benefits for IT administrators, individual users can sometimes wonder what the cloud has to offer them. This talk will present practical patterns for taking advantage of cloud computing for both R and Python developers. Specific examples will be given using Amazon SageMaker to demonstrate how the flexible compute options and increased resources offered by the cloud can be a significant upgrade compared to local development environments.

III. Connect and Kubernetes: A Candid Discussion about Scalability

Kelly O'Briant, Posit
 

Abstract

Running Connect with off-host content execution in Kubernetes is very cool and allows you to enable some powerful and sophisticated workflows. The question is, do you really need it? How do you evaluate and decide? Let’s have a candid conversation about whether Connect content execution on Kubernetes is right for you and your organization. Moving to Kubernetes will introduce complexity. What do you hope to get in return? If your goals begin and end with cost reduction and ease of scaling, there are potentially very good alternatives to consider. It would be great if scaling Connect horizontally were easier. We’ll talk about why auto scaling Connect (specifically scaling down) is a tricky problem and what we can do about it.

IV. Posit Workbench: A panacea for open source data science developers in R and Python

Tom Mock, Posit - Product Manager for Workbench
 

Abstract

Posit Workbench is a central platform designed to alleviate the challenges faced by open-source data science devs, such as limited computational resources, data access issues, and brittle local dev environments that don't match collaborators or production. Posit Workbench offers a scalable solution with a server-based or elastic infrastructure, controlled data access, and enterprise-grade auth. Posit Workbench reduces the burden on admins by providing centralized infrastructure and built-in management tools. It is an affordable and flexible alternative to proprietary platforms, freeing developers to continue working with the open-source packages and editors they know and love while meeting the requirements of enterprise-grade platforms.



Developing your skillset; building your career
9/20/23 (13:00, 1h 20m)

I. It’s All About Perspective: Making A Case For Generative Art

Meghan Santiago Harris, Prostate Cancer Clinical Trials Consortium - Memorial Sloan Kettering
 

Abstract

Because the field of data science is inherently task-oriented, it is no wonder that most people struggle to see the utility of generative art beyond the bounds of a casual hobby. This talk will invite participants to learn about generative art while focusing on "why" people should create it and its potential place in data science. This talk is suitable for all disciplines and artistic abilities. Furthermore, it will aim to expand the participant's perspective on generative art through the following concepts: what generative art is and how it can be created in R or Python; justifications for generative art within data science; and examples of programming skills that are transferable between generative art and pragmatic data science projects.

II. How the R for Data Science (R4DS) Online Learning Community made me a better student

Lydia Gibson, California State University East Bay
 

Abstract

Through my participation in the R4DS Online Learning Community, I have advanced my R and data science skills, making me a better student than I otherwise would have been through my studies alone. As a non-traditional MS Statistics student with an undergraduate background in economics, I had absolutely no experience with the R programming language prior to pursuing my Master’s degree. In July 2021, with hopes of getting a head start on learning R before beginning my degree program, I joined the R4DS Slack workspace. Along with helping to improve my programming skills, R4DS has connected me with scholarship, mentorship, and other opportunities, and I think it would be beneficial for other students to know about this great resource.

III. Solving a Secure Geocoding Problem (That Hardly Anybody Has)

Tesla DuBois, Fox Chase Cancer Center/Temple University
 

Abstract

I'm an aspiring geospatial data scientist working in cancer research and a late-blooming academic studying medical geography. The strictest health researchers won't send patient addresses to remote servers for geocoding due to data security concerns. The only existing methods for offline geocoding are expensive, cumbersome, or require working with code - all limiting factors for many researchers. So, a couple of classmates and I made a standalone desktop application using shell, docker, and Python to geocode addresses through a friendly GUI without ever sending them off the local machine. Hear how my R background played into the daunting, frustrating, but ultimately successful task of creating a tool using unfamiliar technologies.

IV. Small package, broad impact: How I discovered the ultimate new hire hack

Trang Le, Bristol Myers Squibb
 

Abstract

Onboarding new hires can be a challenging process, but taking a problem-focused approach can make it more meaningful and rewarding. In this talk, I will share how I discovered the ultimate new hire hack by creating two small packages that gave me the confidence I needed when I started at BMS. Through building these packages, I not only learned R things like using bslib and making font files available for published dashboards, but also gained a deep understanding of my company's internal systems and workflows, and connected with my team via lots of questions. The resulting packages are still heavily used today. Join me to discover how small packages can have a broad impact and what hiring managers can do to help.



R or Python? Why not both!
9/20/23 (13:00, 1h 20m)

I. Using R, Python and Cloud Infrastructure to Battle Aquatic Invasive Species

Nicholas Snellgrove, Epi-interactive
 

Abstract

Invasive species are a huge threat to lake ecosystems in Minnesota. With over 10,000 water bodies across the state, having up-to-date data and decision support is critical. Researchers at the University of Minnesota have created four complex R and Python models to support lake managers, all pulled together and presented with the most recent infestation data available. Come along with us to see how we connected these models in the AIS Explorer, a decision support application built in R Shiny to help with prioritising risks and placing watercraft inspectors, using tools like OCPU and cloud tooling like Lambda, EventBridge and AWS S3.

II. FOCAL Point: Utilizing Python, R, & Shiny to Capture, Process, And Visualize Motion

Alyssa Burritt, PING, Inc.
 

Abstract

One of the fastest movements in modern sports is a golf swing. Capturing this motion using a high-speed camera system creates many unique challenges in processing, analyzing, and visualizing the thousands of data points that are generated. These spatial coordinates can be quickly translated through Python scripts to well-known, industry specific performance metrics and graphics in Shiny. Down the line, R utilities aid more complicated analyses and optimizations, driving new product innovations. This talk will cover our company’s process of implementing these tools into our workflow and highlight key program features that have helped successfully combine these applications for users with a variety of technical backgrounds.

III. Combining R and Python for the Belgian Justice Department

Thomas Michem
 

Abstract

We present a strong case study of combining R and Python in a production environment. The justice department's back office monitors the smooth processing of all traffic fines in Belgium, gathering data from all police departments and checking it for anomalies. The back office does this monitoring through a Shiny application showing traffic signs that indicate the status of the whole operation. That status is built by Python scripts that perform anomaly detection, checking whether the number of fines is in line with what is expected daily, and the results of those checks are delivered to the front-end Shiny application through a Python Flask API.

IV. Validating and testing R dataframes with Pandera via reticulate: A case study in R-Python interoperability

Niels Bantilan, Union.ai
 

Abstract

Data science and machine learning practitioners work with data every day to analyze and model them for insights and predictions. A major component to any project is data quality, which is a process of cleaning, and protecting against flaws in data that may invalidate the analysis or model. Pandera is an open source data testing toolkit for dataframes in the Python ecosystem: but can it validate R dataframes? This talk is composed of three parts: first I’ll describe what data testing is and motivate why you need it. Then, I’ll introduce the iterative process of creating and refining dataframe schemas in Pandera. Finally, I’ll demonstrate how to use it in R with the reticulate package using a simple modeling exercise as an example.



Shiny user interfaces
9/20/23 (13:00, 1h 20m)

I. Towards the next generation of Shiny UI

Carson Sievert, Posit
 

Abstract

Shiny recently celebrated its 10th birthday, and since its birth has grown tremendously in many areas; however, a hello-world Shiny app still looks roughly like it did 10 years ago. Thanks to the bslib R package, it is now easy to give your Shiny apps a fresh look using a modern Bootstrap 5 foundation that just works with Shiny, R Markdown, and more. In addition to seamless upgrading of Bootstrap, bslib also provides modern UI components as well as layout and theming helpers. In this talk, I'll highlight the features that we're most excited about (e.g., expandable cards, sidebar layouts, value boxes, etc.), discuss best design practices for improving user experience, and present some real-world examples of these tools in action.

II. The Power of Prototyping in R Shiny: Saving Millions by Building the Right Tool

Maria Grycuk
 

Abstract

The development of software can be costly and time-consuming. If end users are not involved in the process from the start, the tool we build may not meet their needs. In this presentation, I will discuss how prototyping in R Shiny can help you build the right tool and save you from spending millions of dollars on a tool no one will use. I will explore the advantages of using R Shiny for prototyping, particularly its ability to rapidly build interactive applications. I will also discuss how to design effective prototypes, including techniques for gathering user feedback and using that feedback to refine your tool. I will emphasize the importance of presenting real-life data, particularly when building data-driven tools.

III. ShinyUiEditor: From Alpha to Powerful Shiny App Development Tool

Nick Strayer, Posit
 

Abstract

Since its alpha debut at last year's conference, the ShinyUiEditor has experienced continuous development, evolving into a powerful tool for crafting Shiny app UIs. Some key enhancements include the integration of new bslib components and the editor's ability to create or navigate to existing server bindings for inputs and outputs. In addition to new features, the editor is now available as a VSCode extension enabling it to integrate smoothly into more developer’s workflows. This talk will showcase how these new capabilities empower users to efficiently create visually appealing and production-ready applications with ease.

IV. How to help developers make apps users love

Michał Parkoła, Appsilon
 

Abstract

There are many resources that can help you design better apps. But what if your org creates many apps? Scaling good design to larger groups dials the challenge up to 11. In this talk I will share how we approach the problem at Appsilon. I will present our in-house Design Guide. I will share the successes and failures we've had building it and helping a wide variety of developers use it. I will then share some tips about what you might want to consider if you want to help your org build better apps at scale.



It takes a village: building and sustaining communities
9/20/23 (14:40, 1h 20m)

I. Sustainable growth of global communities: RLadies’ next ten years

Riva Quiroga
 

Abstract

R-Ladies' first ten years were about growing the community: from being just one chapter in 2012, to becoming a global organization in 2016, and now fostering more than 200 chapters worldwide. But how can we face the challenges of growing an organization based solely on volunteer work? In this talk, we discuss how we are planning the development of R-Ladies for the next ten years, focusing on the sustainable management of an ever-growing community. To that end, we will present our most recent efforts at documenting workflows, making data-driven decisions, and automating tasks that allow volunteers to focus their time where it is most needed.

II. How to Keep Your Data Science Meetup Sustainable

Ted Laderas, DNAnexus, Inc.
 

Abstract

I know that many data science meetup organizers struggle with burnout. It can be daunting to plan a meetup schedule, especially with the added burden of work and life. I want to highlight some activities and strategies for keeping your data science meetup sustainable. Specifically, I want to highlight some successful ones we've done, like a data scavenger hunt, watching videos together, styling plots, and sharing useful tidyverse functions. By making it easy for your members to contribute and empowering them, it takes a lot of the burden off you as an organizer. You don't need to reinvent the wheel for meetups, or have famous guests for every one. Let's start the conversation and make your meetup last.

III. Side Effects of a Year of Blogging

Millie Symns
 

Abstract

A big part of being in the R community is sharing your knowledge in different forums, no matter how big or small. So what better way to do that than a blog? And what better way than using R and RStudio to build and maintain that blog and website? This was the route I took to challenge myself to put myself out there more in the community, find my voice, share my knowledge, and learn new things. In this talk, I will reflect on the lessons learned and the gains made over the past year of blogging and sharing my website for all to see. The side effects include professional and personal benefits – from a clearer profile of my skills to the development of my art. You may leave inspired to try the challenge for yourself.

IV. Black hair and data science have more in common than you think

Kari L. Jordan, The Carpentries
 

Abstract

Data Science is often difficult to define because of its many intersections, including statistics, programming, analytics, and other domain knowledge. Would you believe it if you were told Black hair and data science have more in common than you think? This talk is for anyone considering learning about, studying, or pursuing data science. In it, Dr. Kari L. Jordan draws parallels between approaches to caring for Black hair and approaches to learning data science. We start with the roots and end by picking the right tools and products to maintain our coiffure.



Package development
9/20/23 (14:40, 1h 20m)

I. It’s a great time to be an R package developer!

Jenny Bryan, Posit
 

Abstract

In R, the fundamental unit of shareable code is the package. As of March 2023, there were over 19,000 packages available on CRAN. Hadley Wickham and I recently updated the R Packages book for a second edition, which brought home just how much the package development landscape has changed in recent years (for the better!). In this talk, I highlight recent-ish developments that I think have a great payoff for package maintainers. I'll talk about the impact of new services like GitHub Actions, new tools like pkgdown, and emerging shared practices, such as principles that are helpful when testing a package.

II. Becoming an R package author (or how I got rich* responding to GitHub issues)

Matt Herman, Council of State Governments Justice Center
 

Abstract

The transition from analyzing data in R to making packages in R can feel like a big step. Writing code to clean data or make visualizations seems categorically different from building robust “software” on which other people rely. In this talk, I’ll show why that distinction is not necessarily true by discussing my personal experience from learning R in graduate school to reporting bugs on GitHub to becoming a co-author of the tidycensus package and a practicing data scientist. The positive and supportive R community on GitHub, Twitter, and elsewhere contribute to why anyone who writes R code can become a package author. * I have not actually gotten rich but I did get freelance data work based on my package contributions!

III. Committing to Change: How You Can Increase Accessibility in Your Favorite Open Source Projects

Rose Franzen, Children's Hospital of Philadelphia
 

Abstract

When people hear the phrase “technology accessibility”, their minds often jump to accessibility for end-users, skipping past accessibility for developers or data scientists. While companies that provide solutions for data scientists are starting to roll out accessibility updates, data scientists also rely upon and contribute to a wide variety of small, open-source projects which do not have the resources to bring in accessibility experts. This talk weaves together disability theory, principles of inclusive design, and common open source project structures to discuss simple adjustments that can be made by folks of all levels of tech expertise to decrease barriers to contributions by neurodiverse and disabled community members.

IV. A Biologist’s guide to R package development

Fonti Kar, University of New South Wales
 

Abstract

I am a biologist wearing R package developer shoes. I recall that writing my first package was an intimidating task. Here, I will share the parallels between package development and biology that helped me through the process. Like a food web, every file has a unique role and is interconnected with other components; experimenting with one will naturally result in change in another. Like the internal physiology of our bodies, routine testing ensures the package is in perfect equilibrium with its environment. Much like the forces that promote species adaptation, addressing bugs improves the functionality of the package. I hope sharing my perspective will help others see package development as being as wonderful as the natural world, and dispel any hesitation to start!



Data science with Python
9/20/23 (14:40, 1h 20m)

I. Data visualization with seaborn

Michael Waskom, Flatiron Health
 

Abstract

Seaborn is a Python library for statistical data visualization. After nearly a decade of development, seaborn recently introduced an entirely new API that is more explicitly based on the formal grammar of graphics. My talk will introduce this API and contrast it with the classic seaborn interface, sharing insights about the influence of the grammar of graphics on the ergonomics and maintainability of data visualization software.

II. Grammar of Graphics in Python with Plotnine

Hassan Kibirige, A Plus Associates, Posit PBC (Contractor)
 

Abstract

ggplot2 is one of the most loved visualisation libraries. It implements a Grammar of Graphics system, which requires one to think about data in terms of columns of variables and how to transform them into geometric objects. It is elegant and powerful. This talk will be about plotnine, which brings the elegance of ggplot2 to the Python programming language. It will be an invitation to learn the Grammar of Graphics system and how to avoid the common frustrations of fighting the plotting system. The talk will touch on the key milestone of plotnine 1.0.0, which will open plotnine to extensions and make new categories of visualisations possible.

III. Diversify your career with Shiny for Python

Gordon Shotwell, Posit
 

Abstract

A few years ago my company made a sudden shift from R to Python, which was quite bad for my career because I didn't really know Python. The main issue was that I couldn't find a niche that allowed me to use my existing knowledge while learning the new language. Shiny for Python is a great niche for R users because none of the Python web frameworks can do what Shiny can do. Additionally, almost all of your knowledge of the R package is applicable to the Python one. This talk will provide an overview of the Python web application landscape, articulate what Shiny adds to it, and then go through the five things that R users need to know before developing their first Shiny for Python application.

IV. Thanks, I made it with quartodoc

Isabel Zimmerman
 

Abstract

When Python package developers create documentation, they typically must choose between mostly auto-generated docs or writing all the docs by hand. This is problematic since effective documentation has a mix of function references, high-level context, examples, and other content. Quartodoc is a new documentation system that automatically generates Python function references within Quarto websites. This talk will discuss pkgdown’s success in the R ecosystem and how those wins can be replicated in Python with quartodoc examples. Listeners will walk away knowing more about what makes documentation delightful (or painful), when to use quartodoc, and how to use this tool to make docs for a Python package.
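As a hedged sketch of how quartodoc fits into a Quarto site, the function reference is declared in `_quarto.yml`; the package and function names below are hypothetical:

```yaml
project:
  type: website

quartodoc:
  package: mypkg            # hypothetical package to document
  sections:
    - title: Model functions
      desc: Fit and apply models.
      contents:             # functions whose docstrings become reference pages
        - fit_model
        - predict_model
```

Running `quartodoc build` then generates the reference pages that `quarto render` assembles into the site.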



Quarto (2)
9/20/23 (14:40, 1h 20m)

I. We converted our documentation to Quarto

Melissa Van Bussel, Statistics Canada
 

Abstract

A year ago, my team’s documentation, which had been created using Microsoft Word, was large and lacked version control. Scrolling through the document was slow, and, due to confidentiality reasons, only one person could edit it at a time, which was a significant challenge for our team of multiple developers. After realizing we needed a more flexible solution, we successfully converted our documentation to Quarto. In this talk, I’ll discuss our journey converting to Quarto, the challenges we faced along the way, and tips and tricks for anyone else who might be looking to revamp their documentation too.

II. Extending Quarto

Richard Iannone, Posit PBC
 

Abstract

What are Quarto shortcode extensions? Think of them as powerful little programs you can run in your Quarto docs. I won't show you how to build a shortcode extension during this talk; rather, I'm going to take you on a trip across this new ecosystem of shortcode extensions that people have already written. For example, I'll introduce you to the `fancy-text` extension for outputting nicely-formatted versions of fancy strings such as LaTeX and BibTeX; you'll learn all about the `fontawesome`, `lordicon`, `academicons`, `material-icons`, and `bsicons` shortcode extensions that let you add all sorts of icons. This is only a sampling of the shortcode extensions I will present; there will be many other inspiring examples as well.

III. Never again in outer par mode: making next-generation PDFs with Quarto and typst

Carlos Scheidegger, Posit, PBC
 

Abstract

Quarto 1.4 will introduce support for Typst. Typst is a brand-new open-source typesetting system, built from scratch to incorporate the lessons learned over almost half a century of high-quality computer typesetting with TeX and LaTeX. If you've ever had to produce a PDF with Quarto and got stuck handling an inscrutable error message from LaTeX, or wanted to create a new template but were too intimidated by LaTeX's arcane syntax, this talk is for you. I'll show you why we need an alternative to TeX and LaTeX, and why it will make Quarto even better.

IV. Unlocking the power of data visualization animation and interactivity in Quarto using Plotly, Crosstalk and Highcharter

Deepsha Menghani, Microsoft
 

Abstract

Data visualizations are a key tool for sharing insights with stakeholders in ways that make an impact and are easily understood. Animation and interactivity bring in a key component of analysis, enabling you to derive insights that a two-dimensional plot doesn't otherwise allow. Sometimes they can tell a powerful story, and sometimes they can just add some flair to an otherwise low-key story. Plotly, Crosstalk, and Highcharter are powerful packages that allow interactivity right from within your Quarto markdown report, without the need to create an entire application. Easily have two separate plots interact with each other, or add drill-down capabilities within a single plot, whichever adds to the story you are trying to tell.



Leave it to the robots: automating your work
9/20/23 (16:20, 1h 20m)

I. Integrating {pointblank} into a Data Validation Workflow

Michael Garcia, Medable
 

Abstract

For the Data Services team at Medable, our number one priority is to ensure that the data we collect and deliver to our clients is of the highest quality. The {pointblank} package, along with Posit Connect, modernizes how we tackle data validation within Data Services. In this talk, I will briefly summarize how we develop test code with {pointblank}, share with {pins}, execute with {rmarkdown}, and report findings to stakeholders with {blastula}. Finally, I will show how we aggregate metadata from tests conducted across projects into a holistic view using {shiny}.

II. Hitting the target(s) of data orchestration

Alexandros Kouretsis, Appsilon
 

Abstract

We are living at a time when the size of datasets can be overwhelming. Add to this that processing them involves linking together different computing systems and software, and integrating dynamically changing reference data, and for sure, you have a problem: reproducibility, traceability, and transparency have left the building. Here is where Posit Connect, along with the vast R ecosystem, comes to save the day, allowing the creation of reproducible pipelines. I will share my first-hand experience in this presentation: in particular, how we used targets on Posit Connect combined with AWS technologies in a bioinformatics pipeline. The result? An effective and secure workflow orchestration that is scalable and advances knowledge.

III. Using R to Aid Daily Discharge from Mayo Clinic’s Intensive Care Unit

Brendan Broderick, Mayo Clinic
 

Abstract

REGENERATE is a real-time decision support tool for Mayo Clinic's ICU (Intensive Care Unit). Its main aim is to ensure that patients with respiratory issues receive appropriate care once they are discharged from the ICU. The support tool was developed, hosted, and supported entirely with R and Posit Solutions. It comprises a machine learning model (xgboost) developed with reproducible tools such as targets, renv, and git. Predictions for Mayo Clinic's ICU patients are made in real-time with the aid of vetiver and data streams supported by the httr2 and DBI packages. Finally, physicians interact with a front-end UI developed in shiny and hosted on a Posit Connect server that provides recommendations to prevent patient readmission.

IV. Modernizing Flu Surveillance in the Netherlands: A Data Science Solution

Patrick van den Berg, RIVM
 

Abstract

Every year, thousands of patients are hospitalized with the flu, and to ensure there are enough hospital beds available, the National Institute for Public Health and the Environment of the Netherlands (RIVM) is responsible for monitoring trends. However, the reporting process was laborious and manual, taking precious time from a skilled epidemiologist. Our team has automated this reporting process, making it more accurate, efficient, and robust for future outbreaks. This talk sits at the intersection of data science and epidemiology and will be a valuable opportunity for the Posit community to learn from our experiences. It will complement Naomi Smorenburg's talk on the broader application of our team's automated reporting efforts.



End-to-end data science with real-world impact
9/20/23 (16:20, 1h 20m)

I. Using Data to Protect Traditional Lifeways

Angie Reed, Penobscot Indian Nation
 

Abstract

The spirit of Penobscot Nation’s work to protect the health of their relative, the Penobscot River, is embodied in the Penobscot water song, which says “water, we love you, thank you so much water, we respect you.” This is the cultural foundation on which we use water quality data to protect traditional lifeways. For the past 10 years, R and Posit products have helped us manage, transform, analyze, and visualize data, setting us up to leave a legacy of good data management. In addition to involving youth in every step of our work to achieve more stringent protections for the river, we are helping other tribal professionals do the same. The work we have done is extensive; the work to come will benefit from the R community at large.

II. Democratizing Access to Education Data

Erika Tyagi, Urban Institute
 

Abstract

Every year, government agencies release large amounts of data on schools and colleges, but this information is scattered across various websites and often difficult to use. To make these data more accessible, the Urban Institute built the Education Data Portal, a freely available one-stop-shop for harmonized data and metadata for all major federal education datasets. We've also built tools around the portal including an R package, a (forthcoming) Python package, a Stata package, and an interface to generate R, Python, Stata, and JavaScript code. In this talk, we’ll demonstrate how these tools work and share lessons we’ve learned about making data accessible to users with varying technical skills and preferred programming languages.

III. Take it in Bits: Using R to Make Eviction Data Accessible to the Legal Aid Community

Logan Pratico, Legal Services Corporation
 

Abstract

One in five low-income renter households in the US experienced falling behind on rent or being threatened with eviction in 2021. Yet most are unrepresented when facing eviction in court. The complex and fast-paced legal system obscures access to timely information, leaving tenants without assistance. In this talk, I discuss the Civil Court Data Initiative’s use of R alongside AWS Cloud and SQL to analyze disaggregate eviction records for legal aid groups working to assist tenants. I focus on the integration of RMarkdown with Amazon Athena and EC2 to create weekly eviction reports across 20 states. The upshot: accessible eviction data to help legal aid providers better address local legal needs.

IV. Open-Source Property Assessment: Using Tidymodels to Allocate $16B of Annual Property Taxes

Dan Snow, Cook County Assessor's Office
 

Abstract

The Cook County Assessor's Office (CCAO) determines the current market value of properties for the purpose of property taxation. Since 2020, the CCAO has used R, Tidymodels, and LightGBM to build predictive models that value Cook County's 1.5 million residential properties, which are collectively worth over $400B. These predictive models are open-source, easily replicable, and have significantly improved valuation accuracy and equity over time. Join CCAO Director of Data Science Dan Snow as he walks through the CCAO's modeling process, shares lessons learned, and offers a sneak peek at changes planned for the 2024 reassessment of Chicago.



Elevating your reports
9/20/23 (16:20, 1h 20m)

I. epoxy: super glue for data-driven reports and Shiny apps

Garrick Aden-Buie, Posit
 

Abstract

R Markdown, Quarto, and Shiny are powerful frameworks that allow authors to create data-driven reports and apps. But truly excellent reports require a lot of work in the final ~mile~ inch to get numerical and stylistic formatting *just right*. {epoxy} is a new package that uses {glue} to give authors templating super powers. Epoxy works in R Markdown and Quarto, in markdown, LaTeX and HTML outputs. It also provides easy templating for Shiny apps for dynamic data-driven reporting. Beyond epoxy's features, this talk will also touch on tips and approaches for data-driven reporting that will be useful to a wide audience, from R Markdown experts to the Quarto and Shiny curious. p.s. {epoxy} will be on CRAN soon!

II. Can I have a Word?

Ellis Hughes, GSK
 

Abstract

Since its release, {gt} has won over the hearts of many due to its flexible and powerful table generating abilities. However, in cases where office products were required by downstream users, {gt}'s potential remained untapped. That all changed in 2022 when Rich Iannone and I collaborated to add word documents as an official output type. Now, data scientists can engage stakeholders directly, wherever they are. Join me for an upcoming talk where I'll share my excitement about the new opportunities this update presents for the R community as well as future developments we can look forward to.

III. Motley Crews: Collaborating with Quarto

Susan McMillan, University of Wisconsin - Madison
 

Abstract

Our talk will be about how our adoption of Quarto for document creation has transformed the collaborative workflow in a higher education analytics office. Historically, our content experts wrote in Word documents and our data analysts ran statistical analyses and made graphs in R. Specialization in different software tools created challenges for producing collaborative analytic reports, but Quarto has solved this problem. We will describe how we use Quarto for writing and editing text, embedding statistical analysis, and producing reports with a standard style in multiple formats. Speaker topics: Wyl Schuth, writing, editing, and HTML/Markdown formatting; Michael Zenz, embedding R/Python code; Susan McMillan, data visualization and reporting.

IV. Custom Quarto reports improve farmer understanding of soil health

Jadey Ryan, Washington State Department of Agriculture
 

Abstract

Soil sampling data are notoriously challenging to tidy and effectively communicate to farmers. The Washington Soil Health Initiative analyzed 702 soil samples from 223 farmers. Moving away from Excel, we used functional programming with the tidyverse to reproducibly streamline data cleaning and summarization. To improve project outreach, we developed a Quarto template to dynamically create interactive HTML reports and printable PDFs. Custom to every farmer, reports include project goals, measured parameter descriptions, summary statistics, maps, tables, and graphs. Our case study presents a workflow for data preparation and parameterized reporting, with best practices for effective data visualization, interpretation, and accessibility.



I can’t believe it’s not magic: new tools for data science
9/20/23 (16:20, 1h 20m)

I. Running R-Shiny without a server

Joe Cheng, Posit Software, PBC
 

Abstract

A year ago, Posit announced ShinyLive, a deployment mode of Shiny that lets you run interactive applications written in Python, without actually running a Python server at runtime. Instead, ShinyLive turns Shiny for Python apps into pure client-side apps, running on a pure client-side Python installation. Now, that same capability has come to Shiny for R, thanks to the webR project. In this talk, I'll show you how you can get started with ShinyLive for R, and why this is more interesting than just cheaper app hosting. I'll talk about some of the different use cases we had in mind for ShinyLive, and help you decide if ShinyLive makes sense for your app.

II. Magic with WebAssembly and webR

George Stagg, Posit, PBC
 

Abstract

Recently webR v0.1.0 was released, and users have begun building new interactive experiences with R on the web. In this talk, I'll talk about webR's TypeScript library and the kinds of things it can do. The library aims to let users interact with the R environment directly from JavaScript, which allows for manipulation tricks that can seem like magic when first seen. I'll begin by describing how to move objects from R to JS and back, and talk about the technology behind the proxy objects that make this possible. I'll continue by showing more advanced manipulation, such as invoking R functions from JS and vice versa. Finally, I will describe how messages are sent on webR's communication channel, building up to a demo of a "Shinylive for R".

III. AI and Shiny for Python: Unlocking new possibilities

Winston Chang, Posit, PBC
 

Abstract

In the past year, people have come to realize that AI can revolutionize the way we work. This talk focuses on using AI tools with Shiny for Python, demonstrating how AI can accelerate Shiny application development and enhance their capabilities. We'll also explore Shiny's unique ability to interface with AI models, offering possibilities beyond Python web frameworks like Streamlit and Dash. Learn how Shiny and AI together can empower you to do more, and do it faster.

IV. Large Language Models in RStudio

James Wade, Dow
 

Abstract

Large language models (LLMs), such as ChatGPT, have shown potential to transform how we code. As an R package developer, I have contributed to the creation of two packages -- gptstudio and gpttools -- specifically designed to incorporate LLMs into R workflows within the RStudio environment. The integration of ChatGPT allows users to efficiently add code comments, debug scripts, and address complex coding challenges directly from RStudio. With text embedding and semantic search, we can teach ChatGPT new tricks, resulting in more precise and context-aware responses. This talk will delve into hands-on examples to showcase the practical application of these models, as well as offer my perspective as a recent entrant into public package development.