Principal component analysis (PCA) is a powerful approach for exploring high-dimensional data, but it can be challenging for learners to comprehend. In this talk, I will walk through a practical and interactive explanation of what PCA is and how it works. As a case study, I’ll use a domain that many data analysts and data scientists are familiar with: programming languages and technologies, as understood through traffic to Stack Overflow questions. We will see how interactive visualization with Shiny gives us insight into the complex, real-world relationships in high-dimensional datasets.
Julia Silge is a data scientist and software engineer at RStudio PBC, where she works on open source tools for machine learning and MLOps. She holds a PhD in astrophysics, has worked as a data scientist in tech and the nonprofit sector, and has served as a technical advisory committee member for the US Bureau of Labor Statistics. She is a coauthor of Tidy Text Mining with R, Supervised Machine Learning for Text Analysis in R, and Tidy Modeling with R. An international keynote speaker and a real-world practitioner focusing on data analysis and machine learning, Julia loves text analysis, making beautiful charts, and communicating about technical topics with diverse audiences.