Instructors teaching the R language to beginners have many choices about which programming strategies to teach first. In this talk, I’ll argue that teaching data transformation and visualization with the dplyr, tidyr, and ggplot2 packages is a suitable first introduction to data analysis in R. This approach has several advantages: it produces useful results as early as possible, it encourages productive habits around organizing data, and it offers a consistent and memorable syntax. I’ll also describe how base R syntax can be introduced within a course as it becomes useful for solving problems. Finally, I’ll discuss potential pitfalls of the Tidyverse-first approach and the kinds of curricula it may be less suited for.
David is the Chief Data Scientist at DataCamp, an education company that teaches data science through interactive online courses. His interests include statistics, data analysis, education, and programming in R.
David is co-author with Julia Silge of the tidytext package and the O’Reilly book Text Mining with R. He is also the author of the broom, gganimate, and fuzzyjoin packages, and of the e-book Introduction to Empirical Bayes.
David previously worked as a data scientist at Stack Overflow and received a PhD in Quantitative and Computational Biology from Princeton University.