Categorical data, called “factor” data in R, presents unique challenges in data wrangling. R users often look down on tools like Excel for automatically coercing variables to incorrect data types, but factor data in R can produce very similar issues. The stringsAsFactors=HELLNO movement and standard Tidyverse defaults have moved us away from the use of factors, but factors are sometimes still necessary for analysis. This talk will outline common problems arising from categorical variable transformations in R and show strategies to avoid them, using both base R and the Tidyverse (particularly dplyr and forcats functions).
(related paper from the DSS collection)
Dr. McNamara is an assistant professor of statistics in the Department of Computer and Information Sciences at the University of St. Thomas, St. Paul, MN. She received her BA in English and mathematics from Macalester College and her PhD in statistics from UCLA. Her research interests include statistical computing, statistics education, data visualization, and spatial statistics.