The Data Science community is dominated by folks doing amazing work with data that starts in cyberspace and never leaves it. This talk is about best practices and playbooks for doing data science that involves meatspace (the opposite of cyberspace), and about why R is such a great language for working with data that originated in the physical world. While the concrete examples in this talk will mostly come from manufacturing, where I have the most experience, I believe the themes are relevant to many meatspace workflows. We'll talk through effective playbooks that can help you navigate common tasks throughout the lifecycle of a project. We'll also weave in how R's glorious package ecosystem, including the `tidyverse`, can be combined with other languages like Python and with enterprise products like RStudio Connect to great effect. Specifically, we'll discuss practices in these areas:
- best practices for data collection in meatspace
- the importance of quantifying measurement system error
- collecting the correct data for training computer vision models
- the rarely discussed cost of maintaining models in production

BenJoaquin works at Plenty, an indoor vertical farming startup headquartered in San Francisco that grows craveable fruits and vegetables. He leads Plenty's Datalab team, which is responsible for Data Science and Perception initiatives in their farms (yes, they are hiring). Prior to Plenty, BenJoaquin led Tesla's central Data Science team focused on Manufacturing, where he and his teams developed and deployed model-driven automation equipment. Before finding Data Science, BenJoaquin was a Manufacturing Engineer responsible for Tesla's highest-volume manufacturing process, where he developed a love for R and all things tidy.