Reproducibility is a critical aspect in science to enable trust & communication. In R, many tools exist to bring in the best practices of reproducibility into the hands of data scientists. However, outside of a research setting, how does reproducibility hold up in commercial data science projects? In this talk I take an honest retrospective of my own commercial R projects in the last year. I look at the various types of analyses completed, and which workflows were selected and why. Through this process we can learn how workflow choices may help in the short term but hinder in the long term. More importantly what can be done strike the balance between progress and perfection when doing data science in the wild?
Dean Marchiori is a Statistician based in Sydney, Australia. He currently works with Endeavour Energy as a Senior Data Scientist modelling bushfire and vegetation risk on the electricity network. Dean's career started in finance as an equities trader before moving into advanced analytics, where he has worked with some of Australia’s largest organisations. His professional interests are in geospatial analysis, time series modelling and the R programming language. In 2019, Dean was named one of the top 10 analytics leaders in Australia by IAPA. He is also recognised as an Accredited Statistician with the Statistical Society of Australia. He holds a Bachelor of Science in Mathematics (awarded with University Medal) and a Masters degree in Applied Finance. Outside of work Dean enjoys bodysurfing, running and spending time with his wife and two boys.