In this talk, I’ll lay out the reasons that blogging, open source contribution, and other forms of public work are a critical part of a data science career. For beginners, a blog is a great accompaniment to data science coursework and tutorials, since it gives you experience applying practical data science skills to real problems. For data scientists at any stage of their careers, open source development offers practice in collaboration, documentation, and interface design that complement other kinds of software development. And for data scientists more advanced in their careers, writing a book is a great way to crystallize your expertise and ensure others can build on it. All of these practices build skills in communication and collaboration that form an essential component of data science work. Each also lets you build a public portfolio of your skills, get feedback from your peers, and network with the larger data science community.
David is the Chief Data Scientist at DataCamp, an education company for teaching data science through interactive online courses. His interests include statistics, data analysis, education, and programming in R.
David is co-author with Julia Silge of the tidytext package and the O’Reilly book Text Mining with R. He is also the author of the broom, gganimate, and fuzzyjoin packages, and of the e-book Introduction to Empirical Bayes.
David previously worked as a data scientist at Stack Overflow, and received a PhD in Quantitative and Computational Biology from Princeton University.