The human brain has developed to support two radically different ways of thinking: focused and holistic. For example, when wild animals feed, they focus on getting the juiciest, greenest new leaf while at the same time staying broadly vigilant, scanning the environment for danger.
This divided nature of the brain allows for both focused thinking and a more holistic point of view. Even the state of the art in machine learning and AI is still highly focused: the best algorithms can still only perform very narrowly defined tasks, despite significant progress in self-driving cars, machine translation, computer vision, etc.
The practice of data science itself can be narrow or holistic. Too often, data scientists think of choosing the algorithm, hand-building a neural network, or tuning model performance as the sexy part of their job. But what happens once automatic machine learning gets established as the best model builder? Then the narrow role of the data scientist must become more holistic. The data scientist must add value by selecting data sources that contain signal and by interpreting the results for validity.
In addition, the data scientist must ensure they are aware of the implications of their work, keeping in mind ethical issues such as privacy, safety and inclusiveness.
In this talk, Tareef Kawaf explores these issues and discusses how RStudio solves some very specific data science pain points while also thinking holistically about developing software for the future.