Serious Data Science
Set your team up for success
The building blocks of serious data science
By adopting an open source core, you make it easier to recruit and retain data scientists, and the breadth of the open source ecosystem means you will have the right tool for nearly any analytic problem, including the ability to connect to your other analytic investments. You also avoid putting yourself at the mercy of any single vendor, since your core data science work is based on R or Python.
Complex, sometimes vaguely defined analytic problems require the power of code. Code is flexible, free of black-box constraints, and lets you access, transform and combine ALL of your data. Code enables fast iteration and updates in response to feedback or new circumstances. Most importantly, code is by its very nature reusable, extensible and inspectable, allowing you to modify it, apply it to new problems, and track where changes occurred. Code becomes a core source of intellectual property in your organization, and its value grows over time.
By centralizing your data science infrastructure, you break down the silos that impede your productivity. This reduces the time spent maintaining individual data scientists' environments and promotes collaboration. Deploying your team's data science work to your stakeholders gives them self-service access to insights when and where they need them, greatly increasing the impact of your team's work. Centralizing your development and deployment environments makes administration, security and management far easier, and package management promotes reproducibility over time.