From IT to management: How data science leaders are harnessing the power of open source in pharma

We recently wrote about how open source has emerged as a powerful tool for innovation and collaboration in the pharmaceutical industry. From end-to-end open-source workflows for clinical reporting to high-performance computing environments, open source is transforming the industry and driving positive change for patients worldwide.

During our weekly Data Science Hangouts, data science leaders from across organizational departments shared their roles and responsibilities in the pharma industry as well as their connection to the open-source world:

Satish J. Murthy, Senior Manager, Pharma R&D IT at Janssen, described the technical infrastructure and requirements behind Janssen’s data science platform.
Regis James, Senior Manager, Biopharmaceutical Data Science at Regeneron Pharmaceuticals, shared how to achieve scalability and show the value of community.
David Granjon and Bo Wang, Senior Experts Data Science at Novartis, described how they use Shiny to engage their colleagues.

We were truly captivated by the insights shared during the hangouts. Each co-host has different experiences and perspectives depending on their respective job titles, resulting in varying approaches to harnessing the power of open source. The discussions shed light on the many ways the industry’s transformation can continue, and we’re delighted to share a few highlights below.

Satish J. Murthy, Senior Manager, Pharma R&D IT at Janssen

Satish J. Murthy and Mike Stackhouse, Chief Innovation Officer at Atorus, joined us at a recent Data Science Hangout. During the hangout, we dove into the topics of validated environments, containerization, and proving the stability of open-source tools to regulatory organizations. Our ears perked up at every mention of the implementation of open source in the IT infrastructure of a pharma company.

Data Science Hangout Highlight: Implementing open-source tools to set up a validated environment that builds confidence in clinical trial results

Satish primarily focuses on the R (and soon Python) data science platform that Janssen scientists use for clinical trial reporting and grid-based computing. The platform also supports SAS. Still, Satish says, “as people see the power of R and open-source community support, they ask for more capabilities to be added to the platform.” Because of this, “there is an accelerated growth to have them move to R.” Therefore, Satish works on growing the R use case at the organization, which enables his team to adopt other tools and tech.

But there’s a caveat. Development environments must be “validated”: users must have confidence that what they did yesterday will run the same way today. Because of the power of open source and the number of people contributing to the R ecosystem, “often what we are seeing from an IT perspective, there are some challenges in reproducing this platform as-is,” notes Satish. Validation isn’t just a user preference, either. In the highly regulated pharmaceutical industry, it is a requirement.

Thankfully, IT can implement the necessary tools to meet regulatory requirements and build users’ confidence in their environment. For this, Janssen partnered with Atorus to create a validated setup. Examples of the tools include the renv package to record package versions in projects and Posit Package Manager to lock down versions of R packages.
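
To make the idea of recording package versions concrete, here is a minimal Python sketch. It is not how renv or Posit Package Manager work internally, and it is not Janssen's setup. It assumes a simplified lockfile loosely modeled on the shape of an renv.lock file. The function name find_drift, the package names, and the version numbers are all illustrative assumptions.

```python
import json

# Simplified lockfile, loosely modeled on the shape of an renv.lock file.
# Package names and versions are illustrative assumptions only.
LOCKFILE_JSON = """
{
  "Packages": {
    "dplyr": {"Package": "dplyr", "Version": "1.1.2"},
    "ggplot2": {"Package": "ggplot2", "Version": "3.4.2"}
  }
}
"""


def find_drift(lockfile, installed):
    """Return {package: (locked_version, installed_version)} for each mismatch.

    installed_version is None when a locked package is not installed at all.
    """
    drift = {}
    for name, record in lockfile["Packages"].items():
        locked = record["Version"]
        actual = installed.get(name)
        if actual != locked:
            drift[name] = (locked, actual)
    return drift


lockfile = json.loads(LOCKFILE_JSON)

# Hypothetical snapshot of what is installed in an environment today.
installed = {"dplyr": "1.1.2", "ggplot2": "3.4.3"}

drift = find_drift(lockfile, installed)
print(drift)
```

An empty result means the environment still matches what was recorded. Any entry is a signal that yesterday's code may not run the same way today.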

When building these environments, it is crucial to refrain from introducing changes that will break things. “The one thing that the users absolutely want is stability,” states Satish. Containerization allows users to test workflows and packages for clinical studies. Once these are defined, IT runs through the build process and creates containers that are verified in Posit Workbench. If all looks good, the container is locked and moved to production. A development environment must run through production before it is validated.

IT also helps with the linting and unit testing pipeline so that users can focus on their expertise rather than trying to build something independently. “We make it sort of easier for them,” says Satish. “We are trying to templatize [the process] to minimize disruption to our users.” Containerization and templatizing pipelines ensure that nothing breaks, everything is tried and true, and all work can be backed up and replicated.

There is an element of change management in the implementation of open-source tools. Mike emphasized the importance of bringing people along for the journey. “There’s definitely a balance to strike from an organizational standpoint,” he states. This balance boosts users’ confidence and comfort in their open-source tools, increasing the chance that the tools will be used in the long term.

Regis James, Senior Manager, Biopharmaceutical Data Science at Regeneron Pharmaceuticals

In a recent Data Science Hangout, Regis James discussed network graphs, scaling knowledge in a global organization, how people’s lives are like the Tron light cycle paths, and thoughtful approaches to enabling others to make data-driven decisions. Regarding open source, Regis shared his experience creating a community of computational colleagues and how that became part of his job role.

Data Science Hangout Highlight: Creating and growing a community across a global company

How do you show people what is possible at your organization? At Regeneron, Regis does this in several ways. One is by using cutting-edge machine learning and artificial intelligence to improve the quality of clinical trials. Another is by maintaining a community of data scientists around the company.

His community-building work began when he set up a Posit ecosystem to conduct his work. Others caught on and were eager to use the tools. They began creating and sharing Shiny apps that could be viewed not just by other data scientists but by higher-ups, too. The showcase of apps helped illuminate “the art of possible.”

Thus began a community that assembles computational colleagues for networking, exchanging insights, and initiating collaboration. Regis shares an example where three departments discovered they were all running into the same natural language processing issue. Before the meeting, the teams did not know that others were working on the same problem. Because they had the time and space to meet, they realized they shared the same questions and could answer them together.

However, a community does not keep going on its own. “With great open source access comes great responsibility to support an ongoing community of people,” says Regis. He describes various activities he does to sustain and grow the group: from scheduling to coaching speakers to documentation. Contributors get to share their successes and lessons learned, and participants walk away with insightful and relevant information.

Beyond the discrete tasks required, leadership buy-in is key for a successful community. To gain leadership’s endorsement, you must first define the role of data science. “The way to prove the value of a community… I think about ‘stuck’ and ‘unstuck’,” he says. “There are obstacles. How do they get unstuck? That’s what data science is for.” Once that role is defined, you can create a link between the community and the work. “You can show people in the organization did this, this, this, and this… that shows impact.”

David Granjon and Bo Wang, Senior Experts Data Science at Novartis

David Granjon and Bo Wang from Novartis joined us at a special Data Science Hangout during the Appsilon Shiny Conference. Throughout the hour, they shared their thoughts on topics like great user interfaces, organizational silos, and how the renv package enables reproducibility. Shiny plays a major role in their jobs, and we were delighted to hear how interactive apps empower their team.

Data Science Hangout Highlight: Sharing user-friendly, self-service, maintainable tools

Data scientists’ use of Shiny to present data in a more accessible format has proven to be a valuable asset in an industry where time is a precious commodity. “Data review in the industry can be complex and time-consuming,” Bo says. “[The reviewers’] time is very expensive…they have very limited time that they can give us to review the data.” Thanks to Shiny’s interactivity, Bo is able to create tools that speed up the process. Bo describes one of her projects as “writing and managing a Shiny app that displays safety data in a more user-friendly way than 200 pages of PDFs.” She concludes, “So having an app as a supplementary tool aids their review of the data with the limited time that they have.”

David described one of his responsibilities: a human resources data project. There are over 100,000 employees at Novartis, and he was tasked with creating a tool to help with time management across people and departments. He began with Shiny. Thanks to the integration capabilities in open-source tools, he expanded his apps to use databases and tools like React. “It shows how you can integrate Shiny into the whole ecosystem, and that was a great project.”

He also used the tools available on Posit Connect, Posit’s publishing platform, to improve his work. With the Connect API, he can see “who is accessing the application, which gives you some insights like which team is using it and how to target support.” He supplemented this information by developing an R package, shinyHeatmap, to record in-app usage. “Each click is recorded to identify dead zones to refactor design,” he says. “If you have an app no one visits, maybe it’s poorly designed.” Thanks to the built-in features of Connect, as well as the ability to create and share with open-source programming languages, he can build Shiny apps that people actually use.
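
The "dead zone" idea can be sketched in a few lines of Python. This is a language-agnostic illustration under our own assumptions, not the shinyHeatmap implementation or the Connect API. The function names click_grid and dead_zones, the grid size, and the click coordinates are all hypothetical.

```python
from collections import Counter


def click_grid(clicks, width, height, cols, rows):
    """Count clicks per grid cell for a page of the given pixel size.

    clicks is an iterable of (x, y) pixel coordinates.
    Returns a Counter keyed by (row, col).
    """
    counts = Counter()
    for x, y in clicks:
        # Clamp to the last cell so clicks on the far edge stay in range.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        counts[(row, col)] += 1
    return counts


def dead_zones(counts, cols, rows):
    """Return the (row, col) cells that received no clicks."""
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if counts[(r, c)] == 0
    ]


# Hypothetical recorded clicks on an 800 x 600 pixel app, binned into a 2 x 2 grid.
clicks = [(100, 100), (120, 80), (700, 100), (650, 500)]
counts = click_grid(clicks, width=800, height=600, cols=2, rows=2)
dead = dead_zones(counts, cols=2, rows=2)
print(dead)
```

Cells that never receive clicks are candidates for the kind of design refactoring David describes.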

Shiny offers the additional benefit of leveraging the increasing popularity of the R programming language in the industry, so the responsibility of maintenance is spread across a larger group rather than a select few. “Our goal is to create something that we can ship off to our statisticians to use on their own and to manage on their own,” says Bo. “We have much, much better adoption of R, especially with the younger generation of statisticians. Many of them are quite fluent in R.” David agrees. “R is doing what we want to do.”

Learn more about open source in pharma

Thank you to our data science leaders for sharing their experiences and how they harness the power of open source at their organizations.

We look forward to supporting everybody in their critical role in the open-source transformation, whatever their job title.

Join us at our next Data Science Hangout to learn more from industry leaders: pos.it/dsh
Visit our pharma industry page to learn more about the exciting shift happening across the field, including how our industry experts can help you adopt open source.
Explore Posit Academy, our bespoke training solution to help professional teams learn R and Python.
Find resources and guidance on building a community at your organization on the Community Building website.