Open source in pharma from five perspectives

Data science leaders from Roche, AstraZeneca, GSK, Eli Lilly, and Pfizer share their perspectives on the adoption of open source in the pharmaceutical space.

The pharmaceutical industry is constantly evolving to develop new treatments and therapies for a wide range of diseases and conditions. As we look to the future, many exciting open-source developments have the potential to transform the way that research and reporting are done.

We have seen considerable investment in, and long-term commitment to, open source, particularly R as a statistical platform. Over the past several months, data science leaders at leading organizations across the industry joined us to host virtual events:

  • Shifting to an Open-Source Backbone in Clinical Trials with Roche – Ning Leng, People and Product Lead, Kieran Martin, R Enablement Lead, and Thomas Neitmann, Senior Data Scientist, at Roche and Genentech presented their ambitions to make open source the default for clinical workflows in 2023.
  • R at AstraZeneca – Gabriella Rustici, Data Science Learning Senior Director, and Guillaume Desachy, Statistical Science Director, presented AstraZeneca’s plans to upskill their workforce on open-source languages like R at scale, along with their experience with Posit Academy.
  • Data Science Hangout with Christina Fillmore, Data Science Leader at GSK – Christina led our weekly Data Science Hangout, answering questions about clinical workflows and offering insight into her work with open source.
  • Data Science Hangout with Eric Nantz, Director at Eli Lilly and Company – Eric detailed his early experiences using R and Shiny at Eli Lilly, along with his inspiring progress and ambitions for open source in the clinical space.
  • Data Science Hangout with Mike Smith, Senior Director at Pfizer – Mike, a longtime open-source leader and member of the R Consortium, discussed his efforts to build an R Center of Excellence at Pfizer.

These presentations provide over 5 hours of discussion on open source in the pharmaceutical space, and how collaborators across the industry are working for the common good. This blog post highlights a few insights that particularly excite us.


Perspective #1: Shifting to an Open-Source Backbone in Clinical Trials with Roche


In 2023, Roche will make R the core data science tool for new clinical studies. This shift represents an exciting time in the industry, and we highly recommend watching the full recording of the presentation detailing everything involved.

Figure: Roche’s Ambitious Goals for 2023


Kieran Martin, R Enablement Lead at Roche, kicked off the presentation detailing the two reasons driving this shift. 

One reason is the talent base. “Generally speaking, when we’re recruiting new graduates, they’re much more likely to know these open-source tools such as R and Python than they are commercial software.”

The other major reason is more future-looking. “We wanted to go open source. And the reason for this is we think open source offers loads of opportunity for how we work.”

Those opportunities include:

  1. Getting the latest developments more rapidly – “Typically, if there’s a new statistical method used to analyze data, that will often be implemented in R first. So, if we want to get access to those more quickly to provide better results, we should be using R.”
  2. Ability to switch between contexts and languages more easily – “We knew we wanted to go language agnostic, and we knew we wanted that ability to switch between languages, and we know that open source is going to make that easy. Because with no proprietary formats, it’s usually easier to switch between formats.”
  3. Unprecedented collaboration with external partners – “If we go open source, we suddenly have this ability to get the inputs of people from across the industry, which I think in the long run will lead to much more efficient and better code, and ultimately better outcomes for us and for our patients.”

Figure: The Benefits of Shifting to Open Source


The team expanded on this cross-industry collaboration, highlighting the pharmaverse, a curated stack of open-source R packages that enable clinical reporting from beginning to end. The pharmaverse is backed by a community of individuals and organizations, including Atorus, GSK, J&J, and Roche. Projects like the pharmaverse enable pharma companies to create tools that aid in the shift to an open-source backbone.

Thomas Neitmann featured admiral, an open-source modular package for generating ADaM datasets in R. The admiral package was born out of a collaboration between Roche and GSK, and it did not begin as closed source to be opened up later. It was developed in the open from the beginning and has since grown into an ecosystem of packages from other contributors. “This has been quite a success,” says Thomas. “We certainly want to use this model of collaboration moving forward.”

Thomas also zoomed in on the specific next-gen tools that Roche is using for clinical reporting. One in particular, OAK, is Roche’s solution for automating SDTM mapping. “SDTM mapping used to take a considerable amount of our time and was rather a labor-intensive process, so we opted to streamline this by creating an automated solution.” The result and benefit: OAK can now automate around 80% of SDTM domains with around 22 reusable algorithms. “This, in turn, really saved us a lot of time. We’ve seen efficiency gains by at least 50%.”

Figure: Using R to Automate SDTM Mapping


Thomas notes that OAK is a relatively simple tool to use, making it suitable for both experienced and beginner R developers. “It’s a single R package you interact with plus a web app.” However, under the hood, OAK is more complex. The tool needs to connect to other systems, which the team at Roche refers to as The OAK Garden.

Figure: The OAK Garden, the Systems that OAK Needs to Connect With


The presentation concluded with Kieran and Ning Leng zooming back out and discussing the shift from business and organizational perspectives.

Our takeaway: It is crucial to embrace lifelong learning and an open-source culture. Kieran acknowledges that programmers come with a mix of R experience and requirements. To meet them, there is a host of tools: assessments, self-guided resources, training resources, and more. There are also mentoring opportunities to upskill as a team. By providing programmers with solutions that fit their needs, the Roche team can get their talent base ready for the open-source shift.

Ning notes the beauty of a modern open-source backbone but acknowledges the challenges of transitioning a business. Project teams may encounter difficulties shifting their work in a timely manner. To ensure success, “project teams need their own fit-for-purpose plan,” she says. Like renovating a house, a team needs to assess their needs, timing, and available resources to decide how to move forward. Parallel to this work, the broader organization is building an adoption roadmap to onboard project teams to make the transition for new hires easier. Throughout all this, they gain the feedback needed to improve their tools. “The journey continues, and we’re really excited to be part of this journey.”


Perspective #2: Teaching R at AstraZeneca


“There’s been a true paradigm shift,” says Guillaume Desachy of open source in the pharmaceutical industry. As Statistical Science Director at AstraZeneca, he has witnessed the evolution from the prevalence of two distinct camps – R or SAS – to an interconnected system between the two languages.

New hires are often multilingual: comfortable with both languages and with moving between them. Multilingualism extends beyond statisticians and data scientists to roles across the organization, including clinicians, medical directors, and more.



The team at AstraZeneca recognizes that most R programmers are self-taught and have only informal upskilling opportunities. “Not everybody is fortunate, at their workplace or throughout their career, to have a wonderful, dedicated learning development team,” noted Gabriella Rustici, Data Science Learning Senior Director. Formal structures that provide education, exposure, and experience are vital for data scientists to grow their skills.

AstraZeneca invested significant resources and time to develop a Data Science Academy. The Academy provides a variety of learning options. Virtual, instructor-led courses provide a project-based curriculum from experienced instructors. Experiential learning initiatives such as an internal R conference allow attendees to network with their fellow learners. Self-study resources offer data scientists an asynchronous, flexible way to upskill.

AstraZeneca also enrolled a cohort into Posit Academy, our training solution for professional teams that want to learn R and Python. AstraZeneca was one of the first cohorts to go through Posit Academy. The company prioritized employees who had a timely need to learn and implement R, and cited Academy’s “training around a particular project or objective” as relevant to the pharmaceutical space.

Crucial to the success of the Data Science Academy was buy-in from AstraZeneca leadership. The leadership team raises awareness of the importance of setting aside time for learning. Managers are encouraged to see the connection between training and achieving goals. The leadership team also helps steer the development of these programs. This collaboration aligns all levels of the organization on the purpose of the Academy and ensures that formal structures grow and expand with their teams.


Perspective #3: Data Science Hangout with Christina Fillmore, GSK


In a recent Data Science Hangout, Christina Fillmore from GSK joined us to chat about package development, community building, and tables (so many tables!). The whole 60-minute conversation was great! One particular thing that caught our attention was this insight:


Hangout Highlight: Developing an end-to-end pipeline in R


When it comes to data science at GSK, “the priority has really been developing an end-to-end pipeline for us that’s feasible,” says Christina. “A lot of pharmas are moving into the R space and starting to use R as a primary tool in order to report clinical trials…we want a fully-developed pipeline for R.” As part of this effort, Christina and her team work on packages in the pharmaverse such as metacore, whose goal is to establish a common foundation for the use of metadata within an R session.

As mentioned above, the pharmaverse is backed by a community of individuals and organizations. “It’s a collaboration where other pharmas have also said it’s important,” says Christina, particularly since it’s an opportunity for an organization’s voice to be heard. Different organizations working together ensure that the tools are useful and applicable to different standards and needs. “We are making sure that the code or products we’re developing are in the open so that our pharmas can use those the way they see fit.”

“We are really trying to focus as much as possible to do stuff in the open source,” emphasizes Christina. Open-source tools bring various advantages. Packages are freely accessible to others. Data scientists are able to automate their workflow. And because the team develops packages that mirror SAS macros, users can easily transition from their previous workflows.

Christina acknowledges there is still work to do. “One thing that has been a barrier is without every single building block…it can be hard to take the leap and try R on a study.” With the dedicated work from groups like Christina’s to expand open-source tools in the pharma space, pharmaceutical leaders will be reassured that the appropriate tools are available to run clinical trials.



Perspective #4: Data Science Hangout with Eric Nantz, Eli Lilly


During Eric Nantz’s Data Science Hangout, the discussion topics ranged from his love for R to his transition from academia to working at Eli Lilly. He also discussed his view on the most significant changes in pharma over the past 12 years. We wanted to share one insight below:


Hangout Highlight: Moving from open source as a tool to open source as infrastructure


Minimizing the white space between treatment development and getting medicine to patients is the ultimate goal of pharmaceutical companies. Tools that allow data scientists to run efficient analytics in high-performance computing environments are therefore very attractive. “[We’re] starting to get more attention on what we’re seeing the general tech sector do for voluminous data and easier access to data… I think the seeds are being planted,” says Eric.

Many may be aware of the benefits of open-source tools like Shiny to create interactive web applications. Eric advocates for the ‘downstream’ use of open source to develop stand-alone solutions. They are immediately accessible to stakeholders, who can recognize the benefits by interacting with the data products.

However, Eric notes the opportunity to embrace open source in bigger ways. By using open-source tools more ‘upstream’ and integrating them with other systems, data scientists can achieve something that may be difficult, time-consuming, or impossible with proprietary data formats.

Eric led by example at Eli Lilly, demonstrating the potential of R to turn around answers quickly. He opened the eyes of leadership to what is possible, giving them a peek into “what…people are doing with open source, bringing automation in a seamless way.”

Open-source infrastructure not only provides a lower barrier than proprietary formats but also opens the door to broad innovation in design. High-performance computing systems empower data scientists to move beyond traditional analytics and run cutting-edge algorithms. Ultimately, gains in efficiency and quality mean that patients get better treatments faster. “The fact that we got this mandate that said we need to shorten the time we’re in this research phase… we had to really put all hands on deck to put out new solutions,” says Eric. “It also opened the door for those in my group…to think differently about how we’re leveraging R in the design space and become one of the industry leaders in doing clinical simulation.”



Perspective #5: Data Science Hangout with Mike Smith, Pfizer


Mike Smith, Senior Director at Pfizer, shared many wonderful insights during his Data Science Hangout, especially for those wanting to learn more about a career in data: should you study statistics or data science? R or SAS? Mike also described how they built a Center of Excellence at Pfizer to help teams across the business build reproducible workflows and use analytics tools effectively and efficiently. We want to highlight this section below:


Hangout Highlight: Building communities to harmonize work and solve problems faster


“There’s a benefit in being able to solve problems strategically,” shares Mike. In a poll across the company, he found that 1,500+ colleagues had downloaded R. People in different business lines worldwide were doing amazing work with open-source tools, but they weren’t yet sharing that across the organization.

It was essential to build a community to bring people together, harmonize how they do things, provide a platform to share lessons learned, and give people an opportunity to show off their work.

In starting this community journey, Mike recommends talking to someone with experience. For him, that was Doug Robinson, who came to Pfizer from Novartis, where they had done something similar. They concluded that they needed a technical team to help solve problems.

This led to the creation of their R Center of Excellence at Pfizer – a focused group dedicated to helping teams across the business build reproducible workflows and use analytics tools effectively and efficiently.

With the R Center of Excellence in place, community building at Pfizer has taken a multi-pronged approach. They have created a Teams Channel with 1,000 people to broadcast tips and tricks, a monthly R Community of Practice seminar where they feature different success stories, a newsletter on the advances in R with a focus on Pfizer, training opportunities, and participation in external community groups – like the R Consortium working groups. These efforts ensure Pfizer’s voice is being heard.


Learn more about open source in pharma


Thank you to these respected data scientists for the hours spent with us. The focus on open source is evident from what we’ve seen in these events, and we look forward to learning more about how open source will shape the future of healthcare for all of us.

  • Visit our pharma industry page to learn more about the exciting shift that is happening across the industry, including how our industry experts might be able to help you adopt open source.
  • Explore Posit Academy, our bespoke training solution to help professional teams learn R and Python.