GSK x Posit Live Event November 2024 Q&A
Do you want to know how a global pharmaceutical giant like GSK is scaling the “Open-Source Mountain”? In a recent webinar, GSK data science leaders shared their experiences in transitioning from pilot projects to enterprise-wide adoption of open-source tools, particularly R. Andy Nicholls (Head of Data Science Innovation and Engineering), Ben Arancibia (Director of Data Science), and Becca Krouse (Data Scientist) discussed overcoming organizational barriers, upskilling teams, and driving cultural change.
The session wrapped up with a lively Q&A, but some questions went unanswered. The team tackles the most pressing ones below.
Moving to open-source tools
In what ways has moving to open source helped GSK to innovate?
Open sourcing allows us to get a diversified set of perspectives on user design and how to solve problems. This is crucial for getting the best technical solutions for problems. Seeing others’ work and combining it with our skills/capabilities allows us to innovate and blend approaches to solve problems.
Many public health graduate schools teach R nowadays. Did it help?
Yes, changes in school curricula and in the workforce were among the drivers for our increased use of R in our pipeline.
Why do study team members need to use R itself instead of using R Shiny to enable R use without knowing coding? Why not implement Shiny dashboards? Why do teams need to learn code?
Shiny is a package that makes it easy to build interactive web apps. Study teams use code for a variety of tasks, including data transformations, derivations, and statistical analyses. It is not feasible to have study teams use Shiny for all of these tasks, especially highly custom or statistically complex ones.
With Python, is reproducibility problematic?
No, reproducibility is not problematic.
How do you do code review internally? Which standard would you adhere to? (For example, Google code review practices.)
We use our own code review practices and utilize tooling in GitHub, such as pull requests. Practices depend on the team and individuals doing the review. For example, a code review might be different for a group of clinical programmers compared to a group of statisticians. Additionally, a coaching or mentoring relationship might yield different practices than a peer-to-peer relationship.
Quarto supports the integration of several programming languages, such as R, Python, Jupyter, etc. Do you support Quarto’s goal at GSK with this in mind?
We use Quarto for different activities in GSK.
Can you post the link to the git repo for Becca’s comparison of R, SAS, and Python?
CAMIS is here: https://psiaims.github.io/CAMIS/
Platform, packages, and environments
Many companies, including GSK, are moving to cloud-based analysis and reporting. Is your R approach working there?
Yes. Our approach has been to just lift and shift from on-prem to the cloud. There have been some lessons learned and obstacles, but generally, it is working.
Can you comment on your SCE and how R is integrated into it? What SCE are you using?
We have Posit Workbench in one system and Domino Data Platform in another system.
How do you typically validate a package, let’s say, dplyr?
GSK helped write the white paper on the R Validation Hub: https://www.pharmar.org/white-paper/. Our views on package validation and system validation are aligned with the white paper.
How do you guys deal with dependency problems, i.e., using other Bioconductor packages that depend on the same packages but have different versions?
We solved the dependency problem by using CRAN snapshot dates instead of specific package versions. This ensures all packages are compatible, helping prevent unexpected dependency issues. For Bioconductor specifically, users must manage their own environment and package versions. We currently do not have Bioconductor packages in our Frozen R Environment, but we might consider adding them in the future.
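The idea of pinning to a snapshot date rather than to individual versions can be illustrated with a minimal sketch. This is not GSK's tooling: the package names, release dates, and the `resolve_snapshot` helper below are all hypothetical, and the example is written in Python purely for illustration. It shows how a single date selects one consistent version of every package.

```python
from datetime import date

# Hypothetical release histories: package -> list of (release_date, version).
# Names, dates, and versions are invented for illustration only.
RELEASES = {
    "pkgA": [
        (date(2023, 1, 10), "1.0.0"),
        (date(2023, 9, 5), "1.1.0"),
        (date(2024, 6, 1), "2.0.0"),
    ],
    "pkgB": [
        (date(2023, 3, 2), "0.4.0"),
        (date(2024, 2, 20), "0.5.0"),
    ],
}


def resolve_snapshot(releases, snapshot):
    """Return the newest version of each package released on or before `snapshot`."""
    resolved = {}
    for name, history in releases.items():
        # Keep only releases that already existed on the snapshot date.
        eligible = [(released, version) for released, version in history if released <= snapshot]
        if eligible:
            # Tuples compare by date first, so max() picks the latest eligible release.
            resolved[name] = max(eligible)[1]
    return resolved


env = resolve_snapshot(RELEASES, date(2024, 3, 1))
print(env)
```

Because every package is resolved against the same date, the resulting set reflects versions that coexisted in the repository on that day, which is the property the answer above relies on.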
R constantly has updates. How does GSK work with the updates and with code from older versions of R?
For clinical studies, we create Frozen R Environments based on CRAN snapshot dates. These environments contain a fixed set of packages that do not change and cannot be modified by users. Frozen R Environments contain all the necessary tools for studies. These environments are updated at a slower pace. For more exploratory projects, we believe users should update as often as they want, and we encourage frequent updates.
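One way to picture a fixed package set is as a read-only manifest that an installed environment can be checked against. The sketch below is an assumption for illustration, not a description of how GSK implements its environments: the `FROZEN` manifest, its contents, and the `find_drift` helper are hypothetical, and Python is used only as a neutral language for the example.

```python
from types import MappingProxyType

# Hypothetical frozen manifest: package -> pinned version.
# MappingProxyType gives a read-only view, so the manifest cannot be changed in place.
FROZEN = MappingProxyType({"pkgA": "1.1.0", "pkgB": "0.5.0"})


def find_drift(frozen, installed):
    """Return {package: (expected, found)} for every difference from the manifest.

    `found` is None when a pinned package is missing, and `expected` is None
    when a package is installed but not part of the manifest.
    """
    drift = {}
    for name, expected in frozen.items():
        found = installed.get(name)
        if found != expected:
            drift[name] = (expected, found)
    # Packages installed on top of the frozen set also count as drift.
    for name in sorted(set(installed) - set(frozen)):
        drift[name] = (None, installed[name])
    return drift


# Example: one package upgraded, one missing, one extra (all hypothetical).
report = find_drift(FROZEN, {"pkgA": "2.0.0", "pkgC": "0.1.0"})
for name, (expected, found) in report.items():
    print(f"{name}: expected {expected}, found {found}")
```

An empty report would indicate that the installed packages match the manifest exactly.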
Can you please discuss the platform and package validation and whether the FDA’s CSA guidance has been helpful?
We helped write the R Validation Hub White Paper: https://www.pharmar.org/white-paper/. Our views can be found there. We follow the framework described there.
More on the team and AccelerateR
Is the R Engineering team part of IT or the Biometrics/stats team? What is the validation strategy you use for study delivery?
It sits within the Statistics and Data Science Innovation Hub, which sits in Biostatistics. We helped write the R Validation Hub White Paper: https://www.pharmar.org/white-paper/. Our views can be found there. We follow the framework described there.
How many people were on the AccelerateR team, and how many studies would you typically be involved in at a time?
The AccelerateR team is quite small, typically around 4-6 people. Around three are permanent members of the AccelerateR team, and the other 1-3 individuals rotate into the team through a secondment-type model. Once the team finishes working on a study, we ask that the study provide individuals to help support the next study. It is not a full-time role and is advertised as a personal development opportunity.
The AccelerateR team focuses on supporting and mentoring one study at a time. The team also continues to answer one-off questions from other study teams. The AccelerateR talk at posit::conf(2023) is available here: https://www.youtube.com/watch?v=VDu2qdpYko8
Change management
Internal researchers often perform the same tasks across studies, so how do you motivate them to develop in-house, home-grown solutions?
We strive to have internal tools written in R that perform the same tasks across studies. This allows study teams to focus on nonstandard code. Central standard reporting tools teams develop the internal code that handles these common tasks.
When delivering R training, how did you support colleagues emotionally through periods of doubt or frustration, especially when the learning curve was steep?
We support individuals by meeting them where they are in the learning curve. We engage with colleagues at all levels and our goal is to help them through whatever phase of the curve they are in. This usually is a combination of being an active listener to their issues, answering any questions they have related to R, or connecting them with a mentor or other expert to help guide them. The support is highly individualized.
Have any colleagues opposed open source projects due to misguided security concerns or not seeing the value added? If so, how did you resolve it?
We encountered this at the beginning of our journey, but it has been resolved by talking with those who have concerns, listening to them, and discussing how to mitigate them. Improved cybersecurity awareness among the general public has also helped to resolve issues.
How was the adoption of the whole model experience from the people’s perspective? Was there resistance from teams/persons involved? How did you deal with it?
We allow people to use whatever tools they feel help them deliver the fastest. During our early pilot projects, we focused on motivated users. Proving that R can be used successfully end-to-end helps win over individuals who are resistant. There will always be individuals who prefer to use the tools they are most comfortable with, and we believe that is fine.
Have you or any of the programmers felt that SAS would do this job better than R?
Definitely, but usually either for niche statistical methods or because the individual has far more experience in SAS than in R. That is totally fine.
What does the future look like for using R @ GSK? Is the choice for R v SAS based on the project (which statistical methods are required), the team members, or others?
As mentioned in the webinar, GSK’s commitment is that 50% of code is written using open-source technologies and that all new internal tools must be built on open-source languages. Users can still decide which tools to use for their day-to-day tasks. We do not believe in limiting the use of tools.
Once teams participate in a pilot program, do you find they continue to use R and the GSK R tools in their future programs?
Yes, they tend to be champions in our organization and push the adoption of R. Individuals who participate in pilot programs are more likely to be highly motivated or interested in using R.
Would you recommend investing the time/risk in the transition from SAS to R for small biotechs aiming for FDA submission?
I think that depends on the biotech. Each organization needs to make its own determination based on its capabilities.
Leadership buy-in
You mentioned that stats leadership initiated this adoption. How did you get buy-in from programming to switch or expand their processes to accommodate R?
This initiative was started as a partnership between statistics and programming. The group leading the initial push and pilots started in statistics, but it was not initiated without buy-in from both organizations.
How did you get leadership in your organization to approve open-sourcing your tools? Some leaders are resistant and view these tools as a competitive advantage.
It was a conversation about culture and explaining the role of the open-source ecosystem. It is not a quick conversation and requires leaders to understand a change in perspective. It is also important to ground the conversation in terms that leaders will better understand. For example, statisticians have an interest in publishing papers publicly, which often includes code; it’s the same concept for open-sourcing tools. Finding that common language and ground is how to get leadership buy-in.
How did you get the buy-in (and resources) on, say, AccelerateR?
For AccelerateR specifically, it was based on our GSK Biostatistics goals. GSK wanted to embed R in study teams, and AccelerateR was the mechanism to do that. Resources are allocated based on the organization's objectives.
Taking the first step
Any advice for adopting R in smaller teams? Like an NGO or a startup?
Just start! Find small problems and begin to prove that R can work as a tool for solving the problems you want to solve.
I’m a natural R enthusiast, but I’ve never been asked to use it at work. How can I start working with it? Are there volunteer opportunities, etc.?
I would just start using R at work. If you can use it to create your outputs or deliverables for work, just do it!
Transition your organization to open source
Many thanks to Andy, Ben, and Becca for sharing how they are scaling open source at GSK.
- If you missed the event or would like to rewatch it, please find the recording on YouTube.
- If you are interested in the use of open source in clinical trials, schedule a call to speak with our pharma experts.