Training & empowering more than 1,100 R users at AstraZeneca

As you might have seen in a blog post on the Posit website, R is playing a larger and larger role at AstraZeneca. Some AstraZeneca collaborators even call this a paradigm shift: up until a few years ago, the vast majority of our new hires were skilled in SAS while R was uncharted territory. But now, most of our new hires are skilled in both SAS and R. And this is something we really value as we leverage the complementarity of the two tools in everything that we do.
There is strong interest at AstraZeneca in developing the use of R across functions. To do so, a few working groups were put together and have been actively developing and promoting the use of R in the organization.

Learning & development

Led by Gabriella Rustici

As part of the Data Science Academy, our suite of instructor-led and online learning built for us, by us, we have delivered more than 700 places on instructor-led R training courses since 2020, and 250 more through online learning. The image below shows the geographical distribution of learners registered on R training courses (instructor-led only).
Courses cover basic programming, statistical analysis, machine learning and the analysis of particular data types (primarily transcriptomics and proteomics), using R. In collaboration with AZ Biometrics, we have also developed a bespoke R training program aimed at SAS programmers wishing to learn R, which combined taught modules and project-based work.
All R courses are in high demand; six months after attending a course, participants report improved coding skills (59%) and improved ability to handle data (50%), demonstrating how training is having an impact on participants’ ways of working.
We often partner with external training providers to meet the needs of our audience. We were among the first users of the Posit Academy and are currently planning further collaborations with Posit in the learning space.


R in Biometrics

Led by Francis Kendall

The Biometrics teams across AstraZeneca are preparing to add R to their software analysis toolbox. The preparation is being driven by an internal initiative team whose overarching objective is to operationalize R within the group, to add value to the Analysis and Reporting teams. The initiative comprises three areas: first, ensuring that all staff know how, where and when to access the various analysis platforms that provide R; second, identifying and testing innovative uses of R within the Biometrics workflow that will add value; and finally, looking at how to capitalize on R's graphical capabilities. In addition to the above, we are also actively involved in external R initiatives to ensure we are aligned with how R is being used in Biometrics groups across the industry.

R Governance

Led by Per Arne Stahl

The strength of R, its community, is also its weakness in the regulatory environment, where documented controls around the R application and data lifecycle are necessary for traceability and reproducibility. Enterprises using R are responsible for providing evidence of how they control R installations when inspected. R Governance aims to enable the analytical power of R for GCP use through a standardized R configuration with validated and version-controlled components. These artifacts will be deployed across multiple R platforms, ensuring consistent, repeatable, and compliant analyses. R users can move seamlessly between platforms knowing what R functionality is available and how it is controlled. There is now a central group leading the package management and validation activities to facilitate the vision of R embedded as a standard tool within Biometrics and beyond.

R in the pharmaceutical industry: a case study

Written by Abhijit Das Gupta

At AstraZeneca, the objective of the Early Oncology department is to help identify molecular, genomic and proteomic targets that can then be translated into actual drugs that act on them. This process involves bioinformatics analyses of RNA, DNA and proteins extracted from cells and tissues, either in bulk or at the single-cell level. The data is usually delivered as spreadsheets or as files in specialized formats produced by different instruments.

The Bioconductor project, which houses thousands of R packages for the ingestion, analysis, visualization and annotation of data derived from RNA, DNA, proteins and other molecular entities, is the workhorse for our bioinformatic analyses. There are specialized packages like Biostrings, rtracklayer, SummarizedExperiment and Seurat, among many, many others, that allow data from different experimental setups and instruments to be read into R. Various analytic packages (Seurat, limma, DESeq2 and others) help with different kinds of analyses, from differential expression and quality control to cell classification, biomarker discovery, and profiling signatures based on differential outcomes. Bioinformaticians at AZ use these tools on a daily basis to understand the effects of different genes and proteins on different outcomes, or even death, and to understand how different genes and proteins can interact with different treatments and drugs to affect outcomes. This vast R toolbox allows faster discovery of molecular targets and, eventually, the creation of drugs that help patients improve their lives.

R @ AZ: the community of R users at AstraZeneca

Led by Guillaume Desachy

Across the world, one of the strengths of R lies in its community. With this in mind, we kicked off the development of a community of R users at AstraZeneca in April 2021.
This led to a few initiatives. To meet the needs of the community, these are quite diverse, ranging from an internal TidyTuesday all the way to Lunch & LeaRns, blog posts and help desks.
We can’t rest on our laurels and will always have to come up with new ideas, including the recently launched AZ R-Ladies! What fascinates us is the steady growth of the community since its launch.
But it is really not about the numbers: what makes us so happy is to see how vibrant this community is and all the cross-department connections that were made possible thanks to R @ AZ!
We really can’t wait to see what will unfold in the months and years to come for the community of R users at AstraZeneca.

Because we work in a constantly evolving environment driven by Research & Development, these working groups are not set in stone and will evolve in the future. This will ensure that we meet our needs across the organization, be they in terms of learning & development, using R across the organization in a compliant way, or fostering the community.

If you are interested in applying your R skills to life sciences and the pharmaceutical industry, make sure to check out our current open positions.