
19 Jan 2023

Creating a validated environment for reproducibility

Satish J Murthy

Senior Manager, Pharma R&D IT at Janssen
We were joined by Satish J. Murthy, who builds solutions and platforms to deliver quality health care to patients.

Episode notes

We were recently joined by Satish J. Murthy, Senior Manager, Pharma R&D IT at Janssen.


During the hangout we dove into the topic of validation and what it actually means to have a validated environment.


Snippets from our conversation with Satish:


In regulated industries like pharma and finance, any time a health reporting agency comes in and asks for proof, we have to be comfortable saying that what we ran a couple of days ago, weeks ago, months ago, or years ago can be reproduced – we need to be in a position to replicate that entire environment as is.


So this goes through a formal validation/verification process. And this, in traditional parlance, is known as a GxP platform.


[GxP is a general abbreviation for the “good practice” quality guidelines and regulations.]

The x stands for clinical practice (GCP), manufacturing practice (GMP), lab practice (GLP), etc.


So the verified, validated environment is often referred to as a GxP environment. It is there to build confidence for the end users that we serve in pharma – folks who want to help deliver patient care, which is crucial.


One of the key components of this is containerization. Because the ask in validation is to ensure repeatability and reproducibility, the only logical choice at this time is containers. We have to containerize.
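The core idea – freezing the entire environment into an image so it can be rebuilt as-is years later – can be sketched as a minimal Dockerfile. The base image, R version, system libraries, and label below are illustrative assumptions, not Janssen's actual configuration:

```dockerfile
# Minimal sketch of a reproducible analysis image (illustrative versions).
# Pinning an exact, versioned base image is what makes rebuilds repeatable.
FROM rocker/r-ver:4.2.2

# Pin OS-level dependencies in the same layer so the image is self-describing.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libcurl4-openssl-dev \
        libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Record the build date as image metadata for later audits.
LABEL org.opencontainers.image.created="2023-01-19"
```

Because every input is pinned rather than resolved at build time, the same Dockerfile produces the same environment whenever an auditor asks for it.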


I have a responsibility for the R part of it so I’m going to concentrate a little bit more on that.


First: Users would basically want to run through their studies using R. We are using Posit Workbench as our launch pad, deployed on a traditional EC2 instance. This is a production platform, but there is a logical grouping where our users would first want to come and test in a development area, and then move into the traditional QA and prod. The way they first test in the development area is, number one, the entire Posit Workbench is containerized. We always try to keep up with the latest release of Posit.


There are some unique challenges around it – mostly about the IQ/OQ [installation and operational qualification] and the time that it takes. It's not just Posit; there are other vendors [like Atorus] who are helping us with this effort. We have to go through a formal process of vetting, validating, and verifying. So all of this takes time. That's one challenge that we have.


Second: Because we have containerized, it gives our users the capability to test against something that is locked down and to ensure repeatability – containers are the only way that will help them with that.


So first, we will go through defining the process and specifying the requirements as to what really needs to go into the container.


For example, say our users have identified a set of workflows that they want to support the clinical studies they are looking into. Traditionally, that maps to a set of R packages – the packages that need to make it into this container.


This is where we are using standard technologies and tools that are used in the industry. We have a Jenkins pipeline and we are using the Posit Package Manager to lock down versions of our packages that we want to go into the container.
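One way to express that lockdown in the container build is to install packages from a dated Posit Package Manager snapshot, so every rebuild resolves the identical package versions. The snapshot date, package names, and base image here are hypothetical, for illustration only:

```dockerfile
# Sketch of a package layer frozen to a dated CRAN snapshot (hypothetical
# date and packages) served by Posit Package Manager.
FROM rocker/r-ver:4.2.2

# Installing from a frozen snapshot means rebuilding this image months or
# years later still yields exactly the same package versions.
RUN Rscript -e "install.packages( \
      c('dplyr', 'survival'), \
      repos = 'https://packagemanager.posit.co/cran/2023-01-19')"
```

A CI pipeline such as Jenkins can then rebuild and verify this image on every change, with the snapshot URL serving as the single source of truth for which versions are in the validated environment.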
