Creating ADaM Subject-Level Analysis Datasets (ADSL) with the Pharmaverse, Part 1

Written by Isabella Velásquez

2023-03-30

This blog post is based on the Clinical Reporting in R workshop from R/Pharma 2022, led by Christina Fillmore (GSK), Ellis Hughes (GSK), and Thomas Neitmann (Roche). The author thanks the instructors for sharing their valuable resources. Would you like to work through this live? Here is a ready-to-go environment on Posit Cloud!

Watch the full recording on YouTube, check out the workshop materials, and learn more about the pharmaverse and R/Pharma.

Pharmaceutical organizations must adhere to a specific set of procedures regarding their clinical trial submissions before sharing data with regulatory agencies. One crucial step in this process is the creation of subject-level analysis datasets (ADSL) and their accompanying metadata, which must comply with the Analysis Data Model (ADaM) standards.

Creating ADaM datasets requires a prespecified process that involves importing, tidying, and transforming data. Establishing a proper structure enables others to generate tables, listings, and figures more efficiently and ensures traceability. When sponsors follow this process, regulatory agencies can review and approve a submission more quickly, which accelerates the release of safe and effective medicine to patients.

Creating common ADaM datasets follows a workflow that looks something like this:

Import data that would be helpful to add to your ADaMs
Pull in metadata
Combine predecessor variables
Run any calculations
Drop unused variables
Export the dataset

Example ADaM workflow, presented by Christina Fillmore

There are many different tools available that analysts can use for each step of the workflow. But often, analysts end up creating customized ways of doing things, which can be time-consuming and inefficient. Instead, it’s better to use standardized processes that can be reused across different projects. This not only saves time but also ensures consistency and accuracy in the work.

Noting this, representatives across Atorus, GSK, Janssen, and Roche started the pharmaverse, a curated stack of open-source R packages for clinical reporting. The pharmaverse is a collaboration between several pharmaceutical companies and individuals to reduce duplicated effort in clinical reporting and, ultimately, shorten the drug development process.

The pharmaverse provides analysts with a set of packages that support the processes of clinical reporting, including building ADaM datasets. Analysts don’t have to search for tools that serve their needs or create something from scratch.

With the pharmaverse, the workflow now looks like this:

Import data: use the haven package to import SAS files (.sas7bdat or, as in this post, .xpt) into R
Pull in metadata: use the metacore package to import and hold metadata, particularly for specifications
Combine predecessor variables: use the metatools package to build datasets from metacore objects
Run any calculations / Drop unused variables: combine the tidyverse, metatools, and admiral for any ADaM-building needs
Export the dataset: use xportr to export files that meet clinical standards

Example ADaM workflow with respective pharmaverse package, presented by Christina Fillmore

This post provides a brief overview of the first three steps in the workflow, covered in the first part of the Clinical Reporting in R workshop. For a more in-depth understanding, we suggest referring to the two-part recording of the workshop. Stay tuned for part 2 of the series: Derive ADaM variables and parameters with admiral!

Setup project

To begin, load the necessary packages and functions. The workshop’s datasets consist of fake clinical data that complies with SDTM standards. We will import them from the workshop GitHub repository later on, but for now, we save all the relevant URLs in R objects.

library(metacore)  # CRAN v0.1.1
library(tidyverse) # CRAN v1.3.2
library(admiral)   # CRAN v0.9.1
library(haven)     # CRAN v2.5.1
library(metatools) # CRAN v0.1.3
library(xportr)    # CRAN v0.1.0

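# Download a file from `url` and save it in the working directory as "<file>.<ext>".
# mode = "wb" writes in binary mode so .xlsx and .xpt files are not corrupted on Windows.
file_load <- function(url, file, ext) {
    download.file(url = url, destfile = paste0(file, ".", ext), mode = "wb", quiet = TRUE)
}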

# Location of files for this walkthrough
specs_url <- "https://github.com/pharmaverse/r-pharma2022/blob/main/specs/specs.xlsx?raw=true"
dm_url <- "https://github.com/pharmaverse/r-pharma2022/blob/main/datasets/SDTM/dm.xpt?raw=true"
vs_url <- "https://github.com/pharmaverse/r-pharma2022/blob/main/datasets/SDTM/vs.xpt?raw=true"
ex_url <- "https://github.com/pharmaverse/r-pharma2022/raw/main/datasets/SDTM/ex.xpt"
sv_url <- "https://github.com/pharmaverse/r-pharma2022/blob/main/datasets/SDTM/sv.xpt?raw=true"
ae_url <- "https://github.com/pharmaverse/r-pharma2022/blob/main/datasets/SDTM/ae.xpt?raw=true"
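Because file_load() takes the file name and extension as separate arguments, it can be handy to derive both from the URL itself. The following is a minimal base R sketch, not part of the workshop code: url_parts() is a hypothetical helper, and it only handles URLs shaped like the ones above.

```r
# Hypothetical helper (not from the workshop): split a URL into the file stem
# and extension that file_load(url, file, ext) expects.
url_parts <- function(url) {
  name <- basename(sub("\\?.*$", "", url))  # drop any "?raw=true" query string
  list(file = tools::file_path_sans_ext(name), ext = tools::file_ext(name))
}

dm_url <- "https://github.com/pharmaverse/r-pharma2022/blob/main/datasets/SDTM/dm.xpt?raw=true"
ex_url <- "https://github.com/pharmaverse/r-pharma2022/raw/main/datasets/SDTM/ex.xpt"

# Both URL styles used in this post resolve to a stem and an extension
parts <- lapply(c(dm = dm_url, ex = ex_url), url_parts)
str(parts)
```

Each element of parts could then be passed along to file_load(), for example file_load(dm_url, parts$dm$file, parts$dm$ext).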

Hold metadata for SDTM and ADaM datasets using metacore

Companies hold their predefined ADaM metadata in idiosyncratic ways, so standardizing this metadata became necessary to automate parts of ADaM creation. The metacore package solves this problem by storing metadata in metacore objects, an organizational structure that standardizes specifications across organizations. It has been available on CRAN for over a year, and its developers continue to update it to comply with the latest CDISC standards.

Loading metadata into the metacore object requires readers. The metacore package comes with built-in readers for common metadata formats like Pinnacle 21 (P21). Here, we can import a P21 spec into R using spec_to_metacore():

file_load(specs_url, "specs", "xlsx") # Saves specs.xlsx to the working directory
metacore <- spec_to_metacore("specs.xlsx", where_sep_sheet = FALSE)

Warning: `core` from the `ds_vars` table only contains missing values.

Warning: `supp_flag` from the `ds_vars` table only contains missing values.

Warning: `dataset` from the `supp` table only contains missing values.

Warning: `variable` from the `supp` table only contains missing values.

Warning: `idvar` from the `supp` table only contains missing values.

Warning: `qeval` from the `supp` table only contains missing values.

Warning: The following derivations are never used:
ℹ ADSL.ACOUNTRY

✔ Metadata successfully imported

ℹ To use the Metacore object with metatools package, first subset a dataset
  using `metacore::select_dataset()`

Immediately, we see that several columns contain only missing values: core, supp_flag, etc. Since we’re not creating a supplemental dataset yet, we can move on. We can suppress these warnings by adding quiet = TRUE to the spec_to_metacore() call.

Automate dataset creation based on metacore

Now we can read in the SDTM demographic (dm) data using read_xpt() from the haven package:

file_load(dm_url, "dm", "xpt") # Saves dm.xpt to the working directory
dm <- read_xpt("dm.xpt")       # Reads the downloaded file into a tibble

We select the dataset to build using select_dataset(). In this case, we want the ADaM Subject-level Analysis (ADSL) dataset:

adsl_spec <- metacore %>% 
  select_dataset("ADSL")

Warning: `core` from the `ds_vars` table only contains missing values.

Warning: `supp_flag` from the `ds_vars` table only contains missing values.

Warning: `format` from the `var_spec` table only contains missing values.

Warning: `dataset` from the `supp` table only contains missing values.

Warning: `variable` from the `supp` table only contains missing values.

Warning: `idvar` from the `supp` table only contains missing values.

Warning: `qeval` from the `supp` table only contains missing values.

✔ ADSL dataset successfully selected

adsl_spec

── Dataset specification object for ADSL (Subject-Level Anaylsis) ──────────────

The dataset contains 35 variables

Dataset key: STUDYID USUBJID

The structure of the specification object is:

→ codelist: character [17 x 4] code_id, name, type, codes

→ derivations: character [30 x 2] derivation_id derivation

→ ds_spec: character [1 x 3] dataset, structure, label

→ ds_vars: character [35 x 7] dataset, variable, order, keep, key_seq, core,
  supp_flag

→ supp: character [0 x 4] dataset, variable, idvar, qeval

→ value_spec: character [35 x 8] dataset, variable, origin, type, code_id,
  sig_dig, where, derivation_id

→ var_spec: character [35 x 6] variable, length, label, type, format, common

To inspect the specification object use `View()` in the console.

With metacore, we loaded our metadata and subset it to contain only the ADSL specification.
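Conceptually, selecting a dataset narrows the metadata tables to the rows that belong to that dataset. The base R sketch below illustrates the idea on a toy table shaped like ds_vars; the rows are made up for illustration, and this is not how metacore implements select_dataset().

```r
# Toy ds_vars-style table spanning two datasets (assumed values)
ds_vars <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADAE", "ADAE"),
  variable = c("USUBJID", "AGE", "USUBJID", "AETERM"),
  stringsAsFactors = FALSE
)

# Keep only the rows that describe ADSL
adsl_vars <- ds_vars[ds_vars$dataset == "ADSL", ]
adsl_vars$variable
```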

Now that we have the dataset to build, we can use the metatools package to automate the creation of variables from metacore objects. The next step is to combine predecessor variables. We can use metatools’ build_from_derived() to pull in a metacore object and the list of datasets from which to build (in this case, the dm dataset).

# Pull together all the predecessor variables 
adsl_pred <- build_from_derived(adsl_spec,
                                ds_list = list("dm" = dm),
                                keep = TRUE) %>% # Keep old name
  filter(ARMCD %in% c("A", "P")) # Filter out anything with ARM codes other than placebo or active

Warning: Setting 'keep' = TRUE has been superseded, and will be unavailable in future
releases. Please consider setting 'keep' equal to 'ALL' or 'PREREQUISITE'.

head(adsl_pred)

# A tibble: 6 × 13
  STUDYID   USUBJID    COUNTRY SITEID   AGE AGEU  SEX   ETHNIC RACE  ARM   ARMCD
  <chr>     <chr>      <chr>    <dbl> <dbl> <chr> <chr> <chr>  <chr> <chr> <chr>
1 GSK123456 GSK123456… CAN        101    62 YEARS F     NOT H… MULT… Plac… P    
2 GSK123456 GSK123456… CAN        101    60 YEARS M     NOT H… AMER… Plac… P    
3 GSK123456 GSK123456… CAN        101    38 YEARS F     NOT H… AMER… Plac… P    
4 GSK123456 GSK123456… CAN        101    47 YEARS F     NOT H… MULT… Plac… P    
5 GSK123456 GSK123456… CAN        101    57 YEARS F     NOT H… AMER… Plac… P    
6 GSK123456 GSK123456… CAN        101    62 YEARS M     NOT H… ASIAN Plac… P    
# ℹ 2 more variables: ACTARM <chr>, ACTARMCD <chr>

We can see we have 13 variables and 200 subjects. We can start building out some variables, but first, let’s see if any are missing.

check_variables(adsl_pred, adsl_spec)

Error:
! In: [check_variables(adsl_pred, adsl_spec)]

The following variables are missing: ACOUNTRY, AGEGR1, AGEGR1N, SEXN, ETHNICN,
  RACEN, TRT01P, TRT01PN, TRT01A, TRT01AN, HEIGHTBL, WEIGHTBL, BMIBL, SAFFL,
  MITTFL, RANDFL, TRTSDT, TRTEDT, LALVDOM, LALVSEQ, LALVVAR, LSTALVDT

We are missing quite a few variables! To start, let’s create the SEXN variable. The metacore object contains information from your spec, including control terms. We can obtain them using get_control_term().

get_control_term(adsl_spec, SEXN)

# A tibble: 2 × 2
  code  decode
  <chr> <chr> 
1 1     F     
2 2     M

Metatools makes it easy to code our variables without tedious if-else statements. The create_var_from_codelist() function helps us pull in the reference variables we need to get our desired output.

adsl_pred %>%  
  create_var_from_codelist(adsl_spec, SEX, SEXN) %>% 
  select(USUBJID, SEX, SEXN)

# A tibble: 200 × 3
   USUBJID        SEX    SEXN
   <chr>          <chr> <dbl>
 1 GSK123456-1101 F         1
 2 GSK123456-1104 M         2
 3 GSK123456-1105 F         1
 4 GSK123456-1106 F         1
 5 GSK123456-1107 F         1
 6 GSK123456-1108 M         2
 7 GSK123456-1109 M         2
 8 GSK123456-1111 F         1
 9 GSK123456-1112 F         1
10 GSK123456-1115 M         2
# ℹ 190 more rows

We can see that F values became 1 and M values became 2 under SEXN. We can do the same for other variables like RACEN, ETHNICN, etc., in a single pipe.

adsl_decode <- adsl_pred %>%  
  create_var_from_codelist(adsl_spec, SEX, SEXN) %>% 
  create_var_from_codelist(adsl_spec, ETHNIC, ETHNICN) %>% 
  create_var_from_codelist(adsl_spec, RACE, RACEN) %>% 
  create_var_from_codelist(adsl_spec, COUNTRY, ACOUNTRY) %>%
  create_var_from_codelist(adsl_spec, ARMCD, TRT01PN) %>% 
  create_var_from_codelist(adsl_spec, ACTARMCD, TRT01AN) %>%
  create_var_from_codelist(adsl_spec, ARMCD, TRT01P) %>% 
  create_var_from_codelist(adsl_spec, ACTARMCD, TRT01A)
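Each of these calls can be pictured as a lookup against the codelist: find the input value in one column and return the value from the other. Here is a minimal base R sketch of that idea using the SEXN codelist shown earlier. The toy data are assumptions, and this is an illustration rather than the metatools implementation.

```r
# Codelist for SEXN as returned by get_control_term() in this post
sexn_codes <- data.frame(code = c("1", "2"), decode = c("F", "M"),
                         stringsAsFactors = FALSE)

# Toy input data (assumed values)
toy <- data.frame(USUBJID = c("01", "02", "03"), SEX = c("F", "M", "F"),
                  stringsAsFactors = FALSE)

# Find each SEX value in the decode column and take the matching code;
# values that are not in the codelist would become NA
toy$SEXN <- as.numeric(sexn_codes$code[match(toy$SEX, sexn_codes$decode)])
toy
```

Keeping the mapping in a table rather than in if-else branches means a change to the specification only changes the data, not the code.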

We can check adsl_decode to see if there are any other missing variables.

check_variables(adsl_decode, adsl_spec)

Error:
! In: [check_variables(adsl_decode, adsl_spec)]

The following variables are missing: AGEGR1, AGEGR1N, HEIGHTBL, WEIGHTBL,
  BMIBL, SAFFL, MITTFL, RANDFL, TRTSDT, TRTEDT, LALVDOM, LALVSEQ, LALVVAR,
  LSTALVDT
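Each check above boils down to comparing the variables named in the specification with the columns present in the data. A minimal base R sketch of that comparison follows; missing_vars() is a hypothetical helper and the toy inputs are assumptions, so treat it as an illustration rather than what check_variables() does internally.

```r
# Toy stand-ins for the specification's variable list and the dataset (assumed values)
spec_vars <- c("STUDYID", "USUBJID", "AGE", "SEX", "SEXN", "AGEGR1")
adsl_toy <- data.frame(
  STUDYID = c("S1", "S1"),
  USUBJID = c("S1-01", "S1-02"),
  AGE     = c(62, 70),
  SEX     = c("F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: variables required by the spec but absent from the data
missing_vars <- function(data, vars) setdiff(vars, names(data))

missing_vars(adsl_toy, spec_vars)
```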

We can also use metatools to categorize a variable, such as grouping age into age groups, with create_cat_var(). As before, we provide the metacore object and a reference variable (AGE) and name our desired output variable (AGEGR1). We can also name a variable to hold the numeric code for each group (AGEGR1N).

get_control_term(adsl_spec, AGEGR1) # See age group categories

# A tibble: 3 × 2
  code  decode
  <chr> <chr> 
1 <65   <65   
2 65-80 65-80 
3 >80   >80

adsl_decode %>% 
  create_cat_var(adsl_spec, AGE, AGEGR1, AGEGR1N) %>% 
  select(USUBJID, AGE, AGEGR1, AGEGR1N) %>% 
  head()

# A tibble: 6 × 4
  USUBJID          AGE AGEGR1 AGEGR1N
  <chr>          <dbl> <chr>    <dbl>
1 GSK123456-1101    62 <65          1
2 GSK123456-1104    60 <65          1
3 GSK123456-1105    38 <65          1
4 GSK123456-1106    47 <65          1
5 GSK123456-1107    57 <65          1
6 GSK123456-1108    62 <65          1
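To make the grouping rule concrete, here is a base R sketch that assigns the same three age groups by hand. age_group() is a hypothetical helper, and the numeric codes 2 and 3 are assumptions, since the output above only shows code 1 for the "<65" group. create_cat_var() reads the groups from the specification instead of hard-coding them.

```r
# Hypothetical helper: assign the age groups from the codelist (<65, 65-80, >80).
# Numeric codes 1, 2, 3 are assumed; the post only confirms 1 for "<65".
age_group <- function(age) {
  labels <- c("<65", "65-80", ">80")
  idx <- ifelse(age < 65, 1L, ifelse(age <= 80, 2L, 3L))  # missing ages stay NA
  data.frame(AGE = age, AGEGR1 = labels[idx], AGEGR1N = idx,
             stringsAsFactors = FALSE)
}

# Boundary values 65 and 80 both fall in the "65-80" group
res <- age_group(c(62, 65, 80, 81, NA))
res
```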

We can double-check our newly created dataset’s variables.

adsl <- adsl_decode %>% 
  create_cat_var(adsl_spec, AGE, AGEGR1, AGEGR1N)

check_variables(adsl, adsl_spec)

Error:
! In: [check_variables(adsl, adsl_spec)]

The following variables are missing: HEIGHTBL, WEIGHTBL, BMIBL, SAFFL, MITTFL,
  RANDFL, TRTSDT, TRTEDT, LALVDOM, LALVSEQ, LALVVAR, LSTALVDT

Calculations are needed to fill in the missing variables. In the upcoming post of this series, we will proceed to the next step in our workflow, which involves running the necessary calculations with admiral.

Learn more

We hope you enjoyed the first post on creating ADaM ADSL datasets with the pharmaverse. We showed only some of the available packages and functions; check out the breadth of the pharmaverse on the website and peruse the provided examples.

Again, we thank the instructors of Clinical Reporting in R for their materials. Please watch the Day 1 and Day 2 recordings for more detailed information and for walkthroughs on other parts of the clinical reporting workflow.

At Posit, we have a dedicated Pharma team to help organizations migrate and utilize open source for drug development. To learn more about our support for life sciences, please see our dedicated Pharma page, where you can book a call with our team.

Isabella Velásquez

Sr. Product Marketing Manager at Posit, PBC

Isabella is a content strategist, data enthusiast, and author. Her goal is to drive engagement around all the awesome things happening at Posit.
