How TruDiagnostic Trained AI and Statistical Models 10x Faster, Saving a Year of Development Time While Cutting Infrastructure Costs by 60%

About:

TruDiagnostic is the leading epigenetic testing company focused on biological aging, nutritional deficiencies, and cellular systems health—all based on DNA methylation analysis. They provide testing to clinical practitioners and their patients, researchers, and consumers, with the goal of helping people live longer, healthier lives.

Industry:

Biotechnology

Technology used:

Posit Workbench, Amazon SageMaker, Amazon EFS, Amazon S3, Amazon EC2

Data cloud partner integration:

Amazon SageMaker

Summary

TruDiagnostic faced an infrastructure challenge, managing the world's largest epigenetic dataset with fragmented tools. By adopting Posit Workbench with Amazon SageMaker, they empowered their scientists to control their computational resources with greater efficiency, unifying their R and Python environments. This transformation saved them a full year of product development time, cut their cloud infrastructure costs by 60%, and increased both their AI and statistical model training performance tenfold.

"The best innovation happens when scientists can follow their curiosity without infrastructure constraints. Posit Workbench gave us the ability to have RStudio on a highly scalable service of AWS SageMaker without the hassle of architecture management."

Dr. Varun Dwaraka, PhD
Director of Research, TruDiagnostic

TruDiagnostic has become known for bringing the most precise and comprehensive epigenetic insights to market, setting the gold standard in longevity research. As a leader in epigenetics-based diagnostic solutions, they process the world’s largest DNA methylation dataset—with more than 80,000 samples—to offer advanced testing that helps people understand and optimize their health at the cellular level. TruDiagnostic develops AI models to analyze and interpret epigenetic data, providing researchers, clinicians, and consumers with groundbreaking insights into health and longevity.

Epigenetics involves changes in gene activity that don’t alter the DNA sequence but still affect gene expression. Lifestyle, environment, and age directly influence these changes, showing how external factors impact our genetic code and overall health.

The TruAge Test

TruDiagnostic’s flagship product determines:

  • An individual’s biological age (OmicmAge clock)
  • Annual aging rate (Dunedin PACE clock)
  • The biological age of 11 vital organ systems (brain, heart, liver, and more) (SymphonyAge clock)
  • 75+ epigenetic biomarker proxy estimations of protein, metabolites, and clinical measures directly related to human health

The Challenge: Scaling Infrastructure to Match Scientific Ambition

“We were literally training five thousand different models, serially, the first time. And that took months.”

Dr. Varun Dwaraka, PhD, Director of Research at TruDiagnostic, recalls how hard it was to process large datasets as quickly as the team wanted. What started as a small team using R had grown into a full-fledged group processing over 80,000 biological samples, the world’s largest epigenetic dataset.

The speed and efficiency challenges compounded as they grew. Each new team member brought their preferred tools: some used SageMaker, others Docker, while the R devotees stuck to RStudio. “Sayf was in Jupyter, but Natalia, Laura, Kirsten, and I were all in RStudio,” Dr. Dwaraka explains. This fragmentation hampered their ability to quickly deliver.  

With celebrity endorsements from the Kardashians and features in Netflix and HBO documentaries driving demand, TruDiagnostic faced a key choice: find a way to scale their infrastructure, or accept slower turnaround times.

Their infrastructure challenges created two critical bottlenecks:

  • Language Silos: To build their next generation of advanced machine learning products, TruDiagnostic needed to leverage Python for MLOps with tools like PyTorch. Meanwhile, established bioinformatics workflows published and validated in the scientific community relied on R. They lacked a unified environment that could seamlessly support both languages.
  • Engineering Bottleneck: In the early days, scientists were constrained by local machines and a single EC2 instance. Anytime they needed more compute—whether starting a larger instance or scaling down—they had to wait on an engineer. This dependency slowed research and made turnaround times lag. By moving to SageMaker, the team gained the ability to independently manage and scale their workloads in the cloud, improving convenience, security, and overall performance.

The Solution: Unified Platform

The Transformation by the Numbers

  • 1 full year of product development time saved
  • 1,600 model iterations: From months to 1 hour
  • 60% reduction in infrastructure costs
  • 10x performance improvement
  • 30+ peer-reviewed papers published

The answer came from an unexpected source: combining Posit Workbench with SageMaker, which let scientists scale compute on demand without leaving their familiar tools.

“SageMaker saved us almost a year of product development,” Dr. Dwaraka says, still amazed by the transformation. The same 5,000 models that once took months to train? They could now process them in parallel, cutting time by over 90%. “By incorporating RStudio on SageMaker and Positron (part of Posit Workbench) in this environment, these tools enable the parallelization that SageMaker affords, while providing our team with their familiar coding environment.”
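To make the serial-versus-parallel distinction concrete, here is a minimal Python sketch. It is not TruDiagnostic's pipeline: the toy data, the names `fit_line`, `train_one`, `train_serial`, and `train_parallel`, and the simple least-squares "model" are all assumptions chosen for illustration. The point it shows is that when each model is independent of the others, the same training function can be mapped over a worker pool instead of a loop.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_one(task):
    """Train a single independent model; task is (name, xs, ys)."""
    name, xs, ys = task
    return name, fit_line(xs, ys)


def train_serial(tasks):
    """Baseline: fit one model after another."""
    return dict(train_one(task) for task in tasks)


def train_parallel(tasks, max_workers=8):
    """Same work, fanned out across a worker pool.

    A thread pool stands in here for real parallel compute. CPU-bound
    training would normally run in separate processes or separate cloud jobs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(train_one, tasks))


# Hypothetical toy data: five targets, each exactly y = k * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
tasks = [(f"proxy_{k}", xs, [k * x + 1.0 for x in xs]) for k in range(1, 6)]
models = train_parallel(tasks)
```

Because no task depends on another, the serial and parallel versions return the same fitted parameters. Only the wall-clock time changes once the pool is backed by real parallel hardware.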

The solution provided a way forward with better security, compliance, performance, and scalability.  

Creating a Unified Environment

Posit Workbench solved the fragmentation problem by creating a single environment where R devotees and Python enthusiasts could finally work together. “We use Docker and GitHub. For IDEs, Sayf uses Jupyter, and Natalia, Laura, Kirsten, and I all use RStudio. It really has everything unified,” Dr. Dwaraka explains. Instead of forcing everyone into a single language or tool, the platform lets each scientist use their preferred environment while still collaborating seamlessly.

From MacBook Air to Supercomputer

The cloud-based solution untethered computational power from hardware limitations. “Now I can go around with my MacBook Air, which doesn’t really have a lot of memory, but then be able to work everything there,” Dr. Dwaraka shares. A lightweight laptop could now harness the power of massive cloud infrastructure, democratizing access to computational resources across the team.  

The technical implementation leveraged the power of AWS infrastructure integrated with Posit’s familiar tools:

  • Posit Workbench on SageMaker providing scalable compute instances for R and Python users to spin up independently
  • Amazon EFS for shared file storage across the team’s workflows
  • Amazon S3 buckets for automatic data uploads from their Lexington lab
  • Kubernetes orchestration and Argo Workflows for handling thousands of parallel model training jobs
  • MLflow for experiment tracking across their massive model development pipeline
  • R Markdown and Quarto for automated report generation directly to academic partners
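One common way to feed thousands of independent training runs to an orchestrator is to split the list of model IDs into shards and submit one job per shard. The Python sketch below illustrates that idea only. The function names, the job dictionary schema, and the shard size of 64 are hypothetical, and it does not reproduce TruDiagnostic's Argo Workflows or Kubernetes configuration.

```python
def make_shards(model_ids, shard_size):
    """Split model IDs into consecutive shards of at most shard_size items."""
    if shard_size < 1:
        raise ValueError("shard_size must be at least 1")
    return [
        model_ids[start:start + shard_size]
        for start in range(0, len(model_ids), shard_size)
    ]


def job_specs(model_ids, shard_size):
    """Describe one job per shard as a plain dict (hypothetical schema)."""
    return [
        {
            "job_index": index,
            "first_model": shard[0],
            "last_model": shard[-1],
            "n_models": len(shard),
        }
        for index, shard in enumerate(make_shards(model_ids, shard_size))
    ]


# 5,000 models, as in the story; the shard size is an arbitrary example.
specs = job_specs(list(range(5000)), shard_size=64)
```

With these example values the 5,000 models fall into 79 jobs, the last of which holds the remaining 8 models. Each dictionary could then be passed as parameters to whatever job runner is in use.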

As Dr. Dwaraka explains, “We create RStudio Sessions directly within SageMaker system. As the team grew, we had Docker, GitHub, Jupyter, RStudio, and now Positron. Posit Workbench unified everything—everyone could use their preferred tools while collaborating seamlessly.”

TruDiagnostic is actively evaluating Posit’s emerging AI capabilities such as Databot. Early trials with Databot have been promising, helping prototype exploratory analyses and code much faster while letting the team stay focused on study design and interpretation.

Results: 60% Cost Reduction, 10x Performance Gain

Scientific Velocity Unleashed

Posit Workbench helped TruDiagnostic accelerate its work at the forefront of epigenetic discovery, and the team has published over 30 peer-reviewed papers. Their most recent breakthrough came with the Epigenetic Biomarker Proxies (EBP) project: training 1,600 model iterations, which previously took months, now completes in just one hour. Their product development cycle, which once required serial training of 5,000 models over months, now processes over 1,700 validated models for their consumer products by leveraging SageMaker and Posit Workbench. “We were able to whittle down from a total of five thousand after clinical validation,” Dr. Dwaraka explains.

The company’s commitment to scientific validation sets them apart in a field often plagued by overpromises. “We’re not Theranos,” Dr. Dwaraka emphasizes with a laugh. “Every claim we make is backed by peer-reviewed research. That’s why we’ve published over 30 papers—we believe in proving what works and being transparent about limitations.”  

 

Dramatic Infrastructure Savings

The switch to Posit Workbench delivered a 60% reduction in cloud infrastructure costs while achieving 10x performance improvements. For a quickly growing startup, these savings allowed the team to focus on scale rather than infrastructure troubleshooting.
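For readers who want to relate these headline figures to one another, the short Python sketch below works through the arithmetic. It assumes that "10x" is read as a tenfold speedup in wall-clock time, which the story does not state explicitly. The function names are illustrative.

```python
def time_reduction(speedup):
    """Fraction of wall-clock time saved for a given speedup factor."""
    return 1.0 - 1.0 / speedup


def remaining_fraction(reduction):
    """Fraction of the original cost left after a fractional reduction."""
    return 1.0 - reduction


def seconds_per_item(total_seconds, n_items):
    """Average wall-clock seconds per item when n_items finish in total_seconds."""
    return total_seconds / n_items


tenfold_saving = time_reduction(10)         # a 10x speedup saves 90% of the time
cost_left = remaining_fraction(0.60)        # a 60% cut leaves 40% of the bill
per_iteration = seconds_per_item(3600, 1600)  # 1,600 iterations in one hour
```

Under that reading, a 10x speedup corresponds to a 90% time reduction, in line with the parallel training described earlier, and 1,600 iterations in one hour averages 2.25 seconds of wall-clock time per iteration.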

 

Strengthening Security & Compliance

The move to SageMaker and Posit Workbench also simplified TruDiagnostic’s compliance posture. Handling clinical data requires stringent controls, and the unified environment made it possible to achieve SOC 2 and HIPAA compliance more quickly.

 

Bringing Epigenetic Testing to the Mainstream

Most importantly, the infrastructure improvements enabled TruDiagnostic to bring advanced epigenetic testing to the mainstream. Making biological age assessment accessible to consumers globally, TruDiagnostic has been featured everywhere from Netflix documentaries to Keeping Up with the Kardashians. Khloé Kardashian was notably thrilled with her results, calling it the best such test she’d ever taken. Their R Markdown workflows now automatically generate reports for academic partners, while Shiny apps empower non-technical team members to visualize data from their database of over 80,000 samples.  

“Laura, on our team, developed a Shiny visualization tool that makes our database of over 50,000 test results accessible to everyone in the company,” Dr. Dwaraka notes. “Now our sales and marketing teams can explore the data themselves without needing a data scientist.” This democratization of data access exemplifies how the platform empowers the entire organization.  

As Dr. Dwaraka puts it: “Posit really was the answer.”

Modernize Your Analytics with AWS and Posit

Learn more about how AWS and Posit help organizations modernize analytics and accelerate time-to-insight with secure, scalable, and governed AI. Watch the webinar “From Wrangling to Insight: Human-in-the-Loop AI for Analytics” to see how human expertise and automation work together to deliver faster, more trusted results.  

Get started with RStudio on SageMaker here, or contact us directly to dive deeper into Posit and AWS products.

Learn more about TruDiagnostic

Discover more about TruDiagnostic’s epigenetic tests, designed by leading physicians and researchers, at their website: trudiagnostic.com.
