Scaling Data Excellence at CARFAX: How NLP and Automation Saved 12,000 Hours

CARFAX and Posit customer story

Summary

Tiger Tang, Senior Manager of Data Science at CARFAX, faced a tightening constraint: the company’s massive and rapidly expanding data required more powerful, scalable processing. By leveraging the Posit ecosystem, Tang’s team replaced manual workflows with automated, code-based ones, reclaiming over 12,000 hours of labor while scaling up product content.

About:

As a part of S&P Global and the most visited automotive site in the U.S., CARFAX maintains North America’s most comprehensive vehicle history database, powered by over 35 billion records and more than 150,000 data sources. 

Industry:

Technology

Technology used:

Posit Workbench, Posit Connect, Posit Package Manager

"By modernizing our analytical workflows, Tiger’s team has done more than just save time—they've increased our organizational agility. Moving from hours of manual investigation to near-instant results allows us to deliver must-have insights at the scale that our customers expect from CARFAX."

Srinidhi Melkote

Chief Analytics Officer at CARFAX

The challenge

Clearing data bottlenecks

With over 35 billion records drawn from more than 150,000 data sources, CARFAX’s vehicle history database provides unparalleled value to millions of consumers. The sheer diversity of the incoming information, ranging from standardized federal records to unstructured, human-written technician notes, also presented a unique opportunity for optimization.

According to Tang, the diversity of the data is both a strength and a major hurdle: there are countless ways to describe a simple task like an oil change, ranging from formal entries to cryptic shorthand. For years, standardizing this influx of information required constant manual oversight.

This "manual tax" extended internally, where high-value analysts often found themselves caught in a "SQL-Excel-Outlook" loop, spending more time moving data than extracting its strategic value. To keep pace with CARFAX’s expanding footprint, Tang’s mission was to modernize these successful processes into reproducible, code-based workflows, transforming a labor-intensive bottleneck into a high-velocity production line.

The solution

From human-dependent process to reproducible code

To shift their operations into a higher gear, Tang’s team leveraged Posit Team to construct a more scalable workflow. The team tackled messy, unstructured vehicle data by developing Natural Language Processing (NLP) models on Posit Workbench.

These NLP models classified millions of unique service records and gave CARFAX products deeper data insights. By automating the extraction of these insights, the team turned ad-hoc investigations that previously required 2.5 to 3 hours of manual effort into a streamlined process that completes in as little as 15 seconds.
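To make the classification problem concrete, here is a minimal sketch in Python of how free-text service notes might be mapped to standard categories. It is not the team's method: the story does not describe their models, and this sketch swaps in simple keyword rules in place of a trained NLP model. The category names, patterns, and the functions normalize and classify are all hypothetical.

```python
import re
from typing import List

# Hypothetical categories and patterns, for illustration only.
# These are NOT CARFAX's rules or models.
SERVICE_PATTERNS = {
    "oil_change": re.compile(r"\b(oil (change|chg|service)|oil and filter|lof|lube)\b"),
    "tire_rotation": re.compile(r"\b(tire rotation|rotate(d)? tires|tire rot)\b"),
    "brake_service": re.compile(r"\b(brake pads?|brakes? (service|replaced|replacement))\b"),
}


def normalize(note: str) -> str:
    """Lowercase the note, turn punctuation into spaces, collapse whitespace."""
    text = note.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def classify(note: str) -> List[str]:
    """Return every category whose pattern matches the normalized note."""
    text = normalize(note)
    labels = [label for label, pattern in SERVICE_PATTERNS.items() if pattern.search(text)]
    return labels or ["unclassified"]


if __name__ == "__main__":
    for raw in ["Oil & filter change performed", "LOF, rotate tires", "Replaced wiper blades"]:
        print(raw, "->", classify(raw))
```

Hand-written rules like these break down quickly as the number of phrasings grows, which is one reason a team in this position might prefer learned models over keyword lists.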

The team also focused on eliminating the "SQL-Excel-Outlook" loop by migrating manual reports to Posit Connect, where tasks are now handled by scheduled, reproducible R code. This shift turned a slow, human-dependent process into a high-speed production line.

To ensure company-wide adoption, the team designed the user experience of their tools around business stakeholders, addressing three critical areas:

  • Engineering for scale: To support growing adoption without performance lags, the team implemented asynchronous programming, which enabled their tools to handle at least 50 simultaneous business users.
  • The "one-click" experience: They simplified access by creating a centralized landing page with a memorable internal URL. This removed the technical barrier for non-data users, making answers as easy to find as a web search. Account managers could instantly evaluate data health by service location and get curated insights and action items without waiting for a manual analyst report.
  • Built-in trust: Every automated tool includes a "fail-safe" plan. By documenting how to handle edge cases or revert to manual processes if needed, the team reduced operational risk and built confidence among stakeholders.
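The "Engineering for scale" point above can be illustrated with a small Python sketch. The story does not say how the team implemented asynchronous programming, so this is only an assumption-laden example using Python's standard asyncio library. The function names run_query, serve_users, and timed_run are hypothetical, and the 0.1-second sleep is a stand-in for a slow database or model call.

```python
import asyncio
import time
from typing import List, Tuple


async def run_query(user_id: int) -> str:
    """Hypothetical stand-in for a slow database or model call."""
    await asyncio.sleep(0.1)  # simulated I/O wait; other requests proceed meanwhile
    return f"report for user {user_id}"


async def serve_users(n_users: int) -> List[str]:
    """Handle all user requests concurrently rather than one after another."""
    tasks = [run_query(user_id) for user_id in range(n_users)]
    # gather returns results in the same order as the tasks were passed in
    return await asyncio.gather(*tasks)


def timed_run(n_users: int = 50) -> Tuple[List[str], float]:
    """Serve n_users simulated requests and return the results and elapsed seconds."""
    start = time.perf_counter()
    results = asyncio.run(serve_users(n_users))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = timed_run(50)
    print(f"served {len(results)} users in {elapsed:.2f} s")
```

Handled one at a time, 50 such requests would take about 5 seconds (50 × 0.1 s). Run concurrently, they finish in roughly the time of a single request, because each request spends its time waiting rather than computing.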

12K+

Hours Saved

The documented time reclaimed from manual processes across the organization during the initial project rollout.

160K+

Requests

The number of times stakeholders have used automated tools and apps instead of filing a manual data request ticket, a milestone reached during the initial 3-year rollout.

560

Monthly Automated Runs

The number of automated reports that previously required a dedicated team for manual intervention.

The results

12,000 hours and a cultural shift

The ROI was immediate and measurable. By replacing the "manual loop" with scheduled, reproducible code on Posit Connect, the organization reclaimed over 12,000 hours of manual labor.

The impact extended beyond just time saved:

  • Self-Service Growth: Stakeholders now conduct hundreds of self-service runs each day via interactive applications hosted on Posit Connect, bypassing the traditional data request ticket.
  • Systemic Trust: With 160,000+ automated requests processed, the team has built a fail-safe system, including handoff documents for manual reverts, that ensures the organization never loses momentum.

Today, the team has shifted from reactive reporting to proactive data engineering. With stakeholders across the organization relying on the Posit platform, code-based workflows have evolved from a technical preference into a core business necessity.

The team is now using Positron, Posit’s next-generation IDE, to speed up polyglot development and continue pushing the boundaries of what is possible at the intersection of data science and vehicle history. Tang and his team are no longer just writing code; they are building the scalable analytical framework that powers the insights behind the CARFAX brand.

Helpful resources

Data Science Hangout with Tiger Tang, Senior Manager of Data Science at CARFAX

Tiger Tang conf talk

Saving 1,000 Hours with RStudio - Selling R in Your Workplace

From Concept to Impact - Building & Launching Shiny Apps in the Workplace
