Governed AI for Public Health: Reading Free-Text Records with Snowflake Cortex and Posit
Snowflake Cortex AI runs large language models (LLMs) inside your Snowflake account, on data already governed by your account roles and policies. Case investigation notes, lab comments, and complaint narratives stay where they belong. Combined with the Posit Team Native App, which runs Posit Workbench, Posit Connect, and Posit Package Manager on Snowpark Container Services inside the same security boundary, public health and science teams can summarize, classify, and structure free-text records, then deploy the results to program staff, without any protected data leaving the account. This post walks through the architecture, then shows two complete examples:
- Summarizing and structuring free-text case investigation notes for an epidemiology team.
- Triaging incoming public inquiries with a natural-language interface for program staff.
If you work in public health, you have probably had a version of this conversation in the last year. A program director asks why the team is not “using AI” on the pile of free-text already sitting in the warehouse: case notes, lab remarks, inspection narratives, call-center logs. You explain that the data is protected, that an external API call would route case-level information out of the Snowflake account, and that no one is eager to draft an exception to the data egress policy. The conversation usually stops there.
Snowflake Cortex AI removes the constraint, because the model runs inside Snowflake on the data already governed by your account roles and policies. The Posit Team Native App does the same for the analytical workbench: Posit Workbench, Posit Connect, and Package Manager run as a Snowflake Native Application on Snowpark Container Services, inside your Snowflake security boundary. Put the two together and your R and Python teams can build LLM-backed analyses, score sensitive text, and deploy the result to colleagues without protected data leaving Snowflake.
Why does it matter where the AI model runs in public health?
Privacy, security, and oversight teams in public health share a duty to protect personal information. HIPAA and protected health information rules, the FISMA authorization boundary, and agency data residency requirements all expect that you can answer four questions about any workflow that touches protected text:
- What data went in?
- What model produced the output?
- Who saw it, and when?
- Can the result be reproduced?
Sending text to an external LLM API breaks the first answer the moment the request leaves your network. You no longer fully control where the payload was logged, what cache it touched, or how the provider’s data retention policy maps onto your agreements. Even if the legal team is comfortable, your privacy officer may not be.
Cortex AI removes that exposure because the call never leaves Snowflake. A query such as SELECT SNOWFLAKE.CORTEX.COMPLETE(...) runs on Snowflake-managed infrastructure inside the same region as your account, governed by the same role-based access control (RBAC). Output is materialized as a normal column, which means it is queryable, joinable, and covered by the same audit logs and access controls you already trust.
The Posit Team Native App applies the same logic to the IDE and the deployment platform. Your analysts open Positron, RStudio Pro, JupyterLab, or VS Code from a browser, but the session itself is a container in Snowpark Container Services running under a Snowflake service role. Posit Connect publishes Shiny apps, Plumber and FastAPI endpoints, and Quarto reports on the same foundation. Nothing has to be routed through a corporate virtual private network (VPN) to a separate analytics environment.
What does the Cortex AI and Posit architecture look like?
R and Python sessions reach Cortex through the standard Snowflake drivers and Snowpark, so the model call runs in the warehouse. Posit Connect picks up published content from Workbench and serves it inside the same Native App. The credentials your analysts use are Snowflake credentials, so role-based access on the underlying tables also governs who can call which Cortex function on which data.
Use case 1: How can an epidemiology team summarize case notes?
Case investigation notes are long, inconsistent, and full of detail that matters: exposure settings, symptom onset, suspected pathogens, follow-up actions. An epidemiologist preparing a weekly summary needs a short readable digest of each note and a structured pull of a few key fields. Across hundreds of notes, doing this by hand is slow, and it is the first task that gets dropped when case volume rises.
The plan: store the notes as text in a CASE_NOTES table, summarize them with SNOWFLAKE.CORTEX.SUMMARIZE, and extract structured fields with SNOWFLAKE.CORTEX.COMPLETE using a prompt that names the JSON keys to return.
Connecting from R inside Workbench
Inside a Workbench session running on the Native App, the warehouse and role are already wired up. You connect with odbc and the Snowflake driver:
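library(DBI)
library(odbc)
library(dplyr)
library(dbplyr)

# Workbench on the Native App supplies the Snowflake credentials,
# so no connection arguments are passed here.
con <- dbConnect(odbc::snowflake())

# Lazy reference to the case notes table; no rows are fetched yet.
notes <- tbl(con, in_catalog("MARKETING_DEMO", "CASES", "CASE_NOTES"))

tbl() returns a lazy table reference. Anything you compose with dplyr translates to SQL that runs in Snowflake, which is what you want here because the note text is large and you do not want to pull it back to the R session.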
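To see what a lazy table reference means in practice, the sketch below renders the SQL that dbplyr would generate, without connecting to an account. It assumes the dplyr and dbplyr packages are installed. The notes_sim table and its values are hypothetical stand-ins, and simulate_snowflake() only imitates the Snowflake SQL dialect.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for CASE_NOTES, backed by a simulated Snowflake
# connection so that nothing is sent to a real account.
notes_sim <- lazy_frame(
  CASE_ID = 1L,
  REPORT_WEEK = "2026-01-05",
  NOTE_TEXT = "example note",
  con = simulate_snowflake()
)

query <- notes_sim |>
  filter(REPORT_WEEK >= "2026-01-01") |>
  select(CASE_ID, REPORT_WEEK)

# Render the SQL text that would be sent; no rows are fetched.
generated_sql <- as.character(sql_render(query))
cat(generated_sql, "\n")
```

Against the real connection, calling show_query() on any pipeline built from notes prints the same kind of statement, which is a quick way to confirm that a filter runs in Snowflake rather than in R.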
Calling Cortex from dplyr
Cortex functions are SQL functions, so you can call them through dplyr::sql() and let the warehouse do the work:
summaries <- notes |>
  filter(REPORT_WEEK >= "2026-01-01") |>
  # SUMMARIZE runs in Snowflake; the note text is not pulled into R.
  mutate(SUMMARY = sql("SNOWFLAKE.CORTEX.SUMMARIZE(NOTE_TEXT)")) |>
  select(CASE_ID, REPORT_WEEK, SUMMARY)

# Preview five summaries locally.
summaries |>
  head(5) |>
  collect()

For the structured extraction, you want a longer prompt and a controlled output. SNOWFLAKE.CORTEX.COMPLETE accepts a model name and a prompt. A prompt that asks for JSON and nothing else makes the output easier to parse:
prompt_base <- "You are reviewing a public health case investigation note. Return a JSON object with the keys exposure_setting, symptom_summary, suspected_pathogen, and follow_up_action. Each value should be a short string. Do not include any prose outside the JSON. NOTE: "

extracted <- notes |>
  filter(REPORT_WEEK >= "2026-01-01") |>
  mutate(
    # Build the COMPLETE call as SQL text: pinned model, then prompt plus note.
    FIELDS_JSON = sql(paste0(
      "SNOWFLAKE.CORTEX.COMPLETE(",
      "'llama3.1-70b', ",
      "CONCAT('", prompt_base, "', NOTE_TEXT)",
      ")"
    ))
  ) |>
  select(CASE_ID, FIELDS_JSON)

# Only the case ID and the extracted JSON come back to the R session.
extracted_results <- extracted |>
  collect()

The entire pipeline runs in the warehouse. You are paying for Cortex credits and warehouse compute, but the note text never reaches the IDE session. Because dplyr translates this to plain SQL, your data engineering team can add it to a scheduled task without rewriting anything.
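Once the extracted JSON has been collected, it still has to be turned into columns, and a model will not always return valid JSON. The following is a minimal sketch of tolerant parsing, assuming the jsonlite package is installed. The helper names parse_fields and parse_all are hypothetical, and the sample responses are invented for illustration only.

```r
library(jsonlite)

# The four keys requested in the extraction prompt.
expected_keys <- c("exposure_setting", "symptom_summary",
                   "suspected_pathogen", "follow_up_action")

# Parse one model response. Returns NA for every key when the text is not
# valid JSON or is not a JSON object.
parse_fields <- function(json_text) {
  empty <- setNames(rep(NA_character_, length(expected_keys)), expected_keys)
  parsed <- tryCatch(parse_json(json_text), error = function(e) NULL)
  if (!is.list(parsed)) return(empty)
  found <- intersect(names(parsed), expected_keys)
  empty[found] <- vapply(
    parsed[found],
    function(v) paste(as.character(v), collapse = "; "),
    character(1)
  )
  empty
}

# Combine case IDs with one parsed row per response.
parse_all <- function(case_ids, json_texts) {
  rows <- lapply(json_texts, parse_fields)
  fields <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  data.frame(CASE_ID = case_ids, fields, stringsAsFactors = FALSE)
}

# Hypothetical responses, invented for illustration only.
sample_results <- data.frame(
  CASE_ID = c(101L, 102L),
  FIELDS_JSON = c(
    '{"exposure_setting": "restaurant", "symptom_summary": "vomiting", "suspected_pathogen": "norovirus", "follow_up_action": "inspect site"}',
    "Sorry, I cannot help with that."
  ),
  stringsAsFactors = FALSE
)

parsed_fields <- parse_all(sample_results$CASE_ID, sample_results$FIELDS_JSON)
print(parsed_fields)
```

Rows that fail to parse keep their CASE_ID and carry NA in every field, so they can be counted and reviewed by hand instead of being dropped silently. The sketch does not handle responses wrapped in extra text or code fences; those also end up as NA rows.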
Note: Pin the model name explicitly rather than relying on a default. Recording which model produced each summary is part of the audit trail, and it keeps your weekly outputs consistent when new models are added. The full list of available models is in the Cortex documentation.
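One way to follow this advice is to keep the model name in a single R constant and generate the SQL text from it. The sketch below is an illustration under stated assumptions, not part of any package: CORTEX_MODEL, sql_string, and complete_call are hypothetical names, and the prompt in the usage line is invented. The quoting helper escapes backslashes and single quotes, which Snowflake treats specially inside single-quoted string constants.

```r
# Model pinned in one place; this is the model used in the extraction example.
CORTEX_MODEL <- "llama3.1-70b"

# Quote an R string as a single-quoted SQL string literal. Backslashes and
# single quotes are doubled so the text cannot end the literal early.
sql_string <- function(x) {
  x <- gsub("\\", "\\\\", x, fixed = TRUE)
  x <- gsub("'", "''", x, fixed = TRUE)
  paste0("'", x, "'")
}

# Build the SQL text for a COMPLETE call over a text column.
# text_column is inserted as-is, so it must be a trusted column name.
complete_call <- function(prompt, text_column, model = CORTEX_MODEL) {
  paste0(
    "SNOWFLAKE.CORTEX.COMPLETE(", sql_string(model),
    ", CONCAT(", sql_string(prompt), ", ", text_column, "))"
  )
}

call_sql <- complete_call("Summarize the patient's note: ", "NOTE_TEXT")
cat(call_sql, "\n")
```

The resulting string can be passed to sql() inside mutate(), and adding a column such as MODEL_NAME = CORTEX_MODEL in the same mutate() stores the model name next to each output row.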
Use case 2: How can program staff query records in plain language?
Program staff who are not R users still need to explore incoming public inquiries and route them to the right team: foodborne reports, respiratory concerns, vaccine questions, environmental complaints. Today that is a manual triage queue.
The plan: classify each inquiry with SNOWFLAKE.CORTEX.CLASSIFY_TEXT, then publish a Shiny application on Posit Connect that lets staff filter the routed records and ask questions in natural language.
Classifying free text
CLASSIFY_TEXT takes a string and a list of candidate labels. For a public health intake queue, the labels map to existing program areas:
inquiries <- tbl(con, in_catalog("MARKETING_DEMO", "CASES", "PUBLIC_INQUIRIES"))

routed <- inquiries |>
  mutate(
    # CLASSIFY_TEXT returns an object with a label key; keep the label as a string.
    CATEGORY = sql(
      "SNOWFLAKE.CORTEX.CLASSIFY_TEXT(INQUIRY_TEXT, ['foodborne', 'respiratory', 'vaccine_inquiry', 'environmental', 'other'])['label']::STRING"
    )
  ) |>
  select(INQUIRY_ID, RECEIVED_DATE, CATEGORY)

# Persist the routed inquiries as a permanent table in Snowflake.
routed |>
  compute(name = in_catalog("MARKETING_DEMO", "CASES", "INQUIRIES_ROUTED"), temporary = FALSE)

The classification runs inside Snowflake and writes a normal governed table. Nothing about the inquiry text leaves the account.
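If the same label list is needed elsewhere, for example as filter choices in an app, one option is to keep it in an R vector and generate the SQL text from it. This is a minimal sketch: sql_array and classify_call are hypothetical helper names, and the label extraction assumes CLASSIFY_TEXT returns an object with a label key.

```r
# Program-area labels from the intake example, kept in one R vector.
inquiry_labels <- c("foodborne", "respiratory", "vaccine_inquiry",
                    "environmental", "other")

# Render a character vector as a SQL array literal of quoted strings.
sql_array <- function(values) {
  quoted <- paste0("'", gsub("'", "''", values, fixed = TRUE), "'")
  paste0("[", paste(quoted, collapse = ", "), "]")
}

# Build the CLASSIFY_TEXT call and keep only the label as a string.
# text_column is inserted as-is, so it must be a trusted column name.
classify_call <- function(text_column, labels) {
  paste0(
    "SNOWFLAKE.CORTEX.CLASSIFY_TEXT(", text_column, ", ",
    sql_array(labels), ")['label']::STRING"
  )
}

classify_sql <- classify_call("INQUIRY_TEXT", inquiry_labels)
cat(classify_sql, "\n")
```

The string can be passed to sql() inside mutate(), and inquiry_labels can be reused wherever the application needs the list of valid categories.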
Serving a natural-language interface
For the staff-facing layer, the querychat package lets you build a Shiny app where users ask questions in plain language and the app generates governed SQL against a specific table. Because the app runs on Posit Connect inside the Native App, it authenticates against Snowflake and inherits the same role-based access as the warehouse:
library(shiny)
library(querychat)
library(DBI)
library(DT)
library(odbc)
library(bslib)

con <- dbConnect(odbc::snowflake())

# Point querychat at the routed inquiries table.
qc <- QueryChat$new(
  data_source = DBISource$new(
    con,
    DBI::Id(catalog = "MARKETING_DEMO", schema = "CASES", table = "INQUIRIES_ROUTED")
  ),
  greeting = "Ask about routed public inquiries by category or date."
)

ui <- page_sidebar(
  sidebar = qc$sidebar(),
  DT::DTOutput("table")
)

server <- function(input, output, session) {
  chat <- qc$server()
  # Show the rows returned by the current chat query.
  output$table <- DT::renderDT(chat$df())
}

shinyApp(ui, server)

A staff member who can only see one program’s records in the warehouse sees only those records in the app, because the role travels with the identity. You are not rebuilding access control in the application layer; you are inheriting it.
What does this mean for privacy and oversight?
The combination above gives you a code-first analytical environment, an LLM, and a deployment platform that all share one identity model and one audit trail. Your privacy, compliance, and oversight teams can verify the following directly, without a separate evidence-gathering exercise:
- Every Cortex call is associated with a Snowflake user and role, captured in the account’s query history.
- Every Posit Connect deployment is a versioned bundle pinned to the package versions installed by Posit Package Manager, which can mirror an internally curated repository.
- Every Workbench session runs as a container with logged start, stop, and resource usage.
- Every output table from a Cortex pipeline is governed by the same RBAC, masking policies, and row access policies as the source data.
The same governance story your team already tells about SQL workloads in Snowflake now extends to the AI-assisted R and Python work that has historically lived somewhere else.
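As an illustration of the first point, the sketch below filters a query-history table down to statements that mention a Cortex function. It assumes the dplyr and dbplyr packages and uses the column names of the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view. The helper name cortex_calls is hypothetical, and the simulated table exists only so the sketch can run without an account.

```r
library(dplyr)
library(dbplyr)

# Keep only statements whose text mentions a Cortex function, newest first.
cortex_calls <- function(query_history) {
  query_history |>
    filter(sql("QUERY_TEXT ILIKE '%SNOWFLAKE.CORTEX.%'")) |>
    select(START_TIME, USER_NAME, ROLE_NAME, QUERY_TEXT) |>
    arrange(desc(START_TIME))
}

# Simulated stand-in with hypothetical values; nothing is sent to Snowflake.
qh_sim <- lazy_frame(
  START_TIME = Sys.time(),
  USER_NAME = "example_user",
  ROLE_NAME = "example_role",
  QUERY_TEXT = "example query",
  con = simulate_snowflake(),
  .name = "QUERY_HISTORY"
)

audit_sql <- as.character(sql_render(cortex_calls(qh_sim)))
cat(audit_sql, "\n")
```

With a live connection, the same function could be applied to tbl(con, in_catalog("SNOWFLAKE", "ACCOUNT_USAGE", "QUERY_HISTORY")), provided the role has been granted access to that view. Account usage views can lag behind real time, so very recent queries may not appear yet.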
How can I get started with Cortex AI and Posit Team?
If you want to try this in your own Snowflake account, start with the Posit Team Native App listing on Snowflake Marketplace and the Cortex AI documentation. The links below are good starting points, and the Posit Solutions Engineering team is happy to walk through a proof of concept with your data.
- Posit + Snowflake partnership page for an overview of integration capabilities.
- Curious about Posit Team, but not ready to commit? Start a 30-day free trial of the Posit Team Native App immediately, no sales calls required.
- Learn about Posit’s integration with Snowflake Cortex.
- Snowflake Cortex AI function reference for the full list of available functions and models.
- Posit Team Native App documentation for setup and configuration.
You can also schedule a demo with a Posit expert to see the Native App in action with your specific use cases. If you build something interesting on this stack, we would like to hear about it on the Posit Community forum.
Frequently asked questions
Does using Snowflake Cortex AI require moving my data?
No. Cortex AI functions run on data already in Snowflake. There is no data export, no external storage, and no network egress to a model provider. This is the core reason public health and science agencies can use generative AI on sensitive text without rewriting data governance policies.
How is Snowflake Cortex different from calling an external LLM API?
With an external API, the payload leaves your network, so you no longer fully control where the data was logged, what cache it touched, or how the provider’s retention policy maps onto your agreements. With Cortex, the call never leaves Snowflake, and inputs and outputs are governed by the same RBAC, masking policies, and audit logs as the rest of your account.
Which Cortex functions are useful for free-text public health data?
SUMMARIZE for digesting long case notes, COMPLETE for general-purpose completion and structured JSON extraction (with model selection across llama3.1-70b, mistral-large2, and others), CLASSIFY_TEXT for routing inquiries against custom labels, and EXTRACT_ANSWER for question-answering over a document. All run as SQL functions and return columns you can join, filter, and persist.
Can I call Cortex from R and Python, not just SQL?
Yes. Because Cortex functions are SQL functions, you can call them from R through dplyr::sql(), as shown in the examples above, and from Python through Snowpark. The work runs in the warehouse either way, so the data residency story is identical whichever language your team writes in.
How do Posit Workbench and Posit Connect run inside Snowflake?
The Posit Team Native App packages Posit Workbench (Positron, RStudio Pro, JupyterLab, VS Code) and Posit Connect into a single application running on Snowpark Container Services. Both run as containers inside the customer’s Snowflake account under a Snowflake service role, so credentials, RBAC, and audit are inherited from Snowflake.
Can public health teams use this under HIPAA and FISMA?
Yes, and the architecture is designed to make that easier. Because Cortex calls execute inside Snowflake and outputs materialize as governed tables, privacy and security teams can verify what data went in, which model produced each output, who accessed the result, and whether the result can be reproduced, all within the same authorization boundary that already governs the warehouse.
Together, Posit and Snowflake bring AI-powered data science where your data lives. Manage your entire data science lifecycle inside the secure, governed Snowflake AI Data Cloud with Posit Team, through the Connected App or the Posit Team Native App.
About Posit
Posit (formerly RStudio) is the data science platform for R and Python, used by teams across government, public health, life sciences, financial services, and academia. Posit Team, including Posit Workbench, Posit Connect, and Posit Package Manager, gives organizations in regulated industries the tools to develop, deploy, and govern analytic work, with deep support for the open-source R and Python ecosystems. Learn more at posit.co.
About Snowflake
Snowflake delivers the AI Data Cloud, a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Snowflake’s Native App Framework and Snowpark Container Services allow software partners like Posit to run their applications inside customer Snowflake accounts, giving regulated industries a single governance boundary for both data and the analytical tools that work on it. Learn more at snowflake.com.