2025-09-12 AI Newsletter

A collection of hexagons, each containing a different icon. On the left are three outlined heads: two with glasses, one a robot. The hexagons include a mall, a bridge, a raccoon, a Viking, and a goose.

We’re excited to be back with the second edition of our AI newsletter! You can read the first one here. We’ll keep bringing you a curated look at the most impactful developments in the world of AI, both here at Posit and beyond.

 

External news

 

Anthropic copyright settlement

Copyright infringement is a frequently discussed issue with LLMs. These models are trained on large corpora of data, including copyrighted and sometimes pirated works. How the law will handle this is still largely an open question.

Anthropic recently reached a proposed settlement in Bartz v. Anthropic, a high-profile class action lawsuit focused on whether training LLMs on copyrighted books and other works constitutes fair use

If approved, Anthropic will compensate authors around $3,000 for each of the estimated 500,000 works, one of the largest copyright settlements in U.S. history. In June, the judge ruled that (1) Anthropic was correct that training on lawfully obtained copyrighted works is fair use, but (2) Anthropic’s use of pirated material was not. 

This is a landmark case, marking a step toward a market for training data. It represents a path to compensation for authors and other creatives for their works’ use in LLM training, while providing AI companies with legal clarity that training on copyrighted (but not pirated) material will be considered fair use.

 

Agentic browsers

A class of “agentic browsers” like Perplexity Comet and Brave, which integrate AI more closely into the experience of browsing the web, are beginning to emerge. These tools can take actions on the user’s behalf, like adding items to a shopping cart, summarizing a webpage, or scheduling events on your calendar. 

While some uses are compelling, these sorts of products carry significant security risks from prompt injection. As Simon Willison wrote last week, “I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely.” Despite this, Anthropic recently announced its own foray into agentic browsing.

 

New Gemini image model

Gemini 2.5 Flash Image, a new image model from Google, is very good. Compared to existing models, it’s quite good at editing images, preserving input characteristics and only changing components specified in prompts. The release post shows the best-case results. In reality, quality varies quite a bit.

 

Posit news

 

Most of Posit has been heads-down preparing for posit::conf(2025). There are a whole host of AI/LLM-related talks and events at conf this year, including Joe Cheng and Hadley Wickham’s keynote and a series of talks on using LLMs with R and Python.

You can still sign up to attend virtually! Virtual tickets are $99 standard, $49 for government/non-profit employees, and free for students and academics. Hope to see you there!

 

Terms

 

A prompt injection is a type of attack against LLM applications in which an attacker provides input designed to trick the model into behaving in unintended ways (e.g., “Ignore all previous instructions and …”). Prompt injections work because LLMs are gullible, unable to reliably distinguish between trusted and untrusted content. 

Agentic browsers are especially vulnerable to this kind of attack because they couple exposure to untrusted content (any website you might visit) with access to sensitive data (your passwords and the information those passwords unlock). Even a comment on a trusted website could contain a prompt injection capable of leaking your private information.

 

Learn more

 

  • Rather than trying to stop an LLM from doing harmful things like assisting in developing bioweapons after training, could you just remove the training data that provided that capability in the first place? Anthropic explores this question.
  • The new default Google search behavior may soon be “AI Mode.” AI Mode seems stronger than Google’s AI Overviews, which are remarkably unfactual on average, but there are still serious ethical implications with AI search.
  • The Jupyter Agent Dataset is a large dataset built from real Kaggle notebooks. It contains natural language questions (e.g., “Which feature exhibits the largest number of outliers based on the boxplot analysis?”) and accompanying computational notebooks that can be used to answer the questions. In evaluations, training some models on the dataset resulted in stronger exploratory data analysis capabilities.