2025-09-26 AI Newsletter

[Image: A collection of hexagons, each containing a different icon. On the left are three outlined heads: two with glasses, one a robot. The hexagons include a mall, a bridge, a raccoon, a Viking, and a goose.]

External news

 

Anthropic model degradation incident

 

Over the past month, Anthropic experienced significant model reliability issues, with Claude sometimes producing notably worse or nonsensical responses. According to Anthropic’s postmortem, three separate, overlapping bugs degraded model quality for several weeks. 

Anthropic noted that diagnosing and fixing the issues was complicated both by the overlapping failures and by their own privacy protections, which limited engineers’ ability to inspect user conversations. 

The postmortem stressed that the issues were infrastructure bugs only, and not caused by Anthropic intentionally degrading model quality due to load or demand, as some users had suspected. 

These incidents damaged trust in Claude’s reliability and in Anthropic as a company. Many users seem to have felt that Anthropic was slow to acknowledge and fix the problems, a particular source of frustration for those on the $200/month Max plan.

 

Codex updates

 

OpenAI released updates to Codex, their software engineering agent, along with a new model trained specifically for coding, GPT-5-Codex, which they recommend you use with Codex. 

Before the release of GPT-5-Codex, OpenAI recommended that you use GPT-5 with Codex. Our impression is that the recent changes to Codex, as well as the release of the GPT-5 models, have substantially improved Codex and turned it into a strong competitor to tools like Claude Code.

There are multiple ways to use Codex, which is included in any paid ChatGPT subscription.

You can also access GPT-5-Codex via the Responses API.

A major focus of GPT-5-Codex (and Codex generally) is code review. Codex can review PRs in GitHub, and in OpenAI’s evals, GPT-5-Codex produced fewer incorrect review comments and more “high-impact” feedback than GPT-5. You can read more about how the code review functionality works here.

 

Posit news

 

Last week was posit::conf! There were many AI-related talks and events.

If you attended (virtually or in person), you can watch sessions you missed in the conference portal. You can still register for a virtual ticket ($99) to get access to all recordings, or you can wait until the videos are posted to YouTube in December.

 

Terms

 

We’ve talked about agents in this newsletter before, but have never really defined what an agent is. The term can feel mysterious because there is no single, commonly agreed-upon definition, but that doesn’t mean that all agents are complicated. 

At the heart of the definition of an AI agent is the ability to make tool calls, a way for an LLM to interact with external systems like APIs. For example, an LLM that queries a weather API to figure out the current weather is making a tool call. 
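To make the weather example concrete, here is a minimal sketch of what a tool looks like from the harness side: a plain function plus a schema describing it to the model. The function name, schema shape, and stubbed weather data are all hypothetical, for illustration only; real providers each have their own tool-definition format.

```python
# A hypothetical tool: an ordinary function plus a schema the LLM
# provider uses to tell the model the tool exists. Weather data is
# stubbed so the sketch is self-contained.

def get_weather(city: str) -> str:
    """Return a (stubbed) current-weather string for a city."""
    fake_data = {"Boston": "12°C, cloudy", "Austin": "28°C, sunny"}
    return fake_data.get(city, "unknown")

# The description the model sees, so it knows when and how to call the tool.
get_weather_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# When the model emits a tool call, the harness runs the matching
# function and feeds the result back into the conversation.
call = {"name": "get_weather", "arguments": {"city": "Boston"}}
result = get_weather(**call["arguments"])
print(result)
```

The key point is that the model never executes anything itself: it only requests a call, and the surrounding program runs the function and returns the output.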

Generally, agents meet three conditions:

  1. An agent can learn about the world through tool calling (for example, query a weather API).
  2. An agent can make changes to the world through tool calling (for example, write the weather forecast to a file).
  3. An agent operates in a loop, making tool calls, feeding results back into the model, and repeating until the goal is met (for example, determine the current location and time, query the weather API, write the forecast to a file).

The last step is important because it means the agent can figure out its own path to its goals. It doesn’t need the user or another party to spell out which tool calls it should make and in what order. 
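The loop itself is simple. Below is a minimal sketch of the three conditions above, with the LLM stubbed out: `fake_llm` stands in for a real model call, and the tools, their names, and the "write to a file" behavior are all hypothetical stand-ins. In a real agent, each turn would send the conversation history to an LLM, which either requests the next tool call or declares the goal met.

```python
# A minimal agent loop: call the (stubbed) model, run the tool it asks
# for, append the result to the history, repeat until done.

results = {}  # stands in for the filesystem in this sketch

def get_location():
    return "Boston"

def get_forecast(city):
    return f"Forecast for {city}: 12°C, cloudy"

def write_file(path, text):
    results[path] = text  # stubbed: record instead of writing to disk
    return f"wrote {path}"

TOOLS = {"get_location": get_location,
         "get_forecast": get_forecast,
         "write_file": write_file}

def fake_llm(history):
    """Stub model: picks the next tool call based on what it has seen."""
    if not any("Boston" in h for h in history):
        return {"tool": "get_location", "args": {}}
    if not any("Forecast" in h for h in history):
        return {"tool": "get_forecast", "args": {"city": "Boston"}}
    if not any("wrote" in h for h in history):
        return {"tool": "write_file",
                "args": {"path": "forecast.txt", "text": history[-1]}}
    return {"done": True}

history = ["Goal: save the current local forecast to forecast.txt"]
while True:
    step = fake_llm(history)
    if step.get("done"):
        break
    out = TOOLS[step["tool"]](**step["args"])  # run the requested tool
    history.append(str(out))                   # feed result back to the model

print(results["forecast.txt"])  # Forecast for Boston: 12°C, cloudy
```

Notice that the user only stated the goal; the model (here, the stub) decided which tools to call and in what order, which is exactly what distinguishes an agent from a fixed pipeline of tool calls.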

Simon Willison has a similar definition. Anthropic’s article on building agents also has some helpful definitions and diagrams.

 

Learn more

 

  • This Economist article is a useful overview of the lethal trifecta for a general audience. 
  • BetterUp Labs and the Stanford Social Media Lab are investigating “workslop,” which they’ve defined as AI-generated content that looks polished but is largely unhelpful and shifts the work burden onto coworkers. Read about their findings here.
  • Anthropic’s September Economic Index report has some interesting findings about Claude usage, including usage patterns by country.  
  • Along with some other models, Qwen released Qwen3-Omni, an open-weights “omni” model that supports text, image, audio, and video.