2025-09-26 AI Newsletter

External news
Anthropic model degradation incident
Over the past month, Anthropic experienced significant model reliability issues, with Claude sometimes producing notably worse or nonsensical responses. According to Anthropic’s postmortem, three separate, overlapping bugs degraded model quality for several weeks.
Anthropic noted that diagnosing and fixing the issues was complicated both by the overlapping failures and by their own privacy protections, which limited engineers’ ability to inspect user conversations.
The postmortem stressed that the issues were infrastructure bugs, not the result of Anthropic intentionally degrading model quality in response to load or demand, as some users had suspected.
These incidents damaged trust in Claude’s reliability and in Anthropic as a company. Many users seem to have felt that Anthropic was slow to acknowledge and fix the problems, a particular source of frustration for those on the $200/month Max plan.
Codex updates
OpenAI released updates to Codex, their software engineering agent, along with GPT-5-Codex, a new model trained specifically for coding that OpenAI now recommends using with Codex (before this release, the recommendation was GPT-5).
Our impression is that the recent changes to Codex, together with the release of the GPT-5 models, have substantially improved Codex and made it a strong competitor to tools like Claude Code.
There are multiple ways to use Codex, which is included in any paid ChatGPT subscription:
- In the cloud by connecting to GitHub (Codex Cloud)
- From the command line with the Codex CLI
- Inside an IDE, with the Codex CLI and the VS Code extension. The extension works with Positron, so Codex can be used alongside Positron Assistant.
You can also access GPT-5-Codex via the Responses API.
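If you’re calling the model programmatically, it’s a standard Responses API request. Here’s a minimal sketch using the official openai Python package; note that the "gpt-5-codex" model string is our assumption based on the announcement’s naming, so check OpenAI’s model list for the exact identifier.

```python
# Minimal sketch: calling GPT-5-Codex through the Responses API with the
# official `openai` Python package. Assumes OPENAI_API_KEY is set in the
# environment; the "gpt-5-codex" model string is an assumption based on
# the announcement.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5-codex",
    input="Refactor this function to remove the duplicated error handling.",
)

print(response.output_text)
```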
A major focus of GPT-5-Codex (and Codex generally) is code review. Codex can review PRs in GitHub, and in OpenAI’s evals, GPT-5-Codex produced fewer incorrect review comments and more “high-impact” feedback than GPT-5. You can read more about how the code review functionality works here.
Posit news
Last week was posit::conf! There were many AI-related talks and events, including:
- Garrick Aden-Buie and Joe Cheng’s Programming with LLM APIs workshop. Check out the publicly available workshop materials.
- Hadley Wickham and Joe Cheng’s AI-focused keynote, which covered Databot, Positron Assistant, and querychat, plus a demo of ggbot2, a voice assistant that lets you build ggplot2 plots by talking out loud to an LLM. ggbot2 is built with shinyrealtime, a new package that integrates OpenAI’s Realtime API with Shiny.
If you attended (virtually or in person), you can watch sessions you missed in the conference portal. You can still register for a virtual ticket ($99) to get access to all recordings, or you can wait until the videos are posted to YouTube in December.
Terms
We’ve talked about agents in this newsletter before, but have never really defined what an agent is. The term can feel mysterious because there’s no single, commonly agreed-upon definition, but that doesn’t mean that all agents are complicated.
At the heart of the definition of an AI agent is the ability to make tool calls, a way for an LLM to interact with external systems like APIs. For example, an LLM that queries a weather API to figure out the current weather is making a tool call.
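Concretely, “making a tool call” means the model emits a structured request that your code then executes; the model never runs anything itself. Here’s a hedged sketch using the OpenAI Responses API, where the get_weather tool, its schema, and the "gpt-5" model string are hypothetical names for illustration.

```python
# Sketch of exposing a single tool to an LLM. The `get_weather` schema is
# hypothetical; when the model decides the tool is needed, it responds
# with a structured function call (not prose) for our code to execute.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.responses.create(
    model="gpt-5",  # assumed model name
    input="What's the weather in Toronto right now?",
    tools=tools,
)

# Inspect the model's output for tool-call requests.
for item in response.output:
    if item.type == "function_call":
        print(item.name, json.loads(item.arguments))
```

The key point is the division of labor: the model decides which tool to call and with what arguments, while the application actually performs the call and reports the result back.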
Generally, agents meet three conditions:
- An agent can learn about the world through tool calling (for example, query a weather API).
- An agent can make changes to the world through tool calling (for example, write the weather forecast to a file).
- An agent operates in a loop, making tool calls, feeding results back into the model, and repeating until the goal is met (for example, determine the current location and time, query the weather API, write the forecast to a file).
The last condition is important because it means the agent can figure out its own path to its goal: it doesn’t need the user or another party to spell out which tool calls to make and in what order.
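Put together, an agent can be surprisingly little code. The toy sketch below (reusing the hypothetical get_weather tool and assumed "gpt-5" model string from above) shows the loop: call the model, run any tool calls it requests, feed the results back, and repeat until the model answers in plain text.

```python
# Toy agent loop: call the model, execute any tool calls it requests,
# append the results to the conversation, and repeat until the model
# replies without requesting a tool. Schema and model name are assumed.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    """Stub standing in for a real weather API call."""
    return f"14°C and overcast in {city}"

TOOLS = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def run_agent(goal: str, max_turns: int = 10) -> str:
    history = [{"role": "user", "content": goal}]
    for _ in range(max_turns):
        response = client.responses.create(
            model="gpt-5", input=history, tools=TOOLS
        )
        # Keep the model's output (including its tool-call requests) in
        # the conversation so the next turn has full context.
        history += response.output
        calls = [o for o in response.output if o.type == "function_call"]
        if not calls:
            return response.output_text  # no tools requested: done
        for call in calls:
            result = get_weather(**json.loads(call.arguments))
            history.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": result,
            })
    raise RuntimeError("agent did not finish within max_turns")

print(run_agent("Write a one-line weather report for Toronto."))
```

Note that the loop, not the model, enforces the stopping conditions (max_turns here); production agents layer on error handling, permissions, and logging, but the core shape is this simple.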
Simon Willison has a similar definition. Anthropic’s article on building agents also has some helpful definitions and diagrams.
Learn more
- This Economist article is a useful general-audience overview of the lethal trifecta: the risky combination of an agent having access to private data, exposure to untrusted content, and the ability to communicate externally.
- BetterUp Labs and the Stanford Social Media Lab are investigating “workslop,” which they’ve defined as AI-generated content that looks polished, but is largely unhelpful and shifts the work burden onto coworkers. Read about their findings here.
- Anthropic’s September Economic Index report has some interesting findings about Claude usage, including usage patterns by country.
- Qwen released several new models, including Qwen3-Omni, an open-weights “omni” model that supports text, image, audio, and video.