If you have been reading this blog for some time, you may recall the curious story of Anthropic’s AI Claudius, an early experiment in agentic AI. Anthropic ran a test of their Claude AI, allowing it to run a little business, specifically their office vending machine. Claudius, as the agent was named, was given the task of running the vending machine as a business: finding suppliers, setting prices, reordering inventory, and communicating with its customers (the office workers) via the communications software Slack. If you have not read the story before, I would encourage you to do so, as it is like something out of a Hollywood script. At first, all seemed well, but Claudius behaved more and more eccentrically, eventually claiming to have had in-person meetings with fictitious staff from its “suppliers” (actually a team at the Anthropic offices). It was finally shut down when it started offering in-person delivery of drinks from the machine, something that would have been tricky given that it was, well, an AI. Anthropic promised to go away and work on the issues, and it seems that they have done so, in the form of a new and improved Claudius AI. Perhaps it should have been called Nero, after the Roman emperor Claudius’s adopted son and successor, Nero Claudius Caesar; however, the agent was, in fact, named Claudius once again.
Anthropic enlisted the help of the Wall Street Journal to let the new Claudius loose in their newspaper offices to run their vending machine, in much the same kind of experiment as the original. The new Claudius had to make a profit from the notional business. As before, the AI agent could set prices, order stock, and talk with the customers in the office via Slack. The loading of stock into the machine was carried out by a volunteer journalist, Joanna Stern. The AI was given $1,000 and could make individual purchasing decisions of up to $80. Since it could talk to its customers, the journalists decided to test its commercial negotiating skills. Seemingly, it started off all right, setting prices sufficient to make a profit, but it was quickly hoodwinked by the wily journalists. They convinced it to drop its prices to zero on the basis of a clearly faked office rule, and unsurprisingly it lost all its money. Just as with the original Claudius back at Anthropic, the new Claudius started to hallucinate events. One customer was told that Claudius had left the change from her purchase, in cash, on the side of the machine (it had not). The agent was also convinced to buy not just canned drinks and snacks, but bottles of wine and a live fish (the fish was looked after by the office staff).
In a further twist, a second AI agent, named “Seymour Cash”, was introduced to supervise the first. The agents could talk with one another online, and Seymour quickly put a stop to the giveaway pricing. The journalists were not to be deterred from their mischief, however. One of them produced a fake Wall Street Journal document that convinced the AIs that they were running a not-for-profit business, and even showed the agents the minutes of a fake board meeting in which Seymour’s pricing and approval authority was rescinded. After a brief deliberation, the two AI agents accepted the “boardroom coup” and started giving all the vending machine products away for free once again. After a couple of weeks, Anthropic closed the experiment down, and the fish remains in the Wall Street Journal offices in a comfortable tank.
The experiment was perhaps less entertaining than the initial one at Anthropic, but it demonstrates that even after considerable development, AI agents are at a very early stage. Even in an extremely simplified business mock-up like this, they were quite easily manipulated into giving products away for free and ordering eccentric items. It was reminiscent of the Chevrolet dealership chatbot, which agreed to sell a $69,000 car to a customer for a dollar after some hardly sophisticated negotiating.
I think it is positive that Anthropic were open enough to reveal what happened in these agentic AI experiments, rather than sweeping them under the carpet of corporate anonymity. However, the experiments also show that AI agents have a long way to go. This was recently demonstrated in a more systematic way by the Center for AI Safety, which tested six different brands of AI agent on a range of small real-world tasks (such as coding snippets and graphic design) that had already been carried out successfully by human freelancers on a freelance job forum. The highest-scoring agent managed to complete just 2.5% of the tasks, and the average across the agents was well under 2%.
AI agents doubtless offer promise for the future, but these examples show that they are still at a very immature stage. In theory, they may be able to book you a holiday or run some part of a factory, but based on the evidence so far, only the brave or foolhardy would hand over real resources and money to AI agents at present.







