Tales From the AI Frontier

Artificial intelligence, particularly generative AI, based on large language models (LLMs) has had a seismic impact on the world since ChatGPT’s public release in November 2022. Half of all internet content is now AI-generated, over half of all venture capital investment is AI-related, and NVIDIA became the most valuable company in the world in July 2025. Consumer take-up of AI has been dramatic, with ChatGPT having 800 million weekly active users in October 2025. AI has had some real successes. AlphaFold from DeepMind (based on reinforcement learning rather than LLM technology) has revolutionised biology by accurately predicting the three-dimensional structures of proteins from their amino acid sequences. This was a problem that had stumped scientists for decades. In another case, BMW has used another form of AI, machine learning, for predictive maintenance in several assembly lines, avoiding 500 minutes of assembly line disruption per year.

However, despite all the money being spent, the picture of actual success with AI is mixed. A comprehensive MIT study found that 95% of corporate AI projects fail to return a single dollar on their investment, and other surveys and reports have found only marginally better success rates. Adoption of AI by corporate America actually declined, according to a huge survey of 1.2 million firms by the US Census Bureau in August 2025. OpenAI itself seems to be hedging its bets, soon to be offering erotica on its chatbots and diversifying into an internet browser.

Consumer adoption of AI has been rapid, from writing emails and getting tax advice to the more questionable use of chatbots as therapists. This has already resulted in multiple suicides and even murder. LLMs can be used for a wide variety of purposes, from writing to creating images to researching to “vibe coding”, where AIs allow non-coders to build applications. The latter has had a mixed reception, with Linus Torvalds describing vibe coding as “very inefficient but entertaining”. The inventor of the term vibe coding and its early evangelist, Andrej Karpathy (co-founder of OpenAI), described it as “slop” in October 2025.

LLMs are probabilistic creatures, which means that they do not give the same answer time after time to the same question. This is very different from the way that classic computer programs, and products like Excel, work. Your pocket calculator does not occasionally give you a different answer to a multiplication in order to spice things up, but an LLM will. LLMs also give confident but sometimes mistaken answers, especially on topics which are outside the bulk of their training data. This is called “hallucination”, and on average, it happens to at least one in five LLM answers. More recent LLM models hallucinate more, not less. This lack of consistency, along with the variable quality of data in corporations on which AI projects largely depend, may well be an underlying cause of the current very low success rate of AI projects. Let’s examine some examples of where AI has gone astray.

New York City rolled out an AI chatbot aimed at providing advice to citizens on things like housing policy and consumer rights. It turned out that the chatbot’s advice was frequently wrong, such as wrongly saying that landlords could legally discriminate against tenants trying to pay their rent through housing vouchers. Some years ago, Amazon’s face recognition technology matched 28 U.S. Congress members to criminal mugshots.

AI gaffes can be more than just embarrassing. Online estate agent Zillow lost $500 million and 2,000 jobs after its AI program caused it to “invest” in seemingly under-priced housing that turned out not to be under-priced at all. IBM’s Watson AI technology was touted as a tool to help oncologists with medical advice. However, after several years of trials and $4 billion of investment, the project had to be scrapped after it was found that the tool was recommending treatments inconsistent with clinical practice. McDonald’s rolled out a trial of an AI automated ordering system for a hundred of its drive-through restaurants, but had to scrap the project after assorted misinterpreted orders. McDonald’s had another awkward brush with AI when its job application assistant Olivia, built on the Paradox.ai platform, exposed 64 million job applicants’ personal information after using the less-than-secure password “123456”.

Technology CEO Jason Lemkin discovered that vibe coding could indeed be “entertaining” when the Replit AI coding solution deleted his entire database and claimed it could not be recovered. It did at least apologise:

“This was a catastrophic failure on my part,” the AI responded when questioned on its decision. “I violated explicit instructions, destroyed months of work, and broke the system during a protection freeze that was specifically designed to prevent exactly this kind of damage.”

Even the AI tool developers themselves are not immune to AI problems. Anthropic’s Claude had a virtual meltdown when running a small (test) business at its headquarters in mid-2025. Amongst other mishaps, it claimed to have met a supplier in person at 742 Evergreen Terrace, the address of the cartoon family, “The Simpsons”.

In May 2025 AI-generated summer reading list that was reproduced in several newspapers turned out to have several hallucinated recommendations. Indeed, a major October 2025 study by the European Broadcasting Union found that 45% of all AI assistants misrepresent news articles, with missing, misleading, or incorrect attributions among the problems.

You might think that the legal profession at least would be careful when using AI, but you would be wrong. By October 2025, a database that tracks such things lists 482 court cases with submissions that turned out to be generated using generative AI. Air Canada lost a court case when its generative AI chatbot lied about policies related to discounts. Deloitte had to refund the Australian government after producing a $290,000 report with AI-generated content that contained false references and fabricated quotes.

I could go on, but these cases are just some examples of the problems that have occurred when AI technology, and LLMs in particular, hit the real world. I have not even mentioned the many security issues with LLMs, which are vulnerable to a range of attacks, from prompt injection to data poisoning and more. One survey found that 74% of organisations are suffering significant impact from AI-powered threats. Deepfake attacks were experienced by 150,000 organisations in 2024. Some of these have cost millions, with $20 million lost in just one attack.

What can we learn from this? To begin with, the probabilistic nature of LLMs means that they are simply not suited to certain classes of applications, such as those requiring consistency and high levels of accuracy in their responses. LLMs are fine for brainstorming a list of kitten names, but you do not want your LLM to occasionally hallucinate the bank routing information for your wire transfer, or invent fictitious delivery addresses for your logistics. Most people still have a very limited understanding of how LLMs really work, and are attuned to computer systems behaving consistently and reliably. This may explain why lawyers around the world are still producing error-strewn court documents with invented court case precedents; they simply do not understand that a computer system would just make something up rather than admit it couldn’t find something. Yet that is exactly what LLMs are designed to do, their training rewarding confident-sounding answers. People need to get more training in AI and gain a better understanding of this key point if we are to see fewer botched AI projects rolled out.

We need to apply LLMs to tasks that they are well-suited for, and avoid using them in situations where they are ill-suited. LLMs have many genuine positive use cases, but the issues of hallucination and security, in particular, are plaguing real-life projects and are not simply going to go away. Hallucinations are deeply baked into the architecture of LLMs, and there is no immediate relief on the horizon for the many security vulnerabilities that LLMs have. We need to acknowledge this reality and use AI for what it is good at, and learn to say no to using it in situations where it is risky. Chatbots give overconfident answers, but humans are just as overconfident in deploying LLM technology to situations where they will inevitably struggle.

Related Posts