The release of ChatGPT in November 2022 opened the eyes of much of the world to the power of large language models (LLMs). This is the core technology that underpins ChatGPT and rivals such as Claude, Gemini and Perplexity, as well as their image- and video-generating cousins Midjourney, Leonardo, Imagen, Firefly and Sora. Excitement about LLMs reached fever pitch, with ChatGPT reaching 100 million users by February 2023, faster than Netflix or Instagram reached that milestone. By 2025 that figure had grown to 800 million weekly active users, with over a billion queries processed every day. The growth was dramatic, but something odd was happening too. We are used to software getting better with each release, yet with LLMs progress was less clear; by some measures their performance has actually deteriorated. There are various theories as to why, but the problem is real. A study published in Nature examined the subject in depth and found that larger, more complex LLMs have indeed become less reliable.
Hallucinations are a major headache for anyone who works with LLMs regularly. Hallucination rates vary depending on a number of factors, but a large study in June 2025 found that, for the best-performing LLM tested, around 15% of queries contained hallucinations on average; Google’s Gemini scored a 43% hallucination rate. Curiously, the more advanced, recent LLMs actually hallucinate more than their predecessors. This is a gigantic problem for those hoping to deploy AI in corporate or industrial applications, where it is simply unacceptable to occasionally hallucinate a delivery address or an invoice amount. Most industrial processes expect an error rate of under 1%, and often far better than that: the target for “six sigma” process improvement is 3.4 errors per million. High hallucination rates are part of the reason why corporate AI deployments were failing at a dismal rate of 95%, according to a major MIT study in July 2025.
By 2025 even Yann LeCun, Chief AI Scientist at Meta and an AI pioneer, said “LLMs are a dead end”. A broader survey of 475 scientists in March 2025 found that 75% of respondents agreed with him, at least on the likelihood of LLMs leading to artificial general intelligence (AGI), a future state in which AI performs at a human level on any intellectual task. Certainly the widely derided launch of GPT-5 in August 2025 did nothing to dispel the scepticism. This is not to say that LLMs are not useful: they have clearly found use cases in many areas and situations. However, they may not get much better any time soon.
So, if LLMs have hit a wall, what might be next for AI? There are in fact many strands of AI research that show promise. Neuro-symbolic AI is an approach that combines neural networks with “symbolic AI”, which uses explicit rules and logical reasoning. To illustrate, a neural network might look at an image and recognise features such as curves, shapes and textures, highlighting areas that look like whiskers, pointy ears or a furry body. A symbolic component might then apply a rule saying that if an object has pointy ears, whiskers and a furry body, it is a cat. The combination of the neural network and the explicit rules may work better than either approach on its own. In particular, such an AI would likely hallucinate less than an LLM, as it is grounded in explicit rules. This is more than a research idea. Some fraud detection systems use neural networks for anomaly detection alongside symbolic rules for regulatory compliance. Some robots use neural vision but combine it with symbolic logic for navigation. Early commercial products following this approach include Imandra Universe from the early-stage Austin-based start-up Imandra, and there are many others. As with any system that relies on explicit rules, such as the early expert systems, a challenge will be dealing with novel or ambiguous situations.
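To make the idea concrete, here is a minimal sketch in Python of how such a hybrid might be wired together. The neural detector is stubbed out with fixed scores (a real system would use a trained vision network), and the feature names, threshold and rule are illustrative assumptions rather than any particular product’s design.

```python
# Minimal neuro-symbolic sketch: a (hypothetical) neural detector scores
# low-level features, and an explicit symbolic rule turns those scores
# into a grounded classification.

def neural_feature_scores(image) -> dict:
    """Stand-in for a trained neural network.

    In a real system this would be a CNN or vision transformer returning
    confidence scores for learned features; here it is stubbed out.
    """
    return {"pointy_ears": 0.92, "whiskers": 0.88, "furry_body": 0.95}

def symbolic_classifier(features: dict, threshold: float = 0.8) -> str:
    """Explicit, human-readable rule: pointy ears + whiskers + fur => cat."""
    if all(features.get(f, 0.0) >= threshold
           for f in ("pointy_ears", "whiskers", "furry_body")):
        return "cat"
    return "unknown"

if __name__ == "__main__":
    scores = neural_feature_scores(image=None)  # placeholder input
    print(symbolic_classifier(scores))          # -> "cat"
```

Because the final decision passes through a human-readable rule, the system’s reasoning can be inspected and audited in a way that a pure neural network’s cannot.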
A separate approach is multi-modal AI, in which systems are designed to integrate information from different data types, such as text, images, audio and video, and combine them in a common mathematical representation. Essentially, it combines several separate neural networks and allows the model to understand the relationships between their outputs. Results might include a text description of an image, or a video generated from a text prompt. Some existing AI products already incorporate this approach, including DALL-E 3 and Google Gemini. A practical example is autonomous vehicles, where data from cameras, LIDAR and radar is integrated in order to make driving decisions. The approach is particularly resource-intensive to train, since the source data (text, images, video) must be carefully collected and annotated to ensure coherence. On the other hand, combining multiple modes of input can give better results. Combining radiology images with patient notes, for example, can lead to greater accuracy than studying the images alone, and a customer service chatbot that could analyse a customer’s expression or tone of voice could give better answers than a text-only interface.
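The core mechanism can be sketched very simply: each modality gets its own encoder, and learned projections map their outputs into a single shared vector space where they can be compared. In the toy Python below the encoders and projection matrices are random stand-ins for trained networks; only the shape of the idea is real.

```python
# Multi-modal sketch: embeddings from two different modalities are projected
# into one shared vector space so they can be compared directly.
# The "encoders" here are random stand-ins for real text/image networks.
import numpy as np

rng = np.random.default_rng(0)

def encode_text(text: str) -> np.ndarray:
    """Stand-in for a text encoder (e.g. a transformer); returns a 256-d vector."""
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    return local.normal(size=256)

def encode_image(pixels: np.ndarray) -> np.ndarray:
    """Stand-in for an image encoder (e.g. a CNN); returns a 512-d vector."""
    return pixels.flatten()[:512]

# In a real model these projections would be learned so that matching
# text/image pairs land close together in the shared 128-d space.
W_text = rng.normal(size=(128, 256))
W_image = rng.normal(size=(128, 512))

def to_shared_space(vec: np.ndarray, W: np.ndarray) -> np.ndarray:
    z = W @ vec
    return z / np.linalg.norm(z)           # unit-normalise for cosine similarity

text_z = to_shared_space(encode_text("a cat on a sofa"), W_text)
image_z = to_shared_space(encode_image(rng.normal(size=(32, 32))), W_image)

print("similarity:", float(text_z @ image_z))  # cosine similarity in [-1, 1]
```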
Another strand of research is developmental AI, which seeks to mimic the way humans learn, such as children learning to walk, adapting on the basis of feedback. By learning from feedback, developmental AI systems should be more resilient to unforeseen circumstances, novel data and new situations, such as those encountered by robots moving across unknown terrain. The drawbacks are long development cycles and complexity. Examples of this approach can be found in self-driving cars and autonomous robots. Another example is the adaptive approach of the recommendation engines used by Netflix and Amazon, which learn a customer’s preferences over time and develop steadily better recommendations.
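A toy illustration of feedback-driven adaptation, in the spirit of a recommendation engine: preference weights start at zero and are nudged by each piece of user feedback, so recommendations improve with experience. The genre features and the simple update rule below are assumptions made purely for illustration, not any vendor’s actual method.

```python
# Toy sketch of feedback-driven adaptation: a recommender's preference weights
# are updated incrementally from user feedback, so behaviour improves over time.
import numpy as np

weights = np.zeros(3)          # learned preference for [action, comedy, drama]
learning_rate = 0.1

def recommend_score(item_features: np.ndarray) -> float:
    """Higher score = more likely to be recommended to this user."""
    return float(weights @ item_features)

def update_from_feedback(item_features: np.ndarray, liked: bool) -> None:
    """Nudge weights towards items the user liked, away from those they didn't."""
    global weights
    target = 1.0 if liked else 0.0
    error = target - recommend_score(item_features)
    weights += learning_rate * error * item_features

history = [
    (np.array([1.0, 0.0, 0.0]), True),   # liked an action film
    (np.array([0.0, 1.0, 0.0]), False),  # disliked a comedy
    (np.array([1.0, 0.0, 1.0]), True),   # liked an action drama
]
for features, liked in history:
    update_from_feedback(features, liked)

print(weights)  # weights drift towards the genres that received positive feedback
```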
Another AI paradigm is reinforcement learning, in which a model makes decisions by trial and error, receiving feedback from its environment in the form of rewards or penalties and learning to maximise the cumulative reward. An example of this approach is the technology of DeepMind, exemplified in the chess-playing program AlphaZero. Previous specialist chess programs used brute force to evaluate millions of possible moves and assessed positions using hand-crafted rules based on the experience and strategic principles of strong human players. By contrast, AlphaZero was taught only the basic moves and rules of chess and left to play itself again and again, learning for itself which strategies succeeded in practice. At first its play was very weak, but over millions of games against itself (44 million in all, taking a matter of hours) it became stronger and stronger until it was no longer improving. At that point it was pitted against the strongest conventional chess program in the world, Stockfish: in a hundred-game match, the results of which were published in 2018, AlphaZero won 28 games, drew 72 and lost none. This success showed the value of reinforcement learning as a technique. DeepMind had earlier used related techniques in AlphaGo, the first computer program to beat a human professional at Go, a game vastly more computationally complex than chess; it went on to defeat Lee Sedol, one of the world’s strongest players. DeepMind later applied its deep-learning expertise in AlphaFold, which predicts how proteins fold in three dimensions, an immensely challenging problem and one that is very important in drug discovery. That achievement won DeepMind’s Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry.
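The trial-and-error loop at the heart of reinforcement learning is easy to show in miniature. The sketch below runs tabular Q-learning on a toy five-cell corridor in which the agent is rewarded only for reaching the final cell; it illustrates the general technique, not AlphaZero itself, which combines self-play with deep neural networks and tree search.

```python
# Minimal tabular Q-learning sketch on a toy five-cell corridor: the agent
# starts at cell 0 and receives a reward only on reaching cell 4.
import random

N_STATES = 5
ACTIONS = (-1, +1)                        # move left or move right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.3     # learning rate, discount, exploration

def step(state, action):
    """Deterministic toy environment: reward 1 for reaching the final cell."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for _ in range(500):                      # episodes of trial and error
    state, done = 0, False
    while not done:
        if random.random() < epsilon:     # explore occasionally
            action = random.choice(ACTIONS)
        else:                             # otherwise exploit current estimates
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# After training, the greedy policy moves right from every non-terminal cell.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```

The agent is never told that "move right" is good; the preference emerges purely from the rewards it stumbles into, which is the essence of the approach AlphaZero applied at vastly greater scale.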
There are yet more AI research approaches. Causal reasoning AI attempts to model cause-and-effect relationships rather than mere correlations. It goes beyond recognising patterns in data and seeks to predict the effect of changes: for example, whether an outcome would still occur if a different action were taken. In healthcare, a causal AI system would analyse historical patient data to identify which treatments actually lead to better health outcomes, not just which treatments are correlated with them, and could predict how a patient’s health would change if a particular treatment were altered. The approach has been applied in a number of areas. In one case a US city applied causal AI to traffic accident data and found that adding protected bike lanes would reduce accidents significantly. Some banks have used causal AI to identify the causes of loan defaults rather than merely noting correlations, and Uber has used it to understand how pricing changes and incentives cause shifts in driver behaviour.
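The difference between correlation and causation can be demonstrated on simulated data. In the sketch below, illness severity is a confounder that makes a genuinely helpful treatment look almost useless in a naive comparison, while a simple back-door adjustment (stratifying by severity) recovers the true effect; all of the numbers are invented for illustration.

```python
# Causal-reasoning sketch: a naive correlation between treatment and recovery
# is distorted by a confounder (illness severity), while adjusting for that
# confounder recovers the true causal effect.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

severe = rng.random(n) < 0.5                       # confounder: severe illness
# Sicker patients are more likely to be treated...
treated = rng.random(n) < np.where(severe, 0.8, 0.2)
# ...and the treatment genuinely raises recovery probability by 0.15.
p_recover = 0.30 + 0.15 * treated - 0.20 * severe
recovered = rng.random(n) < p_recover

naive = recovered[treated].mean() - recovered[~treated].mean()

adjusted = 0.0
for s in (True, False):                            # stratify by the confounder
    mask = severe == s
    effect_s = (recovered[mask & treated].mean()
                - recovered[mask & ~treated].mean())
    adjusted += effect_s * mask.mean()             # weight by stratum size

print(f"naive difference:    {naive:+.3f}")        # biased downwards by confounding
print(f"adjusted difference: {adjusted:+.3f}")     # close to the true +0.15
```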
No one can really know what the future of artificial intelligence will look like. Perhaps one of these approaches will turn out to be best, or maybe a complementary approach, combining different AI techniques, will prove most effective. As the Nobel laureate Niels Bohr reportedly said, “Prediction is very difficult, especially if it is about the future.”