In recent years, there has been much speculation about the possible emergence of artificial general intelligence (AGI): an AI that would match or exceed human intelligence across most or all tasks. Such an AI would reason, learn, remember, innovate and adapt to new environments without needing to be retrained. Its arrival could have widespread consequences, though it is almost certainly still quite some way off; the current generation of large language models, impressive as it is in places, falls well short of this definition. AI optimists predict AGI within the coming years, but without a working crystal ball, any prediction about timing is speculation. What is certain is that humanity should prepare for the advent of such an AI, since it would bring both opportunities and serious threats. Such an intelligence might cure diseases or help tackle climate change, but it would certainly displace many jobs and could be misused for cyberattacks or worse.
The assumption has been that such an AI would be a single monolithic model. However, an interesting paper published by Google DeepMind on 18th December 2025 raises another possibility: what if, instead of a single superintelligent model being created, AGI arose from a network of separate AIs? Just as humans in a company specialise in different functions such as engineering, sales and marketing, separate AI agents with complementary skillsets can be tasked to work together, and just as a well-organised group of people can achieve results no individual could manage alone, a group of agents could combine their abilities to do things no single agent could. At present, individual agents struggle with lengthy tasks, and an AI that is impressive in some areas can still fail at embarrassingly easy ones. In a recent benchmark of real-world, multi-hour tasks that humans had completed with ease, such as certain coding and design tasks, the current leading models failed in most cases, with the best achieving only a 2.5% success rate. Nonetheless, just as an observer might be surprised that a group of humans could build the Pyramids or send a rocket to the moon, it is conceivable that a group of agents could combine in an unexpected way and give rise to a spontaneous, distributed AGI. In such a situation, with no pre-planned safeguards, negative consequences could ensue.
The DeepMind paper proposes a multi-layered approach to AI safety that takes such a possibility into account. To start with, any agentic market (a network of agents interacting with one another) should be separated from the open internet and from real-world financial systems. A human should be kept in the loop to review the agents' outputs, though this is likely to be an imperfect safeguard on its own. In addition, there should be reliable shutdown mechanisms, containment of individual agents, alignment of each individual agent, and defences against malicious prompts. Careful monitoring should be established to raise alerts on behaviours that suggest an emerging AGI-level capability, and there should be regulation covering standards, auditability, legal accountability and governance of such projects. A minimal sketch of how these layers might be composed is shown below.
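To make the layering concrete, here is one purely illustrative way the safeguards could fit together in code. All of the names (Agent, SafetyHarness, kill_switch and so on) are hypothetical and do not correspond to the paper's design or to any real framework: agents run inside a harness that caps their actions (shutdown), logs every step (auditability), flags unusually long action sequences (monitoring) and refuses to release any output without human approval (human in the loop).

```python
# Illustrative sketch only: hypothetical names, not a real framework's API.
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Agent:
    """A stand-in for any LLM-backed agent: takes a task string, returns an output."""
    name: str
    run: Callable[[str], str]


@dataclass
class SafetyHarness:
    """Wraps a group of agents with shutdown, review and monitoring layers."""
    agents: List[Agent]
    max_actions: int = 100        # hard cap on activity: reliable shutdown mechanism
    alert_threshold: int = 10     # actions per task before an alert is logged
    shut_down: bool = False
    audit_log: List[str] = field(default_factory=list)

    def kill_switch(self) -> None:
        """Immediately stop all further agent activity."""
        self.shut_down = True

    def human_review(self, output: str) -> bool:
        """Human-in-the-loop gate: nothing leaves the sandbox unreviewed."""
        print(f"Proposed output:\n{output}")
        return input("Approve for release? [y/N] ").strip().lower() == "y"

    def monitor(self, actions_taken: int) -> None:
        """Flag behaviour that might indicate unexpected capability gains."""
        if actions_taken > self.alert_threshold:
            self.audit_log.append(f"ALERT: {actions_taken} actions on one task")

    def run_task(self, task: str) -> Optional[str]:
        actions = 0
        result = task
        for agent in self.agents:
            if self.shut_down or actions >= self.max_actions:
                self.audit_log.append("Shutdown triggered; task halted")
                return None
            result = agent.run(result)   # agents only act inside the sandbox
            actions += 1
            self.audit_log.append(f"{agent.name}: acted on task")
        self.monitor(actions)
        # Only release the result if a human approves it.
        return result if self.human_review(result) else None


if __name__ == "__main__":
    planner = Agent("planner", lambda t: f"plan for: {t}")
    coder = Agent("coder", lambda t: f"code implementing {t}")
    harness = SafetyHarness(agents=[planner, coder])
    harness.run_task("summarise quarterly sales data")
```

In any real deployment each layer would be far more sophisticated, but the structure illustrates the point the paper makes: no single safeguard is relied on alone.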
All this sounds sensible, but it is absolutely not the way things are being done today. Companies are building networks of AI agents apace, with essentially no thought given to such safety issues, let alone active measures to put these barriers in place. So far, no great harm has been done because, as we have seen, agents are still in their infancy, barely able to carry out reliably the simple tasks that humans manage with ease. However, we need to think ahead and consider what happens as the technology develops and improves. A huge amount of investment and a great deal of brainpower are currently going into the development of new, improved and more powerful AI models. At some point in the future, an unexpected cascade effect could occur whereby a particular combination of agents achieves a level of autonomy and capability that its human creators did not anticipate. History is full of scientific advances that were unexpected. Penicillin was discovered by Alexander Fleming entirely by accident. The first implantable heart pacemaker was developed by Wilson Greatbatch after he installed the wrong resistor in a circuit and it unexpectedly produced rhythmic pulses instead of a smooth trace. The cosmic microwave background was discovered when Arno Penzias and Robert Wilson tried to eliminate annoying background noise from a Bell Labs radio antenna. The microwave oven was invented when a radar engineer, Percy Spencer, noticed that chocolate in his pocket had been melted by the equipment he was working on.
There are many more examples of major technological advances that occurred not by design but by accident and dumb luck. Given the potentially severe negative implications of AGI, serious forethought should be applied to the safety aspects of current AI development. At present, we are stumbling along, paying scant attention to model alignment and safety, and we have already seen the negative consequences in widespread deepfakes and in the new cybersecurity risks associated with LLMs. The consequences of unexpected, unplanned AGI emergence could be grave, yet for now we seem to be relying mostly on the relative ineptness of today's AI tools to protect us. At the very least, governments should be introducing some level of AI safety regulation as a precaution.