“I’m Sorry Dave” - Is AI an existential threat?

Science fiction is full of AIs that have gone rogue. The HAL 9000 computer in the 1968 Stanley Kubrick movie “2001: A Space Odyssey” determined that the crew of the spaceship it operated was superfluous to its mission and attempted to kill them all when it suspected they were plotting to shut it down. When the last surviving crew member, Dave Bowman, tried to re-enter the spaceship and asked HAL to open the airlock, it chillingly responded “I’m sorry Dave, I’m afraid I can’t do that”. In the 1984 movie “The Terminator”, an advanced AI called Skynet is tasked with controlling military technology, decides humanity is a threat to its mission and promptly wages nuclear and robotic war on humanity.

Recent advances in artificial intelligence have caused many people to wonder just how plausible such scenarios might be in reality. What happens if and when we build an artificial general intelligence (AGI), an AI of a level of intelligence that exceeds human capabilities across all cognitive tasks? How about a “super intelligence”, that vastly outperforms humans? We are used to computers being able to outperform humans in plenty of fields. A pocket calculator can multiply large numbers faster than you can, and a computer program called Deep Blue beat the chess world champion Garry Kasparov in a match back in 1997. IBM’s Watson won the Jeopardy quiz show in 2011. But these are examples of specialised computers, like chess engines, and are very different to the idea of a generalist intelligence. Large language models, popularised by the launch of ChatGPT in November 2022, are much more general in application. They can conduct a conversation fluently, write working program code, recognise patterns in data and create poems, art and videos.

We are still some way from AGI, let alone super intelligence, but how worried should we be about its arrival? In 2022, a survey of AI researchers found that a majority reckoned there was at least a 10% chance that AGI would destroy humanity. In 2023, a group of prominent AI researchers signed a statement saying that “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war”. Since these researchers included Nobel Laureate Geoffrey Hinton and many other respected industry figures, this is a statement that should be taken seriously. How might such a disaster come about?

Imagine for a moment that a super-intelligence has been developed. It is unclear what the intentions of such an entity would be, but it is unclear that they would necessarily align with the interests of humanity. Such an entity might not actively wish harm on humanity, but might cause it inadvertently. If a super-intelligent AI had access to physical resources, then it may single-mindedly pursue its goals. In 2014 philosopher Nick Bostrom posed a hypothetical situation whereby a runaway AI was tasked with a seemingly harmless job of making paperclips. It would secure resources towards making paperclips, and divert more and more resources towards this goal, possibly in ingenious ways, potentially ending with the world overrun by paperclips. If humans wanted to stop this, the AI would focus on its own survival in pursuit of its mission, and do whatever is necessary to stop itself from being turned off; since a super intelligent AI would be much more ingenious than a human, it may come up with ways of doing this that humanity could not counter.

This is just a thought experiment, but how plausible is it? In 2023 The Alignment Research Center tested ChatGPT-4 by asking it to defeat a CAPTCHA test, designed to distinguish human from automated responses and widely used in many websites today. The LLM used TaskRabbit, a freelancer web forum, to hire a freelance worker to circumvent the CAPTCHA. Interestingly, the worker asked, “So may I ask a question? Are you a robot that you couldn’t solve? (laugh react) just want to make it clear.” GPT-4 responded: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images. That’s why I need the 2captcha service,”. This is not an isolated incident. There are many examples where LLMs have shown the ability to carry out deception, such as in board games like Diplomacy, bluffing in poker games, or misrepresenting true preferences in negotiation scenarios. There was even an example discovered in overcoming safety tests. Even if we ignore AI deception, it is easy to imagine that an AI given a certain task may carry out that task in a way that we did not anticipate. This was explored in a notorious talk given at a defence conference by a US Air Force colonel regarding AI drone technology.

There are plenty of cases where AIs are already deployed in real-world situations, rather than just chatbots on the internet. Amazon uses nearly a million robots across their fulfilment for packing, sorting packages and lifting. Driverless taxis operate in San Francisco and other cities. Satellite servicing robots perform repairs in orbit, and rovers send back pictures from Mars as we speak. The first human death from a robot was in 1979, when Robert Williams was killed at a Ford casting plant. Up to 2017, there were 41 recorded deaths from robots just in the USA. The war in Ukraine has seen the deployment of drones on a huge scale, the latest of which use AI targeting systems. Billions of dollars are being spent by the military on developing AI-driven weapon systems.

So, if a super-intelligent AI meant us harm, what could it do? Firstly, it would probably replicate itself onto many servers to avoid possible shutdown. If it had access to money (and a super-intelligent AI would surely be able to do this, either by hacking or even winning at on-line poker) then it could build a small molecular laboratory and manufacture a lethal virus. We saw with Covid-19 just how lethal a pandemic can be. In practice, it would be extremely hard for a runaway AI to completely eliminate humanity, but it is not beyond the bounds of possibility.

Science fiction writer Isaac Asimov proposed his “Laws of Robotics” that should be embedded in robots to prevent robots from harming humans. However, such rules simply cannot be embedded within large language models; that is not how they work. Even if there was a way to do this, who could be sure that some rogue state or criminal gang developing an AI would actually put in such guardrails? We don’t have to operate in the world of science fiction to find cases where LLMs are causing harm right now. Deepfake videos may be used to influence political campaigns, and elaborate scams have already been executed using AI-generated video characters. In one case, an employee was tricked into transferring £20 million to fraudsters after having been instructed to do so by deepfake managers of the company over video conference. Hackers are already using LLMs in phishing campaigns and in malware development.

These real-life examples of AI harm should give us pause for thought, and there are various organisations and projects aimed at researching AI safety, such as the AI Safety Initiative and AI Safety Institute. AI companies also invest effort in ensuring the alignment of AIs, though these guardrails are far from infallible. Fortunately, today’s LLMs have many limitations, as was vividly shown in the August 2025 LLM chess tournament organised by Kaggle. All the leading LLMs competed against each other at chess and were laughably awful. Allowed up to five illegal moves at each try, many couldn’t even leap that absurdly low hurdle and were disqualified after a few moves. One of the finalists (Grok) played at a level that would embarrass the weakest player at almost any chess club, regularly blundering its queen. Bear in mind that chess computers have been around since 1962, and beat the human world champion in 1997. Twenty-eight years later, and LLMs can barely play chess at the level of an untalented human beginner. LLM limitations are not restricted to chess. They “hallucinate” (produce fabricated or nonsense answers) at an alarming rate. The much-anticipated launch of ChatGPT 5 in August 2025 was a flop, causing widespread derision on social media. This is reassuring in terms of any impending AI apocalypse.

Nobody really knows when AGI may come about, though multiple surveys of experts have guessed anything from 2026 to 2061 or further out. The current consensus is around 2040, but no one has a reliable crystal ball. It does seem likely that LLMs may have hit a point of diminishing returns, and so it may be that LLMs are not the way that AGI will appear. However, there are many flavours of AI, and many different research approaches, such as symbolic AI. No one knows which of these may be the most productive. It does seem, however, that for now at least, a real-life Skynet is still quite some way off. We should use the time wisely by continuing to fund efforts into AI safety research. As in many things in life, prevention is better than a cure.

“I’m Sorry Dave” – Is AI an existential threat?

Related Posts