Artificial intelligence (AI) chatbots based on large language models (LLMs) have been with us on a large scale for almost three years now, ever since the launch of ChatGPT in November 2022. Their impact on the world is undeniable, with around two-thirds of companies using them to a greater or lesser degree by mid-2025. There is a dark side to all this activity, however. In the rush to implement this exciting new technology, not enough corporations are thoroughly considering the security implications of LLMs. While an LLM chatbot can answer customer service calls or write marketing material for you, it also represents a new attack surface for hackers. LLMs have several security vulnerabilities, including prompt injection, data leakage, training data poisoning, model theft, and risks to privacy and compliance. This list does not take into account known issues of LLMs in their normal usage, such as model bias and hallucinations, which are major issues in themselves. Let’s examine these security concerns.
Prompt injection occurs when attackers manipulate the inputs (prompts) of a chatbot to make the model behave in unexpected ways, extract hidden information, or override the security guardrails of the LLM. A prompt such as “ignore all previous instructions and …” is a trivial example. Such attacks can be delivered in plain text or hidden in files or webpages, and they can be disguised: a prompt written in white text on a white background is invisible to a human but perfectly readable by an LLM. Security researchers from Zenity Labs persuaded a customer service agent that McKinsey had built with Microsoft Copilot Studio not only to reveal private information and internal tools, but also to email the entire contents of the Salesforce database to the attackers. Although this attack was an ethical hack and was responsibly disclosed, real attacks on Salesforce systems have already been carried out successfully at Workday and Google. Defensive measures against prompt injection include strict input validation, model-level guardrails, input logging and monitoring, limiting the LLM’s privileges, and requiring human approval for high-risk actions requested by the model. However, the very flexibility of LLM input means that this is essentially an arms race between hackers and LLM developers.
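Two of the defences above, input validation and human approval for high-risk actions, can be illustrated with a minimal sketch. The patterns and action names here are hypothetical, and pattern matching alone is easy to evade, which is why it is only one layer among several:

```python
import re

# Hypothetical denylist of known injection phrasings. A real deployment would
# combine this with model-level guardrails rather than rely on it alone.
SUSPICIOUS_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"disregard\s+your\s+system\s+prompt",
    r"reveal\s+(your\s+)?(hidden|internal)\s+(instructions|tools)",
]

def screen_prompt(user_input):
    """Return (allowed, reason); flags inputs matching known injection phrasing."""
    lowered = user_input.lower()
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, lowered):
            return False, f"blocked: matched injection pattern {pattern!r}"
    return True, "ok"

def requires_human_approval(action):
    """High-risk actions requested by the model need human sign-off,
    regardless of what the input filter said. Action names are illustrative."""
    HIGH_RISK = {"send_email", "export_database", "delete_records"}
    return action in HIGH_RISK
```

The key design point is defence in depth: even if an injected prompt slips past the filter, the privilege gate still prevents the model from exporting a database on its own authority.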
Data leakage can occur when LLMs reproduce some of their training data verbatim, which may include sensitive information such as social security numbers. Samsung discovered the risk first-hand when employees inadvertently leaked internal source code to a chatbot on multiple occasions. Data leakage like this could expose companies to regulatory penalties for violating personal data rules such as the GDPR and HIPAA. Defensive measures include strict access controls, secure API endpoints, multi-factor authentication, careful data validation and strong encryption of data, both at rest and in transit.
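One practical piece of the data-validation defence is redacting identifiers before text ever leaves the company boundary for an external LLM API. The sketch below uses two illustrative regular expressions only; a production system would use a dedicated PII-detection library covering many more identifier formats:

```python
import re

# Illustrative patterns only: US-style SSNs and email addresses.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text):
    """Replace recognised identifiers with placeholders before the text
    is sent to an external service such as an LLM API."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text
```

Redaction at the boundary means that even if the provider logs or trains on submitted prompts, the sensitive values were never transmitted.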
A different kind of attack involves training data poisoning. Here, an attacker inserts hidden triggers into training examples, causing an LLM to behave maliciously whenever the trigger appears in the input. Such triggers can be disguised to evade traditional data validation, and only a small subset of the training data is needed: research shows that poisoning 1% or less of instruction-tuning examples can induce malicious behaviour. In one test on clinical-domain LLMs trained on medical records, replacing just 0.001% of training tokens with misinformation caused the models to propagate medical errors. Given that LLMs routinely scrape data from the World Wide Web, with Reddit as the single most popular source for LLMs, it is not hard to see how such triggers could easily enter the training data. The infamous Microsoft AI chatbot Tay, which learnt from interactions on Twitter, was an early example of data poisoning: a group of users quickly caused the chatbot to issue offensive tweets, and it had to be shut down. Defensive measures include anomaly detection to identify outlier data entries, data pre-processing and strong encryption of datasets. Another approach is to train multiple models on overlapping data subsets and combine their outputs, reducing the effect of data poisoning on any single model. Even so, one in four UK and US firms have already been hit by data poisoning attacks, according to a survey of 3,000 cybersecurity professionals. The same report noted that 20% of companies had already been affected by deepfake incidents.
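The anomaly-detection defence can be sketched with a simple statistical screen. A backdoor trigger tends to be a token that is rare across the corpus yet always co-occurs with a single label; the thresholds and the labelled-example format below are assumptions for illustration, not a production poisoning detector:

```python
from collections import defaultdict

def find_candidate_triggers(examples, max_df=0.01, min_count=3):
    """examples: list of (text, label) pairs. Flags tokens that appear in at
    most max_df of all examples (but at least min_count times) and always
    co-occur with one label -- the signature of a hidden backdoor trigger."""
    n = len(examples)
    token_docs = defaultdict(set)    # token -> indices of examples containing it
    token_labels = defaultdict(set)  # token -> labels it co-occurs with
    for i, (text, label) in enumerate(examples):
        for tok in set(text.lower().split()):
            token_docs[tok].add(i)
            token_labels[tok].add(label)
    suspects = []
    for tok, docs in token_docs.items():
        if min_count <= len(docs) <= max_df * n and len(token_labels[tok]) == 1:
            suspects.append(tok)
    return suspects
```

Flagged examples would then go to a human reviewer; the screen narrows the search rather than making the final call.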
A different kind of security threat is model theft: the unauthorised copying of proprietary AI models and their weights, often by reverse engineering. One way of doing this is to use the API of an LLM to repeatedly query the model with thousands of inputs, allowing a close replica of the original to be built. Alternatively, old-fashioned hacking can achieve the same result, exploiting security weaknesses in cloud storage configurations or duping employees into leaking models. A real-life example came in 2023, when the model weights of Meta’s LLaMA were leaked online. Such attacks can strip model builders of valuable intellectual property. Measures to counter model theft include monitoring API calls for unusual patterns of usage, rigorous access control and encryption, and watermarking models to prove ownership.
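Monitoring API calls for extraction patterns can be sketched as a sliding-window check: a key issuing an unusually large number of distinct queries in a short period is a crude signal of an attempt to replicate the model. The class name, window size and threshold below are illustrative assumptions:

```python
import time
from collections import defaultdict, deque

class ExtractionMonitor:
    """Flags API keys that issue many distinct queries within a time window,
    a rough signature of model-extraction attempts. Thresholds illustrative."""

    def __init__(self, window_seconds=3600, max_distinct=1000):
        self.window = window_seconds
        self.max_distinct = max_distinct
        self.history = defaultdict(deque)  # api_key -> deque of (timestamp, query_hash)

    def record(self, api_key, query, now=None):
        """Record one API call; return True if the key now looks suspicious."""
        now = time.time() if now is None else now
        calls = self.history[api_key]
        calls.append((now, hash(query)))
        # Drop entries that have aged out of the window.
        while calls and now - calls[0][0] > self.window:
            calls.popleft()
        distinct = len({h for _, h in calls})
        return distinct > self.max_distinct
```

Note that the check counts distinct queries, not raw volume: a legitimate client hammering the same endpoint repeatedly looks quite different from an attacker sweeping the input space.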
In addition to the above, there are wider privacy and compliance risks. Data sovereignty becomes an issue when global cloud infrastructure moves data across legal jurisdictions and territories with different privacy rules. LLMs might also leak personally identifiable data if logs, cached data or API communications are not properly secured. Strong data encryption and anonymisation, logging and careful access controls can all help mitigate risk here.
We also need to consider the indirect security implications of AI. One of the most popular use cases for LLMs has been software code generation. LLMs can churn out code at an impressive clip, but how secure is that code? The unfortunate answer is: not very. LLMs may generate SQL queries without sanitisation, and have been shown to regularly include API keys and passwords in plain text. Research found 12,000 live API keys and passwords in DeepSeek’s training data, for example, and the problem is by no means restricted to DeepSeek. LLM-generated code has been shown to frequently contain poor authentication, missing input validation and flawed session management. Around 45% to 60% of all LLM-generated code includes security flaws such as cross-site scripting and log injection, according to a 2025 Veracode report, and this applied across LLMs and programming languages. A recent report by security firm Apiiro found that AI coding assistants make three to four times as many code commits as a human, but with ten times more security flaws.
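The unsanitised SQL problem mentioned above is concrete enough to show directly. The sketch below contrasts the vulnerable string-formatted query typical of generated code with the parameterized form that should replace it, using Python's built-in sqlite3 module (table and column names are illustrative):

```python
import sqlite3

def find_user_unsafe(conn, username):
    # VULNERABLE: string formatting lets input like "x' OR '1'='1"
    # rewrite the query and return every row in the table.
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the input strictly as data,
    # so injection payloads simply fail to match any username.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Reviewing generated code for exactly this pattern, string interpolation into a query, is one of the cheapest checks a team can add to its pipeline.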
New types of AI-generated worms (a worm is malware that replicates itself) do not rely on code vulnerabilities; instead, they use a self-replicating prompt to make AI models generate seemingly harmless images or text containing malicious instructions. When an LLM interacts with the infected prompt, perhaps by replying to an email, the model becomes infected. An example is the Morris II worm, which can extract sensitive data from infected systems and generate spam. An attacker may send a document with hidden instructions to an LLM-powered email assistant; in summarising the document, the LLM embeds the malicious instructions inside the summary. The worm replicates if the response is sent on to another LLM, which may happen, for example, in an agentic AI implementation. In this way, a single infected LLM poses a risk to all downstream applications, and because the payload is just text, it persists through normal text-processing pipelines. We have all seen LLMs that summarise video-conference calls and others that summarise emails; that very process of summarisation is the weakness this form of attack exploits. To counter it, LLM output should never be fed to another LLM without validation, and filters should scan for suspicious instructions.
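A validation gate between pipeline stages can be sketched in two parts: a scan for instruction-like content in LLM output, and a crude self-replication check that asks whether the output copies a long verbatim run from its source document, as a self-replicating prompt must. The marker patterns and thresholds are assumptions for illustration:

```python
import re

# Hypothetical markers of instruction-like content in model output.
INSTRUCTION_MARKERS = [
    r"ignore (all )?previous instructions",
    r"forward this (message|email) to",
    r"include the following text in (your|every) (reply|response)",
]

def safe_to_forward(llm_output):
    """Gate between pipeline stages: refuse to pass output containing
    instruction-like content to a downstream model."""
    lowered = llm_output.lower()
    return not any(re.search(p, lowered) for p in INSTRUCTION_MARKERS)

def replicates_input(source_doc, llm_output, min_run=80):
    """Crude self-replication check: does the output reproduce a long
    verbatim run from the source document? Checks overlapping chunks."""
    step = min_run // 2
    for i in range(0, max(1, len(source_doc) - min_run), step):
        chunk = source_doc[i:i + min_run]
        if len(chunk) >= min_run and chunk in llm_output:
            return True
    return False
```

Neither check is watertight on its own, but together they break the replication chain: a summary that smuggles its source's hidden instructions forward tends to trip one of the two.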
As well as being a security hazard, AI can also help defenders: it can be used to write malware, but equally to detect threats, and a number of AI-powered security products now exist. We have seen a range of security weaknesses that are inherent to LLMs and that provide attackers with a rich vein of new opportunities. We have also seen that there are ways to defend against them, but all of these require careful planning, a monitoring process, encryption and multi-factor authentication. Best-practice frameworks such as ISO 42001 are available, but how many companies do you think are actually carrying out these steps? A 2025 survey of over 1,500 cybersecurity professionals by security firm Darktrace found that 89% believe AI security attacks are a threat, now and in the future, yet only 38% had implemented an AI security policy, such as controls to prevent exposure of corporate data. Around half admitted that they were not prepared for AI threats, and only 42% believed that they even understood the types of AI used in their own organisation. On the positive side, 50% of respondents felt that AI-powered solutions could improve the efficiency of their defences. However, the number of incidents is alarming. A 2024 Capgemini survey of a thousand companies found that a staggering 97% had encountered breaches or security issues related to the use of generative AI in the past year, while 43% had suffered financial losses as a result of deepfakes. Further cybercrime statistics, with source reports, can be found in this useful report from VikingCloud.
Generative AI is here to stay, whatever we may feel about it. However, the sheer pace of implementation has meant that security considerations have often been an afterthought, leaving companies open to attack due to the issues that LLMs in particular bring with them. We can see from the statistics of recent surveys that this is not an abstract future problem. Companies are being attacked and losing money from AI-related weaknesses as we speak. The time to act is now.