Plagiarism, presenting someone else’s words, ideas, or work as your own without giving proper credit to the source, is not new. The idea traces back to the Roman poet Martial, who around 80 AD used the Latin plagiarius (“kidnapper”) to describe a rival who passed off his verses as their own. However, plagiarism has taken on a new lease of life with the widespread use of generative artificial intelligence (AI) based on large language models (LLMs). Products like ChatGPT, Claude, and Gemini can effortlessly produce lengthy essays on any subject at the whim of a human prompt, causing chaos in academia.
University lecturers and teachers cannot be sure whether a student’s essay was written by the student or by an AI.
In response, an industry of AI plagiarism detection tools has emerged, estimated to be worth perhaps $1.6 billion in 2024. These tools analyse sentence structures and writing patterns that are characteristic of AI-generated text. Such text tends to be more uniform and more predictably phrased than human writing, so statistical modelling can estimate whether a passage was produced by an AI rather than a human. Certain phrases, such as “at its core”, are disproportionately common in AI output, and AIs use the em dash (“—”) so freely that some human authors now avoid it to pre-empt accusations that their text has been AI-generated.
The AI plagiarism tools make bold claims such as “99% accuracy”, but the reality is very different. Indeed, some testing suggests that LLMs themselves can be better at spotting AI text than the (paid) specialist detection tools like GPTZero, ZeroGPT, Pangram, Quillbot and Copyleaks. Academic studies find highly variable detection rates: the tools rarely achieve accuracy beyond 60-70%, and they frequently produce false positives, i.e. assigning a high probability of AI generation to text that is indisputably human-written. A particularly ludicrous example is the US Declaration of Independence, written in 1776, which one AI detector rated as 97.5% likely to be AI-generated and another as 98% likely. Other indisputably human texts have received similarly nonsensical assessments from various AI detectors.
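To make the idea of “statistical modelling” concrete, the toy Python sketch below computes a few crude uniformity signals of the kind detectors rely on: sentence-length variance, vocabulary repetition, stock phrases and em-dash counts. It illustrates the principle only; the thresholds, phrase list and scoring weights are arbitrary assumptions of mine, and no real product (GPTZero, Copyleaks and the rest) works this simply.

```python
import re
import statistics

def uniformity_features(text: str) -> dict:
    """Compute crude stylistic features of the kind detectors rely on.

    Toy illustration only: real detectors use trained statistical models
    (e.g. perplexity under a language model), not hand-picked heuristics.
    """
    # Naive sentence split on ., ! and ?
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = text.lower().split()
    return {
        # Low variance in sentence length suggests uniform, machine-like phrasing.
        "sentence_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        # A low type-token ratio indicates repetitive, predictable word choice.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Stock AI phrases and heavy em-dash use are weak, easily-gamed signals.
        "stock_phrases": sum(text.lower().count(p) for p in ("at its core", "delve into")),
        "em_dashes": text.count("\u2014"),
    }

def crude_ai_score(features: dict) -> float:
    """Combine the features into a rough 0-1 'AI-likeness' score (illustrative only)."""
    score = 0.0
    if features["sentence_length_stdev"] < 4.0:
        score += 0.4
    if features["type_token_ratio"] < 0.5:
        score += 0.3
    score += min(0.3, 0.1 * (features["stock_phrases"] + features["em_dashes"]))
    return round(min(score, 1.0), 2)

if __name__ == "__main__":
    sample = ("At its core, the project succeeded. At its core, the team delivered. "
              "At its core, results improved.")
    feats = uniformity_features(sample)
    print(feats, "->", crude_ai_score(feats))
```

The fragility of such signals is exactly why false positives abound: formal, repetitive human prose, of which an eighteenth-century legal document is a fine example, scores “AI-like” on every one of these measures.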
This is a serious problem: accusing students or academics of cheating can have a major effect on careers. On the other hand, there is little doubt that students are making widespread use of AI for writing text. One UK survey found 7,000 proven cases of cheating involving AI, and this is likely to be just the tip of the iceberg. Remarkably, annual anonymous surveys showed that some 60-70% of students admitted to cheating in one form or another even before the release of ChatGPT in late 2022; in that sense, generative AI is just the latest tool to help students cheat. Some student uses of AI may be perfectly acceptable, e.g. brainstorming ideas or background research. However, there is concern that widespread use of AI may actually dull cognitive faculties. One MIT study compared brain activity (measured by EEG scans) between students performing various tasks, including essay writing, some using LLMs and some not; LLM use correlated with poorer test results and lower levels of brain activity.
Outside the academic world, LLMs are colliding with copyright law. LLMs are trained on vast bodies of text and generate notionally new content by sampling from statistical patterns that training has encoded in each model’s weights. However, it has become apparent that some of the text used to train LLMs was taken by AI vendors without the authors’ permission. Moreover, AIs can be asked to produce content in the style of a certain writer or, in the case of images, of a certain artist. This is a legal morass. Even the status of AI-generated content is unclear: who, if anyone, owns the copyright on generated text? A host of lawsuits regarding copyright and AI is now gradually making its way through the legal system. One high-profile class action was settled out of court by Anthropic in September 2025 for a minimum of $1.5 billion, and many more cases affecting numerous AI companies are in the pipeline. The AI industry has won some, such as Stability AI’s broad win over Getty Images in a UK court case in November 2025; on the other side, Thomson Reuters won its case in February 2025 against ROSS Intelligence for using its copyrighted content to train a legal research tool. Other cases, such as the New York Times suit against OpenAI and Microsoft, are still working their way through the system, and the situation remains fluid.
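The mechanism at the heart of these disputes can be shown with a drastically simplified sketch. The Python snippet below trains a bigram Markov chain, nothing like a real neural LLM, on a tiny made-up corpus and then “generates” text by sampling from the learned transitions. The output is notionally new, yet every fragment of it is a statistical recombination of the training material, which is precisely why the provenance of that material matters legally. The corpus and parameters here are invented purely for illustration.

```python
import random
from collections import defaultdict

def train_bigram_model(corpus: str) -> dict:
    """Count word-to-next-word transitions in the training text."""
    words = corpus.split()
    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)
    return transitions

def generate(transitions: dict, start: str, length: int = 12, seed: int = 0) -> str:
    """Sample a 'new' sentence by walking the learned transitions."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        choices = transitions.get(word)
        if not choices:
            break
        word = rng.choice(choices)
        output.append(word)
    return " ".join(output)

if __name__ == "__main__":
    # Hypothetical training snippet; a real LLM trains on billions of documents.
    corpus = (
        "the model learns patterns from text and the model then "
        "produces text that echoes patterns from its training data"
    )
    model = train_bigram_model(corpus)
    print(generate(model, start="the"))
```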
Back in the academic world, it is clear that the AI genie is out of the bottle, so academic institutions need to review their policies on AI use by their students and indeed by their own faculty. Some teachers are using AI to assess student papers, and there are even tools on the market (such as Graide, Examiner AI, and CoGrader) marketed specifically for this purpose. We are in danger of entering a world where students submit an AI-written essay that is then marked by another AI. At that point, some may question the value of academic education altogether.
There is a broader societal issue. By some estimates, over half of internet content is already AI-generated, and the proportion may rise to as much as 90% by 2026. When the vast majority of content on the internet is AI slop, and new LLMs are trained on this material, how easy will it be to establish what is true and what is false? With LLMs hallucinating fabricated content at an alarming rate, and with newer LLMs hallucinating more than the early ones, we may reach a point at which a plurality of internet content has been fabricated by AI hallucination. There is a danger of information collapse: AI generates content, that content gets indexed, new LLMs are trained on it, and hallucinations worsen in a vicious cycle. At some point we will need mechanisms to counter this. Perhaps LLMs will be restricted to carefully curated training content, and search engines may need to be tweaked to rank verified content (such as peer-reviewed scientific papers or trusted journalistic sites) much more highly than at present. This is an urgent problem, given the pace at which AI-generated content is accumulating. In an age where most internet content is AI-generated, how will we ever be sure what the truth is?
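The feedback loop described above is studied in the research literature under the name “model collapse”. The toy Python simulation below is a textbook-style illustration of the statistical core of that vicious cycle, not a model of real LLM training: it repeatedly fits a simple distribution to data generated by the previous fit, and, averaged over many runs, the diversity of the data shrinks generation by generation. All the numbers are arbitrary assumptions chosen only to make the effect visible.

```python
import random
import statistics

def run_chain(generations: int, n: int, rng: random.Random) -> list[float]:
    """One chain: repeatedly fit a normal to the previous generation's output and resample."""
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]        # generation 0: "human" data
    spreads = []
    for _ in range(generations):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)                    # maximum-likelihood fit
        data = [rng.gauss(mu, sigma) for _ in range(n)]    # next generation: model output only
        spreads.append(statistics.pstdev(data))
    return spreads

if __name__ == "__main__":
    rng = random.Random(0)
    generations, n, trials = 30, 20, 500
    totals = [0.0] * generations
    for _ in range(trials):
        for g, spread in enumerate(run_chain(generations, n, rng)):
            totals[g] += spread
    for g in range(0, generations, 5):
        print(f"generation {g + 1:2d}: mean spread = {totals[g] / trials:.3f}")
    # The average spread falls steadily: each generation learns only from the previous
    # generation's output, so the diversity of the original data is gradually lost.
```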







