The 26th of August 2025 may turn out to be quite a significant day when the definitive history book of AI is published. I have written previously about the unresolved issue of copyright in the training of large language models (LLMs), which underlie generative AI technologies like ChatGPT, Claude, Perplexity and Gemini, and of the models behind image generators such as Midjourney, Stable Diffusion, Leonardo and Firefly. All of these models rely on large amounts of training data, whether that is the text of books or social media posts, or images or videos. Much of that material has copyright protection, and yet AI companies have simply scraped it up for training, claiming “fair use”. As I have noted elsewhere, artists have started to fight back against this, not only in court, but in some cases by actively poisoning their work against AI training with tools such as Nightshade.
One case was particularly significant because it had grown into a class action suit, with potentially huge damages at stake. This case, “Bartz v Anthropic”, was filed in August 2024 and was due to go to trial in December 2025, but has just been settled by Anthropic, on terms that may be at least partially revealed in September 2025. The case is not entirely bad news for the AI industry, in that the judge had found that training on lawfully purchased copyrighted books constituted fair use, given that LLMs do not merely copy out the material they are trained on, but use it to generate fresh output. However, the use of pirated libraries of books was found to infringe copyright. Even if an AI company did this and later tried to buy legal copies or licences, its initial actions would still have infringed copyright.
This is not actually the first AI-related copyright case to be settled. That honour goes to Vacker v Eleven Labs just a few days ago. However, Bartz v Anthropic is a landmark case because of the size of the claim (involving the use of 7 million books from pirated libraries, with potential damages of $900 billion) and its potential to influence how other similar lawsuits are resolved. There are currently a host of AI copyright lawsuits rumbling through the judicial system, several dozen in the USA alone. These include Getty Images v Stability AI, The New York Times v OpenAI, Dow Jones v Perplexity AI and more. One database of these can be viewed here. Anthropic at least now has some uncertainty removed, since the scale of the case was potentially an existential threat to the company, which did about $3 billion in revenues last year, with a $61.5 billion valuation in its series E financing round in March 2025. Even with a touted series F valuation of $170 billion, the damages from the lawsuit could have finished off the company.
Publishers and AI companies, as well as government regulators, will need to work together to find a workable balance between the rights of copyright holders and the need for AI companies to train their LLMs on vast amounts of data. As and when more details of this settlement are revealed, it will be easier to understand the likely implications, but even now, this settlement is bound to have ramifications in other ongoing cases. As in so many aspects of life, the one group of people who are sure to prosper here are the lawyers involved.







