When rightsholders discover that their works have been used to train an AI model without their permission, they are often angry. Many have decided that, as well as getting mad, they should try to get even, filing claims against the tech companies responsible for those models.
These claims provide a revealing glimpse into the urgency of the scramble for training data. While the courts are left to decide to what extent the existing law permits the use of copyright works to train AI, governments across the world are debating whether the law should be changed. In the middle of this mess, these are some of the questions that are keeping rightsholders up at night:
- How do I know if my works have been used for training AI models and, if so, which model(s)?
- How am I going to continue to earn a living selling content created by human beings if AI floods the market by generating practically unlimited volumes of similar works, much more quickly and cheaply?
- What business model(s) can give me a fair share of the revenues generated by AI products that arguably could not do what they do without the human-authored copyright works on which they were trained?
As is often the case with complex questions, there is someone out there trying to sell you answers that are clear, simple and wrong. This is not one of those times when two out of three ain’t bad! While we don’t claim to have all the answers, we aim to shed some useful light on the AI and copyright debate by providing a primer on:
- Two June 2025 court decisions in ongoing copyright infringement lawsuits that pit various high-profile US authors against Anthropic and Meta.
- The passionate debate over whether/how to update UK law in an effort to strike a fair balance between AI companies and rightsholders.
The court decisions
The specific details of the interim rulings in Bartz v. Anthropic and Kadrey v. Meta are – from a UK perspective – of less interest than the general principles that these cases raise. In his interim ruling in the Anthropic case, Judge William Alsup of the US District Court for the Northern District of California made the following observations:
- “From the start, Anthropic ‘ha[d] many places from which’ it could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog,’ as cofounder and chief executive officer Dario Amodei put it.”
- “Another Anthropic cofounder, Ben Mann, downloaded…at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated.”
- “As Anthropic trained successive LLMs, it became convinced that using books was the most cost-effective means to achieve a world-class LLM. During this time, however, Anthropic became ‘not so gung ho about’ training on pirated books ‘for legal reasons.’”
- “[Anthropic] sent an email or two to major publishers to inquire into licensing books for training AI…but let those conversations wither.”
- “[Instead of that], Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form — discarding the paper originals. Each print book resulted in a PDF copy containing images of the scanned pages with machine-readable text.”
Although global copyright laws vary, they are reasonably consistent in terms of certain core rights that they confer on the copyright holder. Generally, only the person who owns the copyright in a work is permitted to:
- Reproduce (ie make copies of) that work
- Adapt the original copyright work to create a new work that is derived from it
On the face of it, you'd expect that if Anthropic had used a meaningful/substantial part of the pirated books – or the digitised print books – to train its AI, this would amount to copyright infringement.
Under UK law, this is most likely correct (note that arguments in the recent Getty Images v Stability AI trial in the High Court focused on which acts took place in the UK, and on whether a substantial part of the relevant works appeared in the AI model outputs).
In the US, the ‘fair use’ defence complicates things in respect of the lawfully purchased print books. In the Bartz case, Judge Alsup decided that Anthropic’s use of the digitised print books was permitted under the fair use doctrine. The following comments are representative of his reasoning:
- “The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.”
- “[The] Authors contend generically that training LLMs will result in an explosion of works competing with their works…[but their] complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works.”
In the interim ruling in Kadrey v. Meta – issued two days after Alsup’s decision in Bartz v. Anthropic – District Judge Vince Chhabria responded directly to his colleague:
- “Using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.”
Chhabria also neatly summarised the AI and copyright debate from a typical rightsholder’s perspective:
- “Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things.”
- “Because the performance of a generative AI model depends on the amount and quality of data it absorbs as part of its training, companies have been unable to resist the temptation to feed copyright-protected materials into their models—without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal. Although the devil is in the details, in most cases the answer will likely be yes.”
- “Another argument offered in support of the [AI] companies is more rhetorical than legal: Don’t rule against them, or you’ll stop the development of this groundbreaking technology. The technology is certainly groundbreaking. But the suggestion that adverse copyright rulings would stop this technology in its tracks is ridiculous. These products are expected to generate billions, even trillions, of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.”
While both the Anthropic and Meta rulings have been reported as ‘wins’ for AI, the commentary above shows that the reality is more nuanced. For better or worse, the courts are unlikely to provide a definitive answer anytime soon to the question of whether/when it's legal to use copyright works to train AI without the rightsholder’s permission, as:
- each case will be decided on its specific facts
- copyright is a territorial right and while copyright laws in many countries have core similarities, the legislation, case law, policy considerations and legal traditions all vary and are likely to lead to non-uniform outcomes
Is there a need to update UK law to strike a fair balance between AI and copyright?
If the courts are not going to resolve the uncertainty over the lawfulness of using copyright works to train AI (and legal commentators are sceptical that they will), the pressure grows on governments to legislate to clarify what should/shouldn't be allowed from a public policy perspective.
The UK government proposed two important changes to the existing law to attempt to strike a fair balance between AI and copyright in a consultation that closed on 25 February 2025. The consultation was rather more popular than such exercises usually are, attracting more than 13,000 responses. The government’s central proposal was to introduce “a data mining exception [to copyright] which allows right holders to reserve their rights, underpinned by supporting measures on transparency.”
The intention was to establish:
- a default rule that would permit AI companies to use publicly accessible copyright works to train AI
- a mechanism to allow rightsholders to opt out of having their work used for AI training by marking their digital content with a ‘machine-readable’ notice saying something like ‘no AI please’ (an illustrative sketch of how such a notice might work in practice follows this list).
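Neither the consultation nor existing law prescribes what a ‘machine-readable’ notice must look like, but rough conventions already exist. The minimal sketch below – our illustration, not anything proposed by the government – uses Python’s standard urllib.robotparser to check whether a site’s robots.txt file (one widely used convention that some AI crawlers say they honour) blocks a few example crawler user agents. The crawler names are assumptions for illustration only; the names each company actually responds to are set out in its own documentation.

```python
# Illustrative sketch only: robots.txt is one existing 'machine-readable
# notice' convention; the UK proposal does not mandate any particular format.
import urllib.robotparser

# Example AI-crawler user agents (illustrative assumptions, not a
# definitive or current list).
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot"]

def check_opt_out(site: str, page: str = "/") -> None:
    """Report whether robots.txt lets each named crawler fetch `page`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"{site}/robots.txt")
    parser.read()  # download and parse the site's live robots.txt
    for agent in AI_CRAWLERS:
        verdict = "allowed" if parser.can_fetch(agent, f"{site}{page}") else "disallowed"
        print(f"{agent}: {verdict}")

# A rightsholder opting out might publish robots.txt lines such as:
#   User-agent: GPTBot
#   Disallow: /
check_opt_out("https://example.com")
```

Crucially, compliance with a notice like this is voluntary: a robots.txt directive is a request, not an enforceable right – which is one reason many rightsholders doubt that an opt-out regime alone strikes a fair balance.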
It's fair to say that creative types of all stripes in the UK have not been best pleased with this proposal. Sir Elton John has been one of the most outspoken critics, describing the ministers responsible for the plans as “absolute losers.” There are real doubts over whether a technically workable ‘opt-out’ mechanism can be found. Even if one could be, some rightsholders find the idea of being required to mark their work with what they see as the digital equivalent of a ‘please do not steal’ Post-it note inherently offensive.
Perhaps the biggest problem for rightsholders seeking to enforce their copyright when their works are used to train AI is that they don’t know when it’s happening. In an effort to resolve this problem, Baroness Kidron – a film director and member of the House of Lords – repeatedly introduced amendments to the Data (Use and Access) Bill that would have required AI developers to be much more transparent about the works used to train their models. The Bill was not, however, intended to deal with guardrails around AI use; its focus is adjusting the balance between data privacy and access to data for the purposes of innovation. In a compelling piece of political theatre, the House of Commons kept rejecting the amendments, only for Kidron to return with a reworded version aiming to achieve much the same thing. While the Bill did eventually pass into law last month, Kidron’s amendments did not survive.
We're still awaiting the government’s response to the AI and copyright consultation. In the meantime, we're told that a series of workshops is planned to identify a solution that simultaneously:
- allows AI companies to train their models on copyright works, enabling the AI industry to continue to thrive, innovate and boost economic growth
- doesn't harm the UK’s vibrant and economically important creative industries.
When the UK government does legislate in this area (which shouldn't be expected any time soon), it's hard to see it avoiding difficult choices that would inevitably upset some people – and there is no certainty that new legislation would produce the intended effect in any case. AI companies and creatives will have to navigate the present legal uncertainty for a good while longer, with the courts stepping in from time to time to influence proceedings.
If court decisions and legislation do begin to move against a training free-for-all, AI model developers will perhaps take comfort from the words of District Judge Chhabria: “If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.”
Our content explained
Every piece of content we create is correct on the date it’s published but please don’t rely on it as legal advice. If you’d like to speak to us about your own legal requirements, please contact one of our expert lawyers.