
$1.5 billion. That's what Anthropic paid in the Bartz settlement — the largest AI copyright case in history. Millions of pirated books used to train Claude. Not a hypothetical scenario, not a future problem. This happened. And it changes everything.
For years, AI companies operated on a simple principle: train first, ask later. Massive datasets scraped from the internet — books, articles, images, music — without permission, without compensation, often without creators even knowing. That era is over. The courts have spoken, legislators are following. What this means for companies using AI? More than most realize.
The Bartz v. Anthropic settlement in fall 2025 was the breaking point. $1.5 billion — not for a faulty product, but for the way the product was made. Anthropic had used millions of copyrighted books from piracy databases to train its AI. The judge found that downloading and processing entire works in this way goes far beyond anything that could qualify as fair use.
But Bartz was just the beginning. UMG v. Udio ended with a revolutionary agreement: artists must actively consent to training (opt-in) rather than having to actively object (opt-out). Warner Music settled with Suno, agreeing to train new models exclusively on licensed content.
And then there's The New York Times v. OpenAI — the case still underway that may have the most far-reaching consequences. The Times argues that ChatGPT can reproduce entire articles verbatim — which is hard to sell as "transformative use."
The AI industry's defense was elegant and simple: training on copyrighted material is transformative use. The model "learns" from data like a human learns from a library — it doesn't reproduce, it abstracts. An influential 2019 paper compared LLM training to reading books and then writing your own texts.
The courts saw it differently. Three arguments proved decisive:

- Scale: entire works were downloaded and processed, often from pirated databases, far beyond selective excerpting.
- Reproduction: the models can output protected material verbatim, which undercuts the claim that they merely abstract.
- Market harm: the AI output competes directly with the works it was trained on.

The last argument carried the most weight. Fair use requires that the use not impair the market for the original. But when an AI system offers precisely the service the original work was created for, namely information and entertainment, that's market harm, not innovation.
Behind the billion-dollar settlements are real people. The illustrator whose style is reproduced by Midjourney — now by clients who would have hired her. The non-fiction author whose knowledge appears in ChatGPT responses — without attribution, without compensation. The musician whose voice is imitated by Suno — without ever being asked.
The 2023 Writers Guild of America strike was a turning point. Screenwriters struck for 148 days — and a core issue was AI. The result: studios cannot use AI as a basis for screenplays, and AI-generated material cannot establish authorship. An important precedent that extends far beyond Hollywood.
Sarah Silverman, Michael Chabon, the Authors Guild with its 10,000 members — the list of plaintiffs keeps growing. And it's not just about money. It's about a fundamental question: Who owns creative work in a world where machines can replicate it?
The US Copyright Office guidance from May 2025 was a milestone — not because of a radical new position, but because of its clarity. The Office stated: when AI training competes with or diminishes the licensing opportunities of a work, the analysis weighs against a finding of fair use.
What does this mean in practice? If a publisher offers licenses for using its texts — say for summarization services or research tools — and an AI company trains on those same texts without a license for exactly such purposes, that's not fair use. The existence of a licensing market is the benchmark.
The Office also recommended that Congress introduce a mandatory transparency requirement: AI companies must disclose which copyrighted works they use for training. This isn't legislation yet — but the direction is clear.
The agreement between UMG and Udio established a principle that could transform the entire industry: opt-in instead of opt-out.
Until now, AI training worked on the opt-out principle: everything on the internet can be used — unless the creator actively objects. The problem: most creators didn't even know their works were being used. And even if they did — objecting was technically complex and often ignored.
Opt-in reverses the logic: nothing can be used unless the creator expressly consents. This matches how every other industry works — a publisher asks before printing, a film studio licenses before using. That the AI industry claimed an exception for years will, in retrospect, be seen as one of the biggest blind spots in technology history.
For artists, opt-in means: control. They can decide whether and under what terms their works are used for training. For AI companies, it means: higher costs, but also a sustainable business model not built on systematic copyright infringement.
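The difference between the two regimes can be sketched as a training-pipeline intake filter. This is a hypothetical illustration; the `Work` record and its fields are invented for the sketch, not taken from any real system.

```python
from dataclasses import dataclass

@dataclass
class Work:
    title: str
    creator: str
    consent_granted: bool   # creator explicitly licensed the work for training
    objection_filed: bool   # creator actively opted out

def usable_under_opt_out(work: Work) -> bool:
    # Old regime: everything is fair game unless the creator objects.
    return not work.objection_filed

def usable_under_opt_in(work: Work) -> bool:
    # New regime: nothing is usable without explicit consent.
    return work.consent_granted

corpus = [
    Work("Novel A", "Author 1", consent_granted=False, objection_filed=False),
    Work("Song B", "Artist 2", consent_granted=True, objection_filed=False),
]

opt_out_set = [w.title for w in corpus if usable_under_opt_out(w)]
opt_in_set = [w.title for w in corpus if usable_under_opt_in(w)]
# Under opt-out, both works pass the filter, because neither creator objected
# (and most never knew they could). Under opt-in, only the licensed work does.
```

The sketch makes the asymmetry visible: opt-out puts the burden of action on the creator, opt-in puts it on the company doing the training.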
If you use AI tools in your organization — and most do — this affects you directly. Not as a theoretical consideration, but as a legal and business risk.
The question isn't whether your AI vendor trained cleanly. The question is whether you can prove it. In an increasingly regulated world, the provenance of training data is becoming a compliance issue — similar to supply chain transparency for raw materials.
Specific questions you should be asking your AI vendor:

- What data was the model trained on, and can you document its provenance?
- Which works were used under license, and which under a claimed fair use?
- Do you offer contractual indemnification against copyright claims arising from your training data?
- How do you handle creators' opt-out requests — or do you train on opted-in, licensed content only?
Companies that don't ask these questions are taking a risk. Not just an ethical one — a financial one. The Bartz ruling showed that the costs can run into the billions.
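This kind of due diligence amounts to a checklist run against a vendor's training-data manifest. The manifest format and field names below are assumptions for illustration; no standard manifest format exists yet.

```python
# Hypothetical training-data manifest, as a vendor might disclose it.
manifest = [
    {"source": "Licensed news archive", "license": "commercial", "provenance": "publisher deal"},
    {"source": "Web scrape 2023", "license": None, "provenance": "unknown"},
]

def audit(entries):
    """Flag any entry with no documented license or unknown provenance."""
    return [
        e["source"]
        for e in entries
        if e["license"] is None or e["provenance"] == "unknown"
    ]

flagged = audit(manifest)
# The unlicensed web scrape gets flagged for follow-up with the vendor.
```

The point is not the code but the posture: provenance becomes a checkable property of the supply chain, the same way raw-material sourcing is audited in other industries.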
The trajectory of the last two years is clear: the law is catching up with technology. What began as the Wild West phase of AI development — scrape everything, train everything, monetize everything — is giving way to a regulated market with clear rules.
This isn't bad news. On the contrary: licensed, transparent AI is the foundation for sustainable trust. Companies like Anthropic, which shifted to licensed data after expensive settlements, demonstrate that it works, and that it creates a competitive advantage. Customers want to know that the tools they use are built on a clean data foundation.
The future doesn't belong to companies that scraped the most data. It belongs to those that earned the trust of creators, users, and regulators. Copyright isn't an innovation barrier — it's the foundation for fair innovation.
Want to make sure your AI solution is built on properly licensed data? Talk to us about transparent AI text analysis.

