A federal court in California has handed down a critical decision in a copyright lawsuit over the training of a generative artificial intelligence model. In a decision issued this week, Judge William Alsup of the U.S. District Court for the Northern District of California sided with Anthropic in part, finding that while the generative AI company’s use of copyrighted books to train its large language models may qualify as fair use, its acquisition and storage of pirated digital copies does not.
The Background in Brief: Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson filed a copyright infringement lawsuit against Anthropic in August 2024, arguing that it built its generative AI product, Claude, by “stealing hundreds of thousands of copyrighted books.” “Rather than obtaining permission and paying a fair price for the creations it exploits,” the authors alleged, “Anthropic pirated them.” In response, Anthropic argued that its use of copyright-protected books – whether downloaded from pirate sources or scanned from purchased print copies – was protected under the doctrine of fair use.
A Closely Watched Win
In his June 23 opinion, Judge Alsup held that the Claude-maker’s decision to use millions of copyright-protected books, some scanned from legally purchased print copies, others downloaded from known pirate sites, raises distinct legal questions under the Copyright Act. The court ultimately drew a firm line between training AI models using lawfully acquired material (which it deemed fair use) and stockpiling pirated content under the guise of innovation.
In ruling on Anthropic’s motion for summary judgment, the court divided its analysis into two core uses: (1) the use of books to train language models and (2) the broader creation of a permanent internal library of books, many of them pirated. These uses received very different legal treatment.
> Training LLMs = Fair Use: On the issue of whether training Claude with copies of plaintiffs’ books constituted fair use, the court sided with Anthropic. Citing the U.S. Supreme Court’s ruling in Google v. Oracle and other transformative use precedent, Judge Alsup found that using copyrighted books to teach an AI to respond to new prompts with original text output was “quintessentially transformative.”
The court emphasized that the plaintiffs had not alleged direct output copying—that is, Claude did not regurgitate their text to users. Instead, the court likened the process to how humans read and internalize styles, themes, and writing structure. “This was akin to a reader aspiring to be a writer,” Alsup wrote. “Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them—but to turn a hard corner and create something different.”
> Scanning Purchased Books = Fair Use: The court also found that Anthropic’s digitization of millions of purchased print books was a permissible fair use, largely because the print copies were legally acquired and scanning merely facilitated internal storage and searchability. No additional copies were distributed externally.
This kind of format shifting – where a print copy is destroyed and replaced by a digital one – was found to be consistent with past rulings on media transformation, particularly Sony v. Universal and Authors Guild v. Google.
> Pirated Copies and Indefinite Retention = Not Transformative: Anthropic ran into legal trouble over its decision to build its library using millions of pirated books downloaded from illegal sources. Even though only some of these books were later used for training, the company retained all of them indefinitely. Anthropic’s internal communications revealed that executives preferred piracy because it avoided the “legal/practice/business slog” of licensing books. That rationale, the court said, cannot be squared with the Copyright Act.
Crucially, Judge Alsup rejected the notion that a downstream transformative use (like training an LLM) could sanitize upstream infringement, especially when not all pirated books were even used for training. “Pirating copies to build a research library without paying for it … was its own use — and not a transformative one,” the order held.
Anthropic’s argument that all copying was in service of a higher transformative purpose was, in the court’s view, legally insufficient: “There is no carveout from the Copyright Act for AI companies.”
The Bigger Picture
The ruling is among the earliest substantive decisions in a series of high-profile copyright cases against AI developers. It provides early judicial validation for the argument that training AI models using lawfully obtained material can qualify as fair use – but also offers a clear warning that origin matters. AI companies that acquire training data through illicit channels, even with transformative goals, remain exposed to liability.
As content owners across publishing, fashion, and entertainment increasingly confront AI companies over the use of their intellectual property, the court’s decision underscores the importance of how – and from where – training material is sourced.
For now, the court has allowed the authors’ claims related to the pirated copies to proceed, while granting partial judgment in Anthropic’s favor on its transformative use defenses.
The case is Bartz v. Anthropic PBC, 3:24-cv-05417 (N.D. Cal.).
