OpenAI Argues Fair Use in Bid to Trim Authors’ Copyright Lawsuits

Image: Unsplash

August 31, 2023 - By TFL

OpenAI is pushing back against a pair of lawsuits filed against it by authors who claim that it is running afoul of copyright law by using large amounts of data, including the text of the plaintiffs' books, without their authorization, to train the models behind ChatGPT. Setting the stage in a motion to dismiss filed with the U.S. District Court for the Northern District of California on August 28, seven OpenAI entities ("OpenAI") claim that the plaintiffs, including comedian Sarah Silverman, filed suit on behalf of themselves and similarly situated individuals "because they believe their texts were a tiny part of the dataset" that OpenAI used to teach the models underlying its generative artificial intelligence platform, ChatGPT.

In the newly filed motion to dismiss, OpenAI takes issue with the plaintiffs' vicarious copyright infringement, Digital Millennium Copyright Act, unfair competition, negligence, and unjust enrichment claims. The San Francisco-based AI giant, which is currently on track to generate more than $1 billion in revenue over the next 12 months from the sale of AI software, argues that the bulk of the causes of action should be dismissed, both in Silverman's class action lawsuit and in the "near-identical class action complaint" that authors Paul Tremblay and Mona Awad filed against OpenAI in July. Specifically, OpenAI argues the following:

The Bulk of the Claims

Vicarious Copyright Infringement: OpenAI claims that the plaintiffs fail here because (1) they have not alleged direct infringement, and (2) they have failed to plead facts supporting the elements of a vicarious infringement claim, including the defendants' "right and ability to supervise" the alleged infringement and the defendants' "direct financial interest" in the direct infringement at issue.

OpenAI also aims to chip away at the plaintiffs' vicarious infringement claims by arguing that they are "based on the erroneous legal conclusion that every single ChatGPT output is necessarily an infringing 'derivative work' – which is a very specific term in copyright law – because those outputs are, in only a remote and colloquial sense, 'based on' an enormous training dataset that allegedly included [their] books." Under this theory, "every single ChatGPT output … is necessarily an infringing 'derivative work' of the plaintiffs' books," according to OpenAI, and "worse still, each of those outputs would simultaneously be an infringing derivative of each of the millions of other individual works contained in the training corpus – regardless of whether there are any similarities between the output and the training works."

Digital Millennium Copyright Act: The plaintiffs' DMCA claims are based on OpenAI's alleged removal of copyright management information ("CMI") during the ChatGPT "training process" and the alleged distribution of "derivative" ChatGPT outputs without the plaintiffs' CMI. Pushing back, OpenAI asserts that the plaintiffs offer no facts to support their theory that it "intentionally removed" CMI and contends that the plaintiffs actually "allege a number of facts that would contradict" such a claim. Among other things, OpenAI states that the complaints are "completely devoid of any explanation as to (1) how [it] might delete author names and publication years from the books in its training data, (2) why [it] would do such a thing, or (3) what the plaintiffs' good-faith basis for believing this occurred might consist of."

In fact, OpenAI maintains that the plaintiffs' own pleadings "suggest the exact opposite" of it having removed the relevant CMI, as the ChatGPT outputs cited in the complaints include "multiple references to the plaintiffs' names."

Unfair Competition: Because the plaintiffs' California unfair competition claim is predicated on OpenAI's alleged DMCA violations, the claim fails along with them, per OpenAI. Moreover, OpenAI argues that the plaintiffs have not alleged an economic injury flowing directly from the alleged DMCA violations, noting that the "only allegation as to harm arising from these alleged violations is a single sentence included in both complaints: 'Plaintiffs have been injured by OpenAI's removal of CMI.'" Finally, OpenAI argues that the plaintiffs have failed to plead facts that would justify any relief under California's unfair competition law.

Negligence & Unjust Enrichment: Among other issues, OpenAI asserts that the plaintiffs’ negligence and unjust enrichment claims are “overt attempts to reframe the complaints’ direct copyright infringement claims in the vernacular of California common law claims,” and thus, are preempted by Section 301 of the Copyright Act.

With the foregoing in mind, OpenAI seeks dismissal of the majority of the plaintiffs’ causes of action. 

OpenAI, Direct Infringement & Fair Use

Potentially even more interesting than the claims that OpenAI is looking to escape early is the one it does not challenge in its motion to dismiss: the plaintiffs' direct copyright infringement cause of action. While OpenAI refers to the direct infringement claim in its attempt to sidestep vicarious liability (arguing that the plaintiffs have not alleged direct infringement), "the defendants are not asking for a dismissal of the direct copyright infringement claim," per Andres Guadamuz, a reader in intellectual property law at the University of Sussex. This is "surprising," he says, but it is also likely an indication that OpenAI "fancies [its] chances in court and wants a fair use declaration for training," presumably to help deter similar suits in the future.

What OpenAI does do in the motion to dismiss is delve into its fair use argument, stating that the plaintiffs’ claims “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.” Specifically, OpenAI states that “even where a defendant has prima facie infringed one of the [copyright holders’ exclusive] Section 106 rights by creating a ‘substantially similar’ copy or derivative work, Section 107 of the Act provides that ‘the fair use of a copyrighted work … is not an infringement of copyright.’” 

Shedding light on how it views its use of others' works (notably, OpenAI does not deny that it used the works at issue, along with other works, to train the ChatGPT models), OpenAI cites the 2021 decision in Google v. Oracle, in which the Supreme Court held that Google's use of a portion of the Oracle Java computer program in Google's Android operating system constituted fair use. Specifically, OpenAI states that it "is not an infringement to create 'wholesale cop[ies] of [a work] as a preliminary step' to develop a new, non-infringing product, even if the new product competes with the original."

A note on the scope of Google v. Oracle: While SCOTUS sided with Google (6-2) in its long-running copyright fight with Oracle, and that outcome likely weighs in favor of OpenAI and co., the Court's fair use determination is not without nuance. Among other things (including its statement about the need to "recognize that some works are closer to the core of copyright than others"), the majority held, "The fact that computer programs are primarily functional makes it difficult to apply traditional copyright concepts in that technological world. Applying the principles of the Court's precedents and Congress' codification of the fair use doctrine to the distinct copyrighted work here, the Court concludes that Google's copying of the API to reimplement a user interface, taking only what was needed to allow users to put their accrued talents to work in a new and transformative program, constituted a fair use of that material as a matter of law."

In reaching this result, the majority stressed that it "does not overturn or modify its earlier cases involving fair use."

The cases are Silverman, et al. v. OpenAI, Inc., 3:23-cv-03416 (N.D. Cal.) and Tremblay v. OpenAI, Inc., 3:23-cv-03223 (N.D. Cal.).