OpenAI is looking to escape the lawsuit waged against it – and a number of other defendants, including Microsoft – for allegedly running afoul of the Digital Millennium Copyright Act (“DMCA”), and engaging in breach of contract, tortious interference, fraud, false designation, unfair competition, etc. in connection with Copilot, a subscription-based AI tool that it co-developed with GitHub. In response to the complaint filed against it in a federal court in Northern California in November by unnamed plaintiffs, OpenAI and related entities (collectively “OpenAI”) have lodged a motion to dismiss, arguing that in lieu of actually claiming that any of their code was used by Codex (the code-generator that powers Copilot), the DOE plaintiffs “allege a grab bag of claims that fail to plead violations of cognizable legal rights.”

Setting the stage in its motion to dismiss, OpenAI asserts that the “essence” of the plaintiffs’ lawsuit is that Copilot (and therefore Codex) “generates snippets of code similar to the publicly available code that was used to train these two AI tools,” and does so “without also generating [the] copyright notices or open-source license terms that originally accompanied the code.” (Already, OpenAI cries foul on the plaintiffs’ allegations, arguing that such output is “rare,” with the plaintiffs’ own complaint pointing to a study that found that the AI generators replicate existing code without abiding by the terms of the relevant license “1% of the time.”)

The primary problem, according to OpenAI, is that the plaintiffs “provide no allegation [in their complaint] that any code that they authored was used by Codex or generated as a suggestion to a Codex user” and similarly, “have not provided a single example nor alleged any injury that is concrete and particularized as to them.” Instead, they rely “entirely on generic descriptions of the alleged practices of the OpenAI entities to support their theory of injury.” As a result of their failure to sufficiently plead that they suffered a cognizable injury, OpenAI asserts that the plaintiffs lack the necessary standing under Article III, and thus, the complaint should be tossed out in its entirety. 

Beyond that, OpenAI takes issue with the plaintiffs’ claims on a number of other grounds, arguing, among other things …

Preemption – OpenAI asserts that federal law, namely the Copyright Act, “preempts [their state law] claims for tortious interference in a contractual relationship, unjust enrichment, and unfair competition,” thereby providing another basis for dismissal.

DMCA – “Although the complaint is replete with allegations about alleged similarities between Copilot’s output and the code it was trained on,” OpenAI states that the plaintiffs do not assert a copyright infringement claim. While they do set out a DMCA claim, OpenAI asserts that it should be tossed out because they have failed to: “(i) identify specific works from which Copyright Management Information (‘CMI’) was removed, (ii) allege removal of CMI from identical copies, (iii) allege the requisite intent; (iv) allege distribution of works with removed CMI; or (v) allege false CMI conveyed in connection with copies of those works.”

One of the particularly noteworthy elements here is OpenAI’s argument on the scienter front, maintaining that the plaintiffs “have not alleged facts sufficient to establish a substantial risk that any copyright infringement has occurred or that any future infringement is likely because of the removal of CMI, nor that any of the OpenAI entities had reason to know of any such likelihood.” In addition to the need to allege the “copying of protectible expression,” for instance, OpenAI states that the plaintiffs “would need to allege that any copying was not fair use—a heavy burden in light of the Supreme Court’s holding in the source-code context that ‘taking only what was needed to allow users to put their accrued talents to work in a new and transformative program . . . was a fair use of that material as a matter of law.’” (OpenAI points to Google LLC v. Oracle Am., Inc., 141 S. Ct. 1183, 1209 (2021) and Authors Guild v. Google, Inc., 804 F.3d 202, 225 (2d Cir. 2015).)

False Designation of Origin – The plaintiffs’ false designation of origin claim under the Lanham Act should also be dismissed, per OpenAI, which argues that “the Lanham Act does not provide a remedy for false attribution of authorship.” Citing Dastar, OpenAI asserts that a claim for false designation of origin “must relate to the origin of tangible goods, not the authorship of an intangible work like computer code,” and that the plaintiffs’ claim is “precisely the kind of false designation of origin claim foreclosed by [such] precedent.” 

THE BIGGER PICTURE: While copyright infringement is not a claim here (as OpenAI notes), the case, nonetheless, falls in line with a growing number of lawsuits centering on the unauthorized use of various types of works to train AI generators, a number of which raise the question of whether the use of copyrighted materials as training data for machine learning qualifies as fair use. 

In their own motion to dismiss, which includes many of the same arguments as this one, GitHub and Microsoft also argue that despite the plaintiffs claiming “software piracy on an unprecedented scale … they do not advance a copyright infringement claim at all.” This is “doubtless an attempt to evade the limitations on the scope of software copyright,” GitHub and Microsoft argue, “and the progress-protective doctrine of fair use.” Elsewhere in their motion, GitHub and Microsoft assert that the plaintiffs “do not even identify a copyrighted work,” and thus, allege “no invasion of their copyright interests—an allegation that would run headlong into the doctrine of fair use.”

The case is J. DOE 3, et al., v. GitHub, Inc. et al., 3:22-cv-07074 (N.D. Cal.).