Image: Unsplash

Author Takes Issue with “Unfair” Use of Copyrights in OpenAI Lawsuit


November 27, 2023 - By TFL




A new copyright lawsuit targeting OpenAI and Microsoft sheds light on what at least one plaintiff is asserting in an effort to get ahead of the fair use arguments being made by the companies behind large language models. In the complaint that he filed with the U.S. District Court for the Southern District of New York on November 21, Julian Sancton alleges that ChatGPT-creator OpenAI and investor Microsoft “have built a business valued into the tens of billions of dollars” thanks to their “rampant theft of copyrighted works.” In particular, Sancton claims that the defendants “pretend as if the laws protecting copyright do not exist” in order to train their AI-powered models using his book, Madhouse at the End of the Earth, along with “thousands, maybe more, [of other] copyrighted works – including nonfiction books.”

Setting the stage in his complaint, Sancton – a reporter and non-fiction author – claims that OpenAI and Microsoft have used “millions, maybe billions, of copyrighted works” as the basis to “calibrate the GPT models to produce human-like expression.” The problem, Sancton alleges, is that neither of the defendants has paid for the books used to train their models. “Nor have [they] sought to obtain – or pay for – a license to copy and exploit the protected expression contained in the copyrighted works used to train their models.” Instead, OpenAI and Microsoft “took these works; made unlicensed copies of them; and used those unlicensed copies to digest and analyze the copyrighted expression in them, all for commercial gain.”

The end result, according to Sancton, is “a computer model that is not only built on the work of thousands of creators and authors, but also built to generate a wide range of expression – from shortform articles to book chapters – that mimics the syntax, style, and themes of the copyrighted works on which it was trained.”

Seemingly anticipating the arguments that OpenAI and Microsoft will make in response to his direct and contributory copyright infringement claims, namely, that their use of the copyrighted works in the input/training phases amounts to fair use, Sancton argues that their “commercial copying of [his] work and works owned by [other authors in] the proposed class was manifestly unfair use, for several reasons.” As for those reasons, Sancton asserts …

– “For starters, even by OpenAI’s own description, the use is of the same kind and purpose that an ordinary reading consumer may use a book – to review the expression in it, that is, the order of words, presentation of facts, and syntax, among others. OpenAI has suggested that it uses the training data to ‘learn’ how words and concepts fit to together, much in the way a human learns.”

– “While OpenAI’s anthropomorphizing of its models is up for debate, at a minimum, humans who learn from books buy them, or borrow them from libraries that buy them, providing at least some measure of compensation to authors and creators. OpenAI does not, and it has usurped authors’ content for the purpose of creating a machine built to generate the very type of content for which authors would usually be paid.”

– “Even OpenAI has acknowledged that its use is unfair to creators. In his testimony before the Senate, former OpenAI CEO and current Microsoft employee Sam Altman admitted that ‘creators deserve control over how their creations are used, and what happens sort of beyond the point of releasing it into the world’ and that ‘creators, content owners need to benefit from this technology.’ Yet OpenAI has given creators and copyright owners zero control over how their works are used in the training process – and zero compensation for it.”

– And still yet, “OpenAI, in taking authors’ works without compensation, has deprived authors of books sales and licensing revenues. There is, and has been, an established market for the sale of books and e-books, yet OpenAI ignored it and chose to scrape a massive corpus of copyrighted books from the internet, without even paying for an initial copy. OpenAI has also usurped a licensing market for copyright owners.”

When OpenAI and Microsoft do inevitably raise fair use arguments in their defense, it will not be the first time that OpenAI has done so. The AI giant argued this summer in response to lawsuits filed against it by a number of authors, including Sarah Silverman, that the plaintiffs’ copyright infringement claims “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.” Specifically, OpenAI stated that “even where a defendant has prima facie infringed one of the [copyright holders’ exclusive] Section 106 rights by creating a ‘substantially similar’ copy or derivative work, Section 107 of the Act provides that ‘the fair use of a copyrighted work … is not an infringement of copyright.’”

Shedding light on how it views its use of others’ works to train the ChatGPT models, OpenAI cited the 2021 decision in Google v. Oracle, in which the Supreme Court held that Google’s use of a portion of the Oracle Java computer program in Google’s Android operating system constituted fair use. Specifically, OpenAI claimed that it “is not an infringement to create ‘wholesale cop[ies] of [a work] as a preliminary step’ to develop a new, non-infringing product, even if the new product competes with the original.”

It is worth noting that copyright experts have been quick to point to the limited scope of the Supreme Court’s decision, namely, that it is expressly limited to functional computer code. As Rothwell Figg attorneys stated in a note at the time, “The Supreme Court’s decision appears to be a narrow one in that it addressed only the copied declaring code from the Java API (e.g., as opposed to the implementing code) and it did not disturb the statutory legal standard for fair use, itself.”

Cleary Gottlieb stated in a separate note that the “upshot” of the “highly case-specific [Google v. Oracle] decision” is that it is “unlikely to directly dictate the result in future disputes.” At the same time, though, the Cleary lawyers asserted that the decision could have “far-reaching implications, as the Court emphasized that ‘fair use can play an important role in determining the lawful scope of a computer program copyright,’ and the fact that the Court found fair use even though the defendant used the copyrighted content in a commercial venture, after having tried and failed to negotiate a license that would have encompassed the content, will make the decision a friend of those seeking to invoke the doctrine in similar circumstances.”

The case is Sancton v. OpenAI Inc., Microsoft Corporation, et al., 1:23-cv-10211 (SDNY).