Image: Unsplash

Google Angling for Dismissal in AI Lawsuit Accusing it of “Stealing” Data

October 19, 2023 - By TFL

Google is looking to escape a lawsuit accusing it of “stealing” web-scraped data and “vast troves of private user data from [its] own products” in order to build commercial artificial intelligence (“AI”) products like its Bard chatbot. On the heels of being sued in a California federal court in July by eight unnamed individuals, Google argues in a new motion to dismiss that the plaintiffs fall short in pleading their unfair competition, negligence, invasion of privacy, and copyright infringement claims. At the same time, Google leans heavily into its assertion that generative AI brings with it “unprecedented promise to advance the human condition” and that the inability to train the underlying models using massive amounts of data “would take a sledgehammer not just to Google’s services but to the very idea of Generative AI.”

At a high level, Google asserts that the plaintiffs’ case “is framed at a sweeping level of generality.” The plaintiffs’ overarching theory, according to Google, is that it “found their personal information on the internet and used it to develop AI services like Bard.” The plaintiffs allege in their complaint that “for years, Google harvested [our personal and professional information, our creative and copywritten works, our photographs, and even our emails] in secret, without notice or consent from anyone.” In making such claims, the plaintiffs fail to provide “basic details,” per Google, including what “specific personal information … was allegedly collected by Google, how (if at all) that personal information appears in the output of Google’s Generative AI services, and how (if at all) [the plaintiffs] have been harmed.”

Without that information, Google claims that “it is impossible to assess whether the plaintiffs can state any claim and what potential defenses might apply.”

Even if this information were provided, Google asserts that the plaintiffs’ state law claims still must be dismissed for a number of reasons: (1) they have “failed to specify the info at issue or allege present, particularized harm,” and thus, do not plead an Article III injury in fact based on the collection or use of public info; (2) they allege negligence but fail to plead facts demonstrating a cognizable duty or injury; (3) they allege invasion of privacy but fail to identify the private info at issue and “actually admit that their info was publicly available”; (4) they allege conversion and larceny but fail to allege a property interest in their personal info, the conversion or theft of that info, or any injury; (5) they allege unjust enrichment, but “that is not an independent cause of action and [they] fail to plead facts supporting any quasi-contract”; and (6) they allege violation of California’s Unfair Competition Law but fail to allege “statutory standing or the requisite unlawful, unfair, or fraudulent conduct.”

Not limited to their state law claims, Google states that the plaintiffs’ lack of specificity “infects [their] copyright claims, as well.” For instance, Google argues that the plaintiffs do not allege “specific facts showing how any particular copyrighted work was infringed in the output of Google’s AI services, or how copyright management information was illicitly removed from any particular work.” Such specifics “matter not only for making out a claim, but also for Google’s defenses,” which include fair use.

Google argues that the plaintiffs’ negligence, conversion, larceny, unjust enrichment, and unfair competition causes of action fail for yet another reason. Because the plaintiffs base their property-based state-law claims on the allegation that Google “copied their information on the internet, used it to create an AI model, and/or further displayed it, all without their consent,” Google asserts that they are essentially “assert[ing] rights to control the reproduction and display of their creative content posted on the internet, and its use to create another work.” Since “those are copyright claims masquerading as state law property claims,” Google maintains that they are preempted by the Copyright Act and should be dismissed.

As for the plaintiffs’ federal claims for copyright infringement, which the plaintiffs lodge as a result of Google’s alleged use of copyright-protected works for the training of its models and Bard’s output, Google contends that the plaintiffs must show “substantial similarity between Bard or its output and the copyrighted expression in [plaintiff’s] book, but it does not even attempt to do so.”

A Point to Consider: One of the things worth noting here is Google’s emphasis on the “publicly available” nature of the information at issue. The issue of “publicly available” works is interesting from a copyright perspective (as well as a litigation strategy one), as there is a distinction between publicly available works/information and free-to-use works/information that seems to be getting glossed over here. The distinction is, of course, that certain publicly available works are, in fact, free to use. Works whose copyright protection has lapsed (i.e., works in the public domain) are one example. On the other hand, there are works that are simultaneously subject to existing copyright protections (and registrations) and that are publicly available, but are not free to use.

The Books3 dataset – which consists of digital versions of some 196,000 pirated books in plain-text format – comes to mind here. That dataset has been identified as among the training materials in lawsuits waged by authors against generative AI model developers, including Facebook-owner Meta: while the data was publicly available (i.e., it could be accessed freely via the web), the copyright-protected books were included in the dataset without the authors’ authorization. In light of the lack of consent by the authors whose works are included in that dataset and the subsequent (alleged) use of those works (by way of the dataset) by the likes of Meta and co. to train their generative AI models, a growing number of authors are waging copyright litigation.

All the while, the publicly-available vs. free-to-use distinction is made clear in a newly-filed copyright lawsuit waged against AI startup Anthropic, in which Universal Music and other publishers argue this exact point, asserting that “just because something may be available on the internet does not mean it is free for Anthropic to exploit to its own ends.”

So, what is Google’s aim here? Neil Turkewitz, president of Turkewitz Consulting Group, says that in his view, Google is “intentionally conflating ‘publicly available’ and ‘non-proprietary’ in [furtherance of] their desire to create a narrative that affects public perception, and that such a perception shapes the way juries and courts think about the issues.” This is “part and parcel of their overall embrace of efficient/predatory infringement: that if you can establish practices that are reliant on infringement, the ‘social utility’ of infringement becomes part of the legal analysis,” he says. And this is “pretty much [how Google] won Google v. Oracle, so, it works – at least with respect to a narrow category of works which excludes creative/cultural works.”

The case is J.L., C.B., K.S., et al., v. Google LLC, 3:23-cv-03440 (N.D. Cal.).