Image: Unsplash

OpenAI Sued Over “Unprecedented” Data Scraping, Use of Personal Info

June 29, 2023 - By TFL

More than a dozen underage individuals have filed suit against OpenAI and its partner/investor Microsoft in connection with the development and marketing of generative artificial intelligence products like ChatGPT, Dall-E, and Vall-E, which allegedly involves the scraping of “vast” amounts of personal data. According to the newly-filed complaint, OpenAI and the other defendants (collectively, “OpenAI”) have “stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge” in furtherance of their creation and operation of the aforementioned generative AI programs, and they “continue to unlawfully collect and feed additional personal data from millions of unsuspecting consumers worldwide, far in excess of any reasonably authorized use, in order to continue developing and training the products.”

Not mincing words in the 157-page complaint, which was filed with the U.S. District Court for the Northern District of California on Thursday, the plaintiffs, who are identified exclusively by their initials (because they are minors), assert that OpenAI’s “disregard for privacy laws is matched only by their disregard for the potentially catastrophic risk to humanity.” While OpenAI’s products – and the technology on which they are built – “undoubtedly have the potential to do much good in the world, like aiding life-saving scientific research and ushering in discoveries that can improve the lives of everyday Americans,” the plaintiffs assert that things changed following an “abrupt” restructuring by OpenAI in March 2019 from a nonprofit research organization to “a for-profit business that would pursue commercial opportunities of staggering scale.” “OpenAI abandoned its original goals and principles,” according to the plaintiffs, and instead elected to “pursue profit at the expense of privacy, security, and ethics.”

In doing so, the San Francisco-based AI titan “doubled down on a strategy to secretly harvest massive amounts of personal data from the internet, including private information and private conversations, medical data, information about children – essentially every piece of data exchanged on the internet it could take – without notice to the owners or users of such data, much less with anyone’s permission,” the plaintiffs contend. In fact, the plaintiffs argue that but for the use of others’ personal data, which is being collected and used at an “unprecedented scope” to train and improve the underlying AI models, OpenAI’s products would not have reached the level of sophistication they have today. After all, they assert that “the large language models responsible for the [generative AI] products depend on consuming huge amounts of data.” This means that “personal data of any kind, including conversational data between humans,” is valuable, as it is “how the products develop what appear to be such human-like capabilities.”

TLDR: OpenAI allegedly uses private information from millions of internet users, including minors, without their knowledge or consent. OpenAI collects training data, the plaintiffs assert, by secretly scraping it from the internet, as well as “taking personal information from the products’ 100+ million registered users without their full knowledge and consent.”

The plaintiffs state that OpenAI is now worth almost $30 billion, and “yet, the individuals and companies that produced the data it is scraping from the internet have not been compensated.” This action “seeks to change that, and in the process, protect the privacy rights of millions.”

Against that background, the plaintiffs accuse OpenAI of violating: the Electronic Communications Privacy Act; the Computer Fraud and Abuse Act; California’s Invasion of Privacy Act and Unfair Competition Law; Illinois’s Biometric Information Privacy Act and Consumer Fraud and Deceptive Business Practices Act; and New York General Business Law s. 349, which prohibits deceptive acts and practices. Beyond that, the plaintiffs also set out negligence, invasion of privacy, intrusion upon seclusion, larceny/receipt of stolen property, conversion, unjust enrichment, and failure to warn causes of action.

In addition to seeking monetary damages, the plaintiffs are angling for injunctive relief in the form of a temporary freeze on commercial access to and commercial development of the OpenAI products. Among other things, the plaintiffs are looking to get the court to require the companies to establish an independent council to approve uses of the OpenAI products before they are released; implement cybersecurity safeguards and accountability and transparency protocols; establish a fund to compensate class members for the alleged misconduct; and confirm that they have “deleted, destroyed, and purged” the personal information of all relevant class members “unless [they] can provide reasonable justification for the retention and use of such information when weighed against the privacy interests of class members.”

The Computer Fraud and Abuse Act

One cause of action that is worth noting here is the plaintiffs’ claim under the Computer Fraud and Abuse Act (“CFAA”). A federal anti-hacking statute, the CFAA prohibits the accessing of protected computer systems “without authorization” or in excess of “authorized access.” Hardly untested, the statute, which was first enacted in 1986, has been routinely relied upon by platforms – like LinkedIn – that are looking to put a stop to the unauthorized scraping of data from their websites. And in fact, the outcome in hiQ v. LinkedIn – in which the scope of the CFAA was narrowed by the Ninth Circuit – could prove relevant for the parties here.

The case got its start as a declaratory judgment action back in 2017, with hiQ Labs filing suit after it received a cease-and-desist letter from LinkedIn, accusing it of violating the CFAA and LinkedIn’s terms by collecting publicly available data from the LinkedIn platform. Fast forward to 2022: in a second decision (following a remand from the Supreme Court), the U.S. Court of Appeals for the Ninth Circuit held that hiQ’s activities did not run afoul of the CFAA. Specifically, the appeals court determined that hiQ was not acting “without authorization” since the data scraped from LinkedIn was publicly available.

In short: The Ninth Circuit held – for a second time – that the concept of “without authorization” (under the CFAA) does not apply to publicly-accessible information on websites, thereby solidifying what has been characterized as “a monumental ruling that data scraping is legal in certain circumstances.” Among other things, the Ninth Circuit held that an overly broad reading of the CFAA would enable companies like LinkedIn to exercise “free rein to decide, on any basis, who can collect and use [publicly available data],” which is not in the public interest.

While it is not clear what – exactly – all of the sources of OpenAI’s data are and whether they are all public-facing, it is worth noting that the Ninth Circuit’s opinion in the hiQ case is limited to publicly available information, which means that companies in the business of scraping the web “could still be liable under the CFAA if they are scraping information from websites that require authorization or access permission, such as password authentication.” Farella Braun + Martel’s Erik Olson and Sushila Chanana stated at the time of the court’s decision that the Ninth Circuit held that data aggregators “could bring other, non-CFAA claims, against scraping entities even when those companies are scraping public information, such as breach of contract and copyright infringement claims,” ultimately making scraping law something of a nuanced area and one that is still in its early stages.

The case is Plaintiffs P.M., K.S., et al., v. OpenAI LP, et al., 3:23-cv-03199 (N.D. Cal.).