Generative AI systems – such as ChatGPT, GitHub’s Copilot, and DALL-E – are becoming increasingly popular thanks to their applications in creating content, applying for jobs, and even eliminating recipe sagas. Trained on a vast body of copyright-protected data scraped from the web, these AI systems can create new works from users’ word prompts or “freestyle” based on images uploaded by users (as Lensa AI’s magic avatar feature does). However practical, innovative, and accessible these AI projects may be, many of them may sometimes engage in copyright infringement at two levels: at the machine learning level (see our article on that) and at the output stage (if the AI’s creation is substantially similar to a copyright-protected piece of training data).
Infringement in General
By way of background, under U.S. law, copyright infringement takes place if: (1) there is a valid copyright in the original work; and (2) there was unauthorized copying of the original work (meaning that at least one of the exclusive rights under copyright was violated). The copying component of the copyright infringement test is proven if: (1) there is evidence of factual copying; or (2) there is a “substantial similarity” between the original and the infringing work.
Factual copying can be proven by direct evidence (rarely available) or circumstantial evidence. Circumstantial evidence may include proof of the AI’s access to the copyrighted work and a “probative similarity” between the original work and the AI’s output that goes beyond independent creation. A claimant in a copyright infringement case could obtain evidence that their copyrighted work was included in the machine training dataset. Such evidence may be readily available (there is a website that checks whether a popular text-image pair training dataset contains a given image) or could be procured through court-ordered discovery. Absent evidence of access to the copyrighted work, a “striking similarity” is enough to prove copying.
The degree of similarity is a question of fact to be determined by the jury based on the evidence in the case, which may include expert evidence. In assessing the degree of similarity between the (allegedly) infringed and infringing works, courts consider whether the similar elements are unique, intricate, or unexpected; whether the two works contain the same errors; and whether there appear to have been blatant attempts to cover up the similarities. The appearance in an AI output of something closely resembling a particular claimant’s artist signature or a company’s watermark could be an example of such evidence. Courts can use other criteria, such as “the total concept and feel,” which combines an “objective” extrinsic test and a “subjective” intrinsic test. All in all, the examination is factual and case-specific.
But Who is to Blame?
In general, under the doctrine of direct infringement, the actor committing copyright infringement is the one most proximately positioned to the cause of the infringing event. Secondary infringement, on the other hand, occurs when there is a direct infringer, but a second party induces, contributes to, encourages, or profits from the infringement. The latter type of infringement is rooted in case law and takes the forms of contributory and vicarious infringement: contributory infringement occurs when someone knows of the direct infringement and encourages, induces, causes, or materially contributes to it, while vicarious liability arises when someone has the authority and ability to control the direct infringer and directly profits from the infringement.
With most generative AI systems, end users do not make expressive choices but rather provide abstract verbal prompts, leaving the “creative” work to the AI. So, it appears the end user is unlikely to be the direct copyright infringer if the output is infringing.
Usually, the verbal prompts will take the form of ideas not subject to copyright protection (“Create a pop-art portrait of a blond actress”). On the other hand, users may either input requests that contain copyrightable material on which the output will be based (“combine these two actual paintings by Yayoi Kusama I uploaded”) or otherwise intentionally target a copyrighted work (“summarize Martin Luther King Jr.’s ‘I Have a Dream’ speech”). So, when the output work turns out to be substantially similar to a copyrighted work or otherwise passes the copyright infringement threshold, the end user may or may not have caused the infringement and, thus, may or may not be liable.
Where the end user is directly liable, the AI company might be secondarily liable because: (1) it provided a product capable of producing infringing work; and (2) it benefits from the infringing activity (for example, if the service is subscription-based). Sometimes, however, the AI may return infringing outputs even when that result was not reasonably foreseeable to the end user. In such a case, the AI company would be the only actor capable of exercising control over the infringing AI system, since it conducted the machine learning and chose or built the datasets. Consequently, the AI company would likely be the direct infringer.
Apparently aware of the AI’s ability to produce output containing recognizable portions of training data used at the ingestion phase, many generative AI services include provisions in their terms and conditions disclaiming the companies’ copyright ownership of the output data and shifting the risk of liability for any infringement onto the end user of the AI. But is that enforceable in court?
Generally, contract provisions shifting the risk of civil liability (i.e., exculpatory and indemnification clauses) are commonplace and enforceable – provided they are not ambiguous in scope and do not violate public policy. Exculpatory clauses are contract provisions that generally absolve one party (the AI company in this case) from claims by another party (the end user). In contrast, indemnification clauses obligate one party (the end user) to compensate another party (the AI company) for third-party claims (here, those would be the claims of the training data copyright owners).
Generative AI is a powerful tool that helps human creators cut down on content-generation costs, save time for more complex work, and conceive of new ideas. But if AI creates something closely resembling a copyrighted piece of data it trained on, then, absent fair use, the copyright holder has a case against the person or entity who caused the infringing act.
The intersections of generative AI and copyright are exciting new domains with a potential for policymaking. In the meantime, we must proceed cautiously and mitigate the legal risks for AI companies and end users.
Diana Bikbaeva is a tech and intellectual property law attorney.