Does Training a Generative AI Model Using Copyrighted Material Require a License?
The rise of generative AI tools in the past year has raised important copyright questions that courts have not yet resolved. Several cases of first impression on these issues are currently pending before various courts, including lawsuits against Stability AI, Midjourney, and DeviantArt.
One key issue arises because generative AI relies on models that are trained on large amounts of data, and that training process involves making interim copies of the training materials. For example, for a generative AI tool to generate a picture of a horse, it is trained on interim copies of thousands (or millions) of pictures of horses so that it learns what a horse looks like and can then generate a new picture of one. Most or all of the pictures copied for training purposes are protected by copyright.
This process raises the as-yet unresolved question of whether these “interim copies” of copyrighted material infringe the rights of the copyright holder and therefore require a license.
In deciding whether this use constitutes copyright infringement, courts will likely have to determine two things: whether the interim copies made to train the AI model infringe the copyright owner’s exclusive right to reproduce the copyrighted works used as training data, and whether the output of the generative AI constitutes a derivative work of those works. In making that determination, courts will likely consider factors such as whether the use is de minimis and whether any similarities between the AI output and the underlying work stem from protected or unprotected elements of that work.
Proponents of generative AI tools will also likely argue that such use constitutes fair use, because the copies are made to learn the unprotected elements of the training materials rather than to copy their protected elements. A fair use analysis is complex and fact-intensive, and it requires the court to examine the four factors set forth in Section 107 of the Copyright Act (including the effect of the use on the market or potential market for the original work). Proponents will also likely point to other contexts in which courts have held interim copying to be fair use, such as reverse engineering video games. Copyright holders, in contrast, are likely to argue that the use is not fair use because the AI tools generate works of a similar nature to the copyrighted works used to train the model, and those works are therefore not sufficiently transformative under the fair use test.
These and other legal issues involving generative AI will continue to emerge and evolve in the coming years. In the meantime, anyone using copyrighted material to train generative AI models will need to consider the current state of the law on these issues and weigh the legal risks of different courses of action. If you need assistance navigating these unclear legal waters, please feel free to contact us to discuss.