Can my work be used as training data for an AI system?
The data used to train an AI system must, of course, be lawful. That means not only that personal data may not be processed without permission and that child pornography or terrorist content may not be included in the dataset, but also that copyright must be respected.
It has been said that what AI does, creating something based on what it has taken in, is very similar to what human creators do: just as a human creator is inspired by their environment, the AI is "inspired" by the material it absorbs. Yet the two situations are quite different. AI is not inspired. It processes images as data and thereby reproduces them, restructures them, and then uses them in a different way. These are copyright-relevant acts. A human creator who is inspired by their environment to create something new, by contrast, does not necessarily make one-to-one reproductions in order to do so.
Lawmakers have already addressed this. Based on a European directive, two exceptions to copyright were introduced in 2021 that deal with text and data mining (TDM). According to the law, TDM is "an automated analysis technique aimed at decomposing text and data in digital form to generate information such as, but not limited to, patterns, trends and interrelationships." Such a technique is also used to train AI to generate certain output. These exceptions make clear in which cases text and data mining of copyrighted works is allowed.
The first exception covers text and data mining by research organizations and cultural heritage institutions for the purpose of scientific research. Under this exception, they are likely allowed to use copyrighted works to train AI when they have lawful access to those works. They may also create a database of their training data, provided it is properly protected.
The second exception covers use for other purposes, including commercial purposes. Here too, text and data mining is allowed when the user has lawful access to the copyrighted content, but rights holders can "opt out" of such use. If they expressly reserve their rights, clearly stating that the work may not be used for text and data mining, this exception cannot be invoked. Moreover, no database of the training data may be created. Under these conditions, commercial companies can probably also train AI using copyrighted content.