Data poisoning is a growing concern for text-to-image generators. These generators are trained on large datasets of images paired with text descriptions, which they use to produce visuals that match a user's prompt. However, some generators have been trained on images scraped indiscriminately from the web, many of which may be protected by copyright. This practice has led to copyright infringement lawsuits and accusations against large technology companies.
In response, researchers have developed a tool called “Nightshade.” Nightshade subtly alters an image’s pixels in a way that is imperceptible to human viewers but disrupts how computer vision models interpret the image. When organizations scrape these “poisoned” images to train AI models, their training data becomes contaminated. As a result, the generator may produce unpredictable and unintended outputs, such as an image of an egg in response to a prompt for a balloon.
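To make “subtle pixel changes” concrete, here is a minimal Python sketch of a bounded perturbation: every color channel may move by at most a small budget, measured as the largest change to any single value (the L-infinity distance). This is not Nightshade’s algorithm. The real tool is described as optimizing its changes against a model’s feature extractor so that the image is read as a different concept. In this sketch a seeded random ±EPSILON nudge stands in for that optimization step, and the names `perturb`, `max_change`, and the budget `EPSILON = 4` are hypothetical.

```python
import random

# Hypothetical budget: each channel value (0-255) may move by at most EPSILON.
# Nightshade's actual budget and perceptual constraint differ.
EPSILON = 4


def perturb(pixels: list[int], epsilon: int = EPSILON, seed: int = 0) -> list[int]:
    """Nudge each channel value up or down by epsilon, then clip to 0-255.

    A real poisoning tool would choose each change to push the image's
    features toward another concept; random signs are only a stand-in.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    out = []
    for value in pixels:
        delta = rng.choice((-epsilon, epsilon))
        out.append(min(255, max(0, value + delta)))  # keep values in range
    return out


def max_change(a: list[int], b: list[int]) -> int:
    """L-infinity distance: the largest change to any single channel value."""
    return max(abs(x - y) for x, y in zip(a, b))


original = [120, 64, 200, 255, 0, 33]
poisoned = perturb(original)
print(max_change(original, poisoned))  # bounded by EPSILON
```

Interior values always move by exactly `EPSILON` and clipped values move less, so the printed maximum here is 4: a change of a few levels per channel, far below the 0 to 255 range.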
The more “poisoned” images the training data contains, the greater the disruption. The effect can also spread to related terms and keywords. For example, if “poisoned” images of a Ferrari are included in the training data, results for prompts naming other car brands, or related terms such as “vehicle” and “automobile,” may also be affected.
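The dose-dependent effect can be illustrated with a deliberately simple model that learns a concept by averaging the features of every image captioned with it. In the toy below, clean “balloon” images sit at one point in a two-dimensional feature space, while poisoned ones carry the “balloon” caption but egg-like features. Both the feature space and the averaging rule are assumptions for illustration, not how a real generator learns, and the example shows only the direction of the effect (more poison pulls the learned concept further toward “egg”), not how many poisoned images a real attack needs.

```python
# Toy 2-D feature space (hypothetical): the first axis stands for
# balloon-like features, the second for egg-like features.
Point = tuple[float, float]
BALLOON: Point = (1.0, 0.0)
EGG: Point = (0.0, 1.0)


def learned_centroid(n_total: int, n_poisoned: int) -> Point:
    """Average the features of every image captioned 'balloon'.

    Clean images contribute BALLOON features; poisoned images keep the
    'balloon' caption but carry EGG-like features.
    """
    n_clean = n_total - n_poisoned
    feats = [BALLOON] * n_clean + [EGG] * n_poisoned
    return (
        sum(x for x, _ in feats) / n_total,
        sum(y for _, y in feats) / n_total,
    )


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


for n_poisoned in (0, 20, 40, 60):
    c = learned_centroid(100, n_poisoned)
    closer_to = "egg" if distance(c, EGG) < distance(c, BALLOON) else "balloon"
    print(n_poisoned, c, closer_to)
```

With 100 images, the learned “balloon” centroid drifts steadily toward “egg” as the poisoned count rises, staying closer to “balloon” at 20 and 40 poisoned samples and tipping toward “egg” at 60.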
Stakeholders have proposed several countermeasures. One is to pay closer attention to the source and usage rights of training data, which reduces indiscriminate data harvesting. Technical fixes can also help: in ensemble modeling, different models are trained on different subsets of the data so that outliers, including “poisoned” images, can be detected and discarded. Finally, audits that run a model against a battery of test prompts can reveal whether its accuracy has degraded.
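Ensemble-based outlier detection can be sketched as follows, assuming each image has already been reduced to a feature vector with a caption label. The data is split into k subsets, one small nearest-centroid classifier is trained per subset, and each sample is judged only by the models that never saw it; if most of them reject its label, the sample is flagged. The classifier, the round-robin split, the majority rule, and the toy data are all illustrative choices, and the sketch makes no claim about how well this approach catches Nightshade’s optimized perturbations.

```python
Point = tuple[float, float]
Sample = tuple[Point, str]


def train_centroids(samples: list[Sample]) -> dict[str, Point]:
    """Fit a tiny nearest-centroid model: the mean feature vector per label."""
    sums: dict[str, list[float]] = {}
    for (x, y), label in samples:
        s = sums.setdefault(label, [0.0, 0.0, 0.0])  # x-sum, y-sum, count
        s[0] += x
        s[1] += y
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(model: dict[str, Point], point: Point) -> str:
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    return min(
        model,
        key=lambda label: (model[label][0] - point[0]) ** 2
        + (model[label][1] - point[1]) ** 2,
    )


def flag_suspects(samples: list[Sample], k: int = 3) -> list[int]:
    """Return indices of samples whose label most held-out models reject."""
    # Round-robin split: sample i goes to subset i % k.
    folds = [[s for i, s in enumerate(samples) if i % k == f] for f in range(k)]
    models = [train_centroids(fold) for fold in folds]
    suspects = []
    for i, (point, label) in enumerate(samples):
        # Only models trained on other subsets get a vote on sample i.
        voters = [m for f, m in enumerate(models) if f != i % k]
        disagree = sum(predict(m, point) != label for m in voters)
        if disagree > len(voters) / 2:
            suspects.append(i)
    return suspects


samples: list[Sample] = [
    ((1.0, 0.0), "balloon"),
    ((0.0, 1.0), "egg"),
    ((0.9, 0.1), "balloon"),
    ((0.1, 0.9), "egg"),
    ((1.0, 0.1), "balloon"),
    ((0.0, 0.9), "egg"),
    ((0.9, 0.0), "balloon"),
    ((0.1, 1.0), "egg"),
    ((0.05, 0.95), "balloon"),  # poisoned: balloon caption, egg-like features
]
print(flag_suspects(samples))
```

Here the last sample, captioned “balloon” but carrying egg-like features, is the only one flagged. In practice the features would come from an image encoder and each subset would hold far more data.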
Data poisoning is not a new concept in adversarial approaches to AI systems. It resembles other tactics, such as using make-up and costumes to evade facial recognition. Concerns about the indiscriminate use of machine vision and facial recognition have prompted activists to design adversarial make-up patterns that prevent surveillance systems from identifying people accurately.
In conclusion, data poisoning raises significant questions about technological governance and the moral rights of artists and users. Some view it as a nuisance to be solved technologically; others see it as an innovative way to protect fundamental rights.