A recent report from Human Rights Watch has revealed that photos of Australian children have been used without consent to train image-generating artificial intelligence (AI) models. The photos were found in LAION-5B, a large data set containing billions of images paired with captions, compiled from publicly available internet content. Companies use data sets like LAION-5B to teach their generative AI tools what visual content looks like, but the use of these photos without consent raises concerns under data protection and consumer protection laws. LAION, the German nonprofit that maintains the data set, has pledged to remove the photos identified by Human Rights Watch. However, AI developers that have already trained on this data cannot undo its influence on their models, and the underlying privacy breach remains a broader concern.
The article addresses the misconception that publicly available information is exempt from privacy laws. In Australia, publicly available information can still be considered personal information under the Australian Privacy Act. The case of Clearview AI, a facial recognition platform that scraped people’s images from websites without consent, illustrates this. The Office of the Australian Information Commissioner ruled that even though the photos were already online, they still constituted personal information, and in some cases sensitive information. Clearview AI was found to have breached privacy law by collecting personal information without consent.
The article emphasizes that AI developers need to be cautious about the origin of the data sets they use. It also discusses how privacy laws might be enforced against LAION: if Australian privacy laws are found to have been violated, strong enforcement action by the privacy commissioner may be warranted.
The article mentions upcoming amendments to Australia’s Privacy Act, including a proposed children’s privacy code. This recognizes that children are particularly vulnerable to the misuse of their personal information.
For parents, there are many reasons to avoid publishing pictures of their children online, including the risk of unwanted surveillance, identification by people with criminal intent, and the use of their images in deepfakes or child pornography. However, even if parents choose not to publish photos, their children’s images can still be captured and shared by others, such as daycare centers, schools, or sporting clubs. It is therefore important to hold tech companies accountable for using these images in AI training data, rather than placing the blame solely on parents.