In a report released by Stanford University’s Internet Observatory, researchers say they’ve found more than 1,000 images featuring child sexual abuse in a popular database used to train AI tools.
The research also uncovered another 3,200 suspect images. The 1,008 externally validated images of child sexual abuse material (CSAM) appeared in a dataset maintained by an AI nonprofit known as LAION. The organization has since pulled the dataset “out of an abundance of caution.”
The photos “basically give the [AI] model an advantage in being able to produce content of child exploitation in a way that could resemble real-life child exploitation,” said David Thiel, the report author and chief technologist at Stanford’s Internet Observatory.
The ghastly implications are clear. The Washington Post reports that “new AI tools, called diffusion models, have cropped up, allowing anyone to create a convincing image by typing in a short description of what they want to see. These models are fed billions of images taken from the internet and mimic the visual patterns to create their own photos.”
It’s a pedophile’s dream come true. And it’s probably too late to put the genie back in the bottle.
These AI image generators have been praised for their ability to create hyper-realistic photos, but they have also increased the speed and scale at which pedophiles can create new explicit images, because the tools require less technical savvy than prior methods, such as pasting kids’ faces onto adult bodies to create “deepfakes.”
Thiel’s study indicates an evolution in understanding how AI tools generate child abuse content. Previously, it was thought that AI tools combined two concepts, such as “child” and “explicit content,” to create unsavory images. Now, the findings suggest actual images are being used to refine the AI outputs of abusive fakes, helping them appear more real.
All 50 state attorneys general are calling for action on AI-generated abuse materials. In a letter sent to every member of Congress, they asked lawmakers to “establish an expert commission to study the means and methods of AI that can be used to exploit children specifically” and “propose solutions to deter and address such exploitation in an effort to protect America’s children.”
“Additionally,” reads the letter, “AI can combine data from photographs of both abused and nonabused children to animate new and realistic sexualized images of children who do not exist, but who may resemble actual children.”
The child abuse photos are a small fraction of the LAION-5B database, which contains billions of images, and the researchers argue they were probably inadvertently added as the database’s creators grabbed images from social media, adult-video sites and the open internet.
But the fact that the illegal images were included at all again highlights how little is known about the data sets at the heart of the most powerful AI tools. Critics have worried that the biased depictions and explicit content found in AI image databases could invisibly shape what they create.
What can be done? Thiel believes that protocols could be put in place to screen for and remove child abuse content. Data sets could be more transparent about where their images come from. And image models trained on corrupted data sets containing child abuse content can be taught to “forget” how to create the imagery.
It was inevitable that AI would be employed to titillate and arouse a prurient interest in users. Some of the first photographs were pornographic in nature. The same goes for some of the first motion pictures. And there are times when it appears that the internet was invented by pornographers.
AI got us into this mess. Can we use AI to get out of it?