Training a facial-recognition system to identify people requires a slew of photos of faces — photos that are often gathered from the internet. It’s usually impossible to figure out if images you’ve uploaded are among them.
“It’s easiest to understand when it becomes more personal,” said Adam Harvey, an artist and researcher who created the site, Exposing.ai, with fellow artist and programmer Jules LaPlace in collaboration with the nonprofit Surveillance Technology Oversight Project (STOP). “Sometimes it helps to have visual proof.”
To use the site, you must type in your Flickr username, the URL of a specific Flickr photo, or a hashtag (such as “#wedding”) to find out whether your photos are included. If photos are found, Exposing.ai will show you a thumbnail of each, along with the month and year that they were posted to your Flickr account and the number of images that are in each dataset.
A search of this author’s Flickr username turned up nothing. A search for some common hashtags, however, yielded tons of results, all for unknown people: “#wedding” returned more than 103,000 photos used in facial-recognition datasets, while searches for “#birthday” and “#party” yielded tens of thousands of included images, with children’s faces in many of the first results.
As Harvey is quick to point out, Exposing.ai examines just a smidgen of the facial data that’s in use, as many companies don’t publicly reveal how they obtained the data used to train their facial-recognition systems. “It’s the tip of the iceberg,” he said.
For years, researchers and companies have turned to the internet to collect and annotate photos of all kinds of objects, including many, many faces, in hopes of making computers better able to make sense of the world around them. This frequently includes using images from Flickr that carry Creative Commons licenses, which are copyright licenses that clearly state the terms under which such images and videos can be used and shared by others. It also includes pulling images from Google Image search, snagging them from public Instagram accounts, or gathering them by other methods (some legitimate, some perhaps not).
More datasets coming soon
Harvey initially planned to use facial-recognition technology to let you search for your own photos, but then realized it could surface photos of other folks that simply look similar to you. A text-based search of things like Flickr usernames and hashtags may be “less impressive” to people, he said, but it’s a more surefire way of showing whether or not your photos are included in datasets.
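For readers curious what a text-based lookup of this kind might involve, here is a minimal Python sketch. It is not Exposing.ai's actual code: the record fields, the sample data, the example.com URLs and the dataset names are all hypothetical, chosen only to illustrate exact matching on a username, a photo URL or a hashtag.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoRecord:
    # Hypothetical fields; real dataset metadata may be organized differently.
    username: str
    photo_url: str
    tags: frozenset
    posted: str   # month and year, written as "YYYY-MM"
    dataset: str


def search(records, query):
    """Return records that exactly match a hashtag, a photo URL, or a username."""
    q = query.strip().lower()
    if q.startswith("#"):
        tag = q[1:]
        return [r for r in records if tag in r.tags]
    if q.startswith(("http://", "https://")):
        return [r for r in records if r.photo_url.lower() == q]
    return [r for r in records if r.username.lower() == q]


def count_by_dataset(matches):
    """Count how many matching photos appear in each dataset."""
    return Counter(r.dataset for r in matches)


# Made-up sample records, for illustration only.
RECORDS = [
    PhotoRecord("alice", "https://example.com/photos/alice/1",
                frozenset({"wedding", "party"}), "2012-06", "dataset_a"),
    PhotoRecord("alice", "https://example.com/photos/alice/2",
                frozenset({"birthday"}), "2013-01", "dataset_b"),
    PhotoRecord("bob", "https://example.com/photos/bob/9",
                frozenset({"wedding"}), "2011-09", "dataset_a"),
]

hits = search(RECORDS, "#wedding")
print(len(hits), dict(count_by_dataset(hits)))
```

Because every match here is an exact string comparison, a query can only return photos tied to that username, URL or tag. It cannot surface a stranger who merely looks like you, which is the tradeoff Harvey describes.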
It’s unclear how people will react to learning more about how their photos are used. Casey Fiesler, an assistant professor at the University of Colorado Boulder who studies the ethics of using public data, has found that people have mixed responses to, say, learning their Twitter posts were used for research. They might be baffled, find it creepy, or not care at all. In the case of photos used for training facial-recognition systems, however, she suspects people won’t know what to do with the revelation that their images have been included.
“You see that your face is in there,” she said. “Then what?”
“There’s no real best-case scenario,” Harvey said. “There are only less-worse-case scenarios.”