The Casual Conversations data set features the same group of paid actors that Facebook previously hired when it commissioned deepfake videos
for another open-source data set (Facebook hoped people in the artificial-intelligence community would use that one to come up with new ways to spot technologically manipulated videos online and stop them from spreading). Cristian Canton Ferrer, research manager at Facebook AI, told CNN Business that the Casual Conversations data set includes some information that was not used when Facebook created the deepfake data set.
Canton said paying participants — who had to spend several hours being recorded in a studio — seemed fair given what Facebook got in return. Participants can also ask Facebook to remove their information from the data set in the future, for any reason, he said.
Canton acknowledges that much more work is needed to make AI systems fair. He said he hopes feedback from academic researchers and companies will, over time, lead to better ways of measuring fairness.
One area he is considering expanding in the future is how gender is defined in data sets. Computers are typically tasked with treating gender in a very narrow way
— as binary labels of “male” or “female” that may be applied automatically — while humans increasingly recognize gender with a growing number of terms that may change over time. In the Casual Conversations data set, participants were asked to self-identify as “male,” “female,” or “other,” Canton said.
“‘Other’ encapsulates a huge gamut of options there,” he said.