Biased BERT of Sesame Street

Bias is the topic of the day in AI, but bias has been with us since forever. When modern statistics was born in breweries, bias in data wasn’t a big issue, but as soon as statistics entered the arena of what people are like and how they think, bias became the scary and elusive monster of the field. A textbook example of biased data ruining a survey is Ann Landers’ 1975 survey, which found that 70 percent of parents would not have children if they had to do it over again. The bias in that survey, as in contemporary online surveys, stems from the fact that the only people who bother to respond are those who feel strongly enough about the topic. Then you have a bias in your research: you thought you were learning something about society at large, but you only learned something about the bunch of weirdos who care to respond to these surveys.

Moving forward in time, bias is now ubiquitous in AI, which is often considered an extension of statistics. AI technology is rarely trained and evaluated in situations that reflect its actual usage. Self-driving cars are fed tons of data from California and Arizona, of all places. Language models that try to understand your conversations are trained on a public dataset of emails from Enron, of all places. The images in ImageNet, which power most of vision AI, don’t even try to claim to represent any kind of population.

This statistical history of the concept of bias now conflicts with the meaning of “bias” as a synonym for prejudice when we talk about human biases, and this creates a lot of confusion. Take a typical machine learning application: credit scoring. An unbiased (in the statistical sense) credit scoring system will assign higher credit risk to people in some neighborhoods. Based on that, a bank will be biased against the people living there and will refuse to extend them credit. That’s a real and difficult problem.

Now, The New York Times weighed in with “We Teach A.I. Systems Everything, Including Our Biases,” scolding Google’s BERT and other similar systems for being biased (disclaimer: I work at Google Research, but not on BERT):

But tools like BERT pick up bias, according to a recent research paper from a team of computer scientists at Carnegie Mellon University. The paper showed, for instance, that BERT is more likely to associate the word “programmer” with men than with women.

Which sense of bias is this? In the statistical sense, I’d say the model is not biased: it truthfully reflects the fact that, overall, such a correlation exists. In the sense of having a prejudice, I think it’s unfair to judge an all-purpose model, whose job is to encode all sorts of useful statistics, for doing just that. It should be the responsibility of the people using these models to make sure they are not prejudiced against anyone. Publishing a model that calculates which words are most likely to appear in which context is like publishing a census showing that some neighborhoods are richer than others. It’s not the census’s fault if it’s misused.
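
For the curious, here is a minimal sketch of what probing a masked language model for this kind of association can look like. It uses the Hugging Face transformers library with the public bert-base-uncased checkpoint and a made-up template sentence; it illustrates the general idea, not the methodology of the Carnegie Mellon paper.

```python
# Minimal sketch: probe a masked language model for a gendered word association.
# Assumes the Hugging Face `transformers` library and the public
# bert-base-uncased checkpoint; the template sentence is made up for illustration.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Ask BERT to fill in the blank, restricted to two candidate pronouns,
# and compare the probabilities it assigns to each.
for prediction in fill("[MASK] is a programmer.", targets=["he", "she"]):
    print(prediction["token_str"], round(prediction["score"], 4))
```

If “he” consistently scores higher than “she” across many such templates, the model encodes the kind of association the article describes, which, as argued above, is a faithful reflection of the statistics of the training text rather than a prejudice on the model’s part.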