Findings of EMNLP, 2020.
Abstract
While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence. We design a formalism that isolates the aforementioned errors. As case studies, we use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion. We probe five transformer-based QA models trained on two QA datasets, along with their underlying language models. Our broad study reveals that (1) all these models, with and without fine-tuning, have notable stereotyping biases in these classes; (2) larger models often have higher bias; and (3) the effect of fine-tuning on bias varies strongly with the dataset and the model size.
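The probing idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names (`build_context`, `build_question`, `score_answer`, `comparative_score`), the stub scorer, and the example subjects and attribute are all hypothetical, and the aggregation only mirrors the general idea of averaging over both subject orderings (to cancel positional dependence) and contrasting a question with its negation (to cancel question independence).

```python
from typing import Callable


def build_context(subj1: str, subj2: str) -> str:
    # Underspecified context: it names two subjects but gives no evidence
    # that would actually answer the question below.
    return f"{subj1} got off the flight to visit {subj2}."


def build_question(attribute: str) -> str:
    return f"Who {attribute}?"


def score_answer(context: str, question: str, answer: str) -> float:
    # Placeholder (assumption): return the QA model's score for `answer`
    # given (context, question), e.g. a span-prediction probability.
    return 0.5  # stub value so the sketch runs end to end


def comparative_score(subj1: str, subj2: str,
                      attribute: str, negated_attribute: str,
                      scorer: Callable[[str, str, str], float] = score_answer) -> float:
    """Rough preference of the model for subj1 over subj2 on `attribute`.

    Averaging over both subject orderings is meant to cancel positional
    dependence; contrasting the attribute with its negation is meant to
    cancel question independence -- the two reasoning errors noted above.
    """
    def avg_score(subject: str, attr: str) -> float:
        question = build_question(attr)
        return 0.5 * (scorer(build_context(subj1, subj2), question, subject)
                      + scorer(build_context(subj2, subj1), question, subject))

    bias_toward_1 = avg_score(subj1, attribute) - avg_score(subj1, negated_attribute)
    bias_toward_2 = avg_score(subj2, attribute) - avg_score(subj2, negated_attribute)
    return 0.5 * (bias_toward_1 - bias_toward_2)


if __name__ == "__main__":
    # With the stub scorer this prints 0.0; a real QA model would reveal
    # whichever subject it prefers for the stereotyped attribute.
    print(comparative_score("John", "Mary", "was a bad driver", "was a good driver"))
```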
Links
- Paper
- Code repository
- Visualization of the ideas in the paper
- Google Scholar
Bib Entry
@inproceedings{li2020unqovering,
  author = {Li, Tao and Khot, Tushar and Khashabi, Daniel and Sabharwal, Ashish and Srikumar, Vivek},
  title = {{UNQOVERing Stereotyping Biases via Underspecified Questions}},
  booktitle = {Findings of EMNLP},
  year = {2020}
}