Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Abstract
We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses. By inducing a contextualized "pseudoword" vector as a stand-in for a static embedding in the input layer, and then performing masked prediction of a word in the sentence, we are able to investigate the geometry of BERT space in a controlled manner around individual instances. Using our method on a set of carefully constructed sentences targeting highly ambiguous English words, we find substantial regularity in the contextualized space, with regions that correspond to distinct word senses; but between these regions there are occasionally "sense voids": regions that do not correspond to any intelligible sense.
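To make the general idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) using the Hugging Face transformers library: one token's input-layer embedding in bert-base-uncased is replaced with a custom "pseudoword" vector, and BERT's masked-word predictions at another position in the sentence are then inspected. The pseudoword here is simply an interpolation of two static embeddings ("bank" and "shore"), which is only an illustrative stand-in for the induction procedure described in the paper; the sentence and word choices are likewise assumptions for the example.

# Hypothetical sketch: probe BERT's masked prediction with a custom
# "pseudoword" input vector. The interpolated pseudoword below is an
# illustrative assumption, not the paper's induction procedure.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "He sat on the [MASK] of the river."
inputs = tokenizer(sentence, return_tensors="pt")
input_ids = inputs["input_ids"][0].tolist()

with torch.no_grad():
    # Static (input-layer) embeddings for the sentence, one vector per position.
    embedding_layer = model.get_input_embeddings()
    input_embeds = embedding_layer(inputs["input_ids"]).clone()

    # Illustrative pseudoword: midpoint of the static embeddings of "bank" and "shore".
    bank_id, shore_id = tokenizer.convert_tokens_to_ids(["bank", "shore"])
    pseudoword = 0.5 * (embedding_layer.weight[bank_id] + embedding_layer.weight[shore_id])

    # Let the pseudoword stand in for "river" at its position in the sentence.
    target_pos = input_ids.index(tokenizer.convert_tokens_to_ids("river"))
    input_embeds[0, target_pos] = pseudoword

    # Masked prediction at the [MASK] position, conditioned on the modified input.
    mask_pos = input_ids.index(tokenizer.mask_token_id)
    logits = model(inputs_embeds=input_embeds,
                   attention_mask=inputs["attention_mask"]).logits

top_ids = torch.topk(logits[0, mask_pos], k=5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))

Sweeping the interpolation weight, or otherwise moving the pseudoword vector through the input space, and watching how the predicted words change is one way such a probe can reveal which senses different regions of the space give rise to.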
Bib Entry
@inproceedings{karidi2021putting,
  author    = {Karidi, Taelin and Zhou, Yichu and Schneider, Nathan and Abend, Omri and Srikumar, Vivek},
  title     = {{Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords}},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2021}
}