Taelin Karidi, Yichu Zhou, Nathan Schneider, Omri Abend and Vivek Srikumar
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.

Abstract

We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses. By inducing a contextualized "pseudoword" vector as a stand-in for a static embedding in the input layer, and then performing masked prediction of a word in the sentence, we are able to investigate the geometry of the BERT-space in a controlled manner around individual instances. Using our method on a set of carefully constructed sentences targeting highly ambiguous English words, we find substantial regularity in the contextualized space, with regions that correspond to distinct word senses; but between these regions there are occasionally "sense voids"—regions that do not correspond to any intelligible sense.
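The core idea of inducing a pseudoword, that is, searching for an input-layer vector whose contextualized output lands on a chosen point, can be illustrated with a toy sketch. The abstract does not give the optimization details, so everything below is an assumption made for illustration. A small frozen `tanh` map stands in for BERT, plain gradient descent stands in for whatever procedure the authors use, and the names `encode`, `sq_loss`, and `induce_pseudoword` are hypothetical. The masked-prediction step is not shown.

```python
import math
import random

def encode(x, W, ctx):
    """Toy frozen encoder (NOT BERT): tanh(W @ x + ctx)."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + c)
            for row, c in zip(W, ctx)]

def sq_loss(x, target, W, ctx):
    """0.5 * squared distance between the encoder output and the target."""
    return 0.5 * sum((h - t) ** 2 for h, t in zip(encode(x, W, ctx), target))

def induce_pseudoword(target, W, ctx, steps=1000, lr=0.3):
    """Gradient descent on the input vector only; the encoder stays frozen."""
    dim = len(W[0])
    x = [0.0] * dim
    for _ in range(steps):
        h = encode(x, W, ctx)
        # Chain rule through tanh: d(loss)/d(pre-activation_i)
        delta = [(hi - ti) * (1.0 - hi * hi) for hi, ti in zip(h, target)]
        # Back through the linear map: grad = W^T @ delta
        grad = [sum(W[i][j] * delta[i] for i in range(len(W)))
                for j in range(dim)]
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
    return x

# Assumed toy setup: a near-identity weight matrix and a fixed "context" offset.
rng = random.Random(0)
DIM = 4
W = [[(1.0 if i == j else 0.0) + rng.uniform(-0.1, 0.1) for j in range(DIM)]
     for i in range(DIM)]
ctx = [rng.uniform(-0.3, 0.3) for _ in range(DIM)]

# The target plays the role of a contextualized vector we want to reproduce.
hidden_input = [rng.uniform(-0.5, 0.5) for _ in range(DIM)]
target = encode(hidden_input, W, ctx)

initial_loss = sq_loss([0.0] * DIM, target, W, ctx)
pseudoword = induce_pseudoword(target, W, ctx)
final_loss = sq_loss(pseudoword, target, W, ctx)
print(f"loss before: {initial_loss:.6f}, after: {final_loss:.2e}")
```

In a real setting the frozen encoder would be the pretrained model and the optimized vector would replace one token's static embedding at the input layer. This toy only shows that an input vector can be fitted while the encoder's weights stay fixed.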

Bib Entry

@inproceedings{karidi2021putting,
  author = {Karidi, Taelin and Zhou, Yichu and Schneider, Nathan and Abend, Omri and Srikumar, Vivek},
  title = {{Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords}},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2021}
}