InfoTabS: Inference on Tables as Semi-structured Data
Vivek Gupta, Pegah Nokhiz, Maitrey Mehta and Vivek Srikumar
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
Abstract
In this paper, we observe that semi-structured tabulated text is ubiquitous; understanding it requires not only comprehending the meaning of text fragments, but also the implicit relationships between them. We argue that such data can serve as a testing ground for understanding how we reason about information. To study this, we introduce a new dataset called INFOTABS, comprising human-written textual hypotheses based on premises that are tables extracted from Wikipedia info-boxes. Our analysis shows that the semi-structured, multi-domain and heterogeneous nature of the premises admits complex, multi-faceted reasoning. Experiments reveal that, while human annotators agree on the relationships between a table-hypothesis pair, several standard modeling strategies are unsuccessful at the task, suggesting that reasoning about tables can pose a difficult modeling challenge.
Links
- Link to paper
- Code for experiments and data preprocessing
- Download the data
- Project homepage
- See on Google Scholar
Bib Entry
@inproceedings{gupta2020infotabs,
author = {Gupta, Vivek and Nokhiz, Pegah and Mehta, Maitrey and Srikumar, Vivek},
title = {{InfoTabS: Inference on Tables as Semi-structured Data}},
booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year = {2020}
}