Coarse Sense Inventory

A Coarse Sense Inventory for 85% Word Sense Disambiguation

Abstract

Word Sense Disambiguation (WSD) is the task of associating a word in context with one of its meanings. While many works in the past have focused on raising the state of the art, none has even come close to achieving an F-score in the 80% ballpark when using WordNet as its sense inventory. We contend that one of the main reasons for this failure is the excessively fine granularity of this inventory, resulting in senses that are hard to differentiate between, even for an experienced human annotator. In this paper we cope with this long-standing problem by introducing Coarse Sense Inventory (CSI), obtained by linking WordNet concepts to a new set of 45 labels. The results show that the coarse granularity of CSI leads a WSD model to achieve 85.9% F1, while maintaining a high expressive power. Our set of labels also exhibits ease of use in tagging and a descriptiveness that other coarse inventories lack, as demonstrated in two annotation tasks which we performed. Moreover, a few-shot evaluation proves that the class-based nature of CSI allows the model to generalise over unseen or under-represented words.
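To make the idea of linking fine-grained senses to a small label set concrete, here is a minimal sketch in Python. It is not the CSI resource or the authors' code: the sense identifiers, the label names, and the helper functions `coarse_candidates` and `coarse_accuracy` are all invented for illustration. The sketch also assumes a sense may carry more than one label. It shows how several fine senses of one word collapse into fewer coarse classes, and how a prediction can be scored at the coarse level.

```python
from collections import defaultdict

# Toy mapping: invented sense ids and labels, NOT real WordNet keys or CSI labels.
FINE_TO_COARSE = {
    "bank.n.1": {"FINANCE"},
    "bank.n.2": {"GEOGRAPHY"},
    "bank.n.3": {"FINANCE"},
    "bank.n.4": {"FINANCE"},
}


def coarse_candidates(fine_senses, mapping):
    """Group fine-grained senses by the coarse labels they are linked to."""
    groups = defaultdict(list)
    for sense in fine_senses:
        for label in sorted(mapping.get(sense, ())):
            groups[label].append(sense)
    return dict(groups)


def coarse_accuracy(predicted, gold, mapping):
    """Share of instances whose predicted sense has a coarse label in common with the gold sense."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for p, g in zip(predicted, gold)
        if mapping.get(p, set()) & mapping.get(g, set())
    )
    return hits / len(gold)


groups = coarse_candidates(list(FINE_TO_COARSE), FINE_TO_COARSE)
print(len(FINE_TO_COARSE), "fine senses ->", len(groups), "coarse labels")
```

In this toy setting, confusing `bank.n.3` with `bank.n.1` is not an error at the coarse level, because both senses are linked to the same label. Confusing `bank.n.2` with `bank.n.1` still is.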

Reference

CSI: A Coarse Sense Inventory for 85% Word Sense Disambiguation

@inproceedings{lacerraetal:2020,
  title={{CSI}: A Coarse Sense Inventory for 85\% Word Sense Disambiguation},
  author={Lacerra, Caterina and Bevilacqua, Michele and Pasini, Tommaso and Navigli, Roberto},
  booktitle={Proceedings of the 34th AAAI Conference on Artificial Intelligence},
  publisher={Association for the Advancement of Artificial Intelligence},
  year={2020}
}

Authors

Caterina Lacerra
PhD student @ Sapienza
lacerra [at] di.uniroma1.it

Michele Bevilacqua
PhD student @ Sapienza
bevilacqua [at] di.uniroma1.it

Tommaso Pasini
Post-Doc @ Sapienza
pasini [at] di.uniroma1.it

Roberto Navigli
Full Professor @ Sapienza
navigli [at] di.uniroma1.it

Acknowledgements

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 under the European Union’s Horizon 2020 research and innovation programme.

This work was supported in part by the MIUR under grant “Dipartimenti di eccellenza 2018-2020” of the Department of Computer Science of the Sapienza University of Rome.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

You are free to:

Share - copy and redistribute the material in any medium or format
Adapt - remix, transform, and build upon the material

Under the following terms:

Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial - You may not use the material for commercial purposes.
ShareAlike - If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.