MuLaN

Abstract

The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to multilingual Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme, which, by jointly leveraging contextualized word embeddings and the multilingual information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments, we provide empirical evidence that our automatically created datasets are of a higher quality than those generated by other competitors and lead a supervised model to achieve state-of-the-art performances in all multilingual Word Sense Disambiguation tasks. We make our datasets available for research purposes at https://github.com/SapienzaNLP/mulan.

Reference

MuLaN: Multilingual Label propagatioN for Word Sense Disambiguation

@inproceedings{barba-et-al-2020-mulan,
  title     = {Mu{L}a{N}: Multilingual Label propagatio{N} for Word Sense Disambiguation},
  author    = {Barba, Edoardo and Procopio, Luigi and Campolungo, Niccolò and Pasini, Tommaso and Navigli, Roberto},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
               Artificial Intelligence, {IJCAI-20}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},             
  pages     = {3837--3844},
  year      = {2020},
  month     = {7},
  doi       = {10.24963/ijcai.2020/531},
  url       = {https://doi.org/10.24963/ijcai.2020/531},
}

Authors

Edoardo Barba
PhD student @ Sapienza
barba [at] di.uniroma1.it

Luigi Procopio
PhD student @ Sapienza
procopio [at] di.uniroma1.it

Niccolò Campolungo
PhD student @ Sapienza
campolungo [at] di.uniroma1.it

Tommaso Pasini
Post-Doc @ Sapienza
pasini [at] di.uniroma1.it

Roberto Navigli
Full Professor @ Sapienza
navigli [at] di.uniroma1.it

Acknowledgements

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 under the European Union’s Horizon 2020 research and innovation programme.

This work has also been supported by the PerLIR project (Personal Linguistic resources in Information Retrieval) funded by the MIUR Progetti di ricerca di Rilevante Interesse Nazionale programme (PRIN 2017).

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

You are free to:

Share - copy and redistribute the material in any medium or format
Adapt - remix, transform, and build upon the material

Under the following terms:

Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial - You may not use the material for commercial purposes.
ShareAlike - If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.