
XL-WSD

An Extra-Large and Cross-Lingual Evaluation Framework for Word Sense Disambiguation.



Table of contents
  1. Cite
  2. Cite Single Resources
  3. Acknowledgments
  4. License

About

Transformer-based architectures brought a breeze of change to Word Sense Disambiguation (WSD), improving model performance by a large margin. The fast development of new approaches has been further encouraged by a well-framed evaluation suite for English, which made it possible to track progress and compare approaches fairly. However, other languages remained mostly unexplored, as test data are available for only a few languages and the evaluation setting is rather tangled. In this paper, we untangle this situation by proposing XL-WSD, a cross-lingual evaluation benchmark for the WSD task featuring sense-annotated development and test sets in 18 languages from six different linguistic families, together with language-specific silver training data. We leverage the XL-WSD datasets to conduct an extensive evaluation of neural and knowledge-based approaches, including the most recent multilingual language models. Results show that zero-shot knowledge transfer across languages is a promising research direction within the WSD field, especially for low-resourced languages, where large pretrained multilingual models still perform poorly.


Cite

@inproceedings{pasini-etal-xl-wsd-2021,
  title={{XL-WSD}: An Extra-Large and Cross-Lingual Evaluation Framework for Word Sense Disambiguation},
  author={Pasini, Tommaso and Raganato, Alessandro and Navigli, Roberto},
  booktitle={Proc. of AAAI},
  year={2021}
}

Cite Single Resources

  • English: WordNet \cite{miller-etal-1990-wordnet}, SemCor \cite{miller-etal-1993-semcor}, English WSD Framework \cite{raganato-etal-2017-word}, Senseval-2 \cite{edmonds-cotton-2001-senseval2}, Senseval-3 \cite{snyder-palmer-2004-senseval3}, SemEval-2007 Task 17 \cite{pradhanetal-etal-2007-semeval2007}, SemEval-2007 Coarse Task 7 \cite{navigli-etal-2007-coarse}, SemEval-2010 Task 17 \cite{agirre-etal-2010-domain}, SemEval-2013 Task 12 \cite{navigli-etal-2013-semeval2013} and SemEval-2015 Task 13 \cite{moro-navigli-2015-semeval2015}.
  • Basque: WordNet \cite{pociello-etal-2008-wnterm}.
  • Bulgarian: WordNet \cite{simovandpetya-2010-wordnet-bg}.
  • Catalan: WordNet \cite{benitez-et-al-1998-wordnet-ca}.
  • Chinese: WordNet \cite{huangetal-2014-wordnet-zh}.
  • Croatian: WordNet \cite{raffaelli-etal-2008-wordnet-ch}.
  • Danish: WordNet \cite{pedersen-et-al-2009-wordnet-da}.
  • Dutch: WordNet \cite{postma-et-al-2016-wordnet-nl}.
  • Estonian: WordNet \cite{vider-and-orav-2002-wordnet-et}.
  • French: SemEval-2013 Task 12 \cite{navigli-etal-2013-semeval2013}.
  • Galician: WordNet \cite{guinovart-2011-wordnet-eu}.
  • German: SemEval-2013 Task 12 \cite{navigli-etal-2013-semeval2013}.
  • Hungarian: WordNet \cite{mihaltz-et-al-2008-wordnet-hu}.
  • Italian: SemEval-2013 Task 12 \cite{navigli-etal-2013-semeval2013} and SemEval-2015 Task 13 \cite{moro-navigli-2015-semeval2015}.
  • Japanese: WordNet \cite{isahara-et-al-2008-wordnet-ja}.
  • Korean: WordNet \cite{yoon-et-al-2009-wordnet-ko}.
  • Slovenian: WordNet \cite{fiser-etal-2012-wordnet-sl}.
  • Spanish: SemEval-2013 Task 12 \cite{navigli-etal-2013-semeval2013} and SemEval-2015 Task 13 \cite{moro-navigli-2015-semeval2015}.

@inproceedings{pociello-etal-2008-wnterm,
  title = "{WNTERM}: {E}nriching the {MCR} with a {T}erminological {D}ictionary",
  author = "Pociello, Eli and Gurrutxaga, Antton and Agirre, Eneko and Aldezabal, Izaskun and Rigau, German",
  booktitle = "Proc. of the Sixth International Conference on Language Resources and Evaluation",
  month = may,
  year = 2008
}
@inproceedings{simovandpetya-2010-wordnet-bg,
  author = "Kiril Simov and Petya Osenova",
  title ="{C}onstructing of an {O}ntology-based {L}exicon for {B}ulgarian",
  booktitle = "Proc. of LREC",
  year = "2010",
  isbn = "2-9517408-6-7"
}
@inproceedings{benitez-et-al-1998-wordnet-ca,
  title="{M}ethods and tools for building the {C}atalan {W}ord{N}et",
  author="Ben{\'\i}tez, Laura and Cervell, Sergi and Escudero, Gerard and L{\'o}pez, M{\`o}nica and Rigau, German and Taul{\'e}, Mariona",
  booktitle="Proc. of ELRA Workshop on Language Resources for European Minority Languages",
  year=1998
}
@article{huangetal-2014-wordnet-zh,
  author = "Huang, Chu-Ren and Hsieh, Shu-Kai and Hong, Jia-Fei and Chen, Yun-Zhu and Su, I-Li and Chen, Yong-Xiang and Huang, Sheng-Wei",
  title = "{C}hinese {W}ord{N}et: {D}esign, {I}mplementation and {A}pplication of an {I}nfrastructure for {C}ross-{L}ingual {K}nowledge {P}rocessing",
  year = 2010,
  journal = "Journal of Chinese Information Processing",
  volume = 24,
  number = 2,
  eid = 14
}
@inproceedings{raffaelli-etal-2008-wordnet-ch,
  title="{B}uilding {C}roatian {W}ord{N}et",
  author="Raffaelli, Ida and Tadi{\'c}, Marko and Bekavac, Bo{\v{z}}o and Agi{\'c}, {\v{Z}}eljko",
  booktitle="Proc. of the Fourth Global WordNet Conference (GWC 2008)",
  year=2008
} 
@article{pedersen-et-al-2009-wordnet-da,
  title="{D}an{N}et: the challenge of compiling a wordnet for {D}anish by reusing a monolingual dictionary",
  author="Bolette S. Pedersen and Sanni Nimb and J{\o}rg Asmussen and Nicolai Hartvig S{\o}rensen and Lars Trap-Jensen and Henrik Lorentzen",
  journal="Language Resources and Evaluation",
  year="2009",
  volume=43
}
@inproceedings{postma-et-al-2016-wordnet-nl,
  title="{O}pen {D}utch {W}ord{N}et",
  author="Postma, Marten and van Miltenburg, Emiel and Segers, Roxane and Schoen, Anneleen and Vossen, Piek",
  booktitle="Proc. of the Eighth Global Wordnet Conference",
  year=2016
}
@inproceedings{vider-and-orav-2002-wordnet-et,
  author="Vider, Kadri and Orav, Heili",
  title="{E}stonian {W}ord{N}et and {L}exicography",
  booktitle="Proc. of the Eleventh International Symposium on Lexicography",
  year=2002, 
}
@article{guinovart-2011-wordnet-eu,
  title="{G}alnet: {W}ord{N}et 3.0 do galego",
  author="G{\'o}mez Guinovart, Xavier",
  journal="Linguam{\'a}tica",
  volume={3},
  number={1},
  year=2011
}
@inproceedings{mihaltz-et-al-2008-wordnet-hu,
  title = "{M}ethods and {R}esults of the {H}ungarian {W}ord{N}et {P}roject",
  author = "M{\'{a}}rton Mih{\'{a}}ltz and Csaba Hatvani and Judit Kuti and Gy{\"{o}}rgy Szarvas and J{\'{a}}nos Csirik and G{\'{a}}bor Pr{\'{o}}sz{\'{e}}ky and Tam{\'{a}}s V{\'{a}}radi",
  booktitle = "Proc. of The Fourth Global WordNet Conference",
  year = 2008
}
@inproceedings{isahara-et-al-2008-wordnet-ja,
  author = "Hitoshi Isahara and Francis Bond and Kiyotaka Uchimoto and Masao Utiyama and Kyoko Kanzaki",
  title = "{D}evelopment of the {J}apanese {W}ord{N}et",
  booktitle = "Proc. of the Sixth International Conference on Language Resources and Evaluation",
  year = 2008
}
@article{yoon-et-al-2009-wordnet-ko,
  title="{C}onstruction of {K}orean {W}ord{N}et",
  author="Yoon, Ae-Sun and Hwang, Soon-Hee and Lee, Eun-Ryoung and Kwon, Hyuk-Chul",
  journal="Journal of KIISE: Software and Applications",
  volume=36,
  number=1,
  year=2009
}
@inproceedings{fiser-etal-2012-wordnet-sl,
  author = "Fi{\v{s}}er, Darja and Novak, Jernej and Erjavec, Toma{\v{z}}",
  title = "{S}lo{WN}et 3.0: development, extension and cleaning",
  booktitle = "Proc. of 6th International Global Wordnet Conference",
  year = 2012
}
@inproceedings{agirre-etal-2010-domain,
    title = "{S}em{E}val-2010 Task 17: All-Words Word Sense Disambiguation on a Specific Domain",
    author = "Agirre, Eneko and Lopez de Lacalle, Oier and Fellbaum, Christiane and Hsieh, Shu-Kai and Tesconi, Maurizio and Monachini, Monica and Vossen, Piek and Segers, Roxanne",
    booktitle = "Proc. of SemEval",
    month = jul,
    year = "2010"
}
@inproceedings{navigli-etal-2007-coarse,
    title = "{S}em{E}val-2007 Task 07: Coarse-Grained {E}nglish All-Words Task",
    author = "Navigli, Roberto and Litkowski, Kenneth C. and Hargraves, Orin",
    booktitle = "Proc. of SemEval",
    month = jun,
    year = "2007"
}
@inproceedings{moro-navigli-2015-semeval2015,
    title = "{S}em{E}val-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking",
    author = "Moro, Andrea and Navigli, Roberto",
    booktitle = "Proc. of SemEval",
    month = jun,
    year = "2015"
}
@inproceedings{edmonds-cotton-2001-senseval2,
    title = "{SENSEVAL}-2: Overview",
    author = "Edmonds, Philip and Cotton, Scott",
    booktitle = "Proc. of {SENSEVAL}-2",
    month = jul,
    year = "2001"
}
@inproceedings{navigli-etal-2013-semeval2013,
    title = "{S}em{E}val-2013 Task 12: Multilingual Word Sense Disambiguation",
    author = "Navigli, Roberto and Jurgens, David and Vannella, Daniele",
    booktitle = "Proc. of SemEval",
    month = jun,
    year = "2013"
}
@inproceedings{snyder-palmer-2004-senseval3,
    title = "The {E}nglish all-words task",
    author = "Snyder, Benjamin and Palmer, Martha",
    booktitle = "Proc. of {SENSEVAL}-3",
    month = jul,
    year = "2004",
}
@inproceedings{pradhanetal-etal-2007-semeval2007,
    title = "{S}em{E}val-2007 Task-17: {E}nglish Lexical Sample, {SRL} and All Words",
    author = "Pradhan, Sameer and Loper, Edward and Dligach, Dmitriy and Palmer, Martha",
    booktitle = "Proc. of SemEval",
    month = jun,
    year = "2007",
}
@inproceedings{raganato-etal-2017-word,
    title = "Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison",
    author = "Raganato, Alessandro and Camacho-Collados, Jose and Navigli, Roberto",
    booktitle = "Proc. of EACL",
    month = apr,
    year = "2017",
}
@inproceedings{miller-etal-1993-semcor,
  title="A semantic concordance",
  author="Miller, George A and Leacock, Claudia and Tengi, Randee and Bunker, Ross T",
  booktitle="Proc. of the Workshop on Human Language Technology",
  year=1993,
}
@article{miller-etal-1990-wordnet,
  author = "George A. Miller and R.T. Beckwith and Christiane D. Fellbaum and D. Gross and K. Miller",
  title = "Introduction to {W}ord{N}et: an Online Lexical Database",
  journal = "International Journal of Lexicography",
  year = "1990",
  volume = "3",
  number = "4", 
  doi = "10.1093/ijl/3.4.235"
}

Acknowledgments

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487, the ERC Consolidator Grant FoTran No. 771113, and the ELEXIS project No. 731015, all under the European Union’s Horizon 2020 research and innovation programme.

The authors also thank the CSC - IT Center for Science (Finland) for providing computational resources.

License

XL-WSD is distributed under a non-commercial license. This is a human-readable summary of (and not a substitute for) the license. You should read the full terms and conditions of the XL-WSD license before using the material.

If You belong to a research institution, You are free to:

  • Share: copy and redistribute the material in any medium or format by making it available only to research institutions (as a simpler alternative to sharing the offline indices under the above conditions, a link to the BabelNet website can be provided for download of the official data).
  • Adapt: remix, transform, and build upon the material, provided that you make explicit that what you release contains or “is a processed version of XL-WSD [APPROPRIATE_VERSION_HERE, e.g. v1] downloaded from https://sapienzanlp.github.io/xl-wsd/, made available with the XL-WSD license (see https://sapienzanlp.github.io/xl-wsd/docs/license)”. Alternatively, you can provide a link to the XL-WSD website for downloading the official data, and make the code used to create the Adaptation available to any user. XL-WSD is distributed under a CC BY-NC license. Please refer to the full license for all terms.