A multimodal dataset for non-concrete concepts
Welcome to the page of BabelPic, a project of the Sapienza NLP Group, developed with the support of the awesome MOUSSE ERC project!
BabelPic is a dataset targeting non-concrete concepts, built by cleaning the image-synset associations found within BabelNet, a large multilingual encyclopedic dictionary. The dataset was first annotated manually and then extended with an automatic concept verification technique that exploits the VLP (Vision-Language Pre-training) model.
To learn more, read our paper:
Agostina Calabrese, Michele Bevilacqua, and Roberto Navigli. 2020. Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts. Proceedings of ACL, pp. 4680-4686
EViLBERT embeddings are multimodal sense embeddings that jointly encode gloss and image features, using the image-synset pairs in BabelPic as data.
For more details, we invite you to read the paper:
Agostina Calabrese, Michele Bevilacqua, and Roberto Navigli. 2020. EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings. Proceedings of IJCAI, pp. 481-487
We release our dataset as a .tar.gz archive containing the images in BabelPic. The filenames point to the corresponding BabelNet ids.
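As a rough illustration of how the filename-to-synset link could be used, the sketch below groups archive members by BabelNet id. The exact filename layout is not specified here, so the pattern is an assumption: it expects an id such as `bn:00012345n` (or `bn_00012345n`) somewhere in each member name. The function names and the example paths are hypothetical.

```python
import re
import tarfile
from collections import defaultdict

# Assumption: each member name contains a BabelNet id, i.e. "bn", an optional
# ":" or "_" separator, eight digits, and a part-of-speech letter (n, v, a, r).
BN_ID = re.compile(r"bn[:_]?(\d{8})([nvar])")


def synset_id_from_name(name):
    """Return the normalised BabelNet id found in a filename, or None."""
    match = BN_ID.search(name)
    if match is None:
        return None
    return f"bn:{match.group(1)}{match.group(2)}"


def group_images_by_synset(names):
    """Map each BabelNet id to the list of filenames that mention it."""
    groups = defaultdict(list)
    for name in names:
        synset = synset_id_from_name(name)
        if synset is not None:
            groups[synset].append(name)
    return dict(groups)


def list_archive_members(archive_path):
    """List the regular files inside a .tar.gz archive without extracting it."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


# Example with made-up names; real archives may use a different layout.
example = ["gold/bn_00012345n_0.jpg", "gold/bn_00012345n_1.jpg", "README"]
grouped = group_images_by_synset(example)
```

With a downloaded archive, `group_images_by_synset(list_archive_members(path))` would give the images per synset, provided the assumed naming pattern holds.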
Download the gold BabelPic dataset at:
.tar.gz (Google Drive)

Download the silver BabelPic dataset at:
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)

Download the silver extension produced for EViLBERT (which also contains concrete concepts):
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)
.tar.gz (Google Drive)

Download EViLBERT multimodal embeddings:
.vec (Google Drive)

Dataset | Images | Synsets |
---|---|---|
Gold | 14,931 | 2,733 |
Silver [split 0-3] | 65,497 | 10,013 |
Silver [split 0-13] | 327,248 | 42,769 |

Embeddings | Synsets |
---|---|
EViLBERT | 45,312 |
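
A minimal sketch of reading the embeddings follows. It assumes the `.vec` file uses the common word2vec text layout (a header line with the entry count and dimensionality, then one key and its values per line) and that the keys are BabelNet ids; neither is confirmed on this page. The helper names `load_vec` and `cosine` are hypothetical.

```python
import math


def load_vec(lines):
    """Parse word2vec-style text lines into ({key: [floats]}, dimensionality).

    Assumption: the first line is "<count> <dim>" and every later line is
    "<key> <v1> ... <vdim>" separated by single spaces.
    """
    iterator = iter(lines)
    header = next(iterator).split()
    dim = int(header[1])
    vectors = {}
    for line in iterator:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the assumed layout
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors, dim


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Tiny in-memory example with made-up values, not real EViLBERT vectors.
sample = [
    "2 3",
    "bn:00000001n 1.0 0.0 0.0",
    "bn:00000002n 0.0 1.0 0.0",
]
vectors, dim = load_vec(sample)
similarity = cosine(vectors["bn:00000001n"], vectors["bn:00000002n"])
```

For the real file, passing an open file handle (for example `open(path, encoding="utf-8")`) to `load_vec` should work under the same assumptions, since the function only iterates over lines.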
If you use our data, please cite our papers:
@inproceedings{calabrese-etal-2020-fatality,
title = "Fatality Killed the Cat or: {B}abel{P}ic, a Multimodal Dataset for Non-Concrete Concepts",
author = "Calabrese, Agostina and
Bevilacqua, Michele and
Navigli, Roberto",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.425",
pages = "4680--4686",
}
@inproceedings{calabrese-etal-2020-evilbert,
title = "{EV}i{LBERT}: {L}earning Task-Agnostic Multimodal Sense Embeddings",
author = "Calabrese, Agostina and Bevilacqua, Michele and Navigli, Roberto",
booktitle = "Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}",
publisher = "International Joint Conferences on Artificial Intelligence Organization",
pages = "481--487",
year = "2020",
month = "jul",
url = "https://doi.org/10.24963/ijcai.2020/67"
}
Agostina Calabrese | Michele Bevilacqua | Roberto Navigli |
@agostina_cal | @MicheleBevila20 | @rnavigli |
Both the BabelPic dataset and the EViLBERT embeddings are released under the CC-BY-NC 4.0 license.
The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 under the European Union’s Horizon 2020 research and innovation programme.