The BabelPic Project

A multimodal dataset for non-concrete concepts


BabelPic

Welcome to the page of BabelPic, a project of the Sapienza NLP Group, developed with the support of the awesome MOUSSE ERC project!

The BabelPic Dataset

BabelPic is a dataset targeting non-concrete concepts, built by cleaning the image-synset associations found within BabelNet, a large multilingual encyclopedic dictionary. The dataset was first annotated manually (the gold set) and then extended (the silver set) using an automatic concept verification technique that exploits VLP, a Vision-Language Pre-training model.

To learn more, read our paper:

Agostina Calabrese, Michele Bevilacqua, and Roberto Navigli. 2020. Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts. Proceedings of ACL, pp. 4680-4686

The EViLBERT Sense Embeddings

EViLBERT embeddings are multimodal sense embeddings that jointly encode gloss and image features, using the image-synset pairs in BabelPic as data.

For more details, we invite you to read the paper:

Agostina Calabrese, Michele Bevilacqua, and Roberto Navigli. 2020. EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings. Proceedings of IJCAI, pp. 481-487
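As an illustration of how sense embeddings like these are typically used, the sketch below compares two vectors with cosine similarity. The file format of the released embeddings is not described on this page, so the loader assumes a plain-text layout with one `<key> <v1> ... <vn>` entry per line (optionally preceded by a `<count> <dim>` header); this layout, the function names, and the toy vectors are all assumptions made for the example.

```python
import math


def parse_embeddings(lines):
    """Parse lines of the (assumed) form '<key> <v1> ... <vn>' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) <= 2:
            # Skip blank lines and a possible '<count> <dim>' header line.
            continue
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data with made-up keys and values, only to show the call pattern.
toy_lines = [
    "2 3",
    "bn:00000001n 1.0 0.0 0.0",
    "bn:00000002n 0.0 1.0 0.0",
]
toy_vectors = parse_embeddings(toy_lines)
similarity = cosine(toy_vectors["bn:00000001n"], toy_vectors["bn:00000002n"])
```

To read a real file, pass an open file handle to `parse_embeddings` once the actual format has been checked against the downloaded archive.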

Download

We release our dataset as a .tar.gz archive containing the images in BabelPic. The filenames point to the corresponding BabelNet IDs.
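A minimal sketch of how the archive could be indexed by synset without unpacking it. BabelNet synset IDs have the form `bn:` followed by eight digits and a part-of-speech letter (for example `bn:00012345n`). How the ID is embedded in each filename is an assumption here: the pattern below accepts `:`, `_` or `-` after `bn`, and should be adjusted after inspecting the real archive. The function names are hypothetical.

```python
import re
import tarfile
from collections import defaultdict

# Assumption: each image filename contains a BabelNet ID, possibly with
# "_" or "-" in place of ":" (colons are awkward in filenames).
BN_ID = re.compile(r"bn[:_-]?(\d{8}[nvar])")


def synset_id_from_name(filename):
    """Return the normalised BabelNet ID found in a filename, or None."""
    match = BN_ID.search(filename)
    return f"bn:{match.group(1)}" if match else None


def index_archive(archive_path):
    """Map each BabelNet ID to the image members stored in a .tar.gz archive."""
    images_by_synset = defaultdict(list)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            synset_id = synset_id_from_name(member.name)
            if synset_id is not None:
                images_by_synset[synset_id].append(member.name)
    return dict(images_by_synset)
```

Calling `index_archive` on the downloaded archive returns a dictionary whose keys are synset IDs and whose values are lists of image paths inside the archive; members whose names do not match the assumed pattern are ignored.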

Download the gold BabelPic dataset at:

Download the silver BabelPic dataset at:

Download the silver extension produced for EViLBERT (which also contains concrete concepts):

Download EViLBERT multimodal embeddings:

Statistics

BabelPic

| Dataset             | Images  | Synsets |
|---------------------|---------|---------|
| Gold                | 14,931  | 2,733   |
| Silver [split 0-3]  | 65,497  | 10,013  |
| Silver [split 0-13] | 327,248 | 42,769  |
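For a quick sense of scale, the counts above can be turned into an average number of images per synset. The snippet only re-uses the figures from the statistics; the variable and function names are illustrative.

```python
# (images, synsets) pairs copied from the BabelPic statistics.
stats = {
    "Gold": (14_931, 2_733),
    "Silver [split 0-3]": (65_497, 10_013),
    "Silver [split 0-13]": (327_248, 42_769),
}


def images_per_synset(images, synsets):
    """Average number of images associated with each synset."""
    return images / synsets


for name, (images, synsets) in stats.items():
    print(f"{name}: {images_per_synset(images, synsets):.2f} images per synset")
# Gold: 5.46 images per synset
# Silver [split 0-3]: 6.54 images per synset
# Silver [split 0-13]: 7.65 images per synset
```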

EViLBERT

| Embeddings | Synsets |
|------------|---------|
| EViLBERT   | 45,312  |

Reference

If you use our data, please cite our papers:

BabelPic

@inproceedings{calabrese-etal-2020-fatality,
    title = "Fatality Killed the Cat or: {B}abel{P}ic, a Multimodal Dataset for Non-Concrete Concepts",
    author = "Calabrese, Agostina  and
      Bevilacqua, Michele  and
      Navigli, Roberto",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.425",
    pages = "4680--4686",
}

EViLBERT

@inproceedings{calabrese-etal-2020-evilbert,
  title     = "{EV}i{LBERT}: {L}earning Task-Agnostic Multimodal Sense Embeddings",
  author    = "Calabrese, Agostina and Bevilacqua, Michele and Navigli, Roberto",
  booktitle = "Proceedings of the Twenty-Ninth International Joint Conference on
               Artificial Intelligence, {IJCAI-20}",
  publisher = "International Joint Conferences on Artificial Intelligence Organization",             
  pages     = "481--487",
  year      = "2020",
  month     = "jul",
  url       = "https://doi.org/10.24963/ijcai.2020/67"
}

Authors

Agostina Calabrese (@agostina_cal)
Michele Bevilacqua (@MicheleBevila20)
Roberto Navigli (@rnavigli)

License

Both the BabelPic dataset and the EViLBERT embeddings are released under the CC-BY-NC 4.0 license.

Acknowledgement

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 under the European Union’s Horizon 2020 research and innovation programme.