Dataset
We have split the dataset into two gzipped files: the Wikipedia images can be downloaded from the "Images" links above, and the textual part from the "Text" links.
The dataset consists of:
- images from Wikipedia;
- human-written captions in English;
- human-written captions in at least one of the following languages: Portuguese, Spanish, French or German;
- automatic translations of the English captions into at least one of these four languages;
- for each Wikipedia image, the URL of the closest ImageNet image, selected by applying a bag-of-visual-words method to both the Wikipedia and the ImageNet images.
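The ImageNet matching step can be illustrated with a minimal bag-of-visual-words sketch. It is not the method used to build the dataset, whose descriptors, vocabulary size and similarity measure are not stated here. It assumes a visual vocabulary (cluster centroids of local descriptors) has already been computed. Each image becomes a normalised histogram of visual-word counts, and the candidate with the highest cosine similarity is returned. The function names and the toy 2-D descriptors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Index of the visual word (centroid) closest to a local descriptor (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[k])),
    )


def bovw_histogram(descriptors: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """L2-normalised histogram of visual-word counts for one image (assumes a precomputed vocabulary)."""
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def closest_image(query: List[float], candidates: Dict[str, List[float]]) -> str:
    """Key of the candidate histogram most similar to the query.

    Histograms are L2-normalised, so the dot product equals cosine similarity.
    """
    return max(candidates, key=lambda key: sum(q * c for q, c in zip(query, candidates[key])))


# Toy usage: two visual words, one Wikipedia image, two hypothetical ImageNet candidates.
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
wiki = bovw_histogram([[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]], vocabulary)
imagenet = {
    "candidate_a": bovw_histogram([[1.0, 0.9], [1.1, 1.0]], vocabulary),
    "candidate_b": bovw_histogram([[0.0, 0.1], [0.05, 0.0], [1.0, 1.0]], vocabulary),
}
print(closest_image(wiki, imagenet))  # candidate_b
```

In practice the descriptors would come from a local feature extractor and the vocabulary from clustering (e.g. k-means) over many descriptors. Both are left out here to keep the sketch self-contained.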
The textual part of the dataset consists of the following files:
- metadata.final.file-found: local path to the Wikipedia image;
- reference.final.file-found: Wikipedia image caption in the reference language (Portuguese, Spanish, French or German);
- source.final.file-found: Wikipedia image caption in the source language, i.e. English;
- automatic-translation.final.file-found: Wikipedia image caption in the reference language, obtained by machine translation of the source caption;
- imagenet.final.file-found: online path (URL) to the matching ImageNet image entry.
In each of the five files described above, every line holds the information for one sentence (image caption).
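A minimal sketch for reading the textual part in Python follows. It assumes the five files have already been decompressed into one directory, are UTF-8 encoded, and are line-aligned, so that line i of every file describes the same caption. The one-line-per-sentence layout suggests this, but it is an assumption. The dictionary keys (`image_path`, `reference`, etc.) and the `read_records` helper are hypothetical names chosen for illustration.

```python
from contextlib import ExitStack
from itertools import zip_longest
from pathlib import Path
from typing import Dict, Iterator

# Hypothetical field names mapped to the file names listed above.
FILES = {
    "image_path": "metadata.final.file-found",
    "reference": "reference.final.file-found",
    "source": "source.final.file-found",
    "machine_translation": "automatic-translation.final.file-found",
    "imagenet_url": "imagenet.final.file-found",
}


def read_records(directory: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per line by reading the five files in parallel.

    Assumes decompressed, UTF-8, line-aligned files; raises ValueError
    if the files do not all have the same number of lines.
    """
    base = Path(directory)
    with ExitStack() as stack:
        handles = [stack.enter_context(open(base / name, encoding="utf-8")) for name in FILES.values()]
        for lines in zip_longest(*handles):
            if None in lines:
                raise ValueError("the files have different numbers of lines")
            yield {field: line.rstrip("\n") for field, line in zip(FILES, lines)}
```

`zip_longest` is used instead of `zip` so that a length mismatch between files is reported instead of being silently truncated to the shortest file.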