Using Images as Context in Statistical Machine Translation


In this project we investigate whether information from images can serve as useful context for statistical machine translation (SMT). We target two well-known challenges in SMT, particularly when it is used to translate short texts: ambiguity (incorrect translation of words that have multiple senses) and out-of-vocabulary words (words left untranslated). To that end, we automatically built a dataset containing:

  • Images from Wikipedia
  • Image captions in English
  • A machine translation of each caption into Portuguese, Spanish, German, or French
  • A human (reference) translation as found in Wikipedia for each English caption
  • A similar image retrieved from ImageNet using basic computer vision methods
  • Keywords from the WordNet synset associated with the retrieved image
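As a rough sketch of how one entry of such a dataset could be modeled, the following Python data class mirrors the six items listed above; all field names and the example values are illustrative, not taken from the project itself:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptionRecord:
    """One illustrative entry of the multimodal SMT dataset."""
    wikipedia_image: str                 # image from Wikipedia
    caption_en: str                      # English caption
    caption_mt: str                      # machine translation of the caption
    target_lang: str                     # "pt", "es", "de", or "fr"
    caption_ref: str                     # human (reference) translation from Wikipedia
    similar_imagenet_image: str          # similar image retrieved from ImageNet
    synset_keywords: List[str] = field(default_factory=list)  # WordNet synset keywords


# Hypothetical example record (German target language):
record = CaptionRecord(
    wikipedia_image="Dog_in_snow.jpg",
    caption_en="A dog playing in the snow",
    caption_mt="Ein Hund spielt im Schnee",
    target_lang="de",
    caption_ref="Ein Hund, der im Schnee spielt",
    similar_imagenet_image="n02084071_1234.jpg",
    synset_keywords=["dog", "domestic dog", "Canis familiaris"],
)
print(record.target_lang)
```

The synset keywords are the lemma names of the WordNet synset that the retrieved ImageNet image belongs to (ImageNet categories are indexed by WordNet IDs such as `n02084071`), which is what makes them usable as textual hints for disambiguation.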


To assess whether each of these sources of information is worth exploiting, we will collect human judgements on a sample of this dataset.
