Using Images as Context in Statistical Machine Translation
In this project, we investigate whether information from images can serve as useful context for statistical machine translation (SMT). We target two well-known challenges in SMT, particularly when translating short texts: ambiguity (incorrect translation of words that have multiple senses) and out-of-vocabulary words (words left untranslated). To do so, we automatically built a dataset containing:
To understand whether exploiting each of these sources of information is worthwhile, we will collect human judgements on a sample of this dataset.