Statistical Machine Translation from Hindi to English using MOSES

Ask Time:2014-12-28T01:01:58         Author:AvinashK

I need to create a Hindi-to-English translation system using MOSES. I have a parallel corpus of about 10,000 Hindi sentences with their corresponding English translations. I followed the method described on the Baseline system creation page. However, at the very first stage, when I tried to tokenise my Hindi corpus by running

~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l hi < ~/corpus/training/hi-en.hi > ~/corpus/hi-en.tok.hi

the tokeniser gave me the following output:

Tokenizer Version 1.1
Language: hi
Number of threads: 1
WARNING: No known abbreviations for language 'hi', attempting fall-back to English version...

I even tried 'hin' as the language code, but the tokeniser still didn't recognise the language. Can anyone tell me the correct way to build the translation system?

Author: AvinashK. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/27669446/statistical-machine-translation-from-hindi-to-english-using-moses