I need to create a Hindi to English translation system using MOSES. I have got a parallel corpora containing about 10000 Hindi sentences and corresponding English translations. I followed the method described in the Baseline system creation page. But, just in the first stage, when I wanted to tokenise my Hindi corpus and tried to execute
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l hi < ~/corpus/training/hi-en.hi> ~/corpus/hi-en.tok.hi
, the tokeniser gave me the following output:
Tokenizer Version 1.1
Language: hi
Number of threads: 1
WARNING: No known abbreviations for language 'hi', attempting fall-back to English version...
I even tried with 'hin'
but it still didn't recognise the language. Can anyone tell the correct way to make the translation system.