gideon

Bilingual Markov Reordering Labels for Hierarchical SMT

Hierarchical statistical machine translation (Hiero) can be improved by means of bilingual reordering labels derived from word alignments. Hiero rules lack nonterminal labels. This gives them little context, so their combination into full translations is poorly coordinated and depends strongly on the language model. In the work for my thesis, bilingual labels were added to Hiero rules. These labels lead to more coherent translations with better word order, as demonstrated by extensive experiments on three language pairs. The proposed labels require no syntactic information and use only the information in word alignments. This distinguishes them from the various types of syntactic labels proposed earlier in the literature.

Bilingual labels are based on a newly proposed framework called hierarchical alignment trees (HATs). HATs are bilingual trees that represent the hierarchical translation equivalence structure induced from word alignments. HATs maximally decompose word alignments into phrase pairs, and provide an explicit description of the local reordering taking place within each phrase pair.
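
To make the idea of decomposing a word alignment into phrase pairs more concrete, here is a minimal Python sketch. It is not the HAT algorithm from the thesis. It assumes the standard consistency criterion for phrase pairs (no alignment link crosses the boundary of the pair) and adds a flat, word-level description of the order inside each pair. HATs instead describe reordering recursively over the children of each tree node. The function names `phrase_pairs` and `word_order` and the labels "atomic", "monotone", "inverted" and "mixed" are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]            # (source index, target index)
Pair = Tuple[int, int, int, int]  # (s_start, s_end, t_start, t_end), inclusive


def phrase_pairs(links: Set[Link], src_len: int) -> List[Pair]:
    """Enumerate phrase pairs consistent with the word alignment.

    Assumption: target spans are kept tight, i.e. they are not extended
    over unaligned target words.
    """
    pairs = []
    for s_start in range(src_len):
        for s_end in range(s_start, src_len):
            targets = [t for s, t in links if s_start <= s <= s_end]
            if not targets:
                continue  # a span without any link yields no phrase pair
            t_start, t_end = min(targets), max(targets)
            # Consistent: every link into the target span starts in the source span.
            if all(s_start <= s <= s_end
                   for s, t in links if t_start <= t <= t_end):
                pairs.append((s_start, s_end, t_start, t_end))
    return pairs


def word_order(links: Set[Link], pair: Pair) -> str:
    """Flat, word-level order description (illustrative labels, not the thesis labels)."""
    s_start, s_end, _, _ = pair
    targets = [t for s, t in sorted(links) if s_start <= s <= s_end]
    if len(targets) < 2:
        return "atomic"
    steps = list(zip(targets, targets[1:]))
    if all(a <= b for a, b in steps):
        return "monotone"
    if all(a >= b for a, b in steps):
        return "inverted"
    return "mixed"


# Toy alignment: source word 0 stays in place, source words 1 and 2 swap.
links = {(0, 0), (1, 2), (2, 1)}
for pair in phrase_pairs(links, src_len=3):
    print(pair, word_order(links, pair))
```

In the toy alignment, the span covering source words 1 and 2 forms a phrase pair whose two words are swapped, while the span covering source words 0 and 1 is not a phrase pair because target position 1 is linked to source word 2. The full-sentence pair comes out as "mixed" under this flat view, which is exactly the kind of case where a hierarchical description is more informative.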

My experiments demonstrate that bilingual labels not only improve Hiero but are also competitive with syntactic labels. A final important finding is that a soft approach to label matching is needed to get the best results.

gideon.txt · Last modified: 2019/02/06 16:03 (external edit)