Applying interpolation on data to get mix and match

The limitation of an additively smoothed bigram model is that we back off to a state of ignorance when we deal with rare text. For example, suppose the word captivating occurs five times in the training data: three times followed by by and twice followed by the. With additive smoothing, the unseen continuations captivating a and captivating new are assigned the same probability. Both are plausible, but the former is more probable than the latter. This problem can be rectified using unigram probabilities: we can build an interpolation model in which the unigram and bigram probabilities are combined.
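
For intuition, here is a minimal Python sketch of linear interpolation (not SRILM's implementation): a maximum-likelihood bigram estimate is mixed with a unigram estimate using an assumed weight lam, so unseen continuations such as a and new after captivating are separated by their unigram frequencies instead of being treated identically.

from collections import Counter

def train(tokens):
    # Collect unigram and bigram counts from a list of tokens.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def interpolated_prob(word, prev, unigrams, bigrams, lam=0.7):
    # P(word | prev) = lam * P_bigram(word | prev) + (1 - lam) * P_unigram(word)
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

Because a has a much higher unigram count than new in typical training data, interpolated_prob("a", "captivating", ...) comes out larger than interpolated_prob("new", "captivating", ...), even though both bigrams are unseen.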

In SRILM, we perform interpolation by first training a unigram model with -order 1 and a bigram model with -order 2:

ngram-count -text /home/linux/ieng6/ln165w/public/data/engandhintrain.txt -vocab /home/linux/ieng6/ln165w/public/data/engandhinlexicon.txt -order 1 -addsmooth 0.0001 -lm wsj1.lm
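
Only the -order 1 run is shown above. As a hedged illustration, the following Python sketch wraps both ngram-count invocations with subprocess; the bigram output name wsj2.lm and the reuse of -addsmooth 0.0001 for the bigram model are assumptions, not taken from the text.

import subprocess

DATA = "/home/linux/ieng6/ln165w/public/data"

# Train the unigram (order 1) and bigram (order 2) models with ngram-count.
# The bigram output name wsj2.lm and its -addsmooth value are assumed here.
for order, lm_out in [(1, "wsj1.lm"), (2, "wsj2.lm")]:
    subprocess.run(
        ["ngram-count",
         "-text", f"{DATA}/engandhintrain.txt",
         "-vocab", f"{DATA}/engandhinlexicon.txt",
         "-order", str(order),
         "-addsmooth", "0.0001",
         "-lm", lm_out],
        check=True,
    )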