Back-Off and Interpolation in Language Modeling
In N-gram models, linear interpolation estimates the probability of the next word as a weighted combination of several language models:

P(w_n | w_{n-1}, ..., w_1) = λ_1 · P_1(w_n | w_{n-1}, ..., w_1) + λ_2 · P_2(w_n | w_{n-1}, ..., w_1) + ... + λ_k · P_k(w_n | w_{n-1}, ..., w_1)
Here, P_i(w_n | w_{n-1}, ..., w_1) is the probability assigned by the i-th language model, and λ_i is the weight given to that model. The weights are non-negative, sum to 1 (so the combined estimate remains a valid probability), and are typically tuned on a held-out dataset.
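The interpolation above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the toy corpus is hypothetical, only unigram and bigram models are combined, and the λ weights are fixed by hand rather than tuned on held-out data.

```python
from collections import Counter

# Toy corpus (hypothetical, for illustration only)
corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_unigram(w):
    # Maximum-likelihood unigram estimate: count(w) / N
    return unigrams[w] / total

def p_bigram(w, prev):
    # Maximum-likelihood bigram estimate: count(prev, w) / count(prev)
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, w)] / unigrams[prev]

def p_interpolated(w, prev, lam1=0.3, lam2=0.7):
    # Linear interpolation: weights must sum to 1 so the
    # result is itself a valid probability distribution.
    return lam1 * p_unigram(w) + lam2 * p_bigram(w, prev)

print(p_interpolated("cat", "the"))
```

Because the bigram model backs onto the unigram model with a nonzero weight, the interpolated estimate never collapses to zero for a word seen anywhere in training, even when the exact bigram is unseen — this is the practical benefit interpolation shares with back-off.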