So temperature and Good-Turing estimation both introduce randomness into a language model's output. Is it fair to say that temperature is applied more to large language models, while Good-Turing estimation is applied to simpler language models such as n-gram models?

I noticed this because I have read Foundations of Statistical Natural Language Processing in the past, and these seem similar in nature. (Pg. 212 is what I am looking at, if curious.)

Hi @Ryan_Llamas

That is a good question, but I think you're mixing up two different things.

Good-Turing estimation is a statistical technique for estimating the probability of *rare* events, and it was used as a smoothing technique for rare n-grams (in combination with Katz backoff).
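To make that concrete, here is a minimal sketch of the basic Good-Turing idea (function names are mine, not from the book): the adjusted count is c* = (c+1) · N_{c+1} / N_c, where N_c is the number of distinct items seen exactly c times, and the mass N_1 / N is reserved for unseen events. Real implementations smooth the N_c values first; this sketch just falls back to the raw count when N_{c+1} is zero.

```python
from collections import Counter

def good_turing_counts(counts):
    """Adjusted counts c* = (c+1) * N_{c+1} / N_c,
    where N_c is the number of distinct items seen exactly c times."""
    freq_of_freq = Counter(counts.values())  # N_c
    adjusted = {}
    for item, c in counts.items():
        n_c = freq_of_freq[c]
        n_c1 = freq_of_freq.get(c + 1, 0)
        # Fall back to the raw count when N_{c+1} is zero (sparse tail);
        # a full implementation would smooth the N_c sequence instead.
        adjusted[item] = (c + 1) * n_c1 / n_c if n_c1 else c
    return adjusted

counts = Counter("abracadabra")        # a:5, b:2, r:2, c:1, d:1
adjusted = good_turing_counts(counts)  # e.g. 'c' seen once -> c* = 2*N_2/N_1
total = sum(counts.values())
# Probability mass reserved for all unseen events: N_1 / N
p_unseen = Counter(counts.values())[1] / total
```

Note how the items seen only once get their counts adjusted, which is exactly the "discounting" that frees up probability mass for unseen n-grams.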

In other words, it's a "smoothing" technique concerned with *rare* events (rare n-grams get smoothed probabilities). The temperature parameter, on the other hand, controls "sampling" and is concerned with the *most probable* events: if temperature is 0, we "take" the most probable word every time, and if temperature is, let's say, 0.7, we sample among the most probable words according to their relative probabilities.
