Difference Between Temperature and Good-Turing Estimation

Temperature and Good-Turing estimation both seem to introduce randomness into a language model's output. Is it fair to say that temperature is applied more to large language models, while Good-Turing estimation is applied to simpler language models such as n-gram models?

I noticed this because I read Foundations of Statistical Natural Language Processing a while back, and the two seem similar in nature. (Page 212 is what I am looking at, if you're curious.)

Hi @Ryan_Llamas

That is a good question, but I think you're mixing up two different things.

Good-Turing estimation is a statistical technique for estimating the probability of rare events; it was used as a smoothing technique for rare n-grams (often in combination with Katz backoff). In other words, it's a "smoothing" technique concerned with rare events: rare n-grams get their counts smoothed, and some probability mass is reserved for n-grams that were never seen at all.
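
Here's a minimal sketch of the idea on a toy word-frequency example (the function name and corpus are just for illustration; real implementations such as Simple Good-Turing also fit a curve to the counts-of-counts, which this sketch skips):

```python
from collections import Counter

def good_turing_counts(samples):
    """Basic Good-Turing adjusted counts: r* = (r + 1) * N_{r+1} / N_r,
    where N_r is the number of types observed exactly r times.
    No smoothing of N_r, so we fall back to the raw count when
    N_{r+1} == 0 (real implementations handle this more carefully)."""
    counts = Counter(samples)                   # r for each observed type
    count_of_counts = Counter(counts.values())  # N_r
    total = sum(counts.values())                # N: total observed tokens

    # Probability mass reserved for *unseen* events: N_1 / N
    p_unseen = count_of_counts.get(1, 0) / total

    adjusted = {}
    for word, r in counts.items():
        n_r = count_of_counts[r]
        n_r1 = count_of_counts.get(r + 1, 0)
        adjusted[word] = (r + 1) * n_r1 / n_r if n_r1 else r
    return adjusted, p_unseen

words = "the cat sat on the mat the cat".split()
adjusted, p_unseen = good_turing_counts(words)
print(adjusted)   # singletons get discounted below 1
print(p_unseen)   # mass reserved for words never seen in the sample
```

Notice there's nothing random here: it deterministically reassigns probability mass from seen to unseen events.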
The temperature parameter, on the other hand, is about "sampling" and is concerned with the most probable events: at temperature 0 we "take" the most probable word every time, while at, say, 0.7 we sample among the most probable words according to their relative probabilities.
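
And a minimal sketch of temperature sampling over raw logits (the function name and toy values are again just illustrative):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample a token index from logits scaled by temperature.
    temperature -> 0 approaches greedy argmax; higher values
    flatten the distribution toward uniform."""
    if temperature == 0:  # greedy decoding: always the argmax
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]                     # toy next-token scores
print(sample_with_temperature(logits, 0))    # always index 0
print(sample_with_temperature(logits, 0.7))  # usually index 0, sometimes others
```

So temperature changes how you *pick* from an already-estimated distribution, while Good-Turing changes the *estimates* themselves.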
