Week 2 Assignment | Gumbel Distribution Use-Case

Hey Guys,
Towards the end of Week 2’s Assignment, the Gumbel Distribution is introduced and used to predict characters. However, our model already produces results in the form of probability distributions over the entire vocabulary.

So, instead of using the function gumbel_sample, why don’t we simply pick the most likely character? And if we want to generate different sequences for the same prefix, we can sample characters according to the probability distribution the model produces, i.e., each character is picked with its own probability.
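The two alternatives described above can be sketched like this (the vocabulary and probabilities are made up for illustration; the model here is just a fixed array):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model output: probabilities over a tiny vocabulary
vocab = ["a", "b", "c", "d"]
probs = np.array([0.5, 0.3, 0.15, 0.05])

# Alternative 1: always pick the most likely character (greedy)
greedy = vocab[int(np.argmax(probs))]

# Alternative 2: sample a character according to the distribution,
# so each character is picked with its own probability
sampled = vocab[rng.choice(len(probs), p=probs)]

print(greedy, sampled)
```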

I don’t understand the use case of the function gumbel_sample here. Does it produce better results, and if so, how?


Hey @Elemento

One extreme - always pick the most likely character.
The opposite extreme - pick completely at random.
In between - pick according to the probabilities from the model.

Temperature lets you pick between these extremes:

  • if you set temperature = 0, you always pick the same most likely character (first case)
  • if you set temperature = 1, you sample according to the model’s probabilities
  • if you set temperature = 10, you pick almost uniformly at random

You can try setting the parameter yourself.
What gumbel_sample does is adjust the probabilities used for the prediction.
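Here is a minimal sketch of how such a sampler can work, using the Gumbel-max trick: add Gumbel noise, scaled by the temperature, to the log-probabilities and take the argmax. The exact function in the assignment may differ slightly; this version is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def gumbel_sample(log_probs, temperature=1.0):
    """Sample an index from log-probabilities using the Gumbel-max trick."""
    # Uniform noise in (0, 1); the low bound avoids log(0)
    u = rng.uniform(low=1e-9, high=1.0, size=log_probs.shape)
    g = -np.log(-np.log(u))  # Gumbel(0, 1) noise
    # temperature = 0 removes the noise entirely (pure argmax);
    # temperature = 1 samples exactly from the model's distribution;
    # large temperatures drown out the model and look nearly uniform.
    return int(np.argmax(log_probs + temperature * g))

log_probs = np.log(np.array([0.1, 0.7, 0.2]))
gumbel_sample(log_probs, temperature=0.0)  # always index 1, the argmax
```

With temperature = 0 every call returns the same index, matching the “always pick the most likely character” case above.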
