DLS course 5 week 1


I was wondering if anyone could concisely explain the difference between “picking a word” and “sampling a word” based on the output probabilities of an RNN. It definitely feels like we’re getting an output probability and picking a word to pass to the next time step. But the terminology used is “sampling” a word.

I’m wondering if this is because, in practice, the selected words keep changing as more and more information comes in, so it’s not as clean as it appears when you just get “The Cat Jumped On The Porch.” Typing it out, I imagine the model just picking the next word, but maybe sampling means it has to iterate through a few wrong words at each time step t, since even a high-probability word can be wrong.


Sampling means that you randomly select a word according to the probability the model assigns to each word at the current time step in the sequence, so more probable words are chosen more often. For example, the snippet below generates 1,000 random samples with P(a) = 0.6, P(b) = 0.3, and P(c) = 0.1. This is the method used for “sampling from an RNN”.

>>> import collections
>>> import numpy as np
>>> x = np.random.choice(['a','b','c'], size=1000, p=[0.6,0.3,0.1])
>>> c = collections.Counter(x)
>>> c
Counter({'a': 612, 'b': 295, 'c': 93})

On the other hand, randomly picking means that every word is equally likely. For example, the snippet below also generates 1,000 random values, but with each value equally likely, i.e. P(a) = P(b) = P(c) ≈ 0.333.

>>> x = np.random.choice(['a','b','c'], size=1000)
>>> c = collections.Counter(x)
>>> c
Counter({'c': 339, 'b': 339, 'a': 322})
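
To tie this back to generating a sentence one time step at a time, here is a minimal sketch. It assumes a made-up four-word vocabulary and a fixed table `probs_after` standing in for the RNN's softmax output. Both are hypothetical and only for illustration: a real RNN would compute these probabilities from its hidden state. The helper name `generate` is also just an assumption for this example.

```python
import numpy as np

# Hypothetical toy vocabulary (not from the course).
vocab = ["the", "cat", "jumped", "<EOS>"]

# Hypothetical stand-in for the RNN: row i is the output distribution
# over the next word, given that the previous word was vocab[i].
probs_after = np.array([
    [0.05, 0.80, 0.10, 0.05],  # after "the"
    [0.05, 0.05, 0.80, 0.10],  # after "cat"
    [0.30, 0.05, 0.05, 0.60],  # after "jumped"
    [0.25, 0.25, 0.25, 0.25],  # after "<EOS>" (unused, generation stops)
])

def generate(rng, start=0, max_len=10):
    """Sample one word per time step and feed it back in as the next input."""
    idx = start
    words = [vocab[start]]
    for _ in range(max_len):
        # Draw the next word index according to the predicted probabilities.
        idx = rng.choice(len(vocab), p=probs_after[idx])
        if vocab[idx] == "<EOS>":
            break
        words.append(vocab[idx])
    return words

rng = np.random.default_rng(0)
for _ in range(3):
    print(" ".join(generate(rng)))
```

In this sketch, each sampled word is committed and passed to the next time step. Nothing is revised afterwards, so there is no iterating through "wrong" words. The randomness only means that repeated runs can produce different sentences.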

Thank you, that makes sense.