The instructions say: "If you select the most probable, the model will always generate the same result given a starting letter. To make the results more interesting, use np.random.choice". They then show np.random.choice being called with p=probs.
But when we supply p=probs, doesn't that force the same result? A given input word will always have the same probability distribution for the next word, right?
I tried running the code snippet from the instructions, and it gives the same result no matter how many times I run it.
The point is that np.random.choice does not do the same thing each time: p=probs fixes the distribution, but every call draws a fresh random sample from it. As long as you don't set the random seed every time, you don't always get the same answer even with the same inputs. Delete the line that sets the seed, then run the cell multiple times and watch what happens. In a "real" application you would never set the random seed, for exactly this reason. The reason they do it everywhere in these notebooks is just to make it easy to write test cases that produce results that can be checked.
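Here is a minimal sketch of that point. The four-entry probs vector and the helper name draw are just illustrative choices, not taken from the notebook. The unseeded draws come first so that nothing earlier in the script has fixed the generator state.

```python
import numpy as np

# Illustrative distribution (an assumption, not the notebook's actual output)
probs = np.array([0.1, 0.0, 0.7, 0.2])

def draw(n):
    """Hypothetical helper: draw n indices according to probs."""
    return np.random.choice(len(probs), size=n, p=probs).tolist()

# No seed set: these two lines usually differ from each other and from run to run
print("unseeded:", draw(10))
print("unseeded:", draw(10))

# Resetting the seed before each call reproduces the same draws
np.random.seed(0)
first = draw(10)
np.random.seed(0)
second = draw(10)
print("seeded:  ", first)
print("seeded:  ", second)
```

The two seeded lines are identical because the generator is put back into the same state before each call, which is what happens when a notebook cell sets the seed and is then re-run.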
Update: I tried the experiment and it is true that you don't always get the same answer, but with the probability of index 2 being 0.7, you do get quite a lot of 2s.
import numpy as np

probs = np.array([0.1, 0.0, 0.7, 0.2])
totals = np.zeros(len(probs))
for ii in range(100):
    idx = np.random.choice(range(len(probs)), p=probs)
    totals[idx] += 1
    if idx != 2:  # only report the draws that are not the most likely index
        print(f"iter {ii} idx {idx}")
print(f"totals = {totals}")
When I run that, here's what I get:
iter 0 idx 0
iter 1 idx 3
iter 5 idx 0
iter 11 idx 3
iter 13 idx 3
iter 15 idx 0
iter 17 idx 0
iter 19 idx 3
iter 23 idx 3
iter 25 idx 3
iter 28 idx 3
iter 44 idx 0
iter 48 idx 3
iter 49 idx 3
iter 56 idx 3
iter 58 idx 3
iter 63 idx 3
iter 64 idx 0
iter 65 idx 0
iter 74 idx 0
iter 75 idx 3
iter 77 idx 3
iter 78 idx 0
iter 81 idx 0
iter 82 idx 3
iter 83 idx 3
iter 90 idx 3
iter 91 idx 3
iter 92 idx 3
iter 94 idx 3
iter 95 idx 0
iter 98 idx 3
totals = [11. 0. 68. 21.]
Notice a couple of things there:
We get no 1s at all, because the probability of that is 0.
We're doing (simulated) randomness here, so the behavior is statistical: the actual observed frequencies don't exactly match the probabilities, but they are in the ballpark. If you run the above cell more than once, you won't get the same sequence or the same frequencies every time.
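To see the "statistical" part more concretely, here is a rough sketch that draws larger and larger samples from the same probs and prints the observed frequencies. The sample sizes are arbitrary choices for illustration.

```python
import numpy as np

probs = np.array([0.1, 0.0, 0.7, 0.2])

# Arbitrary sample sizes, chosen only to show the trend
for n in (100, 10_000, 1_000_000):
    samples = np.random.choice(len(probs), size=n, p=probs)
    # Count how often each index was drawn, then convert counts to frequencies
    freqs = np.bincount(samples, minlength=len(probs)) / n
    print(f"n = {n:>9,}  observed frequencies = {np.round(freqs, 3)}")
```

With more draws the frequencies should sit closer to 0.1, 0.0, 0.7 and 0.2, although the exact numbers still change on every run.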
The "meta" lesson here is that Python is an interactive language. You don't have to just wonder what something does: you can try it and see.
I'm not sure I understand the question, but I demonstrated above how np.random.choice works. You can also read the documentation for more information.
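In case it helps with the original question, here is a small sketch contrasting the two strategies the instructions mention: always taking the most probable index (np.argmax) versus sampling with np.random.choice. The variable names greedy and sampled are just for illustration.

```python
import numpy as np

probs = np.array([0.1, 0.0, 0.7, 0.2])

# Always picking the most probable index gives 2 every single time
greedy = [int(np.argmax(probs)) for _ in range(10)]

# Sampling uses the same distribution but can return 0, 2 or 3
sampled = [int(np.random.choice(len(probs), p=probs)) for _ in range(10)]

print("argmax: ", greedy)
print("sampled:", sampled)
```

The distribution is identical in both cases. Only the way an index is picked from it differs, which is why the sampled list varies between runs while the argmax list never does.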