Week 1 Dinosaurus Island: random.choice - same result?

The instructions say: "If you select the most probable, the model will always generate the same result given a starting letter. To make the results more interesting, use np.random.choice". They then show how to call it with p=probs.

But when we supply p=probs, doesn't that force the same result? A given input will always produce the same probability distribution for the next character, right?

I tried running the code snippet from the instructions, and it prints the same result no matter how many times I run it:

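import numpy as np

np.random.seed(5)  # resets the global generator to a fixed starting state
probs = np.array([0.1, 0.0, 0.7, 0.2])
idx = np.random.choice(range(len(probs)), p=probs)  # draw one index according to probs
print(idx)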

Can somebody explain how it's supposed to work?

The point is that np.random.choice does not return the same thing each time. Calling np.random.seed(5) resets the random number generator to the same starting state, so the first draw after it is always the same. As long as you don't reset the seed before every call, you won't always get the same answer, even with the same inputs. Delete the line that sets the seed, run the cell several times, and watch what happens. In a “real” application you would never set the random seed, for exactly this reason. The notebooks set it everywhere only to make it easy to write test cases whose results can be checked.
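A minimal sketch of the seeding behavior described above, using the same probs as the question. The helper name draws_after_seeding is hypothetical, chosen only for illustration: re-seeding before each batch replays the exact same draws, while draws taken without re-seeding continue from wherever the generator currently is.

```python
import numpy as np

probs = np.array([0.1, 0.0, 0.7, 0.2])

def draws_after_seeding(seed, n=10):
    """Hypothetical helper: reset the global generator to `seed`, then take n draws."""
    np.random.seed(seed)
    return [int(np.random.choice(len(probs), p=probs)) for _ in range(n)]

# Re-seeding restarts the generator from the same state, so both batches match.
batch_1 = draws_after_seeding(5)
batch_2 = draws_after_seeding(5)
print(batch_1 == batch_2)  # True

# No re-seeding here: these draws continue from the generator's current state,
# so they are not a replay of batch_1.
more_draws = [int(np.random.choice(len(probs), p=probs)) for _ in range(10)]
print(batch_1)
print(more_draws)
```

Running the cell repeatedly always prints the same batch_1, because it is re-seeded each time; only removing the np.random.seed call lets every run differ.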

Update: I tried the experiment, and it is true that you don't always get the same answer, but with the probability of 2 being 0.7, you do get quite a lot of 2s. :nerd_face:


Here’s a better version of the experiment:

import numpy as np

probs = np.array([0.1, 0.0, 0.7, 0.2])
totals = np.zeros(len(probs))  # how many times each index has been drawn
for ii in range(100):
    idx = np.random.choice(range(len(probs)), p=probs)
    totals[idx] += 1
    if idx != 2:  # print only the less likely outcomes
        print(f"iter {ii} idx {idx}")
print(f"totals = {totals}")

When I run that, here’s what I get:

iter 0 idx 0
iter 1 idx 3
iter 5 idx 0
iter 11 idx 3
iter 13 idx 3
iter 15 idx 0
iter 17 idx 0
iter 19 idx 3
iter 23 idx 3
iter 25 idx 3
iter 28 idx 3
iter 44 idx 0
iter 48 idx 3
iter 49 idx 3
iter 56 idx 3
iter 58 idx 3
iter 63 idx 3
iter 64 idx 0
iter 65 idx 0
iter 74 idx 0
iter 75 idx 3
iter 77 idx 3
iter 78 idx 0
iter 81 idx 0
iter 82 idx 3
iter 83 idx 3
iter 90 idx 3
iter 91 idx 3
iter 92 idx 3
iter 94 idx 3
iter 95 idx 0
iter 98 idx 3
totals = [11.  0. 68. 21.]

Notice a couple of things there:

  1. We get no 1s at all, because the probability of that is 0.
  2. We’re doing (simulated) randomness here, so the behavior is statistical: the actual observed frequencies don’t exactly match the probabilities, but they are in the ballpark. If you run the above cell more than once, you won’t get the same sequence or the same frequencies every time.
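To see the statistical behavior more clearly, here is a sketch with many more draws: the observed frequencies should get closer to the probabilities as the number of draws grows. It uses NumPy's newer np.random.default_rng() Generator API instead of the legacy np.random.choice used in the notebook. That is a swap made here for convenience; both sample from the same distribution. The draw count of 100,000 is an arbitrary choice.

```python
import numpy as np

probs = np.array([0.1, 0.0, 0.7, 0.2])
rng = np.random.default_rng()  # unseeded, so each run produces different draws
n_draws = 100_000

# Draw all samples at once, then count how often each index appeared.
samples = rng.choice(len(probs), size=n_draws, p=probs)
freqs = np.bincount(samples, minlength=len(probs)) / n_draws
print(freqs)  # close to [0.1, 0.0, 0.7, 0.2], but rarely an exact match
```

Index 1 still never appears, since its probability is 0, and the other frequencies typically land within a fraction of a percent of their probabilities.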

The ā€œmetaā€ lesson here is that python is an interactive language. You don’t have to just wonder what something does: you can try it and see. :nerd_face:


Let's say the input is: abgz

Why would our model predict ‘b’ after predicting ‘a’ if it chooses each character randomly?

I’m not sure I understand the question, but I demonstrated above how np.random.choice works. You can also read the documentation for more information.
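To connect this to the ‘abgz’ question: the model doesn't decide that ‘b’ must follow ‘a’. At each step it outputs a probability distribution over the next character, and np.random.choice draws one character from it, so a likely character is picked often but not always. The sketch below assumes a small, hypothetical table of next-character probabilities (next_char_probs, vocab and generate are illustrative names, not from the notebook) in place of the RNN's softmax output. Unlike the real model, which conditions on the whole history through its hidden state, this table conditions only on the previous character. It also contrasts greedy argmax selection with sampling.

```python
import numpy as np

# Hypothetical stand-in for the RNN's softmax output: for each current character,
# a probability distribution over the next one ("\n" ends the name).
vocab = ["a", "b", "g", "z", "\n"]
next_char_probs = {
    "a": [0.05, 0.60, 0.20, 0.10, 0.05],
    "b": [0.50, 0.05, 0.15, 0.10, 0.20],
    "g": [0.40, 0.10, 0.05, 0.15, 0.30],
    "z": [0.60, 0.05, 0.05, 0.05, 0.25],
}

def generate(start, greedy, rng, max_len=10):
    """Build a name one character at a time from the hypothetical table."""
    name = start
    current = start
    while len(name) < max_len:
        probs = np.array(next_char_probs[current])
        if greedy:
            idx = int(np.argmax(probs))  # always the single most probable character
        else:
            idx = int(rng.choice(len(vocab), p=probs))  # sample according to probs
        if vocab[idx] == "\n":
            break
        name += vocab[idx]
        current = vocab[idx]
    return name

rng = np.random.default_rng()
greedy_names = [generate("a", greedy=True, rng=rng) for _ in range(3)]
sampled_names = [generate("a", greedy=False, rng=rng) for _ in range(3)]
print(greedy_names)   # ['ababababab', 'ababababab', 'ababababab']
print(sampled_names)  # varies from run to run
```

With greedy selection, ‘b’ always follows ‘a’ because it has the highest probability, which is exactly the “same result every time” the instructions warn about. With sampling, ‘b’ follows ‘a’ about 60% of the time in this made-up table, and the other characters get their turn too.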