Week1-Assignment2-Sampling-np.random.choice()

Hello,

It is mentioned that we don’t want to sample from a uniform distribution, as it would give random, unwanted results. We also don’t want to always pick the index with the maximum probability, as that would make our results repetitive. That’s why the code uses np.random.choice().
However, in that function, we provide 2 arguments:

  1. Array
  2. Probability distribution (p)

Since we have not provided the size argument, it returns only 1 value.
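A minimal sketch of what np.random.choice() does with just these two arguments, using a made-up 3-element distribution (the values in p are hypothetical, not from the assignment). Each call returns a single index drawn in proportion to p, so the highest-probability index is the most likely result but not the only one.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical probability distribution over 3 indices (must sum to 1)
p = np.array([0.1, 0.6, 0.3])

# With no size argument, np.random.choice returns a single index
idx = np.random.choice(len(p), p=p)
print("one draw:", idx)

# Draw many times to see how often each index is chosen
draws = [np.random.choice(len(p), p=p) for _ in range(10_000)]
freq = np.bincount(draws, minlength=len(p)) / len(draws)
print("empirical frequencies:", freq)  # close to p, not [0, 1, 0]
```

Index 1 wins most of the time, but indices 0 and 2 still appear roughly 10% and 30% of the time.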

My thought process is that the choice function will ultimately return only the value with the maximum probability. Then what’s the point of using random.choice()?

Is my understanding incorrect?

Thanks!

I think np.random.choice() with a probability distribution works quite well. If we picked the maximum instead, we would get exactly what you said we want to avoid:

We also don’t want to use the index with maximum probability distribution as it will make our result repetitive.

Let’s look at the result of our unit test for sample() and see how each index was actually selected from the probability distribution given by softmax. The output from sample() was:

Sampling:
list of sampled indices:
 [23, 16, 26, 26, 24, 3, 21, 1, 7, 24, 15, 3, 25, 20, 6, 13, 10, 8, 20, 12, 2, 0]

Then I added a few lines to log the probability distributions and show how those indices were selected from them.

Read the plots starting from the top-left and moving right, then continue from the left of the next row down.
Look at the 3rd, 5th, 10th,…: in each of them, np.random.choice() selected an index that does not have the maximum probability. It is working as intended.
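The same check can be sketched without the assignment’s model. This assumes the softmax output y is a column vector of shape (vocab_size, 1) and that vocab_size is 27 (26 letters plus newline); the logits are random placeholders, not a trained RNN. It counts how often the sampled index differs from np.argmax(y).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a column vector
    e = np.exp(z - np.max(z))
    return e / e.sum()

np.random.seed(1)  # fixed seed for reproducibility
vocab_size = 27    # assumed: 26 letters plus the newline character
trials = 20
non_max = 0

for t in range(trials):
    z = np.random.randn(vocab_size, 1)  # placeholder logits, not a trained model
    y = softmax(z)                       # probabilities, shape (vocab_size, 1)
    # Sample one index in proportion to y rather than taking the max
    idx = np.random.choice(range(vocab_size), p=y.ravel())
    greedy = int(np.argmax(y))           # the "maximum probability" choice
    if idx != greedy:
        non_max += 1
    print(f"step {t:2d}: sampled {idx:2d}, argmax {greedy:2d}, p(sampled) = {y[idx, 0]:.3f}")

print(f"{non_max} of {trials} sampled indices were not the argmax")
```

With a fairly flat distribution, most sampled indices are not the argmax, which matches what the logged plots show.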

If we always select the index with the maximum probability instead, the result of Exercise 4 will be:

j =  0 idx =  0
single_example = turiasaurus
single_example_chars ['t', 'u', 'r', 'i', 'a', 's', 'a', 'u', 'r', 'u', 's']
single_example_ix [20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19]
 X =  [None, 20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19] 
 Y =        [20, 21, 18, 9, 1, 19, 1, 21, 18, 21, 19, 0] 

Iteration: 0, Loss: 23.087336

Uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
Uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
Uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
Uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
Uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
Uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu
Uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu


j =  1535 idx =  1535
j =  1536 idx =  0
Iteration: 2000, Loss: 27.884160

Aurus
Aurus
Aurus
Aurus
Aurus
Aurus
Aurus


Iteration: 4000, Loss: 25.901815

Angosaurus
Angosaurus
Angosaurus
Angosaurus
Angosaurus
Angosaurus
Angosaurus

Each iteration prints the same name over and over, which is not an interesting result.
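To reproduce the effect in miniature, here is a sketch using a hypothetical 5-character bigram table P (not the assignment’s RNN; chars, P and generate() are made up for illustration). Greedy decoding with np.argmax is deterministic, so every call returns the same name, while sampling with np.random.choice() gives varied names.

```python
import numpy as np

# Hypothetical toy character model (NOT the assignment's RNN):
# row i holds P(next character | current character i).
chars = ["\n", "a", "u", "r", "s"]
P = np.array([
    # "\n"   a     u     r     s
    [0.0,  0.5,  0.2,  0.2,  0.1],  # at the start of a name
    [0.3,  0.0,  0.4,  0.2,  0.1],  # after "a"
    [0.1,  0.2,  0.0,  0.4,  0.3],  # after "u"
    [0.2,  0.3,  0.4,  0.0,  0.1],  # after "r"
    [0.6,  0.1,  0.1,  0.2,  0.0],  # after "s"
])

def generate(greedy, max_len=10):
    """Generate one name, either greedily (argmax) or by sampling."""
    idx = 0  # begin from the newline token
    out = []
    for _ in range(max_len):
        p = P[idx]
        if greedy:
            idx = int(np.argmax(p))                   # always the most likely char
        else:
            idx = int(np.random.choice(len(p), p=p))  # draw in proportion to p
        if idx == 0:  # newline ends the name
            break
        out.append(chars[idx])
    return "".join(out).capitalize()

np.random.seed(2)
print("greedy: ", [generate(greedy=True) for _ in range(5)])
print("sampled:", [generate(greedy=False) for _ in range(5)])
```

Greedy decoding returns "Aururururu" on every call. It gets stuck alternating "u" and "r" until the length cap, much like the "Uuuu…" lines at iteration 0. The sampled names differ from call to call.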

Hope this clarifies.