C5-W1-A2 struggling to compute idx

Hello,

I’ve been struggling a bit with UNQ_C2. After reading what I did below, can you please help me find out what I missed or did wrong? Thank you for your help.

After computing y, y.shape = (27, 100)
Preliminary question: I don’t understand the 100. What is the intuition behind it?

My piece of code is:
probs = y.ravel() # probs.shape = (2700,)
idx = np.random.choice(a = range(len(probs)), p = probs)

First issue:
np.random.choice raises an error: “ValueError: probabilities do not sum to 1”
Indeed, the softmax was applied to each of the 100 columns independently, so after flattening the sum is ~100, not ~1.
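As a minimal sketch (not the assignment code), here is why flattening a column-wise softmax output gives a total of roughly n_a rather than 1:

```python
import numpy as np

def softmax(z):
    # Column-wise softmax: each column of the result sums to 1.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

vocab_size, n_a = 27, 100
z = np.random.randn(vocab_size, n_a)
y = softmax(z)            # shape (27, 100), each of the 100 columns sums to 1
probs = y.ravel()         # shape (2700,)
print(round(probs.sum())) # 100 -- one unit per column, not a valid distribution
```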

Then I tried to normalize:
probs = y.ravel() / n_a
idx = np.random.choice(a = range(len(probs)), p = probs)

This leads to the 2nd error.

Second issue:
The error “IndexError: index 2354 is out of bounds for axis 0 with size 27” is raised.

Indeed, idx should be within [0, vocab_size), not within [0, len(probs)).

In np.random.choice(a, p), both a and p must have the same length. I don’t know how to reduce p to [0, vocab_size) so that a can also be reduced to [0, vocab_size).

What I can do, though, is reduce the output of np.random.choice from [0, len(probs)) to [0, vocab_size):

probs = y.ravel() / n_a
idx = np.random.choice(a = range(len(probs)), p = probs) % vocab_size

This leads to the 3rd issue.

Third issue:
An assertion fails: “AssertionError: Wrong values”

At this point I’m clueless.

Note: in another thread, some people mentioned that the softmax function was not working properly. Unless the sum has to be exactly 1.0, it seems to work quite well:

for i, _ in enumerate(y):
    print(f'y_{i} = {y[:,i].sum()}')
y_0 = 0.9999999999999998
y_1 = 0.9999999999999998
y_2 = 0.9999999999999998
y_3 = 0.9999999999999998
y_4 = 0.9999999999999998
y_5 = 0.9999999999999998
y_6 = 0.9999999999999998
y_7 = 0.9999999999999998
y_8 = 0.9999999999999998
y_9 = 0.9999999999999998
y_10 = 0.9999999999999998
y_11 = 0.9999999999999998
y_12 = 0.9999999999999998
y_13 = 0.9999999999999998
y_14 = 0.9999999999999998
y_15 = 0.9999999999999998
y_16 = 0.9999999999999998
y_17 = 0.9999999999999998
y_18 = 0.9999999999999998
y_19 = 0.9999999999999998
y_20 = 0.9999999999999998
y_21 = 0.9999999999999998
y_22 = 0.9999999999999998
y_23 = 0.9999999999999998
y_24 = 0.9999999999999998
y_25 = 0.9999999999999998
y_26 = 0.9999999999999998
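Those sums are as close to 1 as floating-point arithmetic allows. A quick way to check this without comparing digits by eye:

```python
import numpy as np

s = 0.9999999999999998           # a typical column sum from the printout above

print(s == 1.0)                  # False: exact comparison fails
print(bool(np.isclose(s, 1.0))) # True: within default floating-point tolerance
```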


Here are some hints:

  1. There are 26 letters plus a newline character, which makes vocab_size 27. y, which stands for the probability distribution over the next character given the current character and the context so far, should have shape (vocab_size, 1).
  2. ravel flattens an array. The number of elements remains the same before and after flattening.
  3. Softmax is the correct activation to use to compute the probabilities. In fact, it’s used across multi-class classification problems as the activation function of the final layer. Since softmax normalizes the values, there’s no need for further normalization.

Based on y.ravel() / n_a, please read the markdown cells from the start.
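Putting the hints together, the sampling call might look like this (a sketch, assuming y is a valid probability column of shape (vocab_size, 1)):

```python
import numpy as np

vocab_size = 27

# A stand-in for a correctly shaped softmax output: shape (27, 1), sums to 1.
y = np.random.rand(vocab_size, 1)
y /= y.sum()

# a and p have the same length, so idx lands directly in [0, vocab_size)
# with no need for normalization or a modulo trick.
idx = np.random.choice(range(vocab_size), p=y.ravel())
print(0 <= idx < vocab_size)   # True
```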

Hey @bblanc,
Let me try to give my 2 cents as well, in addition to @balaji.ambresh’s excellent response.

Here, this 100 is nothing but n_a, as you can see in the sample_test function. n_a is the number of hidden units in an RNN cell.


If you are getting the above shape, then I believe you have made a mistake before this point, for starters. This is because y represents the probabilities over the possible characters, which are 27 in number. Here, the shape of y should be (27, 1).

Please make sure that you have initialized x and a_prev correctly. Also, please ensure that you have correctly computed a and z inside the while loop.
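For reference, one step inside the while loop should produce the shapes below; this is only a sketch with randomly initialized parameters, not the notebook’s code:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

n_a, vocab_size = 100, 27
rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b   = rng.standard_normal((n_a, 1))
by  = rng.standard_normal((vocab_size, 1))

x = np.zeros((vocab_size, 1))    # one-hot input column, all zeros at step 0
a_prev = np.zeros((n_a, 1))      # initial hidden state

a = np.tanh(Wax @ x + Waa @ a_prev + b)  # (100, 1)
z = Wya @ a + by                          # (27, 1)
y = softmax(z)                            # (27, 1), sums to 1
print(y.shape)                            # (27, 1)
```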


Once you solve your mistake that I mentioned above, your probabilities should sum up to 1.


There is no such thing as channels here. I guess you are referring to the hidden units as “channels”. If you feel confused about how different units in a single RNN cell work, I urge you to go through the lecture videos once again.


Since the probabilities will sum to 1, you won’t need to normalize, so you won’t run into the second error, and in turn you won’t reach the third.

Let us know if this helps.

Cheers,
Elemento

Here is what happened.
a_prev and x were initialized properly, and x, a, y and idx were coded properly (without any normalization).

The shape trace below actually highlights the issue over a couple of iterations:
Iteration #1:
Wax: (100, 27)
x: (27, 1)
Waa: (100, 100)
a_prev: (100, 1)
b: (100, 1)
Wya: (100, 27)
a: (100, 1)
by: (27, 1)
z: (27, 1)
y: (27, 1) ← correct shape

Iteration #2:
Wax: (100, 27)
x: (27,) ← bad!!!
Waa: (100, 100)
a_prev: (100, 1)
b: (100, 1)
Wya: (100, 27)
a: (100, 100)
by: (27, 1)
z: (27, 100) ← incorrect shape
y: (27, 100) ← incorrect shape

The update of x was erroneous; after I corrected x to the proper shape (vocab_size, 1), I got the correct trace:
Iteration #1:
Wax: (100, 27)
x: (27, 1)
Waa: (100, 100)
a_prev: (100, 1)
b: (100, 1)
Wya: (100, 27)
a: (100, 1)
by: (27, 1)
z: (27, 1)
y: (27, 1)
Iteration #2:
Wax: (100, 27)
x: (27, 1) ← correct shape
Waa: (100, 100)
a_prev: (100, 1)
b: (100, 1)
Wya: (100, 27)
a: (100, 1)
by: (27, 1)
z: (27, 1) ← correct shape
y: (27, 1) ← correct shape

Consequently, the code properly completed.
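For anyone hitting the same trace: the fix amounts to rebuilding x as a one-hot column vector after sampling idx. A sketch (not the notebook’s exact code):

```python
import numpy as np

vocab_size = 27
idx = 13  # a sampled index, chosen here for illustration

# One-hot column of shape (vocab_size, 1) -- not (vocab_size,), which is
# what breaks the shapes of a, z and y on the next iteration.
x = np.zeros((vocab_size, 1))
x[idx] = 1
print(x.shape)   # (27, 1)
```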


In my case also y had the shape of (27,100).
Thank you for the help. Kudos.

Yup, can confirm. I also didn’t initialize x properly.