W1, A2, Ex.2 I don't understand Step 4

I have been trying to understand and test options for the following definitions:

 # Step 4: Overwrite the input x with one that corresponds to the sampled index `idx`.
        x =
        x[idx] = 

As instructed, I created a vector of zeros of length vocab_size and then set the entry at index idx to 1, so that x is one-hot.

My outputs are wrong. I get a list of 50 sampled characters, and at no point in the output is there a \n to delimit the generated word.

If it is necessary, my lab ID is dwdsgjqfrckq. Thank you in advance!

Hi @Miquel_Ferre ,

The IndexError is raised because the size of x is 23 in your case. In Python, indexing starts from 0, so if a vector has size 23, its last valid index is 22. Accessing an element outside this range raises an IndexError.
You need to trace back through the code and find out why the size of x is 23. This vector, x, is the one-hot vector, so it should have size vocab_size.
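
To illustrate the point about index bounds, here is a minimal sketch. It assumes vocab_size is 27 (the value implied by the shapes quoted later in this thread), and the helper name one_hot is hypothetical, not something from the assignment.

```python
import numpy as np

def one_hot(idx, vocab_size):
    """Return a (vocab_size, 1) column vector with a 1 at row idx (hypothetical helper)."""
    x = np.zeros((vocab_size, 1))
    x[idx] = 1
    return x

vocab_size = 27                  # assumed value, for illustration only
x = one_hot(26, vocab_size)      # valid: indices run from 0 to vocab_size - 1
print(x.shape)

try:
    one_hot(27, vocab_size)      # one past the last valid index
except IndexError as err:
    print("IndexError:", err)
```

If x were created with a length other than vocab_size, a sampled index that is valid for the vocabulary could still fall outside x, which is the situation described above.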

Hello, I don’t have an index error anymore, but now I get the wrong values. I guess it’s because my formula for 𝑎⟨𝑡+1⟩ is wrong: I had to transpose the weight matrices to make it run, otherwise I get an error like shapes (27,100) and (27,1) not aligned: 100 (dim 1) != 27 (dim 0).
My expression for 𝑎⟨𝑡+1⟩ is the following, but in the instructions there is no transposed matrix:

a = np.tanh(np.dot(Wya.T, x) + np.dot(Waa.T, a_prev) + b)

As I said, this is the only way I found to make the product work. I guess the operation is a dot product; I also tried the element-wise product (multiply), but neither works.

They gave you the formula in the text of the assignment:

a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle } + W_{aa} a^{\langle t \rangle } + b)\tag{1}

That’s not what you implemented, right? You are correct that the operations between the weight matrices and the vectors are dot products, but you have to use the correct weight matrices in order for it to work. No transposes should be required.

I added some print statements to my code to show the shapes and here’s what I see:

Wax (100, 27) x (27, 1) Waa (100, 100) a_prev (100, 1)
Wya (27, 100) x (100, 1) + by (27, 1)
y.shape (27, 1)
len(y) 27
len(y.ravel()) 27
type(y.ravel()) <class 'numpy.ndarray'>
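
For anyone who wants to check the shapes themselves, here is a minimal sketch with random placeholder weights. The sizes (100 hidden units, vocabulary of 27) come from the printout above; the random values, the zero inputs, and the softmax used to produce y are assumptions for illustration, not the assignment's code.

```python
import numpy as np

n_a, vocab_size = 100, 27                      # sizes taken from the printed shapes
rng = np.random.default_rng(0)

# Placeholder parameters with the shapes shown in the printout
Wax = rng.standard_normal((n_a, vocab_size))   # (100, 27)
Waa = rng.standard_normal((n_a, n_a))          # (100, 100)
Wya = rng.standard_normal((vocab_size, n_a))   # (27, 100)
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))                  # one-hot input (all zeros here)
a_prev = np.zeros((n_a, 1))

# Equation (1): Wax multiplies x, Waa multiplies a_prev, no transposes
a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)

# Output step: Wya multiplies the hidden state a, not the input x
z = np.dot(Wya, a) + by
e = np.exp(z - z.max())                        # softmax (assumed), shifted for stability
y = e / e.sum()
print(a.shape, y.shape)

# Using Wya on x reproduces the "not aligned" error quoted earlier
try:
    np.dot(Wya, x)
except ValueError as err:
    print("ValueError:", err)
```

The inner dimensions line up in every product (27 with 27, 100 with 100), which is why no transpose is needed once each weight matrix is paired with the right vector.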