W1, A2, Ex.2 I don't understand Step 4

Miquel_Ferre · July 13, 2023, 3:03pm

I have been trying to understand and test options for the following definitions:

 # Step 4: Overwrite the input x with one that corresponds to the sampled index `idx`.
        x =
        x[idx] =

As instructed, I created a one-hot vector of zeros with length vocab_size, then set the entry at the sampled index to 1.

My outputs are wrong. In fact, I get a list of 50 sampled characters, so at no point does my output contain a \n to delimit the generated word.

If it is necessary, my lab ID is dwdsgjqfrckq. Thank you in advance!

Kic · July 13, 2023, 4:17pm

Hi @Miquel_Ferre ,

The IndexError is raised because the size of x is 23 in your case. In Python, indexing starts from 0, so if a vector has size 23, the last valid index is 22. Accessing elements beyond this limit raises an IndexError.
You need to trace back through the code and find out why the size of x is 23. This vector, x, is the one-hot vector, so it should have size vocab_size.
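
A minimal sketch of the indexing point, assuming vocab_size is 27 (the value that appears in the shape errors quoted in this thread). The helper name `one_hot` is hypothetical and not part of the assignment; it only illustrates why a one-hot vector must have vocab_size rows before any sampled index can be written into it.

```python
import numpy as np

vocab_size = 27  # assumption: 26 letters plus the newline character


def one_hot(idx, size):
    """Hypothetical helper: (size, 1) column vector with a 1 at row idx."""
    x = np.zeros((size, 1))
    x[idx] = 1
    return x


# The largest valid index is vocab_size - 1, because indexing starts at 0.
x = one_hot(vocab_size - 1, vocab_size)
print(x.shape)

# A vector with only 23 rows cannot hold an index such as 26.
too_short = np.zeros((23, 1))
try:
    too_short[26] = 1
except IndexError as err:
    print("IndexError:", err)
```

Printing `x.shape` right before the assignment to `x[idx]` is a quick way to see where the size stops matching vocab_size.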

Miquel_Ferre · July 15, 2023, 4:32pm

Hello, I don’t have an index error anymore, but I am still getting wrong values. I suspect my formula for 𝑎⟨𝑡+1⟩ is wrong, because I had to transpose the weight matrices to make it work; otherwise I get an error like shapes (27,100) and (27,1) not aligned: 100 (dim 1) != 27 (dim 0).
My expression for 𝑎⟨𝑡+1⟩ is the following, but the instructions show no transposed matrices:

a = np.tanh(np.dot(Wya.T, x) + np.dot(Waa.T, a_prev) + b)

As I said, this is the only way I found to make the product work. I assume the operation is a dot product, but I also tried the element-wise product (multiply), and neither works.

paulinpaloalto · July 15, 2023, 7:25pm

They gave you the formula in the text of the assignment:

a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle } + W_{aa} a^{\langle t \rangle } + b)\tag{1}

That’s not what you implemented, right? You are correct that the operations between the weight matrices and the vectors are dot products, but you have to use the correct weight matrices in order for it to work. No transposes should be required.

I added some print statements to my code to show the shapes and here’s what I see:

Wax (100, 27) x (27, 1) Waa (100, 100) a_prev (100, 1)
Wya (27, 100) x (100, 1) + by (27, 1)
y.shape (27, 1)
len(y) 27
len(y.ravel()) 27
type(y.ravel()) <class 'numpy.ndarray'>
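
The shapes above can be checked with a small standalone sketch. It assumes n_a = 100 and vocab_size = 27 (taken from the printed shapes), uses random placeholder weights rather than the assignment's trained parameters, and assumes the output y is a softmax over the 27 scores. It also shows that pairing Wya with x, instead of Wax, produces the "not aligned" error quoted earlier in the thread.

```python
import numpy as np

n_a, vocab_size = 100, 27  # assumed from the shapes printed above
rng = np.random.default_rng(0)

# Placeholder parameters with the same shapes as in the printout.
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))   # one-hot input (all zeros at the first step)
a_prev = np.zeros((n_a, 1))     # previous hidden state

# Equation (1): no transposes are needed when Wax multiplies x.
a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)

# Output scores and a numerically stable softmax (assumed output layer).
z = np.dot(Wya, a) + by
e = np.exp(z - z.max())
y = e / e.sum()
print(a.shape, y.shape)

# Using Wya with x mismatches the inner dimensions (100 vs 27).
try:
    np.dot(Wya, x)
except ValueError as err:
    print("ValueError:", err)
```

Transposing Wya makes the shapes line up by accident, which is why the transposed version runs but produces wrong values.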
