All of the above unit tests passed, but my code for the full Encoder below just cannot produce the correct values. The error is "AssertionError: Wrong values case 1".
{Moderator’s Edit: Solution Code Removed}
Hey @littlestrong,
Welcome to the community. If you take a close look at the markdown cell just above this code cell, the 3rd instruction tells you to add
self.pos_encoding[:, :seq_len, :]
However, you have added
self.pos_encoding[:, seq_len, :]
Do you see the difference? In the second position, where you index the positions, you are supposed to use a range of indices, but you are using only a single index. So, instead of returning seq_len different positional encodings, it will return only a single positional encoding, the one at index seq_len. I hope this helps.
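If you want to convince yourself, here is a quick shape check (a minimal sketch with made-up dimensions, not the assignment's actual pos_encoding):
import numpy as np
seq_len = 3
pos_encoding = np.random.rand(1, 10, 4)  # pretend shape: (1, number of available positions, embedding_dim)
print(pos_encoding[:, seq_len, :].shape)   # (1, 4)    -> the encoding of a single position
print(pos_encoding[:, :seq_len, :].shape)  # (1, 3, 4) -> the encodings of the first seq_len positions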
P.S. - It is against the community guidelines to post solution code publicly. If a mentor needs to take a look at your code, they will ask you to DM it.
Cheers.
Elemento
Oh, sorry for posting that. Thanks for answering my question. But the wrong code would give a mismatched shape, right? How can the following add operation execute with that?
Hey @littlestrong,
Not necessarily! This is where the concept of broadcasting comes in. If you use self.pos_encoding[:, seq_len, :], it returns an output with dimensions (1, embedding_dim), which is then added to x, which has dimensions (batch_size, input_seq_len, embedding_dim). Since the last dimension matches exactly for both variables, NumPy will simply broadcast the positional encoding along the first and second dimensions and perform the addition. Let me present you with an example. Take a look at the following code:
import numpy as np
# x has dimensions (batch_size, seq_len, embedding_dim)
x = np.zeros((5, 3, 4))
print(x.shape)
seq_len = x.shape[1]
# Let's say pe is the `self.pos_encoding`
pe = np.random.rand(1, 5, 4)
print(pe.shape)
## First case: single index (the buggy version)
selection = pe[:, seq_len, :]
result = x + selection
print(selection.shape, result.shape)
## Second case: slice of indices (the correct version)
selection = pe[:, :seq_len, :]
result = x + selection
print(selection.shape, result.shape)
This leads to the following output:
print(x.shape) -> (5, 3, 4)
print(pe.shape) -> (1, 5, 4)
print(selection.shape, result.shape) -> (1, 4) (5, 3, 4)
print(selection.shape, result.shape) -> (1, 3, 4) (5, 3, 4)
So you see, the addition works perfectly fine in both cases, and the output has the same dimensions in both cases as well. However, the outputs will indeed be different from each other, since the matrices that get added are different in the two cases. Let me know if this helps.
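For instance, extending the snippet above (a minimal sketch reusing the x, pe and seq_len defined there), you can verify that the two results match in shape but not in values:
# Continuing from the example above: x, pe and seq_len are already defined
result_single = x + pe[:, seq_len, :]   # buggy version: one encoding broadcast to every position
result_slice = x + pe[:, :seq_len, :]   # correct version: a different encoding per position
print(result_single.shape == result_slice.shape)  # True  -> same shape
print(np.allclose(result_single, result_slice))   # False -> different values, since pe is random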
Cheers,
Elemento
Hey @littlestrong,
I had a misunderstanding in my previous reply and have since updated my post. Please do take a look at it. Thanks to @anon57530071 for pointing it out.
Cheers,
Elemento