Positional Embedding: C5W4 Ex2 Deciding the shape of pos embeddings


I was going over the implementation of the positional embeddings.

In the assignment, the test example

# Example
import numpy as np

position = 4
d_model = 8
pos_m = np.arange(position)[:, np.newaxis]
dims = np.arange(d_model)[np.newaxis, :]
get_angles(pos_m, dims, d_model)

is used to get the angles. My question is about how we decide the shapes of the pos_m and dims arguments, i.e., why the new axis should make pos_m have shape (position, 1) while dims has shape (1, d_model).

Following the note in the exercise
Note: In the lectures Andrew uses vertical vectors, but in this assignment all vectors are horizontal. All matrix multiplications should be adjusted accordingly.

My presumption was that all vectors should then be horizontal, so why is that not the case here?

We want the final encoding of a row of input word ids to have shape (num words, encoding dimension).
As far as the test is concerned, we want to encode a row containing 4 words where each word position is represented by an embedding of length 8.
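For the test values above, broadcasting a column of positions against a row of dimension indices gives exactly that target shape; a minimal sketch:

```python
import numpy as np

position = 4  # number of word positions in the row
d_model = 8   # encoding length per position

pos_m = np.arange(position)[:, np.newaxis]  # shape (4, 1): one row per position
dims = np.arange(d_model)[np.newaxis, :]    # shape (1, 8): one column per dimension

# NumPy broadcasts (4, 1) against (1, 8) to produce (4, 8),
# the (num words, encoding dimension) shape we want.
print((pos_m * dims).shape)  # (4, 8)
```

If both arguments were horizontal, i.e. shapes (1, 4) and (1, 8), the elementwise product would fail rather than expand into a matrix.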

The encoding size is fixed at the time of building the model. See this link on BERT models to observe the different variations of the model created with different embedding dimensions.

Encoding size is kept fixed to ensure that we can make use of parallelization on GPU instead of encoding every row individually.


@balaji.ambresh That makes sense. I suppose I still don't understand why dims.shape isn't (1, d_model) but instead (d_model, 1).

Please see how NumPy broadcasting applies to the calculation of the angles.
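To make the broadcasting concrete, here is a hedged sketch of get_angles using the standard sinusoidal formula pos / 10000^(2*(i//2)/d_model) from "Attention Is All You Need" (an assumption: the assignment's actual get_angles may differ in detail, but the shape behavior is the same):

```python
import numpy as np

def get_angles(pos, i, d_model):
    # Angle rates of the sinusoidal positional encoding:
    # 1 / 10000^(2*(i//2)/d_model), one rate per dimension index.
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    # (position, 1) * (1, d_model) broadcasts to (position, d_model).
    return pos * angle_rates

position, d_model = 4, 8
pos_m = np.arange(position)[:, np.newaxis]  # (4, 1) column of positions
dims = np.arange(d_model)[np.newaxis, :]    # (1, 8) row of dimension indices

angles = get_angles(pos_m, dims, d_model)
print(angles.shape)  # (4, 8): one row of 8 angles per word position
```

The vertical pos_m and horizontal dims are what let a single elementwise multiply fill the whole (4, 8) angle matrix at once, rather than looping over positions.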