Doubt in Week 4 Transformer programming assignment

In the Week 4 programming assignment, EX-3, what should the initial value of dk be? It is the scaling parameter that keeps the softmax inputs from blowing up.

See this markdown text for the definition of \sqrt{d_k}:

  1. Scale your embedding by multiplying it by the square root of your embedding dimension. Remember to cast the embedding dimension to data type tf.float32 before computing the square root.
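
As a rough illustration of the quoted step, here is a minimal sketch, not the assignment's reference solution. The names embedding_dim, x, and scale are hypothetical, and x is a dummy tensor standing in for the output of an embedding layer.

```python
import tensorflow as tf

# Hypothetical embedding dimension, chosen only for illustration.
embedding_dim = 4

# Dummy stand-in for an embedding output of shape (batch, seq_len, embedding_dim).
x = tf.ones((1, 3, embedding_dim))

# Cast the integer dimension to float32 before taking the square root,
# because tf.math.sqrt does not accept integer inputs.
scale = tf.math.sqrt(tf.cast(embedding_dim, tf.float32))

# Scale the embedding by sqrt(embedding_dim).
x_scaled = x * scale
```

Without the cast, tf.math.sqrt would raise an error on an integer input, which is why the notebook text asks for tf.float32.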

Sorry sir, but where is this markdown text, and what value should I assign to dk?

You have to use cast and shape from TensorFlow, like tf.cast(tf.shape(...)...). Please check this guide.
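
To make the pattern concrete, here is a hedged sketch rather than the graded solution. It assumes dk means the size of the last dimension of the key tensor, as in standard scaled dot-product attention, so dk is not initialized to a fixed number but read from the tensor's shape. The function name scaled_logits and the dummy tensors q and k are hypothetical.

```python
import tensorflow as tf

def scaled_logits(q, k):
    """Return Q K^T divided by sqrt(dk), the values that would be fed to softmax."""
    # Raw attention scores, shape (..., seq_len_q, seq_len_k).
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Assumption: dk is the last dimension of k. tf.shape returns int32,
    # so cast to float32 before taking the square root.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)

    return matmul_qk / tf.math.sqrt(dk)

# Dummy inputs: batch of 1, 2 queries and 3 keys, each of depth 4.
q = tf.ones((1, 2, 4))
k = tf.ones((1, 3, 4))
logits = scaled_logits(q, k)
```

Reading dk from the shape means the same code works for any key depth, so there is no constant to pick by hand.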


Thanks a lot, sir. By the way, just curious: was this explained in the lecture, or was it meant to be read from the paper?

This is explained in the notebook, and maybe in the lecture as well.
