In week 4 programming assignment EX-3 what should be the initial value of dk which is the reducing parameter so that softmax does not burst up
See this markdown text for definition of \sqrt{d_k}:
- Scale your embedding by multiplying it by the square root of your embedding dimension. Remember to cast the embedding dimension to data type
tf.float32
before computing the square root.
1 Like
sorry sir but where is this markdown text and what value should i assign for dk
You have to use the cast
and shape
from TensorFlow. Like tf.cast(tf.shape(...)...)
. Please check this guide.
1 Like
Thanks a lot sir. by the way , just curious was it explained in the lecture or was it meant to be read from paper