In the Week 4 programming assignment, exercise 3 (EX-3), what should the initial value of dk be? It is the scaling parameter that keeps the softmax from blowing up.

See this markdown text in the notebook for the definition of \sqrt{d_k}:

- Scale your embedding by multiplying it by the square root of your embedding dimension. Remember to cast the embedding dimension to data type `tf.float32` before computing the square root.
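As a rough illustration of the quoted instruction, here is a minimal sketch of scaling an embedding by the square root of its dimension. The sizes, the token ids, and the variable names are made up for the example and are not taken from the assignment.

```python
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
vocab_size = 100
embedding_dim = 16

embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

# A batch with one sequence of three made-up token ids.
token_ids = tf.constant([[1, 5, 9]])
x = embedding(token_ids)  # shape (1, 3, 16)

# Cast the integer dimension to float32 first, then take the square root.
scale = tf.math.sqrt(tf.cast(embedding_dim, tf.float32))
x_scaled = x * scale
```

The cast matters because `tf.math.sqrt` expects a floating-point tensor, so passing the integer dimension directly would raise an error.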


Sorry sir, but where is this markdown text, and what value should I assign to dk?

You have to use `cast` and `shape` from TensorFlow, like `tf.cast(tf.shape(...)...)`. Please check this guide.
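To make the `tf.cast(tf.shape(...)...)` hint concrete, here is a minimal sketch in which dk is not given an initial value at all but is read from the last dimension of the key tensor. The function name `scaled_attention_logits` and the toy tensors are assumptions for illustration, not the assignment's solution.

```python
import tensorflow as tf

def scaled_attention_logits(q, k):
    """Hypothetical helper: dot-product scores divided by sqrt(dk)."""
    # Raw scores: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    # dk is the depth of the keys, cast to float32 before the square root.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    return matmul_qk / tf.math.sqrt(dk)

# Toy inputs: 3 queries and 5 keys, each of depth 4.
q = tf.ones((1, 3, 4))
k = tf.ones((1, 5, 4))
logits = scaled_attention_logits(q, k)  # shape (1, 3, 5)
```

Dividing by the square root of dk keeps the scores from growing with the key depth, which is what stops the softmax from saturating.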


Thanks a lot, sir. By the way, just curious: was this explained in the lecture, or was it meant to be read from the paper?