The Exercise 3 instructions are not informative at all; they do not help and cause a lot of frustration!
- What does `scale_matmul_qk` really mean? Wouldn't it be easier to say `dk = tf.shape(k)[-1] # seq_len_k`, which, by the way, returns an error!
- There is no clear explanation of "Multiply (1. - mask) by -1e9 before applying the softmax". Why exactly `(1. - mask) * -1e9` and not just `mask * -1e9`?
- It would have been better to add `tf.nn.softmax(..., axis=...)` to the additional hints as a reminder to use the softmax from TensorFlow!
- It would have been better to add `tf.cast(x, dtype, name=None)` to the additional hints, to explain why `dk`'s type has to be changed to avoid `InvalidArgumentError: Value for attr 'T' of int32 is not in the list of allowed values`!
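
For anyone else stuck on the mask question, here is a minimal plain-Python sketch of how I currently read that instruction. It assumes the mask holds 1 for positions to attend to and 0 for positions to hide (padding); that convention is my assumption, since the instructions do not spell it out. The helper name `masked_softmax` and the sample numbers are made up for illustration and are not part of the assignment.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores` after adding (1 - mask) * -1e9 to each score.

    Assumption: mask value 1 means "keep this position", 0 means "hide it".
    """
    # Kept positions get +0, hidden positions get -1e9.
    shifted = [s + (1.0 - m) * -1e9 for s, m in zip(scores, mask)]
    peak = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # hypothetical scaled attention scores
mask = [1, 1, 0]          # last position is padding

# (1 - mask) * -1e9: the padded position ends up with zero weight.
weights = masked_softmax(scores, mask)

# Flipping the mask reproduces what mask * -1e9 would do under this
# convention: the kept positions are pushed down instead of the padding.
inverted = masked_softmax(scores, [1 - m for m in mask])
```

Under this assumption, `mask * -1e9` would suppress exactly the positions that should be kept, which would explain the `(1. - mask)` in the instruction. If the mask used 1 for padding instead, `mask * -1e9` would be the right form.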
Cheers,