C5_W4_A1_Transformer_Subclass_v1: scaled_dot_product_attention

I’m having major problems with this assignment, mainly because I’m not a TensorFlow guru, and the TensorFlow online documentation is some of the worst documentation I have seen in my 25 years as a software developer.

Currently I have these problems. This is my code:

{mentor edit: code removed}

assert tf.is_tensor(attention), "Output must be a tensor"

Please help. Also, could the assignment include a little more handholding where TF is concerned? Thank you and best regards.

Starting with:
In "matmul_qk = …", you need to transpose the 'k' matrix.
From the instructions:


In "attention_weights = …", I recommend you use tf.nn.softmax() on the scaled_attention_logits.

In "output = …", I recommend you use tf.linalg.matmul(), with the appropriate axis argument.
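Taken together, those hints describe a function along these lines. This is only a rough sketch assuming the standard formula Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V; variable names and the exact mask convention in the notebook may differ:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of scaled dot-product attention (standard Transformer formula)."""
    # Transpose k via the transpose_b argument instead of a separate op
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)

    # Scale by the square root of the key depth
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Masked positions (mask == 0) receive a large negative logit
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1.0e9

    # Softmax is normalized over the last axis (seq_len_k)
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, depth_v)
    return output, attention_weights
```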

tf.linalg.matmul has no axis argument.

Also, now when I do:
{mentor edit: code removed}

I get the error:
<ipython-input-26-1d9a3bc7cf92> in scaled_dot_product_attention(q, k, v, mask)
     31     if mask is not None:  # Don't replace this None
     32         #scaled_attention_logits += (1 - mask) * -1.0e9
---> 33         scaled_attention_logits += mask
     35     # softmax is normalized on the last axis (seq_len_k) so that the scores

ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)
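For reference, that ValueError is raised by NumPy, not TensorFlow: an in-place `+=` must write the result back into the existing array, so a (3, 4) output cannot absorb the broadcast (1, 3, 4) shape. A minimal reproduction, with shapes taken from the error message:

```python
import numpy as np

logits = np.zeros((3, 4))   # np.matmul produces a plain ndarray
mask = np.ones((1, 3, 4))   # the mask carries an extra batch dimension

# Out-of-place addition broadcasts fine: the result has shape (1, 3, 4)
assert (logits + mask).shape == (1, 3, 4)

# In-place addition fails: the (3, 4) output cannot hold a (1, 3, 4) result
try:
    logits += mask
except ValueError as err:
    print(err)  # non-broadcastable output operand ...
```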

tf.linalg.matmul has no axis argument.

You are correct, my mistake.

The line of code you commented out is correct.

I uncommented the line of code but I still get the broadcast error.

I am currently using this:

{mentor edit: code removed}

I was stuck on this part for a while. I used the tf.matmul() function and it works.


Thanks. That totally fixed it.

Sorry, I missed catching the use of np.matmul().
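That explains the root cause: np.matmul returns a NumPy ndarray, so the later `+=` follows NumPy's in-place rules and cannot grow the (3, 4) array to the broadcast shape. TensorFlow tensors are immutable, so `+=` simply builds a new broadcast tensor and rebinds the name. A small sketch, with shapes taken from the error message above:

```python
import tensorflow as tf

logits = tf.zeros((3, 4))   # what tf.matmul would return: a Tensor
mask = tf.ones((1, 3, 4))   # mask with a leading batch dimension

# Tensors are immutable, so += creates a NEW (1, 3, 4) tensor - no error
logits += mask
assert logits.shape == (1, 3, 4)
```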