C5_W4_A1_Transformer_Subclass_v1 Scaled_dot_product_attention

Hello,
I’m having major problems with this assignment, mainly because I’m not a TensorFlow guru and the TensorFlow online documentation is some of the worst I have seen in my 25 years as a software developer.

Currently I have these problems. This is my code:

{mentor edit: code removed}

Error:
assert tf.is_tensor(attention), "Output must be a tensor"

Please help. Also, could the assignment have a little more hand-holding where TF is concerned? Thank you and best regards.

Starting with:
In “matmul_qk = …”, you need to transpose the ‘k’ matrix.
From the instructions:
[image from the instructions: the scaled dot-product attention formula]
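For example, something along these lines (an illustration with made-up shapes, not the graded code from the notebook):

```python
import tensorflow as tf

# Made-up shapes for illustration: (batch, seq_len, depth)
q = tf.random.uniform((1, 3, 4))
k = tf.random.uniform((1, 3, 4))

# transpose_b=True computes QKᵀ, giving shape (batch, seq_len_q, seq_len_k)
matmul_qk = tf.matmul(q, k, transpose_b=True)
print(matmul_qk.shape)  # (1, 3, 3)
```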


In “attention_weights = …”, I recommend you use tf.nn.softmax() on the scaled_attention_logits.
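For instance (again just a sketch with made-up shapes):

```python
import tensorflow as tf

# Made-up logits of shape (batch, seq_len_q, seq_len_k)
scaled_attention_logits = tf.random.uniform((1, 3, 3))

# softmax normalizes over the last axis (seq_len_k), so each row of weights sums to 1
attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
```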

In “output = …”, I recommend you use tf.linalg.matmul(), with the appropriate axis argument.

tf.linalg.matmul() has no axis argument.

Also, now when I do:
{mentor edit: code removed}

I get the error:
<ipython-input-26-1d9a3bc7cf92> in scaled_dot_product_attention(q, k, v, mask)
     31     if mask is not None:  # Don't replace this None
     32         # scaled_attention_logits += (1 - mask) * -1.0e9
---> 33         scaled_attention_logits += mask
     34
     35     # softmax is normalized on the last axis (seq_len_k) so that the scores

ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)

tf.linalg.matmul() has no axis argument.

You are correct, my mistake.

The line of code you commented out is correct.
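To illustrate what that line does (a sketch with a made-up mask; the convention here, taken from your commented-out line, is 1 = keep, 0 = mask out):

```python
import tensorflow as tf

# Made-up logits and mask, both of shape (1, 3, 4)
scaled_attention_logits = tf.random.uniform((1, 3, 4))
mask = tf.constant([[[1., 1., 0., 0.],
                     [1., 1., 1., 0.],
                     [1., 1., 1., 1.]]])

# Masked positions (mask == 0) get a huge negative logit, so softmax drives them to ~0
scaled_attention_logits += (1. - mask) * -1.0e9
```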

I uncommented the line of code but I still get the broadcast error.

I am currently using this:

{mentor edit: code removed}

I was stuck on this part for a while. I used the tf.matmul() function and it works.


Wow.
Thanks. That totally fixed it.

Sorry, I missed catching the use of np.matmul().
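For future readers: np.matmul() returns a NumPy array rather than a tf.Tensor, which is why the in-place add with the (1, 3, 4) mask could not broadcast and why the tf.is_tensor(attention) assert fails. Using TensorFlow ops end to end avoids both problems. Here is a generic sketch of scaled dot-product attention in that style; it follows the structure of the public TensorFlow Transformer tutorial rather than the graded notebook verbatim, and it assumes the mask convention 1 = keep, 0 = mask out:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # QKᵀ: (..., seq_len_q, seq_len_k); tf.matmul keeps everything a tf.Tensor
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Assumed mask convention: 1 = keep, 0 = mask out
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1.0e9

    # Normalize over the last axis (seq_len_k)
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    # Weighted sum of the values: (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```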