I am stuck on Course 5 Week 4, Assignment 1, Exercise 3.

{moderator edit - solution code removed}

I wrote the code above and got the following results.


It seems to me that something is wrong with my code, and as a result the attention weights are different from the expected result, but I can’t tell where the mistake is. Please help.

I recommend you read the instructions for Exercise 3 carefully to see how to apply the mask.

For the attention_weights, I recommend you use the tf.keras.activations.softmax(…) function.

For output, you should compute the matrix product of attention_weights and v.
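If it helps to see how those hints fit together, here is a rough sketch of the general scaled dot-product attention pattern from the Transformer paper, not the notebook’s graded solution; the mask convention, the -1e9 constant, and the axis used for d_k are assumptions you should check against the instructions:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # QK^T: matrix product of the queries with the transposed keys
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k); here d_k is taken from the last axis of k
    # (check the notebook for which axis it expects)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)

    # Push masked positions toward -infinity so softmax gives them ~0 weight
    # (assumes the mask uses 1 = keep, 0 = ignore; the notebook states its convention)
    if mask is not None:
        scaled_logits += (1.0 - mask) * -1e9

    # Softmax over the last axis gives the attention weights
    attention_weights = tf.keras.activations.softmax(scaled_logits, axis=-1)

    # The output is the weighted sum of the values
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```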

Thank you for your suggestion.

Hello,

I apologize for reviving a solved topic, but I feel I need clarification on something, and in my opinion the best way is to ask right here so that the context is clear.

My doubt is about the matmul_qk and dk code. I had absolutely no clue how to write it until, by God’s grace, I stumbled upon this topic, where the code for those variables is given in a screenshot.

What I want to know is: given the instructions in the notebook, how would someone figure out how to code those two variables, i.e., matmul_qk and dk? Sure, the tf.matmul hint might help with the first one, but I feel it would be next to impossible to figure out the second. So please explain how one is supposed to code the second variable (dk) using only the instructions provided. Thank you, and sorry again for reviving an old topic.


On thinking about it more closely, it is clearer what matmul_qk and dk are. Yet it is still not clear how dk should be computed. Please shed some light on that.

For matmul_qk, they literally write out the formula for you: QK^T. You just have to remember Prof Ng’s notational conventions, which have been absolutely consistent since the very beginning of DLS Course 1: when he writes two array, vector, or tensor arguments adjacent with no explicit operator, that means matrix multiplication (real “dot product” style). If he means an elementwise multiply, he consistently uses the operator *.
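As a toy illustration of that convention (the shapes here are arbitrary, not the assignment’s test case):

```python
import tensorflow as tf

q = tf.random.normal((3, 4))   # (seq_len_q, depth)
k = tf.random.normal((5, 4))   # (seq_len_k, depth)

# Adjacent symbols in QK^T mean a true matrix product:
matmul_qk = tf.matmul(q, k, transpose_b=True)   # shape (3, 5)

# An elementwise product, which Prof Ng would write with *, is a different
# operation and requires matching shapes:
elementwise = q * q                             # shape (3, 4)
```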

For d_k, the instructions say this:

d_k is the dimension of the keys, which is used to scale everything down so the softmax doesn’t explode

So maybe there is a bit of ambiguity there about which dimension they mean, but it is the sequence length. I used index -2 for that, to be safe about whether the “samples” dimension is present or not.
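For concreteness, here is a sketch of just the scaling step under that reading; which axis actually holds d_k depends on how the notebook shapes k, so treat the index as something to verify yourself:

```python
import tensorflow as tf

q = tf.random.normal((3, 4))
k = tf.random.normal((4, 4))   # when seq_len_k == depth, axes -1 and -2 give the same value

matmul_qk = tf.matmul(q, k, transpose_b=True)

# The paper reads d_k as the key depth (axis -1); the post above picks axis -2.
dk = tf.cast(tf.shape(k)[-2], tf.float32)
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
```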

Hi, I have some implementation doubts and have sent you a DM invite. Please help me by looking at my code.

Sure, I will respond in about an hour; I am away from my computer now. But if you copied all the code from that other thread, there are a number of problems with it, e.g., there should be no calls to the normalize function.


Also note that I can’t directly see your code. Please download your notebook and attach it to a reply on your DM thread.

Hi Paul,

Thank you so much for helping, but my problem got solved.

So glad it got solved…
