Specialization: Natural Language Processing Specialization
Course: Natural Language Processing with Attention models
Week: 2
Assignment: C4W2
Function: DecoderLayer.call / Block2
I am not able to code the “GRADED FUNCTION: DecoderLayer”, even though I understand the EncoderLayer from the previous steps.
I got as far as the first multi-head attention followed by the normalization, but I cannot work out what comes after that for the second multi-head attention.
The instruction is: “calculate self-attention using the Q from the first block and K and V from the encoder output.”
Parameter: enc_output (tf.Tensor): Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
From that instruction I understand what Q is, and I understand the “encoder output” is passed as a parameter to the DecoderLayer. But how do I get K and V from the encoder output? When “Encoder(…)” is called it returns a tensor of shape (batch_size, input_seq_len, embedding_dim); how do I derive K and V from this?
I also doubt I have coded the first MHA correctly. What values should I pass for query, key and value in the first MHA? There is a parameter “x”, and I don’t understand how to derive query/key/value from it.
x (tf.Tensor): Tensor of shape (batch_size, target_seq_len, fully_connected_dim)
From arXiv:1706.03762:
That is the architecture figure from the original Transformer paper, which is actually the same as the one in the assignment, but here it is easier to see how the encoder and decoder are connected and where K and V come from.
Getting K and V from the “encoder output” is as straightforward as the architecture depicts: the encoder output tensor itself is used for both K and V in the second attention block. Not much more is needed.
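As a hedged illustration only (not the graded solution), here is a minimal sketch of that second attention block. The layer and variable names (mha2, layernorm2, Q1) are assumptions for illustration; the key point is that Q comes from the output of the first block while K and V are both the encoder output.

```python
import tensorflow as tf

# Illustrative layers; in the assignment these are created in __init__
mha2 = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)
layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

def second_attention_block(Q1, enc_output, padding_mask=None):
    """Q1: output of the first (masked self-attention) block,
       shape (batch_size, target_seq_len, fully_connected_dim).
       enc_output: encoder output,
       shape (batch_size, input_seq_len, fully_connected_dim)."""
    # Q comes from the first block; K and V are both the encoder output.
    attn2, attn_weights2 = mha2(
        query=Q1,
        value=enc_output,
        key=enc_output,
        attention_mask=padding_mask,
        return_attention_scores=True,
    )
    # Residual connection + layer normalization, as in the encoder layer.
    out2 = layernorm2(attn2 + Q1)
    return out2, attn_weights2
```

So there is no extra transformation to “derive” K and V: the same encoder output tensor is simply passed for both arguments.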
“I also doubt I have coded the first MHA correctly. What values should I pass for query, key and value in the first MHA? There is a parameter “x”, and I don’t understand how to derive query/key/value from it.”
For this you can check the implementation in the EncoderLayer class; in the DecoderLayer class it should be done in a similar manner.
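For illustration, here is a similar minimal sketch of the first block (again not the graded solution; the names mha1 and layernorm1 are assumptions). As in the EncoderLayer, query, key and value all come from the same tensor, which in the decoder is the input x, and the look-ahead mask is passed as the attention mask:

```python
import tensorflow as tf

# Illustrative layers; in the assignment these are created in __init__
mha1 = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)
layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

def first_attention_block(x, look_ahead_mask=None):
    """x: decoder input,
       shape (batch_size, target_seq_len, fully_connected_dim)."""
    # Masked self-attention: x serves as query, key and value.
    attn1, attn_weights1 = mha1(
        query=x,
        value=x,
        key=x,
        attention_mask=look_ahead_mask,
        return_attention_scores=True,
    )
    # Residual connection + layer normalization.
    Q1 = layernorm1(attn1 + x)
    return Q1, attn_weights1
```

The output of this block (Q1 here) is then what you feed as the query into the second attention block described above.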