Dear all,
I'm facing a challenge with the encoder block function, where the results show:
I tried adding the return_attention_scores=True and training=training arguments in the MHA function, but in vain.
Please help.
Sheila
Hello @Sheila_Murunga
You do not need to use return_attention_scores=True in Exercise 5. For the line
"Pass the encoded embedding through a dropout layer"
you do, however, need training=training.
Note that this error can also happen if, in Exercise 4, you have included training=training in a layer other than the ones where it is asked for.
If you have followed the above correctly and still see the error, share screenshots of your code for Exercise 4 and Exercise 5 via personal DM. Click on my name and then on Message.
Regards
DP
Hi @Deepti_Prasad,
Thanks for getting back to me on this. It's Exercise 4 that I am struggling with; I made a mistake pointing out Exercise 5.
In Exercise 4, I added the 'training=training' argument in the MHA function as well as the dropout function, but I am still not getting it correct.
What I also noticed is that the tensor results keep changing every time I run the '# UNIT TEST' cell:
EncoderLayer_test(EncoderLayer)
Let me share the code to you
Thanks
Sheila
Hello @Sheila_Murunga
Corrections required:
1. In Exercise 1, def get_angles: while calculating the angles there is a correction needed, as the formula for the denominator is incorrect. Please use the numpy function np.power for
"Calculate the angles using pos, i and d"
(HINT: make sure the formula is 10000 raised to the power (2 * i / d), not 10000 multiplied by (2 * i / d)).
2. In Exercise 3, def scaled_dot_product_attention: for the matmul of Q and K (matrix multiplication), you need to set transpose_b to True.
3. In the same Exercise 3, the instructions say to
"Multiply (1. - mask) by -1e9 before applying the softmax."
but you used -1.0e9.
4. Kindly use tf.nn.softmax instead of tf.keras.activations.softmax.
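To make those points concrete, here is a rough sketch of the relevant pieces (for illustration only, with assumed argument names; it is not the notebook's solution code):

import numpy as np
import tensorflow as tf

def get_angles(pos, i, d):
    # 10000 raised to the power (2*i/d), not 10000 multiplied by (2*i/d)
    return pos / np.power(10000, (2 * i) / np.float32(d))

def scaled_dot_product_attention(q, k, v, mask=None):
    # Q x K^T: matrix multiplication with transpose_b=True
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    # scale by sqrt(d_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)
    # add (1. - mask) * -1e9 before applying the softmax
    if mask is not None:
        scaled_logits += (1. - mask) * -1e9
    # use tf.nn.softmax, not tf.keras.activations.softmax
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)
    return tf.matmul(attention_weights, v), attention_weights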
Let me know if it resolved your issue.
Regards
DP
Hello @Sheila_Murunga
The reason you are still getting the error is that you did not make all the corrections I mentioned.
I am pointing out the corrections you need to make once again.
Regards
DP
Hi @Deepti_Prasad,
Have made the changes as suggested but still getting the same results.
Something else I have noticed, which I mentioned earlier: every time I run the following code:
EncoderLayer_test(EncoderLayer)
I get different results when I print x. For example:
1st Run of the code above:
tf.Tensor(
[[[ 0.32932317 -1.4980435 -0.1100653 1.2787857 ]
[-1.294352 -0.37868017 0.2113227 1.4617096 ]
[ 0.7667096 -0.8656752 -1.1033096 1.2022753 ]]], shape=(1, 3, 4), dtype=float32)
2nd Run of the code above:
tf.Tensor(
[[[ 0.9128431 -1.6912671 0.43020165 0.3482222 ]
[-1.5253356 0.32237953 -0.04885508 1.2518111 ]
[ 1.328844 -1.4135731 -0.29851308 0.3832423 ]]], shape=(1, 3, 4), dtype=float32)
3rd Run of the code above:
tf.Tensor(
[[[-0.52021545 -1.3031464 1.3411331 0.48222876]
[-1.5430954 -0.09155926 1.1877482 0.4469065 ]
[ 0.9904138 -1.375323 -0.5333159 0.9182252 ]]], shape=(1, 3, 4), dtype=float32)
Why is that?
Thanks
Sheila
Hi Deepti,
Correct me if I am wrong, but besides making the corrections suggested, the scaled_dot_product_attention() function only illustrates how the EncoderLayer() works when computing the multi-head attention outputs, and it is not actually used in the EncoderLayer(). Will the correction affect the output of the EncoderLayer()? Please advise.
Sheila
You are correct, that specific “scaled_dot_product_attention()” function isn’t used in the Transformer encoder or decoder.
There is a multi-head-attention method in the Encoder and Decoder, but there we use a pre-written TensorFlow Keras layer.
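For reference, a minimal sketch of using that pre-built layer (the names embed_dim, num_heads and x are assumptions for illustration, not your notebook's variables):

import tensorflow as tf

embed_dim, num_heads = 4, 2
mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)

x = tf.random.uniform((1, 3, embed_dim))      # (batch, seq_len, features)
attn_output = mha(x, x, x, training=False)    # self-attention: query = value = key = x
print(attn_output.shape)                      # (1, 3, 4)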
Are you still getting the assertion error?
Can you DM me the code showing how you corrected point 1?
Thanks @TMosh for confirming.
I still want to ask: why do the values change after every run of the EncoderLayer() function, with the results below?
1st Run of the code above:
tf.Tensor(
[[[ 0.32932317 -1.4980435 -0.1100653 1.2787857 ]
[-1.294352 -0.37868017 0.2113227 1.4617096 ]
[ 0.7667096 -0.8656752 -1.1033096 1.2022753 ]]], shape=(1, 3, 4), dtype=float32)
2nd Run of the code above:
tf.Tensor(
[[[ 0.9128431 -1.6912671 0.43020165 0.3482222 ]
[-1.5253356 0.32237953 -0.04885508 1.2518111 ]
[ 1.328844 -1.4135731 -0.29851308 0.3832423 ]]], shape=(1, 3, 4), dtype=float32)
3rd Run of the code above:
tf.Tensor(
[[[-0.52021545 -1.3031464 1.3411331 0.48222876]
[-1.5430954 -0.09155926 1.1877482 0.4469065 ]
[ 0.9904138 -1.375323 -0.5333159 0.9182252 ]]], shape=(1, 3, 4), dtype=float32)
Thanks
Sheila
I made the changes you suggested to the positional_encoding() function, as below:
pow = 2 * i / d                          # exponent (2*i/d)
angles = pos / np.power(10000, pow)      # pos / 10000^(2*i/d)
I know the positional_encoding() function is used in Exercise 5 when defining the full Encoder but not used in Exercise 4.
Please let us tackle the EncoderLayer() function in Exercise 4 first. Why are the results changing every time I run the function, when there is a fixed seed indicated in the public_test.py file?
Thanks
Sheila
Avoid hard-coding your path when writing the code. Please check your DM.
Please show us how you are invoking the function to get different outputs. I added print statements to show the results in the EncoderLayer function, added another cell invoking the test function, and I get consistent results every time.
Thanks @paulinpaloalto. I also ran the same code, but directly on the Coursera platform, and I passed it.
However, if I download the same folder and run it on my local computer, I encounter the error results shown below:
Why is that? Is there a difference between the Coursera platform and a local computer? A naive question, perhaps, but I would like to know.
Thanks
Sheila
The difference is probably the versions of the packages you installed on your local computer.
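For example, you could compare the versions in both environments with something like:

import tensorflow as tf
import numpy as np

print("TensorFlow:", tf.__version__)   # compare with the version in the Coursera notebook
print("NumPy:", np.__version__)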
Thanks
Hello @Sheila_Murunga
I was wondering why your code is still showing an error. Did you download all the files required for the assignment to your local computer?
The reason you see a different output every time you run the EncoderLayer() code is that the weights (for example the word embeddings) are initialized randomly, so the output differs on every run because of those random numbers.
As Tom already mentioned, this could also be a Keras layer version difference.
I am sharing a link that explains this in more detail; you can go through it.
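As a small illustration (just a sketch, not the assignment code): creating a fresh layer gives different outputs for the same input each time, because its weights start from new random values.

import tensorflow as tf

x = tf.ones((1, 3, 4))                   # fixed input
for run in range(2):
    layer = tf.keras.layers.Dense(4)     # new random weights on every run
    print("run", run, ":", layer(x).numpy()[0, 0])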
Regards
DP
Thank you so much, @Deepti_Prasad. I really appreciate your help.
Kind regards,
Sheila