Dear all,
I'm facing a challenge with the encoder block function, where the results show:
I tried adding the return_attention_scores=True and training=training arguments in the MHA function, but in vain.
Please help.
Sheila
Hello @Sheila_Murunga
You do not need to use return_attention_scores=True in Exercise 5. For the line
"Pass the encoded embedding through a dropout layer"
you do, however, need training=training.
Note that this error can also happen if, in Exercise 4, you have included training=training in a layer other than the ones where it is asked for.
If you have followed the above correctly and still see the error, share screenshots of your code for Exercise 4 and Exercise 5 via personal DM. Click on my name and then on Message.
Regards
DP
Hi @Deepti_Prasad,
Thanks for getting back to me on this. It's Exercise 4 that I am struggling with; I made a mistake pointing out Exercise 5.
In Exercise 4, I added the 'training=training' argument in the MHA function as well as the dropout function, but I am still not getting it correct.
What I also noticed is that the tensor results keep changing every time I run the '# UNIT TEST' cell:
EncoderLayer_test(EncoderLayer)
Let me share the code to you
Thanks
Sheila
Hello @Sheila_Murunga
Corrections required:
1. In Exercise 1, def get_angles: while calculating the angles there is a correction needed, as the formula for the denominator is incorrect. Please use the numpy function np.power for
"Calculate the angles using pos, i and d"
(HINT: make sure the formula is 10000 raised to the power (2 * i / d), not 10000 multiplied by (2 * i / d)).
2. In Exercise 3, def scaled_dot_product_attention: for the matmul of Q and K (matrix multiplication), you need to set transpose_b to True.
3. In the same Exercise 3, the instructions say to
"Multiply (1. - mask) by -1e9 before applying the softmax."
but you used -1.0e9.
4. Kindly use tf.nn.softmax instead of tf.keras.activations.softmax.
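To make those points concrete, here is a rough sketch of the relevant pieces (for illustration only, with assumed argument names; it is not the notebook's solution code):

import numpy as np
import tensorflow as tf

def get_angles(pos, i, d):
    # 10000 raised to the power (2*i/d), not 10000 multiplied by (2*i/d)
    return pos / np.power(10000, (2 * i) / np.float32(d))

def scaled_dot_product_attention(q, k, v, mask=None):
    # Q x K^T: matrix multiplication with transpose_b=True
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    # scale by sqrt(d_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)
    # add (1. - mask) * -1e9 before applying the softmax
    if mask is not None:
        scaled_logits += (1. - mask) * -1e9
    # use tf.nn.softmax, not tf.keras.activations.softmax
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)
    return tf.matmul(attention_weights, v), attention_weights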
Let me know if it resolved your issue.
Regards
DP
Hello @Sheila_Murunga
The reason you are still getting the error is that you did not make all the corrections I mentioned.
I am pointing out the corrections you need to make once again.
Regards
DP
Hi @Deepti_Prasad,
Have made the changes as suggested but still getting the same results.
Something else I have noticed, which I mentioned earlier: every time I run the following code:
EncoderLayer_test(EncoderLayer)
I get different results when I print x. For example:
1st Run of the code above:
tf.Tensor(
[[[ 0.32932317 -1.4980435 -0.1100653 1.2787857 ]
[-1.294352 -0.37868017 0.2113227 1.4617096 ]
[ 0.7667096 -0.8656752 -1.1033096 1.2022753 ]]], shape=(1, 3, 4), dtype=float32)
2nd Run of the code above:
tf.Tensor(
[[[ 0.9128431 -1.6912671 0.43020165 0.3482222 ]
[-1.5253356 0.32237953 -0.04885508 1.2518111 ]
[ 1.328844 -1.4135731 -0.29851308 0.3832423 ]]], shape=(1, 3, 4), dtype=float32)
3rd Run of the code above:
tf.Tensor(
[[[-0.52021545 -1.3031464 1.3411331 0.48222876]
[-1.5430954 -0.09155926 1.1877482 0.4469065 ]
[ 0.9904138 -1.375323 -0.5333159 0.9182252 ]]], shape=(1, 3, 4), dtype=float32)
Why is that?
Thanks
Sheila
Hi Deepti,
Correct me if I am wrong, but besides making the corrections suggested, the scaled_dot_product_attention() function only illustrates how the EncoderLayer() works when computing the multi-head attention outputs, and it is not actually used in the EncoderLayer(). Will the correction affect the output of the EncoderLayer()? Please advise.
Sheila
You are correct, that specific “scaled_dot_product_attention()” function isn’t used in the Transformer encoder or decoder.
There is a multi-head-attention method in the Encoder and Decoder, but there we use a pre-written TensorFlow Keras layer.
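For reference, a minimal sketch of using that pre-built layer (the names embed_dim, num_heads and x are assumptions for illustration, not your notebook's variables):

import tensorflow as tf

embed_dim, num_heads = 4, 2
mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)

x = tf.random.uniform((1, 3, embed_dim))      # (batch, seq_len, features)
attn_output = mha(x, x, x, training=False)    # self-attention: query = value = key = x
print(attn_output.shape)                      # (1, 3, 4)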
Are you still getting the assertion error?
Can you DM me the code showing how you corrected point 1?
Thanks @TMosh for confirming.
I still want to ask: why do the values change after every run of the EncoderLayer() function, with the results below?
1st Run of the code above:
tf.Tensor(
[[[ 0.32932317 -1.4980435 -0.1100653 1.2787857 ]
[-1.294352 -0.37868017 0.2113227 1.4617096 ]
[ 0.7667096 -0.8656752 -1.1033096 1.2022753 ]]], shape=(1, 3, 4), dtype=float32)
2nd Run of the code above:
tf.Tensor(
[[[ 0.9128431 -1.6912671 0.43020165 0.3482222 ]
[-1.5253356 0.32237953 -0.04885508 1.2518111 ]
[ 1.328844 -1.4135731 -0.29851308 0.3832423 ]]], shape=(1, 3, 4), dtype=float32)
3rd Run of the code above:
tf.Tensor(
[[[-0.52021545 -1.3031464 1.3411331 0.48222876]
[-1.5430954 -0.09155926 1.1877482 0.4469065 ]
[ 0.9904138 -1.375323 -0.5333159 0.9182252 ]]], shape=(1, 3, 4), dtype=float32)
Thanks
Sheila
I made the changes you suggested to the positional_encoding() function, as below:
pow = 2 * i / d                          # exponent (2*i/d)
angles = pos / np.power(10000, pow)      # pos / 10000^(2*i/d)
I know the positional_encoding() function is used in Exercise 5 when defining the full Encoder but not used in Exercise 4.
Please let us tackle the EncoderLayer() function in Exercise 4 first. Why are the results changing every time I run the function, when there is a fixed seed indicated in the public_test.py file?
Thanks
Sheila
Avoid hard-coding your path when writing the code. Please check your DM.
Please show us how you are invoking the function to get different outputs. I added print statements to show the results in the EncoderLayer function, added another cell invoking the test function, and I get consistent results every time.
Thanks @paulinpaloalto. I also ran the same code, but directly on the Coursera platform, and I passed it.
However, if I download the same folder and run it on my local computer, I encounter the error results shown below:
Why is that? Is there a difference between the Coursera platform and a local computer? A naive question, perhaps, but I would like to know.
Thanks
Sheila
The difference is probably the versions of the packages you installed on your local computer.
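For example, you could compare the versions in both environments with something like:

import tensorflow as tf
import numpy as np

print("TensorFlow:", tf.__version__)   # compare with the version in the Coursera notebook
print("NumPy:", np.__version__)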
Thanks
Hello @Sheila_Murunga
I was wondering why your code is still showing an error. Did you download all the files required for the assignment to your local computer?
The reason you see a different output every time you run the EncoderLayer() code is that the weights (for example the word embeddings) are initialized randomly, so the output differs on every run because of those random numbers.
As Tom already mentioned, this could also be a Keras layer version difference.
I am sharing a link that explains this in more detail; you can go through it.
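As a small illustration (just a sketch, not the assignment code): creating a fresh layer gives different outputs for the same input each time, because its weights start from new random values.

import tensorflow as tf

x = tf.ones((1, 3, 4))                   # fixed input
for run in range(2):
    layer = tf.keras.layers.Dense(4)     # new random weights on every run
    print("run", run, ":", layer(x).numpy()[0, 0])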
Regards
DP
Thank you so much, @Deepti_Prasad. I really appreciate your help.
Kind regards,
Sheila