C5 W4 A1 DecoderLayer Ex6


Please look at the architecture of the decoder layer.

You have manually changed the comment that explains the additional dropout layer in your implementation. In this decoder layer, dropout needs to be applied explicitly only once, i.e. to the output of the feed-forward network; in the attention blocks, dropout is already applied during training, so no extra line is needed there (see the sketch after the comparison below).

From starter code:

# BLOCK 1
# calculate self-attention and return attention scores as attn_weights_block1.
# Dropout will be applied during training (~1 line).

Yours:

# BLOCK 1
# calculate self-attention and return attention scores as attn_weights_block1 (~1 line)
# LINE OF CODE
# apply dropout layer on the attention output (~1 line)
# LINE OF CODE APPLYING DROPOUT
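To make the structure concrete, here is a minimal, illustrative sketch of a decoder layer written against tf.keras.layers.MultiHeadAttention. It is not the assignment's solution code, and the class name, layer names (mha1, mha2, ffn, dropout_ffn, layernorm1-3) and default hyperparameters are assumptions for illustration only. The point it demonstrates is the one above: the attention layers are constructed with a dropout rate and apply dropout internally when called with training=True, while the single explicit dropout layer sits only after the feed-forward network.

import tensorflow as tf

class DecoderLayerSketch(tf.keras.layers.Layer):
    # Illustrative sketch only; names and defaults are assumptions,
    # not the assignment's solution code.

    def __init__(self, embedding_dim=128, num_heads=8,
                 fully_connected_dim=512, dropout_rate=0.1):
        super().__init__()
        # Both attention layers receive a dropout rate, so they apply
        # dropout internally whenever they are called with training=True.
        self.mha1 = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        self.mha2 = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        # The only explicit dropout layer: applied to the feed-forward output.
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, enc_output, training=False,
             look_ahead_mask=None, padding_mask=None):
        # BLOCK 1: self-attention. No extra dropout line is needed here;
        # mha1 already applies dropout during training.
        attn1, attn_weights_block1 = self.mha1(
            x, x, x, attention_mask=look_ahead_mask,
            return_attention_scores=True, training=training)
        out1 = self.layernorm1(attn1 + x)

        # BLOCK 2: cross-attention over the encoder output; dropout is again
        # handled inside mha2 during training.
        attn2, attn_weights_block2 = self.mha2(
            out1, enc_output, enc_output, attention_mask=padding_mask,
            return_attention_scores=True, training=training)
        out2 = self.layernorm2(attn2 + out1)

        # BLOCK 3: feed-forward network, then the single explicit dropout,
        # then the residual connection and layer normalization.
        ffn_output = self.ffn(out2)
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        out3 = self.layernorm3(ffn_output + out2)

        return out3, attn_weights_block1, attn_weights_block2

As a quick sanity check, calling this sketch with random tensors of shape (batch, target_seq_len, embedding_dim) for x and (batch, input_seq_len, embedding_dim) for enc_output should return an output of shape (batch, target_seq_len, embedding_dim) together with the two attention-weight tensors.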

Please follow these steps to refresh your workspace if required, and change code only at the places where it is required. Also see the section Important Note on Submission to the AutoGrader in the notebook.