# calculate self-attention using mha (~1 line). Dropout will be applied during training
self_attn_output = self.mha(x, x, x, mask) # Self attention (batch_size, input_seq_len, embedding_dim)
# apply dropout layer to the self-attention output (~1 line)
self_attn_output = self.dropout_ffn(self_attn_output, training=training)
# apply layer normalization on sum of the input and the attention output to get the
# output of the multi-head attention layer (~1 line)
mult_attn_out = self.layernorm1(x + self_attn_output) # (batch_size, input_seq_len, embedding_dim)
# pass the output of the multi-head attention layer through a ffn (~1 line)
ffn_output = self.ffn(mult_attn_out) # (batch_size, input_seq_len, embedding_dim)
# apply dropout layer to ffn output (~1 line)
ffn_output = self.dropout_ffn(ffn_output, training=training)
# apply layer normalization on sum of the output from multi-head attention and ffn output to get the
# output of the encoder layer (~1 line)
encoder_layer_out = self.layernorm2(ffn_output + mult_attn_out) # (batch_size, input_seq_len, embedding_dim)
# END CODE HERE
In this code I haven't used `dropout1`, but the output message is: "Cell #16. Can't compile the student's code. Error: AttributeError("'EncoderLayer' object has no attribute 'dropout1'")"
What should I do to fix this?
Where did you get your notebook from? The course notebook was updated on Dec. 3, 2021. Your code for the second step seems to follow the wrong (older) instructions. It should say:
# apply layer normalization on sum of the input and the attention output
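For comparison, here is a minimal sketch of what the encoder layer looks like when written against the updated instructions. The layer names (`mha`, `ffn`, `layernorm1`, `layernorm2`, `dropout_ffn`) and constructor arguments are my assumptions based on the updated template, so check them against your own notebook; the key point is that dropout is applied explicitly only once, to the FFN output, and no `dropout1` attribute exists at all:

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    # Sketch only: layer names and hyperparameters are assumed to match
    # the updated (Dec. 3, 2021) template -- verify against your notebook.
    def __init__(self, embedding_dim, num_heads, fully_connected_dim,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training, mask):
        # self-attention; MultiHeadAttention applies its own dropout when training
        self_mha_output = self.mha(x, x, x, mask)  # (batch_size, input_seq_len, embedding_dim)
        # layer normalization on the sum of the *input* and the attention output
        # (no separate dropout step here in the updated instructions)
        skip_x_attention = self.layernorm1(x + self_mha_output)
        # pass the normalized output through the feed-forward network
        ffn_output = self.ffn(skip_x_attention)  # (batch_size, input_seq_len, embedding_dim)
        # the only explicit dropout in this layer is applied to the ffn output
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        # layer normalization on the sum of the attention block output and the ffn output
        encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)
        return encoder_layer_out
```

If your `__init__` (or the test cell) still references `self.dropout1`, you are working from a mix of the old and new notebook versions; the cleanest fix is to get a fresh copy of the updated notebook and move your solutions into it.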