C5_W4 UNQ_C4: Incorrect Description of return_attention_scores

In the hints for Exercise 4, “EncoderLayer,” return_attention_scores is defined as:

A boolean to indicate whether the output should be attention output if True, or (attention_output, attention_scores) if False. Defaults to False.

This is the opposite of the definition given in the TF documentation for MHA:

A boolean to indicate whether the output should be (attention_output, attention_scores) if True, or attention_output if False. Defaults to False.

When this argument is used in Exercise 6, “DecoderLayer,” the hint matches the one from Exercise 4:

The first two blocks are fairly similar to the EncoderLayer except you will return attention_scores when computing self-attention

But in the actual code, the call follows the TF definition, returning (attention_output, attention_scores):

```python
mult_attn_out1, attn_weights_block1 = self.mha1(None, None, None, None, return_attention_scores=True)
```
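For anyone who wants to verify which definition is correct, here is a minimal standalone sketch (shapes and layer sizes are arbitrary, not the lab's) showing that `tf.keras.layers.MultiHeadAttention` returns a tuple only when `return_attention_scores=True`, exactly as the TF docs state:

```python
import tensorflow as tf

# A small MHA layer; num_heads/key_dim are arbitrary demo values.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.uniform((1, 3, 8))  # (batch, seq_len, features)

# return_attention_scores=True -> a tuple of (attention_output, attention_scores)
out, scores = mha(query=x, value=x, return_attention_scores=True)
print(out.shape)     # (1, 3, 8)
print(scores.shape)  # (1, 2, 3, 3): (batch, num_heads, query_len, key_len)

# Default (False) -> only the attention output
out_only = mha(query=x, value=x)
print(out_only.shape)  # (1, 3, 8)
```

So the hint's wording has the True/False cases swapped, while the exercise code is consistent with the documented behavior.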

There are a lot of issues in this lab; hopefully there will be some updates soon.