Attention Output shape

Hi Sir,

In the assignment notebook, the shape of the attention output is given as (batch_size, input_seq_length, fully_connected_dim), but the TensorFlow documentation describes the attention output shape as below. What is E here?

The result of the computation, of shape (B, T, E), where T is for target sequence shapes and E is the query input last dimension if output_shape is None. Otherwise, the multi-head outputs are projected to the shape specified by output_shape.

How are fully_connected_dim and E related?
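
For reference, here is a minimal snippet that reproduces the shape I am asking about (the layer sizes are illustrative values, not the notebook’s actual hyperparameters):

```python
import tensorflow as tf

# Illustrative values only, not the notebook's actual hyperparameters.
batch_size, input_seq_length, embedding_dim = 2, 10, 512

mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)
x = tf.random.uniform((batch_size, input_seq_length, embedding_dim))

out = mha(query=x, value=x, key=x)  # self-attention
print(out.shape)  # (2, 10, 512) -- matches (B, T, E) from the docs
```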

I don’t see batch_size, input_seq_length, or fully_connected_dim used in the notebook.

Can you provide screen capture images showing where those values are used?

It will also help if you mention which week and assignment you’re working on.

The assignment is Course 5 Week 4, Transformer Architecture. In the link below, the shape of the attention output is mentioned at the bottom.

We don’t know what E means, because usually the last dimension should be the value depth.

Thanks for specifying which assignment you are asking about. I was looking in the Week 3 Assignment 1 notebook.

Since the assignment does not set the output_shape argument, it defaults to None. So ‘E’ should be the “query input last dimension”.
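
If it helps, here is a quick sketch of that behavior (the sizes are illustrative): with output_shape left at its default, the output’s last dimension tracks the query’s last dimension even when value_dim is different, and setting output_shape overrides it.

```python
import tensorflow as tf

x = tf.random.uniform((2, 10, 512))  # (batch, seq_len, query last dim)

# Default output_shape=None: E equals the query's last dimension (512),
# even though value_dim is 32 internally.
mha_default = tf.keras.layers.MultiHeadAttention(
    num_heads=8, key_dim=64, value_dim=32)
print(mha_default(query=x, value=x).shape)  # (2, 10, 512)

# With output_shape set, the heads are projected to that size instead.
mha_custom = tf.keras.layers.MultiHeadAttention(
    num_heads=8, key_dim=64, output_shape=256)
print(mha_custom(query=x, value=x).shape)  # (2, 10, 256)
```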

What does “query input” mean? Also, the last dimension of the attention output shape should be value_depth, as per the lecture, the concepts, and Exercise 3 (scaled dot-product attention). Can you please clarify why it is E instead of the value_depth dimension?

I do not know. You would have to ask the authors of the Keras documentation why they used ‘E’ for this.
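
That said, the shape difference itself can be seen by comparing plain scaled dot-product attention with the Keras layer. Here is a rough sketch along the lines of Exercise 3 (the function and dimensions are illustrative, not the graded implementation):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    # Plain scaled dot-product attention: the output's last
    # dimension is the value depth (depth_v), as in the lecture.
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    weights = tf.nn.softmax(matmul_qk / tf.math.sqrt(dk), axis=-1)
    return tf.matmul(weights, v)                   # (..., seq_len_q, depth_v)

q = tf.random.uniform((2, 10, 64))  # depth_q = depth_k = 64
k = tf.random.uniform((2, 12, 64))
v = tf.random.uniform((2, 12, 32))  # depth_v = 32

print(scaled_dot_product_attention(q, k, v).shape)  # (2, 10, 32)
```

MultiHeadAttention then applies one more dense layer (the output projection) on top of the concatenated heads, which maps the result back to the query’s last dimension. That final projection is why the Keras docs report the output as (B, T, E) rather than ending at value_depth.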

Also, I think you will be interested in the discussion at this thread: