In the assignment notebook, the shape of the attention output is given as (batch_size, input_seq_length, fully_connected_dim), but the TensorFlow documentation describes the attention output shape as quoted below. What is E here?
"The result of the computation, of shape (B, T, E), where T is for target sequence shapes and E is the query input last dimension if output_shape is None. Otherwise, the multi-head outputs are projected to the shape specified by output_shape."
What does "query input" mean here? Also, per the lecture and Exercise 3 (scaled dot-product attention), the last dimension of the attention output should be value_depth. Can you please clarify why the documentation says E (the query input's last dimension) instead of value_depth?
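For context on the question: a small sketch (not from the assignment, just an illustration) showing that tf.keras.layers.MultiHeadAttention returns an output whose last dimension matches the query's last dimension, even when the value tensor has a different last dimension. The specific sizes (16 for the query, 24 for the value) are arbitrary choices; the layer applies a final output projection back to the query dimension when output_shape is None, which is why E appears instead of value_depth.

```python
import tensorflow as tf

batch, t_q, t_v = 2, 4, 6
e_query, e_value = 16, 24  # query and value last dims chosen to differ

# key_dim / value_dim are the per-head projection sizes (arbitrary here)
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8, value_dim=8)

q = tf.random.normal((batch, t_q, e_query))
v = tf.random.normal((batch, t_v, e_value))

out = mha(query=q, value=v)
# Last dim is E = query's last dim (16), not the value's last dim (24):
print(out.shape)  # (2, 4, 16)
```

Inside the layer, the per-head scaled dot-product attention does produce outputs with last dimension value_dim, as in Exercise 3; those head outputs are then concatenated and passed through a learned dense projection whose output size defaults to the query's last dimension, giving shape (B, T, E).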