In the assignment notebook, the shape of the attention output is given as (batch_size, input_seq_length, fully_connected_dim), but the TensorFlow documentation describes the attention output shape as quoted below. What is E here?
"The result of the computation, of shape (B, T, E), where T is for target sequence shapes and E is the query input last dimension if output_shape is None. Otherwise, the multi-head outputs are projected to the shape specified by output_shape."
What does "query input" mean here? Also, per the lecture and Exercise 3 (scaled dot-product attention), the last dimension of the attention output should be value_depth. Can you please clarify why the documentation says E (the query input's last dimension) instead of value_depth?
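For context on the question: a small sketch (not from the assignment, just an illustration) showing that tf.keras.layers.MultiHeadAttention returns an output whose last dimension matches the query's last dimension, even when the value tensor has a different last dimension. The specific sizes (16 for the query, 24 for the value) are arbitrary choices; the layer applies a final output projection back to the query dimension when output_shape is None, which is why E appears instead of value_depth.

```python
import tensorflow as tf

batch, t_q, t_v = 2, 4, 6
e_query, e_value = 16, 24  # query and value last dims chosen to differ

# key_dim / value_dim are the per-head projection sizes (arbitrary here)
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8, value_dim=8)

q = tf.random.normal((batch, t_q, e_query))
v = tf.random.normal((batch, t_v, e_value))

out = mha(query=q, value=v)
# Last dim is E = query's last dim (16), not the value's last dim (24):
print(out.shape)  # (2, 4, 16)
```

Inside the layer, the per-head scaled dot-product attention does produce outputs with last dimension value_dim, as in Exercise 3; those head outputs are then concatenated and passed through a learned dense projection whose output size defaults to the query's last dimension, giving shape (B, T, E).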