Hi @Mohammad_Atif_Khan
What you are missing is that LSTM does not “reduce the 2nd dimension”.
In your example LSTM would return [10, 20, 128], then Mean would reduce the second (sequence) dimension, giving [10, 128], and finally Normalize would act on the last dimension, returning [10, 128] unchanged in shape.
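A quick way to see the Mean and Normalize steps is with plain numpy (the random array below is just a stand-in for the LSTM activations, not real model output):

```python
import numpy as np

# Toy activations standing in for the LSTM output: (batch, seq_len, d_model)
lstm_out = np.random.rand(10, 20, 128)

# Mean reduces the second (sequence) dimension: (10, 20, 128) -> (10, 128)
mean_out = lstm_out.mean(axis=1)

# Normalize acts on the last dimension; the shape stays (10, 128),
# but each 128-dim vector now has unit L2 norm
norm_out = mean_out / np.linalg.norm(mean_out, axis=-1, keepdims=True)

print(mean_out.shape)  # (10, 128)
print(norm_out.shape)  # (10, 128)
```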
Also, I’m not sure whether the inputs to the model are clear to you, so just in case I’ll elaborate:
let’s say we input [2, 10, 20] for [pair, batch_size, max_len].
In your case ([2, 10, 20]) that is a batch of 10 sentence pairs, where each sentence is padded/truncated to length 20.
For example, with a single pair (when comparing just a pair of sentences, not a batch of pairs, so the batch size is 1):
Input:
q1 - [‘When’, ‘will’, ‘I’, ‘see’, ‘you’, ‘?’]
q2 - [‘When’, ‘can’, ‘I’, ‘see’, ‘you’, ‘again’, ‘?’]
encoding:
Q1 - [585, 76, 4, 46, 53, 21]
Q2 - [585, 33, 4, 46, 53, 7280, 21]
data_generator - padding and batch dimension:
Q1 - array([[585, 76, 4, 46, 53, 21, 0, 0]])
Q2 - array([[ 585, 33, 4, 46, 53, 7280, 21, 0]])
in this case the input to the model would be a tuple:
(array([[585, 76, 4, 46, 53, 21, 0, 0]]), array([[ 585, 33, 4, 46, 53, 7280, 21, 0]]))
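The padding/batching step above can be sketched like this (`pad_batch` is a hypothetical helper just for illustration; the actual data_generator in the assignment also handles shuffling, bucketing by length, etc.):

```python
import numpy as np

def pad_batch(questions, pad_len, pad_id=0):
    # Hypothetical helper: right-pad each encoded question with pad_id
    # up to pad_len, then stack into a (batch, pad_len) array.
    return np.array([q + [pad_id] * (pad_len - len(q)) for q in questions])

Q1 = pad_batch([[585, 76, 4, 46, 53, 21]], pad_len=8)
Q2 = pad_batch([[585, 33, 4, 46, 53, 7280, 21]], pad_len=8)

model_input = (Q1, Q2)  # the tuple fed to the model
print(Q1.shape, Q2.shape)  # (1, 8) (1, 8)
```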
- Parallel would split the tuple
For each strand:
- Embedding - inputs.shape - (1, 8) → outputs.shape - (1, 8, 128)
- LSTM - inputs.shape - (1, 8, 128) → outputs.shape - (1, 8, 128)
- Mean - inputs.shape - (1, 8, 128) → outputs.shape - (1, 128)
- Normalize - inputs.shape - (1, 128) → outputs.shape - (1, 128)