I do not understand what I am supposed to do to implement this function:
NMTAttn(input_vocab_size=33300,
target_vocab_size=33300,
d_model=1024,
n_encoder_layers=2,
n_decoder_layers=2,
n_attention_heads=4,
attention_dropout=0.0,
mode='train')
There is no reference to input or target tokens in the function arguments, and they are not defined anywhere inside the function either. Are we supposed to use the variables assigned in Section 1.5 by the statement below?
input_batch, target_batch, mask_batch = next(train_batch_stream)
If so, that is bad programming. This function is not a class method, so it should not rely on variables defined outside it. It is unnecessarily time-consuming to scroll up and review every line of code in the assignment to find out what should be passed to the function.
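One other reading I cannot rule out: the function may only build and return a model object from its hyperparameters, with the token batches fed to that object afterwards. Below is a minimal sketch of that idea in plain Python, not the assignment's library. Every name in it (build_toy_model, toy_model, the toy batches) is made up for illustration and does not come from the assignment.

```python
# Hypothetical sketch: a builder function that takes only hyperparameters
# and returns a callable "model". No data is needed at build time.

def build_toy_model(vocab_size, scale):
    """Return a callable model; the inputs are supplied later by the caller."""
    def toy_model(input_tokens, target_tokens):
        # Stand-in computation: clip each token id into the vocabulary range
        # and scale it. A real model would embed, attend, and so on.
        encoded = [min(t, vocab_size - 1) * scale for t in input_tokens]
        return encoded, target_tokens
    return toy_model


# Build once, with hyperparameters only.
model = build_toy_model(vocab_size=10, scale=2)

# Feed data afterwards, as explicit arguments rather than outer variables.
input_batch, target_batch = [1, 5, 12], [3, 4, 0]
outputs, targets = model(input_batch, target_batch)
```

If that reading is right, the function itself would not depend on any variable defined outside it; the batches would only matter when the returned model is run.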
Then I see a statement like this:
Step 4: prepare queries, keys, values and mask for attention.
None('PrepareAttentionInput', None, n_out=4),
At first, I thought we would be calling prepare_attention_input(encoder_activations, decoder_activations, inputs) here. But 'PrepareAttentionInput' is not what I would pass in as the argument for encoder_activations, so I am not sure what to do here.
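My best guess at the pattern, which I have not verified against the assignment: each None is a blank to fill in. The first would be a layer constructor that wraps a plain function under a name, and the second would be the function itself rather than its arguments. If the assignment uses Trax, its tl.Fn(name, f, n_out=...) layer has this shape. The sketch below imitates the idea in plain Python. NamedFn and toy_prepare are hypothetical stand-ins, not the library's class or the assignment's function, and the padding id of 0 is an assumption.

```python
class NamedFn:
    """Hypothetical stand-in for a layer that wraps a plain function."""

    def __init__(self, name, f, n_out=1):
        self.name = name    # label only; it is never passed to f
        self.f = f          # the function object, not a call to it
        self.n_out = n_out  # how many outputs f is expected to return

    def __call__(self, *inputs):
        outputs = self.f(*inputs)
        if self.n_out == 1:
            outputs = (outputs,)
        if len(outputs) != self.n_out:
            raise ValueError(
                f"{self.name}: expected {self.n_out} outputs, got {len(outputs)}"
            )
        return outputs


def toy_prepare(encoder_activations, decoder_activations, inputs):
    """Toy function returning four values: queries, keys, values, mask."""
    queries = decoder_activations
    keys = encoder_activations
    values = encoder_activations
    # Assumption: token id 0 marks padding, so it is masked out.
    mask = [token != 0 for token in inputs]
    return queries, keys, values, mask


# The string is the layer's name; the function is passed uncalled.
layer = NamedFn('PrepareAttentionInput', toy_prepare, n_out=4)

# The activations and inputs arrive only when the layer is run.
queries, keys, values, mask = layer([0.1, 0.2, 0.3], [0.5, 0.6], [7, 9, 0])
```

Under this reading, the string is only a label for the layer, and encoder_activations, decoder_activations and inputs are supplied by whatever precedes the layer when the model runs.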
I would appreciate it if someone could answer my questions and point me to where I can look up the code behind calls like None('PrepareAttentionInput', None, n_out=4).