Week 4 Assignment Transformer Architecture: Linear Layer before Softmax

The assignment doesn't specify the number of units in the linear layer before the final softmax layer, and that choice affects the output shape. I'd be grateful if anyone could point out whether I've missed something.

Hey @luchungi,

Are you using the latest version of the assignment?

I’ve just checked it, and for the final dense layer in the transformer decoder the number of units is specified to be equal to the target vocabulary size:

import tensorflow as tf
from tensorflow.keras.layers import Dense

class Transformer(tf.keras.Model):
  def __init__(...):
    ...
    # projects the decoder output to the target vocabulary and applies softmax
    self.final_layer = Dense(target_vocab_size, activation='softmax')

Hi @manifest,

My mistake: I misread the instructions, which mention a linear layer followed by a Softmax layer, as meaning a Dense layer placed before a separate final Softmax layer. I see now that it means a single Dense layer with a Softmax activation.
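
For anyone who trips over the same wording, here's a minimal sketch of the two readings (TensorFlow/Keras assumed; the tensor sizes are just illustrative, not from the assignment). Architecturally they describe the same thing: a linear projection to the target vocabulary size with a softmax over the vocabulary axis.

import tensorflow as tf
from tensorflow.keras.layers import Dense, Softmax

d_model, target_vocab_size = 512, 8000                # illustrative sizes
decoder_output = tf.random.normal((1, 10, d_model))   # (batch, seq_len, d_model)

# Reading used by the assignment: one Dense layer with a softmax activation
probs_a = Dense(target_vocab_size, activation='softmax')(decoder_output)

# Equivalent reading: a linear Dense layer followed by a separate Softmax layer
probs_b = Softmax(axis=-1)(Dense(target_vocab_size)(decoder_output))

print(probs_a.shape, probs_b.shape)  # both (1, 10, 8000)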