Week 4 Assignment Transformer Architecture: Linear Layer before Softmax

The assignment doesn't specify the number of units in the linear layer before the final softmax layer, and that choice affects the output shape. I'd be grateful if anyone could point out whether I've missed something.

Hey @luchungi,

Are you using the latest version of the assignment?

I’ve just checked it, and for the final dense layer in the transformer decoder the number of units is specified to be equal to the target vocabulary size:

import tensorflow as tf
from tensorflow.keras.layers import Dense

class Transformer(tf.keras.Model):
  def __init__(...):
    ...
    # projects the decoder output to the target vocabulary and applies softmax
    self.final_layer = Dense(target_vocab_size, activation='softmax')

Hi @manifest,

My mistake: I misread the instructions, which mention a linear layer followed by a Softmax layer, as meaning a Dense layer placed before a separate final Softmax layer. I see now that it means a single Dense layer with a Softmax activation.
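
For anyone who trips over the same wording, here's a minimal sketch of the two readings (TensorFlow/Keras assumed; the tensor sizes are just illustrative, not from the assignment). Architecturally they describe the same thing: a linear projection to the target vocabulary size with a softmax over the vocabulary axis.

import tensorflow as tf
from tensorflow.keras.layers import Dense, Softmax

d_model, target_vocab_size = 512, 8000                # illustrative sizes
decoder_output = tf.random.normal((1, 10, d_model))   # (batch, seq_len, d_model)

# Reading used by the assignment: one Dense layer with a softmax activation
probs_a = Dense(target_vocab_size, activation='softmax')(decoder_output)

# Equivalent reading: a linear Dense layer followed by a separate Softmax layer
probs_b = Softmax(axis=-1)(Dense(target_vocab_size)(decoder_output))

print(probs_a.shape, probs_b.shape)  # both (1, 10, 8000)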