Hello! I suppose that “logits” are somehow related to the “softmax layer”, but I need more information.
What do you mean?
In this exercise you’re asked to use the tf.nn.log_softmax activation for the last layer to get “log probabilities”. In other words, NOT the softmax activation, which results in values in [0.0…1.0] that imitate “probabilities”.
Is this the relationship you’re asking about?
Regards
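To make the difference concrete, here is a minimal plain-Python sketch of what a softmax and a log-softmax compute over the last layer's raw scores (the logits). The function names and the example values are illustrative only and are not taken from the assignment.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # log(softmax(x)) computed as x - logsumexp(x).
    m = max(logits)
    log_total = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_total for v in logits]

logits = [2.0, 1.0, 0.1]          # illustrative raw scores
probs = softmax(logits)           # values in [0, 1] that sum to 1
log_probs = log_softmax(logits)   # values <= 0; exp(log_probs) equals probs
```

So the "logits" are the raw scores, softmax turns them into probabilities, and log-softmax turns them into log probabilities.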
Thank you for the reply!
In “def __init__” … activation=tf.nn.log_softmax(vocab_size-?) (the library takes logits as the argument)
In “call” … # Compute the logits
logits = context - self.output_layer(context -?)
I do not understand how to connect this part…
It’s just activation=tf.nn.log_softmax (no need to call it with any parameters).
To get logits (a more accurate variable name would have been log_probs) you just call self.output_layer(x). There is no need for context here: the only place it is used in the Decoder is in cross attention.
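As a rough sketch of that advice, the function object itself is passed as the activation and the layer is then called on the tensor coming out of the previous step. The sizes and the random input below are assumptions made up for illustration, not values from the assignment.

```python
import tensorflow as tf

vocab_size = 12  # hypothetical vocabulary size
units = 8        # hypothetical hidden size

# Pass the function itself, without parentheses or arguments.
output_layer = tf.keras.layers.Dense(units=vocab_size, activation=tf.nn.log_softmax)

# Stand-in for the output of the previous decoder step: (batch, seq_len, units).
x = tf.random.normal((2, 5, units))

log_probs = output_layer(x)  # shape (batch, seq_len, vocab_size)

# Exponentiating log probabilities gives probabilities that sum to 1 per position.
probs_sum = tf.reduce_sum(tf.exp(log_probs), axis=-1)
```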
With these parameters I got an error… Where might the problem be?
AttributeError: Exception encountered when calling layer ‘decoder_15’ (type Decoder).
‘Decoder’ object has no attribute ‘LSTM’
Call arguments received by layer ‘decoder_15’ (type Decoder):
• context=tf.Tensor(shape=(64, 14, 256), dtype=float32)
• target=tf.Tensor(shape=(64, 15), dtype=int64)
• state=None
• return_state=False
I guess somewhere in the code you used self.LSTM(..) or similar? The decoder does not have this attribute. According to the skeleton code you were provided, it should only have these class attributes: embedding, pre_attention_rnn, attention (where you should use the context), post_attention_rnn (this is your LSTM) and output_layer.
And you only need these in the call(..)
Thank you!
And what is expected for the “initial_state” parameter (vector? bool?) in “self.pre_attention_rnn(x, initial_state=”?
It defaults to None and you don’t need it in your case. In other words, you only need x for “pre_attention_rnn”.
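For illustration, here is a small sketch of how a Keras LSTM behaves with and without initial_state. The layer sizes and the random input are assumptions, and the layer here is a standalone stand-in, not the assignment's pre_attention_rnn. Leaving initial_state as None starts the LSTM from a zero state; passing a list of two tensors [hidden_state, cell_state] continues from a previous state.

```python
import tensorflow as tf

units = 8  # hypothetical hidden size

lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
x = tf.random.normal((2, 5, 4))  # (batch, seq_len, features), made-up input

# Default: initial_state is None, so the LSTM starts from zeros.
out_default, h, c = lstm(x)

# Passing None explicitly is equivalent to omitting the argument.
out_none, _, _ = lstm(x, initial_state=None)

# Continuing from a previous state: pass [hidden_state, cell_state].
out_continued, h2, c2 = lstm(x, initial_state=[h, c])
```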
Thank you!
After all the corrections a ValueError remains… Where are ‘decoder_25’ and “lstm_51”? Inside the library?
"—> 65 x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=None)
66
67 # Perform cross attention between the context and the output of the LSTM (in that order)
ValueError: Exception encountered when calling layer ‘decoder_25’ (type Decoder).
Input 0 of layer “lstm_51” is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (64, 14, 256, 256)
Call arguments received by layer ‘decoder_25’ (type Decoder):
• context=tf.Tensor(shape=(64, 14, 256), dtype=float32)
• target=tf.Tensor(shape=(64, 15), dtype=int64)
• state=None
• return_state=False"
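You are probably embedding the context and not the target (which is what you should embed in the decoder’s case) in the first line of code (“# Get the embedding of the input”). Embedding adds a trailing dimension, which would turn the 3D context of shape (64, 14, 256) into the 4D shape (64, 14, 256, 256) reported in the error.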
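The shape arithmetic behind that diagnosis can be checked with a small sketch. All sizes below are made up and much smaller than the ones in the traceback; only the number of dimensions matters.

```python
import tensorflow as tf

vocab_size, units = 10, 4  # hypothetical sizes

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=units)
lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)

target = tf.zeros((2, 5), dtype=tf.int64)  # token ids: (batch, target_len)
context = tf.zeros((2, 3, units))          # encoder output: (batch, source_len, units)

# Embedding appends one dimension of size `units` to whatever it receives.
embedded_target = embedding(target)    # (2, 5, 4): 3D, what an LSTM expects
embedded_context = embedding(context)  # (2, 3, 4, 4): 4D, rejected by an LSTM

# The 3D embedded target goes through the LSTM without complaint.
x, hidden_state, cell_state = lstm(embedded_target)
```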
Great help! Thank you
Hi! Sorry to hop in here after so many months, but if we can leave initial_state = None, then why do the directions say “# - Pass in the state to the LSTM (needed for inference)”? What is that “state” referring to?
Never mind, I think I get it!