I’m going through the C4W2_Assignment notebook. I went through this course before and re-enrolled because I’m interested in the new TensorFlow code.
I’m running the notebook and my code runs. One issue is that my weights differ slightly from what the unit tests expect. They vary only ever so slightly, but they are different.
Then I’m training the model but it’s not really learning. It predicts an SOS followed by an EOS token regardless of how long I train it.
My guesses are:
I might have an issue with my decoder layer implementation (or full decoder implementation).
There might be an issue with the inference function - around masking maybe.
I’ve really gone through it over and over and I can’t find the issue though. What’s the best way of getting a little help here?
Hi!
Let’s start from the point where your code starts failing unit tests and you see a difference between your output and the expected output.
Also, what do you mean by your model “not really learning”? It is possible (although unlikely) to get an SOS token after EOS. That is why the breaking condition in summarize(model, input_document) checks for EOS.
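For anyone following along, here is a rough sketch of what an EOS stopping condition in a greedy decoding loop can look like. It is not the assignment's summarize function: the token ids, the `greedy_decode` name, and the stub "models" are all hypothetical and only illustrate the control flow.

```python
from typing import Callable, List

# Hypothetical token ids, chosen only for this illustration.
SOS_ID = 1
EOS_ID = 2


def greedy_decode(next_token_fn: Callable[[List[int]], int],
                  max_len: int = 50) -> List[int]:
    """Generate tokens one at a time, stopping as soon as EOS is produced."""
    output = [SOS_ID]  # decoding always starts from the SOS token
    for _ in range(max_len):
        token = next_token_fn(output)  # predict the next token from the prefix
        output.append(token)
        if token == EOS_ID:  # breaking condition: stop at EOS
            break
    return output


def scripted_next_token(prefix: List[int]) -> int:
    """Stub standing in for a model: emits a fixed script, then EOS."""
    script = [7, 8, 9, EOS_ID]
    return script[len(prefix) - 1]


def always_eos(prefix: List[int]) -> int:
    """Stub that reproduces the symptom from this thread: EOS right after SOS."""
    return EOS_ID


print(greedy_decode(scripted_next_token))  # [1, 7, 8, 9, 2]
print(greedy_decode(always_eos))           # [1, 2]
```

The second call shows that the loop itself is behaving correctly when the output is just SOS followed by EOS. In that situation the thing to inspect is whatever produces the next token, not the stopping condition.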
The cell that checks the weights outputs weights slightly different from the ones the unit test expects (~0.001 difference in each weight), but I think that is significant in this case. Also, the cell that checks the ‘next_word’ function outputs a token and a word that are different from the expected Predicted token: [[14859]] Predicted word: masses
What I mean by the model not learning is that during training the model predicts SOS followed immediately by EOS, and that doesn’t change across all the epochs I train. It means there’s something wrong with my implementation, of course.
A slight variance in weights is acceptable. So an output like Predicted token: [[8410]] Predicted word: valentin will be fine too, since the model is not trained at this point yet.
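If it helps, here is a small sketch of how you could check whether a difference of about 0.001 falls inside a given tolerance. The weight values and tolerances below are made up for illustration; I am not quoting what the notebook's unit test actually uses.

```python
import numpy as np

# Made-up "expected" weights and "actual" weights that are off by about 0.001.
expected = np.array([0.1234, -0.5678, 0.9012])
actual = expected + 0.001

# Largest element-wise absolute difference between the two arrays.
max_abs_diff = float(np.max(np.abs(actual - expected)))
print(f"max abs diff: {max_abs_diff:.6f}")

# np.allclose checks |actual - expected| <= atol + rtol * |expected| element-wise.
within_loose = bool(np.allclose(actual, expected, atol=1e-2))
within_strict = bool(np.allclose(actual, expected, rtol=0.0, atol=1e-5))

print("within loose tolerance (atol=1e-2):", within_loose)
print("within strict tolerance (atol=1e-5):", within_strict)
```

Whether a 0.001 difference counts as a pass depends entirely on the tolerance chosen, so comparing with an explicit `atol` is more informative than eyeballing the printed weights.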
Please share your output where it gives SOS after EOS during training. We will try to catch the error on this thread as much as possible first, so that others can learn too.
Epoch 1, Loss 7.897331
Time taken for one epoch: 247.514554977417 sec
Example summarization on the test set:
True summarization:
[SOS] hannah needs betty’s number but amanda doesn’t have it. she needs to contact larry. [EOS]
Predicted summarization:
[SOS] [EOS]
Epoch 2, Loss 6.726731
Time taken for one epoch: 237.40499997138977 sec
Example summarization on the test set:
True summarization:
[SOS] hannah needs betty’s number but amanda doesn’t have it. she needs to contact larry. [EOS]
Predicted summarization:
[SOS] [EOS]
Epoch 3, Loss 6.615131
Time taken for one epoch: 236.82289576530457 sec
Example summarization on the test set:
True summarization:
[SOS] hannah needs betty’s number but amanda doesn’t have it. she needs to contact larry. [EOS]
Predicted summarization:
[SOS] [EOS]
What was the output like for the 20th epoch?
Also, something tells me that you might be on an older version of this assignment. Can you please refresh it and try the latest version? You can do that by going to the Help option.