From the week 1 assignment:
Implementing this using just a Recurrent Neural Network (RNN) with LSTMs can work for short- to medium-length sentences, but can result in vanishing gradients for very long sequences.
I would like to clarify: when I am planning the NN, at what point should I choose attention models? From what sequence length onward?
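For intuition on the vanishing-gradient point above, here is a minimal sketch. It assumes PyTorch (the course material quoted here doesn't specify a framework), and it uses a plain `nn.RNN` because the effect is easiest to see there; all sizes are illustrative:

```python
# Sketch: gradients w.r.t. the earliest input typically shrink as the
# sequence gets longer in a plain RNN. Illustrative only, not course code.
import torch
import torch.nn as nn

for seq_len in (10, 50, 200):
    rnn = nn.RNN(input_size=8, hidden_size=8, batch_first=True)
    x = torch.randn(1, seq_len, 8, requires_grad=True)
    out, _ = rnn(x)
    # Loss depends only on the final timestep, as in a seq2seq encoder.
    loss = out[:, -1, :].sum()
    loss.backward()
    # Gradient flowing back to the very first input token; this norm
    # usually collapses toward zero as seq_len grows.
    print(seq_len, x.grad[0, 0].norm().item())
```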
Oh, I've found the answer in the next reading:
You can see how this will be an issue for very long sentences (e.g. 100 tokens or more) because the context of the first parts of the input will have very little effect on the final vector passed to the decoder.
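To make the "final vector" part of that quote concrete, here is a minimal sketch of the bottleneck, again assuming PyTorch with illustrative dimensions:

```python
# Sketch: in a plain encoder-decoder, the whole input, however long, is
# squeezed into one fixed-size vector before reaching the decoder.
import torch
import torch.nn as nn

embed_dim, hidden_dim, seq_len = 32, 64, 100
encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

tokens = torch.randn(1, seq_len, embed_dim)   # 100 embedded input tokens
outputs, (h_n, c_n) = encoder(tokens)

# Only the final hidden state is passed on; information about the first
# tokens must survive 100 update steps to still be represented in it.
print(outputs.shape)  # torch.Size([1, 100, 64]) -- per-token states, unused here
print(h_n.shape)      # torch.Size([1, 1, 64])   -- the single context vector
```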
Really? 100 tokens doesn't seem like a very large amount of incoming data. Do I understand correctly that other incoming data (objects from a database, for example) is included in these 100 tokens?
It says a reasonable limit for one input is 100 tokens!
So I can't pass additional objects from a database to the NN if the input exceeds 100 tokens?
Yeah, in that case it's better to use another type of model, maybe Transformers!
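For completeness, here is a sketch of why attention avoids the fixed-vector bottleneck: the decoder can query every encoder output directly. This is a hedged illustration using PyTorch's `nn.MultiheadAttention`, not the course's own implementation:

```python
# Sketch: with attention, the context is a weighted mix over ALL input
# positions, so early tokens contribute regardless of sequence length.
import torch
import torch.nn as nn

hidden_dim, seq_len = 64, 300   # well past the ~100-token RNN comfort zone
attn = nn.MultiheadAttention(embed_dim=hidden_dim, num_heads=4, batch_first=True)

encoder_outputs = torch.randn(1, seq_len, hidden_dim)  # one state per input token
decoder_state = torch.randn(1, 1, hidden_dim)          # current decoder query

context, weights = attn(decoder_state, encoder_outputs, encoder_outputs)
print(context.shape)  # torch.Size([1, 1, 64])
print(weights.shape)  # torch.Size([1, 1, 300]) -- one weight per input token
```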