Hi, I was wondering how we can actually choose which data gets kept by an LSTM or GRU.
I am basing my question on the first week's lectures on GRU and LSTM, mainly the GRU lecture.
For instance, take the sentence: "The cat/cats, which is…, drinks/drink milk."
Due to the vanishing gradient problem, a plain RNN cannot handle this singular/plural agreement.
So in the GRU lecture, Andrew Ng said that we can set the update gate to 1 for "cat", keep it at 0 for the words between the commas, and set it to 1 again for "drinks/drink", so that by the end the memory cell still contains the information about the noun, and the network can learn the agreement itself. He added that the update gate easily ends up close to 0 whenever (Wu[c, x] + bu) is negative, since the sigmoid then returns a value close to 0.
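Just to make sure I understand that last part, here is a tiny numerical sketch of what I think he means. The weights below are made-up values (not anything from the lecture), chosen only to show that a negative pre-activation pushes the gate toward 0:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy scalar GRU update gate: gamma_u = sigmoid(w_c * c_prev + w_x * x + b_u).
# In the real GRU these weights are learned; here they are hand-picked
# to make the pre-activation negative.
def update_gate(c_prev, x, w_c=-2.0, w_x=-2.0, b_u=-1.0):
    z = w_c * c_prev + w_x * x + b_u   # this is (Wu[c, x] + bu) in scalar form
    return sigmoid(z)

gamma = update_gate(c_prev=1.0, x=1.0)  # pre-activation = -5, so gamma is near 0
keep_old = 1.0 - gamma                  # c<t> = gamma * c~<t> + (1 - gamma) * c<t-1>
print(gamma, keep_old)                  # gate near 0 means the old memory is kept
```

So if the gate comes out near 0, the memory cell carries "cat" forward almost unchanged. What I don't see is who makes the pre-activation negative in the first place.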
My question is: how do we make it end up negative? Isn't that the algorithm's own calculation, performed automatically? Or do we manually hard-code the update gate to 0 for each input we want to mute? I thought the algorithm learns by itself what to forget or remember, but I'm not sure I understood correctly, and I still wonder how an LSTM or GRU decides what to keep (update) and what to forget.
It might be that I did not correctly understand the fundamental principle of the calculation, so I will re-study the material, but could you kindly explain how these algorithms decide what information to keep or ignore?