How can an LSTM or GRU decide what to forget or remember?

Hi, I was wondering how we can actually control which information an LSTM or GRU keeps.
I am basing my question on the first week of the course (GRU and LSTM), mainly the lecture about GRU.

For instance, take the sentence: “The cat/cats, which is…, drinks/drink milk.”
Because of the vanishing gradient problem, a plain RNN has trouble handling this singular/plural agreement.
In the GRU lecture,
Andrew Ng said that we can set the update gate to 1 for “cat”, keep it at 0 for the words between the commas, and set it to 1 again for “drinks/drink”, so that the memory cell still holds the information about the noun when the verb arrives and the network learns the grammar itself. He added that the update gate can easily be 0 if (Wu[c,x] + bu) is a large negative number, since the sigmoid then returns a value close to 0.
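For reference, the simplified GRU step described above can be sketched in NumPy roughly as follows. This is only a minimal sketch: the function name `gru_step`, the candidate weights `Wc` and `bc`, the vector sizes, and the numbers are illustrative assumptions, not taken from the lecture. The weights are filled in by hand here purely to show the effect of the sign of (Wu[c,x] + bu); in a real model they come out of training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x, Wu, bu, Wc, bc):
    """One step of a simplified GRU (update gate only, no relevance gate)."""
    concat = np.concatenate([c_prev, x])        # [c, x]
    gamma_u = sigmoid(Wu @ concat + bu)         # update gate, each entry in (0, 1)
    c_tilde = np.tanh(Wc @ concat + bc)         # candidate memory value
    c = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    return c, gamma_u

# Illustrative sizes and values (assumptions): 2 memory units, 3 input features.
rng = np.random.default_rng(0)
c_prev = np.array([0.8, -0.4])
x = rng.normal(size=3)
Wc = rng.normal(size=(2, 5))
bc = np.zeros(2)
Wu = np.zeros((2, 5))

# Strongly negative pre-activation: gate is close to 0, memory is kept.
c_closed, gate_closed = gru_step(c_prev, x, Wu, np.full(2, -10.0), Wc, bc)

# Strongly positive pre-activation: gate is close to 1, memory is overwritten.
c_open, gate_open = gru_step(c_prev, x, Wu, np.full(2, 10.0), Wc, bc)

print("gate closed:", gate_closed, "memory:", c_closed)
print("gate open:  ", gate_open, "memory:", c_open)
```

With the gate near 0 the new memory is almost identical to `c_prev`; with the gate near 1 it is almost identical to the candidate value.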

My question is, how do we make it end up negative? Isn’t that the algorithm’s own calculation, carried out automatically? Or do we manually hardcode the update gate to 0 for each input that we want to mute? I thought the algorithm learns by itself what to forget or remember, but I am unsure whether I understood correctly, and I still wonder how an LSTM or GRU decides what to keep (update) and what to forget.

It might be because I did not correctly understand the fundamental principle of how the algorithm computes things, so I will study it again. But could you kindly explain how these algorithms decide what information to keep or ignore?


Did you find an answer to your question?

I have the exact same question! It would be helpful if anyone could answer this!

Have you found an answer to this question yet? I think many students have the same question when trying to figure out what Andrew meant. He didn’t mean that we manually set the gates to 1 or 0. Please note that every gate has its own weights (e.g., Wu). We train the GRU or LSTM model, and the gates learn to function as gates automatically, knowing when to be close to 1 or 0 (the value can actually be anything between 0 and 1). Hope this helps, though there are a lot of details to be explained and this is only a summary.
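To make "the gates learn on their own" a bit more concrete, here is a toy sketch in NumPy. It is not a full GRU and not how the course assignments do it. The assumptions are: each word is reduced to a hypothetical one-hot "word type" (subject or filler), the candidate memory is just a random number per step, the gate weights `w_u` are the only trainable parameters, and the gradients are numerical for brevity (real frameworks use backpropagation). The task is to still hold the subject's value at the end of the sequence. Nobody sets the gate by hand; gradient descent only sees the loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_seq, n_steps = 500, 5

# Candidate memory value at each step (stand-in for c_tilde; an assumption).
values = rng.uniform(-1.0, 1.0, size=(n_seq, n_steps))

# Hypothetical one-hot word type per step: column 0 = subject, column 1 = filler.
word_type = np.zeros((n_steps, 2))
word_type[0, 0] = 1.0     # first word is the subject ("cat")
word_type[1:, 1] = 1.0    # the rest are filler words

# Goal: at the end of the sentence, the memory should still hold the subject's value.
targets = values[:, 0]

def final_memory(w_u):
    gates = sigmoid(word_type @ w_u)          # one update-gate value per step
    c = np.zeros(n_seq)
    for t in range(n_steps):
        c = gates[t] * values[:, t] + (1.0 - gates[t]) * c
    return c

def loss(w_u):
    return np.mean((final_memory(w_u) - targets) ** 2)

def numerical_grad(f, w, eps=1e-5):
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

w_u = np.zeros(2)                 # both gates start at sigmoid(0) = 0.5
initial_loss = loss(w_u)
for _ in range(2000):
    w_u -= 0.5 * numerical_grad(loss, w_u)   # plain gradient descent
final_loss = loss(w_u)

gate_subject, gate_filler = sigmoid(w_u)
print("loss before/after:", initial_loss, final_loss)
print("gate for subject:", gate_subject, "gate for filler:", gate_filler)
```

After training, the gate value for the subject has moved up from 0.5 and the gate value for filler words has moved down, simply because that is what reduces the loss. A real GRU or LSTM does the same thing with full weight matrices and word embeddings instead of two hand-labelled word types.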