GRU relevant word to store in memory

During training, how does a GRU or LSTM decide which words in a sentence are relevant enough to store in memory for future use? For example, let's say our problem is sentence completion and the sentence is:

"My friend Daniel follows a plant based diet, so obviously the type of lunch he requested from the airline company was VEGAN"

The completion is of course VEGAN, and the relevant words to remember are “plant based”. What makes the network remember those specific words as a clue to the type of lunch, and not other words?

They decide what to remember using gates, whose weights are learned during training rather than set by hand.

LSTM Networks

LSTMs have three types of gates:

Forget Gate: The forget gate decides which information is discarded from the cell state.
Input Gate: The input gate decides what new information is added to the cell state.
Output Gate: The output gate decides what information from the cell state is used to compute the output activation of the LSTM unit.
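
To make the three gates concrete, here is a minimal sketch of one LSTM step for a single memory unit, in plain Python. Everything in it is an illustrative assumption: the function names, the one-dimensional input, and above all the hand-picked weights, which a real network would learn. The input is 1.0 for a "clue" word and 0.0 for a filler word.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM; params maps name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("forget"))       # share of the old cell state that is kept
    i = sigmoid(pre("input"))        # share of the new candidate that is written
    o = sigmoid(pre("output"))       # share of the cell state that is exposed
    g = math.tanh(pre("candidate"))  # new candidate content
    c = f * c_prev + i * g           # updated cell state
    h = o * math.tanh(c)             # output (hidden state) of the unit
    return h, c

# Hand-set weights (an assumption, not learned values): the input gate
# opens only for a clue (x = 1.0), and the forget gate always keeps.
PARAMS = {
    "forget": (0.0, 0.0, 10.0),
    "input": (20.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (5.0, 0.0, 0.0),
}

def run(xs):
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, PARAMS)
    return h, c

print(run([0.0, 1.0, 0.0, 0.0, 0.0]))  # clue at step 2, then filler
print(run([0.0, 0.0, 0.0, 0.0, 0.0]))  # filler only
```

With these weights the cell state written at the clue survives the filler steps almost unchanged, while the filler-only sequence leaves the cell empty. In a trained network the same behaviour would have to emerge from learned weights.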

GRU Networks

GRUs are much simpler and use two gates:

Update Gate: The update gate determines how much of the past information (the previous hidden state, since a GRU has no separate cell state) is passed along to the future.
Reset Gate: The reset gate determines how much of the previous hidden state is ignored when the new candidate state is computed.
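
The two GRU gates can be sketched the same way for a single unit. This is an illustration under stated assumptions, not a reference implementation: the names and the hand-set weights are hypothetical, and the update gate follows the convention used above, where a value near 1 keeps the past state (some references swap the roles of z and 1 - z).

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, params):
    """One step of a single-unit GRU; params maps name -> (w_x, w_h, bias)."""
    wz_x, wz_h, bz = params["update"]
    wr_x, wr_h, br = params["reset"]
    wc_x, wc_h, bc = params["candidate"]

    z = sigmoid(wz_x * x + wz_h * h_prev + bz)  # how much of the past to keep
    r = sigmoid(wr_x * x + wr_h * h_prev + br)  # how much of the past the candidate sees
    h_tilde = math.tanh(wc_x * x + wc_h * (r * h_prev) + bc)  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde     # blend of old state and candidate

# Hand-set weights (an assumption, not learned values): the update gate
# drops towards 0 on a clue (x = 1.0), so the state is overwritten there,
# and stays near 1 on filler (x = 0.0), so the state is carried along.
GRU_PARAMS = {
    "update": (-20.0, 0.0, 10.0),
    "reset": (0.0, 0.0, 10.0),
    "candidate": (5.0, 0.5, 0.0),
}

def run_gru(xs):
    h = 0.0
    for x in xs:
        h = gru_step(x, h, GRU_PARAMS)
    return h

print(run_gru([0.0, 1.0, 0.0, 0.0, 0.0]))  # clue at step 2, then filler
print(run_gru([0.0, 0.0, 0.0, 0.0, 0.0]))  # filler only
```

Here the hidden state itself plays the role of the memory: it is rewritten at the clue and then carried through the filler steps.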

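Each gate is a sigmoid applied to a weighted combination of the current input and the previous hidden state, so its output lies between 0 and 1 and acts as a soft switch. The weights are not set by hand; they are learned by backpropagation through time. If keeping "plant based" in memory lowers the loss on the final word across many training sentences, the gradients adjust the gate weights so that such words are written into the state and held there, while words that never help the prediction are allowed to fade. Nothing is stored as an explicit list of important words: the memory is a vector that encodes whatever proved useful during training.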
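
As a toy illustration of how training shapes a gate, the sketch below fits the two parameters of a single update gate so that a clue seen at the first step is still present after three filler steps. It is a deliberately reduced setup resting on several assumptions: one GRU-style unit with no reset gate, a fixed candidate weight of 2.0, a squared-error loss, and finite-difference gradients standing in for backpropagation through time. All names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def final_state(w_z, b_z, xs):
    """Run a one-unit, GRU-style update over xs and return the last state."""
    h = 0.0
    for x in xs:
        z = sigmoid(w_z * x + b_z)    # update gate: share of the old state kept
        h_tilde = math.tanh(2.0 * x)  # candidate; the weight 2.0 is fixed here
        h = z * h + (1.0 - z) * h_tilde
    return h

def loss(w_z, b_z, xs, target):
    return (final_state(w_z, b_z, xs) - target) ** 2

xs = [1.0, 0.0, 0.0, 0.0]  # clue first, then three filler steps
target = 1.0               # the final state should still reflect the clue
w_z, b_z = 0.0, 0.0        # untrained gate: keeps half, writes half
lr, eps = 0.3, 1e-5
initial_loss = loss(w_z, b_z, xs, target)

for _ in range(1000):
    # Central finite differences approximate the gradient of the loss.
    g_w = (loss(w_z + eps, b_z, xs, target)
           - loss(w_z - eps, b_z, xs, target)) / (2 * eps)
    g_b = (loss(w_z, b_z + eps, xs, target)
           - loss(w_z, b_z - eps, xs, target)) / (2 * eps)
    w_z -= lr * g_w
    b_z -= lr * g_b

final_loss = loss(w_z, b_z, xs, target)
print("loss before:", initial_loss, "after:", final_loss)
print("gate on clue:", sigmoid(w_z + b_z), "gate on filler:", sigmoid(b_z))
```

Gradient descent pushes the bias up, so the gate keeps the state on filler steps, and pushes the input weight down, so the gate lets the clue overwrite the state. Nobody labels the clue as important; the gate settings follow from the loss alone.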