Sampling and RNNs

Well… I think you are correct if I interpret your sentences correctly:

Intuitively, the reset gate controls how much of the previous state we might still want to remember. Likewise, an update gate would allow us to control how much of the new state is just a copy of the old state.

Not quite. There are different sampling techniques (used during inference). "Greedy" sampling is as you said: always choosing the most probable one. Other sampling techniques have a parameter (usually called temperature) that lets you control how "greedy" you want to be. For example, given the probabilities [0.1, 0.2, 0.7], the greedy version would always choose the character at the third position (0.7), while the others, depending on the temperature (and other settings), might sample from the original distribution [0.1, 0.2, 0.7], from a sharpened one like [0.0001, 0.05, 0.9499], or something in between.
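A minimal NumPy sketch of what temperature does (the function name and the exact log-space scaling are my own choices; real libraries differ in details):

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    """Rescale a probability distribution by a temperature, then sample.

    temperature -> 0 approaches greedy (argmax); temperature = 1 keeps
    the original distribution; temperature > 1 flattens it.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Divide log-probabilities by the temperature, then renormalize.
    logits = np.log(probs) / temperature
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    scaled = exp / exp.sum()
    return np.random.choice(len(scaled), p=scaled), scaled

# Greedy would always pick index 2; a low temperature makes that near-certain
# but still leaves a small chance for the other characters.
idx, scaled = sample_with_temperature([0.1, 0.2, 0.7], temperature=0.5)
```

With `temperature=0.5` the distribution above sharpens to roughly [0.02, 0.08, 0.90]; with `temperature=1.0` it stays [0.1, 0.2, 0.7].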

And sampling usually has nothing to do with training the model on a dataset.

If I understand you correctly - no.

First, the update gate value is not c (c in your picture is the hidden state, or H_t in my previous illustration); the update gate is z (\Gamma_u in your picture, Z_t in my illustration). It influences the hidden state, but is not the same thing.

Second, the update gate value is calculated at every step (from the previous hidden state and the current input).
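To make that concrete, here is a tiny sketch of the per-step gate computation (the dimensions, weight names, and random inputs are made up purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes and randomly initialized weights, just to show data flow.
n_inputs, n_hidden = 4, 3
rng = np.random.default_rng(0)
W_xz = rng.normal(size=(n_inputs, n_hidden))   # input  -> update gate
W_hz = rng.normal(size=(n_hidden, n_hidden))   # hidden -> update gate
b_z = np.zeros(n_hidden)

def update_gate(x_t, h_prev):
    """Z_t is recomputed at every time step from x_t and h_{t-1}."""
    return sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_inputs)):  # five time steps
    z_t = update_gate(x_t, h)               # a fresh gate value each step
```

The weights `W_xz`, `W_hz`, `b_z` are fixed after training; only `x_t` and `h_{t-1}` change from step to step, which is why the gate value changes too.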

Third, it depends on how you understand layers. I think you are confusing layers with (time) steps (check this post).

So, if that makes sense, then the update gate value controls how much of this step's hidden state (c_t) is a copy of the previous hidden state (c_{t-1}).
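In formula form that blending is H_t = Z_t * H_{t-1} + (1 - Z_t) * H~_t (elementwise); a tiny sketch, with names of my own choosing:

```python
import numpy as np

def next_hidden(z_t, h_prev, h_candidate):
    # Elementwise blend: z_t near 1 copies the old state,
    # z_t near 0 takes the freshly computed candidate state.
    return z_t * h_prev + (1.0 - z_t) * h_candidate

h_prev = np.array([1.0, 1.0])
h_cand = np.array([0.0, 0.0])
h_next = next_hidden(np.array([0.9, 0.1]), h_prev, h_cand)
print(h_next)  # → [0.9 0.1]
```

The first unit (z = 0.9) mostly keeps the old state; the second (z = 0.1) is mostly replaced by the candidate.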