GRU unit for RNN

Two things are unclear to me after the video on GRU unit:

  1. When using GRU units, does the whole RNN consist of GRU units? If not, where do GRU units appear?
  2. If the activation produced by GRU is the same as the memory, i.e. a=c, then the next unit might have no information about the previous word in the sentence, which makes no sense. What am I missing?
  1. Not necessarily. A single neural network can mix different types of recurrent layers, e.g. a GRU layer followed by an LSTM layer in the same model. The recurrent layers come after the embedding layer, if one is present (see the first sketch after this list).
  2. C^{<t>} is not a plain copy of C^{<t-1>}: the update gate \Gamma_u decides how much of the candidate \widetilde{C}^{<t>} from the current timestep to keep and how much of the past to remember, via C^{<t>} = \Gamma_u \odot \widetilde{C}^{<t>} + (1 - \Gamma_u) \odot C^{<t-1>}. Since \widetilde{C}^{<t>} is itself computed from C^{<t-1>} and the current input x^{<t>}, the activation a^{<t>} = C^{<t>} always carries information about the previous words; one cannot jump to the conclusion that C^{<t>} == C^{<t-1>}, nor that the past is discarded. The second sketch below walks through one timestep.
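
To make answer 1 concrete, here is a minimal Keras sketch (the vocabulary size, embedding dimension, hidden sizes, and task head are illustrative assumptions, not from the course): an embedding layer followed by a GRU layer and an LSTM layer stacked in the same model.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Variable-length sequences of token ids.
    tf.keras.layers.Input(shape=(None,)),
    # Map token ids to dense vectors; 10000 and 64 are assumed sizes.
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    # First recurrent layer: GRU. return_sequences=True so the next
    # recurrent layer receives the activation at every timestep.
    tf.keras.layers.GRU(32, return_sequences=True),
    # Second recurrent layer of a different type: LSTM.
    tf.keras.layers.LSTM(32),
    # Assumed task head, e.g. binary sentiment classification.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```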
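And for answer 2, a minimal NumPy sketch of one GRU timestep in the course's notation (weight shapes and the random initialization are assumptions for illustration). The last line of `gru_step` is the update equation above: the new memory is a gated blend of the candidate and the previous memory, so a^{<t>} = c^{<t>} still depends on c^{<t-1>}.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wu, bu, Wr, br, Wc, bc):
    """One GRU timestep; returns c_t, which is also the activation a_t."""
    concat = np.concatenate([c_prev, x_t])
    gamma_u = sigmoid(Wu @ concat + bu)   # update gate
    gamma_r = sigmoid(Wr @ concat + br)   # relevance (reset) gate
    # Candidate memory: already a function of c_prev, not just x_t.
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x_t]) + bc)
    # Gated blend of candidate and previous memory: past information
    # survives both through (1 - gamma_u) * c_prev and through c_tilde.
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    return c_t  # a_t = c_t in a GRU

# Tiny usage example with assumed sizes: hidden size 3, input size 2.
rng = np.random.default_rng(0)
n_c, n_x = 3, 2
Wu, Wr, Wc = (rng.standard_normal((n_c, n_c + n_x)) for _ in range(3))
bu = br = bc = np.zeros(n_c)
c = np.zeros(n_c)
for x in rng.standard_normal((4, n_x)):   # four timesteps
    c = gru_step(c, x, Wu, bu, Wr, br, Wc, bc)
print(c)
```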