Hello Yifu and Ajinkya,
Here’s a quick link from this post: Understanding GRU - #3 by piyush23, which can give you a broad idea of how a GRU addresses the vanishing gradient issue within an RNN architecture.
DLS mentor Kic has posted a link in one of his replies:
The post explains how to address the vanishing gradient problem that comes with a standard recurrent neural network.
To solve the vanishing gradient problem of a standard RNN, a GRU uses two so-called gates: an update gate and a reset gate. Basically, these are two vectors that decide what information should be passed along to the output. The special thing about them is that they can be trained to keep information from long ago, without washing it out through time, or to remove information that is irrelevant to the prediction.
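To make the gating idea concrete, here is a minimal sketch of a single GRU time step in NumPy. The weight names (`W_z`, `U_z`, etc.) and the helper `gru_step` are illustrative assumptions, not from the linked post, and note that conventions differ on whether the update gate multiplies the old state or the candidate state:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step; `params` holds illustrative weight matrices and biases."""
    W_z, U_z, b_z = params["z"]   # update gate parameters
    W_r, U_r, b_r = params["r"]   # reset gate parameters
    W_h, U_h, b_h = params["h"]   # candidate-state parameters

    # Update gate: decides how much of the previous hidden state to keep.
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)
    # Reset gate: decides how much of the previous state feeds the candidate.
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)
    # Candidate hidden state, built from the reset-scaled previous state.
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)
    # Interpolation: when z_t is close to 1, the old state is copied forward,
    # which is what lets the network carry information from long ago.
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde
    return h_t
```

The key point is the last line: because the update gate can learn to stay near 1, the hidden state can pass through many time steps almost unchanged, which is how the GRU keeps gradients from vanishing the way they do in a plain RNN.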