C3W1 Assignment-Overview: why was this GRU model chosen?

The GRU model to be used, described in the Overview cell, doesn’t use the y output of the GRU “cells”, only the hidden state h that is carried from step to step. Why is that? How and why was this model chosen? Thanks.

Hi @Carlos_A_Lang-Sanou, great question.

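The model uses only the hidden state h, and not a separate output y of the GRU cells, because h already carries the information needed for prediction. A GRU is a recurrent model: at each time step it updates h from the previous h and the current input, so h acts as a running summary of the sequence seen so far. In the usual formulation, any output y would itself be computed from h by an extra layer, so passing h on to the following layers loses nothing. For many tasks the final hidden state alone is enough to represent the context of the whole input sequence.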
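To make this concrete, here is a minimal pure-Python sketch of a GRU, not the assignment's actual code. The names `gru_step`, `run_gru`, and `init_params` are hypothetical, the weights are random, and the biases are left out for brevity. It only shows that each step produces a new h, and that the last h is what you would hand to the next layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(hidden_size, input_size, seed=0):
    # Hypothetical helper: small random weights, no biases (an assumption).
    rng = random.Random(seed)

    def mat():
        cols = hidden_size + input_size
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(hidden_size)]

    return {"Wu": mat(), "Wr": mat(), "Wh": mat()}


def gru_step(params, h_prev, x):
    # One GRU step. The only thing it returns is the new hidden state.
    hx = h_prev + x  # concatenation [h_prev, x]
    u = [sigmoid(a) for a in matvec(params["Wu"], hx)]  # update gate
    r = [sigmoid(a) for a in matvec(params["Wr"], hx)]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x  # [r * h_prev, x]
    cand = [math.tanh(a) for a in matvec(params["Wh"], gated)]  # candidate h
    # Blend the old state and the candidate according to the update gate.
    return [(1 - ui) * hi + ui * ci for ui, hi, ci in zip(u, h_prev, cand)]


def run_gru(params, xs, hidden_size):
    # Run over a sequence and collect the hidden state at every step.
    h = [0.0] * hidden_size
    states = []
    for x in xs:
        h = gru_step(params, h, x)
        states.append(h)
    return states


params = init_params(hidden_size=3, input_size=2)
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence of 3 inputs
states = run_gru(params, xs, hidden_size=3)
final_h = states[-1]  # summary of the whole sequence
```

If a prediction is needed, a dense layer (plus softmax) applied to `final_h`, or to every entry of `states`, would play the role of y.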

Got it. Thanks for the kind and clear explanation.