LSTM Training Process

Hello, I am interested in the training process of LSTM layers. We implemented one from scratch in the Sequences course from DeepLearning.AI, but one question remains unclear to me. When we train an LSTM and compute the derivative with respect to the cell state c, for the last time step (the last "layer" of the unrolled network) we use all zeros as the cell-state derivative coming from the next step, since no later step exists. For the time steps before the last one, by contrast, we use the cell-state derivative backpropagated from the step that follows, along with the gradient arriving through the hidden state. This asymmetry confuses me. My question is: will this difference between the last step's cell-state derivative and the cell-state derivatives of the other steps create a bias in the weight-update process from step to step?

Were you able to resolve your doubt about this issue?

Unfortunately, not…
I am still trying to find a scientific answer to this question, but no one has answered it yet.