Hi and thanks for reading my question.
I don’t understand why, in the backpropagation steps, we compute derivatives with respect to the input x.
We never change the words of the sentence or their one-hot representations, do we? So why do we calculate derivatives with respect to them?
Thanks again.
Hi Hamed,
Thanks for your question. Because of the way RNNs are designed, backpropagation needs to cover all parameters of the cell on the way back, as you can see in this picture:
This includes the derivative with respect to the time sample x(t). Just as the forward pass propagates the processed x(t) forward, the backward pass propagates the corresponding gradient dx(t) back. At least this is how I understood it.
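To make this concrete, here is a rough NumPy sketch of one backward step through a basic tanh cell, a_next = tanh(Wax·x(t) + Waa·a_prev + ba). It is my own simplification (the function name and shapes are just illustrative, not the assignment's exact code), but it shows that dx(t) falls out of the same chain rule that produces da_prev and the weight gradients, so computing it costs almost nothing extra:

```python
import numpy as np

# Sketch of the backward pass through one basic RNN cell, assuming
# a_next = tanh(Wax @ x_t + Waa @ a_prev + ba). Not the course's exact code.
def rnn_cell_backward(da_next, x_t, a_prev, a_next, Wax, Waa):
    # Gradient through the tanh non-linearity
    dtanh = (1 - a_next ** 2) * da_next

    # Gradients of the cell parameters (what the optimizer actually updates)
    dWax = dtanh @ x_t.T
    dWaa = dtanh @ a_prev.T
    dba = np.sum(dtanh, axis=1, keepdims=True)

    # Gradient passed back to the previous hidden state (reused at step t-1)
    da_prev = Waa.T @ dtanh

    # Gradient with respect to the input of this time step
    dx_t = Wax.T @ dtanh

    return dx_t, da_prev, dWax, dWaa, dba

# Tiny usage example: n_x input features, n_a hidden units, m examples
n_x, n_a, m = 3, 5, 4
rng = np.random.default_rng(0)
x_t, a_prev = rng.standard_normal((n_x, m)), rng.standard_normal((n_a, m))
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
ba = rng.standard_normal((n_a, 1))
a_next = np.tanh(Wax @ x_t + Waa @ a_prev + ba)
da_next = rng.standard_normal((n_a, m))
dx_t, da_prev, *_ = rnn_cell_backward(da_next, x_t, a_prev, a_next, Wax, Waa)
print(dx_t.shape, da_prev.shape)  # (3, 4) (5, 4)
```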
Hope it helps.
Happy learning,
Rosa
Thanks @HamedGholami for your question, I was wondering about that, too.
@arosacastillo, why does backpropagation need to cover ALL parameters of the cell on the way back? Unlike da_\text{prev}, dx^{<t>} is not used anywhere later on, is it?
My guess is that it is useful for debugging: some XAI approaches seem to use input gradients to create inputs that lead to the most confident outputs, e.g. for specific classes, in the hope of gaining some insight from the created inputs. Furthermore, looking at gradients seems to be a debugging approach in general, but I have no idea how it helps (besides spotting vanishing or exploding gradients of the weights) and whether gradients of the inputs help there too.
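To illustrate what I mean by input gradients, here is a toy sketch I put together. It is not an RNN and not from the course: just a random linear softmax "model" with made-up names (W, b, target_class), where the gradient of a class score with respect to the input is used to nudge the input itself towards a more confident prediction:

```python
import numpy as np

# Toy illustration of using input gradients: keep the (pretend-trained) weights
# fixed and do gradient ascent on the INPUT to maximize the score of one class.
rng = np.random.default_rng(1)
n_x, n_classes = 8, 3
W = rng.standard_normal((n_classes, n_x))   # stand-in for trained weights
b = rng.standard_normal((n_classes, 1))
x = rng.standard_normal((n_x, 1))           # the input we will modify
target_class = 2

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

print("before:", softmax(W @ x + b)[target_class, 0])
for step in range(100):
    p = softmax(W @ x + b)                  # forward pass
    # Gradient of log p[target_class] w.r.t. the logits is (one_hot - p);
    # the chain rule through the linear layer gives the gradient w.r.t. x.
    one_hot = np.zeros_like(p)
    one_hot[target_class] = 1.0
    dx = W.T @ (one_hot - p)                # the "dx" of this little model
    x += 0.1 * dx                           # gradient ascent on the input itself
print("after: ", softmax(W @ x + b)[target_class, 0])
```

In a real XAI setting the weights would come from a trained network, and the created input (or the raw gradient itself, as a saliency map) is what you would inspect.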
Hi David,
Sorry for the late reply. I am not an expert in RNNs, but I will share my thoughts on your question. Indeed, in the architecture shown above dx(t) does not seem to be used; however, there could be other RNN architectures where those values are used.
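For instance, in a stacked (deep) RNN, the hidden state of a lower layer is the "input" of the layer above it, so the upper layer's dx^{<t>} is exactly what flows into the lower layer's da^{<t>}. Here is a rough sketch to show the idea (my own construction, not from the course; shapes and names are illustrative):

```python
import numpy as np

# Sketch: in a two-layer (stacked) RNN, layer 2's input at time t is layer 1's
# hidden state, so layer 2's dx^<t> becomes part of layer 1's da^<t>.
def cell_backward(da_next, x_t, a_prev, a_next, Wax, Waa):
    dtanh = (1 - a_next ** 2) * da_next
    dx_t = Wax.T @ dtanh      # gradient w.r.t. this cell's input
    da_prev = Waa.T @ dtanh   # gradient w.r.t. the previous hidden state
    return dx_t, da_prev

n_x, n_a, m = 3, 5, 2
rng = np.random.default_rng(2)
x_t = rng.standard_normal((n_x, m))
Wax1, Waa1 = rng.standard_normal((n_a, n_x)), rng.standard_normal((n_a, n_a))
Wax2, Waa2 = rng.standard_normal((n_a, n_a)), rng.standard_normal((n_a, n_a))
a1_prev = a2_prev = np.zeros((n_a, m))

# Forward: layer 1's hidden state is layer 2's input
a1 = np.tanh(Wax1 @ x_t + Waa1 @ a1_prev)
a2 = np.tanh(Wax2 @ a1 + Waa2 @ a2_prev)

# Backward: layer 2's dx is (part of) layer 1's da
da2 = rng.standard_normal((n_a, m))   # gradient arriving from the loss
dx2, da2_prev = cell_backward(da2, a1, a2_prev, a2, Wax2, Waa2)
da1 = dx2                             # plus the gradient from step t+1, omitted here
dx1, da1_prev = cell_backward(da1, x_t, a1_prev, a1, Wax1, Waa1)
print(dx2.shape, dx1.shape)  # (5, 2) (3, 2)
```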
Best wishes,
Rosa
Would be nice to hear more on this. I’m also pondering why we calculate dx_t if it’s never used anywhere.
Apart from debugging, one additional point I came across elsewhere is that in generative approaches, for example, the starting inputs matter too.
But it would still be great to hear a more substantiated opinion from the experts.