I don’t fully understand how back propagation works. What I understood (it may be wrong) is as follows: 1 - To find a neuron’s w and b derivatives, the operations taking place in that neuron are differentiated in turn, but not every derivative is computed individually for every neuron; for example, the derivative of J with respect to a can be reused for all neurons, and this reduces the overall computation. 2 - The loss function is specified only for the output layer. With back propagation, the w and b of the output layer are determined, and the layers before the output layer update their w and b because they are linked to this w and b.
As I said, what I understand is probably wrong and incomplete. Is there anyone who can explain back propagation to me?
Don’t worry too much about the details of the maths; the main idea I hope you will take away is how gradients are passed back, rather than how each gradient is computed. Also, this is just one video out of a series of lectures, so it is normal that you can’t follow everything. I have extracted a slide from the video and explained below how I am going to use it to respond to your questions.
In that video, whenever you see a symbol d<something>, it means \frac{\partial{J}}{\partial{<something>}}. For example, da^{[2]} = \frac{\partial{J}}{\partial{a^{[2]}}}, which is the gradient of the cost J with respect to the output of the second layer.
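To see how these d-quantities relate to each other, here is the chain rule written in that shorthand (g^{[2]} just denotes the activation function of layer 2; the formula in the slide may be further specialized to a sigmoid output):

dz^{[2]} = \frac{\partial{J}}{\partial{z^{[2]}}} = \frac{\partial{J}}{\partial{a^{[2]}}} \cdot \frac{\partial{a^{[2]}}}{\partial{z^{[2]}}} = da^{[2]} \, g^{[2]\prime}(z^{[2]})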
I have taken the slide below from that video and added some annotations to aid my response to your question:
I would like to build the answer on your own words, so I have added some text to them to clarify a few meanings.
If you would allow me to summarize it, you have made 2 points which I very much agree with.
Not all the gradients for all of the neurons in all layers are computed at once. Instead, we compute the gradients for the neurons in the output layer first [Part B], and those can be reused to further compute the gradients for the previous layers’ neurons [Part C] (there is a short code sketch below that illustrates this reuse).
The cost function J is defined to compare the labels and the predictions from the output layer [Part A].
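For example, if I remember correctly the lecture uses binary classification with a sigmoid output; in that setting J would be the cross-entropy cost, which only compares the labels y with the output layer’s activations a^{[2]}:

J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{[2](i)} + \left(1-y^{(i)}\right)\log\left(1-a^{[2](i)}\right)\right]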
We should note that we only have labels for the output layer; there is no “label” for any hidden layer. However, I agree with the last sentence of your point number 2 that the gradients for the hidden layers’ parameters are linked to the output layer [as illustrated by the green arrow that points from Part B back to Part C].
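To make the two points concrete, here is a minimal numpy sketch of backprop for a 2-layer network. I am assuming a tanh hidden layer, a sigmoid output and the cross-entropy cost above; the variable names are mine and not necessarily the same as in the slide:

```python
import numpy as np

def backward(X, Y, W1, b1, W2, b2):
    """Gradients of J for a 2-layer network: tanh hidden layer, sigmoid output.
    X has shape (n_x, m), Y has shape (1, m)."""
    m = X.shape[1]

    # Forward pass, cached so the backward pass can reuse the activations
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1 / (1 + np.exp(-Z2))            # predictions of the output layer

    # [Part B] gradients of the OUTPUT layer are computed first,
    # because only here do we have labels Y to compare against
    dZ2 = A2 - Y                           # dJ/dZ2 for sigmoid + cross-entropy
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # [Part C] the hidden layer REUSES dZ2 via the chain rule,
    # instead of starting again from J
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)     # (1 - A1**2) is tanh'(Z1)
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return dW1, db1, dW2, db2
```

The key line is the one computing dZ1: the hidden layer never touches the labels Y directly; it only receives dZ2 that was passed back from the output layer.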
Hello @rmwkwok
Thank you very much for your excellent answer. I totally get the point now: since only the output layer has labels, the loss function is defined only for that layer, and the other layers are connected to it through back propagation and its parameters. When I asked ChatGPT about this, the answer it gave me was that the loss function is taken into account for all layers and the result comes from the total, so I was a little confused and thought I should open a question.

You said that I shouldn’t get too hung up on the math operations, but I’m trying to fully understand the model because I believe it’s possible for us to develop neurons further. Do you think it is possible for me, among so many programmers, to develop the properties of neurons? I have a desire to find solutions to AGI problems; how possible is that? (Oh, I guess I’ll have to start a new topic.) I’m just wondering if such a thing is possible. If I could fully understand the logic, maybe I could come up with a new architecture for neurons. Your ideas are valuable to me.
I don’t see why you can’t do that. The video that I have shared comes from the Deep Learning Specialization, and many learners have finished it and completed assignments that required them to build a multi-layer neural network from scratch (instructions provided, of course).
From your first post, and the fact that you could understand my reply in one go, I don’t doubt that you will manage to at least achieve the same as those learners, and perhaps even more if you develop your own set of insights from those DLS lectures and assignments. For example, in your first post of this thread you have already shared your insights in your own words, which is a very good job!
Haha. I think many people are working on it, so that means it is not proven to be impossible yet, right? Sorry that I can’t provide you with a more definite answer, because every adventure has its risk of hitting a dead end, or of taking a journey longer than anyone expected, and I don’t want to pretend that we already know the straight path to a real AGI (as the lecture calls it: “do anything a human can do”). However, an adventure is fun, isn’t it?