I don’t fully understand how back propagation works. What I understood (it may be wrong) is as follows: 1 - To find a neuron’s w and b derivatives, the operations taking place in that neuron are differentiated in turn, but not every derivative is computed individually for every neuron; for example, the derivative of J with respect to a can be reused for all neurons, and this reduces the overall computation. 2 - The loss function is specified only for the output layer. With back propagation, the w and b of the output layer are determined, and the layers before the output layer update their w and b because they are linked to this w and b.
As I said, what I understand is probably wrong and incomplete. Is there anyone who can explain back propagation to me?
Don’t worry too much about the details of the maths; the main idea I hope you will take away is how gradients are passed back, rather than how each gradient is computed. Also, this is just one video out of a series of lectures, so it is normal that you can’t follow everything. I have extracted a slide from the video and explained below how I am going to use it to respond to your questions.
In that video, whenever you see a symbol d<something>, it means \frac{\partial{J}}{\partial{<something>}}. For example, da^{[2]} = \frac{\partial{J}}{\partial{a^{[2]}}}, which is the gradient of the cost J with respect to the output of the second layer.
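To see how these d-quantities relate to each other, here is the chain rule written in that shorthand (g^{[2]} just denotes the activation function of layer 2; the formula in the slide may be further specialized to a sigmoid output):

dz^{[2]} = \frac{\partial{J}}{\partial{z^{[2]}}} = \frac{\partial{J}}{\partial{a^{[2]}}} \cdot \frac{\partial{a^{[2]}}}{\partial{z^{[2]}}} = da^{[2]} \, g^{[2]\prime}(z^{[2]})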
I have taken the slide below from that video and added some annotations to aid my response to your question:
I would like to build the answer on your own words, so I have added some text to them to clarify a few meanings.
If you would allow me to summarize it, you have made 2 points which I very much agree with.
Not all the gradients for all of the neurons in all layers are computed at once. Instead, we compute the gradients for the neurons in the output layer first [Part B], and those can be reused to further compute the gradients for the previous layers’ neurons [Part C] (there is a short code sketch below that illustrates this reuse).
The cost function J is defined to compare the labels and the predictions from the output layer [Part A].
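For example, if I remember correctly the lecture uses binary classification with a sigmoid output; in that setting J would be the cross-entropy cost, which only compares the labels y with the output layer’s activations a^{[2]}:

J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{[2](i)} + \left(1-y^{(i)}\right)\log\left(1-a^{[2](i)}\right)\right]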
We should note that we only have labels for the output layer; there is no “label” for any hidden layer. However, I agree with the last sentence of your point number 2 that the gradients for the hidden layers’ parameters are linked to the output layer [as illustrated by the green arrow that points from Part B back to Part C].
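To make the two points concrete, here is a minimal numpy sketch of backprop for a 2-layer network. I am assuming a tanh hidden layer, a sigmoid output and the cross-entropy cost above; the variable names are mine and not necessarily the same as in the slide:

```python
import numpy as np

def backward(X, Y, W1, b1, W2, b2):
    """Gradients of J for a 2-layer network: tanh hidden layer, sigmoid output.
    X has shape (n_x, m), Y has shape (1, m)."""
    m = X.shape[1]

    # Forward pass, cached so the backward pass can reuse the activations
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1 / (1 + np.exp(-Z2))            # predictions of the output layer

    # [Part B] gradients of the OUTPUT layer are computed first,
    # because only here do we have labels Y to compare against
    dZ2 = A2 - Y                           # dJ/dZ2 for sigmoid + cross-entropy
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # [Part C] the hidden layer REUSES dZ2 via the chain rule,
    # instead of starting again from J
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)     # (1 - A1**2) is tanh'(Z1)
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return dW1, db1, dW2, db2
```

The key line is the one computing dZ1: the hidden layer never touches the labels Y directly; it only receives dZ2 that was passed back from the output layer.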
Hello @rmwkwok
Thank you very much for your excellent answer. I totally get the point now: since only the output layer has labels, the loss function is defined only for that layer, and the other layers are connected to it through back propagation and its parameters. When I asked ChatGPT about this, the answer it gave me was that the loss function is taken into account for all layers and the result comes from the total, so I was a little confused and thought I should open a question.

You said that I shouldn’t get too hung up on the math operations, but I’m trying to fully understand the model because I believe it’s possible for us to develop neurons further. Do you think it is possible for me, among so many programmers, to develop the properties of neurons? I have a desire to find solutions to AGI problems; how possible is that? (Oh, I guess I’ll have to start a new topic.) I’m just wondering if such a thing is possible. If I could fully understand the logic, maybe I could come up with a new architecture for neurons. Your ideas are valuable to me.
I don’t see why you can’t do that. The video that I have shared comes from the Deep Learning Specialization, and many learners have finished it and completed assignments that required them to build a multi-layer neural network from scratch (instructions provided, of course).
From your first post, and the fact that you could understand my reply in one go, I don’t doubt that you will manage to at least achieve the same as those learners, and perhaps even more if you develop your own set of insights from those DLS lectures and assignments. For example, in your first post of this thread you have already shared your insights in your own words, which is a very good job!
Haha. I think many people are working on it, so that means it is not proven to be impossible yet, right? Sorry that I can’t provide you with a more definite answer, because every adventure has its risk of hitting a dead end, or of taking a journey longer than anyone expected, and I don’t want to pretend that we already know the straight path to a real AGI (as the lecture calls it: “do anything a human can do”). However, an adventure is fun, isn’t it?