A hopefully usable diagram for back-propagation in a 3-layer network doing logistic regression, back to layer 3

As a follow-up to the diagram for feed-forward computation in a 3-layer network doing logistic regression, here is a first diagram for back-propagation in that same network, but only back to layer 3.

This was amazingly hard to do :scream: Many diagrams were discarded until a good approach was found, as were many pages of derivative computations and notes on whether the symbols say what they should say.

Enjoy!

Diagrams for back-propagation to layer 2 and layer 1 are currently in the works.

The “graphml” raw diagram can be found here:


Hello, @dtonhofer,

I am glad that I have not missed this.

I think the diagram is becoming more your own style, because you are introducing new symbols like \rho, K and \eta that, I believe, the DLS has not used. However, I do not mean there is any problem with that :wink: :wink: .

I have not gone into every bit of it but I spotted two things:


  1. Personally (and I stress: personally), I would only consider J, the cost, as a function of W and b, and treat the data (X and Y) as fixed rather than as variables. In other words, I would write something like J(W, b; X, Y). I can differentiate J w.r.t. W or b but not w.r.t. X or Y, so I would consider K to be a space of only W and b (see the sketch right after this list).

  2. “COLLECTED DURING FEED-FORWARD PHASE” appeared a couple of times, but I wonder what exactly was collected, because I think no differentiation work happens during the feed-forward phase. Think about it this way: we also run a feed-forward phase at inference time, yet there is no need for differentiation whatsoever, because there is no training at inference.
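To make point 1 concrete, here is a minimal sketch of what I mean, assuming the usual binary cross-entropy cost over m examples and writing the network's output for example i as \hat{y}^{(i)} = a^{[3](i)}:

```latex
J(W, b;\, X, Y)
  = -\frac{1}{m} \sum_{i=1}^{m}
      \left[ y^{(i)} \log \hat{y}^{(i)}
           + \left(1 - y^{(i)}\right) \log\!\left(1 - \hat{y}^{(i)}\right) \right]
```

X and Y sit behind the semicolon as fixed data; the gradients \partial J / \partial W^{[l]} and \partial J / \partial b^{[l]} are then taken only over the parameter space.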

You know, the next challenge for diagram makers like us is whether, after some weeks or months, we will still know how to read the diagram and what it means :rofl: :rofl:

Have a good day, David.

Raymond


Thank you very much, Raymond, I will look into this once my RSI abates :sweat: (I will definitely need aspirin)

For “Collected During Feedforward”, it’s just that this data is available/cacheable during the feed-forward pass of the training phase, so one can accumulate it at that point. The differentiations then just consist in locally evaluating the derivative of the activation function on the respective z’s. It is really reaching into the domain of implementation: how do we implement the backpropagation?
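Concretely, here is a minimal sketch of what I mean (the names are made up for illustration, and I assume sigmoid activations everywhere):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_with_cache(X, params):
    """Feed-forward pass that also caches z, a and g'(z) per layer.

    params: list of (W, b) tuples, one per layer; X has shape (n_0, m).
    Returns the final activation and the per-layer cache used by backprop.
    """
    cache = []
    a = X
    for W, b in params:
        z = W @ a + b          # pre-activation of this layer
        a = sigmoid(z)         # activation of this layer
        dg = a * (1.0 - a)     # sigmoid'(z), "collected during feed-forward"
        cache.append({"z": z, "a": a, "dg": dg})
    return a, cache
```

During back-propagation, each layer's cached dg is reused element-wise to get from \partial J / \partial a^{[l]} to \partial J / \partial z^{[l]}, so no differentiation of the activation function has to happen in the backward pass itself.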


Also, the latest versions of the diagrams for backpropagation to layer 3 and to layer 2 are out. Layer 1 is ready, but I still have to verify the last step; I feel I am missing something.

All still here:

Thanks again,

– David


Oh! Take care, David. I am sorry to hear about your RSI. I just googled it, but I can’t say I know what it is like, let alone how it feels, so please take care and I hope you get well soon.

I think now I understand the “Collected During Feedforward” part. As long as the diagram is mainly for your own use, then, speaking as a volunteer mentor here, I am fine with your sensible explanation.

But frankly speaking, I have not gone through them bit by bit, because the way you present them isn’t quite the way I would understand them. To me, you are breaking the operations down to the smallest unit - I mean element-wise. That is OK, but the way you label and describe them is not really my way, so to go through them in detail I would have to be very focused :grin: So unless you need help, I will just glance at them. But if you really do need help, please let me know which part it is and I will focus on that.

Again, I think the real challenge is, after a few weeks, whether you will still think you like those diagrams. I think we can come back to them later.

Cheers!
Raymond


Thank you for your concern, Raymond. It’s better now; there is some damage somewhere from old accidents, and it seems to have been woken up by the yEd editor, which demands very precise point-and-click operations :joy::grimacing:

All right, we are basically done with the diagramming. This seems to be correct, unless there is something I really didn’t understand at all (which is always possible). I checked it against a few pages of derivation done by hand. No DeepSeek-r1 involved.
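(As an aside, a numerical gradient check is another way to gain confidence in a hand derivation. A minimal sketch, with made-up names, assuming the parameters are flattened into a single vector:)

```python
import numpy as np

def gradient_check(cost_fn, theta, analytic_grad, eps=1e-7):
    """Compare an analytic (backprop) gradient with central finite differences.

    cost_fn: scalar cost as a function of the flattened parameter vector theta.
    analytic_grad: the gradient vector produced by back-propagation.
    Returns a relative error; values around 1e-7 or smaller look healthy.
    """
    num_grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        num_grad[i] = (cost_fn(plus) - cost_fn(minus)) / (2.0 * eps)
    denom = np.linalg.norm(num_grad) + np.linalg.norm(analytic_grad)
    return np.linalg.norm(num_grad - analytic_grad) / denom
```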

Diagrams for backpropagation to layers 1,2,3 now exist!

Again, I think the real challenge is, after a few weeks, whether you will still think you like those diagrams.

Oh, I know about that part. :sweat:

OK, here are the direct links to the SVG versions. Graphml and PNG versions are in the file tree as siblings, as usual, under

SVG versions

Backpropagate to Layer 3

Backpropagate to Layer 2

Backpropagate to Layer 1

Previews

Absolutely unreadable scaled-down jpg versions, but they show the overall structure (and how to continue it to the left until memory/compute runs out)

Backpropagate to Layer 3

Backpropagate to Layer 2

Backpropagate to Layer 1


Amazing! Seeing and reading serious work full of energy is the best thing here.

:raised_hands: :raised_hands: :raised_hands: :raised_hands:

Update on 2025-02-09: This is too complex and not actionable enough (also, the W’s are transposed relative to the course’s conventions, ARG!)
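For reference, and as far as I recall, the course convention for the forward step is the following, while my diagrams effectively carry the transposed W's:

```latex
Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]},
\qquad W^{[l]} \in \mathbb{R}^{\,n^{[l]} \times n^{[l-1]}}
```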

All readers please fall back to the following:

A post with a better diagram
