After some additional work, I have obtained a diagram depicting the differentiation of the COST of a 2-layer ANN doing 2-class discrimination, with batch processing. Everything should be straightforward, but in fact it is not: some subtleties were encountered, as the “tensors” become 4-D and one then gets confused about how to manage the 4 dimensions correctly.
First the diagram for the ANN in feed-forward mode, very straightforward:
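In code, that feed-forward pass is roughly the following (a minimal numpy sketch; the tanh hidden activation, sigmoid output, and cross-entropy cost J are my assumptions, as the diagram does not pin them down):

import numpy as np

def forward(X, W1, b1, W2, b2):
    """Feed-forward pass for a batch X of shape (n_x, m): one column per example."""
    Z1 = W1 @ X + b1                 # (n_h, m)
    A1 = np.tanh(Z1)                 # hidden activation (assumed tanh)
    Z2 = W2 @ A1 + b2                # (1, m)
    A2 = 1.0 / (1.0 + np.exp(-Z2))   # sigmoid output: P(class 1) per example
    return Z1, A1, Z2, A2

def cost(A2, Y):
    """Binary cross-entropy cost J, averaged over the batch."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m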
And then the diagram for differentiation.
- The upper rung, going right to left, depicts the differentiation by any η, where η stands for one of W^{[2]}, b^{[2]}, W^{[1]}, b^{[1]} (i.e. those are matrices or vectors, so we actually differentiate by any of the components of those matrices or vectors but notationally compress these individual operations into one formula). Eventually the path through the upper rung will hit actual numeric values that need no further work.
- The lower rung is the return trip: collecting the multiplicands, or performing “operations on the return path”, and, going left to right, eventually ending in the sought \partial{J}/\partial{η}.
For the computationally inclined, this is very similar to doing recursive descent and performing outstanding work on return.
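As a concrete instance of the two rungs, the chain for η = W^{[1]} reads as follows (with Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, A^{[l]} the layer-l activation, and A^{[0]} = X; these intermediate names are my choice, the diagram may label them differently):

\partial{J}/\partial{W^{[1]}} = (\partial{J}/\partial{A^{[2]}}) (\partial{A^{[2]}}/\partial{Z^{[2]}}) (\partial{Z^{[2]}}/\partial{A^{[1]}}) (\partial{A^{[1]}}/\partial{Z^{[1]}}) (\partial{Z^{[1]}}/\partial{W^{[1]}})

The upper rung expands this product right to left until it bottoms out in numeric values; the lower rung multiplies the factors back together. With batch processing, the middle factors such as \partial{Z^{[2]}}/\partial{A^{[1]}} relate one matrix to another, which is exactly where the 4-D tensors mentioned above come from.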
The backpropagation algorithm actually works on the lower rung, but right to left, accumulating values as it goes. This simplifies the matrices: for one, there is no need for 4-D ones, 2-D suffices.
For the computationally inclined, this is very similar to doing recursive descent, but in a tail-recursive way so there is no need to ever return once one obtains the result.
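In code, that right-to-left accumulation looks roughly like the following (a sketch under the same assumptions as the forward pass above; the collapse dZ2 = A2 - Y is the standard simplification for a sigmoid output combined with the cross-entropy cost):

def backward(X, Y, W2, Z1, A1, A2):
    """Backprop for the 2-layer net: accumulate right to left on the lower rung.
    Every intermediate (dZ2, dA1, dZ1) stays a 2-D array with the same shape as
    the quantity it differentiates -- no 4-D tensors needed."""
    m = X.shape[1]
    dZ2 = A2 - Y                               # sigmoid + cross-entropy collapse
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dA1 = W2.T @ dZ2                           # carry the accumulated product leftwards
    dZ1 = dA1 * (1 - A1 ** 2)                  # tanh'(Z1) = 1 - tanh(Z1)^2 = 1 - A1^2
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2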
So the diagram is missing the third rung depicting back-propagation proper. To be added.
I would also have liked to add a 3D diagram of the matrices to the far left, as that would make the description clearer, but there is no way to add such a thing in my all-too-simplistic drawing tool, yEd.
The image below has been down-scaled automatically on upload, making it unreadable. For the readable image, see here.