After some additional work, I have obtained a diagram depicting the differentiation of the COST of a 2-layer ANN doing 2-class discrimination, with batch processing. Everything should be straightforward, but in fact it is not: some subtleties were encountered, as the “tensors” become 4-D and one then gets confused about how to manage the 4 dimensions correctly.
First the diagram for the ANN in feed-forward mode, very straightforward:
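In code, that feed-forward pass is roughly the following (a minimal numpy sketch; the tanh hidden activation, sigmoid output, and cross-entropy cost J are my assumptions, as the diagram does not pin them down):

import numpy as np

def forward(X, W1, b1, W2, b2):
    """Feed-forward pass for a batch X of shape (n_x, m): one column per example."""
    Z1 = W1 @ X + b1                 # (n_h, m)
    A1 = np.tanh(Z1)                 # hidden activation (assumed tanh)
    Z2 = W2 @ A1 + b2                # (1, m)
    A2 = 1.0 / (1.0 + np.exp(-Z2))   # sigmoid output: P(class 1) per example
    return Z1, A1, Z2, A2

def cost(A2, Y):
    """Binary cross-entropy cost J, averaged over the batch."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m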
And then the diagram for differentiation.
- The upper rung, going right to left, depicts the differentiation by any η, where η stands for one of W^{[2]}, b^{[2]}, W^{[1]}, b^{[1]} (i.e. those are matrices or vectors, so we actually differentiate by any of the components of those matrices or vectors but notationally compress these individual operations into one formula). Eventually the path through the upper rung will hit actual numeric values that need no further work.
- The lower rung is the return trip: collecting the multiplicands, or performing “operations on the return path”, and, going left to right, eventually ending in the sought \partial{J}/\partial{η}.
For the computationally inclined, this is very similar to doing recursive descent and performing outstanding work on return.
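As a concrete instance of the two rungs, the chain for η = W^{[1]} reads as follows (with Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, A^{[l]} the layer-l activation, and A^{[0]} = X; these intermediate names are my choice, the diagram may label them differently):

\partial{J}/\partial{W^{[1]}} = (\partial{J}/\partial{A^{[2]}}) (\partial{A^{[2]}}/\partial{Z^{[2]}}) (\partial{Z^{[2]}}/\partial{A^{[1]}}) (\partial{A^{[1]}}/\partial{Z^{[1]}}) (\partial{Z^{[1]}}/\partial{W^{[1]}})

The upper rung expands this product right to left until it bottoms out in numeric values; the lower rung multiplies the factors back together. With batch processing, the middle factors such as \partial{Z^{[2]}}/\partial{A^{[1]}} relate one matrix to another, which is exactly where the 4-D tensors mentioned above come from.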
The backpropagation algorithm actually works on the lower rung, but right to left, accumulating values as it goes. This simplifies the matrices: for one, there is no need for 4-D ones, 2-D suffices.
For the computationally inclined, this is very similar to doing recursive descent, but in a tail-recursive way so there is no need to ever return once one obtains the result.
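In code, that right-to-left accumulation looks roughly like the following (a sketch under the same assumptions as the forward pass above; the collapse dZ2 = A2 - Y is the standard simplification for a sigmoid output combined with the cross-entropy cost):

def backward(X, Y, W2, Z1, A1, A2):
    """Backprop for the 2-layer net: accumulate right to left on the lower rung.
    Every intermediate (dZ2, dA1, dZ1) stays a 2-D array with the same shape as
    the quantity it differentiates -- no 4-D tensors needed."""
    m = X.shape[1]
    dZ2 = A2 - Y                               # sigmoid + cross-entropy collapse
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dA1 = W2.T @ dZ2                           # carry the accumulated product leftwards
    dZ1 = dA1 * (1 - A1 ** 2)                  # tanh'(Z1) = 1 - tanh(Z1)^2 = 1 - A1^2
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2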
So the diagram is missing the third rung depicting back-propagation proper. To be added.
I would also have liked to add a 3D diagram of the matrices to the far left, as that would make the description clearer, but there is no way to add such a thing in my all-too-simplistic drawing tool, yEd.
The image below has been down-scaled automatically on upload, making it unreadable. For the readable image, see here.