Computation graph: N + P vs N x P

rmwkwok · August 31, 2023, 8:17am

Let’s try to argue that.

Backprop saves time because it avoids recomputing values that have already been computed. In other words, it avoids visiting the same node more than once.

Without backprop, computing \frac{\partial{J}}{\partial{w}} for each parameter w requires going through the chain of nodes from J to w, and the length of that chain is \sim N (in the worst case, we go through all N nodes). Doing this for all P parameters therefore takes a total of N\times P node visits, because (in the worst case) we go through the whole chain once per parameter. (N is the worst case. In practice, not every parameter needs the whole chain, but that does not matter: the point is not an absolute measurement, but that the cost per parameter scales with N.)

With backprop, each node only needs to be visited once, and each parameter also only needs to be visited once, so it is N + P.
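
To make the counting concrete, here is a minimal sketch on a toy graph of my own choosing (not the one from the lecture): a simple chain where node i computes a_i = w_i * a_{i-1} and J is the last activation, so P = N. The function names and the "visit" counters are just for illustration. For this chain the naive count comes out as N(N+1)/2, which is below the worst-case N\times P but still grows like N\times P, while backprop is exactly N + P.

```python
def forward(ws, x):
    """Toy chain graph (an assumption for illustration):
    acts[0] = x, acts[i + 1] = ws[i] * acts[i], and J = acts[-1]."""
    acts = [x]
    for w in ws:
        acts.append(w * acts[-1])
    return acts


def grads_naive(ws, acts):
    """Without backprop: walk from J back to each parameter separately."""
    n = len(ws)
    visits = 0
    grads = []
    for i in range(n):                  # one full walk per parameter
        d = 1.0                         # dJ/dJ
        for k in range(n - 1, i, -1):   # every node between J and node i
            d *= ws[k]                  # local derivative of node k w.r.t. its input
            visits += 1
        visits += 1                     # node i itself
        grads.append(d * acts[i])       # d(acts[i + 1]) / d(ws[i]) = acts[i]
    return grads, visits


def grads_backprop(ws, acts):
    """With backprop: one backward pass that reuses the upstream derivative."""
    n = len(ws)
    node_visits = 0
    param_visits = 0
    grads = [0.0] * n
    upstream = 1.0                      # dJ/d(output of the current node)
    for i in range(n - 1, -1, -1):
        node_visits += 1                # each node is visited once
        grads[i] = upstream * acts[i]
        param_visits += 1               # each parameter is visited once
        upstream *= ws[i]               # pass the derivative on to the previous node
    return grads, node_visits + param_visits


if __name__ == "__main__":
    ws = [1.0] * 100                    # N = P = 100
    acts = forward(ws, 1.0)
    print("naive visits:   ", grads_naive(ws, acts)[1])
    print("backprop visits:", grads_backprop(ws, acts)[1])
```

Both functions return the same gradients; only the number of visits differs. With N = P = 100, the naive walk makes 5050 visits and the backward pass makes 200.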

Cheers,
Raymond
