Course 1: Week 3 (backpropagation intuition)

Prof. Ng has specifically designed these courses so that they do not require students to know any calculus (even univariate calculus, let alone matrix calculus), so he does not cover the derivations of most of the formulas that involve calculus. If you would like to dig deeper and have the math background, there are lots of resources available. Here’s a local thread with a bibliography, which includes textbooks that cover the actual math behind all this. One book that is more math oriented is Goodfellow et al., which is listed there.
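As a small taste of what those derivations involve, the hidden-layer gradient that generates the most questions follows directly from the chain rule. This is just a sketch in the course’s notation, where $*$ denotes the elementwise product:

```latex
% Two-layer network: z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, with a^{[1]} = g^{[1]}(z^{[1]}).
% Propagating dz^{[2]} = \partial L / \partial z^{[2]} back one layer:
\frac{\partial L}{\partial a^{[1]}} = {W^{[2]}}^{T} \, dz^{[2]}, \qquad
dz^{[1]} = \frac{\partial L}{\partial a^{[1]}} * {g^{[1]}}'(z^{[1]})
         = {W^{[2]}}^{T} dz^{[2]} * {g^{[1]}}'(z^{[1]})
```

The transpose appears because $z^{[2]}$ depends linearly on $a^{[1]}$ through $W^{[2]}$, while the elementwise product appears because $g^{[1]}$ is applied independently to each unit.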

Here are some good websites that will also cover the derivations of back propagation:

Here’s a website from Cornell that covers the derivation.

Here’s a good introduction to the matrix calculus you need in order to follow the above.

The Matrix Cookbook (a copy of which is hosted by the Univ of Waterloo) is also a valuable resource for general linear algebra topics as well as matrix calculus.

Here are some notes from Stanford CS231n that give a good overview and insights on back propagation.

Here’s a bit deeper dive on the math also from Stanford CS231n.

Here are notes from EECS 442 at Univ of Michigan.

Mentor Jonas Slalin also covers all this and more on his website. That’s just the first page in his series.
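The equations those references derive can also be sketched in code. Here is a minimal NumPy illustration (my own sketch, not the course’s assignment code) of forward and backward propagation for a one-hidden-layer network with a tanh hidden layer and sigmoid output, using the course’s shape conventions (X is (n_x, m), Y is (1, m)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    Z1 = W1 @ X + b1      # (n_h, m)
    A1 = np.tanh(Z1)      # hidden activation g1 = tanh
    Z2 = W2 @ A1 + b2     # (1, m)
    A2 = sigmoid(Z2)      # output activation
    return Z1, A1, Z2, A2

def backward(X, Y, W2, Z1, A1, A2):
    """Gradients of the mean cross-entropy loss, in course notation."""
    m = X.shape[1]
    dZ2 = A2 - Y                                   # sigmoid + cross-entropy simplification
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)           # tanh'(Z1) = 1 - A1^2, elementwise *
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2
```

Note the distinction that several of the linked threads ask about: `W2.T @ dZ2` is a matrix product (it routes gradients back through the linear layer), while `* (1 - A1 ** 2)` is elementwise (each hidden unit’s activation function acts on its own input).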

【Week3】 how do I calculate the "dz[1]" ?
Week 3 update_parameters, how to compute partial derivative J
Explanation for derived gradients for LSTM back-prop?
Gradient Descent Backpropagation Calculation
dZ[1] derivation
Derivation of dz=da* g'(z) ? or dz= a- y? how is derivation of dz[1] and dz[2] different?
Formal explanation of change of order in chain rule
A doubt on Week 3 Lecture
How we get dZ1 formula in backward propagation in a one hidden layer net
Calculating Backpropagation Equation for NN
Clarification on Gradient descent for neural networks
W2_A2_Calculation of Partial derivatives
Why a transpose needed?
How we got derivative of dz[1]=w[2]T.dz[2]*g[1]`(z[1])
Am I the only one completely lost at this derivative lesson?
How to choose between matrix multiplication and element wise multiplication during BackPropagation in Chain Rule?
Element-wise multiplication or dot product in backpropagation
Dividing by "m" in back propagation using vectorized implementation
Trouble understanding b vector back propagation
The intuition of db^[l]=dz^[l] and da^[l-1]=w^[l-1].dz^[l]
Back propagation why do we start from dZ2 and why transpose
Please help with some hints, as there is a difficulty achieving the correct graded assignment output
Transpose convolution backprop question
Neural networks Week4 Backprop da[l-1] proof
Can someone point me to basic for calculating dW[2] and db[2]
Derivative of Z1
W2_A2_Optimal "nudge" dx given for each node in computational graphs
C1_W4: Confused about da[l-1]
Week 3 - Please explain how we got to this backward propagation result?
Week 3,4: Why isn't 1/m part of dz^[L]?
W3 A1 | Ex-6 | Where were dZ[1] & dW[1] derivative equations introduced?
W3_A1_Ex-6_What's the link between dz[1] and w[2] equation?
Week 3 Backpropagation Derivation
Confusion in week 3 lesson for Backpropagation Derivations
Please explain $dz^{[1]} = {W^{[2]}}^{T} dz^{[2]} \times {g^{[1]}}^{'}(z^{[1]})$ in backpropagation
Derivation of backpropagation of RNN
Queries regarding backpropagation in RNNs
Backward propagation derivation
Should it have a rot180 on filter to calculate dA_prev?
Back Prop question
* element-wise operation in dZ[l]
Foundational math resources
Matrix Calculus
Partial Derivatives
Confused about Deep Network
Week 4 exercise 6.1
How did we calculate dz[2] in Backpropagation Intuition (8:34)?
FAQ: Frequently Asked Questions for all DLS Courses
Week 3: computing derivatives for shallow network
How to calculate dw(dL/dw)?
Course 1 Week 3 Backpropagation Intuition (Optional)
Deep learning from a mathematical view
Relu/LRelu does not work for forward propagation in Planar_data_classification_with_one_hidden_layer
Week 4 backward propagation da[l-1] derivation
Week4- assignment 2- Difference in gradient calculation for the last layer activation in neural networks
Backpropagation algorithm derivation