Derive backpropagation in CNN

jackchan.hk · July 26, 2022, 2:19pm

Dear Mentors/classmates,

In backpropagation, our goal is to find dL/dW and dL/dB and update the weights and biases using gradient descent. To calculate dL/dW, we first need dL/dZ so that we can apply the chain rule.

dL/dW = dL/dZ * dZ/dW ???

My first question: can we still apply the chain rule when W is a tensor (e.g. 3x3x3, i.e. f x f x 3)?

My second question:
when Z = A * x, where A is a matrix and x is a vector, dZ/dx = A, which is a matrix;
when Z = A * X, where Z, A and X are all matrices, dZ/dX = A, which is also a matrix.

Yet when Z = W *cross X + B, W is NOT multiplied by X; it is cross-correlated with X. How can I express cross-correlation as a multiplication so that I can apply the chain rule? Or is there a special derivative rule for cross-correlation?

Regards,
Jack

anon57530071 · July 27, 2022, 11:51am

If you did not take the first course of this specialization, it may be worth at least quickly looking at its lessons on backpropagation. There are lots of hints in there.

Your questions are basically about linear algebra, so I will start with a short recap.

Scalar to Scalar:
$x \in \mathbb{R},\ y \in \mathbb{R}$: the derivative is $\frac{\partial y}{\partial x} \in \mathbb{R}$

Vector to Scalar:
$x \in \mathbb{R}^N,\ y \in \mathbb{R}$: the derivative is the gradient, $\frac{\partial y}{\partial x} \in \mathbb{R}^N$, with $(\frac{\partial y}{\partial x})_n = \frac{\partial y}{\partial x_n}$

Vector to Vector:
$x \in \mathbb{R}^N,\ y \in \mathbb{R}^M$: the derivative is the Jacobian, $\frac{\partial y}{\partial x} \in \mathbb{R}^{N\times M}$, with $(\frac{\partial y}{\partial x})_{n,m} = \frac{\partial y_m}{\partial x_n}$
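
As a quick sanity check of the vector-to-vector case, here is a minimal NumPy sketch for y = A x. The helper `numerical_jacobian` and the sizes are my own choices for illustration, not something from the course. With the N x M layout above, the Jacobian of y = A x comes out as the transpose of A, so "dZ/dx = A" only holds up to the layout convention you pick.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian in the layout used above: J[n, m] = d y_m / d x_n."""
    y0 = f(x)
    J = np.zeros((x.size, y0.size))
    for n in range(x.size):
        step = np.zeros_like(x)
        step[n] = eps
        # Central difference along input coordinate n fills row n of J.
        J[n, :] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # M = 2 outputs, N = 3 inputs
x = rng.standard_normal(3)

J = numerical_jacobian(lambda v: A @ v, x)
print(J.shape)                    # (3, 2), i.e. N x M
print(np.allclose(J, A.T))        # True: in this layout the Jacobian is A transposed
```

If you prefer the M x N layout (rows indexed by outputs), the same check gives A itself; either is fine as long as you stay consistent.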

In the case of backprop, the loss is a scalar, so the chain rule applies without trouble even when W is a tensor: dL/dW simply has one entry per entry of W.
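
To make that concrete for the layer in the question, here is a sketch of the element-wise chain rule. I am assuming a single channel, stride 1 and no padding, and the index names i, j (output position) and a, b (filter position) are my own notation rather than the course's.

```latex
Z_{i,j} = \sum_{a}\sum_{b} W_{a,b}\, X_{i+a,\,j+b} + B
\quad\Rightarrow\quad
\frac{\partial Z_{i,j}}{\partial W_{a,b}} = X_{i+a,\,j+b}

\frac{\partial L}{\partial W_{a,b}}
  = \sum_{i}\sum_{j} \frac{\partial L}{\partial Z_{i,j}}\,
    \frac{\partial Z_{i,j}}{\partial W_{a,b}}
  = \sum_{i}\sum_{j} \frac{\partial L}{\partial Z_{i,j}}\, X_{i+a,\,j+b}

\frac{\partial L}{\partial B} = \sum_{i}\sum_{j} \frac{\partial L}{\partial Z_{i,j}}
```

The expression for dL/dW has the same form as the forward pass with W replaced by dL/dZ, so under these assumptions dL/dW is X cross-correlated with dL/dZ. With several input channels, the same sum is taken separately for each channel of W.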

For derivatives of dot products, inner products, summations, etc., we can start by breaking the expression down into individual elements and computing the partial derivatives. There is also a good summary that I sometimes refer to, called The Matrix Cookbook.
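
On the question of writing cross-correlation as a multiplication: one common trick (often called im2col) is to unroll every window of X into a row of a matrix P, so that the forward pass becomes an ordinary matrix product and the usual matrix chain rule applies. The NumPy sketch below assumes a single channel, stride 1 and no padding; the helper names `cross_correlate` and `patches` and the loss L = sum(Z**2) are made up for the check, not taken from the course.

```python
import numpy as np

def cross_correlate(X, W, B=0.0):
    """Valid cross-correlation, stride 1: Z[i, j] = sum_{a,b} W[a, b] * X[i+a, j+b] + B."""
    f_h, f_w = W.shape
    out_h = X.shape[0] - f_h + 1
    out_w = X.shape[1] - f_w + 1
    Z = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            Z[i, j] = np.sum(W * X[i:i + f_h, j:j + f_w]) + B
    return Z

def patches(X, f_h, f_w):
    """Unroll X so that each row is one flattened f_h x f_w window (im2col)."""
    out_h = X.shape[0] - f_h + 1
    out_w = X.shape[1] - f_w + 1
    rows = [X[i:i + f_h, j:j + f_w].ravel()
            for i in range(out_h) for j in range(out_w)]
    return np.array(rows)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))
B = 0.5

Z = cross_correlate(X, W, B)
P = patches(X, 3, 3)                         # one row per output position

# Forward pass written as an ordinary matrix product.
same_forward = np.allclose(Z.ravel(), P @ W.ravel() + B)

# Hypothetical loss for the check: L = sum(Z**2), so dL/dZ = 2 * Z.
dZ = 2 * Z
dW = (P.T @ dZ.ravel()).reshape(W.shape)     # matrix chain rule on the unrolled form
dB = dZ.sum()                                # B is added to every output element

# Same gradient from the element-wise view: cross-correlate X with dL/dZ.
dW_elementwise = cross_correlate(X, dZ)

def loss(W_):
    return np.sum(cross_correlate(X, W_, B) ** 2)

# Finite-difference check of dL/dW, one filter entry at a time.
eps = 1e-6
dW_numeric = np.zeros_like(W)
for a in range(W.shape[0]):
    for b in range(W.shape[1]):
        step = np.zeros_like(W)
        step[a, b] = eps
        dW_numeric[a, b] = (loss(W + step) - loss(W - step)) / (2 * eps)

print(same_forward)
print(np.allclose(dW, dW_elementwise))
print(np.allclose(dW, dW_numeric, atol=1e-5))
```

All three checks should agree, which suggests there is no special theorem needed: the unrolled matrix form and the element-by-element sums are two ways of writing the same chain rule.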

If you want to study math for Backprop, this and this should be a good starting point. Those cover more than Andrew’s intuitions.
