I completed this but wanted to make sure I understand it right! Please correct me if I am mistaken somewhere.
Step 1: It calculates dAL = dL/dAL, the derivative of the cross-entropy cost with respect to the final activation AL, using the np.divide formula dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)).
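For reference, here is a minimal runnable sketch of that Step 1 formula (the toy AL and Y values below are hypothetical, just to make it executable):

```python
import numpy as np

# Hypothetical toy values, shape (1, m), just to make the snippet runnable.
AL = np.array([[0.8, 0.1, 0.6]])   # sigmoid outputs of the last layer
Y  = np.array([[1.0, 0.0, 1.0]])   # true labels

# Element-wise derivative of the cross-entropy loss
#   L = -(Y*log(AL) + (1-Y)*log(1-AL))
# with respect to AL; the 1/m averaging is applied later, inside the linear backward step.
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
print(dAL)   # one dL/dAL value per example
```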
Step 2: We calculate the gradients for the Lth layer. Since the last activation function is sigmoid, we compute them using linear_activation_backward with the Lth layer's cache.
- In this step we calculate dA[L-1], dW[L], and db[L] for the Lth layer.
- The caches are basically the values (A[l-1], W[l], b[l]) and Z[l] that we stored for each layer during forward propagation (see the sketch below).
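To make Step 2 concrete, here is a minimal sketch of what linear_activation_backward and the per-layer cache might look like. The function names follow the assignment, but the bodies and the exact cache layout here are my own reconstruction, not the reference code:

```python
import numpy as np

def sigmoid_backward(dA, Z):
    # dZ = dA * sigmoid'(Z), where sigmoid'(Z) = s * (1 - s)
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def relu_backward(dA, Z):
    # ReLU passes the gradient through only where Z > 0
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def linear_backward(dZ, linear_cache):
    # linear_cache holds the inputs of the linear step: (A_prev, W, b)
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]
    dW = (1 / m) * dZ @ A_prev.T                       # dL/dW for this layer
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)   # dL/db for this layer
    dA_prev = W.T @ dZ                                 # gradient sent back to the previous layer
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    # cache = (linear_cache, Z), exactly what forward propagation stored for this layer
    linear_cache, Z = cache
    dZ = sigmoid_backward(dA, Z) if activation == "sigmoid" else relu_backward(dA, Z)
    return linear_backward(dZ, linear_cache)
```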
Step 3: We loop from layer L-1 down to 1, traversing the layers in reverse.
- For each such l (small L), we calculate dA[l-1], dW[l], and db[l] using the previously calculated dA[l], that layer's cache, and the activation function "relu".
- Then we keep storing all the calculated gradients in the grads dictionary: dA[0 … L-1], dW[1 … L], and db[1 … L].
- Then we return grads so that we can use these gradients to update the parameters in each training iteration later.
- This function performs one full backward propagation pass, from layer L down to layer 1.
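And here is a sketch of how I think Steps 1–3 fit together in one function, assuming the imports and the linear_activation_backward helper from the sketch after Step 2 are in scope. The grads key names and the exact indexing are my assumptions, not necessarily the reference solution:

```python
def L_model_backward(AL, Y, caches):
    """One full backward pass; caches[l] is the cache that layer l+1 stored during forward prop."""
    grads = {}
    L = len(caches)               # number of layers
    Y = Y.reshape(AL.shape)

    # Step 1: derivative of the cost with respect to the final activation
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Step 2: sigmoid -> linear backward for the output layer L
    grads["dA" + str(L - 1)], grads["dW" + str(L)], grads["db" + str(L)] = \
        linear_activation_backward(dAL, caches[L - 1], "sigmoid")

    # Step 3: relu -> linear backward for layers L-1 down to 1
    for l in reversed(range(L - 1)):
        dA_prev, dW, db = linear_activation_backward(
            grads["dA" + str(l + 1)], caches[l], "relu")
        grads["dA" + str(l)] = dA_prev
        grads["dW" + str(l + 1)] = dW
        grads["db" + str(l + 1)] = db

    return grads
```

The loop index l is 0-based over the caches list, which is why caches[l] pairs with dW[l + 1] and db[l + 1] when we talk about layers 1 … L.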