Week 4 Exercise 9 - Backpropagation, L_model

Hi, Code with Africa.

We do backprop to propagate the total loss back through the NN (neural network) in order to work out how much of the loss each node is responsible for. It also gives us the derivatives of the cost w.r.t. the parameters we used during the forward pass, which is what we need to update them. What we are really implementing here is the chain rule of calculus.
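As a concrete instance (using the 2-layer notation from this assignment, purely for illustration), the derivative of the per-example loss with respect to the first-layer weights factors through every quantity computed in the forward pass:

$$\frac{\partial \mathcal{L}}{\partial W^{[1]}} = \frac{\partial \mathcal{L}}{\partial a^{[2]}} \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial a^{[1]}} \cdot \frac{\partial a^{[1]}}{\partial z^{[1]}} \cdot \frac{\partial z^{[1]}}{\partial W^{[1]}}$$

Each factor comes from one step of the forward pass, which is exactly why the cached forward values are needed when going backward.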

The instructions in the notebook clearly describe how you build the neural network (there is a short sketch of this loop right after the list below):

Reminder: The general methodology to build a Neural Network is to:

  1. Define the neural network structure ( # of input units, # of hidden units, etc).
  2. Initialize the model’s parameters
  3. Loop:
    • Implement forward propagation
    • Compute loss
    • Implement backward propagation to get the gradients
    • Update parameters (gradient descent)
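Here is a minimal, self-contained sketch of those four loop steps for a tiny 2-layer network (tanh hidden layer, sigmoid output). This is just an illustration of the methodology, not the assignment's graded code, and the toy data and hyperparameters are made up:

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(2, 200)                          # 2 features, 200 examples
Y = (X[0:1] * X[1:2] > 0).astype(float)              # toy labels

n_x, n_h, n_y = 2, 4, 1                              # step 1: network structure
W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))   # step 2: init
W2, b2 = np.random.randn(n_y, n_h) * 0.01, np.zeros((n_y, 1))

m, lr = X.shape[1], 1.0
for i in range(1000):                                # step 3: loop
    Z1 = W1 @ X + b1; A1 = np.tanh(Z1)               # forward propagation
    Z2 = W2 @ A1 + b2; A2 = 1 / (1 + np.exp(-Z2))
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))   # compute loss
    dZ2 = A2 - Y                                     # backward propagation
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m; db1 = dZ1.sum(axis=1, keepdims=True) / m
    W1 -= lr * dW1; b1 -= lr * db1                   # update (gradient descent)
    W2 -= lr * dW2; b2 -= lr * db2
    if i % 200 == 0:
        print(f"iteration {i}: cost {cost:.4f}")
```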

Now, to your query on L: it is the per-example loss, i.e. the vector of loss values the loss function produces for each training example, while J (the cost) is the average of those L values across the m samples used to train the model. That is where the factor of 1/m in the given formula comes from:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log\left(a^{[2](i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - a^{[2](i)}\right)\right)$$
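In code, that averaging is just the sum over examples divided by m. A minimal sketch (assuming A2 holds the output-layer activations and Y the labels as (1, m) arrays):

```python
import numpy as np

def compute_cost(A2, Y):
    m = Y.shape[1]
    logprobs = Y * np.log(A2) + (1 - Y) * np.log(1 - A2)   # per-example loss L
    return -np.sum(logprobs) / m                           # average over m -> J

A2 = np.array([[0.9, 0.2, 0.7]])
Y = np.array([[1.0, 0.0, 1.0]])
print(compute_cost(A2, Y))   # roughly 0.228
```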

To your second query on why we start with the hidden layers: backprop retraces the computations of the forward pass, but in the opposite direction. During forward prop, each layer stores a cache of the form (linear_cache, activation_cache); the backward pass then starts at the output layer and works back through the hidden layers, consuming those caches to compute the gradients. In that sense, backprop runs in the opposite direction to forward prop.
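A toy illustration of that cache structure (not the graded code, just the idea): the forward pass appends one (linear_cache, activation_cache) tuple per layer, and the backward pass walks the list in reverse, starting from the last layer.

```python
import numpy as np

np.random.seed(1)
caches = []
A = np.random.randn(3, 2)                      # toy input: 3 features, 2 examples
for W, b in [(np.random.randn(4, 3), np.zeros((4, 1))),
             (np.random.randn(1, 4), np.zeros((1, 1)))]:
    Z = W @ A + b
    caches.append(((A, W, b), Z))              # (linear_cache, activation_cache)
    A = 1 / (1 + np.exp(-Z))                   # sigmoid just for this toy example

for l in reversed(range(len(caches))):         # backward: last layer first
    (A_prev, W, b), Z = caches[l]
    print(f"layer {l + 1}: reuses A_prev {A_prev.shape}, W {W.shape}, Z {Z.shape}")
```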

Backprop is the essence of neural network training: it is what allows the weights of the network to be adjusted based on the error, also termed the loss.

Check this thread, which discusses how backprop works in detail. It will give you a very clear idea of how it is implemented for linear_activation_backward, which then calls linear_backward, relu_backward and sigmoid_backward.
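For orientation, here is a hedged sketch of how those helpers fit together: linear_activation_backward unpacks the cache, applies the activation's backward step (relu_backward or sigmoid_backward) to get dZ, then calls linear_backward to get dA_prev, dW, db. The names mirror the notebook, but treat the exact signatures as my assumption, not the graded solution.

```python
import numpy as np

def relu_backward(dA, Z):
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0                       # gradient is 0 where ReLU was inactive
    return dZ

def sigmoid_backward(dA, Z):
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)              # chain rule through the sigmoid

def linear_backward(dZ, linear_cache):
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    linear_cache, Z = cache              # cache = (linear_cache, activation_cache)
    dZ = relu_backward(dA, Z) if activation == "relu" else sigmoid_backward(dA, Z)
    return linear_backward(dZ, linear_cache)

# toy usage with made-up shapes
A_prev = np.random.randn(3, 2); W = np.random.randn(1, 3); b = np.zeros((1, 1))
Z = W @ A_prev + b
dA = np.random.randn(1, 2)
dA_prev, dW, db = linear_activation_backward(dA, ((A_prev, W, b), Z), "sigmoid")
```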