Hey guys,
I have a small question about this assignment, just out of curiosity. In the L_model_backward function, we use the linear_activation_backward function to calculate the gradients and store them in a dictionary, grads, which we then pass to the update_parameters function to update the parameters.
Instead of this approach, why don't we update the parameters directly inside the L_model_backward function? That way we wouldn't have to store the gradients in a dictionary at all (which could substantially reduce memory overhead), and we would also avoid iterating over all the layers a second time (which could save some computation time). A rough sketch of what I mean is below.
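This is only an illustrative sketch, not the actual assignment code: the real functions take more arguments and handle the caches differently, and the function name and learning_rate parameter here are just mine. It only shows the control flow of fusing the gradient computation and the parameter update into one backward loop.

```python
import numpy as np

def L_model_backward_and_update(AL, Y, caches, parameters, learning_rate):
    """Hypothetical fused version: compute each layer's gradients and apply
    the update immediately, instead of collecting them in a grads dict."""
    L = len(caches)  # number of layers
    # derivative of the cross-entropy cost with respect to AL
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    for l in reversed(range(1, L + 1)):
        # gradients for layer l (what linear_activation_backward already does)
        dA, dW, db = linear_activation_backward(
            dA, caches[l - 1],
            activation="sigmoid" if l == L else "relu")
        # ...and update this layer's parameters right away
        parameters["W" + str(l)] -= learning_rate * dW
        parameters["b" + str(l)] -= learning_rate * db

    return parameters
```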
Is this done in the assignment to make the code more modular and easier to understand, or is there something I am missing that prevents us from updating the parameters at the same time as we calculate the gradients?
Hi, @Elemento. I am not a coding-efficiency expert, but what you say makes sense. And you answered the question yourself. The assignments are written with the educational task paramount. We want learners to see the techniques broken down into their logical components so that they can be examined (i.e. understood) separately and as part of the whole, i.e. the “modularity” to which you refer. Of course, the code should be (and is, hopefully) as clean as is reasonable given that constraint. And that means it will not always be “Pythonic.” For example, you haven’t seen any list or dict comprehensions, have you?
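To give a sense of that trade-off, here is a small, made-up illustration (the layer sizes are arbitrary and this is not taken from any assignment): the explicit loop spells out every step, while the comprehension is shorter but harder to follow when you are still learning the algorithm.

```python
import numpy as np

layer_dims = [5, 4, 3]  # illustrative layer sizes, not from the assignment

# Explicit-loop style, as used in the assignments -- easy to step through:
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

# A more "Pythonic" dict-comprehension version of the same initialization:
parameters = {
    key: (np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
          if key.startswith("W")
          else np.zeros((layer_dims[l], 1)))
    for l in range(1, len(layer_dims))
    for key in ("W" + str(l), "b" + str(l))
}
```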
Good insights from both contributors on this thread. My experience with this material is that the initial exercises go to some trouble to expose the details of each step and computation. Next, we learn that you can leverage vectorized (matrix-native) NumPy operations to do away with explicit loops. Then, when using frameworks like TensorFlow, Keras, or PyTorch, you’ll find that the machinery of the matrix math is provided for you in highly optimized code that also natively supports distributed computing. The explicit for-loops, caching, separation of forward and backward propagation, etc. all go away or get completely encapsulated. The code you would use for a production deep learning project looks substantially different from what you see early in these classes.
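Just to illustrate that last point, here is a toy Keras sketch (the layer sizes and random data are made up, not from any assignment). The forward pass, backward pass, gradient storage, and parameter updates are all hidden behind compile() and fit():

```python
import numpy as np
import tensorflow as tf

# Toy data: 100 examples with 20 features, binary labels
X = np.random.rand(100, 20).astype("float32")
Y = np.random.randint(0, 2, size=(100, 1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(7, activation="relu"),
    tf.keras.layers.Dense(5, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# No explicit L_model_backward, grads dictionary, or update_parameters loop:
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, Y, epochs=10, batch_size=32)
```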