In Module 4, it says ConvNet backprop is out of scope for the course because it’s complicated and also because current frameworks support it. Why is it complicated? Does it use derivatives and the math explained for neural networks?
Regards.
Hello @gmazzaglia,
It does use derivatives. Andrew explained the math behind the fully connected (FC) layer, but we cannot simply copy it over to Conv because their mechanics are different: in an FC layer, each weight value applies to exactly one input value, while in a Conv layer, each weight value applies to multiple input values. This introduces a new “shared weight” concept to handle when deriving the backprop equations for a ConvNet: the gradient of a shared weight has to sum the contributions from every position where that weight was applied.
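To make the difference concrete, here is a minimal pure-Python sketch for a 1D case. The function names (`fc_backward_w`, `conv1d_forward`, `conv1d_backward_w`) and the toy numbers are my own illustration, not code from the course, and the loss is assumed to be simply the sum of the outputs.

```python
def fc_backward_w(x, dy):
    """FC layer: weight W[j][i] touches only x[i], so its gradient is one product."""
    return [[g * xi for xi in x] for g in dy]


def conv1d_forward(x, w):
    """'Valid' 1D convolution as used in ConvNets: y[i] = sum_k w[k] * x[i + k]."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


def conv1d_backward_w(x, dy, kernel_size):
    """Conv layer: w[k] is reused at every output position i,
    so its gradient accumulates dy[i] * x[i + k] over all i."""
    dw = [0.0] * kernel_size
    for i, g in enumerate(dy):
        for k in range(kernel_size):
            dw[k] += g * x[i + k]
    return dw


x = [1.0, 2.0, 3.0, 4.0]   # toy input
w = [0.5, -1.0]            # toy kernel with two shared weights

y = conv1d_forward(x, w)
dy = [1.0] * len(y)        # upstream gradient when loss = sum(y)
dw = conv1d_backward_w(x, dy, len(w))

print(y)    # [-1.5, -2.0, -2.5]
print(dw)   # [6.0, 9.0]
```

Each entry of `dw` is a sum of three terms because each kernel weight was used at three positions, whereas every entry returned by `fc_backward_w` is a single product.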
Is it complicated to explain within the scope and goals of the course? Is it complicated to learn? That depends, because our learners come from a wide range of academic backgrounds.
However, there is an optional section in the first assignment of week 1 (edit: I originally wrote week 2) where you can see an implementation of the Conv backprop.
Cheers!
Raymond
It’s actually the first assignment in Week 1 of DLS C4 (called Convolutional Model Step by Step) that has an optional (ungraded) section that steps you through constructing the backprop functions for a conv layer and a pooling layer. Then you would just need to paste all that together for all the layers. Professor Ng’s larger point is that here in DLS C4 we have graduated to using TensorFlow for all our real implementations and it (like PyTorch or any of the other major ML frameworks) handles back propagation for us. So we don’t really need to write that code ourselves, but it is useful to understand the fundamentals of how it works to inform our intuitions about how training works for ConvNets.
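For the pooling side, the key idea is that max pooling only passes gradient back to the input that "won" each window. Below is a minimal 1D sketch of that idea in pure Python. It is not the assignment's code (which works on 4D arrays); the function names and numbers are made up for illustration, and non-overlapping windows are assumed.

```python
def maxpool1d_forward(x, size):
    """Non-overlapping max pooling. Returns the outputs and, for each window,
    the index in x of the element that was the maximum."""
    outputs, argmax = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        best = max(range(size), key=lambda k: window[k])
        outputs.append(window[best])
        argmax.append(start + best)
    return outputs, argmax


def maxpool1d_backward(dy, argmax, n_inputs):
    """Route each upstream gradient to the input that produced the max;
    every other input in the window gets zero gradient."""
    dx = [0.0] * n_inputs
    for g, idx in zip(dy, argmax):
        dx[idx] += g
    return dx


x = [1.0, 3.0, 2.0, 0.5]
y, argmax = maxpool1d_forward(x, size=2)
dx = maxpool1d_backward([10.0, 20.0], argmax, len(x))

print(y)       # [3.0, 2.0]
print(argmax)  # [1, 2]
print(dx)      # [0.0, 10.0, 20.0, 0.0]
```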
Please read about TensorFlow's gradient tape (tf.GradientTape), which records the forward computation and then handles backprop automatically.
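As a rough illustration of what that looks like in practice, here is a small sketch that asks `tf.GradientTape` for the gradient of a tiny 1D convolution. The input and kernel values are made up, and the loss is assumed to be just the sum of the conv outputs; it requires TensorFlow 2.x to be installed.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0, 4.0])   # toy input
w = tf.Variable([0.5, -1.0])            # toy kernel (trainable)

with tf.GradientTape() as tape:
    # tf.nn.conv1d expects input as (batch, width, channels)
    # and filters as (filter_width, in_channels, out_channels).
    y = tf.nn.conv1d(
        tf.reshape(x, [1, 4, 1]),
        tf.reshape(w, [2, 1, 1]),
        stride=1,
        padding="VALID",
    )
    loss = tf.reduce_sum(y)

# The tape recorded the forward pass, so it can compute d(loss)/d(w) for us.
grad_w = tape.gradient(loss, w)
print(grad_w.numpy())
```

No backward function is written by hand here: the tape differentiates through the conv op, including the accumulation over all positions where each kernel weight was reused.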
Oh, yes! It’s week 1. Thanks, Paul!
Maybe it’s worth saying a bit more to answer this part of the question.
The math is fundamentally the same as in the Feed Forward Network case that we saw in DLS C1. You have a huge composite function and you take the derivative of each function in the series and then use the Chain Rule to construct the partial derivatives at each layer of the final error w.r.t. the parameters of that layer.
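To spell that out in symbols, here is a sketch for a single training example using C1-style notation. The symbols are assumptions for illustration: $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$, $a^{[l]} = g^{[l]}(z^{[l]})$, and $\mathcal{L}$ is the final loss.

$$
\frac{\partial \mathcal{L}}{\partial z^{[l]}} = \left( W^{[l+1]\,T} \, \frac{\partial \mathcal{L}}{\partial z^{[l+1]}} \right) \odot g^{[l]\prime}\!\left(z^{[l]}\right),
\qquad
\frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{\partial \mathcal{L}}{\partial z^{[l]}} \, a^{[l-1]\,T}
$$

For a conv layer the same Chain Rule applies, but because each filter weight is reused at every spatial position, its gradient is a sum over those positions. For a single-channel filter $w$ with stride 1 and no padding, that is:

$$
\frac{\partial \mathcal{L}}{\partial w_{p,q}} = \sum_{h} \sum_{v} \frac{\partial \mathcal{L}}{\partial z_{h,v}} \; a^{\text{prev}}_{h+p,\; v+q}
$$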
The only thing that makes it more complicated is that there is a lot more variability in the architecture of a ConvNet than there is in a simple Feed Forward net. In the C1 case, all that can change is the number of layers, the number of units in each layer, and the activation functions. But now in C4, just look at all the different architectures we have seen in the last 4 weeks: ResNet, U-Net, YOLO, Siamese networks. Eeeek! So writing the code is going to require a lot of conditional logic or else you have to create a custom implementation for each architecture.
We can understand the fundamentals of how to take the derivative of a conv layer or a pooling layer in the C4 W1 A1 assignment, which gives us the intuition about what is different here. Then the question is whether it’s really worth it to build all the scaffolding to handle the architectural variability or just let TF handle it for us.