C4 W2 Why ResNets Work?

In the lecture video, Prof. Ng mentions that for ResNets to work, the two outputs being added together must have the same dimensions. For example, in a[l+2] = g(z[l+2] + a[l]), a[l] and z[l+2] must have the same dimensions, which means that nh and nw (the height and width of the output volume) must be the same for both. For that purpose, we use padding.

However, the output is not 2-dimensional: there is also a 3rd dimension, nc (the number of filters), so the shape of a[l] and z[l+2] is nh x nw x nc. If we use padding to keep the dimensions the same, it won’t affect the number of filters, the 3rd dimension, if I’m correct. So is it right to assume that to keep the dimensions of a[l] and z[l+2] the same, we can use padding for the first 2 dimensions, but for the 3rd dimension we have to manually make sure the number of filters is the same?

Yes, you are correct that the decision of the number of filters to use at each layer is up to you. So if you want them to match, you have to make them match by how you make those choices.

You could argue that it’s the same with the padding, in the sense that you know what the goal is: the shapes have to match by the time you get to the point in the architecture where they are added together. So for the h and w dimensions, you have to make sure those come out the same as well. It’s actually simpler to manage the “channel” dimension.
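To make the shape bookkeeping concrete, here is a minimal sketch in plain Python using the standard conv output-size formula, floor((n + 2p - f) / s) + 1. The helper names (conv_output_shape, same_pad) and the 28 x 28 x 64 input shape are made up for illustration and are not from the lecture or the assignment.

```python
def conv_output_shape(n_h, n_w, f, pad, stride, n_filters):
    """Output shape (n_h, n_w, n_c) of a conv layer with f x f filters."""
    out_h = (n_h + 2 * pad - f) // stride + 1
    out_w = (n_w + 2 * pad - f) // stride + 1
    # The channel dimension of the output is simply the number of filters.
    return (out_h, out_w, n_filters)


def same_pad(f):
    """Padding that preserves height and width for odd f and stride 1."""
    return (f - 1) // 2


# Hypothetical shape of a[l]: 28 x 28 with 64 channels.
a_l = (28, 28, 64)

# Main path: two 3x3 conv layers with "same" padding, stride 1, 64 filters each.
z_l1 = conv_output_shape(a_l[0], a_l[1], f=3, pad=same_pad(3), stride=1, n_filters=64)
z_l2 = conv_output_shape(z_l1[0], z_l1[1], f=3, pad=same_pad(3), stride=1, n_filters=64)

# Same layers, but the last one uses 128 filters: padding cannot fix this.
z_l2_wide = conv_output_shape(z_l1[0], z_l1[1], f=3, pad=same_pad(3), stride=1, n_filters=128)

# Same layers with no padding: height and width shrink instead.
z_l1_valid = conv_output_shape(a_l[0], a_l[1], f=3, pad=0, stride=1, n_filters=64)

print(z_l2, z_l2 == a_l)            # (28, 28, 64) True
print(z_l2_wide, z_l2_wide == a_l)  # (28, 28, 128) False
print(z_l1_valid)                   # (26, 26, 64)
```

Padding only controls the first two entries of the shape; the third is set purely by how many filters you choose, which is why it has to be matched by design.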

Got it… Thank you again for answering my query.