1. I understand how we get from 227 to 55.
But can you please tell me how we get from 3 to 96 and from 96 to 256?
2. I understand that we are not changing Nc in max pooling, but can you please tell me why?
3. Why does 13 * 13 * 384 stay the same?
4. How does it get from 9216 to 4096, then to 4096 again, and then to 1000?
Thank You.
@paulinpaloalto, can you help?
Thank you in advance.
-
For the channels on the Conv layers, it is the same answer I gave you yesterday: the number of output channels from a given Conv layer is always a choice that you make. It is determined by how many filters you specify for that layer. The choice is informed by experimentation. So Prof Ng is showing us an architecture that he and other people have used that turns out to work on a range of image classification problems. This knowledge of “what works” was probably gained from quite a lot of experimentation.
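To make that concrete, here is a minimal sketch of how the filter count fixes the output channels. The filter sizes (11 and 5) are the values usually quoted for this architecture and are an assumption here, as are the helper names, which are made up for illustration.

```python
def conv_weight_shape(f, n_c_prev, n_filters):
    """Weights of one Conv layer: n_filters filters, each of size f x f x n_c_prev."""
    return (f, f, n_c_prev, n_filters)


def conv_output_channels(weight_shape):
    # Each filter produces exactly one output channel,
    # so the output channel count is simply the number of filters.
    return weight_shape[-1]


# Assumed filter sizes; the filter counts (96, 256) are the designer's choice.
w1 = conv_weight_shape(f=11, n_c_prev=3, n_filters=96)
w2 = conv_weight_shape(f=5, n_c_prev=96, n_filters=256)

print(conv_output_channels(w1))  # 96
print(conv_output_channels(w2))  # 256
```

Note that each filter's depth has to match the input channels (3, then 96), but how many filters you use is entirely up to you.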
-
But pooling layers are fundamentally different. They are defined to work “per channel”: each channel is pooled independently, so the number of channels is preserved and only the h and w dimensions are reduced. That is just how they are defined.
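Here is a small NumPy sketch of max pooling that shows the channel count surviving. The 3 x 3 window with stride 2 is an assumption (the usual values quoted for this network), and `max_pool` is a hand-written illustration, not a library function.

```python
import numpy as np


def max_pool(x, f, stride):
    """Max pooling on an (h, w, c) array, applied to each channel independently."""
    h, w, c = x.shape
    out_h = (h - f) // stride + 1
    out_w = (w - f) // stride + 1
    out = np.empty((out_h, out_w, c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + f, j * stride:j * stride + f, :]
            # Take the max over the two spatial axes only;
            # the channel axis is never mixed or reduced.
            out[i, j, :] = window.max(axis=(0, 1))
    return out


x = np.random.rand(55, 55, 96)
y = max_pool(x, f=3, stride=2)
print(y.shape)  # (27, 27, 96)
```

There are no filters and no learned parameters here, so there is nothing that could change the number of channels.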
-
The padding and stride for the h and w dimensions of a Conv layer are always a choice: you can use “same” padding or no padding, with whatever stride you choose. With “same” padding and a stride of 1, the h and w dimensions come out unchanged. The reason behind the choice is whether the designers want to reduce the h and w dimensions at that point in the architecture or not. That is determined by testing what works. You may have to adjust the number and shape of the layers based on how things work with your problem.
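The standard output-size formula, floor((n + 2p - f) / s) + 1, can be checked with a few lines of Python. The filter sizes, strides, and padding values below are assumptions (the commonly quoted ones for this network), and the function names are hypothetical.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output height/width of a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * pad - f) // stride + 1


def same_padding(f):
    """Padding that keeps h and w unchanged for an odd filter size f with stride 1."""
    return (f - 1) // 2


# No padding, stride 4: the spatial size shrinks.
print(conv_output_size(227, f=11, stride=4, pad=0))  # 55

# "Same" padding, stride 1: the spatial size is preserved.
print(conv_output_size(27, f=5, stride=1, pad=same_padding(5)))  # 27
print(conv_output_size(13, f=3, stride=1, pad=same_padding(3)))  # 13
```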
-
For Conv nets that are implementing classifiers, it is a common architecture to finish with a few Fully Connected layers. Those are exactly the same as we studied in Course 1. Here again, the number of neurons in each hidden FC layer is a choice (a hyperparameter): you generally reduce them in each layer, but it’s a choice based on what works. Only the two ends are fixed. For the first FC layer, the input dimension is determined by how many total neurons there are in the last Conv or pooling layer: you just “unroll” or “flatten” the tensor into a vector to get the inputs to the first FC layer. 6 * 6 * 256 = 9216. And of course the number of neurons in the output layer is determined by the number of label classes you have (1000 in this example).
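As a quick sketch of the flatten step and the FC layer shapes, assuming the Course 1 convention that W for a layer has shape (n_out, n_in). The variable names are made up for illustration.

```python
import numpy as np

# Output of the last pooling layer for one image: h x w x channels.
last_pool = np.zeros((6, 6, 256))

# "Unroll" / "flatten" the tensor into a vector.
flat = last_pool.reshape(-1)
print(flat.shape)  # (9216,)

# 9216 is forced by the flatten and 1000 by the number of classes;
# the two 4096 values in between are hyperparameters.
layer_sizes = [flat.shape[0], 4096, 4096, 1000]

# Weight shapes for each FC layer, as (n_out, n_in) pairs.
weight_shapes = [
    (n_out, n_in) for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
print(weight_shapes)  # [(4096, 9216), (4096, 4096), (1000, 4096)]
```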