How is the fully connected layer in CNN usually designed?
Are there any rules for the design of fully connected layers?
How many hidden layers does it usually have?
thanks
I think the only rules are that the number of inputs is what you get by flattening the output of the last Conv or Pooling layer, and that the number of outputs is determined by what the network is classifying. If it’s a multiclass classifier, then the number of output neurons is the number of output classes. Beyond that, it’s just a question of determining what works for the particular problem you are trying to solve. Prof Ng gives a number of examples in Week 1 and Week 2 that have several Fully Connected layers at the end which look similar to the FC networks we saw in C1 and C2: the number of neurons decreases as you proceed through the FC layers.

You can also examine existing networks that are known to work. It is especially helpful if you can find one that is doing something at least similar to the problem you are trying to solve.
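To make that concrete, here is a minimal sketch of that pattern in Keras. This is my own illustration, not code from the course; the input volume shape, the layer widths, and the class count are all arbitrary assumptions:

```python
from tensorflow.keras import layers, models

num_classes = 10  # assumption: a 10-class problem

model = models.Sequential([
    layers.Input(shape=(14, 14, 16)),      # made-up output volume of the last pool layer
    layers.Flatten(),                      # 14*14*16 = 3136 inputs to the FC part
    layers.Dense(256, activation="relu"),  # FC widths typically shrink...
    layers.Dense(64, activation="relu"),   # ...as you approach the output
    layers.Dense(num_classes, activation="softmax"),  # one neuron per class
])
model.summary()
```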
Hello, Good day
I am also trying to understand the Fully Connected layers explained in Week 1, in the “CNN Example” video (around 6:30).
I understood everything up to the point where the 5x5x16 volume is flattened into 400 neurons.
But I am trying to understand:
- How are the 120 neurons in Fully Connected 3 and the 84 in Fully Connected 4 determined?
- After flattening to 400 neurons, what operation is performed in FC3 and FC4, and what is the output?
Requesting your help. Thanks
Note: I have not yet started Week 2 of the course.
I am not aware of any hard and fast rules for what the FC layers at the end of a ConvNet look like. Suppose you were designing a feed forward Fully Connected network that had nothing to do with a ConvNet. What are the rules there for how many layers you need, how many neurons there should be in each layer, and which activation functions you should use? There aren’t any hard and fast rules, right? The methodology is to start by finding an existing solution that worked well on a problem as similar to your new problem as you can find. You start with that architecture, see how well it works, and then tune and adjust the network until it does well on your problem. For the specific case of FC networks, that is all the material that was covered in Week 1 and Week 2 of DLS Course 2, right?
The same ideas apply here.
The 120 and 84 are just hyperparameters, meaning choices that Prof Ng made based on past experience of what worked well in similar cases. He may then have tuned them further after seeing the results of training.
You can think of it as a Feed Forward FC net, as in Course 1, that happens to take as its input the “flattened” output of the previous Conv layers. That input is distilled information about the features that the Conv layers have detected in the input images (or whatever the inputs are). The goal of the FC layers is then to further process that information to produce the output classification you want: either a “yes/no” answer if it’s a binary classification, or a multiclass “softmax” output if the classifier needs to identify multiple different classes.
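As a concrete illustration of that “further processing”, here is a minimal NumPy sketch of the forward pass through those FC layers. The 400 → 120 → 84 → 10 shapes match the lecture example; the random weights and the ReLU/softmax activation choices are my assumptions for illustration:

```python
import numpy as np

a_flat = np.random.randn(400, 1)         # flattened 5x5x16 conv output

W3, b3 = np.random.randn(120, 400), np.zeros((120, 1))
a3 = np.maximum(0, W3 @ a_flat + b3)     # FC3: z = W·a + b, then ReLU -> shape (120, 1)

W4, b4 = np.random.randn(84, 120), np.zeros((84, 1))
a4 = np.maximum(0, W4 @ a3 + b4)         # FC4: same operation -> shape (84, 1)

W5, b5 = np.random.randn(10, 84), np.zeros((10, 1))
z5 = W5 @ a4 + b5
y_hat = np.exp(z5) / np.sum(np.exp(z5))  # softmax over 10 classes (e.g. digits)
```

So each FC layer is just a matrix multiply plus bias followed by an activation, exactly as in the FC networks from Course 1; only the source of the input vector (the flattened conv output) is new.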
I’m sorry if the answers above are not satisfying, but that is the reality. We learn from what others before us have done and then develop our own experience on the problems we need/want to solve. There are some cases in which you can take an existing network that has already been trained and apply it to a new problem with some further additions. That is the topic “Transfer Learning” which will be covered in Week 2 of the course. So maybe the best idea is to “hold that thought” and continue to learn what Prof Ng will teach about Transfer Learning as well as additional pre-existing ConvNet architectures like Residual Networks in Week 2. The state of the art at this point is very rich and there are lots of “worked examples” to learn from and use as the starting points for creating your own new solutions.
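Just to preview what Transfer Learning looks like in practice, here is a minimal Keras sketch. The choice of base network, input size, and new head widths are all assumptions for illustration; Week 2 covers the real details:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load a pre-trained conv base without its original FC head
base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False                      # freeze the pre-trained conv layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # collapse feature maps to a vector
    layers.Dense(64, activation="relu"),    # new FC head, trained from scratch
    layers.Dense(5, activation="softmax"),  # e.g. a new 5-class problem
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```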
Thanks a lot for your support and time. The information above was helpful; I will also go through the references you mentioned.