Why are FC layers always at the end in a CNN?

I want to know what the purpose of the FC layers in a CNN is. Why are they always kept at the end of CNN architectures? What exactly is their role, and what benefit do we gain from them? Could we directly attach the softmax layer at the end of the convolutions (by flattening) without any FC layer? What made CNN architects add them? What exactly is the intuition, and what purpose do they serve in a CNN?

There are a number of layers (pun intended) to the response here:

Prof Ng discussed this in the lecture where he introduced this architecture, didn’t he? It’s been a couple of years since I watched those lectures, so I don’t remember exactly what he said. Maybe we should both watch them again. :nerd_face: But I think the basic intuition is that if the purpose of your CNN is something like binary or softmax classification, it makes sense to use Convolutional layers early in the network to extract information that is more geometric or spatial in nature. That’s what convolutions are really powerful for. But once you’ve extracted that level of information in the form of “high level features” like “I’ve seen something that I think looks like a cat’s tail”, then putting all those together with a few FC layers can be a powerful and efficient way to distill the final answer down to “cat”, “dog”, “horse”, “elephant” or “zebra” or whatever your final softmax classes are.
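To make that pattern concrete, here is a minimal sketch in PyTorch of the usual "conv stage, then flatten, then FC, then softmax" layout. All the sizes (channel counts, 32x32 input, 5 classes) are just illustrative assumptions, not taken from any particular lecture or paper:

```python
import torch
import torch.nn as nn

# Illustrative only: a tiny classifier with the usual conv -> flatten -> FC -> softmax layout.
# All sizes (channels, 5 classes, 32x32 RGB input) are made-up assumptions.
model = nn.Sequential(
    # Convolutional stage: extracts spatial / geometric features
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                 # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                 # 16x16 -> 8x8
    # Fully connected stage: combines the high-level features into class scores
    nn.Flatten(),                    # 32 * 8 * 8 = 2048 features
    nn.Linear(32 * 8 * 8, 64),
    nn.ReLU(),
    nn.Linear(64, 5),                # 5 classes, e.g. cat, dog, horse, elephant, zebra
)

x = torch.randn(1, 3, 32, 32)        # one fake RGB image
logits = model(x)
probs = torch.softmax(logits, dim=1)
print(probs.shape)                   # torch.Size([1, 5])
```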

The next level point is that there aren’t always FC layers in a CNN. It all depends on what you are trying to do. If you’ve only been through week 1 of the course, you might want to just “hold that thought”. In Week 3 and Week 4, you’ll see architectures like U-Net and YOLO where the outputs of a ConvNet are multi-dimensional arrays, not just 1D classifications. In those cases there are no FC layers in the architecture.
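To show the contrast, here is a hedged sketch of a fully convolutional head. This is not the actual U-Net or YOLO architecture, just an illustration of a network whose output is a multi-dimensional array (here, per-pixel class scores) rather than a 1D class vector, so there is no FC layer anywhere:

```python
import torch
import torch.nn as nn

# Illustrative only: a fully convolutional network whose output is a
# per-pixel class map rather than a flat class vector, so no FC layers.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 4, kernel_size=1),   # 1x1 conv: 4 hypothetical classes per pixel
)

x = torch.randn(1, 3, 64, 64)
out = fcn(x)
print(out.shape)                       # torch.Size([1, 4, 64, 64]) -- a 3D output per image
```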

The lecture “What are Deep ConvNets Learning?” in Week 4 is also worth watching to build intuition about how CNNs work (e.g. the idea I mentioned earlier about what the output of a neuron might be telling you comes from that lecture).

Then maybe the final level of the answer is to say that this is an experimental science. You can take a particular problem that calls for a softmax classification and try your idea of just flattening the last conv layer as input to the softmax. Then try various combinations of FC layers added at the end. How does the performance of the approaches compare, both in terms of the accuracy of the results and the total memory and CPU cost of the training? In other words, if you think you have a better idea, you can prove to yourself whether it’s actually better or not. If your method turns out to be better, you should publish a paper!
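As a starting point for that experiment, here is a rough sketch of the two variants, sitting on top of the same hypothetical conv output of shape (128, 7, 7). The feature size, hidden width, and 10 classes are assumptions; accuracy you would have to measure yourself, but the parameter counts already hint at the memory/compute difference:

```python
import torch.nn as nn

# Illustrative only: two heads on top of the same (hypothetical) conv output.
conv_features = 128 * 7 * 7            # assumed size of the flattened conv output
num_classes = 10                       # assumed number of softmax classes

# Variant A: flatten straight into the final linear layer that feeds softmax
head_a = nn.Sequential(
    nn.Flatten(),
    nn.Linear(conv_features, num_classes),
)

# Variant B: one intermediate FC layer before the final linear layer
head_b = nn.Sequential(
    nn.Flatten(),
    nn.Linear(conv_features, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print("direct softmax head:", count(head_a), "parameters")   # 62,730
print("extra FC layer head:", count(head_b), "parameters")   # ~1.6 million
```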
