I have a question about the structure of the CNN shown in the course.
Why do we need another fully connected layer after the 5x5x400 filter? The two look the same to me (both outputs are 1x1x400). Does this have anything to do with the sliding window method? Could we just delete the 1x1x400 FC layer and connect directly to the last FC layer?
Thank you so much!
Sometimes we have only one fully connected layer at the end, sometimes more. How many layers, and how many nodes in each? Treat both like any other hyperparameter: try the options out on a dev set and pick the model that gives you the best result. You may find that 2 FC layers give you the best model. With the extra layer you have many more weights, which can learn additional patterns in your data and may help with better predictions.
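To make "many more weights" concrete, here is a rough sketch that counts the parameters of the FC part with one versus two 400-unit layers. The flattened input size (5x5x16 = 400) and the 4 outputs are assumptions for illustration, and the helper names `dense_params` and `head_params` are made up for this post, not taken from the course.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(n_in, hidden_sizes, n_outputs):
    """Total parameters of a stack of FC layers ending in n_outputs units."""
    total = 0
    prev = n_in
    for size in hidden_sizes:
        total += dense_params(prev, size)
        prev = size
    return total + dense_params(prev, n_outputs)


n_in = 5 * 5 * 16  # assumed size of the flattened feature map
n_outputs = 4      # assumed number of output units

one_fc = head_params(n_in, [400], n_outputs)       # second 400-unit layer removed
two_fc = head_params(n_in, [400, 400], n_outputs)  # both 400-unit layers kept

print("one FC layer :", one_fc)
print("two FC layers:", two_fc)
print("extra weights:", two_fc - one_fc)
```

Under these assumptions the second layer roughly doubles the parameter count of the head. Whether that extra capacity actually helps is exactly what the dev set comparison is meant to tell you.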