Hello,
I’m confused by the explanation for why I got this question incorrect:
"The question (number 10 for me) that asks: “The sparsity of connections and weight sharing are mechanisms that allow us to use fewer parameters in a convolutional layer making it possible to train a network with smaller training sets. True/False?”
I feel I got the answer correct, and the explanation has no reference to “smaller training sets”, so I’m wondering what I’m missing (or if there’s a “bug” in the question)".
(I don’t want to post the answer, or any hint, so sorry for sounding vague here…)
When I tried the quiz myself twice I did not get that question, so I cannot evaluate the provided answer and explanation. But the required size of the training set is usually seen to depend on the number of parameters that are to be calibrated, though there are exceptions. For a discussion see this post.
This question knocked me down too. Fewer trainable parameters do indeed lead to lower training set requirements.
It is mentioned in the lecture “Why convolutions?” around 5:35.
I think it is generally true that a neural network (CNN, or fully-connected) is better with more training data, because more data means we can use a larger architecture that captures more patterns.
However, I think the focus here is not whether a NN can do better with more data, but how the data requirements of a convolutional layer compare with those of a fully-connected layer.
For example, if we have an input image of 3x3 pixels, then sliding a 2x2 kernel (aka filter) over it involves only 4 trainable parameters (ignoring the bias). However, if we instead feed the image (after flattening to a 1x9 input) to a fully-connected layer, that layer needs 9x4=36 trainable weights to produce the same 4 outputs the 2x2 kernel does. The difference is 36 vs 4, and that reduction comes from weight sharing and sparsity of connections.
It is possible for 4 trainable parameters to require less data than 36.
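To make the 36 vs 4 count concrete, here is a minimal sketch in plain Python. It assumes stride 1, no padding and no bias, and the helper name `conv_as_dense` and the kernel values are made up for illustration. It writes the 2x2 convolution out as the equivalent fully-connected weight matrix, so you can see both effects: most of the 36 entries are zero (sparsity), and the non-zero ones are the same 4 numbers repeated (weight sharing).

```python
# Sketch only: a 2x2 kernel sliding over a 3x3 image (stride 1, no padding,
# no bias), written out as the equivalent 4x9 fully-connected weight matrix.

def conv_as_dense(kernel, in_size):
    """Return the dense weight matrix equivalent to a valid, stride-1 convolution."""
    k = len(kernel)
    out_size = in_size - k + 1
    rows = []
    for oy in range(out_size):          # one row per output pixel
        for ox in range(out_size):
            row = [0.0] * (in_size * in_size)
            for ky in range(k):
                for kx in range(k):
                    # Each output only connects to the k*k input pixels under the kernel.
                    row[(oy + ky) * in_size + (ox + kx)] = kernel[ky][kx]
            rows.append(row)
    return rows

# Hypothetical kernel values, chosen to be distinct so the sharing is visible.
kernel = [[1.0, 2.0], [3.0, 4.0]]
dense = conv_as_dense(kernel, in_size=3)

dense_params = len(dense) * len(dense[0])                      # 4 outputs x 9 inputs
nonzero = sum(1 for row in dense for w in row if w != 0.0)     # connections actually used
distinct = len({w for row in dense for w in row if w != 0.0})  # values actually learned
conv_params = sum(len(r) for r in kernel)

print(dense_params, nonzero, distinct, conv_params)  # 36 16 4 4
```

So a general fully-connected layer has 36 free weights, while the convolution uses only 16 of those connections and fills them with just 4 distinct values.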