In week 3, in the "Convolutional Implementation of Sliding Windows" video, Andrew describes how to convert fully connected layers into convolutional layers.
In later videos the motivation for this becomes clear: when we do this for multiple "slides", some of the computations are shared, leading to computational efficiencies.
However, if there were only one window, i.e. a single pass, does the convolutional approach offer any computational benefit over the fully connected (dense) layer approach?
Bump, same question here. It seems that the number of multiplications in both cases (conv vs. FC) is the same for the 1x1x400 window size.
My take: with larger inputs, instead of making a separate pass for each window, the convolutional approach processes all the windows in one pass, reusing the earlier layers' activations in the overlapping regions. That's probably where the speed-up comes from.
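Here is a small NumPy sketch of the equivalence being discussed (toy sizes are my own, not from the lecture): a fully connected "classifier" over a 5x5 window, applied window by window, versus the same weights reshaped into 5x5 conv filters and applied to the whole image in one pass. The outputs are identical; the saving for multiple windows comes from computing the shared patches only once.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# FC layer: takes a flattened 5x5 window, outputs 3 scores.
W = rng.standard_normal((3, 5 * 5))
img = rng.standard_normal((7, 7))  # 7x7 image -> 3x3 grid of 5x5 windows

# Sliding-window FC: one forward pass per window.
fc_out = np.zeros((3, 3, 3))
for i in range(3):
    for j in range(3):
        window = img[i:i + 5, j:j + 5].reshape(-1)
        fc_out[i, j] = W @ window

# Convolutional version: same weights viewed as 3 filters of size 5x5,
# applied to all windows in a single vectorized pass.
filters = W.reshape(3, 5, 5)
patches = sliding_window_view(img, (5, 5))            # shape (3, 3, 5, 5)
conv_out = np.tensordot(patches, filters,
                        axes=([2, 3], [1, 2]))        # shape (3, 3, 3)

print(np.allclose(fc_out, conv_out))  # True: identical outputs
```

For a single window the multiplication count is indeed the same either way; the convolutional form only wins once neighboring windows overlap and their shared computation can be reused.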
During training, a ConvNet produces only a single spatial output (top). But when applied at test time over a larger image, it produces a spatial output map, e.g. 2x2 (bottom). Since all layers are applied convolutionally, the extra computation required for the larger image is limited to the yellow regions. This diagram omits the feature dimension for simplicity.
Sorry to tag you directly, but do you have any insights on this, Paul? @paulinpaloalto