In the Inception motivation lecture, Andrew shows a 28x28x16 volume being convolved with 32 filters of size 5x5 to output a 28x28x32 volume. If the output n_h and n_w match the corresponding input height and width, wouldn’t same padding be required? Or am I missing something?
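As a quick sanity check of the padding arithmetic (this is just the standard conv output-size formula, not something from the lecture itself): with same padding p = (f − 1) / 2, a 5x5 filter at stride 1 does keep the 28x28 spatial size, whereas valid padding would shrink it.

```python
def conv_out_size(n, f, p, s=1):
    # standard convolution output-size formula: floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

f = 5
p_same = (f - 1) // 2              # p = 2 for a 5x5 filter with "same" padding
print(conv_out_size(28, f, p_same))  # -> 28 (size preserved)
print(conv_out_size(28, f, 0))       # -> 24 ("valid" padding shrinks the volume)
```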
In the Inception network lecture, side-branches are shown to improve regularization. Is that because we use the losses on the side-branches to improve the weights of the intermediate layers, and that in turn improves the performance of the overall network? Kindly let me know. Thank you.
I think you have summarized the idea behind the side-branches well. With those additional side-branches, we effectively add new constraints on the intermediate blocks (inception modules), so that they don’t just serve the softmax in the output layer but also a few more softmaxes in those side-branches. Since they have to serve more “bosses”, they become less free.
Consider each softmax as one model. For an Inception network that has 3 softmaxes (1 at the output layer and 2 at side-branches), we can view it as three models that share some blocks and some training parameters. Obviously, the “three” models have different sizes, and the one that carries the output softmax has the largest size.
We know that a larger model is more prone to overfitting, while a smaller model is less so. Therefore, the other “two” models, which carry the softmaxes in the side-branches, are always more conservative with respect to overfitting.
Since all “three” of them share some parameters, and since the 3 softmaxes have an equal “say” in the final cost function, the side-branches have the power to restrict the training parameters they share, leaving only the remaining parameters free for the output softmax to tune happily.