Week 2: MobileNet

Because they chose to have 5 of the 1 x 1 x 3 filters at that layer. That’s the way convolutions always work, right? Each filter’s depth has to match the number of channels of the previous (input) layer, and you get to choose how many filters you have, which then determines the number of output channels. Or if the question is just “why did they pick 5”, the answer is “because” :grin:. That’s also the way it always works: you have to try things to figure out what works. If the model underfits, you add more layers and/or more channels per layer. If it overfits, you try fewer layers and/or fewer channels, or you add regularization. The developers of MobileNet probably spent some serious time fiddling with different architectures to find what works, and now we get to benefit from what they learned in the process.
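If it helps to see the shape bookkeeping concretely, here is a minimal pure-Python sketch of a 1 x 1 (pointwise) convolution. The 4 x 4 spatial size, the random values, and the helper name `pointwise_conv` are assumptions made up for illustration; only the "5 filters of shape 1 x 1 x 3" part comes from the discussion above. It is not how any framework actually implements the layer.

```python
import random


def pointwise_conv(x, filters):
    """Apply a 1 x 1 convolution (hypothetical helper, for illustration only).

    x:       H x W x C_in input, as nested lists
    filters: list of filters, each a list of C_in weights (one 1 x 1 x C_in filter)
    returns: H x W x len(filters) output, as nested lists
    """
    c_in = len(x[0][0])
    for f in filters:
        if len(f) != c_in:
            # The filter depth must match the number of input channels.
            raise ValueError("filter depth must equal the number of input channels")
    # At each pixel, every filter produces one number: a dot product across channels.
    return [
        [[sum(w * v for w, v in zip(f, pixel)) for f in filters] for pixel in row]
        for row in x
    ]


random.seed(0)
H, W, C_IN, N_FILTERS = 4, 4, 3, 5  # H and W are arbitrary; 3 and 5 match the post

x = [[[random.random() for _ in range(C_IN)] for _ in range(W)] for _ in range(H)]
filters = [[random.random() for _ in range(C_IN)] for _ in range(N_FILTERS)]

y = pointwise_conv(x, filters)
print(len(y), len(y[0]), len(y[0][0]))  # prints: 4 4 5
```

Changing `N_FILTERS` changes only the last dimension of the output, while changing `C_IN` without also changing the filter depth raises an error. That is the whole point: the input fixes the filter depth, and the number of filters is a free design choice.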
