How do we get n_c = 5 in the final output?

Thanks!

Because they have chosen to have 5 of the 1 x 1 x 3 filters at that layer. That's the way convolutions always work, right? Each filter's depth matches the number of channels of the previous (input) layer, and you get to choose how many filters you have, which then determines the number of output channels. Or if the question is just "why did they pick 5", the answer is "because". That's also the way it always works: you have to try things to figure out what works. If the model underfits, then you add more layers and/or more channels per layer. If it overfits, then you try fewer layers and/or fewer channels, or you add regularization. The developers of MobileNet probably spent some serious time fiddling with different architectures to come up with what works, and now we get to benefit from what they learned in the process.
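If it helps to see the channel counting in code, here is a minimal NumPy sketch of a 1 x 1 convolution. This is not MobileNet's actual implementation: the 4 x 4 spatial size, the random values, and the helper name `pointwise_conv` are all assumptions made for illustration. The only thing it is meant to show is that 5 filters of shape 1 x 1 x 3 turn a 3-channel input into a 5-channel output.

```python
import numpy as np

def pointwise_conv(x, filters):
    """Apply a 1x1 convolution (hypothetical helper, for illustration only).

    x:       input of shape (H, W, C_in)
    filters: array of shape (C_in, C_out); each column is one 1 x 1 x C_in filter
    returns: output of shape (H, W, C_out)
    """
    # At every pixel, take the dot product of the C_in input channels
    # with each filter, giving one output value per filter.
    return np.tensordot(x, filters, axes=([2], [0]))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 3))      # assumed 4 x 4 input with 3 channels
filters = rng.standard_normal((3, 5))   # five filters, each 1 x 1 x 3
out = pointwise_conv(x, filters)

print(out.shape)  # (4, 4, 5)
```

The filter depth (3) is forced by the input, but the filter count (5) is a free choice. Changing the second dimension of `filters` to any other number changes the number of output channels to match.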
