Why should MobileNet have similar performance to a conventional ConvNet?

yinshan · July 6, 2021, 11:32pm

Hi, in his lectures Andrew explained how MobileNet saves computational effort, which makes perfect sense. However, he did not have time to explain why MobileNet should have similar performance (accuracy) to a conventional ConvNet. What is the intuition here? It is not too difficult to develop something computationally less expensive, but showing that it does not reduce performance is a much more difficult matter. Any comments? Thank you!

Yousif · July 7, 2021, 2:56am

I felt the same way. I don't quite get whether there is a disadvantage to splitting the convolution into two steps. If there weren't one, why use anything else, since splitting would just save resources? The fact that the splitting is not featured in other networks makes me think that there is a disadvantage to it that Andrew didn't mention.

yinshan · July 14, 2021, 6:50am

Are mentors monitoring this forum? Or is this a place for students to exchange ideas among themselves?

jonaslalin · August 12, 2021, 8:29pm

Hello @yinshan and @Yousif,

Great question! Sorry for the late reply. Sometimes it is worthwhile bumping a thread if it becomes stale, because it might fly under the radar of mentors.

Depthwise separable convolutions reduce the number of parameters in the convolution. As such, for a small model, replacing the standard 2D convolutions with depthwise separable convolutions may decrease the model capacity significantly, and the model may become sub-optimal. However, if used properly, depthwise separable convolutions can give you efficiency without dramatically hurting your model's performance.
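To put numbers on "fewer parameters", here is a minimal plain-Python sketch of the parameter count of a $k \times k$ convolution with $c_{in}$ input and $c_{out}$ output channels, assuming one bias per filter. The function names are hypothetical, chosen only for illustration. Ignoring biases, the ratio of separable to standard parameters works out to $1/c_{out} + 1/k^2$, so for $3 \times 3$ kernels the saving approaches roughly 9x as $c_{out}$ grows.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Standard conv: one k x k x c_in kernel (plus optional bias) per output channel."""
    return (k * k * c_in + int(bias)) * c_out


def separable_conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Depthwise step (one k x k filter per input channel) plus pointwise step (1 x 1 x c_in per output channel)."""
    depthwise = (k * k + int(bias)) * c_in
    pointwise = (c_in + int(bias)) * c_out
    return depthwise + pointwise


# Compare a few hypothetical layer shapes, all with 3 x 3 kernels.
for c_in, c_out in [(3, 128), (32, 64), (256, 256)]:
    std = standard_conv_params(3, c_in, c_out)
    sep = separable_conv_params(3, c_in, c_out)
    print(f"c_in={c_in:3d} c_out={c_out:3d}: standard={std:6d} separable={sep:6d} ratio={sep / std:.3f}")
```

With `k=3, c_in=3, c_out=128` this reproduces the 3584 vs. 542 counts worked out by hand later in the thread; wider layers push the ratio toward $1/9$.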

The key takeaway, again, is:

Depthwise separable convolutions reduce the number of parameters in the convolution.

As an exercise, make up a toy example and calculate the number of params for each method. Prof Andrew Ng demonstrated the number of multiplications, but not the number of params. It is a good exercise and might help you understand even better.
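One way to set up that exercise, as a hedged sketch: count both multiplications (the measure used in the lecture) and parameters for a small hypothetical layer. The sizes below (a $6 \times 6 \times 3$ input, $3 \times 3$ kernels, 5 output channels, stride 1, no padding, one bias per filter) and the function names are illustrative assumptions.

```python
def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of the output for an n x n input and an f x f kernel."""
    return (n + 2 * padding - f) // stride + 1


def standard_cost(n: int, f: int, c_in: int, c_out: int) -> tuple[int, int]:
    """(multiplications, parameters) of a standard f x f conv, stride 1, no padding, one bias per filter."""
    n_out = conv_output_size(n, f)
    # every output value needs f*f*c_in multiplications
    mults = (f * f * c_in) * (n_out * n_out) * c_out
    params = (f * f * c_in + 1) * c_out
    return mults, params


def separable_cost(n: int, f: int, c_in: int, c_out: int) -> tuple[int, int]:
    """(multiplications, parameters) of an f x f depthwise conv followed by a 1 x 1 pointwise conv."""
    n_out = conv_output_size(n, f)
    dw_mults = (f * f) * (n_out * n_out) * c_in  # one f x f filter per input channel
    pw_mults = c_in * (n_out * n_out) * c_out    # one 1 x 1 x c_in filter per output channel
    params = (f * f + 1) * c_in + (c_in + 1) * c_out
    return dw_mults + pw_mults, params


std = standard_cost(n=6, f=3, c_in=3, c_out=5)
sep = separable_cost(n=6, f=3, c_in=3, c_out=5)
print("standard  (mults, params):", std)
print("separable (mults, params):", sep)
print(f"mult ratio {sep[0] / std[0]:.2f}, param ratio {sep[1] / std[1]:.2f}")
```

For these toy sizes the standard layer needs 2160 multiplications and 140 parameters, the separable one 672 and 50. The two ratios (about 0.31 and 0.36) differ only because of the bias terms: without biases both equal $1/5 + 1/9$. Multiplications also scale with the output area, while parameters do not.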

Yousif · August 13, 2021, 6:36am

Thanks for the reply.

Yeah, true. I have a better understanding now. It makes me think that while there is a disadvantage (the model may become sub-optimal), there is also an advantage: you can take networks that researchers have developed and swap in depthwise separable convolutions without messing with their overall architecture, saving resources. You're only tweaking the way it computes convolutions.
(I am hypothesizing)

jonaslalin · August 13, 2021, 6:48am

My thoughts:

I agree. Given a fixed computation budget, you might not be able to build a model as deep as you could with depthwise separable convolutions. Admittedly, each layer will have fewer parameters, but at least you get parameters that can learn both low-level features and high-level features in the deeper layers. With ordinary convolutions only, you might not have the computational budget to realize the architecture you hypothesize performs best for your task on the device you want. MobileNet then comes to the rescue. If you are after state-of-the-art performance, another model with ordinary convolutions and an unlimited computational budget may well do better.
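A back-of-the-envelope sketch of the fixed-budget argument, counting multiplications only. It assumes stride 1, "same" padding, identical layer shapes throughout, and ignores biases, batch norm and activations; the feature-map size, channel count and budget are all hypothetical.

```python
def standard_mults(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    # 'same' padding, stride 1: each of the h*w*c_out outputs needs k*k*c_in multiplications
    return h * w * k * k * c_in * c_out


def separable_mults(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    depthwise = h * w * k * k * c_in  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1 x 1 channel mixing
    return depthwise + pointwise


# Hypothetical setting: 56 x 56 feature maps, 3 x 3 kernels, 128 channels in and out,
# and a budget equal to the cost of four standard layers.
h = w = 56
k, c = 3, 128
budget = 4 * standard_mults(h, w, k, c, c)

std_layers = budget // standard_mults(h, w, k, c, c)
sep_layers = budget // separable_mults(h, w, k, c, c)
print(f"standard layers: {std_layers}, separable layers: {sep_layers}")
```

Under these assumptions the same budget fits 4 standard layers or 33 separable ones, since the per-layer cost ratio is $1/128 + 1/9 \approx 0.12$. Whether the deeper network actually learns better features is an empirical question; the sketch only shows how much room the budget leaves.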

jonaslalin · August 13, 2021, 7:42am

Hi again @yinshan and @Yousif,

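I wanted to give the pen-and-paper exercise a go. Take $3 \times 3$ kernels over an input with 3 channels, producing 128 output channels, with one bias per filter.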

Standard convolution

has $(3 \times 3 \times 3 + 1) \times 128 = 28 \times 128 = 3584$ parameters.

Depthwise convolution

has $(3 \times 3 \times 1 + 1) \times 3 = 10 \times 3 = 30$ parameters.

Pointwise convolution

has $(1 \times 1 \times 3 + 1) \times 128 = 4 \times 128 = 512$ parameters.

Depthwise separable convolution

thus has $30 + 512 = 542$ parameters, compared to 3584 for the standard convolution.

Moreover, $542 / 3584 \approx 0.15$.

Hence, the depthwise separable convolution has only about 15% of the parameters of the standard convolution (in this example)!
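To see why this saving costs some capacity, here is a NumPy sketch (assuming stride 1, no padding and no biases) that runs the depthwise and pointwise steps with the channel counts of the worked example above (3 in, 128 out) on a hypothetical $7 \times 7$ input. It checks that the result equals a standard convolution whose $128 \times 3 \times 3 \times 3$ kernel is forced into the product form $W[o, c] = p[o, c] \cdot d[c]$. The function names are hypothetical. A standard convolution may choose all 3456 weights freely, while the separable one can only express kernels of this factored form, with $27 + 384 = 411$ weights.

```python
import numpy as np


def depthwise_conv(x: np.ndarray, d: np.ndarray) -> np.ndarray:
    """x: (C, H, W); d: (C, k, k), one filter per channel. 'Valid' cross-correlation, stride 1."""
    c, h, w = x.shape
    k = d.shape[-1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            # each channel is filtered only by its own k x k filter
            out[:, i, j] = np.sum(x[:, i:i + k, j:j + k] * d, axis=(1, 2))
    return out


def pointwise_conv(x: np.ndarray, p: np.ndarray) -> np.ndarray:
    """x: (C_in, H, W); p: (C_out, C_in). A 1 x 1 conv mixes channels at each pixel."""
    return np.einsum("oc,chw->ohw", p, x)


def standard_conv(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """x: (C_in, H, W); kernel: (C_out, C_in, k, k). 'Valid' cross-correlation, stride 1."""
    c_out, _, k, _ = kernel.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.einsum("ochw,chw->o", kernel, patch)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 7))  # toy input: 3 channels, 7 x 7
d = rng.standard_normal((3, 3, 3))  # depthwise: one 3 x 3 filter per input channel
p = rng.standard_normal((128, 3))   # pointwise: 128 output channels

separable = pointwise_conv(depthwise_conv(x, d), p)
# The equivalent standard kernel is constrained to W[o, c] = p[o, c] * d[c]
factored_kernel = p[:, :, None, None] * d[None, :, :, :]
print(np.allclose(separable, standard_conv(x, factored_kernel)))
```

The check should print `True`: every depthwise separable convolution is a standard convolution with a restricted kernel, but not every standard convolution can be written this way. That restriction is the capacity given up in exchange for the efficiency.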

Thanks to Kunlun Bai for the amazing graphics:

