I searched the threads and didn’t find a clear answer. I understand that MobileNet uses a more computationally efficient style of calculation to produce an output of the same shape.
But what isn’t clear from the video lecture is whether the actual values of the output matrix are the same as those from a standard single-step conv2d layer.
If they are, does TF do the more computationally efficient calculations for us in the background?
If they aren’t, what is the trade-off for the more computationally efficient calculation?
The primary benefits of TensorFlow are:
- Creating a model is pretty much just the process of stacking layer objects.
- You don’t have to write any of the code for backpropagation, and
- You don’t have to manage the training process in detail.
You can get equally good predictions either way.
I would not guess that the models are exactly equivalent, though. And they don’t have to be.
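To make the "not exactly equivalent" point concrete, here is a minimal pure-Python sketch. It assumes a simplified setting that is not the full MobileNet block: valid padding, stride 1, no bias, and nothing (no normalization or activation) between the depthwise and pointwise steps. All function and variable names are made up for illustration. Under those assumptions, a depthwise separable layer gives the same values as a standard convolution only when the standard kernel happens to factor as `d[i][j][c] * p[c][o]`. A freely chosen standard kernel gives different values.

```python
import random

def standard_conv(x, w):
    """Valid, stride-1 convolution. x: [H][W][C_in], w: [k][k][C_in][C_out]."""
    h, wd, c_in = len(x), len(x[0]), len(x[0][0])
    k, c_out = len(w), len(w[0][0][0])
    out = [[[0.0] * c_out for _ in range(wd - k + 1)] for _ in range(h - k + 1)]
    for r in range(h - k + 1):
        for s in range(wd - k + 1):
            for o in range(c_out):
                out[r][s][o] = sum(
                    x[r + i][s + j][c] * w[i][j][c][o]
                    for i in range(k) for j in range(k) for c in range(c_in)
                )
    return out

def separable_conv(x, d, p):
    """Depthwise step (d: [k][k][C_in]) followed by a 1x1 pointwise step (p: [C_in][C_out])."""
    h, wd, c_in = len(x), len(x[0]), len(x[0][0])
    k, c_out = len(d), len(p[0])
    out = [[[0.0] * c_out for _ in range(wd - k + 1)] for _ in range(h - k + 1)]
    for r in range(h - k + 1):
        for s in range(wd - k + 1):
            # Depthwise: each input channel is filtered on its own.
            mid = [
                sum(x[r + i][s + j][c] * d[i][j][c] for i in range(k) for j in range(k))
                for c in range(c_in)
            ]
            # Pointwise: a 1x1 convolution mixes the channels.
            for o in range(c_out):
                out[r][s][o] = sum(mid[c] * p[c][o] for c in range(c_in))
    return out

def max_abs_diff(a, b):
    """Largest elementwise difference between two [H][W][C] outputs."""
    return max(
        abs(u - v)
        for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) for u, v in zip(pa, pb)
    )

def rand():
    return random.uniform(-1.0, 1.0)

random.seed(0)
H, W, K, C_IN, C_OUT = 5, 5, 3, 3, 4
x = [[[rand() for _ in range(C_IN)] for _ in range(W)] for _ in range(H)]
d = [[[rand() for _ in range(C_IN)] for _ in range(K)] for _ in range(K)]
p = [[rand() for _ in range(C_OUT)] for _ in range(C_IN)]

# A standard kernel built from the separable weights: w[i][j][c][o] = d[i][j][c] * p[c][o].
w_factored = [[[[d[i][j][c] * p[c][o] for o in range(C_OUT)]
                for c in range(C_IN)] for j in range(K)] for i in range(K)]
# An unconstrained standard kernel with independent random weights.
w_free = [[[[rand() for _ in range(C_OUT)]
            for _ in range(C_IN)] for _ in range(K)] for _ in range(K)]

sep = separable_conv(x, d, p)
diff_factored = max_abs_diff(sep, standard_conv(x, w_factored))
diff_free = max_abs_diff(sep, standard_conv(x, w_free))
print("vs factored kernel:", diff_factored)  # only floating-point rounding noise
print("vs free kernel:    ", diff_free)      # clearly nonzero
```

So in this simplified setting the separable layer can represent only a subset of what a standard conv layer can. Training decides whether that subset is good enough for the task.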
Right. But TF is just the platform in this case: it simply implements the architecture that we specify. So the computations will be whatever is implied by the layer operations we define.
You’re right that MobileNet and traditional conv networks can both create functions that map from a given input shape to a particular desired output shape. But the functions that you get in the two cases are different. In the MobileNet case, the functions are more efficient because they have fewer parameters. That results in fewer computations to get from the input shape to the final desired output shape.
So at one level, the functions are clearly different, but the interesting question is whether the more efficient MobileNet architecture can be trained to be as powerful, or almost as powerful, as the “full” convnet that produces the same output shape. It’s been 3 or 4 years since I really watched all these lectures in detail, so I forget exactly what Prof Ng says on that point. My guess is that there is no way to prove mathematically that you can get the same function with the simpler, more efficient network. Both networks would be trained on the same data with the same cost function, so you would hope that you also get a workable solution from MobileNet.
Apparently MobileNet does work, which suggests that even if the function is not as powerful as what you could get with the more expensive architecture, it is “good enough” for some problems. The point is that you get an application that you can run on your cell phone, whereas that wouldn’t be possible with the deluxe model. So the results must be useful in at least some meaningful cases, otherwise Prof Ng would not be telling us about it.
It might be worth a quick scan of the MobileNet paper(s) to see if they address this point. You’d expect them to say at least something on this general question.
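As a rough illustration of the "fewer parameters, fewer computations" point, here is a small Python sketch that counts the weights in one standard conv layer versus one depthwise separable layer. The layer sizes (3x3 kernel, 64 input channels, 128 output channels, 56x56 output) are made-up values for illustration and are not taken from the lecture or the paper. Biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k step plus a 1x1 pointwise step (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128
h_out, w_out = 56, 56

std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
ratio = sep / std

# Every weight is used once per output position, so multiplications scale the same way.
std_mults = std * h_out * w_out
sep_mults = sep * h_out * w_out

print("standard params: ", std)   # 73728
print("separable params:", sep)   # 8768
print("ratio:", ratio)            # equals 1/c_out + 1/k**2, about 0.119
```

For these sizes the separable layer has roughly one eighth of the weights and multiplications. That saving comes from restricting the form of the kernel, which is why the two layers are not the same function in general.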