Which Convolutional Network is Better?

Throughout the Machine Learning and Deep Learning specializations, I noticed that we generally first learn a model and then systematically improve it by learning additional techniques and tuning hyperparameters, so by the end we have an idea of the best tool to use.

For Convolutional NNs, on the other hand, in week 2 Andrew seems to present several different architectures, and I am left not really knowing what to take from it. Is there one particular conv network that I can focus on using and improving because it's better than the others, or are they all just different and it's up to how I feel that day?

I'm trying to figure out what metric determines the best tool to use. Is it the case that MobileNet is better than all the others presented (e.g. ResNets, Inception, etc.) because it achieves the same result at a much lower computational cost?

@Antonio_Furtado I believe it will depend on a few factors, like:

  • Type of task (a detailed image recognition or processing task may need a more complex, deeper network)
  • Dataset size and quality (if the dataset is small, we may not need very deep networks, so a simpler pre-trained model will be sufficient)
  • Computational resources (deeper networks need more memory and computation, which raises cost)
  • Performance metrics (different open-sourced, pre-trained models shine on certain metrics like speed, accuracy, memory usage, etc.)

The list can go on; my humble opinion is to test out different architectures and see which ones meet your requirements and use case by comparing their performance and tradeoffs. Also, depending on whether you are doing it for a real production use case or for research, the deciding factors will again be different. Hope this helps a bit.
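To make the "compare performance and tradeoffs" step concrete, here is a minimal sketch of picking the most accurate candidate that fits a latency budget. The accuracy/latency/size numbers are placeholders for illustration only, not real benchmarks; in practice you would fill them in by measuring each model on your own data and hardware.

```python
# Placeholder measurements: name -> (validation accuracy, latency in ms, params in millions).
# These numbers are made up for the sketch; measure your own.
candidates = {
    "ResNet50":    (0.92, 40.0, 25.6),
    "MobileNetV2": (0.90,  8.0,  3.5),
    "InceptionV3": (0.93, 55.0, 23.9),
}

def pick_model(candidates, max_latency_ms):
    """Return the name of the most accurate model within the latency budget,
    or None if nothing fits."""
    feasible = {name: stats for name, stats in candidates.items()
                if stats[1] <= max_latency_ms}
    if not feasible:
        return None
    return max(feasible, key=lambda name: feasible[name][0])

print(pick_model(candidates, max_latency_ms=10))   # tight mobile/edge budget
print(pick_model(candidates, max_latency_ms=100))  # generous server budget
```

With a tight budget only the lightweight model is feasible; with a generous one, the most accurate model wins — which is the production-vs-research point above in code form.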


My interpretation is that this is exactly what Prof Ng is doing in C4; the only difference is that the steps you would need to go through to achieve the final results yourself are just too numerous and complex. So he presents us with different complex architectures like Residual Networks, MobileNet and EfficientNet as "finished products". But in each case, he does discuss the motivations and explain why the architecture was chosen, e.g. in the lectures on Residual Nets he goes over how the "skip" layers work to mitigate exploding and vanishing gradients. Then for MobileNet, he goes in detail through the "network in network" and depthwise separable convolutions to explain how they work.
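The MobileNet cost argument can be checked with a little arithmetic. Here is a sketch, assuming an f×f filter applied to an n_c-channel input to produce an n_out×n_out output with n_c' channels (the variable names are mine, following the lecture notation):

```python
def conv_cost(f, n_c, n_c_out, n_out):
    """Multiplications for a standard convolution: each of the
    n_out * n_out * n_c_out output values needs f * f * n_c multiplies."""
    return f * f * n_c * n_c_out * n_out * n_out

def depthwise_separable_cost(f, n_c, n_c_out, n_out):
    """Depthwise step (one f x f filter per input channel) plus
    pointwise step (a 1x1 convolution mixing the channels)."""
    depthwise = f * f * n_c * n_out * n_out
    pointwise = n_c * n_c_out * n_out * n_out
    return depthwise + pointwise

# Example: 3x3 filters, 3 input channels, 5 output channels, 4x4 output.
print(conv_cost(3, 3, 5, 4))                 # 2160 multiplications
print(depthwise_separable_cost(3, 3, 5, 4))  # 672 multiplications
```

The ratio works out to 1/n_c' + 1/f², which is why the savings grow as the number of output channels and the filter size grow.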

Unfortunately I don’t think there is a “metric” that you can apply to choose a model a priori. It is more that you use the higher level concepts that @chuaal described above. The metrics are applied a posteriori: the combination of compute resource and final accuracy that a model delivers on your problem.


Thanks guys. Makes sense.

I concur with and wholeheartedly endorse the replies above. However, you might be interested to know that there is also an emerging field of research where the optimal architecture is discovered, not by human trial and error (or rather, insight and experimentation), but by automation.

See Neural architecture search - Wikipedia

and related topics. HTH
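To give a feel for the idea, here is a toy sketch of the simplest form of architecture search: enumerate a tiny space of (depth, width) configurations and keep the best-scoring one. The `score` function is a stand-in of my own invention for the expensive train-and-validate step a real NAS system would run for each candidate.

```python
import itertools

def score(depth, width):
    # Stand-in for "train this architecture and measure validation accuracy".
    # For the sketch we simply pretend the sweet spot is depth 8, width 64.
    return -abs(depth - 8) - abs(width - 64) / 16

# Tiny search space: every combination of a few depths and layer widths.
search_space = itertools.product([2, 4, 8, 16], [16, 32, 64, 128])
best = max(search_space, key=lambda cfg: score(*cfg))
print(best)  # (8, 64)
```

Real NAS replaces the exhaustive loop with smarter strategies (reinforcement learning, evolutionary methods, gradient-based relaxation), since evaluating each candidate for real is far too costly to enumerate a realistic search space.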


Thanks very much for the link. Yet again we learn the “meta” lesson that the SOTA never holds still for very long in this space. :grinning: