How are the numbers in the classic networks chosen?

As I’m watching the videos on the classic networks, I see a bunch of filter and pooling sizes and am wondering: how do they select those numbers? They seem rather arbitrary. Is there a method behind their design?

The original papers sometimes describe the experiments that were performed to arrive at the final network structure, but I believe the answer is often “experimentation”. This is part of why Prof. Ng advises starting from a design that has already proven effective. He also describes some general patterns, such as steadily decreasing the width/height while increasing the number of channels as you go deeper into the network.
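To make that pattern concrete, here is a rough sketch that traces how the shape changes through a LeNet-5-style stack using the standard output-size formula floor((n + 2p - f) / s) + 1. The layer list is my own assumption of a LeNet-5-like configuration (32 x 32 x 1 input, 5 x 5 convolutions, 2 x 2 pooling with stride 2), and the names `conv_output_size`, `trace_shapes` and `LENET5_LIKE` are made up for illustration, not taken from any library.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial size after a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1


# Assumed LeNet-5-like layers (illustrative, not copied from the paper):
# (name, filter size, stride, padding, output channels or None to keep channels)
LENET5_LIKE = [
    ("conv1", 5, 1, 0, 6),
    ("pool1", 2, 2, 0, None),
    ("conv2", 5, 1, 0, 16),
    ("pool2", 2, 2, 0, None),
]


def trace_shapes(size, channels, layers):
    """Return (name, height/width, channels) after each layer, input first."""
    shapes = [("input", size, channels)]
    for name, f, s, p, out_ch in layers:
        size = conv_output_size(size, f, s, p)
        if out_ch is not None:  # pooling layers leave the channel count alone
            channels = out_ch
        shapes.append((name, size, channels))
    return shapes


if __name__ == "__main__":
    for name, size, ch in trace_shapes(32, 1, LENET5_LIKE):
        print(f"{name}: {size} x {size} x {ch}")
```

Running it shows the height/width shrinking (32, 28, 14, 10, 5) while the channel count grows (1, 6, 16). The formula tells you what shape a given choice produces, but it does not tell you which filter sizes to pick; that part still comes from experimentation.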


Unfortunately, there are no magic formulas for this. As @GordonRobinson mentioned, you start from a proven design: when you work with CNNs in your own project, you would typically use transfer learning and tune the last layer for your own classes in an image classification problem.

You will see that something similar happens with word embeddings in the next course. In a real-world problem you will use embeddings built by a company like Facebook. Why? Because they have the resources to do so. You could build your own, but on an ordinary PC it could take several weeks.

My recommendation at this point is to take for granted that the solution proposed by others works in terms of the network's architecture and regularization. Take that part as given and adapt the rest to your problem (your own dataset, custom data processing, a customized last fully connected layer, etc.).
