In the handwritten 0 and 1 classification example, the model has 25 units in layer 2 and 15 units in layer 3. Why are there 2 layers and not 3? Why 25 and 15 units? How do we design the structure of a model?
In most projects, a neural network's architecture is designed largely by trial and error. There is no single right answer: much of the design rests on empirical findings, which means testing many configurations and keeping the one that works best on your data.
There are, however, some guidelines you can follow when designing NN architectures:
Start with a simple architecture: Begin with a small number of layers and units to establish a baseline performance.
Gradually increase complexity: Add more layers or increase the number of units in each layer, and observe the impact on the model’s performance. Use techniques like cross-validation to assess the generalization performance of the model.
Regularization: To avoid overfitting, apply regularization techniques such as L1 or L2 regularization, dropout, or early stopping.
Hyperparameter tuning: Perform a systematic search or optimization over the possible configurations (e.g., using grid search, random search, or Bayesian optimization) to find the best architecture for the problem.
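A systematic grid search over hidden-layer sizes can be sketched in plain Python. This is a minimal sketch, not the course's code: `candidate_architectures`, `grid_search` and the tuple encoding of an architecture are hypothetical names chosen for illustration, and the `evaluate` callback is an assumption you supply yourself (for example, build and train a Keras model with those layer sizes and return its validation accuracy).

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: an architecture is the tuple of hidden-layer sizes,
# e.g. (25, 15) means two hidden layers with 25 and 15 units.
Architecture = Tuple[int, ...]


def candidate_architectures(units: Sequence[int], max_layers: int) -> List[Architecture]:
    """All hidden-layer configurations with 1..max_layers layers,
    ordered by depth (fewest layers first)."""
    archs: List[Architecture] = []
    for depth in range(1, max_layers + 1):
        # product(..., repeat=depth) yields every combination of unit counts
        archs.extend(product(units, repeat=depth))
    return archs


def grid_search(
    archs: Sequence[Architecture],
    evaluate: Callable[[Architecture], float],
) -> Tuple[Architecture, float]:
    """Score every candidate with `evaluate` and return the best one.

    `evaluate` is supplied by the caller (an assumption of this sketch): it
    should build a model with the given hidden-layer sizes, train it on the
    training split and return a score (higher is better, e.g. accuracy)
    measured on a held-out validation split.
    """
    if not archs:
        raise ValueError("no candidate architectures")
    best_arch: Architecture = archs[0]
    best_score = float("-inf")
    for arch in archs:
        score = evaluate(arch)
        # Strict ">" keeps the earlier candidate on ties, so with the
        # depth ordering above a shallower network wins a tie.
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


if __name__ == "__main__":
    archs = candidate_architectures([15, 25], max_layers=2)
    print(archs)
    # prints [(15,), (25,), (15, 15), (15, 25), (25, 15), (25, 25)]
```

Breaking ties in favour of the earlier, shallower candidate follows the "start simple" guideline. Each call to `evaluate` trains a full model, so the cost grows quickly: 3 unit choices with up to 3 hidden layers already means 3 + 9 + 27 = 39 training runs, which is why random search or Bayesian optimization is usually preferred for larger search spaces.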
The course instructors probably tried different numbers of units in the hidden layers before settling on 25 and 15 units for the two layers, presumably because this combination struck a good balance between model complexity and performance for this particular problem.
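One rough way to make "model complexity" concrete is to count trainable parameters (weights plus biases). The sketch below does this for a fully connected network. It assumes each image is 20 × 20 pixels flattened into 400 inputs and that the network ends in a single output unit; the helper name `count_parameters` is hypothetical.

```python
from typing import Sequence


def count_parameters(n_inputs: int, hidden: Sequence[int], n_outputs: int = 1) -> int:
    """Count weights + biases in a fully connected (dense) network.

    A dense layer with n_in inputs and n_out units has
    n_in * n_out weights and n_out biases.
    """
    sizes = [n_inputs, *hidden, n_outputs]
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out
    return total


if __name__ == "__main__":
    # Assumption: 400 inputs (20x20 pixels) and one output unit.
    for hidden in [(25,), (25, 15), (25, 15, 10), (50, 30)]:
        print(hidden, count_parameters(400, hidden))
```

Under these assumptions the 25-15 network has 10,431 parameters, and about 96% of them (400 × 25 + 25 = 10,025) sit in the first hidden layer. Adding a third hidden layer of 10 units raises the total only to 10,586, whereas widening to 50 and 30 units roughly doubles it to 21,611. Parameter count is only one proxy for complexity, but it shows why the width of the first layer matters most here. Keras' `model.summary()` reports the same per-layer counts for `Dense` layers.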