MLS course 2 week1 - layers and units in a layer

In handwritten digit recognition with forward propagation, the model has 3 layers:
layer 1 with 25 units/neurons, layer 2 with 15 units, and layer 3 with 1 unit.

a) How is the number of layers decided for a given classification problem?
b) For each layer, how is the number of units chosen?
Please help me understand this.

Thanks much.

The number of layers and the number of units per layer are chosen by experimentation.

The goal is to create the simplest system possible that gives good-enough results. This helps minimize the time needed for training.
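To make "simplest" a bit more concrete, here is a rough sketch that counts the trainable parameters (weights plus biases) of a fully connected network. The input size of 400 features is my assumption (the thread does not state it), and `count_parameters` is just a helper name made up for this illustration.

```python
def count_parameters(n_inputs, layer_units):
    """Count weights and biases in a fully connected (dense) network."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        # One weight per (input, unit) pair, plus one bias per unit.
        total += fan_in * units + units
        fan_in = units
    return total

# ASSUMPTION: 400 input features (e.g. a flattened 20x20 image).
n_inputs = 400

small = count_parameters(n_inputs, [25, 15, 1])   # the 25/15/1 layout from the question
large = count_parameters(n_inputs, [100, 50, 1])  # a hypothetical larger layout

print(small, large)  # 10431 45201
```

Under that assumption the larger layout has more than four times as many parameters to train, which is one reason to prefer the smaller network if both give good-enough results.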

Hi, @Raghavendra_Medahal

These numbers are called hyperparameters, alongside the learning rate \alpha. Hyperparameters are set before training rather than learned from the data, and there is no formal way of deciding their optimal values for a model, whether classification or regression.

You will usually end up running trials over a collection of candidate values, often stored in a dictionary such as grid_params, which typically has the following shape:

grid_params = {
    "learning_rate": [0.0001, 0.001, 0.005],
    "units": [25, 20, 10],
    "layers": [2, 3, 4],
}

Then, with whatever package you use, you train a model for each combination of these values and compare the results. This process is called grid search; see scikit-learn's GridSearchCV or Keras Tuner.
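If it helps to see the idea without any particular library, here is a minimal plain-Python sketch of grid search. The names `iter_configs`, `grid_search`, and `toy_score` are made up for this illustration, and `toy_score` is only a stand-in: in a real run it would build a model from the config, train it, and return a validation score.

```python
from itertools import product

grid_params = {
    "learning_rate": [0.0001, 0.001, 0.005],
    "units": [25, 20, 10],
    "layers": [2, 3, 4],
}

def iter_configs(grid):
    """Yield one dict per combination of values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

def grid_search(grid, train_and_score):
    """Score every combination and return (best_config, best_score).

    train_and_score is supplied by the caller; higher scores are better.
    """
    best_config, best_score = None, float("-inf")
    for config in iter_configs(grid):
        score = train_and_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_score(config):
    # Placeholder, NOT a real model: it simply prefers 3 layers,
    # 20 units, and a learning rate of 0.001.
    return (
        -abs(config["layers"] - 3)
        - abs(config["units"] - 20) / 10
        - abs(config["learning_rate"] - 0.001)
    )

configs = list(iter_configs(grid_params))
print(len(configs))  # 3 * 3 * 3 = 27 combinations

best_config, best_score = grid_search(grid_params, toy_score)
print(best_config)
```

Note that the number of models to train is the product of the list lengths, so the grid grows quickly as you add hyperparameters or candidate values.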

Best regards,
Moaz El-Essawey

@Raghavendra_Medahal ,

To add my two cents to what @TMosh and @Moaz_Elesawey have very well explained:

The number of nodes, and even the number of hidden layers, are determined by the architect of the NN: by you, by me, by the person or group that is creating the NN.

There are several options:

  1. You can start with a known model created by researchers; such a model already proposes a certain number of layers and of nodes inside each layer.
  2. You can start from scratch.
    2.1 If you are a very experienced ML designer, maybe you'll start with a configuration that, from experience, you expect to be the best layout for the task at hand.
    2.2 If you are a novice, you may start with your best guess. For instance, 2 hidden layers, each with 4 nodes.

In any case, you'll design your NN, train it, and watch the results. If the results do not meet your objectives, you start 'fine-tuning' your architecture (fine-tuning the data may be more important, but that's another topic). In the end you will hopefully reach a design (number of layers, number of units inside each layer, etc.) that meets your objective.
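As a rough sketch of that design, train, and adjust loop: try candidate layouts from simplest to most complex and stop at the first one that meets the objective. Everything here is hypothetical, including the candidate list, the 0.95 target, and `toy_evaluate`, which only pretends to train a network.

```python
def first_good_enough(candidates, evaluate, target):
    """Try candidates in order; return the first one whose score meets the target.

    Also returns the history of (candidate, score) pairs that were tried.
    """
    history = []
    for hidden_units in candidates:
        score = evaluate(hidden_units)
        history.append((hidden_units, score))
        if score >= target:
            return hidden_units, history
    return None, history

# Hypothetical hidden-layer layouts, ordered from simplest to most complex.
candidates = [[4, 4], [15], [25, 15], [50, 25, 15]]

def toy_evaluate(hidden_units):
    # Placeholder, NOT real training: pretend accuracy grows with total units.
    return min(0.99, 0.80 + 0.004 * sum(hidden_units))

chosen, history = first_good_enough(candidates, toy_evaluate, target=0.95)
print(chosen)        # [25, 15]
print(len(history))  # 3 layouts were tried before stopping
```

With a real `evaluate` you would replace the toy function with code that trains the network and returns its validation accuracy; the loop itself stays the same.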

Cheers,

Juan

Thank you folks, for sharing your insights!

best,
Raghavendra