In the notebook mentioned in the title, in the Classification section (the last part), the best model was the second neural network. I don’t understand why the second neural network performed better than the third one. I thought a larger neural network would in most cases give better performance. Please advise.
A larger model can overfit, performing well on the training set but poorly on the test set. Large models can also suffer from vanishing and exploding gradient problems. So it cannot be guaranteed that a larger model is better. Check the accuracy or loss (on the test/CV data) of the 2nd and 3rd models. Which one is better?
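For example, here is a minimal sketch of how you could compare the two models on the cross-validation set. I'm assuming model_2 and model_3 are already trained, that the CV arrays are called X_cv and y_cv (names may differ in your notebook), and that the linear output layer means the predictions are logits that get passed through a sigmoid before thresholding:

    import numpy as np
    import tensorflow as tf

    def cv_classification_error(model, X_cv, y_cv, threshold=0.5):
        """Fraction of CV examples the model misclassifies."""
        logits = model.predict(X_cv)            # output layer is linear, so these are logits
        probs = tf.nn.sigmoid(logits).numpy()   # convert logits to probabilities
        yhat = (probs >= threshold).astype(int).reshape(-1)
        return np.mean(yhat != y_cv.reshape(-1))

    # Compare the two candidates on the same CV split
    for m in (model_2, model_3):
        print(m.name, cv_classification_error(m, X_cv, y_cv))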
Here are the evaluation metrics from the notebook:
Model 1: Training Set Classification Error: 0.44167, CV Set Classification Error: 0.47500
Model 2: Training Set Classification Error: 0.11667, CV Set Classification Error: 0.07500
Model 3: Training Set Classification Error: 0.41667, CV Set Classification Error: 0.47500
As you can see, each model's training error is close to its cross-validation error, so it may not be an overfitting problem.
Regarding the statement ‘Large models can have vanishing and exploding gradient problems too’: can you help me understand this in more detail, or point me to a link that goes over it?
Below is the function from the notebook that defines the three models:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def build_models():
    tf.random.set_seed(20)  # fix the seed so weight initialization is reproducible

    # Model 1: 2 hidden layers
    model_1 = Sequential(
        [
            Dense(25, activation='relu'),
            Dense(15, activation='relu'),
            Dense(1, activation='linear')
        ],
        name='model_1'
    )

    # Model 2: 4 hidden layers
    model_2 = Sequential(
        [
            Dense(20, activation='relu'),
            Dense(12, activation='relu'),
            Dense(12, activation='relu'),
            Dense(20, activation='relu'),
            Dense(1, activation='linear')
        ],
        name='model_2'
    )

    # Model 3: 5 hidden layers (the deepest of the three)
    model_3 = Sequential(
        [
            Dense(32, activation='relu'),
            Dense(16, activation='relu'),
            Dense(8, activation='relu'),
            Dense(4, activation='relu'),
            Dense(12, activation='relu'),
            Dense(1, activation='linear')
        ],
        name='model_3'
    )

    model_list = [model_1, model_2, model_3]
    return model_list
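For completeness, the notebook trains these models before computing the errors above, roughly along these lines. This is only a sketch; the loss, optimizer settings, epoch count, and the data names X_train / y_train are assumptions on my part, not copied from the notebook. The from_logits=True loss matches the linear output layer:

    models = build_models()
    for model in models:
        model.compile(
            loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
            optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
        )
        model.fit(X_train, y_train, epochs=200, verbose=0)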
Which one is better? Hint: a good model has a small error.
You will learn more about this in DLS Course 2. I don’t remember whether MLS covers it; maybe it does.
My question was: why did Model 3 perform poorly? It sounds like that cannot be answered just by looking at the layers in model_3.
As the number of layers increases, there is a chance that performance will decrease. Try increasing the number of layers of the third model; you will (most likely) see that its error becomes even larger than it is now.
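For example, a deeper variant you could try (a hypothetical sketch for the experiment, not code from the notebook):

    # Hypothetical deeper variant of model_3 to experiment with
    model_3_deeper = Sequential(
        [
            Dense(32, activation='relu'),
            Dense(16, activation='relu'),
            Dense(8, activation='relu'),
            Dense(8, activation='relu'),
            Dense(4, activation='relu'),
            Dense(4, activation='relu'),
            Dense(12, activation='relu'),
            Dense(1, activation='linear')
        ],
        name='model_3_deeper'
    )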
Now the question is, why does this happen? There may be multiple answers, but the one I mentioned earlier is vanishing or exploding gradients.
Vanishing means the slope (gradient) becomes too small to update the parameters, so training gets stuck. Exploding means the slope becomes too large, so the parameters jump back and forth and never converge.
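A toy illustration of why depth makes this worse: during backpropagation the gradient reaching the early layers is roughly a product of one factor per layer, so if each factor is a bit below 1 the gradient shrinks exponentially, and if a bit above 1 it blows up. This is a standalone sketch, not code from the notebook:

    layers = 30
    small_factor, large_factor = 0.5, 1.5

    # Backprop multiplies one factor per layer, so the gradient scale at the
    # first layer behaves roughly like factor ** depth.
    print("vanishing:", small_factor ** layers)  # ~9.3e-10, almost no update signal
    print("exploding:", large_factor ** layers)  # ~1.9e+05, huge, unstable updates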
Also, increasing the number of layers may require tweaking other hyperparameters, such as the learning rate or the number of iterations. I am not sure about that.