C4W2_Assignment, MSE is too big

Hello, I am working on assignment C4W2. The submission requires an MSE smaller than 30, but I keep getting an MSE between 30 and 32. How can I fix this so I can submit the notebook? I tried tuning the learning rate and the number of epochs, but it didn't help… Thanks!

Please click my name and message your notebook as an attachment.

Yes, thanks, I just sent it.

Here’s a hint from the assignment: You will only need Dense layers.
This means you can reach the MSE threshold of 30 with dense layers alone. There is no restriction on how many dense layers you use.

Please use this information to make your model bigger and meet the passing criteria.

Umm… actually I am not sure what the hint means. Can I choose different numbers of units? Or can I add more layers?

You can achieve MSE < 30 by using one or more Dense layers. There’s no need to use any other layer type for this assignment.

How should I choose the number of units in each layer? Is it just random? I tried some common numbers, like 1, 5, 10, but the result is still not good enough…

There is no fixed algorithm for choosing the number of units per layer. Here are a few heuristics for all dense layers except the output layer (see the sketch below):

  1. Make the number of units a power of 2.
  2. Increase the number of units as the layer depth increases.
  3. Use a non-linear activation function such as relu.
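For example, a Dense-only model following those heuristics might look like the sketch below. This is just an illustration, not the official solution: `window_size` is an assumed placeholder for however many time steps each input window contains in your notebook, and the unit counts and learning rate are values to tune, not prescriptions.

```python
import tensorflow as tf

window_size = 20  # assumption: match your notebook's window size

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window_size,)),
    # Heuristics 1 and 2: unit counts are powers of 2 and grow with depth.
    tf.keras.layers.Dense(32, activation="relu"),  # heuristic 3: relu
    tf.keras.layers.Dense(64, activation="relu"),
    # Output layer: a single linear unit for one-step regression.
    tf.keras.layers.Dense(1),
])

model.compile(loss="mse",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
```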

:laughing: Thanks a lot! I hadn’t tried ‘relu’; once I used it, the result met the requirement!!! I finally fixed this issue! I think I need to put more effort into this part and learn how to design NN architectures. Thanks!!!

I can get any MSE for my training data: 0.1, 1, 30, etc. However, my validation MSE is never below 32. This makes me think that my model is overfitting in each scenario. My question is, isn’t it counterintuitive to increase # of layers if I am overfitting? Or is there something special for the time series prediction with dense layers and relu activation function?

It’s incorrect to keep expanding your model by adding more learning layers (Dense, LSTM, etc.) when the training error is very low and the test error is much higher than the training error. Course 3 of the Deep Learning Specialization covers methods for addressing overfitting in good detail; I recommend taking it for a deeper understanding of this issue.

Assuming you have followed my guidelines in building the NN and there are no coding errors, try building a simpler model or using techniques like dropout (see the sketch below). This might increase your training error a bit but should reduce the test error.
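As one hedged sketch of that suggestion, not the assignment’s solution, dropout can be inserted between the dense layers. The dropout rate of 0.2 and the `window_size` value are assumptions to adjust for your own runs.

```python
import tensorflow as tf

window_size = 20  # assumption: match your notebook's window size

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window_size,)),
    tf.keras.layers.Dense(32, activation="relu"),
    # Dropout randomly zeroes 20% of activations during training,
    # which discourages the model from memorizing the training windows.
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])

model.compile(loss="mse",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
```

Dropout is active only during training, so the validation MSE is computed with the full network; a small rise in training error alongside a drop in validation error is the expected sign it is working.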

I tried different architectures: increasing layer sizes, decreasing layer sizes, first increasing then decreasing, putting Dropout layers here and there, etc. I tried to optimize each one orthogonally, one parameter at a time, as suggested in DLS Course 3. However, I am not able to achieve a sub-30 MSE.

I talked with a friend who also took the course, and he said he could easily reach it with the architecture given in the course videos. I just wonder if it is normal to struggle this much, or whether I somehow broke the notebook?