C1_W1_Lab_HousePredictions (Using Adam optimizer)

In the house pricing lab the optimizer we used is SGD, but I tried it with the Adam optimizer.
I expected the results to be even more precise, as Adam is said to be more efficient than other optimizers.
But surprisingly, the loss I got when training the model, even after 2000 epochs, is around 55700, which is a very high loss for any model.
The results are the complete opposite of what I expected.
Can someone tell me why the model performance was so bad when using the Adam optimizer?
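For what it's worth, the symptom can be reproduced without Keras at all. Below is a minimal NumPy sketch with a hand-rolled Adam loop using Keras's default hyperparameters (learning rate 0.001, beta_1 = 0.9, beta_2 = 0.999); the bedroom/price numbers are stand-ins for the lab's data, priced in thousands so the targets sit in the hundreds. The key property on display: Adam's bias-corrected update `lr * m_hat / sqrt(v_hat)` has magnitude roughly bounded by the learning rate no matter how large the gradient is, so weights that need to reach ~50 simply cannot get there in 2000 steps of size ~0.001, while plain SGD's step scales with the raw (huge) gradient.

```python
import numpy as np

def adam_fit(x, y, steps, lr=0.001, b1=0.9, b2=0.999, eps=1e-7):
    """Fit y ~ w*x + b by minimizing MSE with a hand-rolled Adam loop."""
    p = np.zeros(2)          # p[0] = slope w, p[1] = intercept b
    m = np.zeros(2)          # first-moment (mean of gradients) estimate
    v = np.zeros(2)          # second-moment (mean of squared gradients) estimate
    for t in range(1, steps + 1):
        r = p[0] * x + p[1] - y                 # residuals
        g = np.array([2 * np.mean(r * x),       # dMSE/dw
                      2 * np.mean(r)])          # dMSE/db
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        mhat = m / (1 - b1 ** t)                # bias correction
        vhat = v / (1 - b2 ** t)
        # The update size is roughly bounded by lr, whatever the gradient size:
        p -= lr * mhat / (np.sqrt(vhat) + eps)
    return p, np.mean((p[0] * x + p[1] - y) ** 2)

# Stand-in data: 1..6 bedrooms, price = 50k + 50k per bedroom, in thousands,
# so the targets sit in the hundreds (like the 400 discussed here).
x = np.arange(1.0, 7.0)
y = 50 + 50 * x

p_raw, loss_raw = adam_fit(x, y, steps=2000)              # unscaled targets
p_scaled, loss_scaled = adam_fit(x, y / 100, steps=2000)  # targets ~1.0..3.5

print(loss_raw)     # still in the tens of thousands: the weights could only
                    # have moved on the order of 2000 * lr = 2
print(loss_scaled)  # essentially converged with the very same settings
```

With the same optimizer, same defaults, and same number of steps, only the target scale differs, and it alone decides between a ~50000 loss and a near-zero one.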

What did the accuracy look like?

Very low.
Where it should predict a result of 400, it's predicting 20.

Please see this link on why to keep model inputs / outputs to small values.

See this hint in the markdown cell for the exercise:

Hint: Your network might work better if you scale the house price down. You don’t have to give the answer 400…it might be better to create something that predicts the number 4, and then your answer is in the ‘hundreds of thousands’ etc.
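Concretely, the hint amounts to dividing the targets before training and multiplying the prediction back afterwards. A minimal sketch (the bedroom/price numbers are stand-ins for the lab's data, and a closed-form least-squares line stands in for the one-neuron Keras model; the point is only that the model sees small targets):

```python
import numpy as np

# Stand-in data: a house costs 50k plus 50k per bedroom.
bedrooms = np.arange(1.0, 7.0)
price_k = 50 + 50 * bedrooms        # 100, 150, ..., 350 (thousands of dollars)

SCALE = 100.0                       # train in "hundreds of thousands"
y = price_k / SCALE                 # targets become 1.0 ... 3.5

# A closed-form line fit stands in for the network here.
w, b = np.polyfit(bedrooms, y, 1)

pred_scaled = w * 7 + b             # the model answers ~4 ...
print(pred_scaled * SCALE)          # ... which we read back as ~400 (thousand)
```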

So it could be just the lack of suitable scaling of the cost input, as @balaji.ambresh suggests…easy to test. But it looks like something else is also going on if it was acceptably accurate with SGD. Though maybe Adam is more sensitive to target scale than vanilla SGD? I don’t know offhand and would have to research. Did you explicitly set the Adam parameters?

No, I didn’t set any explicit parameters for Adam.
But the interesting part is that the prediction of 400 is correct with the SGD optimizer, so why do I need to scale now for Adam?
And one more thing: when I train the model for a larger number of epochs, say 10000 or more, the accuracy increases and the loss decreases.
But since it’s Adam, it should take fewer epochs than SGD to train; isn’t that what the adaptiveness of Adam is for?!

What is your input data distribution? It’s probably a scaling problem: there is a considerable spread in your y data. So consider rescaling it before training.

Easy to test, right? Set all controls to what they were with SGD, but now scale the data.

@Vaibhav_C_T_R I think it might also be worth experimenting with the learning rate. If the unscaled Adam version eventually learns, but just takes longer, then starting off with a different learning rate might do the trick. I took this course years ago and no longer have the code, or I would try it myself. Let us know what you find?
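The learning-rate experiment is easy to mimic outside the notebook too. A sketch with a hand-rolled Adam loop standing in for the Keras model (Keras's default Adam learning rate is 0.001; the bedroom/price data are stand-ins with targets in the hundreds, and 0.1 is just an illustrative larger value, not a recommendation):

```python
import numpy as np

def adam_fit(x, y, steps, lr, b1=0.9, b2=0.999, eps=1e-7):
    """Fit y ~ w*x + b with a hand-rolled Adam loop; returns final MSE."""
    p, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
    for t in range(1, steps + 1):
        r = p[0] * x + p[1] - y                             # residuals
        g = np.array([2 * np.mean(r * x), 2 * np.mean(r)])  # MSE gradient
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        # Bias-corrected update; its size is roughly bounded by lr:
        p -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return np.mean((p[0] * x + p[1] - y) ** 2)

x = np.arange(1.0, 7.0)
y = 50 + 50 * x                                # unscaled targets, 100..350

loss_default = adam_fit(x, y, 2000, lr=0.001)  # Keras default: barely moves
loss_bigger = adam_fit(x, y, 2000, lr=0.1)     # 100x larger steps
print(loss_default, loss_bigger)
```

Since Adam's steps are capped near the learning rate, a 100x larger rate lets the weights cover the ~50-unit distance to the optimum within the same 2000 steps, so the larger-rate run ends with a far smaller loss. Scaling the targets is still the cleaner fix, though, since it also keeps the loss and metric values in a readable range.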

A final thought from me: this is a truly trivial model with a tiny data set…don’t read too much into Adam’s lack of performance here. You will use it many times in these courses in more challenging situations and see it do just fine.