'sgd' optimizer

CCCC · September 12, 2021, 9:04pm

Why do you get a much higher loss if you use ‘Adam’ instead of ‘sgd’ with 500 epochs?

german.mesa · September 13, 2021, 8:02am

Optimisation algorithms are designed to minimise the loss while training your model. How effective one is can be judged by its speed of convergence (how many epochs it needs to get close to an optimum) and by how well the resulting model generalises (how well it reacts to new data).

In our extremely simple case, SGD converges more quickly than Adam, which is why you get a lower loss with it after 500 epochs. We shouldn’t draw any general conclusion from that, as you’ll find many cases where Adam trains faster and reaches a better result than SGD. As a suggestion, you can also play with learning_rate, and you’ll notice that the results can change in favour of Adam.
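To make the learning-rate point concrete, here is a minimal pure-Python sketch rather than the lab's actual code. Several things in it are assumptions on my part: the toy data (points on y = 2x - 1), starting both parameters at zero, using full-batch gradient descent as the "SGD", and the step sizes, which are the Keras defaults (0.01 for SGD, 0.001 for Adam). The helper names `loss_and_grads`, `train_sgd` and `train_adam` are made up for illustration.

```python
import math

# Assumed toy data (not taken from the thread): points on the line y = 2x - 1.
XS = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
YS = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]


def loss_and_grads(w, b):
    """Mean squared error of y = w*x + b, plus its gradients w.r.t. w and b."""
    n = len(XS)
    errs = [w * x + b - y for x, y in zip(XS, YS)]
    loss = sum(e * e for e in errs) / n
    grad_w = 2.0 * sum(e * x for e, x in zip(errs, XS)) / n
    grad_b = 2.0 * sum(errs) / n
    return loss, grad_w, grad_b


def train_sgd(lr=0.01, epochs=500):
    """Plain full-batch gradient descent; returns the final loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        _, gw, gb = loss_and_grads(w, b)
        w -= lr * gw
        b -= lr * gb
    return loss_and_grads(w, b)[0]


def train_adam(lr=0.001, epochs=500, beta1=0.9, beta2=0.999, eps=1e-7):
    """Adam with bias correction; returns the final loss."""
    params = [0.0, 0.0]  # [w, b]
    m = [0.0, 0.0]       # running mean of gradients
    v = [0.0, 0.0]       # running mean of squared gradients
    for t in range(1, epochs + 1):
        _, gw, gb = loss_and_grads(*params)
        for i, g in enumerate((gw, gb)):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return loss_and_grads(*params)[0]


sgd_loss = train_sgd()                  # SGD, assumed default lr=0.01
adam_default_loss = train_adam()        # Adam, assumed default lr=0.001
adam_tuned_loss = train_adam(lr=0.1)    # Adam with a larger step size

print("sgd          :", sgd_loss)
print("adam lr=0.001:", adam_default_loss)
print("adam lr=0.1  :", adam_tuned_loss)
```

Under these assumptions, Adam's normalised update moves each parameter by roughly learning_rate per step, so with 0.001 and 500 epochs it can only travel about 0.5 from where it started, which is not enough to reach the target slope of 2. Raising learning_rate lets it cover that distance, and the gap to SGD closes.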
