Hi, I had a doubt: if SGD is not that efficient, why do we see it (I have seen it) in every machine learning curriculum? Is there some specialized use?
Hi there! I think it's important to discuss stochastic gradient descent so that students understand the optimization process and the pros/cons of each variant, i.e. batch, mini-batch, and stochastic. In my own experiments I've had to resort to stochastic gradient descent because I would run out of memory very fast if I used more than one sample.
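To make the three variants concrete, here is a minimal sketch (not from this thread) of least-squares regression fit with gradient descent, where a single `batch_size` parameter selects the variant: 1 gives stochastic, the full dataset size gives batch, and anything in between gives mini-batch. The function name and parameters are illustrative, not from any library:

```python
import numpy as np

def fit(X, y, batch_size=1, lr=0.01, epochs=50, seed=0):
    """Gradient descent on mean squared error.

    batch_size=1      -> stochastic gradient descent
    batch_size=len(X) -> (full) batch gradient descent
    otherwise         -> mini-batch gradient descent
    Only one batch is materialized per update, which is why small
    batch sizes help when memory is tight.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)  # MSE gradient
            w -= lr * grad
    return w

# Toy data: y = 3*x0 - 2*x1, no noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0])

w_sgd = fit(X, y, batch_size=1)    # one sample per update
w_mini = fit(X, y, batch_size=32)  # mini-batch
print(w_sgd, w_mini)
```

Both variants recover weights close to `[3, -2]` here; the trade-off is that SGD does many cheap, noisy updates while larger batches do fewer, smoother ones.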