Understanding mini-batch size

Hi Sir,


From the lecture on mini-batch gradient descent, we had two doubts. Can you please help clarify them?

At the 8:05 mark, the professor said that mini-batch gradient descent "does not always exactly converge or oscillate in a very small region". In this statement, "does not always converge" should mean that it oscillates around the minimum, right sir? We do not understand why the professor also seems to say that it does not oscillate.

The second doubt is: if the algorithm is wandering around the minimum, can we use a small learning rate, or will reducing the learning rate help it converge to the global minimum?


With a fixed learning rate, there is never any guarantee that gradient descent will converge, whether you use mini-batch or full-batch. You can always get oscillation, or even actual divergence, rather than convergence. If you want better behavior, you have to use a more sophisticated algorithm that adapts the learning rate. Some coverage of this was added in the recent update of the courses: see the Optimization Assignment. Towards the end of it, they show a couple of ways to decrease the learning rate as the number of iterations grows.
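To make this concrete, here is a minimal sketch on a toy problem, not the code from the assignment. It assumes a 1-D objective (the mean of 0.5 * (w - x)^2 over the data, whose minimum is the data mean) and an inverse-time decay schedule, lr = lr0 / (1 + decay_rate * epoch), which is one common choice. The names `make_data`, `minibatch_gd`, and `spread` and all the numeric settings are made up for illustration.

```python
import random


def make_data(n=1000, seed=0):
    # Hypothetical toy data: noisy samples around 3.0.
    rng = random.Random(seed)
    return [rng.gauss(3.0, 1.0) for _ in range(n)]


def minibatch_gd(data, lr0, decay_rate=0.0, epochs=50, batch_size=10, seed=1):
    """Minimise the mean of 0.5 * (w - x)^2 over data with mini-batch steps.

    Returns the value of w at the end of every epoch.
    decay_rate=0.0 means a fixed learning rate (assumed schedule otherwise:
    lr = lr0 / (1 + decay_rate * epoch)).
    """
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled
    w = 0.0
    history = []
    for epoch in range(epochs):
        lr = lr0 / (1.0 + decay_rate * epoch)
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mini-batch loss: w minus the batch mean.
            grad = w - sum(batch) / len(batch)
            w -= lr * grad
        history.append(w)
    return history


def spread(values):
    # Width of the region that w wandered in.
    return max(values) - min(values)


data = make_data()
target = sum(data) / len(data)  # the true minimiser of the objective

fixed = minibatch_gd(data, lr0=0.5)                    # fixed learning rate
decayed = minibatch_gd(data, lr0=0.5, decay_rate=1.0)  # decaying learning rate
diverged = minibatch_gd(data, lr0=2.5, epochs=1)       # step size too large

print("fixed   lr: wander over last 10 epochs =", spread(fixed[-10:]))
print("decayed lr: wander over last 10 epochs =", spread(decayed[-10:]))
print("decayed lr: distance from minimum      =", abs(decayed[-1] - target))
print("lr0 = 2.5 : |w| after one epoch        =", abs(diverged[-1]))
```

In this toy setting, the fixed-rate run keeps wandering in a region around the minimum because each mini-batch gradient is noisy, the decayed run settles into a much smaller region, and the run with the oversized step blows up. On a real network the loss is not convex, so a decaying rate helps the iterates settle but does not by itself guarantee reaching the global minimum.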