Momentum without mini-batches

It is clear that when you use mini-batches, it's better to use momentum to "smooth" the parameter updates. But how does it work if you don't use mini-batches (not much data, so the whole dataset is one batch)? I can see that momentum clearly does something, because in practice I get different results when I train the model with the RMSProp and Adam optimizers. :thinking:
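
For concreteness, here is a minimal sketch (plain NumPy, on a hypothetical quadratic loss, using the `v = beta*v + (1-beta)*g` form of momentum) of full-batch gradient descent with and without momentum. The gradient is identical every epoch, yet the two runs trace different paths, because the velocity term is an exponential moving average of past gradients:

```python
import numpy as np

# Gradient of a hypothetical ill-conditioned quadratic loss:
# L(w) = 0.5 * (10*w0**2 + w1**2)
def grad(w):
    return np.array([10.0, 1.0]) * w

def train(use_momentum, lr=0.18, beta=0.9, epochs=30):
    w = np.array([1.0, 1.0])  # starting point
    v = np.zeros_like(w)      # velocity: moving average of past gradients
    for _ in range(epochs):
        g = grad(w)           # "full batch": the gradient is deterministic every epoch
        if use_momentum:
            v = beta * v + (1 - beta) * g
            w = w - lr * v
        else:
            w = w - lr * g
    return w

print("plain GD:     ", train(use_momentum=False))
print("with momentum:", train(use_momentum=True))
```

The two runs end at different points after the same number of epochs, so momentum changes the trajectory even when there is no mini-batch noise to smooth out.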

If you plot the two training curves (loss vs epochs) on the same graph, how do they compare?
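
Something like this would work (a minimal sketch; the loss lists below are hypothetical placeholders for the per-epoch values you recorded):

```python
import matplotlib.pyplot as plt

# Hypothetical placeholders: replace with the per-epoch losses from your two runs.
rmsprop_losses = [0.9, 0.5, 0.3, 0.2, 0.15]
adam_losses = [0.9, 0.6, 0.4, 0.25, 0.2]

plt.plot(rmsprop_losses, label="RMSProp")
plt.plot(adam_losses, label="Adam")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```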

Cheers,
Raymond