L2 Regularization With Mini-batch GD

Hi, @MalayAgr.

It’s the same m, the size of the mini-batch. Here’s an interesting discussion about this scaling factor.
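As a rough sketch of where that m enters the cost, assuming NumPy and a list of per-layer weight matrices (the function name `l2_regularized_cost` and its parameters are hypothetical, not from the course code), the L2 penalty $\frac{\lambda}{2m}\sum_l \|W^{[l]}\|_F^2$ is added to the mini-batch cost using the size of the current mini-batch:

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lambd, m):
    """Add the L2 penalty to an already-computed mini-batch cost (hypothetical helper).

    cross_entropy_cost: unregularized cost averaged over the current mini-batch
    weights: list of weight matrices W[l]
    lambd: regularization strength (lambda)
    m: number of examples in the current mini-batch
    """
    # Sum of squared entries over every layer's weight matrix (squared Frobenius norms)
    l2_term = sum(np.sum(np.square(W)) for W in weights)
    # Scale by lambda / (2m), where m is the mini-batch size, not the full training-set size
    return cross_entropy_cost + (lambd / (2 * m)) * l2_term

# Example: with X_batch of shape (n_x, m), take m = X_batch.shape[1]
X_batch = np.zeros((3, 4))
m = X_batch.shape[1]
cost = l2_regularized_cost(0.5, [np.ones((2, 2))], lambd=0.1, m=m)
```

Reading m from `X_batch.shape[1]` also handles a final mini-batch that is smaller than the others.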

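When you add momentum, W is still updated as $W^{[l]} = W^{[l]} - \alpha V_{dW^{[l]}}$. But $V_{dW^{[l]}} = \beta V_{dW^{[l]}} + (1 - \beta)\, dW^{[l]}$ depends on $dW^{[l]}$, which now has an additional term: $dW^{[l]} = (\text{from backprop}) + \frac{\lambda}{m} W^{[l]}$.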
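A minimal sketch of one momentum step with L2 regularization, assuming NumPy arrays and a gradient `dW_backprop` already computed by backprop for the current mini-batch. The function name and the default values `beta=0.9` and `learning_rate=0.01` are illustrative assumptions:

```python
import numpy as np

def momentum_step_with_l2(W, dW_backprop, V_dW, lambd, m, beta=0.9, learning_rate=0.01):
    """One momentum update for a single layer (hypothetical helper; defaults are assumptions)."""
    # The gradient gains the extra L2 term (lambda / m) * W, with m the mini-batch size
    dW = dW_backprop + (lambd / m) * W
    # The velocity is an exponentially weighted average of the regularized gradients
    V_dW = beta * V_dW + (1 - beta) * dW
    # The weight update has the same form as without regularization
    W = W - learning_rate * V_dW
    return W, V_dW

W = np.ones((2, 2))
W_new, V_new = momentum_step_with_l2(W, np.zeros((2, 2)), np.zeros((2, 2)), lambd=0.1, m=4)
```

Regularization only changes `dW`; the velocity and weight-update lines are identical to plain momentum.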

Hope that helped 🙂
