I am a bit confused about the L2 weighting part (lambda) in the L2 regularization video.
I am not sure whether what is shown there is standard for DL training with batches.
Normally, while training CNNs, I apply L2 regularization as lambda * ||w||^2, where lambda is a constant.
In the video, the penalty is instead divided by 2m, where m is the number of training examples.
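For reference, the cost shown there is, as far as I can tell, the standard L2-regularized cost:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big) + \frac{\lambda}{2m} \lVert w \rVert_2^2$$

so both the data term and the penalty are averaged over the same m.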
After reading several sources, it does make sense in that context. However, those examples use small datasets where the whole dataset fits into a single iteration (i.e., one epoch = one gradient step).
How does this apply to larger datasets where we train in batches? Does m stay the total number of examples, or does it become the batch size? Would a constant lambda (without dividing by 2m) still work the same way?
I think m in this case refers to the batch size. So it works whether the batch size is 1 (SGD), the full training set (batch gradient descent), or anything in between (minibatch gradient descent).
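As a minimal sketch of what the m = batch size reading looks like in a training loop (PyTorch here; the toy data and names like `lam` are my own assumptions, not from the video):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup, purely illustrative.
torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
lam = 0.01  # lambda, the regularization strength

for x_batch, y_batch in loader:
    m = x_batch.shape[0]  # m = batch size here, not the full dataset size
    data_loss = loss_fn(model(x_batch), y_batch)  # already the mean over the batch
    # L2 penalty scaled the same way as the data term:
    #   J = (1/m) * sum_i L_i  +  (lambda / (2m)) * ||w||^2
    # (For simplicity this sums over all parameters, biases included;
    # usually only the weights are regularized.)
    l2 = sum((p ** 2).sum() for p in model.parameters())
    loss = data_loss + lam / (2 * m) * l2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Since `MSELoss` already averages the data term over the m examples in the batch, dividing the penalty by 2m keeps both terms on the same per-example scale, whatever the batch size is.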