Regularization

Prof Ng discusses this point in the lecture. If you missed it, I suggest you rewind and watch that part again. You can use the interactive transcript to find the relevant section of the lecture.

Here’s a previous thread about this point as well; you can read from that post forward through the thread. Interestingly, in the original paper from Geoff Hinton’s group, they don’t handle it that way, which makes things quite a bit more complicated.
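
To make the contrast concrete, here is a minimal sketch, assuming the point under discussion is the 1/keep_prob rescaling in inverted dropout. The names and values (`a`, `d`, `keep_prob = 0.8`) are made up for illustration and are not the course notebook's code:

```python
import numpy as np

np.random.seed(0)
keep_prob = 0.8                              # hypothetical keep probability
a = np.random.randn(4, 5)                    # activations from some hidden layer

# Inverted dropout (the convention used in the course): drop units and
# rescale by 1/keep_prob during training, so nothing changes at test time.
d = np.random.rand(*a.shape) < keep_prob     # boolean dropout mask
a_train_inverted = (a * d) / keep_prob

# Original-paper convention: no rescaling during training, which means the
# weights (equivalently, the activations) have to be scaled down by
# keep_prob at test time to compensate.
a_train_original = a * d
a_test_original = a * keep_prob
```

The training-time rescaling is what lets you run the trained network at test time with no special handling, which is why the course's version is simpler.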

Here’s another thread about it.

And here’s one that actually shows the effect on the L2 norm of the activation output.
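
For a rough sense of what that kind of demonstration looks like, here is my own sketch (not the thread's code); the array `a` and `keep_prob = 0.8` are made up for illustration:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
a = np.abs(np.random.randn(10000))           # positive "activations" (ReLU-like)

d = np.random.rand(a.shape[0]) < keep_prob   # dropout mask
dropped = a * d                              # masking alone
rescaled = dropped / keep_prob               # with the 1/keep_prob rescaling

for name, v in [("original", a), ("dropped", dropped), ("rescaled", rescaled)]:
    print(name, np.linalg.norm(v), v.sum())
# Masking shrinks the sum of the activations by roughly keep_prob and the
# L2 norm by roughly sqrt(keep_prob). Dividing by keep_prob restores the
# expected sum (the quantity feeding the next layer) and brings the norm
# back to roughly its original scale (about 1/sqrt(keep_prob) times it).
```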