Week 1 - why regularization works with ReLU

Here’s another recent thread on initialization whose intuitions are also relevant to this question. Both the initialization case and the regularization case come down to reasoning about the effect of the magnitudes of the coefficients, so I think the reasoning there applies here as well.
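
To make the "magnitude of the coefficients" point a bit more concrete, here is a minimal NumPy sketch. It is not from the course materials or the linked thread; the tiny two-layer network, the absence of bias terms, the random weights, and the scale factor `c` are all assumptions chosen just for illustration. It relies on one property of ReLU that I'm sure of: for any `c > 0`, `relu(c * z) == c * relu(z)`, so shrinking the weights shrinks what the network outputs.

```python
import numpy as np

def relu(z):
    # Elementwise ReLU: max(0, z)
    return np.maximum(0.0, z)

def forward(x, W1, W2):
    # Hypothetical two-layer network with no biases:
    # ReLU hidden layer followed by a linear output layer.
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))    # one input example with 4 features
W1 = rng.normal(size=(5, 4))   # hidden layer weights (illustrative values)
W2 = rng.normal(size=(1, 5))   # output layer weights (illustrative values)

c = 0.5                        # assumed shrink factor, e.g. the net effect of weight decay
z = W1 @ x

full = forward(x, W1, W2)
shrunk = forward(x, c * W1, c * W2)

# ReLU is positively homogeneous, so the scale factor passes straight through it.
print(np.allclose(relu(c * z), c * relu(z)))
# Scaling both layers by c scales the output by c squared.
print(np.allclose(shrunk, c**2 * full))
```

So even though ReLU has no flat "saturating" region the way tanh does, the size of the weights still directly controls the size of the activations and the output, which is the quantity that both initialization and L2 regularization are acting on.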