Course 2 Week 1 Regularization

In L2 regularization, if we take a high value of lambda, the weights W[l] are pushed toward zero, and so Z[l] = W[l]A[l-1] + b[l] also becomes small. If we take the tanh activation as an example, tanh is roughly linear near zero, so with small Z[l] each layer behaves almost linearly and the network ends up computing a very basic, nearly linear function. So in order to decrease our high variance we end up computing a linear network; can this affect our accuracy?
Is there any method other than dropout to lower the variance without ending up with a very basic linear network?
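To make the "tanh becomes linear" point concrete, here is a minimal NumPy sketch (my own illustration, not from the course notebooks): for inputs near zero, tanh(z) is almost exactly z, while for larger inputs it saturates and is clearly nonlinear.

```python
import numpy as np

# Small pre-activations, as you'd get when a large lambda shrinks the weights.
z_small = np.linspace(-0.1, 0.1, 5)
# Larger pre-activations, where tanh actually bends.
z_large = np.linspace(-3.0, 3.0, 5)

# Near zero, tanh is close to the identity (deviation on the order of 1e-4)...
print(np.max(np.abs(np.tanh(z_small) - z_small)))
# ...but away from zero it saturates, so the deviation from linear is large.
print(np.max(np.abs(np.tanh(z_large) - z_large)))
```

This is why shrinking every W[l] pushes each layer into the linear regime of tanh, and a stack of linear layers collapses to a single linear map.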


There is also L1 regularization, but it has its own set of issues. There are several approaches to the type of problem you are describing:

  1. If too high a \lambda causes your network to be too linear and simple, then that will show up as high bias, right? So maybe that means you’ve gone too far with the \lambda.
  2. If you can’t find a “Goldilocks” \lambda value that gives you a good enough balance between variance and bias, then try dropout.
  3. If that also doesn’t work, maybe you need to consider a more radical solution: a different network architecture. Try different activation functions, try adjusting the number and sizes of the layers. Or try a completely different architecture: CNN instead of FC NN or …
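For point 2, a minimal NumPy sketch of inverted dropout may help (this is my own illustration, not the course's assignment code; `dropout_forward` is a name made up here): units are dropped at random during training, and the surviving activations are scaled up by 1/keep_prob so their expected value is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(a, keep_prob=0.8):
    """Inverted dropout on an activation matrix `a` (training time only)."""
    mask = rng.random(a.shape) < keep_prob  # keep each unit with probability keep_prob
    a = a * mask                            # zero out the dropped units
    a = a / keep_prob                       # rescale so E[a] stays the same
    return a, mask

a = np.ones((4, 5))
a_drop, mask = dropout_forward(a, keep_prob=0.8)
# Each entry of a_drop is either 0 (dropped) or 1/0.8 = 1.25 (kept and rescaled).
```

Unlike L2, dropout does not shrink the weights toward the linear regime of tanh; it fights variance by preventing units from co-adapting, which is why it can be a useful alternative when no lambda value gives a good bias/variance balance.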

Prof Ng will have much more to say about how to deal with cases in which your solution doesn’t work as well as you want it to, both here in DLS Course 2 and in DLS Course 3. The high-level message is that there is no one “magic bullet” solution that works in all cases.

BTW you filed this under “General Discussion” and it could be considered a general point, but you also refer to Course 2 Week 1 in the title, so you might want to move this to the DLS C2 category.


Thank you for the response, I’ll try different methods.