In L2 regularization, if we take a high value of \lambda, we eventually see a decrease in the weights W^{[l]}, and with that the pre-activations Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} also decrease. If we take the example of the tanh function, small values of Z^{[l]} land on the nearly linear part of its graph, so the network ends up computing something close to a basic linear function. So to decrease our high variance we end up computing a linear network. Can this affect our accuracy?
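A quick numerical sketch of the point above: near zero, tanh(z) is almost exactly the identity, so a layer whose weights have been shrunk by heavy L2 regularization behaves almost linearly. The ranges below are illustrative values I picked for the demonstration, not anything from the course.

```python
import numpy as np

# With a large lambda, L2 regularization shrinks W^{[l]}, so the
# pre-activations Z stay close to zero. Near zero, tanh(z) ~ z,
# so each layer behaves almost like a linear (identity) map.
z_small = np.linspace(-0.1, 0.1, 5)   # pre-activations of a heavily regularized layer
z_large = np.linspace(-3.0, 3.0, 5)   # pre-activations of an unregularized layer

# Maximum gap between tanh(z) and the identity line z:
gap_small = np.max(np.abs(np.tanh(z_small) - z_small))
gap_large = np.max(np.abs(np.tanh(z_large) - z_large))
print(gap_small)  # tiny gap: effectively linear
print(gap_large)  # large gap: genuinely nonlinear regime
```

Stacking layers that are each nearly linear composes into one overall (nearly) linear map, which is why the network loses expressive power.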

Is there any other method, besides dropout, that lowers the variance without reducing the network to a very basic linear one?

There is also L1 regularization, but it has its own set of issues. There are several approaches to the type of problem you are describing:

- If too high a \lambda causes your network to be too linear and simple, then that will show up as high bias, right? So maybe that means you've gone too far with the \lambda.
- If you can't find a "Goldilocks" \lambda value that gives you a good enough balance between variance and bias, then try dropout.
- If that also doesn't work, maybe you need to consider a more radical solution: a different network architecture. Try different activation functions, try adjusting the number and sizes of the layers. Or try a completely different architecture: CNN instead of FC NN or …
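For the dropout suggestion in the list above, here is a minimal sketch of the "inverted dropout" technique covered in this course. The function name and the toy activations are mine, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(a, keep_prob):
    """Inverted dropout: zero out each unit with probability (1 - keep_prob),
    then divide by keep_prob so the expected activation is unchanged."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

a = np.ones((1000, 1))                       # toy layer activations
a_drop = inverted_dropout(a, keep_prob=0.8)  # ~20% of units are zeroed

# The rescaling keeps the expected value: the mean stays near 1.0
print(round(float(a_drop.mean()), 2))
```

Because the mask is redrawn on every forward pass, no single unit can be relied on, which spreads the weights out and reduces variance without forcing the activations into the linear regime the way a large \lambda does.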

Prof Ng will have much more to say about how to deal with cases in which your solution doesn't work as well as you want it to, both here in DLS Course 2 and in DLS Course 3. The high level message is that there is no one "magic bullet" solution that works in all cases.

BTW you filed this under "General Discussion" and it could be considered a general point, but you also refer to Course 2 Week 1 in the title, so you might want to move this to the DLS C2 category.


Thank you for the response, I'll try different methods.