Is it mandatory to use regularization? Strictly speaking, no: you can decide whether to use it or not. That said, if your NN is overfitting, regularization is one technique for reducing that overfitting.
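What happens if the NN isn’t overfitting and we still use regularization? Usually nothing bad happens, as long as the regularization strength is moderate. The NN will still train, and it may even generalize a bit better. Only a penalty that is too strong tends to hurt, by pushing the model toward underfitting.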
The 3 types of regularization you’ll see most often are L1, L2, and Dropout.
L2 is probably the most common one. In this type, the loss function is extended by a term that penalizes the sum of squares of the weights (aka weight decay).
L1 also extends the loss function with a term that penalizes the sum of absolute values of the weights.
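To make the two penalty terms concrete, here is a minimal sketch in plain Python. The function names (`l2_penalty`, `l1_penalty`, `regularized_loss`) and the `lam` parameter are just names I made up for illustration, and I'm assuming the simplest form of the penalty, \lambda times the sum. Many courses and libraries also scale the term (for example by 1/(2m)), so check the convention you are working with.

```python
def l2_penalty(weights, lam):
    """L2 term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l1_penalty(weights, lam):
    """L1 term: lam times the sum of absolute values of the weights."""
    return lam * sum(abs(w) for w in weights)


def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Extend an already-computed data loss with the chosen penalty term."""
    if kind == "l2":
        penalty = l2_penalty(weights, lam)
    elif kind == "l1":
        penalty = l1_penalty(weights, lam)
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return data_loss + penalty


# Hypothetical numbers, only to show the call.
weights = [1.0, -2.0, 3.0]
print(regularized_loss(0.5, weights, lam=0.1, kind="l2"))
print(regularized_loss(0.5, weights, lam=0.1, kind="l1"))
```

For the same weights, the L2 term grows much faster for large weights (it squares them), while the L1 term grows linearly with their magnitude.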
Dropout acts differently: during training, it randomly drops units (neurons) from the layer. A dropped unit contributes nothing to the forward pass or to the weight update for that training step. At prediction time, all units are used.
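Here is a rough sketch of what dropping units can look like on one layer's activations. This is my own illustration, not any framework's implementation: `dropout`, `keep_prob`, and `rng` are hypothetical names, and I'm assuming the common "inverted dropout" variant, where the kept units are divided by `keep_prob` so the expected activation stays about the same.

```python
import random


def dropout(activations, keep_prob, training=True, rng=random):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not training:
        # At prediction time nothing is dropped.
        return list(activations)
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            # Kept unit: rescale so the expected value is unchanged.
            out.append(a / keep_prob)
        else:
            # Dropped unit: it contributes nothing in this training step.
            out.append(0.0)
    return out


acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, keep_prob=0.5, rng=random.Random(0)))  # some entries zeroed
print(dropout(acts, keep_prob=0.5, training=False))        # unchanged
```

A different random set of units is dropped on every call, which is why the network cannot rely too heavily on any single unit.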
Look up these 3 types of regularization in the lessons or on Google. It is important to understand them well.
I want to focus on your Q2. The answer is yes, but instead of saying “for each d”, I would say “for each weight”, meaning the w_1, w_2, … that multiply each of the (polynomial) features. There are two reasons. First, the number of weights can be larger than the number of “d” once we include cross terms such as x_1x_2. Second, the regularization term involves those weights directly: for d = 1, for example, the L2 regularization term can be written as \lambda w_1^2.
Therefore, yes, regularization is applied to all the weights, that is, all the w's.
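To see why counting weights is not the same as counting “d”, here is a small sketch that lists the polynomial features for 2 inputs up to degree 2, cross terms included. I'm reading “d” as the polynomial degree here, and `polynomial_terms`, `lam`, and the weight values are made up for the example.

```python
from itertools import combinations_with_replacement


def polynomial_terms(n_inputs, degree):
    """Return every monomial of total degree 1..degree as a tuple of input indices."""
    terms = []
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(range(1, n_inputs + 1), d))
    return terms


terms = polynomial_terms(n_inputs=2, degree=2)
# (1,) is x_1, (2,) is x_2, (1, 1) is x_1^2, (1, 2) is x_1x_2, (2, 2) is x_2^2
print(terms)
print(len(terms))  # 5 features, so 5 weights, even though the degree is only 2

# The L2 term sums over every weight, one per feature (hypothetical values).
weights = [0.5] * len(terms)
lam = 2.0
penalty = lam * sum(w * w for w in weights)
print(penalty)
```

So with degree 2 there are already 5 weights in the penalty, and the cross term x_1x_2 has its own weight like any other feature.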