When Andrew Ng explained regularization with the example in which he added 1000·w3² + 1000·w4² to the loss function, it was clear how this selectively penalises the higher-order terms' parameters and therefore reduces overfitting. In practice, though, we regularise all the parameters with the same constant lambda. I understand the effect on the model when lambda = 0 or when lambda is very large, but I can't see how an intermediate value would selectively penalise the higher-order terms without penalising the lower-order ones, the way it did in the example.

It doesn't selectively penalize the higher-order terms; it regularizes all of them with the same lambda.

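The example used in the lecture is a worst-case illustration and rather misleading. With a single lambda, every weight pays the same price per unit of w_j², so a weight ends up large only where the data-fit term pulls on it hard enough to outweigh that penalty. Higher-order weights get small when the data doesn't need them, not because the penalty singles them out.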
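A minimal sketch of the "one lambda for everything" case, in pure Python so it needs no libraries. Everything here is an illustrative assumption: the toy data (a quadratic plus a small alternating wiggle standing in for noise), the degree-4 polynomial, and the lambda values. Following the common convention, the bias w0 is left unpenalized, and the same lambda multiplies w1² through w4², so the penalty treats the linear term and the quartic term identically.

```python
# Hypothetical illustration: ridge (L2) regularization of a degree-4 polynomial
# fit, where ONE lambda multiplies the squared weight of every non-bias term.

def design_row(x, degree):
    """Polynomial features [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = s / m[r][r]
    return w


def ridge_fit(xs, ys, degree, lam):
    """Minimize sum_i (y_i - w . phi(x_i))^2 + lam * sum_{j>=1} w_j^2.

    Normal equations: (Phi^T Phi + lam * D) w = Phi^T y, where D is the
    identity except D[0][0] = 0, so the bias w_0 is not penalized.
    """
    phi = [design_row(x, degree) for x in xs]
    k = degree + 1
    a = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    for j in range(1, k):
        a[j][j] += lam  # the same lambda for every non-bias weight
    b = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve(a, b)


def penalty(w):
    """The regularized quantity: sum of squared non-bias weights."""
    return sum(wj ** 2 for wj in w[1:])


# Toy data (assumed): a quadratic with a small alternating wiggle, sampled on [-1, 1].
xs = [-1 + 2 * i / 19 for i in range(20)]
ys = [1 + 2 * x - 3 * x ** 2 + 0.1 * (-1) ** i for i, x in enumerate(xs)]

for lam in [0.0, 0.1, 1.0, 10.0, 1000.0]:
    w = ridge_fit(xs, ys, degree=4, lam=lam)
    coeffs = ", ".join(f"w{j}={wj:+.3f}" for j, wj in enumerate(w))
    print(f"lambda={lam:>7}: {coeffs}  penalty={penalty(w):.4f}")
```

Nothing in `ridge_fit` distinguishes w1 from w4; the only thing an intermediate lambda guarantees is that the total sum of squared non-bias weights can only shrink as lambda grows. Which individual weights end up small is decided by how much each one helps the fit, i.e. by the data, not by the penalty.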