Neural network regularization: What does it mean to regularize a hidden layer?

James_Webb · April 2, 2023, 8:37pm

In lecture, Prof. Ng explains that regularization can help reduce variance even in large neural networks. He then describes how a neural network can be regularized, and presents the cost function:

J(\mathbf{W},\mathbf{B})=\frac{1}{m}\sum_{i=1}^{m}L(F(x^{(i)}),y^{(i)})+\frac{\lambda}{2m}\sum w^2

He then shows what this looks like in TensorFlow code:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import L2

layer_1 = Dense(units=25, activation='relu', kernel_regularizer=L2(0.01))
layer_2 = Dense(units=15, activation='relu', kernel_regularizer=L2(0.01))
layer_3 = Dense(units=1, activation='sigmoid', kernel_regularizer=L2(0.01))
model = Sequential([layer_1, layer_2, layer_3])

The TensorFlow code in particular is confusing to me, and makes me wonder if my conception of regularization is accurate.

So far in lectures, we have only seen regularization applied to a cost function, not activation functions or z values. It seems odd that in TensorFlow, the regularization is specified on each layer, rather than in the compile method where we provide a loss function as an argument.

I’m not sure what a regularizer on a hidden layer would even do. Could someone help clarify 1) if my understanding of regularization as applying to cost is accurate, and if so, 2) why does TensorFlow specify regularization on each layer?

Thank you!

TMosh · April 2, 2023, 9:48pm

The TensorFlow code doesn’t show how regularization happens. It just shows that there’s a “lambda” value used for regularizing the weights in each layer.

The essence of regularization is that it adds some additional cost based on the magnitudes of the weight values. This creates an incentive for the weight values to be reduced slightly, while at the same time still trying to make good predictions on the training set.

Overall this causes the predictions to be less good on the training set, but the tradeoff is that you can get better predictions on data that wasn’t in the training set.
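
As a rough illustration of that "additional cost", here is a minimal sketch in plain Python (not TensorFlow's implementation). It follows the lecture's form, the average data loss plus \frac{\lambda}{2m}\sum w^2, and shows that the penalty is summed over every layer's weights into a single cost. The function names and all numbers are made up for illustration.

```python
def l2_penalty(weights, lam, m):
    """L2 term for one layer: (lam / (2m)) * sum of squared weights."""
    return lam / (2 * m) * sum(w * w for w in weights)


def regularized_cost(data_loss, layers, lam, m):
    """Average data loss plus the L2 penalty of every layer."""
    return data_loss + sum(l2_penalty(w, lam, m) for w in layers)


# Hypothetical values: the weights of two layers, flattened into lists.
layers = [[0.5, -1.0], [2.0]]
cost = regularized_cost(data_loss=0.30, layers=layers, lam=0.1, m=10)

# Same data loss, but smaller weights: the total cost is lower.
small_layers = [[0.1, -0.2], [0.4]]
small_cost = regularized_cost(data_loss=0.30, layers=small_layers, lam=0.1, m=10)

print(cost, small_cost)  # cost is about 0.32625; small_cost is lower
```

So the regularization still lives in the cost; the layers only supply the weights that are penalized.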

rmwkwok · April 3, 2023, 1:00am

Hi @James_Webb

Yes.

I don’t represent TensorFlow, but if I were in the TensorFlow developers’ shoes, this design would make total sense. If we think carefully about what (e.g.) the L2 regularizer does in practice, it is this: whenever a layer’s weights are updated, they are reduced by an extra amount proportional to \alpha\lambda w. This reduction depends only on that layer’s own weights and does not affect anything outside the layer, so it is convenient to specify the regularizer within the layer.
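
To make that concrete, here is a minimal sketch of one gradient-descent step for a single layer, in plain Python. It assumes the lecture's cost, where the penalty \frac{\lambda}{2m}\sum w^2 has gradient \frac{\lambda}{m}w, so the extra shrinkage per step is \alpha\frac{\lambda}{m}w. The function name and the numbers are hypothetical.

```python
def sgd_step(weights, grads, alpha, lam, m):
    """One update for a single layer: the usual gradient step plus L2 shrinkage.

    `grads` holds the gradient of the data loss only; the L2 term is added
    here and uses nothing but this layer's own weights.
    """
    return [w - alpha * g - alpha * (lam / m) * w for w, g in zip(weights, grads)]


# With a zero data-loss gradient, only the shrinkage is visible:
# each weight is scaled by (1 - alpha * lam / m) = 0.99.
new_weights = sgd_step([1.0, -2.0], grads=[0.0, 0.0], alpha=0.1, lam=1.0, m=10)
print(new_weights)  # approximately [0.99, -1.98]
```

Nothing in the shrinkage term refers to another layer, which is the point above.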

Again, I am just offering another angle to rationalize this. I don’t mean that nobody could design a neural network library that specifies the regularization in the compile() stage, where the cost function is provided. However, its developers would need to think about how to pass the form of the regularization and the regularization parameter back to each of the layers, or how to pass the weights forward.
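
One way to picture the per-layer design is the toy sketch below, in plain Python. It is not how TensorFlow is implemented; `ToyLayer`, `l2`, and `total_cost` are hypothetical names. Each layer keeps its own regularizer and reports its own penalty, and the training code simply adds those penalties to the data loss. The penalty here uses the form lam * sum(w^2), which is how the Keras documentation describes `L2`; note that this scaling differs from the lecture's \frac{\lambda}{2m}.

```python
class ToyLayer:
    """Hypothetical layer that owns its weights and an optional regularizer."""

    def __init__(self, weights, regularizer=None):
        self.weights = weights
        self.regularizer = regularizer

    def penalty(self):
        # The layer computes its penalty from its own weights only.
        if self.regularizer is None:
            return 0.0
        return self.regularizer(self.weights)


def l2(lam):
    """Return a function that computes lam * sum(w^2) for a list of weights."""
    def penalty(weights):
        return lam * sum(w * w for w in weights)
    return penalty


def total_cost(data_loss, layers):
    """Data loss plus whatever penalty each layer reports."""
    return data_loss + sum(layer.penalty() for layer in layers)


# The first layer is regularized, the second is not.
layers = [ToyLayer([1.0, 2.0], l2(0.01)), ToyLayer([3.0])]
total = total_cost(0.5, layers)
print(total)  # about 0.55
```

With this layout, different layers can use different regularizers (or none) without the cost function needing to know about any of them.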

Raymond

James_Webb · April 3, 2023, 4:41pm

Thank you, Raymond. Always appreciate the thoughtful replies.

rmwkwok · April 3, 2023, 4:51pm

You are welcome, @James_Webb!
