I’m trying to implement a regularization technique for my neural network because I suspect it is overfitting my dataset. I am using a simple one-hidden-layer DNN with 600 input units and 6 output units. After adding L2 regularization to the hidden layer with lambda set to 0.001, I have some findings that are a bit confusing.
The accuracy on my training set has dropped from 97% to 90% (which indeed reduced the overfitting issue that I suspected). It decreased the bias though.
The accuracy on my testing set has not increased (still at around 65%).
I applied L2 regularization to both the hidden layer and the softmax layer, but it didn’t make much difference.
Is the whole point of regularization to decrease the bias so that I can train my network for longer without overfitting, and get a better result on my validation set?
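For reference, my setup looks roughly like this (a simplified sketch rather than my exact code; the hidden-layer size, optimizer, and loss here are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

lam = 0.001  # the lambda value mentioned above

model = tf.keras.Sequential([
    layers.Input(shape=(600,)),                     # 600 input units
    layers.Dense(64, activation='relu',             # hidden layer; size is a placeholder
                 kernel_regularizer=regularizers.l2(lam)),
    layers.Dense(6, activation='softmax'),          # 6 output units
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',      # adjust to your label format
              metrics=['accuracy'])
```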
Thank you for reading my message. If I have any conceptual misunderstandings about regularization, please let me know. It will be much appreciated.
Two curves: one for training set accuracy and one for testing set;
For simplicity, regularize each and every layer (this should be easy with TensorFlow).
Whatever lambda value r you are using, try a few more values centered around it, in a sequence like \frac{r}{10}, \frac{r}{3}, r, 3r, 10r. Feel free to add a few more if training one network doesn’t cost much time.
Make sure to include \lambda=0.
I would like to see the \lambda values on the x-axis.
If you still have time, can you make one more plot like the above, but with an additional hidden layer of 60 units in between the current two Dense layers? Please use the same set of \lambda's for both plots. (A sketch of what I mean is below.)
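Here is a rough sketch of the sweep and plot I have in mind (build_model stands for your own model-building function with l2(lam) on every layer, and the data variables are placeholders):

```python
import matplotlib.pyplot as plt

r = 0.001                                  # your current lambda
lambdas = [0, r/10, r/3, r, 3*r, 10*r]     # include lambda = 0

train_accs, test_accs = [], []
for lam in lambdas:
    model = build_model(lam)               # your model, with l2(lam) on every layer
    model.fit(X_train, y_train, epochs=200, verbose=0)
    train_accs.append(model.evaluate(X_train, y_train, verbose=0)[1])
    test_accs.append(model.evaluate(X_test, y_test, verbose=0)[1])

# put the lambda values on the x-axis, evenly spaced so that 0 fits in too
xs = list(range(len(lambdas)))
plt.plot(xs, train_accs, marker='o', label='train accuracy')
plt.plot(xs, test_accs, marker='o', label='test accuracy')
plt.xticks(xs, [str(l) for l in lambdas])
plt.xlabel('lambda')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```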
Certainly the goal of regularization would be for your model to perform better on the testing set, but it is not a guarantee. For example, I can set \lambda to an unreasonably large number and destroy everything.
You were right! I will be editing the post shortly!
And you’re also right, what I meant is that it decreased the variance … I was a bit confused yesterday haha. I will be making the plot you mentioned shortly.
This one’s behavior is easy to understand. However, before I can really comment on it, I notice in your code that you didn’t use EarlyStopping, so I suppose this graph recorded the training and validation accuracies at the end of the 200th epoch.
This is not good, because it doesn’t reflect the best validation accuracies. Please allow the runs to early stop and remake the graph with the stopped training and validation accuracies. With EarlyStopping, your first model training run should finish within 40 epochs (saving 80% of your time).
This is a very interesting graph, especially how you picked the hidden-unit values to experiment with; it would just be more usable if EarlyStopping had been applied to these runs.
In the 3rd code cell, change your DNN1_model(input_shape, r) to DNN1_model(hidden_units, r), since input_shape is actually fixed and you are experimenting with the number of hidden units.
In the 4th code cell, use nested loops. The outer loop goes over the different values of hidden_units, and the inner one goes over the lambdas. Then adjust the way you store the final accuracies so that in your next graph there will be as many training (and validation) curves as there are values of hidden_units.
In the 5th code cell, in those plt.plot lines, add the label argument to give each curve a name. Then, before plt.show, call plt.legend to show the legend containing the curve names.
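Putting those three cells together, it would look roughly like this (just a sketch; DNN1_model is your function, and the data variables and the hidden_units values are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

hidden_units_list = [6, 30, 60]            # placeholder values
r = 0.001
lambdas = [0, r/10, r/3, r, 3*r, 10*r]

results = {}                               # hidden_units -> (train accuracies, valid accuracies)
for hidden_units in hidden_units_list:     # outer loop: hidden units
    train_accs, val_accs = [], []
    for lam in lambdas:                    # inner loop: lambda
        model = DNN1_model(hidden_units, lam)
        history = model.fit(X_train, y_train,
                            validation_data=(X_val, y_val),
                            epochs=200, verbose=0)
        # assumes the model was compiled with metrics=['accuracy']
        best = int(np.argmax(history.history['val_accuracy']))
        train_accs.append(history.history['accuracy'][best])
        val_accs.append(history.history['val_accuracy'][best])
    results[hidden_units] = (train_accs, val_accs)

xs = list(range(len(lambdas)))
for hidden_units, (train_accs, val_accs) in results.items():
    plt.plot(xs, train_accs, label=f'train, {hidden_units} units')
    plt.plot(xs, val_accs, '--', label=f'valid, {hidden_units} units')
plt.xticks(xs, [str(l) for l in lambdas])
plt.xlabel('lambda')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```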
I have finished recoding as you taught me. Here are the plots of accuracy with respect to different lambda values. There are 5 plots, one for each number of hidden units. I used early stopping at 40 epochs for all cases.
To my surprise, the 30 hidden units performed better, which might stem from simpler training for the simpler network (thus my implementation of early stopping is wrong). I’ve also observed that regularization has more impact when there are more hidden units.
Would you mind also updating your repo with the code that generated these 5 plots? I want to make sure we are on the same page, because something doesn’t seem to make sense to me.
Also, you said 30 units performed better, but based on validation accuracy, I would say 6 units is the one I would pick.
Some trends are understandable:
as lambda increased, validation accuracy first increased and then decreased
with more hidden units, validation accuracy dropped
But some are hard to understand:
training accuracies increasing with higher lambda (plots 1, 2)
accuracy gaps growing larger with higher lambda (plots 1, 2)
with more hidden units, training accuracy dropped (across all plots)
I need to inspect your code before I can make any further comment. In case you also want to do some inspection, would you first update your repo and notify me before that? We can exchange our findings later.
I have updated the code and uploaded it to the repo. One of the problems mentioned above
is resolved. (I don’t know how I resolved this issue though; I only recoded because my file didn’t save.) Please refer to the file ‘1 hidden layer V3’.
Actually, all three of the problems are gone! Now their opposites are true; see if the opposites make sense to you.
You are still not using EarlyStopping; it does not mean fixing the number of epochs to a smaller value. You need to google how to use this class to add EarlyStopping to your training process.
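To be concrete, it is a callback class, used roughly like this (the monitored metric and patience here are placeholders for you to adapt):

```python
import tensorflow as tf

# EarlyStopping is not a smaller epoch count: you still request many epochs,
# and training stops once the monitored metric stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    patience=10,                   # epochs to wait without improvement (placeholder)
    restore_best_weights=True,     # roll the weights back to the best epoch
)

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200,    # an upper bound, not a fixed training length
                    callbacks=[early_stop])
```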
I hope that, besides what I have suggested, in your next post I will see your analysis of the situation, your further work, and the latest results.
For example, validation accuracy does increase in some cases when Layer 1’s units increase. However, the effect is not clear because you have to go up and down to compare some plots. Having said that, if you did not notice any improvement, you might need to think about what you can do to assist yourself better. For example, instead of scrolling up and down to compare two plots, why not put them in one single plot?
Another example: the accuracy gaps close as lambda increases, which is reasonable. However, a dropping validation accuracy isn’t something we want.
Therefore, there were both improvements and unfavourable outcomes, and they should inform your next actions.
@Chiang_Yuhan, I believe your goal with this work is to train yourself to be a good model trainer. So, the key is your work, not mine. Below is an experiment loop you can jump into:
Experiment → observation → make sense of the observation → hypothesis → experiment on your hypothesis → observation → …
If you still have no idea what to do, here are my suggestions:
Scan through your code character-by-character and make a list of all tunable parameters:
learning rate,
L1 hidden units,
number of hidden layers (can be 0, why not?)
lambda,
number of training samples (by adjusting train/valid ratio),
many more… I am just lazy, but you can do it for yourself.
For each, ask yourself whether increasing it would help with overfitting or underfitting.
Observing overfitting/underfitting is easy because you can compare the train/valid losses and accuracies (see the small sketch after this list).
Based on your observation and your list of parameters, what are you going to do next?
You may add a step in the loop to log the latest best validation accuracy you have achieved and how you achieved it.
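For the comparison step, a minimal sketch using a Keras History object (the variable names are placeholders):

```python
import matplotlib.pyplot as plt

# Quick over-/under-fitting check: a widening gap between the two curves
# suggests overfitting; two high, close losses suggest underfitting.
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='valid loss')
plt.xlabel('epoch')
plt.legend()
plt.show()
```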
I won’t be surprised if you go through that experimentation loop 100 times yourself within a day, and if you do so, there are two consequences:
You are accumulating experience through ups and downs, which will be very reusable in your future. We can’t only ever see the ups, but if I am the one who keeps pushing you into the downs, I can’t imagine how you would feel. It’s best for you to follow your own instinct, and train it as you go through the loop.
You can get way more done in one day than we can by just exchanging messages here for two weeks.
Cheers,
Raymond
PS1: I would love to read and discuss your list of tunable parameters, along with how each deals with over-/under-fitting. However, it’s up to you whether you want to share it.
We can put this behind us. It would be good if you knew how you fixed it, because that knowledge is entirely for your own gain; but if you don’t, it isn’t a big issue at all, because sooner or later it will come back and you will find out.