Looking for help with Coursera Guided Project: Data Science Coding Challenge: Loan Default Prediction

Hello, I’m looking for some help tuning my ML model for a guided project on Coursera (https://coursera.org/share/8ecfebb08838db1ea35a29e328efa120).

I have a problem with overfitting that sets in very abruptly, so the model doesn’t give good predictions. So far I have tried (with no success) multiple values for L1, L2, and L1_L2 regularization, Dropout, the number of layers and neurons, data size, learning rate, and patience, and couldn’t make it better. I would really appreciate any suggestions to help me improve this training.

Maybe I’m applying them wrong, so if somebody could give me some feedback I would really appreciate it.

This is the link to the project on my GitHub:

I have added comments with some details for guidance so you can understand what is going on.

I’ll be watching for any updates, and if you need more information, please let me know.

Thanks in advance!

I’ll just toss in a couple of comments, keeping in mind that the purpose of the Coursera Projects is for students to demonstrate their own skills.

  • You’re using both L1 and Dropout. Both of them are regularization methods. Why use both? I think that just complicates your efforts to understand what’s happening.

  • Your model is pretty complicated. Did you try simpler models first? How much performance did you gain with the more complicated models?

Thanks for your feedback, @TMosh! I really appreciate your comments.

I agree with the purpose of the Coursera Projects. In fact, the highest score I got on this project was 68%, and that was after many attempts with different model setups.

As a result, I wasn’t really satisfied with the decision that led to that grade: I trained a model whose loss and val_loss weren’t that good, but I adjusted the prediction threshold for classifying an example as 1 and got a higher score. Before that, I had gotten better loss and val_loss values, but the grade was not good.
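To illustrate why the grade can move while loss and val_loss stay the same, here is a minimal sketch of thresholding. It assumes the model outputs a default probability per example; the labels and probabilities are made up, and `predict_labels` and `accuracy` are hypothetical helper names, not part of the project. A cross-entropy loss is computed from the probabilities themselves, so changing the cutoff changes the 0/1 predictions (and any metric based on them) without changing the loss.

```python
def predict_labels(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels using a cutoff."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical values, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
probs = [0.10, 0.40, 0.35, 0.80, 0.55, 0.45]

# Same probabilities, different cutoffs: the labels and accuracy change,
# while any loss computed from `probs` would be identical in all three cases.
for threshold in (0.3, 0.5, 0.7):
    labels = predict_labels(probs, threshold)
    print(threshold, labels, round(accuracy(y_true, labels), 2))
```

If the threshold is tuned, it is probably safer to pick it on a validation split rather than on the data used for grading.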

This is why I decided to ask for help here, hoping for comments like yours that could help me understand this part better, because I feel a bit stuck.

I’m going to try making it simpler again. It’s true that, in trying to make it better, I made it more complex by adding more methods. Thanks for the answer!

Hey @TMosh, just wanted to give an update on what I did:

  • Started with a simpler model using 2 hidden layers, then 3, to see how that affected the results. I got the best results with 3 hidden layers.

  • Used Dropout and L1 independently. I got the best results with Dropout, using 0.2 in the first hidden layer. When I started combining them, the variance in the loss and accuracy increased.

  • I tried changing the batch size, but 256 was still the best size.

  • I also decreased and increased the optimizer’s learning rate, but it had little effect on the variance.

  • I also tried using fewer input parameters, but that didn’t change the result.

The best I have gotten so far is a loss of 0.58 and an accuracy of 0.69. I’m aiming to decrease the loss as much as possible and increase the accuracy, so I could really use some other tips or help if possible.

Thanks in advance

The magnitude of the numerical value of the loss doesn’t really matter. You’re just trying to find where it reaches its minimum.

Then you look at the test set accuracy to determine how well it works.
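As a rough sketch of "finding where it reaches its minimum", the snippet below picks the epoch with the lowest validation loss and shows where patience-based early stopping would halt. The loss values are hypothetical, and `best_epoch` and `early_stop_epoch` are illustrative helpers, not functions from any library.

```python
def best_epoch(val_losses):
    """Return (epoch_index, loss) for the lowest validation loss."""
    index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return index, val_losses[index]


def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop: the first epoch where
    validation loss has failed to improve for `patience` epochs in a row."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs


# Hypothetical validation losses per epoch, for illustration only.
val_losses = [0.69, 0.64, 0.61, 0.59, 0.58, 0.60, 0.61, 0.63, 0.66]

print(best_epoch(val_losses))
print(early_stop_epoch(val_losses, patience=3))
```

The weights from the best epoch, not the last one, are the ones to evaluate on the test set.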

OK, I will keep that in mind. And what about the variance the model shows on new inputs? The best I have gotten so far is this:

image

And this other one:

Thanks for your answers!

Looking at the vertical scale, maybe that’s not a significant amount of difference.
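One way to check this without relying on how the plot is scaled is to compute the gap between the training and validation curves directly. The sketch below uses made-up numbers, and `max_gap` is a hypothetical helper.

```python
def max_gap(train_values, val_values):
    """Largest absolute difference between two per-epoch curves."""
    return max(abs(t - v) for t, v in zip(train_values, val_values))


# Hypothetical per-epoch losses, for illustration only.
train_loss = [0.62, 0.60, 0.59, 0.58]
val_loss = [0.63, 0.61, 0.60, 0.60]

# A gap of a few hundredths can look dramatic on a tightly zoomed axis.
print(round(max_gap(train_loss, val_loss), 3))
```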