Classification confusion

I decided to do a bit of extra work and implement the linear regression and classification tasks on my computer from scratch. I downloaded a couple of data sets from the internet, one for each task.

The linear regression one went well, but the classification one has me confused.

As I step through gradient descent, my cost increases, but the percentage of correct predictions increases too! The algorithm goes from a cost of 0.13 with 34% correct to a cost of 1.48 with 96% correct.

I’ve gone through all of my functions repeatedly and can’t find my error. I feel like I must have made two different mistakes that somehow cancel each other out, since the process still produces a good final result.

Unfortunately, I don’t think that I can attach my Jupyter notebook to this post.

Edit: I uploaded my notebook to GitHub here: GitHub - compilebunny/ML-learning: Temporary ML learning repository

Are there any common errors that could produce this result? A common pair of errors perhaps?
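
For concreteness, the kind of loop I mean looks roughly like the sketch below (a simplified stand-in with placeholder names, not my exact notebook code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b, eps=1e-12):
    # average logistic loss (binary cross-entropy)
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def gradient_descent(X, y, alpha=0.01, iters=1000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for i in range(iters):
        p = sigmoid(X @ w + b)
        w -= alpha * (X.T @ (p - y)) / m   # step opposite the gradient
        b -= alpha * np.mean(p - y)
        if i % 100 == 0:
            print(i, cost(X, y, w, b))     # should trend down if everything is right
    return w, b
```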


Hi @Jonathan_Germain,

That’s interesting.

  1. By “goes from a cost of 0.13 with 34% correct to a cost of 1.48 with 96% correct”, were you saying that your gradient descent increased the cost over iterations?

  2. Continuing with the above quote, were 0.13 and 1.48 training cost or validation cost; and were 34% and 96% training accuracy or validation accuracy?

Well, you may upload your code to your GitHub and share the link here. Just to note that we cannot share any course’s lab anywhere, but I suppose you were developing your own code, right?

Raymond


Thanks for the idea. I’ve uploaded my notebook to GitHub here:


To answer your other questions:

  1. Yes, my gradient descent increases the cost over iterations.
  2. The 34% and 96% are the percent of predicted values that match the actual values at 0 and 1000 iterations, respectively (computed as in the short sketch below). Since this is an elementary implementation, I didn’t use separate training and validation data sets.
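
A minimal sketch of that computation (assuming the usual 0.5 threshold on the model’s probabilities; the names are placeholders):

```python
import numpy as np

def fraction_correct(probs, y):
    """Fraction of examples where the 0.5-thresholded prediction matches the 0/1 label."""
    preds = (probs >= 0.5).astype(int)
    return np.mean(preds == y)
```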

Hi @Jonathan_Germain,

Then this tells us that you should double-check your gradient descent algorithm. You need to make sure that the training cost goes down over iterations.

Are you saying that the costs are training set costs, and the accuracies are training set accuracies?

That’s bad. Either there is an error in your gradient descent code, or your learning rate is too high.
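
One quick way to tell those two cases apart: rerun with a few learning rates and watch the cost curve. If a small enough rate makes the cost decrease, the step size was the issue; if the cost climbs even for tiny rates, suspect the gradient or loss code. A rough sketch, where `train`, `X_train`, and `y_train` are hypothetical stand-ins for your own loop and data:

```python
# Hypothetical sketch: `train` stands in for your own loop and should
# return the list of per-iteration costs; X_train, y_train are your data.
for alpha in (1.0, 0.1, 0.01, 0.001):
    costs = train(X_train, y_train, alpha=alpha, iters=200)
    trend = "decreasing" if costs[-1] < costs[0] else "increasing"
    print(f"alpha={alpha}: cost {costs[0]:.3f} -> {costs[-1]:.3f} ({trend})")
```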

From reading your notebook, I saw that somewhere in there you created a dataset using random numbers, and this plot confused me:

[plot from the notebook]

I don’t think a set of random numbers is a very good test. Random data doesn’t contain very much information from which you can learn a model.

The dataset is not a set of random numbers. It is from the Wisconsin Diagnostic Breast Cancer (WDBC) database; however, I changed the output to 0/1 for benign/malignant.
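
As a side note for anyone reproducing this: scikit-learn also ships a copy of the WDBC data with the diagnosis already encoded as integer labels, which is handy for cross-checking. A minimal sketch, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_breast_cancer

# WDBC with the diagnosis already encoded as integer labels;
# data.target_names shows which class each integer corresponds to.
data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape, y.shape)      # (569, 30) (569,)
print(data.target_names)
```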

I found the problem. It was an error in my logistic loss function.

With that fixed, the cost decreases from 1.2 to 0.2 and the fraction of correct predictions increases from 0.35 to 0.96 over the course of 1000 iterations.
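
For anyone who hits the same symptom: the reported cost should be the standard binary cross-entropy. A minimal sketch of the correct form (not my original buggy version):

```python
import numpy as np

def logistic_loss(p, y, eps=1e-12):
    """Average binary cross-entropy for predicted probabilities p and 0/1 labels y."""
    p = np.clip(p, eps, 1.0 - eps)   # keep log() away from 0
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

If the gradient is coded directly from the (p - y) form rather than by differentiating the reported loss, the weight updates can still be correct, which would explain why the accuracy kept improving even while the printed cost climbed.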

That’s good news that you found the problem.