I implemented binary logistic regression on iris dataset with 3 features. I ran gradient descent with 0.1 learning rate without feature scaling and it decreased the cost to 1.28 but with feature scaling it gave bigger cost than this. Is 1.28 cost reasonable?
I cannot say what a reasonable cost is without knowing the data set.
When you change the feature values (by normalization), you need to also change the learning rate and the number of iterations to get the best solution.
iris_data.csv (3.8 KB)
This is the dataset that i used and removed after 100th row to apply binary classification.
Did you try other learning rates and iterations?
I tried again with 0.05 learning rate with 2000 iterations and it gave 1.28 cost after that i increased the learning rate to 0.1 and it gave 1.27. Finally after increasing the learning rate to 0.2 gradient started to give greater cost such as 1.52.
What do you get with 0.05 and 5,000 iterations?
1.27 on all iterations
Maybe that’s as good as it gets.