Hi - While looking at the coffee roasting neural network example, I was curious how simple logistic regression would behave on this dataset of 200 entries. So I added the code below:
from sklearn.linear_model import LogisticRegression

lr_model = LogisticRegression()
lr_model.fit(Xn, Y)
y_pred = lr_model.predict(Xn)
But there are only 7 good roast predictions (y_pred == 1) instead of the 43 in the original dataset. Since I am making predictions on the same dataset that was used for training, shouldn't I get 100% accuracy?
I used the code below to check the accuracy:

print("Accuracy on training set:", lr_model.score(Xn, Y))

and it shows 75% accuracy.
What am I doing wrong here?
Also, when I try to plot the predictions, I don't get x markers for the bad roast predictions (y_pred == 0).
Logistic regression is a linear model: it fits a linear decision boundary in the feature space. Because this data is not linearly separable (which you can see on the plot), no linear boundary can classify all points correctly, so the model cannot reach 100% accuracy even on the data it was trained on. Training on a dataset does not guarantee perfect predictions on it. This explains the 75% accuracy and why y_pred doesn't match the ground-truth labels perfectly.
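To see this effect in isolation, here is a minimal sketch on synthetic data (not the coffee roasting set): an XOR-style pattern is the classic case where no straight line separates the classes, so even training accuracy stays below 100%:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR pattern: the two classes sit on opposite diagonals,
# so no single linear boundary can split them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

model = LogisticRegression().fit(X, y)
acc = model.score(X, y)  # accuracy on the training data itself
print("Training accuracy on XOR:", acc)  # stays below 1.0
```

The same geometry is at work in the roasting data, just less extreme: the boundary between good and bad roasts is curved, so a linear model misclassifies the points near the curve.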
You can try adding non-linear features, which can significantly improve the performance of the LR model:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(Xn)
lr_model = LogisticRegression()
lr_model.fit(X_poly, Y)
y_pred = lr_model.predict(X_poly)
print("Accuracy on training set with polynomial features:", lr_model.score(X_poly, Y))
Accuracy on training set with polynomial features: 0.93
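On your plotting question: a common cause of missing x markers is a shape mismatch in the boolean mask (e.g. y_pred having shape (m, 1) instead of (m,)), or reusing the true labels instead of the predictions. Here is a minimal sketch using made-up stand-ins for Xn and y_pred (assuming two normalized features, temperature and duration, as in the course example); swap in your own arrays:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for your arrays; replace with your own Xn / y_pred.
rng = np.random.default_rng(0)
Xn = rng.normal(size=(200, 2))
y_pred = rng.integers(0, 2, size=200)

# If y_pred came back as shape (m, 1), flatten it first, otherwise the
# boolean masks will not select rows the way you expect.
y_pred = np.asarray(y_pred).ravel()

pos = y_pred == 1  # predicted good roasts
neg = y_pred == 0  # predicted bad roasts

plt.scatter(Xn[pos, 0], Xn[pos, 1], marker="o", label="good roast (pred)")
plt.scatter(Xn[neg, 0], Xn[neg, 1], marker="x", label="bad roast (pred)")
plt.xlabel("Temperature (normalized)")
plt.ylabel("Duration (normalized)")
plt.legend()
plt.show()
```

If the x markers still don't appear, print (y_pred == 0).sum() first to confirm the mask actually selects some points.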