The logistic regression model seems useless

Reouven_Zana · February 23, 2022, 2:13pm

Sorry for the harsh title, I just hope to get some attention from the mentors.

So I’ve just completed the first assignment, which is a nice, comprehensive review of everything we’ve learned so far. But I was wondering what would happen if we simply compared the positive feature to the negative one to predict the sentiment.

Simply put:

If positive_feature > negative_feature, predict 1.
Otherwise, predict 0.

That’s an extremely simple model, but I really wanted to see how it compares to our more complex one (logistic regression, with its cost function, gradient computation, and gradient descent…).

I was really surprised to see that the accuracy of both models is exactly the same!

Indeed, the following code, run in the assignment notebook, prints [0.995].

import numpy as np

# test_x, test_y, freqs and extract_features come from the assignment notebook
y_hat = []

for tweet in test_x:
    # Extract features: [1, positive, negative]
    features = list(np.squeeze(extract_features(tweet, freqs)))

    # The model simply compares the positive feature to the negative one
    if features[1] > features[2]:
        y_hat.append(1.0)
    else:
        y_hat.append(0.0)

# Count the predictions that match the labels
s = 0
m = len(y_hat)

for i in range(m):
    s += y_hat[i] == test_y[i]

accuracy = s / m

print(accuracy)

Am I missing something? Why bother with all that if we can simply do this?

Thank you.

vsnupoudel · February 23, 2022, 5:04pm

I see your point. With logistic regression, we let the training data and the algorithm figure out the decision rule. Suppose, hypothetically, that the training data contained tweets whose word frequencies are 40% positive and 60% negative but whose label is actually 1 (positive). In that case, the simple rule would be less accurate, whereas LR would learn weights that predict such cases more accurately.
Let me know if that makes sense.
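To make the point above concrete, here is a minimal sketch with made-up numbers (not the assignment data). It assumes features of the form [bias, positive, negative] and shows that the comparison rule gives the same predictions as logistic regression with the weights [0, 1, -1], since sigmoid(z) > 0.5 exactly when z > 0. The second weight vector, `theta_other`, is purely hypothetical and stands in for what training might produce if tweets like the 40/60 case were labelled positive.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_lr(X, theta):
    # Logistic regression prediction: 1.0 where sigmoid(X @ theta) > 0.5
    return (sigmoid(X @ theta) > 0.5).astype(float)

def predict_rule(X):
    # Simple rule: 1.0 where the positive count exceeds the negative count
    return (X[:, 1] > X[:, 2]).astype(float)

# Toy feature rows [bias, positive, negative]; values are invented
X = np.array([
    [1.0, 40.0, 60.0],
    [1.0, 80.0, 20.0],
    [1.0, 10.0, 90.0],
    [1.0, 55.0, 45.0],
])

# With these weights, X @ theta is simply positive - negative
theta_rule = np.array([0.0, 1.0, -1.0])
same = np.array_equal(predict_lr(X, theta_rule), predict_rule(X))
print(same)

# Hypothetical weights that count positive evidence twice as heavily
theta_other = np.array([0.0, 2.0, -1.0])
print(predict_rule(X))             # the 40/60 tweet is predicted 0
print(predict_lr(X, theta_other))  # the 40/60 tweet is predicted 1
```

So the simple rule is one particular choice of weights, and training is free to move away from it when the data call for that. On a dataset where the rule already separates the classes well, the two can end up with very similar accuracy.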

Reouven_Zana · February 23, 2022, 9:18pm

Well, I understand, and that makes perfect sense. That was the answer I was expecting. But I didn’t expect the ‘simple model’ (SM) to perform as well as LR on a real dataset. I would be curious to know whether LR and SM frequently perform identically.

The conclusion I draw from this is that you should try SM first for sentiment analysis, to check whether its accuracy is good enough and whether a more complex approach is worth using.
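A quick way to run that baseline check could look like the sketch below. It assumes the features are already stacked into an array with columns [bias, positive, negative]; the helper name `rule_accuracy` and the demo values are mine, not from the notebook. Flattening the labels with `np.ravel` avoids accidental broadcasting when they are stored as a column of shape (m, 1).

```python
import numpy as np

def rule_accuracy(X, y):
    """Accuracy of the 'positive > negative' rule.

    X: (m, 3) array with columns [bias, positive, negative]
    y: labels of shape (m,) or (m, 1), with values 0.0 or 1.0
    """
    y_hat = (X[:, 1] > X[:, 2]).astype(float)
    return float(np.mean(y_hat == np.ravel(y)))

# Invented demo data: the third row is a tie, which the rule predicts as 0
X_demo = np.array([
    [1.0, 30.0, 5.0],
    [1.0, 2.0, 40.0],
    [1.0, 12.0, 12.0],
    [1.0, 50.0, 10.0],
])
y_demo = np.array([[1.0], [0.0], [1.0], [1.0]])

baseline = rule_accuracy(X_demo, y_demo)
print(baseline)  # 0.75
```

If the baseline is already close to what you need, that is a useful reference point before training anything heavier.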

Thank you for your answer.
