The logistic regression model seems useless

Sorry for the harsh title, I just hope to get some attention from the mentors.

So I’ve just completed the first assignment, which is a nice, comprehensive review of everything we’ve learned so far. But I was wondering what would happen if we simply compared the positive feature to the negative one to predict the sentiment.

Simply put:

  • If positive_feature > negative_feature, then we get 1
  • Else, we get 0.

That’s an extremely simple model, but I really wanted to see how it compares to our complex model (LR, then cost function, computing the gradient, implementing the gradient descent…).

I was really surprised to see that the accuracy of both models is exactly the same!

Indeed, the following code, used in the assignment notebook, returns [0.995].

y_hat = []
    
for tweet in test_x:
    
    # Extracting features: [1, positive, negative]
    features = list(np.squeeze(extract_features(tweet,freqs)))

    #The model simply compares the positive feature to the negative one
    if features[1] > features[2]:
        # append 1.0 to the list
        y_hat.append(1.0)
    else:
        # append 0 to the list
        y_hat.append(0.0)    
  
s = 0
m = len(y_hat)

for i in range(m):
    s += y_hat[i] == test_y[i]

accuracy = s/m

print(accuracy)
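
For reference, the same check can be written without loops. This is only a minimal sketch: it assumes the features have already been stacked into an (m, 3) array with rows [1, positive, negative], and the names simple_model_accuracy, X_toy and y_toy are made up for illustration (they are not part of the assignment).

```python
import numpy as np

def simple_model_accuracy(X, y):
    """Accuracy of the rule 'predict 1 when positive > negative'.

    X: (m, 3) array with rows [bias, positive, negative] (assumed layout).
    y: (m,) or (m, 1) array of 0/1 labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)  # flatten (m, 1) labels to (m,)
    y_hat = (X[:, 1] > X[:, 2]).astype(float)   # 1.0 where positive > negative
    return float(np.mean(y_hat == y))

# Toy features, invented for illustration only
X_toy = np.array([[1, 10, 2], [1, 1, 7], [1, 5, 5], [1, 8, 3]])
y_toy = np.array([[1], [0], [1], [1]])
print(simple_model_accuracy(X_toy, y_toy))
```

Flattening the labels first means the result is a plain float rather than a one-element array.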

Am I missing something? Why bother so much if we can simply do this?

Thank you.

I see your point. With LR, we let the training data and the algorithm figure out the decision rule. Suppose the training data contains tweets whose features are roughly 40% positive and 60% negative frequency, yet whose true label is 1 (positive). The simple rule would misclassify those, whereas LR can tune its weights to predict such cases more accurately.
Let me know if that makes sense.
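
To make this concrete, here is a small sketch. Since sigmoid(z) > 0.5 exactly when z > 0, LR predicts 1 when theta0 + theta1 * positive + theta2 * negative > 0, so the simple rule is the special case theta = [0, 1, -1]. The weights in theta_learned below are hypothetical, chosen only to illustrate the 40/60 case, and are not the weights the assignment produces.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_predict(X, theta):
    """Predict 1.0 where sigmoid(X @ theta) > 0.5, else 0.0."""
    return (sigmoid(X @ theta) > 0.5).astype(float)

# Rows are [bias, positive, negative]; toy values, not real tweet features
X_toy = np.array([
    [1.0, 40.0, 60.0],  # the '40% positive, 60% negative' case
    [1.0, 70.0, 30.0],
    [1.0, 20.0, 80.0],
])

theta_simple = np.array([0.0, 1.0, -1.0])   # same as 'positive > negative'
theta_learned = np.array([0.0, 1.0, -0.5])  # hypothetical learned weights

print(lr_predict(X_toy, theta_simple))   # simple rule labels the first row 0
print(lr_predict(X_toy, theta_learned))  # these weights label the first row 1
```

With weights like these, the boundary is no longer the line positive = negative, which is how LR can get the 40/60 tweets right when the data calls for it.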


Well, I understand, and that makes perfect sense. That was the answer I was expecting. But I didn’t expect that ‘simple model’ (SM) to perform as well as the LR on a real case. I would be curious to know if LR and SM frequently perform identically.
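
One way to check this would be to look at agreement between the two sets of predictions and not only at accuracy, since two models can reach the same accuracy while disagreeing on individual tweets. A rough sketch, with a hypothetical helper compare_predictions and toy arrays that are not taken from the assignment:

```python
import numpy as np

def compare_predictions(pred_a, pred_b, y):
    """Return each model's accuracy and the fraction of identical predictions."""
    pred_a = np.asarray(pred_a).reshape(-1)
    pred_b = np.asarray(pred_b).reshape(-1)
    y = np.asarray(y).reshape(-1)
    return {
        "accuracy_a": float(np.mean(pred_a == y)),
        "accuracy_b": float(np.mean(pred_b == y)),
        "agreement": float(np.mean(pred_a == pred_b)),
    }

# Toy labels and predictions, invented for illustration
y_toy = np.array([1, 0, 1, 0])
pred_sm = np.array([1, 0, 0, 0])  # stands in for the simple model
pred_lr = np.array([1, 1, 1, 0])  # stands in for LR

print(compare_predictions(pred_sm, pred_lr, y_toy))
```

In this toy case both accuracies are 0.75 but the predictions agree on only half of the examples.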

The conclusion I draw from this is that you should try SM first for sentiment analysis, to check whether its accuracy is good enough and whether it is worth using a more complex approach.

Thank you for your answer.