Sorry for the harsh title, I just hope to get some attention from the mentors.
So I’ve just completed the first assignment, which is a nice, comprehensive review of everything we’ve learned so far. But I was wondering what would happen if we simply compared the positive feature to the negative one to predict the sentiment.
Simply put:
- If positive_feature > negative_feature, then we get 1
- Else, we get 0.
That’s an extremely simple model, but I really wanted to see how it compares to our more complex pipeline (logistic regression: cost function, computing the gradient, implementing gradient descent…).
I was really surprised to see that the accuracy of both models is exactly the same!
Indeed, the following code, run in the assignment notebook, prints [0.995].
y_hat = []
for tweet in test_x:
    # Extract the features: [1, positive, negative]
    features = list(np.squeeze(extract_features(tweet, freqs)))
    # The model simply compares the positive feature to the negative one
    if features[1] > features[2]:
        y_hat.append(1.0)
    else:
        y_hat.append(0.0)

# Accuracy: fraction of predictions that match the labels
s = 0
m = len(y_hat)
for i in range(m):
    s += y_hat[i] == test_y[i]
accuracy = s / m
print(accuracy)
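For what it’s worth, the same baseline can be written in a vectorized form with NumPy. This is just a sketch: the real test_x, test_y, freqs and extract_features come from the assignment, so here I stand in toy versions of them (a made-up frequency dictionary and a simplified feature extractor) purely so the snippet runs on its own.

```python
import numpy as np

# Toy stand-ins for the assignment's objects (assumption: the real
# extract_features returns a row [bias, positive_count, negative_count]).
freqs = {("great", 1.0): 5, ("bad", 0.0): 4, ("good", 1.0): 3}

def extract_features(tweet, freqs):
    pos = sum(freqs.get((w, 1.0), 0) for w in tweet.split())
    neg = sum(freqs.get((w, 0.0), 0) for w in tweet.split())
    return np.array([[1.0, pos, neg]])

test_x = ["great good", "bad bad"]
test_y = np.array([[1.0], [0.0]])

# Vectorized version of the comparison baseline
X = np.vstack([extract_features(t, freqs) for t in test_x])
y_hat = (X[:, 1] > X[:, 2]).astype(float)   # 1.0 if positive > negative, else 0.0
accuracy = np.mean(y_hat == np.squeeze(test_y))
print(accuracy)  # 1.0 on this toy data
```

Same rule, no loop: the comparison is done on the whole feature matrix at once, and np.mean of the boolean match vector gives the accuracy directly.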
Am I missing something? Why bother so much if we can simply do this?
Thank you.