Why do we only take unique words into account when summing the positive and negative frequencies of a sentence?

When extracting features from the positive and negative frequencies of words in tweets, we only do it for the unique words in the sentence. For instance, in a sentence like “This is really fine. It’s fine because …”, we add the positive and negative scores of the word “fine” only once. Why do we take this approach? Why don’t we do it for every occurrence of a word? What kind of bias could we introduce by counting all words rather than just the unique ones?

From my point of view, having “fine” twice in the sentence should increase the probability of positive sentiment. However, when only unique words are taken into account, the second “fine” doesn’t increase that probability.
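To make the difference concrete, here is a minimal Python sketch of the two options. It assumes a hypothetical `freqs` dictionary mapping `(word, label)` pairs to counts (the numbers are invented for illustration) and a simplistic regex tokenizer; the function name `extract_features` and its `unique` flag are illustrative, not taken from any course code.

```python
import re

# Hypothetical (word, label) -> count table; label 1 = positive, 0 = negative.
# The counts are invented for illustration only.
freqs = {
    ("fine", 1): 30, ("fine", 0): 10,
    ("really", 1): 5, ("really", 0): 6,
}

def tokenize(text):
    """Lowercase and keep letter/apostrophe runs (a simplistic stand-in for real preprocessing)."""
    return re.findall(r"[a-z']+", text.lower())

def extract_features(tokens, freqs, unique=True):
    """Return [bias, sum of positive counts, sum of negative counts].

    With unique=True each distinct word contributes once; with unique=False
    every occurrence contributes, so repeated words are counted repeatedly.
    """
    words = set(tokens) if unique else tokens
    pos = sum(freqs.get((w, 1), 0) for w in words)
    neg = sum(freqs.get((w, 0), 0) for w in words)
    return [1, pos, neg]

tokens = tokenize("This is really fine. It's fine")
print(extract_features(tokens, freqs, unique=True))   # [1, 35, 16]
print(extract_features(tokens, freqs, unique=False))  # [1, 65, 26]
```

Note that when duplicates are counted, each occurrence of “fine” adds to both the positive and the negative sum, so repetition scales the word's weight in both features rather than only pushing the tweet toward positive.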


Considering only unique words reduces the number of word features we need to look at. In your example, “fine” becomes a single feature with a value of two because it appears twice, which will likely make it stand out from the word features with a value of one. This approach follows the bag-of-words style.
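The bag-of-words view described above can be sketched in Python with `collections.Counter`: each distinct word is one feature, and its value is the number of times it occurs. The regex tokenizer here is a simplistic assumption, not the course's preprocessing.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Map each distinct lowercase token to the number of times it occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

bow = bag_of_words("This is really fine. It's fine")
print(bow["fine"])    # 2
print(bow["really"])  # 1
print(len(bow))       # 5 distinct word features
```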