I don’t understand why we don’t sum the frequencies of duplicate words. Imagine we have this kind of sentence: “I am not happy. He is not okay”. Let’s say the words “I”, “am”, “He”, “is” and “okay” are neutral, and “happy” has a positive frequency of 3 and a negative frequency of 0. On the other hand, “not” has a positive frequency of 0 and a negative frequency of 3.
So, if we don’t sum the frequencies of duplicates, this sentence’s sum of positive frequency will be 3 and sum of negative frequency will be 3. From these statistics we would say that this sentence is neutral (since the positive and negative frequencies are the same).
But if we sum the frequencies of duplicates, this sentence’s sum of positive frequency will be 3 and sum of negative frequency will be 6. In that case we will say the sentence is negative.
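To make the two cases concrete, here is a small Python sketch of the arithmetic. The `freqs` table and the `sentiment_sums` helper are hypothetical names used only for illustration, and the sketch assumes neutral words have no entry in the table, so they contribute 0 to both sums.

```python
# Hypothetical frequency table from the example: word -> (positive, negative).
# Neutral words are assumed to be absent and therefore contribute (0, 0).
freqs = {
    "happy": (3, 0),
    "not": (0, 3),
}

def sentiment_sums(words, freqs, count_duplicates):
    """Return (positive_sum, negative_sum) for a list of words."""
    if not count_duplicates:
        # Keep only the first occurrence of each word, preserving order.
        words = list(dict.fromkeys(words))
    pos = sum(freqs.get(w, (0, 0))[0] for w in words)
    neg = sum(freqs.get(w, (0, 0))[1] for w in words)
    return pos, neg

words = "i am not happy he is not okay".split()

print(sentiment_sums(words, freqs, count_duplicates=False))  # (3, 3)
print(sentiment_sums(words, freqs, count_duplicates=True))   # (3, 6)
```

The only difference between the two calls is whether the second “not” is counted, which is exactly what moves the result from a tie to negative.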
From my point of view, the second case is right. Having the same negative word twice should increase the probability of the sentence being negative. But if we don’t sum duplicates, we will miss that negativity.
Is there a solid reason not to sum duplicates?