Summing the frequencies of unique words, not duplicates

I don’t understand why we don’t sum the frequencies of duplicates. Imagine we have this kind of sentence: “I am not happy. He is not okay”. Let’s say the words “I”, “am”, “He”, “is” and “okay” are neutral, that the positive frequency of “happy” is 3 and its negative frequency is 0, and that the positive frequency of “not” is 0 and its negative frequency is 3.

So, if we don’t sum the frequencies of duplicates, this sentence’s positive sum will be 3 and its negative sum will be 3. From these statistics we would say that the sentence is neutral (since the positive and negative sums are the same).

But if we do sum the frequencies of duplicates, this sentence’s positive sum will still be 3 while its negative sum will be 6, because “not” appears twice. In that case we would say the sentence is negative.
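
To make the two cases concrete, here is a minimal Python sketch of the arithmetic. The `freqs` table holds only the made-up numbers from this example, and the `score` helper and the simple tokenization are assumptions for illustration, not the course’s actual code.

```python
# Hypothetical frequency table from the example: word -> (positive, negative).
# Words missing from the table are treated as neutral, i.e. (0, 0).
freqs = {"happy": (3, 0), "not": (0, 3)}


def score(words, freqs, count_duplicates):
    """Return (positive_sum, negative_sum) for a list of words."""
    if not count_duplicates:
        # Keep each word once, so a repeated word contributes only once.
        words = set(words)
    pos = sum(freqs.get(w, (0, 0))[0] for w in words)
    neg = sum(freqs.get(w, (0, 0))[1] for w in words)
    return pos, neg


sentence = "I am not happy. He is not okay"
# Very rough tokenization: split on spaces, drop periods, lowercase.
tokens = [w.strip(".").lower() for w in sentence.split()]

print(score(tokens, freqs, count_duplicates=False))  # unique words only
print(score(tokens, freqs, count_duplicates=True))   # duplicates counted
```

With unique words only, the two sums come out equal; with duplicates counted, the second “not” adds another 3 to the negative sum.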

From my point of view, the second case is right. Having the same negative word twice should increase the probability of the sentence being negative. If we don’t sum duplicates, we will miss that negativity.

Is there a solid reason not to sum duplicates?

Hi Nurlan_Imanov1,

If you continue with the videos you will see that a word vector will be created that does include counts of the number of times a word appears in a positive or negative sentence.

Hi reinoudbosch, thank you for answering.
But why exactly didn’t they count the duplicates in this part? I am wondering about the reason behind it.

I think the idea is to keep things simple at first and then increase complexity in subsequent videos. The price to pay is that it may cause some confusion.


Okay, thanks a lot for the answers.