Summing the frequencies of unique words, not DUPLICATES

Nurlan_Imanov1 · June 12, 2022, 10:56am

I don’t understand why we don’t sum the frequencies of duplicates. Imagine we have this sentence: “I am not happy. He is not okay”. Let’s say the words “I”, “am”, “He”, “is”, and “okay” are neutral, “happy” has a positive frequency of 3 and a negative frequency of 0, and “not” has a positive frequency of 0 and a negative frequency of 3.

So, if we don’t sum the frequencies of duplicates, the sentence’s positive frequency sum will be 3 and its negative frequency sum will be 3. From these statistics we would say that the sentence is neutral (since the positive and negative sums are the same).

But if we sum the frequencies of duplicates, “not” is counted twice, so the sentence’s positive frequency sum will still be 3 and its negative frequency sum will be 6. In that case we would say the sentence is negative.
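To make the two totals concrete, here is a minimal sketch of both ways of summing. The `freqs` table and the `sum_frequencies` helper are only illustrations built from the numbers in this post (words missing from the table are treated as neutral); they are not the course's actual code.

```python
import re

# Hypothetical frequency table from the example in this post:
# word -> (positive frequency, negative frequency).
# Words missing from the table are treated as neutral, i.e. (0, 0).
freqs = {"happy": (3, 0), "not": (0, 3)}


def sum_frequencies(sentence, freqs, unique=True):
    """Return (positive_sum, negative_sum) for a sentence.

    unique=True  -> each distinct word contributes once.
    unique=False -> every occurrence contributes, duplicates included.
    """
    # Simple lowercase tokenizer; an assumption made for this sketch.
    words = re.findall(r"[a-z']+", sentence.lower())
    if unique:
        words = set(words)
    pos = sum(freqs.get(w, (0, 0))[0] for w in words)
    neg = sum(freqs.get(w, (0, 0))[1] for w in words)
    return pos, neg


sentence = "I am not happy. He is not okay"
print(sum_frequencies(sentence, freqs, unique=True))   # (3, 3)
print(sum_frequencies(sentence, freqs, unique=False))  # (3, 6)
```

The only difference between the two calls is whether the token list is collapsed into a set before summing, which is exactly where the second “not” gets dropped.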

From my point of view, the second case is right. Having the same negative word twice should increase the probability that the sentence is negative. But if we don’t sum duplicates, we will miss that negativity.

Is there a solid reason not to sum duplicates?

reinoudbosch · June 12, 2022, 8:35pm

Hi Nurlan_Imanov1,

If you continue with the videos you will see that a word vector will be created that does include counts of the number of times a word appears in a positive or negative sentence.

Nurlan_Imanov1 · June 15, 2022, 4:28am

Hi reinoudbosch, thank you for answering.
But why exactly didn’t they count the duplicates in this part? I am wondering about the reason behind it.

reinoudbosch · June 15, 2022, 11:28am

I think the idea is to keep things simple at first and then increase complexity in subsequent videos. The price to pay is that it may cause some confusion.

Nurlan_Imanov1 · June 15, 2022, 6:11pm

Okay, thanks a lot for the answers.
