Why do we take into account only unique words while adding Positive and Negative frequencies in the sentence?

Nurlan_Imanov2 · February 4, 2022, 1:11pm

When extracting features from the positive and negative frequencies of words in tweets, we sum them only over the unique words in the sentence. For instance, given a sentence like “This is really fine. It’s fine because …”, we add the positive and negative scores for the word “fine” only once. Why do we take this approach? Why don’t we do it for every occurrence of each word? What kind of bias would we introduce by counting all words, not just the unique ones?

From my point of view, having “fine” twice in the sentence should increase the probability of positive sentiment. However, when only unique words are counted, the second “fine” doesn’t increase that probability.

jackliu333 · February 7, 2022, 10:51pm

Treating each unique word as a feature essentially reduces the number of word features we need to look at. In your example, the word “fine” is a single feature with a value of two because it appears twice, which will likely make it stand out from word features with a value of one. This approach follows the bag-of-words style.
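
To make the difference concrete, here is a minimal sketch of the options discussed in this thread. It is not the course's implementation: the `freqs` table, its counts, and the `extract_features` helper with its `unique_only` flag are all hypothetical, made up for illustration. It sums corpus frequencies once over the unique words and once over every occurrence, and also shows the per-sentence bag-of-words count for comparison.

```python
from collections import Counter

# Hypothetical toy frequency table: (word, label) -> count in a training corpus.
# Label 1 = positive, 0 = negative. The numbers are invented for illustration.
freqs = {
    ("fine", 1): 10,
    ("fine", 0): 2,
    ("really", 1): 4,
    ("really", 0): 3,
}


def extract_features(tokens, freqs, unique_only=True):
    """Return [bias, positive_sum, negative_sum] for one tokenized sentence."""
    # With unique_only=True each distinct word contributes once;
    # otherwise every occurrence contributes its corpus frequency again.
    words = set(tokens) if unique_only else tokens
    pos = sum(freqs.get((w, 1), 0) for w in words)
    neg = sum(freqs.get((w, 0), 0) for w in words)
    return [1, pos, neg]


tokens = ["really", "fine", "fine"]

unique_features = extract_features(tokens, freqs, unique_only=True)
all_features = extract_features(tokens, freqs, unique_only=False)

# Per-sentence bag-of-words counts: here "fine" gets the value 2.
bag_of_words = Counter(tokens)

print(unique_features)  # [1, 14, 5]
print(all_features)     # [1, 24, 7]
print(bag_of_words["fine"])  # 2
```

In this toy example, counting every occurrence adds the corpus frequencies of “fine” a second time, so a repeated word moves both sums further than it does in the unique-word version. Which behavior is preferable depends on the data, and the sketch does not settle that question.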
