We talk about the possibility of naively encoding the features for a tweet as a vector of length V, where V is the total size of our vocabulary, and then we suggest that one way to compress this is to use a frequency dictionary. But no time is spent on why this particular compression was chosen over other possible ways to compress.
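Just so it’s clear what I’m referring to, here is roughly how I picture the frequency dictionary being built — a minimal sketch with a toy corpus, and all of the names are my own rather than anything from the course:

```python
# Sketch of the frequency-dictionary compression as I understand it:
# map each (word, class) pair to how often that word appears in tweets
# of that class. Toy corpus and labels are my own example.
from collections import defaultdict

def build_freqs(tweets, labels):
    freqs = defaultdict(int)
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            freqs[(word, label)] += 1
    return freqs

tweets = ["I am so happy", "this is terrible", "happy happy day"]
labels = [1, 0, 1]  # 1 = positive, 0 = negative
freqs = build_freqs(tweets, labels)
print(freqs[("happy", 1)], freqs[("happy", 0)])  # 3 0
```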
The frequency dictionary forces each word to be represented by only two numbers: the number of times it shows up in all positive tweets, and the number of times it shows up in all negative tweets. If we’ve gone to that step, then why isn’t it valid to just add up those two values across the words in the tweet and assume the tweet is positive if the positive total is greater or negative if the negative total is greater? What is even the point of all the ML steps?
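Concretely, the “no ML” baseline I have in mind is something like this — a hypothetical sketch with a toy dictionary, not anything the course actually proposes:

```python
# Sum each word's positive and negative corpus counts and pick whichever
# total is larger. The freqs dict is a toy stand-in for the frequency
# dictionary; all names here are hypothetical.
freqs = {("happy", 1): 3, ("so", 1): 1, ("sad", 0): 2, ("terrible", 0): 1}

def predict_by_counts(tweet, freqs):
    words = tweet.lower().split()
    pos_total = sum(freqs.get((w, 1), 0) for w in words)
    neg_total = sum(freqs.get((w, 0), 0) for w in words)
    return "positive" if pos_total > neg_total else "negative"

print(predict_by_counts("so happy today", freqs))  # positive
```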
Also, why is it important to keep both the positive and the negative count? Couldn’t you drop one dimension by just storing the positive/negative difference? Is there some scenario where a two-point difference means one thing at larger counts and something else at smaller counts? If so, then why not store a difference and a magnitude instead? Are there tradeoffs to consider?
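What I mean is a reparameterization along these lines (names are my own), where the same two-point difference can show up at very different magnitudes:

```python
# Instead of the pair (pos_sum, neg_sum) for a tweet, store
# (difference, magnitude). The two forms carry the same information,
# so the question is whether one is a better feature representation.
def diff_and_magnitude(pos_sum, neg_sum):
    return (pos_sum - neg_sum, pos_sum + neg_sum)

# The scenario in the question: the same 2-point difference at a small
# magnitude vs. a large one.
print(diff_and_magnitude(3, 1))    # (2, 4)
print(diff_and_magnitude(51, 49))  # (2, 100)
```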
Also, why do we introduce a bias of 1 if our whole point in doing all this is to compress the information we’re attempting to train against?
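For reference, here is what I take “a bias of 1” to mean in the feature vector — a toy sketch with my own names, which may not match the course’s exact implementation:

```python
# The tweet is reduced to [1, pos_sum, neg_sum], where the leading 1 is
# just a constant slot for the model's intercept. Toy freqs dict;
# all names are hypothetical.
import numpy as np

freqs = {("happy", 1): 3, ("so", 1): 1, ("sad", 0): 2}

def extract_features(tweet, freqs):
    words = tweet.lower().split()
    pos_sum = sum(freqs.get((w, 1), 0) for w in words)
    neg_sum = sum(freqs.get((w, 0), 0) for w in words)
    return np.array([1.0, pos_sum, neg_sum])  # bias term first

print(extract_features("so happy today", freqs))  # [1. 4. 0.]
```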
Also, no time is spent on the choice to avoid counting the same word twice. Why is that avoided? If someone tweets “I am so happy I can’t tell you how happy”, why would it be bad to count “happy” twice? Shouldn’t the positive sentiment there be weighted just as much as if they had chosen some synonym, since we would count it in that case, e.g. “I am so happy I can’t tell you how overjoyed”?
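Here is the difference I’m asking about, spelled out with a toy dictionary and my own names (I’m assuming the course deduplicates a tweet’s words before summing, which is what prompted the question):

```python
# Deduplicating a tweet's words before summing vs. counting every
# occurrence. Toy freqs dict; all names are hypothetical.
freqs = {("happy", 1): 3, ("overjoyed", 1): 1, ("so", 1): 1}

def pos_sum(words, freqs):
    return sum(freqs.get((w, 1), 0) for w in words)

tweet = "I am so happy I can't tell you how happy"
words = tweet.lower().split()

print(pos_sum(set(words), freqs))  # "happy" counted once: 4
print(pos_sum(words, freqs))       # "happy" counted twice: 7
```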