Confusion in Logistic Regression Overview

Muhammad_Bilal_Hanee · October 21, 2023, 1:16pm

In this image, a tweet is converted into a feature vector of [bias, +ve freq, -ve freq].
Looking at the tweet, I don’t see how we can have 3476 as the +ve frequency and 245 as the -ve frequency, because each word appears only once.

Can someone explain this?

Thanks

arvyzukai · October 23, 2023, 7:02am

Hi @Muhammad_Bilal_Hanee

I’m not sure I understand - why can’t we have 3476 and 245? Four numbers (four words) can easily add up to 3476 and 245, for example, 2000 + 1000 + 401 + 75 (positive counts) and 100 + 101 + 40 + 4 (negative counts).

Cheers

Kayvan_Karim · October 27, 2023, 4:08pm

The number represents the total number of times a word appears in the positive or negative corpus. It is a measure of how positive/negative that word is, i.e. how often it is used in a positive/negative context.
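
As a minimal sketch of how such corpus-wide counts could be built: the toy corpus, the labels (1 = positive, 0 = negative) and the `build_freqs` name below are made up for illustration and are not taken from the course code.

```python
from collections import Counter

# Hypothetical toy corpus: (processed tweet tokens, label); 1 = positive, 0 = negative
corpus = [
    (["great", "model"], 1),
    (["great", "ai"], 1),
    (["bad", "model"], 0),
    (["great", "day"], 1),
]

def build_freqs(corpus):
    """Count how often each (word, label) pair occurs across the whole corpus."""
    freqs = Counter()
    for tokens, label in corpus:
        for word in tokens:
            freqs[(word, label)] += 1
    return freqs

freqs = build_freqs(corpus)
print(freqs[("great", 1)])  # 3: "great" occurs in three positive tweets
print(freqs[("model", 0)])  # 1: "model" occurs in one negative tweet
```

The counts come from every tweet in the corpus, not from the single tweet being classified.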

Muhammad_Bilal_Hanee · October 27, 2023, 7:05pm

But in an example in the section:

On the left, we have a dictionary of unique words and on the right, we have a tweet. We can see that “I” is seen 3 times and is added to the frequency table, just like “am” and so on.
Now look at the processed tweet [tun, ai, great, model]: each word is unique, so each should be added only once to the frequency table for the positive tweet, making a sum of 4. Why is it 3476?

arvyzukai · October 30, 2023, 3:05pm

It should not add up to 4; it should add up to 3476 (as in the example I gave you). Imagine that you have 10000 tweets, 5000 of them positive and 5000 negative. Then imagine that:

the word “tun” appeared 2000 times in positive tweets and 100 in negative tweets,
also the word “ai” appeared 1000 times in positive and 101 negative,
“great” appeared 401 times in positive and 40 times in negative,
“model” appeared 75 times in positive and 4 times in negative

That imaginary scenario would produce the vector in the picture you were asking about. In other words, you would encode the whole sentence in blue as the vector [1, 3476, 245] and make the prediction based on this vector.
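
The arithmetic can be checked with a short sketch. The counts are the imaginary ones listed above; the `extract_features` name and the choice to count each distinct word of the tweet once are assumptions for illustration, not necessarily how the course assignment does it.

```python
# Imaginary corpus-wide counts: word -> (count in positive tweets, count in negative tweets)
freqs = {
    "tun": (2000, 100),
    "ai": (1000, 101),
    "great": (401, 40),
    "model": (75, 4),
}

def extract_features(tokens, freqs):
    """Return [bias, sum of positive counts, sum of negative counts] for one tweet."""
    unique_words = set(tokens)  # assumption: each distinct word contributes once
    pos = sum(freqs.get(word, (0, 0))[0] for word in unique_words)
    neg = sum(freqs.get(word, (0, 0))[1] for word in unique_words)
    return [1, pos, neg]

print(extract_features(["tun", "ai", "great", "model"], freqs))  # [1, 3476, 245]
```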

Cheers

Muhammad_Bilal_Hanee · October 30, 2023, 5:39pm

Thanks, I was getting confused about the vocabulary and corpus of words. What I was thinking was that I have four unique words and if I find these words in a vocabulary then they would all appear only one time, neglecting that I have to create a frequency table by looking at the corpus.
