Greetings,
I got stumped when I first saw that the feature vector for a tweet contains a bias value of 1 (x_i = [1, sum_pos, sum_neg]). Not only was the bias never explained, it also felt counterintuitive to introduce an extra dimension right after we went through the process of reducing the dimensionality of a sparse vector.
So I tried to work out for myself why the value "1" is needed there. Below is my very loose attempt at explaining it. Hopefully the instructors can add this to the course content - it really is a problem when things like this show up with no explanation.
Logistic regression is one of the simplest classifiers you can think of. If you plot points on a plane and draw a line, some points fall on one side and some on the other. Logistic regression tries to find the line that misclassifies the fewest points. If you remember from early algebra, a line has a slope and a constant term you add to "raise" or "lower" it. A logistic regression model likewise contains "slopes", which are called weights, and a constant term that "raises" or "lowers" the decision boundary, which is called the bias.

To combine the model with an input (the feature vector for the tweet), each feature is multiplied by its corresponding weight, the products are summed, and then the bias is added. Everything except the bias can be cleverly represented as a dot product of the input vector with the model's weight vector. You could just remember to add the bias separately, but then all the optimizations and theorems we have from the matrix world couldn't be applied to it. So to express the whole logistic regression formula as a single dot product, we make the first entry of every input vector equal to "1" and put the bias in the first position of the weight vector: the dot product then multiplies the bias by 1 and adds it in automatically.
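Here is a tiny sketch of that trick in NumPy. The feature values and weights below are made-up numbers just for illustration; the point is only that prepending a 1 to the input lets one dot product handle the bias too:

```python
import numpy as np

# Hypothetical feature vector for a tweet: [1, sum_pos, sum_neg]
# (the leading 1 is the bias slot)
x = np.array([1.0, 8.0, 11.0])

# Hypothetical model parameters: theta[0] is the bias,
# theta[1:] are the weights for sum_pos and sum_neg
theta = np.array([0.5, 0.7, -0.6])

# Without the trick: multiply features by weights, then
# remember to add the bias separately
z_manual = theta[0] + theta[1] * x[1] + theta[2] * x[2]

# With the trick: a single dot product covers the bias,
# because theta[0] gets multiplied by the constant 1
z_dot = np.dot(x, theta)

assert np.isclose(z_manual, z_dot)

# The sigmoid then turns the raw score into a probability
# of the positive class
prob = 1.0 / (1.0 + np.exp(-z_dot))
print(z_dot, prob)
```

The same idea scales up: stack many tweets into a matrix with a column of ones, and one matrix-vector product scores all of them at once.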
Hopefully, this helps someone move on to the rest of the week’s material…