When creating a post, please add:
- Week #1
- Link to the classroom item you are referring to:
- Description (why exactly do we add a bias term? I did some research and got several different answers, so I wanted to confirm what its purpose is in the context of such a simple logistic regression model)
We use a bias because, just as with the equation of a straight line, you don't want to be constrained to lines that pass only through the origin.
The bias is the same as 'b' in the equation of a straight line, y = m*x + b.
What Logistic Regression does is learn a hyperplane in the input space that does the most accurate job of dividing the “yes” answers from the “no” answers. It is the higher dimensional equivalent of the familiar equation for a line in the 2D plane:
y = mx + b
Think about what happens if you force b = 0 in that equation: it means you can only represent lines that go through the origin, right? It's the same in higher dimensional space: eliminating the bias term would mean that you only accept dividing planes which contain the origin. But there is no reason that your data should be forced to conform to that limitation, right? It is what mathematicians would call "a significant loss of generality". We are designing a general purpose algorithm and we want it to work in as many cases as possible, so it would be a mistake to put artificial limitations on the solutions that can be represented (learned).
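You can see this concretely with a tiny numerical sketch (the dataset and the little gradient-descent trainer below are my own illustration, not from the course): four 1-D points that are all positive, so any boundary through the origin puts them all on the same side, while a model with a bias term can place the boundary between them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, use_bias=True, lr=0.1, steps=5000):
    # Plain full-batch gradient descent on the logistic (cross-entropy) loss.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        if use_bias:
            b -= lr * np.mean(p - y)
    return w, b

# 1-D data, separable at x = 2.5 -- but every point is positive, so a
# boundary through the origin (b = 0) cannot split them.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

for use_bias in (False, True):
    w, b = train(X, y, use_bias=use_bias)
    preds = (sigmoid(X @ w + b) >= 0.5).astype(float)
    # Without bias the boundary is pinned at x = 0: everything lands on
    # one side and accuracy stays at 0.50. With bias it separates cleanly.
    print(f"use_bias={use_bias}: accuracy={(preds == y).mean():.2f}")
```

Without the bias the decision rule is sigmoid(w*x) >= 0.5, i.e. w*x >= 0, which for positive x gives the same answer for every point no matter what w is; adding b shifts the threshold to x = -b/w, which lands between 2 and 3.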
Thank you for your response. I have an additional question. In the Week 1 case of the course, the threshold for the sigmoid function was set at 0.5, which is used for binary classification. Is it possible to create multiple threshold intervals, such as one for values lower than 0.2 and another for values greater than 0.2 but lower than 0.4? Could this approach be used to enable multiclass classification?
If you want to do a multiclass classification, then the standard way to do that is to use softmax as the output activation. With softmax, the number of neurons in the output layer will equal the number of possible classes and then softmax will convert those values to a probability distribution showing the likelihood of each class being the correct classification for that particular sample.
Softmax is related to sigmoid mathematically. You can consider it to be the generalization of sigmoid to more than two classes (yes/no). Softmax is covered in various courses here, e.g. in DLS Course 2 Week 3.
Well then, am I supposed to take that course too? Like, is the part explaining everything about logistic regression also essential for a data scientist?
It depends on what your goals are. You asked about how to do a multiclass classification and I explained that you need to know how softmax works in order to do that. You don’t have to take a whole course to learn that, though. Here’s a lecture on YouTube from Geoff Hinton that explains softmax.
Logistic Regression is just one possible solution for binary classifiers and it doesn’t work as well as real Neural Networks, because it can only express a linear decision boundary as we were discussing in the earlier part of this thread. Neural Networks can learn extremely complex non-linear decision boundaries.
But if your goal is to learn how to do NLP, Logistic Regression isn’t really that necessary these days. This first course of NLP is just showing you some classic techniques as background and to give you a framework for understanding what comes later. The real “action” is in courses 3 and 4 where they cover Sequence Models and Attention Models, which are both powerful forms of Neural Networks.