I have a question.
The first week’s concept is about using logistic regression on labeled data, that is, tweets labeled as positive or negative. But what about the case where the tweets or product reviews aren’t labeled? What’s the best approach in that case?
Is it fair to conclude that, in NLP, we always need labeled data to train a model?
Within artificial intelligence there are several learning paradigms a model can follow.
In many cases, such as the tweet classification task, we use supervised learning. In that setting we really do need a label for each training sample.
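To make the dependence on labels concrete, here is a minimal sketch of logistic regression trained with batch gradient descent. It assumes a hypothetical toy set of six labeled tweets and plain bag-of-words count features, which are not necessarily the features used in the course; the helper names (`features`, `train_logistic`, `predict_proba`) are illustrative. The label `y` appears directly in the gradient term `h(x) - y`, so without labels there is nothing to compute the loss against.

```python
import math

# Hypothetical toy training set: each tweet comes with a label (1 = positive, 0 = negative).
# In supervised learning, the loss is computed against these labels.
train = [
    ("i love this movie", 1),
    ("what a great day", 1),
    ("happy and great", 1),
    ("i hate this movie", 0),
    ("what a terrible day", 0),
    ("sad and terrible", 0),
]

# Vocabulary built from the training tweets; each word gets one feature slot.
vocab = sorted({w for text, _ in train for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def features(text):
    """Bag-of-words count vector with a leading bias term (unknown words are ignored)."""
    x = [1.0] + [0.0] * len(vocab)
    for w in text.split():
        if w in index:
            x[index[w] + 1] += 1.0
    return x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """Estimated probability that the tweet is positive."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def train_logistic(data, lr=0.5, epochs=500):
    """Batch gradient descent on the cross-entropy loss; needs a label y for every sample."""
    theta = [0.0] * (len(vocab) + 1)
    m = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for text, y in data:
            x = features(text)
            err = predict_proba(theta, x) - y  # h(x) - y: this is where the label is required
            for j, xj in enumerate(x):
                grad[j] += err * xj / m
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

theta = train_logistic(train)
print(predict_proba(theta, features("great movie")) > 0.5)     # True
print(predict_proba(theta, features("terrible movie")) > 0.5)  # False
```

Remove the `y` values from `train` and `train_logistic` can no longer be called, which is exactly the limitation the question is about.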
However, there are several other ways to develop a model, such as unsupervised, self-supervised and semi-supervised learning, reinforcement learning, etc. Each of these has its own particularities, advantages and disadvantages.
For example, if we didn’t have labels for the tweets, we could try to cluster similar tweets together in order to determine the “subject” of each group.
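A minimal sketch of that clustering idea, assuming hypothetical unlabeled tweets about two topics, a small hand-picked stopword list, and k-means (written from scratch so it has no dependencies) on L2-normalized bag-of-words vectors. The number of clusters `k=2` is an assumption we supply; nothing in the data tells the algorithm what the topics are.

```python
import math
from collections import Counter

# Hypothetical unlabeled tweets: no sentiment or topic labels are given.
tweets = [
    "the match was great goal in the last minute",
    "what a goal by the striker in the match",
    "the striker scored another goal tonight",
    "heavy rain and wind expected tomorrow",
    "rain again today the weather is awful",
    "sunny weather after the rain finally",
]

# Hand-picked stopword list (an assumption for this toy example).
STOPWORDS = {"the", "a", "and", "in", "is", "by", "what", "was", "after", "again"}

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

vocab = sorted({w for t in tweets for w in tokenize(t)})

def vectorize(text):
    """L2-normalized bag-of-words vector, so tweet length matters less."""
    counts = Counter(tokenize(text))
    v = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Farthest-first initialization: deterministic and spreads the starting centroids out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    assign = []
    for _ in range(iters):
        # Assignment step: each tweet goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of the tweets assigned to it.
        new = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = new
    return assign, centroids

def top_terms(centroid, n=3):
    """Highest-weighted words in a centroid, a rough hint at the cluster's subject."""
    return [w for _, w in sorted(zip(centroid, vocab), reverse=True)[:n]]

vectors = [vectorize(t) for t in tweets]
labels, centroids = kmeans(vectors, k=2)
for c in range(2):
    print(c, top_terms(centroids[c]), [t for t, l in zip(tweets, labels) if l == c])
```

On this toy data the three football tweets and the three weather tweets end up in separate clusters, and the top term of each centroid (`goal` and `rain`) hints at the subject. On real data a library implementation such as scikit-learn's `KMeans` combined with `TfidfVectorizer` would be the more practical choice, and `k` still has to be chosen, for example by inspection or with a criterion like the silhouette score.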
Here are some references: