Newbie question: Andrew says “LLMs are built by using supervised learning to repeatedly predict the next word”. I understand that supervised learning implies use of an UNlabelled dataset. So what evaluation function can be used when the model guesses a possible next word? (I’ve read elsewhere that GPT uses “unsupervised self-supervised” training, just to add to my confusion! The self-supervised method makes sense to me.) Thank you for any clarification…
No, that’s not correct.
Supervised learning always uses labeled data.
Unsupervised learning uses unlabeled data.
Large Language Models are a bit of a special case, because they learn to predict the next word (really the next token) from a big collection of written works. So the training labels are simply the words in the text themselves.
Thank you! Yes, I see I had a brain-freeze when I entered my question, as I knew supervised implied labelled data. But given that, I’m still confused about how the “probable next word” guess is evaluated, given that the input dataset is indeed UNlabelled (unless split into test subsets with masking, i.e. using “self-supervised” training, which is apparently considered “unsupervised” training).
It’s not unlabeled. Since this is a sequence model, we’re predicting the next token in the sequence. So the dataset itself provides the labels: at each position, the “label” is simply whatever token actually comes next in the text.
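To make that concrete, here is a minimal sketch of how (context, label) training pairs fall straight out of raw text. The whitespace tokenizer and the example sentence are just illustrations; real LLMs use subword tokenizers and far longer contexts.

```python
# Illustrative only: real LLMs use subword tokenizers, not str.split().
text = "the cat sat on the mat"
tokens = text.split()

# Each training example pairs a context with the token that actually
# follows it in the text -- the "label" comes from the data itself.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, label in pairs:
    print(context, "->", label)
# e.g. ['the'] -> cat
#      ['the', 'cat'] -> sat
```

The model’s guess at each position is then scored against that known next token (typically with a cross-entropy loss), which is why the training is “supervised” in mechanics even though no human ever labelled anything.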
The “supervised” and “unsupervised” terminology doesn’t map cleanly onto this situation, because those terms date back to classic batch-trained models making simple predictions from hand-labelled or fully unlabelled data. “Self-supervised” was coined precisely for this middle ground, where the labels are extracted from the data itself.