Week 1. I’m not seeing the ‘link’ between generating a ‘possible’ next word and answering a prompt?
Should I see such a link (RNN to LLM)?
You mean something in between RNNs and LLMs? An RNN is one type of NLP model, while LLMs are mostly based on the transformer architecture.
Generating the next word makes sense. But going from that to ‘answering a question’ seems like a very big leap?
This is not a course that explains those architectures in detail; the DLS and NLP specializations do that.
Yes, it is a big leap, but the larger point is that LLMs (at least at the current SoTA) are not really “thinking” in the sense that humans do. An answer to a question is just a sequence of words and the way it chooses the next word is based on patterns that it has learned by “digesting” the training corpus of actual word sequences that the system designers gathered for that purpose. It is just doing a sophisticated version of pattern matching based on the training data that was fed to it. If the sequence starts with the question you asked, then what are the patterns of word sequences that would be likely to follow that?
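Just to make that concrete, here is a minimal sketch of the idea. This is a toy bigram model, not a real LLM, and the corpus, function names, and numbers are all made up for illustration. It only shows the basic loop: count word-sequence patterns in a training corpus, then “answer” by repeatedly picking a likely next word after the prompt.

```python
# Toy sketch: "answering" as repeated next-word prediction conditioned on the prompt.
# Everything here (corpus, names, sizes) is invented for illustration only.
import random
from collections import defaultdict, Counter

# A hypothetical tiny "training corpus" of word sequences
corpus = [
    "what is the capital of france ? the capital of france is paris .",
    "what is the capital of japan ? the capital of japan is tokyo .",
]

# Learn the "patterns": which word tends to follow which word in the training data
next_counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for w, nxt in zip(words, words[1:]):
        next_counts[w][nxt] += 1

def generate(prompt, max_words=12):
    """Repeatedly pick a likely next word; the prompt steers the continuation."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = next_counts.get(words[-1])
        if not candidates:
            break
        # sample proportionally to how often each continuation appeared in training
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
        if words[-1] == ".":
            break
    return " ".join(words)

print(generate("what is the capital of france ?"))
```

Because this toy model only looks at the single previous word, it can happily wander off and answer with “tokyo” for a question about France. Real LLMs condition on the whole prompt at once, which is exactly what the transformer’s attention mechanism (mentioned further down in this thread) is for.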
Just to belabor the point a little more: there is no sense in which the LLM “understands” the meaning of the question. All it is doing is repeating patterns that it has learned from its training data. That’s why the quality of the training set is so critical: if you just feed it junk, then the output will also be junk. Just a high-powered version of the traditional GIGO axiom … E.g., if you accidentally scrape some websites that contain non-factual statements, then you may well get equivalent statements in the output of the LLM.
Even with a very carefully vetted training set, the LLM will still sometimes produce “hallucinations” or confabulations in the output. The people working on this have spent quite a bit of effort using techniques like Reinforcement Learning to damp down the frequency of hallucinations, but (at least as far as I’ve heard as of early February 2025) no one really has a solution to that problem.
Thanks. Still a mystery to me, internal to the transformer setup I guess.
LLMs are just really big statistical text-sequence predictors.
The transformer architecture lets them condition on a long sequence of words, so they implement something that imitates context-awareness.
The link between generating the next possible word and answering a prompt lies in how language models predict text. While traditional RNNs generate sequences based only on the context seen so far, transformer-based LLMs take this further by using attention mechanisms to efficiently weigh the relevance of all previous tokens.
In essence, answering a prompt is just an extension of predicting the next word, guided by the broader context of the prompt, the patterns learned during training, and fine-tuning.
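To make the attention point a bit more concrete, here is a rough NumPy sketch of causal scaled dot-product attention, the core operation that lets each position weigh every earlier token in the prompt. The dimensions, random weights, and variable names are all placeholders, not a real model.

```python
# Rough sketch of causal scaled dot-product attention (toy sizes, random weights).
# All shapes and values here are assumptions made up for illustration.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 5, 8                    # 5 tokens in the prompt, 8-dim embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))    # stand-in for token embeddings

# In a real model W_q, W_k, W_v are learned; here they are random placeholders.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

scores = Q @ K.T / np.sqrt(d_model)        # how relevant each token is to each other token
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf                     # causal mask: only earlier tokens are visible
weights = softmax(scores, axis=-1)         # each row sums to 1: a weighting over prior tokens
context = weights @ V                      # context-aware representation fed to the next-word prediction

print(weights.round(2))  # the last row shows how the final position attends over the whole prompt
```

The last row of the printed matrix is the key intuition: when the model predicts the word after your question, it is mixing information from every token of the question, weighted by learned relevance, rather than just looking at the most recent word.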