Hi, I wanted to start a topic here as I accumulate knowledge on model architectures and deep learning - mainly to clarify any confusion I come across regarding the technical nuances of programming these modules and applications.
To kick it off, I am midway through the PyTorch for Deep Learning certification on this platform.
Do all models have a pre-trained vocabulary that they map the input text onto and then convert to embeddings → which are then modified based on surrounding context?
Is this the case with models like GPT-5 - every interaction is mapped onto the vocabulary, interpreted (infused with context), and then the output is generated from the resulting contextual/dynamic embeddings?
Have you taken either DLS or the NLP specialization before taking the PyTorch courses? I’ve just finished the first course of the PyTorch series and it’s great, but it seems to assume you’ve already taken other courses to learn about neural networks (what the various architectures are and what they are useful for).
In DLS, the network architectures used in LLMs and other language models are covered in Course 5 - Sequence Models. The NLP Specialization is focused entirely on language processing models, and the same material as DLS C5 is covered in a bit more detail in NLP C3 and C4 (Attention Models and Transformers).
But to answer your question, yes, word embeddings are fundamental to LLMs and you could consider them the “first step” in building an LLM. You can learn how embedding models work and how to train them in DLS C5.
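To make the pipeline from the question concrete, here is a minimal toy sketch in plain Python. The three-word vocabulary, the embedding values, and the `contextualize` step are all invented for illustration - a real LLM uses a learned subword tokenizer, a trained embedding matrix, and attention layers rather than the crude sequence-mean mixing shown here:

```python
# Toy sketch: a fixed vocabulary maps tokens to IDs, IDs index into an
# embedding table of static vectors, and a simple averaging step stands in
# for attention-based contextualization. All values here are made up.

vocab = {"the": 0, "cat": 1, "sat": 2}   # pre-trained vocabulary: token -> ID
embedding_table = [                       # one static vector per ID
    [0.1, 0.2],   # "the"
    [0.9, 0.4],   # "cat"
    [0.3, 0.8],   # "sat"
]

def embed(tokens):
    """Map tokens -> IDs -> static embedding vectors."""
    return [embedding_table[vocab[t]] for t in tokens]

def contextualize(vectors):
    """Crude stand-in for attention: blend each vector with the sequence mean,
    so every token's embedding now depends on its surrounding context."""
    n = len(vectors)
    mean = [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]
    return [[(x + m) / 2 for x, m in zip(v, mean)] for v in vectors]

static = embed(["the", "cat", "sat"])      # same vector wherever a token appears
contextual = contextualize(static)         # now each vector reflects the sequence
```

The key point the sketch shows: `static` is a pure table lookup (the "pre-trained vocabulary" step), while `contextual` differs depending on the other tokens in the sequence - which is the role attention plays in a transformer.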
No, I have not, but I was just browsing around and found those to be notable courses to explore simultaneously.
I do have peripheral knowledge of model architectures from learning on YouTube, so I was able to connect some dots while doing the PyTorch for DL course. Appreciate your response.