I have gone through the course “How Transformer LLMs Work” by Jay Alammar and Maarten Grootendorst.
Could you please answer the questions below? Under each one I have added a small sketch of my current understanding; please correct me wherever it is wrong.
1. What exactly is the supervised training data, and what is the label? Is the next token in the sequence the label?
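My tentative understanding for question 1, as a minimal sketch (the sentence and token ids below are invented for illustration): the model is trained on raw text, and the label at each position is simply the token that comes next.

```python
# Invented token ids for "the cat sat on the mat" (the ids are made up).
token_ids = [12, 981, 4077, 55, 12, 2299]

# Inputs and labels are the same sequence, shifted by one position:
# each position's label is the next token in the text.
inputs = token_ids[:-1]   # [12, 981, 4077, 55, 12]
labels = token_ids[1:]    # [981, 4077, 55, 12, 2299]

for context_end, label in zip(range(1, len(token_ids)), labels):
    print(f"context = {token_ids[:context_end]} -> label = {label}")
```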
2. Where are the self-attention scores actually used, and how?
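For question 2, what I took from the course is that the attention scores, after a softmax, become the weights of a weighted average over the value vectors. A minimal NumPy sketch of scaled dot-product attention (single head, random toy data, no causal mask):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                      # 4 tokens, 8-dim head (toy sizes)

Q = rng.standard_normal((seq_len, d_k))  # queries
K = rng.standard_normal((seq_len, d_k))  # keys
V = rng.standard_normal((seq_len, d_k))  # values

# Raw attention scores: similarity of each query with every key.
scores = Q @ K.T / np.sqrt(d_k)          # (seq_len, seq_len)

# Softmax turns each row of scores into weights that sum to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# The scores' only job: mix the value vectors into each token's new representation.
output = weights @ V                     # (seq_len, d_k)
print(output.shape)                      # (4, 8)
```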
3. How are the initial token embeddings generated?
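For question 3, my understanding is that the initial embeddings are just rows looked up from a trainable embedding matrix, as in this PyTorch sketch (the vocabulary size, model width, and token ids are invented):

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512              # invented sizes
embedding = nn.Embedding(vocab_size, d_model)  # trainable lookup table

token_ids = torch.tensor([12, 981, 4077])      # made-up token ids
vectors = embedding(token_ids)                 # (3, 512): one row per token
print(vectors.shape)
```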
4. How are positional encodings generated and added to the word embeddings?
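For question 4, the original Transformer paper uses fixed sinusoidal encodings that are added element-wise to the embeddings; here is my sketch of that recipe (note that many modern LLMs instead learn positional embeddings or use rotary embeddings):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(same)."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2)
    angles = positions / (10_000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even feature indices
    pe[:, 1::2] = np.cos(angles)                     # odd feature indices
    return pe

seq_len, d_model = 6, 512                            # toy sizes
embeddings = np.random.default_rng(0).standard_normal((seq_len, d_model))
inputs = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```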
5. What is layer normalization?
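For question 5, my reading is that layer normalization standardizes each token's feature vector to zero mean and unit variance, then applies a learned per-feature scale and shift. A NumPy sketch of the computation:

```python
import numpy as np

def layer_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
               eps: float = 1e-5) -> np.ndarray:
    # Normalize across the feature dimension of each token independently.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta            # learned scale and shift

d_model = 8
x = np.random.default_rng(0).standard_normal((3, d_model))  # 3 tokens
out = layer_norm(x, gamma=np.ones(d_model), beta=np.zeros(d_model))
print(out.mean(axis=-1), out.std(axis=-1))  # ~0 and ~1 per token
```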
I would appreciate a prompt reply to my email address, skgadalay@gmail.com.
Thanks in advance.