Question on how Base LLMs are trained

Raja_Sekhar · August 3, 2023, 11:56am

A question on base LLM training: these models are transformer-based.
They are trained on huge amounts of internet text and other unstructured datasets.

The input would be sentence[:x] and the output would be sentence[x], the next token. Is that the correct understanding?
So each sentence would be trained on repeatedly, once for each word, taking the previous words as input and context. Is my understanding correct?
Each combination of tokens would have a set of parameters tuned to predict the next word in the sentence. Is that the reason each model has so many parameters?
Thanks for your time.
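The framing in this question can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular model is implemented: the whitespace split stands in for a real subword tokenizer, and the function names are hypothetical. Note the indexing: if the context is `sentence[:x]` (tokens 0 to x-1), the next token is `sentence[x]`. In transformer training the pairs are usually not fed in one at a time; the whole sequence goes in once with the targets shifted by one position, which is what the second function shows.

```python
# Sketch: the (context -> next token) training pairs that one sentence yields.
# Whitespace splitting is a stand-in for a real subword tokenizer (assumption).


def next_token_pairs(tokens):
    """One pair per position: context tokens[:x] and target tokens[x]."""
    return [(tokens[:x], tokens[x]) for x in range(1, len(tokens))]


def shifted_inputs_and_targets(tokens):
    """The same supervision packed into two aligned sequences.

    Position i of `inputs` is trained to predict position i of `targets`.
    A causal attention mask (not shown here) stops position i from seeing
    later tokens, so all pairs can be scored in one forward pass.
    """
    return tokens[:-1], tokens[1:]


sentence = "the cat sat on the mat".split()

for context, target in next_token_pairs(sentence):
    print(context, "->", target)

inputs, targets = shifted_inputs_and_targets(sentence)
print(inputs)   # ['the', 'cat', 'sat', 'on', 'the']
print(targets)  # ['cat', 'sat', 'on', 'the', 'mat']
```

For an n-token sequence this gives n-1 training targets, all of which a transformer can score in a single pass.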

gent.spah · August 3, 2023, 12:13pm

It's not a conventional RNN you are dealing with here; there are many inputs and many predictions at the same time, and it involves multiplying large matrices. You had better check how transformers work, and the simplest thing to do is to search on Google.
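The point about many inputs and many predictions at the same time can be illustrated with a minimal single-head causal self-attention sketch in plain Python. It is a simplification under stated assumptions: one head, no batching, no feed-forward or output layers, and random weights and sizes chosen only for illustration. All positions are computed together with a few matrix multiplications, and the causal mask makes row i depend only on tokens 0 to i, so one sequence yields a prediction at every position in a single pass.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]  # exp(-inf) == 0.0 for masked slots
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention computed for every position at once.

    x is an n x d list of token vectors; w_q, w_k, w_v are d x d weights that
    are shared by all positions. Output row i only mixes positions 0..i.
    """
    n, d = len(x), len(x[0])
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scores = matmul(q, transpose(k))  # n x n: every token against every token
    for i in range(n):
        for j in range(i + 1, n):
            scores[i][j] = float("-inf")  # causal mask: hide future tokens
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
w_q, w_k, w_v = (rand_matrix(d, d) for _ in range(3))
x = rand_matrix(5, d)  # 5 made-up token embeddings
out = causal_self_attention(x, w_q, w_k, w_v)

# Perturb only the last token: outputs at positions 0..3 must not change.
x2 = [row[:] for row in x]
x2[-1] = [val + 1.0 for val in x2[-1]]
out2 = causal_self_attention(x2, w_q, w_k, w_v)
print(all(out[i] == out2[i] for i in range(4)))  # True

# The weight count depends on d only, not on sequence length or token identity.
n_params = sum(len(w) * len(w[0]) for w in (w_q, w_k, w_v))
print(n_params)  # 48
```

The last lines also bear on the parameter question in the original post: in this sketch the same `w_q`, `w_k`, `w_v` are applied at every position, so the parameter count grows with the model width rather than with the number of token combinations seen in training.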

Raja_Sekhar · August 3, 2023, 1:15pm

Thanks for your reply.
Sorry, my bad. Can you point me to an article or a video with more details on how base LLMs are trained on internet or other datasets? I understand the fine-tuning part, but base LLMs are where I am not clear.

My understanding of transformers goes like this, considering an encoder-decoder transformer:
The input string is passed to the encoder, where it goes through embedding, positional encoding, and multi-head attention, which looks at the relation between each token and all other tokens to see how much each token influences the others. Similarly, the decoder is given the output sequence, which is masked so that it learns to generate one word at a time. It then applies multi-head attention between the encoded and decoded values, and the result again goes through feed-forward, linear, and softmax layers. I understand it is not as simple as I have explained it here.
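The masked-decoder step in this description can be sketched in plain Python, assuming a translation-style example (the French target sentence and the `<s>`/`</s>` marker names are illustrative, and cross-attention to the encoder output is omitted). The full target sentence is given to the decoder at once, shifted right by one position, and the look-ahead mask hides later positions, so each position learns to produce its next word within the same pass rather than one word per separate run.

```python
# Sketch of the decoder's training view in an encoder-decoder transformer:
# teacher forcing (target shifted right) plus a look-ahead mask.
# The "<s>" / "</s>" marker strings are hypothetical placeholders.

BOS, EOS = "<s>", "</s>"


def decoder_inputs_and_labels(target_tokens):
    """Decoder reads `inputs`; position i is trained to predict labels[i]."""
    inputs = [BOS] + target_tokens
    labels = target_tokens + [EOS]
    return inputs, labels


def look_ahead_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


inputs, labels = decoder_inputs_and_labels(["le", "chat", "dort"])
mask = look_ahead_mask(len(inputs))
for i, label in enumerate(labels):
    # Keep only the decoder inputs this position is allowed to attend to.
    visible = [tok for tok, allowed in zip(inputs, mask[i]) if allowed]
    print(visible, "->", label)
# ['<s>'] -> le
# ['<s>', 'le'] -> chat
# ['<s>', 'le', 'chat'] -> dort
# ['<s>', 'le', 'chat', 'dort'] -> </s>
```

Many base LLMs, such as the GPT family, drop the encoder and cross-attention and keep only a decoder stack, so this same shift-and-mask idea is what their pretraining on raw text relies on.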

Can you please provide more references so that I can understand base LLM training and transformers better?

gent.spah · August 3, 2023, 2:26pm

Yes, you should look into the NLP Specialization that deeplearning.ai offers; it explains transformers in detail. I think it's course 4 of the specialization.

saifkhanengr · August 3, 2023, 2:58pm

Yes, it's course 4.
