Question on how Base LLMs are trained

This is a question about base LLM training. Base LLMs use transformer-based models.
They are trained on huge amounts of internet text and other unstructured datasets. When that data is used to train a transformer:

  1. The input would be sentence[:x] and the output would be sentence[x+1], the next token. Is that the correct understanding?
  2. So each sentence would be trained on repeatedly, once per word, taking the previous words as input and context. Is my understanding correct?
  3. Each combination of tokens would have a set of parameters tuned to produce the next word for that sentence. Is that the reason each model has so many parameters?

Thanks for your time.

It's not a conventional RNN you are dealing with here; there are many inputs and many predictions at the same time. It involves multiplying large matrices. You had better check how transformers work, and the simplest way to do that is to search on Google.
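To make "many inputs and many predictions at the same time" concrete, here is a minimal sketch of how one sequence is usually turned into next-token training examples. The function names and the whitespace tokenization are made up for illustration; real pipelines use subword tokenizers and batched tensors. Note that with zero-based slicing the target for the prefix tokens[:x] is tokens[x]. The same model weights are reused at every position, so there is no separate parameter set per prefix.

```python
def shifted_inputs_and_targets(tokens):
    """Split one token sequence into model inputs and next-token targets."""
    inputs = tokens[:-1]   # every token except the last
    targets = tokens[1:]   # the same sequence shifted left by one
    return inputs, targets


def prefix_target_pairs(tokens):
    """Expand the shifted view into explicit (prefix, next token) pairs."""
    return [(tokens[:x], tokens[x]) for x in range(1, len(tokens))]


tokens = ["the", "cat", "sat", "down"]  # toy whitespace tokenization (assumption)
inputs, targets = shifted_inputs_and_targets(tokens)
pairs = prefix_target_pairs(tokens)

for prefix, target in pairs:
    print(prefix, "->", target)
```

The explicit pairs are only there to show the idea. In practice the shifted inputs and targets are fed through the model once, and the loss at every position is computed in that single pass rather than in a loop over prefixes.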

Thanks for your reply.
Sorry, my bad. Can you point me to an article or a video with more details on how base LLMs are trained on internet or other datasets? I understand the fine-tuning part, but base LLM training is where I am not clear.

My understanding of transformers goes like this, considering an encoder-decoder transformer:
The input string is passed to the encoder, where it goes through embedding, positional encoding, and multi-head attention, which looks at the relation between each token and all other tokens to see how strongly each token influences the others. Similarly, the decoder is given the output sequence, masked so that it learns to generate one word at a time. It then applies multi-head attention between the encoded and decoded values, which again goes through feed-forward, linear, and softmax layers. I understand it is not as simple as I have explained it here.
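The masking described above can be sketched in a few lines of plain Python. This is a simplified illustration, not a real transformer layer: it uses a single head, and queries, keys and values are all the raw input vectors, whereas an actual layer applies learned projection matrices first. The function names and the toy vectors are assumptions for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x):
    """Single-head self-attention where position i only sees positions 0..i.

    x is a list of token vectors. Queries, keys and values are all x itself,
    which is a simplification: a real layer applies learned projections first.
    """
    d = len(x[0])
    out = []
    for i, query in enumerate(x):
        # Scaled dot-product scores against visible (non-future) positions only.
        # Skipping j > i has the same effect as masking those scores out.
        scores = [sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out.append([sum(w * x[j][c] for j, w in enumerate(weights))
                    for c in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
y = causal_self_attention(x)

# Changing the last token must not affect outputs at earlier positions.
x_changed = x[:2] + [[5.0, -3.0]]
y_changed = causal_self_attention(x_changed)
print(y[:2] == y_changed[:2])  # True
```

Because earlier positions never see later tokens, every position can be trained to predict its own next token in the same forward pass, which is why the mask matters for training.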

Can you please provide more references so that I can understand base LLM training and transformers better?

Yes, you should look into the NLP specialization that is offered; it explains transformers in detail. I think it's course 4 of the specialization.

Yes, it's course 4…
