Difference between Token, Weight and Parameter in an LLM

Can someone elaborate on the difference between a token, a weight and a parameter in an LLM?

Tokens: An LLM receives its input as text. This text is passed through a tokenizer, which converts the words into tokens. Every tokenizer is different, but in general a token is a partial word (a word piece).
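To make that concrete, here is a toy greedy longest-match tokenizer. The vocabulary is made up for illustration; real tokenizers (BPE, WordPiece, etc.) learn their vocabularies from data and work on bytes or characters, but the idea of splitting text into known pieces is the same:

```python
def tokenize(text, vocab):
    """Greedily match the longest vocabulary piece at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical vocabulary: "unbreakable" splits into three partial words.
print(tokenize("unbreakable", {"un", "break", "able"}))
# → ['un', 'break', 'able']
```

A real LLM then maps each piece to an integer ID and operates on those IDs, never on raw text.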

Weights: An LLM, like any other ML model, is essentially a set of matrices, and the values in the cells of those matrices are called ‘weights’. These weights encode a statistical representation of the language the LLM was trained on.
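As a minimal sketch of what "a model is matrices of weights" means, here is a single linear layer. The numbers are made up; a real LLM stacks many such layers and its matrices hold billions of values:

```python
# One weight matrix W (2 rows, 3 columns). In a trained model these
# values are learned from data; here they are arbitrary.
W = [[0.2, -0.5, 0.1],
     [0.8,  0.3, -0.4]]

def apply_layer(W, x):
    """One linear layer: output[i] = sum over j of W[i][j] * x[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

print(apply_layer(W, [1.0, 0.0, 2.0]))
# → [0.4, 0.0]
```

Training is the process of nudging those cell values until the model's outputs match the training data well.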

Parameters: The LLM can be steered to behave slightly differently by means of some ‘parameters’. For example, temperature is one such parameter. When the temperature is low, the model is very ‘precise’ in its responses, while a high temperature produces more random (creative) responses. Specifically, the temperature scales the logits before softmax is applied. (One caveat: ‘parameters’ is also commonly used to mean the weights themselves, as in “a 7B-parameter model”; what I'm describing here are inference-time sampling parameters.)
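Here is a small sketch of exactly that: dividing the logits by the temperature before softmax. A low temperature sharpens the probability distribution (the top token dominates), a high one flattens it:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # made-up scores for three tokens
low = softmax_with_temperature(logits, 0.5)   # sharper: top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: more randomness
print(low, high)
```

With temperature 0.5 the first token gets a much larger share of the probability mass than with temperature 2.0, which is why sampling at low temperature looks ‘precise’ and at high temperature looks ‘creative’.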

Hope this sheds some light!
