Can someone elaborate on the difference between a token, a weight, and a parameter in an LLM?
Tokens: An LLM receives its input as text. This text is passed through a tokenizer, which converts it into tokens. Every tokenizer is different, but in general a token is a word or a piece of a word (a subword).
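A quick sketch, assuming you have the Hugging Face transformers library installed, using GPT-2's tokenizer as an example:

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer as an example; every model ships with its own.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Common words tend to stay whole, rarer words get split into subwords.
print(tokenizer.tokenize("Tokenization splits rare words into subwords."))
# e.g. ['Token', 'ization', 'Ġsplits', ...]  (Ġ marks a leading space;
# the exact splits depend on the tokenizer)
```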
Weights: An LLM, like any other ML model, is essentially a set of matrices, and the values in the cells of those matrices are called 'weights'. Together, the weights encode a statistical representation of the language the LLM was trained on.
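A toy illustration of what 'a set of matrices' means in practice, with made-up sizes (NumPy):

```python
import numpy as np

# One layer of a neural network: a weight matrix plus a bias vector.
W = np.random.randn(4, 3)   # 4x3 matrix -> 12 weights
b = np.random.randn(4)      # 4 bias values
x = np.random.randn(3)      # an input vector (e.g. a token embedding)

y = W @ x + b               # the layer's output; training adjusts W and b
```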
Parameters: The LLM's behaviour can also be steered at inference time by 'generation parameters'. Temperature is one such parameter: at a low temperature the model is very 'precise' (it sticks to the most likely tokens), while a high temperature produces more random (creative) responses. Concretely, the temperature divides the logits before softmax is applied, as in the sketch below.
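A minimal sketch of how temperature reshapes the output distribution, with made-up logits (NumPy):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])   # made-up scores for three candidate tokens

print(softmax(logits / 0.2))   # low temperature -> almost all mass on the top token
print(softmax(logits / 1.5))   # high temperature -> flatter, more random sampling
```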
Hope this sheds some light!
Parameters depend on the model's architecture. Strictly speaking, the learned parameters are the weights and biases; things like the choice of activation function and the learning rate are hyperparameters, fixed before training rather than learned from data.
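For instance, in PyTorch the learned parameters are exactly what `model.parameters()` returns. A minimal sketch with a made-up two-layer network:

```python
import torch.nn as nn

# A tiny two-layer network; its learned parameters are the weights and biases.
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))

n_params = sum(p.numel() for p in model.parameters())
print(n_params)   # 10*20 + 20 + 20*5 + 5 = 325 (ReLU has no parameters)
```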
The text/images sent as input to the model are broken down into smaller pre-defined units called tokens through tokenisation. Tokenisation lets the model learn the inherent patterns and relationships in the data, which improves its accuracy and precision.
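A small sketch (again assuming the Hugging Face transformers library) showing those pre-defined units and the integer IDs the model actually consumes:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The model never sees raw text, only integer IDs from a fixed vocabulary.
ids = tokenizer.encode("Hello world")
print(ids)                                    # e.g. [15496, 995]
print(tokenizer.convert_ids_to_tokens(ids))   # e.g. ['Hello', 'Ġworld']
```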