Generative AI with Large Language Models
Week 1: Computational challenges of training LLMs

Why are only the weights quantized and not the other model parameters?

Also, would it make sense to standardize/normalize the model weights and parameters?
It could make full use of all the available bits and help us choose the best format among FP32/FP16/BF16/INT8.
But what the model has learned might go haywire unless the vector embeddings also use the same format, so that encoding/decoding stays consistent.
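For intuition, here is a minimal sketch of what I mean (assuming simple symmetric per-tensor quantization, with NumPy standing in for a real framework; nothing here is from the course itself):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: normalize by max |w| so the
    values spread across the full INT8 range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Map the integers back to approximate FP32 values.
    return q.astype(np.float32) * scale

# FP32 weights drawn from a typical near-zero distribution.
w = np.random.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)
q, scale = quantize_int8(w)

print("FP32 bytes:", w.nbytes)   # 4 bytes per weight
print("INT8 bytes:", q.nbytes)   # 1 byte per weight, a 4x reduction
print("max round-trip error:", np.abs(w - dequantize(q, scale)).max())
```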

Any thoughts on this?

Which parameters? If you mean hyperparameters, those don't take much space anyway. If you mean the model itself, it does get quantized when used, for example, on mobile phones or microprocessors, but of course that reduces model performance and accuracy!
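To make the accuracy cost concrete, here is a toy comparison (illustrative numbers only, using the same symmetric INT8 scheme sketched above, not any particular deployment runtime):

```python
import numpy as np

# Toy layer y = x @ W, computed once in FP32 and once with INT8 weights.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(512, 512)).astype(np.float32)
x = rng.normal(0.0, 1.0, size=(1, 512)).astype(np.float32)

scale = np.abs(W).max() / 127.0
W_q = np.round(W / scale).astype(np.int8)

y_fp32 = x @ W
y_int8 = (x @ W_q.astype(np.float32)) * scale  # dequantize on the fly

rel_err = np.abs(y_fp32 - y_int8).max() / np.abs(y_fp32).max()
print(f"relative output error from INT8 weights: {rel_err:.4%}")
```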

It might be a problem if the encodings are not quantized!


I mean the vector embeddings (input and positional encodings).

Yes, and I mean that if those vector embeddings are not quantized like the rest of your model, it will be a problem when running the model!
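A rough sketch of the workaround (purely hypothetical shapes and scales; real INT8 runtimes use fused integer kernels instead): keep the embeddings in FP32 and dequantize the INT8 weights to the same format before the two meet:

```python
import numpy as np

rng = np.random.default_rng(1)

# FP32 embedding table and an INT8-quantized projection weight.
emb_table = rng.normal(0.0, 0.02, size=(1000, 512)).astype(np.float32)
W_q = rng.integers(-127, 128, size=(512, 512), dtype=np.int8)
w_scale = np.float32(3e-4)

token_ids = np.array([17, 42, 99])
x = emb_table[token_ids]  # embedding lookups stay FP32

# Dequantize the weights into the embeddings' format before mixing them,
# so the matmul runs with both operands in one consistent dtype.
y = x @ (W_q.astype(np.float32) * w_scale)
print(y.shape, y.dtype)  # (3, 512) float32
```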