Generative AI with Large Language Models
Week 1: Computational challenges of training LLMs
Why are only the weights quantized and not the rest of the model's parameters?
Also, would it make sense to standardize/normalize the model weights and parameters?
It could harmonize the use of all the available bits and help us select the best format among FP32/FP16/BF16/INT8 (a rough sketch of what I mean is at the end of this post).
But maybe what the model learns might go haywire unless the vector embeddings also use the same format, so that encoding/decoding stays consistent.
Any thoughts on this?
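
For concreteness, here is a minimal sketch (not from the course materials) of the kind of rescaling I mean, comparing plain FP16/BF16 casts against symmetric linear quantization to INT8. The tensor values and the per-tensor scale are just illustrative assumptions:

```python
import torch

# A toy FP32 weight tensor standing in for one layer's weights
# (hypothetical values, purely for illustration).
w_fp32 = torch.randn(4, 4) * 3.0

# Casting to lower-precision floating-point formats is a direct conversion.
w_fp16 = w_fp32.to(torch.float16)
w_bf16 = w_fp32.to(torch.bfloat16)

# For INT8, one common approach is symmetric linear quantization:
# rescale by the max absolute value so the weights map onto [-127, 127].
# This rescaling is roughly the "normalization" idea from the question.
scale = w_fp32.abs().max() / 127.0
w_int8 = torch.clamp(torch.round(w_fp32 / scale), -127, 127).to(torch.int8)

# Dequantize and compare the rounding error introduced by each format.
w_int8_dequant = w_int8.to(torch.float32) * scale
for name, w in [("fp16", w_fp16), ("bf16", w_bf16), ("int8", w_int8_dequant)]:
    err = (w_fp32 - w.to(torch.float32)).abs().max()
    print(f"{name}: max abs error = {err:.6f}")
```

If I understand it correctly, the scale factor has to be stored alongside the INT8 weights so they can be dequantized consistently at inference time, which is basically the encoding/decoding concern I raised above about the embeddings.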