Running the initial model after the quantization function (Lecture 4)

After quantizing the model, the initial model no longer runs inference (RuntimeError: "LayerNormKernelImpl" not implemented for 'Byte'). After running the dequantize_model function it works again, but the output of this initial model is now the same as for the quantized/dequantized model. So I suppose we modify the initial model during these steps; maybe we need to keep a copy of the model (or load it again) if we want to use it later for comparison.
Could you explain this behavior?
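For what it's worth, here is a minimal sketch of what I think is happening. The quantize function mutates the very module object that the `model` variable points to (a plain assignment only copies the reference), so the "initial" model is gone unless you `copy.deepcopy` it first. The quantization step below is my own stand-in, not the course's function, and it won't reproduce the exact 'Byte' error, just a dtype-related failure of the same kind:

```python
import copy

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))

# Keep an untouched copy BEFORE quantizing; `backup = model` alone
# would copy only the reference, so both names would see the change.
model_backup = copy.deepcopy(model)

# Hypothetical stand-in for an in-place quantization step: it rewrites
# the parameters of the SAME module object that `model` points to.
with torch.no_grad():
    for p in model.parameters():
        p.data = (p.data * 127).round().clamp(-128, 127).to(torch.int8)

x = torch.randn(1, 4)
# model(x) would now raise a RuntimeError, since float inputs meet
# integer weights (same family of error as the LayerNorm 'Byte' one).
print(model_backup(x))  # the deep copy still runs in float32
```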

Also, is it reasonable to quantize only some of the layers? For example, 50% of the layers are quantized and the rest are left as is.
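Something like this sketch is what I have in mind: quantize only every other nn.Linear and leave the rest in float32. The `quantize_linear_inplace` helper is hypothetical (not a course function), and a real implementation would also need a custom forward that applies the stored scale before the matmul:

```python
import torch
import torch.nn as nn

def quantize_linear_inplace(linear: nn.Linear):
    """Hypothetical per-layer quantizer: symmetric int8 weight
    quantization, with the scale kept as a buffer on the module."""
    w = linear.weight.data
    scale = w.abs().max() / 127.0
    linear.weight.data = (w / scale).round().clamp(-128, 127).to(torch.int8)
    linear.register_buffer("w_scale", scale)

model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

# Quantize roughly 50% of the Linear layers (every other one here);
# the remaining layers keep their original float32 weights.
linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
for i, layer in enumerate(linears):
    if i % 2 == 0:
        quantize_linear_inplace(layer)

for name, m in model.named_modules():
    if isinstance(m, nn.Linear):
        print(name, m.weight.dtype)  # mix of torch.int8 and torch.float32
```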