Hi, I have a few questions about the short video Custom Build an 8-bit Quantizer. There, the output of the linear layer is said to be: `(input * weights) * scale + bias`.
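For concreteness, here is how I am reading that formula as code (a minimal sketch with my own names such as `w8a16_forward`, not necessarily the course's exact implementation):

```python
import torch
import torch.nn.functional as F

def w8a16_forward(x, int8_weights, scale, bias=None):
    # Matmul on the int8 weights upcast to the activation dtype.
    out = F.linear(x, int8_weights.to(x.dtype))
    # Dequantize afterwards by multiplying with the scale. Note there is
    # no zero-point term anywhere, which is what prompts my questions below.
    out = out * scale
    if bias is not None:
        out = out + bias
    return out
```

My questions: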
- Does this silently assume symmetric quantization of the weights, so that the zero point is zero and no zero-point term is needed in the formula?
- Does this silently assume that we quantized the whole weight matrix with a single scale, i.e. that we did not do per-row or per-group quantization?
- What is the purpose of defining the weights in the `__init__` method, since, if I understand it correctly, their values are never used? The same goes for the scales: they are calculated in the `quantize` method. (See the sketch after this list for the pattern I mean.)
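Roughly this pattern, again only a sketch of my reading (hypothetical class name, and a single per-tensor symmetric scale, which is exactly the part I am unsure about):

```python
import torch
import torch.nn as nn

class W8A16Linear(nn.Module):
    def __init__(self, in_features, out_features, dtype=torch.float32):
        super().__init__()
        # Random placeholders: as far as I can tell, only their shapes and
        # dtypes matter here, because quantize() overwrites the values.
        self.register_buffer(
            "int8_weights",
            torch.randint(-128, 127, (out_features, in_features), dtype=torch.int8),
        )
        self.register_buffer("scale", torch.randn(1, dtype=dtype))

    def quantize(self, weights):
        # One symmetric scale for the whole matrix (zero point = 0);
        # whether the video really assumes this is my question above.
        scale = weights.abs().max() / 127
        self.int8_weights = torch.round(weights / scale).to(torch.int8)
        self.scale = scale.to(weights.dtype)
```

Its `forward` would then just apply the formula from the top of the post.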
Any help would be appreciated.