W8A16 quantizer forward method question

Hi, I have a few questions about the short video Custom Build an 8-bit Quantizer. There, the output of the linear layer is said to be: (input * weights) * scale + bias.

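For context, here is a minimal sketch of the layer as I understand it from the video. The class name `W8A16LinearLayer`, the random placeholder buffers, and the per-row symmetric `quantize` are my reconstruction, so apologies if I misremember any details:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class W8A16LinearLayer(nn.Module):
    def __init__(self, in_features, out_features, bias=True, dtype=torch.float32):
        super().__init__()
        # int8 weights and per-row scales stored as buffers,
        # initialized with random placeholder values
        self.register_buffer(
            "int8_weights",
            torch.randint(-128, 127, (out_features, in_features), dtype=torch.int8),
        )
        self.register_buffer("scales", torch.randn(out_features, dtype=dtype))
        if bias:
            self.register_buffer("bias", torch.randn(1, out_features, dtype=dtype))
        else:
            self.bias = None

    def quantize(self, weights):
        # symmetric per-row quantization: scale = max|w| / 127, zero point = 0
        w_fp32 = weights.clone().to(torch.float32)
        scales = w_fp32.abs().max(dim=-1).values / 127
        self.int8_weights = torch.round(w_fp32 / scales.unsqueeze(1)).to(torch.int8)
        self.scales = scales.to(weights.dtype)

    def forward(self, x):
        # cast int8 weights to the activation dtype, matmul, then rescale:
        # output = (input @ weights) * scale + bias
        w = self.int8_weights.to(x.dtype)
        out = F.linear(x, w) * self.scales
        if self.bias is not None:
            out = out + self.bias
        return out
```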
  1. Does this silently assume that we did symmetric quantization of the weights, so that the zero point maps to zero?
  2. Does this silently assume that we quantized the whole weight matrix with a single scale, rather than doing per-row or per-group quantization?
  3. What is the purpose of defining the weights in the `__init__` method, since, if I understand it correctly, their values are never used? The same goes for the scales: they are calculated in the `quantize` method.

Any help would be appreciated.