During the course, the instructors seem to avoid quantizing the convolutional layers, but why? From my understanding of convolutional layers, it should be fine to quantize them as well.

I have not taken this course, but I think it is because you don't want to compromise the structure of the neural network. You want to quantize where doing so is not critically damaging!

Hi! Thanks for your reply. I am still confused: why would quantizing the convolutional kernels cause greater damage than quantizing a linear layer?

I would guess that a linear layer is straightforward and can be simplified more easily than a more complex structure like a convolutional layer!

I asked ChatGPT, and it says it should be fine to quantize convolutional layers.

Also, I found a document that talks about quantizing convolutional layers:

(gemmlowp/doc/quantization.md at master · google/gemmlowp · GitHub)

So I assume this was left out of the course just because it is too complicated to show?
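For what it's worth, the linear (affine) scheme that the gemmlowp doc describes applies to a conv weight tensor exactly as it does to a linear one. Here is a minimal NumPy sketch of that scheme (the tensor shape and bit-width are just illustrative):

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    # gemmlowp-style affine quantization: real = scale * (quant - zero_point)
    qmin, qmax = 0, 2 ** num_bits - 1
    # The real range must include 0 so that zero is exactly representable.
    rmin, rmax = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

# A small "conv kernel" tensor of shape (out_ch, in_ch, kH, kW);
# the math is identical for a linear layer's weight matrix.
np.random.seed(0)
w = np.random.randn(8, 3, 3, 3).astype(np.float32)
q, s, z = affine_quantize(w)
w_hat = dequantize(q, s, z)
print("max abs error:", np.abs(w - w_hat).max())  # on the order of scale / 2
```

Note that nothing here depends on the layer being linear or convolutional; the quantized conv just does its multiply-accumulates on the uint8 values and rescales at the end.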

Hmm, as long as the most important information is not lost in the process, maybe it's right!

I think the instructor meant it from the perspective of not wanting to reduce the bit-width of the weights and activations (which would eventually reduce the model size and computation cost), hence stating that it is not suitable for the convolution layers.

Quantization of a convolution layer can be linear or non-linear. Here the instructor probably did not do it because they did not want to reduce the model size, and yes, linear quantization can be done for a convolution layer.
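As a sketch of what that linear quantization could look like for a conv layer, here is per-output-channel symmetric quantization, a common choice for conv weights (the shapes and bit-width below are illustrative, not from the course):

```python
import numpy as np

def per_channel_symmetric_quantize(w, num_bits=8):
    # Symmetric linear quantization with one scale per output channel
    # and the zero point fixed at 0 (a frequent setup for conv weights).
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    # w has shape (out_channels, in_channels, kH, kW)
    scales = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / qmax
    q = np.round(w / scales[:, None, None, None]).astype(np.int8)
    return q, scales

np.random.seed(0)
w = np.random.randn(16, 3, 3, 3).astype(np.float32)
q, scales = per_channel_symmetric_quantize(w)
# Dequantize to check the round-trip error
w_hat = q.astype(np.float32) * scales[:, None, None, None]
```

Per-channel scales help because each conv filter can have a very different dynamic range, so a single per-tensor scale would waste precision on the small-magnitude filters.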

Regards

DP

Cool! Thanks a lot.

It may also be that the FC layers have a much larger number of parameters than the conv layers, so quantizing them saves more memory.
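To make that concrete, here is a quick back-of-the-envelope parameter count for a made-up VGG-style network (the layer sizes are purely illustrative):

```python
def conv_params(in_ch, out_ch, k):
    # k x k kernel per (input, output) channel pair, plus one bias per output channel
    return out_ch * (in_ch * k * k + 1)

def fc_params(in_features, out_features):
    # one weight per (input, output) pair, plus one bias per output
    return out_features * (in_features + 1)

# Two 3x3 conv layers followed by one fully connected layer.
conv_total = conv_params(3, 64, 3) + conv_params(64, 128, 3)
# Suppose pooling leaves a 128-channel 7x7 feature map before the FC layer.
fc_total = fc_params(128 * 7 * 7, 4096)
print(conv_total, fc_total)  # the FC layer dwarfs the conv layers
```

In this toy example the single FC layer holds over 300 times as many parameters as both conv layers combined, which is why quantizing FC weights gives the biggest storage win.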