Assume the hardware supports int8, so we can quantize both activations and weights and keep everything in int8. If there is more than one FC/linear layer, should we re-quantize directly from one layer to the next, instead of repeatedly dequantizing and re-quantizing the activations between layers? And how would this compare, in terms of precision, with quantizing only the weights?
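To make the question concrete, here is a rough NumPy sketch of the two flows I mean. It is not tied to any framework's API; the per-tensor symmetric scales, layer sizes, and the float requantization step (which a real int8 kernel would replace with a fixed-point multiplier) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_for(x):
    """Per-tensor symmetric scale: map the max magnitude to 127."""
    return np.abs(x).max() / 127.0

def quantize(x, scale):
    """Symmetric per-tensor quantization of float values to int8."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

# Float "reference" input and two FC layers (sizes are arbitrary).
x_f  = rng.standard_normal((1, 16)).astype(np.float32)
w1_f = rng.standard_normal((16, 16)).astype(np.float32)
w2_f = rng.standard_normal((16, 8)).astype(np.float32)

a1_f  = x_f @ w1_f            # float activation after layer 1 (used for calibration and reference)
y_ref = a1_f @ w2_f           # float reference output

s_x, s_w1, s_w2, s_a1 = scale_for(x_f), scale_for(w1_f), scale_for(w2_f), scale_for(a1_f)

# --- Flow A: everything in int8, re-quantize the int32 accumulator between layers ---
x_q, w1_q, w2_q = quantize(x_f, s_x), quantize(w1_f, s_w1), quantize(w2_f, s_w2)
acc1 = x_q.astype(np.int32) @ w1_q.astype(np.int32)   # int32 accumulator, effective scale s_x*s_w1
a1_q = quantize(acc1 * (s_x * s_w1), s_a1)             # requantize straight to int8 for the next layer
acc2 = a1_q.astype(np.int32) @ w2_q.astype(np.int32)
y_a  = acc2 * (s_a1 * s_w2)                             # dequantize once, at the very end

# --- Flow B: weight-only quantization, activations stay float32 ---
w1_deq = quantize(w1_f, s_w1).astype(np.float32) * s_w1
w2_deq = quantize(w2_f, s_w2).astype(np.float32) * s_w2
y_b = (x_f @ w1_deq) @ w2_deq

print("flow A (full int8)    max abs err:", np.abs(y_a - y_ref).max())
print("flow B (weight-only)  max abs err:", np.abs(y_b - y_ref).max())
```

The way I understand it, flow A adds an extra rounding/clipping error at each layer boundary (the activation quantization), while flow B only has weight rounding error, so I'd expect B to be more accurate but A to be what the int8 hardware actually wants. Is that the right way to think about it?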