C3_M4_Lab_4_quantization about four model Accuracy

zone · April 6, 2026, 9:35am

helper_utils.evaluate_qat(baseline_model,testloader) Accuracy of the baseline model on the test set: 88.20%

helper_utils.evaluate_qat(quantized_dynamic_model,testloader) Accuracy : 88.18%

helper_utils.evaluate_qat(quantized_static_model,testloader) Accuracy : 87.81%

helper_utils.evaluate_qat(final_quantized_qat_model, testloader) Accuracy of the QAT model on the test set: 86.51%

Why is the accuracy of the QAT model the lowest?

Deepti_Prasad · April 6, 2026, 11:28am

If you review the Week 4 videos again, you will find the instructor explaining that quantized models are optimized to a smaller size, typically by converting weights and activations from 32-bit floating point (FP32) to lower-precision formats such as 8-bit integers (INT8). Some layers in a deep learning network are sensitive to this loss of precision, so running with lower memory usage results in an overall reduction in accuracy.

Quantized models are often optimised with quantization-aware training, where the model takes the quantization error into account during training. The goal is to run a larger model on a resource-constrained device, such as a mobile device with limited memory, or to reduce hardware dependencies (CPU/GPU memory).

The benefit is faster inference speed, with accuracy that almost matches the original model but is a bit lower.
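
To make the precision loss concrete, here is a minimal sketch of an FP32 to INT8 round trip in plain Python. It assumes a simple affine (scale and zero-point) scheme; the function names and the sample weights are hypothetical and are not taken from the lab.

```python
def quantize_params(x_min, x_max, qmin=-128, qmax=127):
    """Return (scale, zero_point) mapping [x_min, x_max] onto the int8 range."""
    # The range is widened to include 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: every value is zero
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an 8-bit integer, clipping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an 8-bit integer back to the float it stands for."""
    return (q - zero_point) * scale


# Hypothetical weights, chosen only for illustration.
weights = [-0.62, -0.05, 0.0, 0.13, 0.48, 0.91]
scale, zp = quantize_params(min(weights), max(weights))
q_weights = [quantize(w, scale, zp) for w in weights]
restored = [dequantize(q, scale, zp) for q in q_weights]
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", q_weights)
print("largest round-trip error:", max_error, "(half a step is", scale / 2, ")")
```

Each restored weight differs from the original by at most half a quantization step. A layer that is sensitive to precision is one where these small per-weight errors add up to a visible change in the output.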

Regards

Dr. Deepti

zone · April 6, 2026, 1:40pm

Thank you for your timely response. I’m just curious, why is the QAT model performing worse than the quantized_static_model?

Deepti_Prasad · April 6, 2026, 6:31pm

hi @zone

Did you check the architecture of all three models: dynamic, static and QAT?

zone · April 7, 2026, 2:29am

No, I just finished all the tasks. One step is “helper_utils.evaluate_qat(final_quantized_qat_model, testloader)”. I checked the code of helper_utils.evaluate_qat, which measures the accuracy of a model, so I used it to check the other models.
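
The idea of reusing one evaluation routine for every model can be sketched as follows. This is not the code of helper_utils.evaluate_qat, which is not shown in this thread; `accuracy`, `baseline`, `quantized` and the sample data are hypothetical stand-ins in plain Python.

```python
def accuracy(predict, samples):
    """Percentage of (input, label) pairs that `predict` labels correctly."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return 100.0 * correct / len(samples)


# Hypothetical stand-ins for two models: both threshold a number at 0.5,
# but the second one first snaps its input to a coarse grid of step 0.25.
def baseline(x):
    return int(x > 0.5)


def quantized(x):
    return int(round(x * 4) / 4 > 0.5)


samples = [(0.1, 0), (0.4, 0), (0.55, 1), (0.6, 1),
           (0.9, 1), (0.3, 0), (0.7, 1), (0.52, 1)]

# Same data and same metric for every model, so the numbers are comparable.
results = {name: accuracy(fn, samples)
           for name, fn in [("baseline", baseline), ("quantized", quantized)]}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}%")
```

Because the data and the metric are identical across models, any gap in the printed numbers comes from the models themselves.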

Deepti_Prasad · April 7, 2026, 9:23am

Which other models?

In the same lab, before each model is trained, there is a cell that describes the model. Did you go through those cells?

The lab you are mentioning is an upgraded lab, so there is no task to complete here, only cells to run. Instead of looking at the helper_utils files, kindly go through the lab code again thoroughly.

You will then see how the implementation of each model variation differs and why accuracy might be lower for QAT than for the other variations. The reason I gave previously also applies when comparing static and dynamic quantization with QAT. But I would like to know: do you understand the difference between the dynamic and static approaches to quantization?

zone · April 7, 2026, 12:16pm

In this lab we have a base model, and the dynamic, static and QAT models are all based on it. quantized_dynamic_model works on {nn.Linear}, quantized_static_model works on {nn.Linear, nn.Conv2d}, and QAT needs extra training. This is my understanding.

I finished every cell of this lab and only added some cells to measure the accuracy of quantized_dynamic_model and quantized_static_model. I’m just curious about the accuracy of each quantized model, given that they are all based on the same model structure. I always thought QAT should be lossless.
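
One further difference between the dynamic and static approaches is where the activation range comes from. The sketch below shows this in plain Python under the usual description of the two schemes: dynamic quantization measures the range on each input at run time, while static quantization fixes it once from calibration data. All names and numbers are hypothetical and this is not the lab's code.

```python
QMIN, QMAX = 0, 255  # unsigned 8-bit range, used here for activations


def scale_and_zero_point(lo, hi):
    """Affine quantization parameters for the range [lo, hi] (widened to include 0)."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0
    zero_point = max(QMIN, min(QMAX, round(QMIN - lo / scale)))
    return scale, zero_point


def fake_quantize(values, scale, zero_point):
    """Quantize to 8-bit integers and map straight back to floats."""
    out = []
    for v in values:
        q = max(QMIN, min(QMAX, round(v / scale) + zero_point))
        out.append((q - zero_point) * scale)
    return out


def dynamic_quantize(activations):
    # Dynamic: the range is measured on the very values being quantized.
    scale, zp = scale_and_zero_point(min(activations), max(activations))
    return fake_quantize(activations, scale, zp)


def calibrate(calibration_batches):
    # Static: the range is measured once on calibration data, then frozen.
    lo = min(min(batch) for batch in calibration_batches)
    hi = max(max(batch) for batch in calibration_batches)
    return scale_and_zero_point(lo, hi)


calibration_batches = [[0.0, 0.4, 1.1], [0.2, 0.9, 1.5]]
static_scale, static_zp = calibrate(calibration_batches)

unseen = [0.1, 0.8, 3.0]  # 3.0 lies outside the calibrated range [0, 1.5]
dyn = dynamic_quantize(unseen)
sta = fake_quantize(unseen, static_scale, static_zp)
dyn_err = max(abs(a - b) for a, b in zip(unseen, dyn))
sta_err = max(abs(a - b) for a, b in zip(unseen, sta))

print("dynamic error:", dyn_err)
print("static error: ", sta_err)
```

With the frozen range, the value 3.0 is clipped to 1.5, which is one way static quantization can lose accuracy when the calibration data does not cover what the model sees later. In exchange, it avoids measuring ranges at inference time.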

zone · April 7, 2026, 12:21pm

This is the result of the cells I added to the lab.

Deepti_Prasad · April 7, 2026, 12:58pm

What do you mean by lossless?

Remember that accuracy is not loss. When the loss gets close to zero, the accuracy usually improves, depending on the dataset, data distribution, features, model architecture and optimization techniques applied.

A model that reaches a loss of 0 and yet shows poor accuracy points to insufficient data, a feature distribution issue, a model architecture issue, or even an optimization issue.

Accuracy and loss depend on each other and also on many other factors. Here, for instance, a quantized model works at reduced numerical precision, somewhat like an image at reduced resolution, which can give lower accuracy than the original model.
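
The distinction between loss and accuracy can be illustrated with a small sketch in plain Python. The two sets of predicted probabilities are made up for the example: they give the same accuracy but clearly different cross-entropy loss.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Percentage of samples whose most probable class is the true class."""
    hits = sum(1 for p, y in zip(probs, labels)
               if max(range(len(p)), key=p.__getitem__) == y)
    return 100.0 * hits / len(labels)


labels = [0, 1, 1, 0]
# Hypothetical outputs of two models on the same four samples.
confident = [[0.95, 0.05], [0.10, 0.90], [0.05, 0.95], [0.90, 0.10]]
hesitant = [[0.55, 0.45], [0.45, 0.55], [0.40, 0.60], [0.60, 0.40]]

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(f"{name}: accuracy={accuracy(probs, labels):.1f}%, "
          f"loss={cross_entropy(probs, labels):.3f}")
```

Accuracy only looks at which class wins, while the loss also measures how confident the model is, so the two can move independently over small ranges.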

But the idea behind a quantized model is more about being able to run a model with the resources available. Imagine you need to retrain a model you created after adding newer data, but you have network speed issues or limited computation. That is when a quantized model can be run on a mobile device, within the available memory and hardware capabilities, to check whether the newly added data is useful to the original model.

Give me some time and I will get back to you on why the accuracy might vary between the model variations and why the QAT accuracy is lower than the static or dynamic accuracy, although you have already found some of the differences between those models.

Actually, when I listened to this topic it was a bit confusing even for me, especially since the labs do not let learners implement the method themselves, try different approaches and see how accuracy and loss change. But I understood that the course focuses more on using PyTorch modules to implement these approaches. Earlier one had to separately create each step, define a function and create a different dataset and model variation, whereas the PyTorch modules let us implement much of this directly during model creation.

Deepti_Prasad · April 7, 2026, 9:22pm

hi @zone

Here is my inference based on the lab

When Quantization-Aware Training (QAT) produces a lower accuracy (86.51%) than both post-training dynamic (88.18%) and static quantization (87.81%) on a CNN, it indicates that the QAT process has not successfully adapted the model weights to the INT8 constraints.

The reason might be that the fine-tuning step did not recover accuracy: the simulated quantization noise during training instead acted as a disturbance, pushing the model into a less optimal local minimum than simple calibration would have.

An ineffective QAT implementation could be due to the following:

The QAT process might be overfitting to the “fake quantization” noise rather than learning to accommodate it.
The QAT stage might need more epochs to converge, or it might be starting from a suboptimal base checkpoint.
The learning rate used for QAT might be too low to allow proper adaptation.
Dynamic and static quantization mainly change the layers and data types of the already trained model, whereas QAT fine-tunes the pre-trained weights with quantization simulated during training. In this lab, dynamic quantization was applied only to the linear layers, which are considered a primary bottleneck, so not much difference was seen between the baseline model and the dynamic model. Static quantization involved more architectural changes, with fused layers (Conv2d, batch normalisation, linear, ReLU), and reached 87.81% accuracy.
QAT relies on simulating INT8 operations (“fake” quantization). If the quantization simulation in the framework (such as PyTorch here, or TensorFlow) does not perfectly match how the final quantized model is executed on hardware (real quantization), QAT can result in worse performance.
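
The “fake quantization” idea can be sketched on a toy problem in plain Python. This is an assumption-laden illustration, not the lab's QAT code: it fits a single weight, rounds it to a grid of step 0.1 in the forward pass, and updates the underlying float weight as if the rounding were not there (a straight-through estimator, a common choice that is assumed here).

```python
def fake_quant(w, step=0.1):
    """Round a float weight to the nearest multiple of `step` (simulated quantization)."""
    return round(w / step) * step


# Toy regression data generated with a "true" weight of 0.37,
# which is not representable on the 0.1 grid.
data = [(x, 0.37 * x) for x in (-2.0, -1.0, 0.5, 1.0, 2.0)]

w = 0.0    # latent float weight that the optimizer updates
lr = 0.05
for epoch in range(200):
    for x, y in data:
        w_q = fake_quant(w)            # forward pass sees the quantized weight
        grad = 2 * (w_q * x - y) * x   # gradient of the squared error w.r.t. w_q
        w -= lr * grad                 # straight-through: apply it to w unchanged

print("latent weight:", w)
print("quantized weight used at inference:", fake_quant(w))
```

In this toy setup the latent weight settles near 0.35, the rounding boundary between the grid points 0.3 and 0.4, so the quantized weight can end up on either side. It is only a rough picture of how the noise from simulated quantization can keep a short fine-tuning run from settling cleanly.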

Remember that all the models were trained for only a small number of epochs, so a model with more hyperparameters to optimize probably needs more epochs of training to smooth out the noise simulated in the fine-tuning step.

In this run, Post-Training Quantization (PTQ) comes out ahead of QAT: the added computational cost and time of QAT are not providing the intended benefits.

The usual debugging steps would be to start from the static quantized model (as its accuracy and architecture are much closer to the baseline model) and use a layer-freezing approach, where the earlier layers are frozen and only the deeper layers and output layers are trained, and to check whether training experiences vanishing or exploding gradients.

Here is what the lab says about when to use each model

Feel free to ask if you still have any doubt.

Regards
Dr. Deepti

zone · April 8, 2026, 11:51am

Thank you for your patient explanation. I need to continue learning and practicing.
