If you review the week 4 videos again, you will find the instructor explaining that quantized models are optimized to a smaller size, typically by converting 32-bit floating point (FP32) values to lower-precision formats such as 8-bit integers, which shrinks the weights and activations. Some layers in a deep learning network are sensitive to this loss of precision, so running with a lower memory footprint can reduce overall accuracy.
Usually quantized models are optimised with quantization-aware training, where the model keeps the quantization error in mind during training. This makes it possible to run a larger model on a resource-constrained device, such as a mobile device with less memory, or to reduce hardware dependencies (CPU/GPU memory).
The benefit is faster inference speed, with accuracy that almost matches the original model but is slightly lower.
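To make the FP32 to INT8 idea concrete, here is a minimal sketch in plain Python. It is not the lab's code: the function names and the sample weights are made up for illustration, and it uses a simple affine (scale plus zero point) scheme. It shows that each value is stored in one byte instead of four, and that the round trip introduces a small error.

```python
def quantize_int8(values):
    """Affine quantization of a list of floats to the INT8 range [-128, 127]."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 inside the representable range
    scale = (hi - lo) / 255 or 1.0        # fall back to 1.0 for an all-zero input
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map INT8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical FP32 weights (4 bytes each); the INT8 codes need 1 byte each.
weights = [-1.0, -0.25, 0.0, 0.1, 0.5, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

# The precision loss: how far the restored values are from the originals.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("int8 codes:", q)
print("scale:", scale, "zero point:", zero_point)
print("max round-trip error:", max_error)
```

The error stays within about one quantization step (the value of scale), which is the precision loss that sensitive layers have to absorb.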
No, I have just finished all the tasks. One step is "helper_utils.evaluate_qat(final_quantized_qat_model, testloader)". I checked the code in helper_utils.evaluate_qat, which measures the accuracy of a model, so I used it to check the other models.
In the same lab, before each model is trained, there is a cell that describes the model. Did you go through those cells?
The lab you are mentioning is an ungraded lab, so there is no task to be done here, only cells to run. Instead of looking at the helper utils files, kindly go through the lab's code again thoroughly.
You will then see the difference between the implementations of each model variation and why accuracy might have dropped in QAT compared with the other variations. The answer I provided previously still gives the reason, even when comparing static and dynamic quantization with QAT. But I want to know: do you understand the difference between the dynamic and static approaches to quantizing a model?
In this lab we have a base model, and dynamic, static and QAT are all based on it. quantized_dynamic_model works on {nn.Linear}, but quantized_sttistic_model works on {nn.Linear, nn.Conv2d}, and QAT needs extra training. This is my understanding.
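Besides which layer types are covered, the two post-training approaches also differ in how activations are handled: dynamic quantization picks the activation scale at run time from each input, while static quantization fixes it beforehand from calibration data. The sketch below illustrates only that difference in plain Python. The helper names, the symmetric scheme and the numbers are assumptions for illustration, not the lab's implementation.

```python
def scale_for(values, levels=127):
    """Symmetric scale so that the largest magnitude maps to `levels`."""
    return max(abs(v) for v in values) / levels or 1.0


def quantize(values, scale, levels=127):
    """Round to the integer grid and clamp to [-levels, levels]."""
    return [max(-levels, min(levels, round(v / scale))) for v in values]


def roundtrip_error(values, scale):
    """Largest gap between a value and its quantize/dequantize round trip."""
    q = quantize(values, scale)
    return max(abs(v - x * scale) for v, x in zip(values, q))


# Static: the activation scale is fixed once, from calibration batches.
calibration_batches = [[0.1, -0.4, 0.9], [0.3, 0.8, -1.0]]
static_scale = scale_for([v for batch in calibration_batches for v in batch])

# New activations at inference time, partly outside the calibrated range.
new_activations = [0.2, -3.0, 1.5]

# Dynamic: the activation scale is recomputed from the current input.
dynamic_scale = scale_for(new_activations)

static_err = roundtrip_error(new_activations, static_scale)
dynamic_err = roundtrip_error(new_activations, dynamic_scale)
print("static error:", static_err)
print("dynamic error:", dynamic_err)
```

With a fixed scale, values outside the calibrated range are clipped, so the quality of the calibration data matters for static quantization. The dynamic variant avoids that clipping at the cost of computing the scale on every call.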
I finished every cell of this lab and just added some cells to measure the accuracy of quantized_dynamic_model and quantized_sttistic_model. I'm just curious about the accuracy of every quantized model built on the same model structure. I always thought QAT should be lossless.
Remember that accuracy is not loss. When the loss gets close to zero, accuracy usually improves, depending on the dataset, data distribution, features, model architecture and optimization techniques applied.
A model that reaches a loss of 0 and yet shows poor accuracy points to a lack of data, a feature distribution issue, a model architecture issue, or even an optimization issue.
Accuracy and loss depend on each other and also on many other factors. Here, for example, a quantized model works with reduced numerical precision, which can result in lower accuracy than the original model.
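A small worked example may help separate the two metrics. This is a sketch with made-up probabilities and hypothetical helper names: two sets of predictions get every label right, so their accuracy is identical, but their cross-entropy loss differs because one set is less confident.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class is the label."""
    correct = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return correct / len(labels)


labels = [0, 1, 0, 1]
# Made-up class probabilities for two models on the same four samples.
confident = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
hesitant = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.4, 0.6]]

for name, probs in (("confident", confident), ("hesitant", hesitant)):
    print(name, "accuracy:", accuracy(probs, labels),
          "loss:", cross_entropy(probs, labels))
```

Accuracy only counts whether the top class is right, while loss also measures how confident the prediction is, so the two can move differently.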
But the idea behind using a quantised model is more about being able to run a model with the resources available. Imagine you need to retrain a model you created with newer data added, but you have network speed issues or computation is held up. That's when a quantized model can be run on a mobile device to check whether the newly added data is useful to the original model, within the available memory and hardware capabilities.
Give me some time and I will get back to you on why each model variation might differ in accuracy and why QAT accuracy is lower than static or dynamic, although you have already found part of the difference between those models.
Actually, when I listened to this topic it was a bit confusing even for me, especially since the labs do not let learners implement the method hands-on, try different approaches and see how accuracy and loss change. But I understood that the course focuses more on using PyTorch modules to implement these model approaches. Earlier, one had to separately create each step, define a function and create a different dataset or model variation; the PyTorch modules let us apply many of these functions directly during model creation.
When Quantization-Aware Training (QAT) produces a lower accuracy (85%) than both post-training dynamic (88%) and static quantization (87%) on a CNN, it indicates that the QAT process has not successfully adapted the model weights to the INT8 constraints.
The reason might be that, instead of the fine-tuning step recovering accuracy, the simulated quantization noise during training steered the optimization and forced the model into a less optimal local minimum than simple calibration would have.
An ineffective QAT implementation could be due to:
The QAT process might be overfitting to the "fake quantization" noise rather than learning to accommodate it.
The QAT stage might need more epochs to converge, or it might be starting from a suboptimal base checkpoint.
The learning rate used for QAT might be too low to allow proper adaptation.
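The points above can be illustrated with a toy version of fake quantization. This is only a sketch under stated assumptions: a single weight, a coarse grid step of 0.1, made-up data, and a straight-through gradient (the rounding is ignored in the backward pass). The function names are hypothetical and none of it is the lab's PyTorch code. It compares a run with enough epochs and a reasonable learning rate against one with too few epochs and a very low learning rate.

```python
SCALE = 0.1      # assumed grid step; real INT8 scales are usually much finer
TARGET = 0.37    # the weight the data asks for, deliberately off the grid
DATA = [(x, TARGET * x) for x in (1.0, 2.0, 3.0)]   # samples of y = TARGET * x


def fake_quantize(w, scale=SCALE, levels=127):
    """Snap w to the quantization grid but keep it as a float (simulated INT8)."""
    q = max(-levels, min(levels, round(w / scale)))
    return q * scale


def train_with_fake_quant(lr, epochs):
    """Fit y = w * x while the forward pass only ever sees the quantized weight."""
    w = 0.0
    for _ in range(epochs):
        for x, y in DATA:
            w_q = fake_quantize(w)          # forward pass uses the quantized weight
            grad = 2 * (w_q * x - y) * x    # straight-through: d(w_q)/dw treated as 1
            w -= lr * grad                  # the update is applied to the float weight
    return fake_quantize(w)


good = train_with_fake_quant(lr=0.02, epochs=200)
starved = train_with_fake_quant(lr=0.0001, epochs=5)
print("enough training:", good, "error:", abs(good - TARGET))
print("too little training:", starved, "error:", abs(starved - TARGET))
```

In the well-trained run the float weight ends up hovering around the boundary between two grid points, which is the kind of noise QAT has to learn to live with. In the starved run the weight never leaves its starting grid point.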
Dynamic and static quantization mainly target the model's layers and data types, whereas QAT fine-tunes the pre-trained model weights before the quantized model is produced. Dynamic quantization was applied only to the linear layers, which are considered the primary bottleneck, so not much difference was noticed between the baseline model and the dynamic model. Static quantization involved more architecture changes, with fused layers (conv2d, batch normalisation, linear, relu), and reached 87% accuracy.
QAT relies on simulating INT8 operations ("fake" quantization). If the quantization simulation in the framework (such as PyTorch or TensorFlow) does not perfectly match how the final quantized model is executed on hardware (real quantization), QAT can result in worse performance.
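As a rough illustration of such a mismatch, the sketch below compares a simulation that rounds to the nearest grid point with a hypothetical backend that truncates toward zero. The truncating backend, the function names and all numbers are assumptions chosen to make the effect visible; real backends differ in subtler ways.

```python
import math

SCALE = 0.1   # assumed grid step, chosen coarse so the effect is easy to see


def simulated_quant(v, scale=SCALE):
    """What training saw: round to the nearest grid point."""
    return round(v / scale) * scale


def deployed_quant(v, scale=SCALE):
    """Hypothetical backend that truncates toward zero instead of rounding."""
    return math.trunc(v / scale) * scale


# Made-up weights and inputs for a single dot product.
weights = [0.26, -0.48, 0.77]
inputs = [1.0, 2.0, 3.0]

simulated_out = sum(simulated_quant(w) * x for w, x in zip(weights, inputs))
deployed_out = sum(deployed_quant(w) * x for w, x in zip(weights, inputs))
print("output seen during QAT:", simulated_out)
print("output on the backend:", deployed_out)
print("gap:", abs(simulated_out - deployed_out))
```

The weights were tuned for the first behaviour, so any gap like this shows up as lost accuracy after conversion.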
Remember that all the models were trained for very few epochs, so a model with more hyperparameter optimization probably needs more training epochs to smooth out the noise simulated in the fine-tuning step.
In this case, Post-Training Quantization (PTQ) is superior to QAT: the added computational cost and time of QAT are not providing the intended benefits.
The usual debugging steps would be to work from the static quantized model (as its accuracy and architecture changes are much closer to the baseline model), using a layer-freezing approach where earlier layers are frozen and the deeper layers and output layers are trained, and checking whether training experiences vanishing or exploding gradients.