C3_W2_Lab_3_Quantization_and_Pruning Questions

A few questions on this lab:

  1. What is the recommended format for saving TF2 model weights? I have seen both H5 and HDF5 mentioned.

  2. How does the get_gzipped_model_size method know how to properly compress the model when only the filename is passed in, with no additional flags? It seems to compress the models correctly given just FILE_NON_QUANTIZED_H5 or FILE_PRUNED_MODEL_H5.

  3. Am I correct to say that we would always first prune the model and then quantize it afterward?

  4. Many times a data scientist will hand the model to a devops team or software engineer (or maybe even a machine learning engineer) who may perform the steps of quantization. Generally what is the best format to work with? This lab worked with H5 files, but in this scenario would it be better to use the SavedModel format, since the person performing the quantization would not need any code loading the model architecture but would have the architecture and weights encompassed together in this format?

  1. H5 and HDF5 are the same thing: .h5 is just the conventional file extension for the HDF5 format, which Keras uses internally when you save to a .h5 file.
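One way to convince yourself that .h5 files are plain HDF5: every HDF5 file, whatever its extension, begins with the same 8-byte signature. A minimal stdlib-only sketch (the filename is hypothetical; here we fabricate a file with the signature rather than saving a real model):

```python
# Every HDF5 file starts with this 8-byte signature, regardless of
# whether the extension is .h5, .hdf5, or anything else.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def is_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE

# Demo: write a fake file carrying the signature and check it.
with open("demo.h5", "wb") as f:
    f.write(HDF5_SIGNATURE + b"rest-of-file")

print(is_hdf5("demo.h5"))
```

Running the same check on a file Keras wrote with `model.save("model.h5")` would also return True.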
  2. There is nothing model-specific to configure: gzip/DEFLATE compresses the raw bytes of whatever file it is given. The helper simply zips the saved file and reports its size; pruned and quantized models end up smaller because zeroed and repeated weight values create redundancy the compressor exploits.
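As I understand it, the lab's helper does essentially the following: write the file into a zip archive with DEFLATE compression and return the compressed size. A stdlib-only sketch (helper and file names are hypothetical), plus a toy demonstration that sparse, mostly-zero data — like a pruned weight tensor — compresses far better than dense random data:

```python
import os
import random
import zipfile

def get_gzipped_size(path):
    """Zip `path` with DEFLATE and return the compressed size in bytes.
    No model-specific flags: just generic byte-level compression."""
    zipped = path + ".zip"
    with zipfile.ZipFile(zipped, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(path)
    return os.path.getsize(zipped)

random.seed(0)

# Dense "weights": random bytes, almost no redundancy to exploit.
with open("dense.bin", "wb") as f:
    f.write(bytes(random.randrange(256) for _ in range(100_000)))

# "Pruned" weights: ~90% of bytes zeroed, like a sparse weight tensor.
with open("pruned.bin", "wb") as f:
    f.write(bytes(random.randrange(256) if random.random() > 0.9 else 0
                  for _ in range(100_000)))

dense_size = get_gzipped_size("dense.bin")
pruned_size = get_gzipped_size("pruned.bin")
print(dense_size, pruned_size)  # the sparse file compresses much better
```

This is why the lab measures gzipped size: the raw .h5 files of a pruned and an unpruned model are about the same size, but the pruned one compresses dramatically better.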
  3. In principle the order is an empirical question: try both and compare accuracy on a validation set. In practice, pruning is usually applied first (during or after training, since the model can recover from it), and quantization last as a deployment step.
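For intuition on what the quantization step itself does: post-training quantization maps float weights to 8-bit integers via a scale and zero point, and works the same whether or not the model was pruned first. A toy sketch of affine quantization in plain Python (not TensorFlow's actual implementation):

```python
def quantize(values, num_bits=8):
    """Affine-quantize a list of floats to unsigned integers.
    Returns (quantized_values, scale, zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid div-by-zero for constants
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, -0.3, 0.0, 0.7, 2.5]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)

# Reconstruction error is bounded by roughly one quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(q, max_err)
```

Note that pruned zeros survive quantization exactly (0.0 maps to the zero point and back to 0.0), which is one reason prune-then-quantize composes well.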
  4. The difference between the SavedModel and H5 formats is addressed here.