Distributed Training: Ways to Reduce Model Size

  1. How does reducing the input image size reduce the size of the model? Is it because the number of FLOPs is reduced?

  2. Changing hyperparameters, especially reducing the number of layers and the depth (number of kernels), makes sense, as you would reduce the total number of computations (FLOPs). But how do you find the sweet spot? Reducing too much or too little can harm the performance of the model, right?

Hi @bagyaboy,

  1. Reducing the input image size does not reduce the size of the model. In general, most of a neural network's parameters are independent of the input size by design (see the sketch after this list).
  2. Performance vs. latency is always a trade-off in system design. You can automate this engineering decision by defining an objective function based on your resource budget and the acceptable drop in model performance. Neural architecture search is an active area of research. This paper can be your starting point.
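
Here is a minimal sketch of point 1 (assuming PyTorch is installed; the layer sizes are made up for illustration): the parameter count of a small convolutional stack does not change with the input resolution, only the activation shapes do.

```python
import torch
import torch.nn as nn

# Two conv layers; their weights depend only on kernel size and channel counts.
conv_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5),   # (5*5*3 + 1) * 8  = 608 parameters
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=5),  # (5*5*8 + 1) * 16 = 3216 parameters
)

n_params = sum(p.numel() for p in conv_net.parameters())

for size in (224, 112, 64):
    out = conv_net(torch.randn(1, 3, size, size))
    # The parameter count is constant; only the output (activation) shape shrinks.
    print(f"input {size}x{size}: params={n_params}, output shape={tuple(out.shape)}")
```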

Best Regards,
A. Sriharsha

@sriharsha0806,

  1. If reducing the input image size does not help in reducing the model size, then I am confused as to why the instructor mentions:

Or if your model is taking an image as an input, try to reduce the image resolution to reduce the model input. After trying these various experiments, go back and
check if the final model fits on a single node’s memory.


I was under the impression that reducing the resolution of the input image would help reduce the number of computations that the receptive field has to perform, thus reducing the model size. Is my understanding wrong?

  2. Thanks! I surely will.

Hi @bagyaboy,

Can you let me know the lecture name? I would like to cross-check.

It depends on the task too. If the task is image classification, the number of parameters in a convolution layer is Kh × Kw × (number of filters in the previous layer) × (number of filters in the current layer), plus one bias per filter, where Kh and Kw are the kernel height and width.
The number of parameters in the final linear layers is W_in × W_out (plus W_out biases).
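
In code, those two formulas (with the bias terms, which the table below also counts) look like the sketch below; it is only there so the table can be reproduced by hand.

```python
def conv_params(kh, kw, filters_in, filters_out):
    # Kh * Kw * filters_in weights per output filter, plus one bias each
    return (kh * kw * filters_in + 1) * filters_out

def linear_params(w_in, w_out):
    # W_in * W_out weights, plus W_out biases
    return w_in * w_out + w_out

print(conv_params(5, 5, 3, 8))    # 608   -> CONV1 in the table below
print(conv_params(5, 5, 8, 16))   # 3216  -> CONV2
print(linear_params(16, 120))     # 2040  -> FC3
print(linear_params(120, 84))     # 10164 -> FC4
print(linear_params(84, 10))      # 850   -> Softmax layer
```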

Let’s take an example.

Suppose you decrease the input size by half, i.e., to 16×16:

| Sl. No. | Layer | Activation Shape | Activation Size | # Parameters |
|---|---|---|---|---|
| 1 | Input | (16, 16, 3) | 768 | 0 |
| 2 | CONV1 (f=5, s=1) | (12, 12, 8) | 1,152 | 608 |
| 3 | POOL1 | (6, 6, 8) | 288 | 0 |
| 4 | CONV2 (f=5, s=1) | (2, 2, 16) | 64 | 3,216 |
| 5 | POOL2 | (1, 1, 16) | 16 | 0 |
| 6 | FC3 | (120, 1) | 120 | 2,040 |
| 7 | FC4 | (84, 1) | 84 | 10,164 |
| 8 | Softmax | (10, 1) | 10 | 850 |

Note: in practice the seventh step never happens; I used it only for calculation purposes. In general, the activation size of the previous layer is larger than that of the current layer.

Most of the parameter counts stay the same except FC3, as the parameters of a linear layer depend on the activation shape of the previous layer.
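
To make that concrete, here is a minimal PyTorch sketch (the pooling choice and layer names are my own) that builds the same stack for a 32×32 and a 16×16 input and shows that only FC3's parameter count changes:

```python
import torch
import torch.nn as nn

def build(input_hw):
    # Conv/pool stack from the table; pooling layers have no parameters.
    features = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=5), nn.MaxPool2d(2),
        nn.Conv2d(8, 16, kernel_size=5), nn.MaxPool2d(2),
        nn.Flatten(),
    )
    with torch.no_grad():
        flat = features(torch.zeros(1, 3, input_hw, input_hw)).shape[1]
    fc3 = nn.Linear(flat, 120)  # the only layer whose size depends on the input
    head = nn.Sequential(fc3, nn.Linear(120, 84), nn.Linear(84, 10))
    return nn.Sequential(features, head), fc3

for hw in (32, 16):
    model, fc3 = build(hw)
    total = sum(p.numel() for p in model.parameters())
    fc3_total = sum(p.numel() for p in fc3.parameters())
    print(f"{hw}x{hw}: total params={total}, FC3 params={fc3_total}")
    # 32x32: total params=62958, FC3 params=48120
    # 16x16: total params=16878, FC3 params=2040 (matches the table)
```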

In semantic segmentation, all your layers are fully convolutional, so your parameter count is the same irrespective of your image size.
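
A quick sketch of that fully convolutional case (hypothetical layer sizes; 21 output classes chosen arbitrarily): with no linear layers, the parameter count is identical at any resolution, and only the activation maps grow or shrink.

```python
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 21, kernel_size=1),  # 1x1 conv -> per-pixel class scores
)

print(sum(p.numel() for p in fcn.parameters()))   # 805, for any H x W
print(fcn(torch.randn(1, 3, 64, 64)).shape)       # torch.Size([1, 21, 64, 64])
print(fcn(torch.randn(1, 3, 256, 256)).shape)     # torch.Size([1, 21, 256, 256])
```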

"Or if your model is taking an image as an input, try to reduce the image resolution to reduce the model input. After trying these various experiments, go back and check if the final model fits on a single node’s memory."

Whenever you run a model, GPU memory is needed not only for the parameters but also for the activations. As you reduce the image size, you decrease the activation sizes (and some parameter counts), which helps the model fit in GPU memory.
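
A back-of-the-envelope sketch using the numbers from the 16×16 table above (fp32, 4 bytes per value); this is only a rough lower bound for a single forward pass:

```python
bytes_per_value = 4  # fp32

total_params = 608 + 3216 + 2040 + 10164 + 850                  # parameter column
total_activations = 768 + 1152 + 288 + 64 + 16 + 120 + 84 + 10  # activation-size column

print("parameters: ", total_params * bytes_per_value, "bytes")
print("activations:", total_activations * bytes_per_value, "bytes per image")
# Training needs considerably more: gradients and optimizer state per parameter,
# and activation memory scales with the batch size.
```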

Debugging profilers are available in PyTorch and TensorFlow. You can cross-verify this explanation by importing ImageNet models, for example as sketched below.
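
For instance, with torchvision and torchinfo installed (my own choice of tools, not something from the lecture), you can compare a standard ImageNet model at two resolutions; the parameter count stays the same while the reported forward/backward (activation) size shrinks.

```python
import torchvision.models as models
from torchinfo import summary

resnet = models.resnet18()

# Same ~11.7M parameters in both reports; only the activation memory differs.
summary(resnet, input_size=(1, 3, 224, 224))
summary(resnet, input_size=(1, 3, 112, 112))
```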

Best Regards,
A. Sriharsha


@sriharsha0806 I think you are right! The model memory in bytes does not correspond to the size of the input image, but to the number of kernels and their dimensions. By the way, the lecture is Distributed Training Strategies in Week 1.
