Question about the dot product (vs. 1 x 1 convolution) between 128 features and 128 weights

Hey there, hope one of our masters can give some feedback. :pray:

From a math perspective, the following way of calculating the CAM:

  class_activation_weights = gap_weights[:,prediction]
  class_activation_features = sp.ndimage.zoom(features_for_img, (28/3, 28/3, 1), order=2)
  cam_output  = np.dot(class_activation_features,class_activation_weights)

is very much like

taking class_activation_weights as a filter and performing a 1 x 1 convolution with a single filter on class_activation_features:


  cam_output  = tf.nn.conv2d(np.asarray([class_activation_features]), 
                             np.reshape([class_activation_weights], 
                                        (1, 1, class_activation_features.shape[2], 1)),  
                             strides=[1, 1, 1, 1], 
                             padding='VALID')[0][:,:,0]

So I am not sure whether this is just a coincidence or whether there is some theoretical background behind it; maybe I am overthinking it :rofl:
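For anyone who wants to reproduce this without the lab data, here is a minimal self-contained sketch on random arrays (the shapes and variable names are mine, not from the lab) showing that the channel-wise dot product and a 1 x 1 single-filter convolution give the same map:

```python
import numpy as np
import tensorflow as tf

H, W, C = 28, 28, 128                       # spatial size and channel count (illustrative)
features = np.random.rand(H, W, C).astype(np.float32)
weights = np.random.rand(C).astype(np.float32)

# Method 1: weighted sum over channels via a dot product -> shape (H, W)
cam_dot = np.dot(features, weights)

# Method 2: the same weights reshaped into a single 1x1 convolution filter
cam_conv = tf.nn.conv2d(features[np.newaxis],         # add batch dim -> (1, H, W, C)
                        weights.reshape(1, 1, C, 1),  # filter shape (1, 1, C, 1)
                        strides=[1, 1, 1, 1],
                        padding='VALID')[0, :, :, 0].numpy()

# Agreement up to float32 rounding
print(np.allclose(cam_dot, cam_conv, atol=1e-4))
```

The only difference between the two is bookkeeping: the 1 x 1 conv applies the same weighted sum over channels independently at every spatial position, which is exactly what the dot product does.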

Thank you

Hello Chris.X,

Interesting view… but did you actually try out the new method and verify that the ‘cam_output’ obtained is the same as with the old method?

Thank you

Hey @RAJESH_CHERIAN_ROY ,

I have done a mini case study so far:

  cam_output_1  = np.dot(class_activation_features,class_activation_weights)
  print("cam_output_1 {}".format(cam_output_1.shape))

  cam_output_2  = tf.nn.conv2d(np.asarray([class_activation_features]), 
                            np.reshape([class_activation_weights], 
                                      (1, 1, class_activation_features.shape[2], 1)),  
                            strides=[1, 1, 1, 1], 
                            padding='VALID')[0][:,:,0]
  print("cam_output_2 {}".format(cam_output_2.shape))

  def compare_method(a, b):
    # element-wise closeness map (atol=0 keeps only the relative tolerance)
    is_close = np.isclose(a, b, atol=0)
    is_close = tf.cast(is_close, dtype=tf.int8)
    print("iscloser map", is_close)

    # also compare the totals of the two maps
    a_sum = tf.reduce_sum(a)
    b_sum = tf.reduce_sum(b)
    a_eq_b = (a_sum == b_sum).numpy()
    print("a_sum == b_sum", a_eq_b)

  compare_method(cam_output_1, cam_output_2)

Output:

cam_output_1 (300, 300)
cam_output_2 (300, 300)
iscloser map tf.Tensor(
[[1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]
 ...
 [1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]], shape=(300, 300), dtype=int8)
a_sum == b_sum True
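As a side note, the per-element map and the two sums can be collapsed into a single np.allclose call, which is also safer than comparing the sums exactly. A minimal sketch on stand-in arrays (the names and the simulated drift are mine, not from the lab):

```python
import numpy as np

# Stand-in arrays playing the role of cam_output_1 / cam_output_2.
a = np.random.rand(300, 300).astype(np.float32)
b = a + np.float32(1e-7)   # simulate tiny floating-point drift between the two methods

# allclose reduces the whole per-element isclose map to one boolean,
# with tolerances that absorb rounding differences.
print(np.allclose(a, b, rtol=1e-5, atol=1e-6))
```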

Thank you for your attention.

PS.
There is a similar statement in the Ungraded Lab: GradCAM, where the original author even comments
`# weight the convolution outputs with the computed gradients` for the final generation of the heatmap. There the channel outputs are averaged (mean) instead of summed up, but from my perspective they are basically the same.

My personal intuition about CAM, whether plain CAM or Grad-CAM, is that it amounts to a 1 x 1 convolution with a single filter (one neuron holding the global-average-pooling weights, or the pooled gradients).

Whether the convolution output is then summed up or averaged just depends on the purpose; the two only differ by a constant scale.
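To make the sum-vs-mean point concrete: the weighted mean over C channels is exactly the weighted sum divided by C, so the heatmap shape is unchanged and only its scale differs. A quick check on random data (shapes are illustrative):

```python
import numpy as np

H, W, C = 7, 7, 128
features = np.random.rand(H, W, C)
weights = np.random.rand(C)

cam_sum  = np.dot(features, weights)             # CAM-style weighted sum over channels
cam_mean = np.mean(features * weights, axis=-1)  # Grad-CAM-style weighted mean

# Identical up to the constant factor C
print(np.allclose(cam_sum, cam_mean * C))
```

Since most CAM pipelines normalize the heatmap to [0, 1] before overlaying it on the image, the constant factor disappears anyway.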

My English expression may be a bit awkward; hopefully it is readable :sweat_smile: .

Hi @Chris.X ,

Your work is appreciated.

Thank you.