Decoding Concept Activation Vectors (CAVs)

This is how I understand the flow of using CAVs:

  1. Take a set of images commonly understood to represent a concept, e.g. "CEO" or "Model Woman". Say 10 images.
  2. Run each concept image through the trained image classification model and take the output of some intermediate layer. These act as embeddings of the concept images.
  3. Use the average of the concept-image embeddings as the Concept Vector for the concept (e.g. CEO).
  4. Compute the same intermediate-layer embedding for each test image, compute its cosine similarity with the Concept Vector from step 3, and sort the test images by that similarity.
  5. The top-k and bottom-k test images should then make sense for the concept (most and least concept-like, respectively).

Is this understanding correct?
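
The steps above can be sketched roughly as follows. This is a minimal sketch using random arrays as stand-ins for the intermediate-layer activations; in practice they would come from a forward hook on some layer of the trained classifier, and the shapes (10 concept images, 100 test images, 512-dim embeddings) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for intermediate-layer activations of the trained classifier.
concept_embeddings = rng.normal(size=(10, 512))   # 10 concept images (step 2)
test_embeddings = rng.normal(size=(100, 512))     # 100 test images (step 4)

# Step 3: Concept Vector = mean of the concept-image embeddings.
concept_vector = concept_embeddings.mean(axis=0)

# Step 4: cosine similarity of each test embedding with the Concept Vector.
def cosine_sim(embeddings, vector):
    return embeddings @ vector / (
        np.linalg.norm(embeddings, axis=-1) * np.linalg.norm(vector)
    )

sims = cosine_sim(test_embeddings, concept_vector)

# Step 5: indices of the top-k and bottom-k test images for the concept.
k = 5
order = np.argsort(sims)
top_k, bottom_k = order[-k:][::-1], order[:k]
```

`top_k` then indexes the test images most similar to the concept, `bottom_k` the least similar.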