[C4W4] a question about the gram matrix

In the Gram matrix description in the assignment (and also in the lecture), given vectors (v1, …, vn), Gij = dot(vi, vj), and this is said to represent the similarity of the two vectors. However, that doesn’t look correct to me.
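For concreteness, here is a minimal sketch of that definition, assuming the vectors are stacked as the rows of a matrix A, so the whole Gram matrix is just A @ A.T (the vectors and shapes here are a made-up toy example, not the assignment's actual inputs):

```python
import numpy as np

# Toy example: three 6-dimensional vectors stacked as rows of A.
A = np.array([[1., 1., 0., 0., 0., 0.],
              [1., 1., 1., 1., 1., 1.],
              [0., 1., 0., 0., 0., 0.]])

# Gram matrix: G[i, j] = dot(v_i, v_j) for every pair of rows.
G = A @ A.T
print(G)
# [[2. 2. 1.]
#  [2. 6. 1.]
#  [1. 1. 1.]]
```

Note that G is symmetric and its diagonal holds the squared L2 norms of the vectors.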

For example, let v1 = [1, 1, 0, 0, 0, 0]^T, v2 = [1, 1, 1, 1, 1, 1]^T, and v3 = [0, 1, 0, 0, 0, 0]^T. v1 and v2 are mostly different, but dot(v1, v2) = 2; while v1 and v3 differ in only one entry, yet dot(v1, v3) = 1, which is smaller than dot(v1, v2).

Usually the similarity of two vectors is measured by dot(vi, vj) / (norm(vi) * norm(vj)). Is there an assumption that all the vectors are normalized to unit length? But from the test case for the Gram matrix, the input A doesn’t seem to have unit-length columns?
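To make the discrepancy concrete, here is a quick sketch comparing the raw dot products against cosine similarity (using the Euclidean/L2 norm, which is `np.linalg.norm`'s default) on the example vectors above:

```python
import numpy as np

v1 = np.array([1., 1., 0., 0., 0., 0.])
v2 = np.array([1., 1., 1., 1., 1., 1.])
v3 = np.array([0., 1., 0., 0., 0., 0.])

def cosine(a, b):
    # dot(a, b) / (||a|| * ||b||); np.linalg.norm defaults to the L2 norm
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.dot(v1, v2), np.dot(v1, v3))  # 2.0 1.0  -> dot says v2 is "closer"
print(cosine(v1, v2), cosine(v1, v3))  # ~0.577 ~0.707 -> cosine says v3 is closer
```

So the raw dot product and the normalized (cosine) similarity rank the pairs in opposite orders here, which is exactly the point of the question.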

I wouldn’t draw any conclusions from the artificial test data they create. They frequently use test data that doesn’t fully meet the specs of the actual inputs (e.g. in the YOLO exercise in one of the test cases, they created probabilities as inputs using a Gaussian distribution instead of the uniform distribution), but it’s still a valid test of the logic.

You could check the actual inputs that are used. But my guess from a quick look at the notebook is that there is no reason to suppose the inputs have columns normalized to unit vectors. It probably doesn’t matter, though, since they only use the cost to drive backpropagation. So it’s all relative: the gradients point downhill even if the cost values are not really comparable between columns, and it’s all just statistical anyway. Mind you, I haven’t actually read any of the papers on NST, so these are just my guesses.

I think this is a good question to clarify what “similarity” means.
As you pointed out, the norm of each vector matters when comparing vectors. Another important factor in defining “similarity” is the “direction” of a vector: if both vectors point in the same direction, then the two are similar. And, as you know, the dot product is:

\textbf{v}_1\cdot\textbf{v}_2 =\parallel\textbf{v}_1\parallel\parallel\textbf{v}_2\parallel\cos\theta

This can be rearranged as

\frac{\textbf{v}_1\cdot\textbf{v}_2}{\parallel\textbf{v}_1\parallel\parallel\textbf{v}_2\parallel} =\cos\theta

I think this is exactly the same as what you suggested. In this sense, you are perfectly right.
In your case, if we take the norms into account, then

\cos\theta_{v_1, v_2} = np.dot(v1, v2)/(np.linalg.norm(v1)*np.linalg.norm(v2))
= 0.57735
\cos\theta_{v_1, v_3} = np.dot(v1, v3)/(np.linalg.norm(v1)*np.linalg.norm(v3))
= 0.70711

(Note that np.linalg.norm defaults to the L2 (Euclidean) norm, which is what the cosine formula above requires; passing ord=1 would give the L1 norm instead and produce different numbers.)

while,

np.dot(v1, v2) = 2, \quad np.dot(v1, v3) = 1

As you can see, if we do not normalize the vectors, the ordering of the results flips: the raw dot product ranks v2 as more similar to v1, while the cosine ranks v3 as more similar. Again, I agree with your math. The key point here is to measure the “direction” correctly.

In our exercise, the test program created samples from the same Gaussian distribution, so it can still be an apples-to-apples comparison. VGG19 has preprocess_input() to transform the data, but I am not sure whether it is used in this toy NST example.