[C4W4] a question about the gram matrix

In the Gram matrix description in the assignment (and also in the lecture), given vectors (v1, …, vn), Gij = dot(vi, vj), and this is said to represent the similarity of the two vectors. However, that doesn’t look correct to me.
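For concreteness, here is a minimal sketch of that definition, assuming the vectors are stacked as the rows of a matrix A, so the whole Gram matrix is just A @ A.T (the vectors and shapes here are a made-up toy example, not the assignment's actual inputs):

```python
import numpy as np

# Toy example: three 6-dimensional vectors stacked as rows of A.
A = np.array([[1., 1., 0., 0., 0., 0.],
              [1., 1., 1., 1., 1., 1.],
              [0., 1., 0., 0., 0., 0.]])

# Gram matrix: G[i, j] = dot(v_i, v_j) for every pair of rows.
G = A @ A.T
print(G)
# [[2. 2. 1.]
#  [2. 6. 1.]
#  [1. 1. 1.]]
```

Note that G is symmetric and its diagonal holds the squared L2 norms of the vectors.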

For example, let v1 = [1, 1, 0, 0, 0, 0]^T, v2 = [1, 1, 1, 1, 1, 1]^T, and v3 = [0, 1, 0, 0, 0, 0]^T. v1 and v2 are mostly different, but dot(v1, v2) = 2; while v1 and v3 differ in only one entry, yet dot(v1, v3) = 1, which is smaller than dot(v1, v2).

Usually the similarity of two vectors is measured by dot(vi, vj) / (norm(vi) * norm(vj)). Is there an assumption that all the vectors are normalized to unit length? But from the test case for the Gram matrix, the input A doesn’t seem to have unit-length columns?
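To make the discrepancy concrete, here is a quick sketch comparing the raw dot products against cosine similarity (using the Euclidean/L2 norm, which is `np.linalg.norm`'s default) on the example vectors above:

```python
import numpy as np

v1 = np.array([1., 1., 0., 0., 0., 0.])
v2 = np.array([1., 1., 1., 1., 1., 1.])
v3 = np.array([0., 1., 0., 0., 0., 0.])

def cosine(a, b):
    # dot(a, b) / (||a|| * ||b||); np.linalg.norm defaults to the L2 norm
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.dot(v1, v2), np.dot(v1, v3))  # 2.0 1.0  -> dot says v2 is "closer"
print(cosine(v1, v2), cosine(v1, v3))  # ~0.577 ~0.707 -> cosine says v3 is closer
```

So the raw dot product and the normalized (cosine) similarity rank the pairs in opposite orders here, which is exactly the point of the question.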

I wouldn’t draw any conclusions from the artificial test data they create. They frequently use test data that doesn’t fully meet the specs of the actual inputs (e.g. in the YOLO exercise in one of the test cases, they created probabilities as inputs using a Gaussian distribution instead of the uniform distribution), but it’s still a valid test of the logic.

You could check the actual inputs that are used. But my guess from a quick look at the notebook is that there is no reason to suppose the inputs have columns normalized to unit vectors. It probably doesn’t matter, though, since they only use the cost to drive backpropagation. So it’s all relative: the gradients point downhill even if the cost values are not really comparable between columns, and it’s all just statistical anyway. Mind you, I haven’t actually read any of the papers on NST, so these are just my guesses.

I think this is a good question to clarify what “similarity” means.
As you pointed out, the norm of each vector matters when comparing vectors. Another important factor in defining “similarity” is the “direction” of a vector: if both vectors point in the same direction, then the two are similar. And, as you know, the dot product is:

\textbf{v}_1\cdot\textbf{v}_2 =\parallel\textbf{v}_1\parallel\parallel\textbf{v}_2\parallel\cos\theta

This can be rearranged as

\frac{\textbf{v}_1\cdot\textbf{v}_2}{\parallel\textbf{v}_1\parallel\parallel\textbf{v}_2\parallel} =\cos\theta

I think this is exactly the same as what you suggested. In this sense, you are perfectly right.
In your case, if we take the norms into account, then

\cos\theta_{v_1, v_2} = np.dot(v1, v2)/(np.linalg.norm(v1)*np.linalg.norm(v2))
= 0.57735
\cos\theta_{v_1, v_3} = np.dot(v1, v3)/(np.linalg.norm(v1)*np.linalg.norm(v3))
= 0.70711

(Note that np.linalg.norm defaults to the L2 (Euclidean) norm, which is what the cosine formula above requires; passing ord=1 would give the L1 norm instead and produce different numbers.)

while,

np.dot(v1, v2) = 2, \quad np.dot(v1, v3) = 1

As you can see, if we do not normalize the vectors, the ordering of the results flips: the raw dot product ranks v2 as more similar to v1, while the cosine ranks v3 as more similar. Again, I agree with your math. The key point here is to measure the “direction” correctly.

In our exercise, the test program created samples from the same Gaussian distribution, so it can still be an apples-to-apples comparison. VGG19 has preprocess_input() to transform the data, but I am not sure whether it is used in this toy NST example.