I have a question about the structure of the neural network. As discussed in the lecture, the way of calculating the output of the neural network is like this:
In the image, each node has its own weights. As I understand it, each node in the next layer is calculated by multiplying the weights of node i with all the features x_1, x_2, …, then summing the products and adding the bias. Doing this for all the other nodes gives an output vector. However, in other sources I have read, the NN output is calculated as a matrix multiplication.
As in the picture below, each weight is the strength of an edge between two nodes, and the output is Y = dot(X, W).
To make sure I’m not asking a stupid question =))), I also checked it with tensorflow and numpy, using the code below:
The result calculated with tensorflow is the same as the result calculated with numpy.dot =))). Now I am really confused: which is the right way, or am I missing something =(((? In my view, the way Prof. Andrew described in the lecture is much better, because we can tell which node is important from its weights, I think.
You said #1
each node in the next layer is calculated by multiplying the weights of node i with all the features x_1, x_2, …, then summing the products
and then #2
the NN output is calculated as a matrix multiplication
Indeed, they are the same operation described in two different ways. I wrote down how we do the matrix multiplication (the second way), and you can see that it leads back to the first way.
As for whether you picture the weights on the edges or in the nodes when you tell the story, it’s up to you. Whichever you choose, it doesn’t hurt the understanding of the maths: ways #1 and #2 are equivalent, right?
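Here is a minimal NumPy sketch of that equivalence. The numbers in `x`, `W`, and `b` are made up for illustration (they are not from the lecture), and the activation function is left out so that only the weighted sums are compared:

```python
import numpy as np

# Illustrative values only: one sample with 2 features, a layer with 3 nodes.
x = np.array([1.0, 2.0])            # features x_1, x_2
W = np.array([[0.1, 0.2, 0.3],      # column j holds the weights of node j
              [0.4, 0.5, 0.6]])
b = np.array([0.01, 0.02, 0.03])    # one bias per node

# Way #1: node by node, weighted sum of all features plus that node's bias.
z_nodes = np.array([
    sum(W[i, j] * x[i] for i in range(x.shape[0])) + b[j]
    for j in range(W.shape[1])
])

# Way #2: a single matrix multiplication.
z_matmul = np.dot(x, W) + b

print(z_nodes)
print(z_matmul)
print(np.allclose(z_nodes, z_matmul))
```

Each entry of `np.dot(x, W)` is exactly the sum that way #1 computes for one node, so the two results should agree up to floating-point rounding.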
P.S. I prefer the node.
Ohhhhh, I just realized that. Thanks, bro!
I have a question about whether in the lab /notebooks/C2_W1_Lab02_CoffeeRoasting_TF.ipynb W1 represents W itself, or W.T. My reasoning is as follows:
Each individual X can be thought of as a column vector representing a point in 2D space, so it has shape (2, 1). We know that the first layer is going to map this to a 3D space because we have 3 units, so the shape of a1 is (3, 1). The dot product w.x represents the linear transformation of x from the input space (2-dimensional) to the output space (3-dimensional). So shouldn’t w have shape (3, 2), since (3, 2) dot (2, 1) produces (3, 1)? When I write out the formulas by hand, the shapes of the matrix W and the column vector X just jump out at me. Can you please clarify how I should reconcile the linear algebra thinking with the tensorflow representation? Thank you!
Each individual X can be thought of as a column vector representing a point in 2d space.
I think you agree that whether to think of it as a row or a column vector is a matter of choice, and if you change it to a row vector, all the deductions that follow will be different. Right? I agree with every step of your deduction, except for your initial “thought” that X is a column vector.
Let’s begin with a fact: in the lab we printed the shape of w in layer 1, which is (2, 3), accepting 2 features and outputting 3 features.
Then we work backward: to have a valid matrix multiplication, X (of one sample) can have the shape (1, 2), so that we can do XW or W^TX^T. And does our X use different columns for different features? The answer is yes, because if you look at the lab again, we can see this:
So now the story is complete: we represent one sample as a row matrix (or a row vector).
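A small shape check of the row-vector convention, as a sketch only: the data here is random and the bias is set to zero as an assumption. Only the shapes follow the lab, with (2, 3) for the layer-1 weights and features stored in columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 5 samples as rows, 2 features as columns.
X = rng.normal(size=(5, 2))
# Same shape as the layer-1 weights printed in the lab: (2, 3).
W = rng.normal(size=(2, 3))
b = np.zeros(3)               # zero bias, assumed for simplicity

x_one = X[:1]                 # a single sample kept 2-D, shape (1, 2)
z_one = x_one @ W + b         # (1, 2) @ (2, 3) gives (1, 3)
Z = X @ W + b                 # (5, 2) @ (2, 3) gives (5, 3)

print(x_one.shape, z_one.shape)
print(Z.shape)
```

With samples stored as rows, the same `X @ W` expression handles one sample or a whole batch without any transposes.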
@rmwkwok the fact that you have to do XW instead of WX is, to me, indicative of a problem. From my understanding, W is the function or transformation being applied to X, not the other way around. I think we both agree that the overall matrix multiplication calculation that’s occurring is correct; I just need to reconcile the linear algebra matrix multiplication representation with the tensorflow representation. When we write down the calculation via the formulas, it is clear that what is happening is WX + b, where W transforms points from the input space into the output space, and as such, is a bridge from the input space to the output space. Its rows represent the number of points in the output space, and its columns represent where the basis vectors in the input space end up in the output space. So we would expect W to have as many columns as there are dimensions in the input space, and as many rows as there are dimensions in the output space. Somehow I get the feeling that tensorflow may be storing W.T in order to optimize the calculation, but I have no basis in fact for this.
Hi @pritamdodeja, so we do W^TX^T: a transformation W^T applied to X^T, which has m column vectors. We can have either Z = (W^TX^T)^T or Z^T = W^TX^T.
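To see that the two conventions give the same numbers, here is a short NumPy check. It is only a sketch with random values; `m = 4` and the shapes (m, 2) and (2, 3) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 4                          # number of samples (illustrative)
X = rng.normal(size=(m, 2))    # rows are samples
W = rng.normal(size=(2, 3))    # 2 input features, 3 units

Z_rows = X @ W                 # row convention: shape (m, 3)
Z_cols = W.T @ X.T             # column convention: W^T applied to X^T, shape (3, m)

print(Z_rows.shape, Z_cols.shape)
print(np.allclose(Z_rows, Z_cols.T))   # Z = (W^T X^T)^T
```

Here `W.T` has shape (3, 2), which is the "rows = output dimensions, columns = input dimensions" matrix from the column-vector view, so the linear map is the same and only the storage layout differs.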
Here, you have multiplied Xi with wi, not the whole X (or each element of X) with wi.
If this is the case, then it should have been discussed in the lecture.
Here, I am discussing what I have understood using the same example that you have given.
Please, tell me if I am wrong anywhere!!