Always confusion with the transpose

Matthias_Kleine · January 9, 2023, 7:28am

Hi,

in some layer with m units and n input values, we have a matrix w with the dimensions (m, n). For example, let us have 3 input values x and 4 units, then our matrix w for this layer has 4 rows and 3 columns.

Now when we multiply this matrix with the vector of the x values, we need to transpose the matrix w, so that the dimensions match with the x vector. (Actually, we are transposing the single elements w_i (for each unit i), which are vectors. But I think you know what I mean.)

My first problem is that it seems to be inconsistent how this is organised. IIRC, in the MLS it was just the other way round. Instead of transposing w, we transposed the x vector.

However, it would be more readable and understandable if we could avoid the transpose operation altogether. So why don’t we organise our w matrix the other way round right from the start? Am I overlooking something?

Best regards
Matthias

rmwkwok · January 9, 2023, 7:51am

Hello @Matthias_Kleine,

Let’s check the DLS notation first. In the Standard notations for Deep Learning.pdf (downloadable in this post), X and W are defined as:

$X \in \mathbb{R}^{n_x \times m}$, where $n_x$ is the input size and $m$ is the number of examples (not the number of units).
$W^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}$, where $n^{[l]}$ is the number of units in layer $l$.

So when we multiply them together, we only need $W^{[1]}X$, without any transpose.
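A minimal NumPy sketch of this shape argument, assuming illustrative sizes (3 inputs, 4 units, 5 examples) and random values; the variable names are hypothetical and not taken from the course code. It also shows that the "rows as samples" layout gives the same numbers, just transposed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 3 input features, 4 units, 5 examples
n_x, n_1, m = 3, 4, 5

X = rng.normal(size=(n_x, m))     # DLS layout: one example per column
W1 = rng.normal(size=(n_1, n_x))  # one row of weights per unit
b1 = np.zeros((n_1, 1))           # one bias per unit, broadcast over columns

# (4, 3) @ (3, 5) -> (4, 5): the inner dimensions match, no transpose needed
Z1 = W1 @ X + b1
print(Z1.shape)  # (4, 5)

# Same computation in the "rows as samples" layout: here the weights
# are the ones that get transposed instead.
X_rows = X.T                        # (5, 3): one example per row
Z1_rows = X_rows @ W1.T + b1.T      # (5, 3) @ (3, 4) -> (5, 4)
print(np.allclose(Z1_rows, Z1.T))   # True
```

Either layout works; the transpose simply moves to whichever operand does not already have the matching inner dimension.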

You can also see this in the video below

Raymond

Matthias_Kleine · January 9, 2023, 8:15am

Hm,

what probably confuses me is that usually when I use Pandas, the single “data points” are organized as rows, and the single features are the columns.

For example, in the Titanic data set, each person is a row, and the attributes like “sex”, “age”, and so forth are the columns.

But Andrew organises this just the other way round for the training samples:

[image]

Is there any special reason for this?

(The screenshot that you posted above must be from one of the later videos, which I haven’t watched yet … could you add the video link?)

Best regards
Matthias

rmwkwok · January 9, 2023, 8:37am

Hello @Matthias_Kleine,

Here is the link. It’s in Course 1 Week 4.

Yes, it’s very common to have rows for samples and columns for features, so to adapt that kind of data to the DLS, my suggestion is to transpose your X once, right before the DLS-related code starts. Since I wasn’t part of the discussion that decided the notation, I can’t explain the choice. However, it is a valid notation and, more importantly, it is the default notation used consistently throughout the DLS.
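A small sketch of that one-time transpose, assuming a made-up Titanic-like table with three numeric features; the values and names are hypothetical. With a Pandas DataFrame `df`, the starting array would typically come from `df.to_numpy()`:

```python
import numpy as np

# Hypothetical toy data in the usual "rows = samples" layout.
# Columns (assumed): age, sex (encoded as 0/1), fare
X_rows = np.array([
    [22.0, 1.0, 7.25],
    [38.0, 0.0, 71.28],
    [26.0, 0.0, 7.92],
    [35.0, 1.0, 8.05],
])  # shape (m, n_x) = (4, 3)

# Transpose once, right before the DLS-style code starts
X = X_rows.T  # shape (n_x, m) = (3, 4): one example per column

print(X_rows.shape, X.shape)  # (4, 3) (3, 4)
print(X[:, 0])                # first example, now stored as a column
```

From this point on, `X` matches the $n_x \times m$ layout, and nothing further needs to be transposed.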

Cheers,
Raymond

PS: Sounds like you are already playing with some data. Have fun with that!

paulinpaloalto · January 9, 2023, 4:56pm

As Raymond says, all these decisions are just that: decisions. You can make them in different ways and, of course, lots of consequences flow from those decisions. Prof Ng is the boss here, so he gets to make the decisions and we just have to pay attention and understand the way he is defining everything.

When you finish the classes and start to do things on your own, then you can make your own decisions. But note that you probably will also be using packages and frameworks like Pandas, TensorFlow, PyTorch etc, so you have to understand the definitions of their various APIs. When you get to dealing with images, then in addition to the “samples” dimension, you also have to deal with the position of the “channels” dimension. TF, for example, supports either “channels first” or “channels last” mode. Not sure whether PyTorch also allows the flexibility. If you stick around through DLS C4 to learn about ConvNets (highly recommended!), you’ll see that there Prof Ng switches to using “samples first” and “channels last” orientation, whereas he chooses “samples last” here in C1 and C2 when we are dealing with Feed Forward Networks.
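To make these orientations concrete, here is a rough NumPy sketch, assuming a tiny made-up batch of 2 images of size 4×4 with 3 channels; the names are hypothetical. It only rearranges axes and is not tied to any particular framework API:

```python
import numpy as np

# Assumed toy batch: 2 images, 4x4 pixels, 3 channels
m, h, w, c = 2, 4, 4, 3

# "samples first, channels last": shape (m, h, w, c)
images_last = np.arange(m * h * w * c).reshape(m, h, w, c)

# "samples first, channels first": move the channel axis forward -> (m, c, h, w)
images_first = np.transpose(images_last, (0, 3, 1, 2))
print(images_first.shape)  # (2, 3, 4, 4)

# The same pixel is addressed with reordered indices
print(images_first[0, 1, 2, 3] == images_last[0, 2, 3, 1])  # True

# "samples last" for a feed-forward network: flatten each image,
# then transpose so that every column is one example
X = images_last.reshape(m, -1).T
print(X.shape)  # (48, 2)
```

The data are identical in every case; only the axis order differs, so it helps to settle on one convention per project and convert once at the boundary.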
